ORCID ID
0000-0002-6199-346X
Date of Award
8-2022
Degree Type
Dissertation
Degree Name
Ph.D.
Degree Program
Engineering and Applied Science - Computer Science
Department
Computer Science
Major Professor
Shaikh Arifuzzaman
Second Advisor
Khaled Ibrahim
Third Advisor
Md Tamjidul Hoque
Fourth Advisor
Dimitrios Charalampidis
Fifth Advisor
Mahdi Abdelguerfi
Abstract
Parallel computing plays a crucial role in processing large-scale graph data. Complex network analysis is an exciting area of research with applications in many scientific domains, such as sociology, biology, online media, and recommendation systems. Graph mining is an area of interest with diverse problems drawn from different domains of daily life. Owing to advances in data and computing technologies, graph data is growing at an enormous rate; for example, the number of links in social networks grows every millisecond. Machine learning and deep learning play a significant role in working with big data in the modern era. We work on a well-known graph problem, community detection (CD). We design parallel algorithms for the Louvain method on static networks and show around a 12-fold speedup. The implementations use both shared-memory and distributed-memory parallel algorithms. We also show how communities change across time phases in dynamic networks by computing several graph metrics based on their temporal definitions. We detect temporal communities in dynamic networks representing social, brain, communication, and citation networks in a more concrete way. We present both shared-memory and distributed-memory parallel algorithms for CD in dynamic graphs using permanence, a vertex-based metric. To the best of our knowledge, our parallel CD algorithm for temporal graphs, implemented using the Message Passing Interface (MPI), is the first MPI-based algorithm for this problem. Our algorithm achieves a 30× speedup for the largest network, with billions of edges. We present a scalable method for CD based on a Graph Convolutional Network (GCN) via semi-supervised node classification, using PyTorch with CUDA in a GPU environment (4× performance gain). Our model achieves up to 86.9% accuracy and a 0.85 F1 score on real-world datasets from diverse domains. We provide a scalable solution to the Sparse Deep Neural Network (DNN) Challenge by designing a data-parallel Sparse DNN using TensorFlow on GPU (4.7× speedup). To portray the importance of graph mining in our daily activities, we include applications to webspam detection from web graphs (billions of edges), sentiment analysis of Twitter data (1.2 million tweets) to reveal insights about COVID-19 vaccination awareness among the public, and time-series forecasting of the vaccinated population in the USA.
Recommended Citation
Sattar, Naw Safrin, "Parallel Algorithms for Scalable Graph Mining: Applications on Big Data and Machine Learning" (2022). University of New Orleans Theses and Dissertations. 3014.
https://scholarworks.uno.edu/td/3014
Included in
Artificial Intelligence and Robotics Commons, Computer and Systems Architecture Commons, Data Science Commons, Systems Architecture Commons, Theory and Algorithms Commons
Rights
The University of New Orleans and its agents retain the non-exclusive license to archive and make accessible this dissertation or thesis in whole or in part in all forms of media, now or hereafter known. The author retains all other ownership rights to the copyright of the thesis or dissertation.