Date of Award
Summer 8-2018
Degree Type
Thesis
Degree Name
M.S.
Degree Program
Computer Science
Department
Computer Science
Major Professor
Dr. Mahdi Abdelguerfi
Second Advisor
Dr. Elias Ioup
Abstract
Processing real-world data requires the ability to analyze data in real-time. Data processing engines like Hadoop come short when results are needed on the fly. Apache Spark's streaming library is increasingly becoming a popular choice as it can stream and analyze a significant amount of data. To showcase and assess the ability of Spark various metrics were designed and operated using data collected from the USGODAE data catalog. The latency of streaming in Apache Spark was measured and analyzed against many nodes in the cluster. Scalability was monitored by adding and removing nodes in the middle of a streaming job. Fault tolerance was verified by stopping nodes in the middle of a job and making sure that the job was rescheduled and completed on other node/s. A full stack application was designed that would automate data collection, data processing and visualizing the results. Google Maps API was used to visualize results by color coding the world map with values from various analytics.
Recommended Citation
Dahal, Janak, "Assessing Apache Spark Streaming with Scientific Data" (2018). University of New Orleans Theses and Dissertations. 2506.
https://scholarworks.uno.edu/td/2506
Rights
The University of New Orleans and its agents retain the non-exclusive license to archive and make accessible this dissertation or thesis in whole or in part in all forms of media, now or hereafter known. The author retains all other ownership rights to the copyright of the thesis or dissertation.