Date of Award

Summer 8-6-2018

Degree Type

Thesis

Degree Name

M.S.

Degree Program

Computer Science

Department

Computer Science

Major Professor

Dr. Mahdi Abdelguerfi

Second Advisor

Dr. Elias Ioup

Abstract

Processing real-world data requires the ability to analyze data in real-time. Data processing engines like Hadoop come short when results are needed on the fly. Apache Spark's streaming library is increasingly becoming a popular choice as it can stream and analyze a significant amount of data. To showcase and assess the ability of Spark various metrics were designed and operated using data collected from the USGODAE data catalog. The latency of streaming in Apache Spark was measured and analyzed against many nodes in the cluster. Scalability was monitored by adding and removing nodes in the middle of a streaming job. Fault tolerance was verified by stopping nodes in the middle of a job and making sure that the job was rescheduled and completed on other node/s. A full stack application was designed that would automate data collection, data processing and visualizing the results. Google Maps API was used to visualize results by color coding the world map with values from various analytics.

Rights

The University of New Orleans and its agents retain the non-exclusive license to archive and make accessible this dissertation or thesis in whole or in part in all forms of media, now or hereafter known. The author retains all other ownership rights to the copyright of the thesis or dissertation.

Share

COinS