Date of Award
Data reconstruction is significantly improved in terms of speed and accuracy by reliable data encoding fragment classification. To date, work on this problem has been successful with file structures of low entropy that contain sparse data, such as large tables or logs. Classifying compressed, encrypted, and random data that exhibit high entropy is an inherently difficult problem that requires more advanced classification approaches. We explore the ability of convolutional neural networks and word embeddings to classify deflate data encoding of high entropy file fragments after establishing ground truth using controlled datasets. Our model is designed to either successfully classify file fragments that contain hidden patterns and high dimensional features, or to gracefully fail if there are no patterns to be recognized. Our experimental results of the model that we built show high accuracy of 99.82%, 99.73%, and 99.6%, when classifying BZ2, PNG, and GZ against JPEG file fragments, respectively.
Ameen, Nehal, "Convolutional Neural Networks for Deflate Data Encoding Classification of High Entropy File Fragments" (2021). University of New Orleans Theses and Dissertations. 2853.