JPEG Steganalysis Using Machine Learning
Introduction & Background
Steganography, from the Greek for “covered writing”, conceals or embeds a covert message, or payload, in an
overt carrier so that the very existence of the message remains unknown. For example, a text file might be hidden in
an audio file. In this example, the text file is the “hidden” or “covert” message and the audio file is the “carrier” or
“overt” file. Before the text file is hidden in it, the audio file is sometimes referred to as “clean”; after the text file
is hidden, the audio file is often referred to as “dirty”.
Whereas the goal of steganography is to conceal a message, the goal of steganalysis is to detect and possibly recover
the message. Most steganalysis techniques rely either on signatures, similar to antivirus software, or on statistical
analysis to detect anomalies. More recently, detection approaches based on machine learning have emerged. Machine
Learning (ML) comprises two broad approaches: supervised and unsupervised. In supervised machine learning, the
system is trained with data that has already been labeled (categorized), with the goal of creating a function that
categorizes new data. Conversely, in unsupervised machine learning, the system must create the function without any
prior knowledge of how the data should be categorized.
Applying ML to JPEG steganalysis is gaining popularity, which places this project at the leading edge of current research.
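To make the supervised case concrete, the following sketch trains about the simplest supervised classifier there is, a nearest-centroid rule, on a handful of labeled feature vectors and then uses the learned function to categorize new ones. The feature values, the two-feature vector length, and the function names are hypothetical and chosen only for illustration; real steganalysis feature vectors (such as those produced by Binghamton’s extractor) are far longer, and this project is expected to use a TensorFlow/Keras model rather than this rule.

```python
from math import dist


def train_centroids(samples, labels):
    """Supervised training: average the labeled feature vectors of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def classify(centroids, x):
    """Categorize a new vector by the class whose centroid is nearest."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Hypothetical two-feature vectors, labeled in advance (the "supervision").
train_x = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.95]]
train_y = ["clean", "clean", "dirty", "dirty"]
model = train_centroids(train_x, train_y)
print(classify(model, [0.12, 0.22]))  # clean
print(classify(model, [0.88, 0.90]))  # dirty
```

An unsupervised method would receive `train_x` without `train_y` and would have to discover the groups on its own.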
Broad Project Goals
This project’s goal is to use Binghamton’s feature extraction software and an ML framework to train a model that
detects F5-embedded payloads in JPEG images.
1. Learn about JPEG image format, JPEG steganography and steganalysis.
2. Understand F5 and how it embeds a payload into a JPEG carrier.
3. Use Binghamton’s feature extraction software to extract features from JPEG images.
4. Build a Machine Learning model that can detect the presence of F5 steganography.
You may find other goals as you progress, but the above items will allow you to design your research effort.
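For goal 2, the core of F5, as described in Westfeld’s paper listed in the Starter References, is matrix encoding: k message bits are embedded into a group of n = 2^k − 1 nonzero quantized DCT coefficients by changing at most one of them, and a change always decrements the coefficient’s absolute value. Bits are read with the F4/F5 convention that an odd positive or an even negative coefficient carries a 1. When a coefficient shrinks to zero, the receiver skips it, so the same bits must be embedded again in a group that extends one coefficient further. The sketch below illustrates only that idea under simplifying assumptions: it works on a hypothetical flat list of coefficients and ignores the password-driven permutation, the exclusion of DC coefficients, and the JPEG encoding that the real F5 software performs. All function names are hypothetical.

```python
def f5_bit(c):
    """Bit carried by a nonzero coefficient (F4/F5 convention):
    odd positive or even negative -> 1, otherwise 0."""
    return c & 1 if c > 0 else 1 - (-c & 1)


def group_hash(bits):
    """Matrix-encoding hash: XOR of the 1-based positions that hold a 1."""
    h = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            h ^= position
    return h


def _next_group(coeffs, start, n):
    """Indices of the next n nonzero coefficients at or after index start."""
    idx, j = [], start
    while len(idx) < n:
        if j >= len(coeffs):
            raise ValueError("carrier has too few nonzero coefficients")
        if coeffs[j] != 0:
            idx.append(j)
        j += 1
    return idx, j


def embed(coeffs, values, k):
    """Embed each k-bit integer in values into a group of n = 2**k - 1
    nonzero coefficients, modifying coeffs in place."""
    n = 2 ** k - 1
    pos = 0
    for m in values:
        if not 0 <= m <= n:
            raise ValueError("each value must fit in k bits")
        while True:
            idx, end = _next_group(coeffs, pos, n)
            s = m ^ group_hash([f5_bit(coeffs[i]) for i in idx])
            if s == 0:
                break  # the group already encodes m; nothing to change
            t = idx[s - 1]
            coeffs[t] -= 1 if coeffs[t] > 0 else -1  # decrement |coefficient|
            if coeffs[t] != 0:
                break
            # Shrinkage: the coefficient became 0 and will be skipped,
            # so re-embed m in a group that now reaches one coefficient further.
        pos = end
    return coeffs


def extract(coeffs, count, k):
    """Read count k-bit integers back from the nonzero coefficients."""
    n = 2 ** k - 1
    pos, out = 0, []
    for _ in range(count):
        idx, pos = _next_group(coeffs, pos, n)
        out.append(group_hash([f5_bit(coeffs[i]) for i in idx]))
    return out


cover = [((i * 7) % 11) - 5 for i in range(200)]  # hypothetical coefficients
stego = embed(list(cover), [3, 0, 1, 2], k=2)
print(extract(stego, 4, k=2))  # [3, 0, 1, 2]
```

Flipping the bit at position s toggles s in the XOR, which is why one change per group suffices; larger k embeds more bits per change but needs larger groups, which is the capacity trade-off F5 makes.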
To Quickly Begin
a) Teach yourself the structure and syntax of JPEG and JFIF files. You will need to know them in detail to
do the work. There is an excellent Wikipedia article on JPEG listed in the Starter References.
b) Learn how to use MATLAB. MATLAB is needed to run Binghamton’s feature extraction software.
c) Learn how to use Python. The majority of ML frameworks use Python.
d) Learn how to use TensorFlow/Keras. TensorFlow and Keras, which is built on TensorFlow, are two of the
most popular ML frameworks.
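As a starting point for item a), the sketch below walks the marker segments at the start of a JPEG file. Each segment begins with a 0xFF byte and a marker code, and most markers are followed by a two-byte big-endian length that counts its own two bytes but not the marker. The walk stops at the Start of Scan (SOS) marker, because entropy-coded image data follows and would require Huffman decoding. Only a few marker names are mapped, and `list_segments` and `is_jfif` are hypothetical helpers written for this illustration, not part of any library.

```python
import struct

# A few common marker codes (the byte after 0xFF); others are shown as hex.
MARKER_NAMES = {
    0xD8: "SOI", 0xD9: "EOI", 0xE0: "APP0", 0xE1: "APP1", 0xDB: "DQT",
    0xC0: "SOF0", 0xC2: "SOF2", 0xC4: "DHT", 0xDD: "DRI", 0xDA: "SOS", 0xFE: "COM",
}


def list_segments(data):
    """Return (marker name, length field) pairs from SOI up to SOS or EOI."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file: missing SOI marker")
    segments = [("SOI", 0)]
    pos = 2
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        while pos < len(data) and data[pos] == 0xFF:  # skip optional fill bytes
            pos += 1
        if pos >= len(data):
            raise ValueError("truncated file")
        code = data[pos]
        pos += 1
        name = MARKER_NAMES.get(code, f"0xFF{code:02X}")
        if code == 0xD9:  # EOI has no length field
            segments.append((name, 0))
            break
        if pos + 2 > len(data):
            raise ValueError("truncated segment length")
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        segments.append((name, length))
        if code == 0xDA:  # entropy-coded scan data follows; stop walking here
            break
        pos += length  # the length counts its own two bytes but not the marker
    return segments


def is_jfif(data):
    """True if SOI is followed directly by an APP0 segment whose
    identifier is the five bytes 'JFIF' plus a zero byte."""
    return data[:4] == b"\xff\xd8\xff\xe0" and data[6:11] == b"JFIF\x00"
```

For example, `list_segments(f.read())` on a file opened with `open("photo.jpg", "rb")` (a hypothetical file name) lists segments such as APP0, DQT, SOF0, DHT, and SOS, which map directly onto the sections of the Wikipedia article.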
Specific Project Goals by Semester
Goals for Dec 2019
• MilSto01: Due in class Wed 28 Aug 2019; see syllabus for more details. Download the feature extraction
software from Binghamton and the F5 embedding software from GitHub. We will provide you with 10 JPEG
images and 2 documents to use as payloads.
• MilSto02: Due in class Wed 18 Sep 2019; see syllabus for more details. Convert five of the JPEG images
to BMP using ImageMagick. The original 10 JPEG images can be considered “clean”. Using F5, embed each of
the two provided documents into each of the five BMP images to produce 10 “dirty” JPEG images. Using the
Binghamton software, extract the features from all 20 images. The instructor and mentors will help.
• MilSto03: Due in class Wed 16 Oct 2019; see syllabus for more details. Using the 20 images and the ML
framework, train your model to distinguish clean images from dirty ones.
• MilSto04: Due in class Wed 06 Nov 2019; see syllabus for more details. Demonstrate your functioning
software across a number of different covert and overt files.
• MilSto05: Due in class Wed 20 Nov 2019; see syllabus for more details. Demonstrate your refined
functional system and give an updated presentation. This presentation should include (a) your current work,
refined from your MilSto04 presentation, (b) your plan for the Spring 2020 semester, and (c) your
ideas about additional overt images and covert payloads.
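Between MilSto02 and MilSto03, the features extracted in MATLAB have to reach the Python ML framework. The sketch below assumes, hypothetically, that they were exported to a CSV file with one row per image in the form `filename,label,f1,...,fN`, with label 0 for clean and 1 for dirty; the Binghamton software does not necessarily produce this format. It splits the rows into training and test sets while keeping the clean/dirty ratio, and it standardizes each feature using statistics from the training rows only, so that the test images do not leak into training. All function names are hypothetical.

```python
import csv
import io
import random
from statistics import mean, pstdev


def load_features(text):
    """Parse hypothetical CSV rows of the form filename,label,f1,...,fN."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    names = [r[0] for r in rows]
    labels = [int(r[1]) for r in rows]  # 0 = clean, 1 = dirty
    features = [[float(v) for v in r[2:]] for r in rows]
    return names, labels, features


def stratified_split(labels, test_fraction, seed=0):
    """Return (train, test) index lists with the same clean/dirty mix in each."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)


def fit_standardizer(rows):
    """Per-feature mean and standard deviation, computed on training rows only."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # constant feature: avoid dividing by 0
    return means, stds


def standardize(rows, means, stds):
    """Apply the training statistics to any rows (training or test)."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


# Typical use, with a hypothetical file name:
# with open("features.csv") as f:
#     names, labels, X = load_features(f.read())
# train, test = stratified_split(labels, test_fraction=0.2)
# means, stds = fit_standardizer([X[i] for i in train])
```

The standardized training rows and their labels can then be fed to a TensorFlow/Keras model. Because each dirty image is derived from one of the clean covers, a stricter split would keep a cover and its stego versions on the same side; that refinement is omitted here.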
Goals for May 2020
• Add more images with different payloads.
• Continue testing and improving how effectively your ML model detects F5 steganography across
additional overt images and covert payloads.
• Give a presentation and demonstration at ForenSecure20 in April 2020.
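To tell whether the model is actually improving, a single error figure is handy. A measure commonly reported in the steganalysis literature is the minimal average detection error P_E, the smallest value of ½(P_FA + P_MD) over all decision thresholds, where P_FA is the fraction of clean images flagged as dirty and P_MD is the fraction of dirty images that are missed. The sketch below computes it from per-image classifier scores, for example the outputs of a Keras model with a sigmoid output; the function name and the “higher score means dirty” convention are assumptions.

```python
def min_detection_error(clean_scores, dirty_scores):
    """P_E = min over thresholds of (P_FA + P_MD) / 2.

    An image is flagged dirty when its score is >= the threshold.
    P_FA: fraction of clean images flagged dirty (false alarms).
    P_MD: fraction of dirty images not flagged (missed detections).
    """
    # The error only changes at observed scores, so testing those suffices;
    # infinity covers the "flag nothing" threshold.
    thresholds = sorted(set(clean_scores) | set(dirty_scores)) + [float("inf")]
    best = 1.0
    for t in thresholds:
        p_fa = sum(s >= t for s in clean_scores) / len(clean_scores)
        p_md = sum(s < t for s in dirty_scores) / len(dirty_scores)
        best = min(best, (p_fa + p_md) / 2)
    return best


# Hypothetical classifier outputs for three clean and three dirty images.
print(min_detection_error([0.1, 0.4, 0.6], [0.5, 0.7, 0.9]))  # about 0.1667 (1/6)
```

A value of 0.5 means the detector does no better than guessing, and 0 means clean and dirty images are perfectly separated, so tracking P_E as overt images and payloads are added gives a simple way to compare model versions.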
As the project progresses, increasingly refined versions of each deliverable will be required so that the instructor
and mentors can guide and support the students’ work. This applies to both the Fall 2019 and the Spring 2020
semesters. The four deliverables are:
• Technical document (TecDocXX)
• User manual (UsrManXX) for your implemented software
• Demonstration video (DemVidXX) demonstrating your working proof-of-principle system
• Presentation (PresXX) describing your work to the mentors, the instructor, and an audience
XX denotes the version number of the deliverable. For instance, TecDoc03 would be your third TecDoc
deliverable. The deliverable schedule is defined in the class schedule in the syllabus. You will give a presentation and
demonstration of your work at ForenSecure20 in April 2020.
Estimated Level of Effort and Challenges
This will be a two-semester project. Take ITMS539 or IT-S839 in Spring 2020 to continue and complete the project.
Starter References and Prior Work
• Fridrich, J. (2009). Steganography in Digital Media: Principles, Algorithms, and Applications. Cambridge:
Cambridge University Press. doi:10.1017/CBO9781139192903
• Schaathun, H. G. (2012). Machine Learning in Image Steganalysis. Chichester, West Sussex, UK: John Wiley.
• JPEG: https://en.wikipedia.org/wiki/JPEG (an excellent Wikipedia entry; read and understand it)
• Binghamton University: http://dde.binghamton.edu/download/stego_algorithms/
• Extractor of 274/548 Merged Features: http://dde.binghamton.edu/download/ccmerged/
• F5: https://code.google.com/archive/p/f5-steganography/
• F5: https://github.com/matthewgao/F5-steganography
• F5: https://www2.htw-dresden.de/~westfeld/publikationen/f5.pdf
• F5: http://www.ws.binghamton.edu/fridrich/Research/f5.pdf
• TensorFlow / Keras: https://www.youtube.com/watch?v=E0-mp5UlWzo
• TensorFlow / Keras: https://www.youtube.com/watch?v=DFKHh7_zzJc
• TensorFlow / Keras: https://www.youtube.com/playlist?list=PLQ0sVbIj3URf94DQtGPJV629ctn2c1zN-
• TensorFlow / Keras: https://www.youtube.com/playlist?list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN
• ML and Python: https://www.youtube.com/watch?v=7x2YZhEj9Dw
For references that you cannot find on the Internet, use the IIT Library, which provides instructions on how to
access and copy references. If you have problems accessing a reference, the librarians can help.