
Homework 5: Murder Mystery (100 pts, due TBD)

This homework will focus heavily on the use of the various stream classes available in the C++ standard library. (This homework is based on an assignment created by David J. Malan.)

The Murder

Philadelphia, PA (DP) --- The body of a graduate student was found yesterday in the graduate student offices of the CIS department in Levine Hall. Campus police said that graduate students in adjacent offices heard noises coming from the office in question earlier in the day. However, since all graduate students are anti-social and never leave their desks because they're constantly slaving away at research, no one actually witnessed the incident or saw the body. The janitorial staff reported the incident to campus police when they found the body during their nightly rounds.

Police released the identity of the graduate student as one Poor G. Student. They also stated that Student was found dead in their own office, although they did not release how Student was murdered.

Campus police have arrested four graduate students under suspicion of murder. All four graduate students are inhabitants of the office. However, insider sources have claimed that the police have no evidence that these students have committed the acts, as the office was found by campus police to be clean, with no sign of the weapons that were used in the attack. Our insider sources also claim that the only piece of evidence found at the scene was a digital camera destroyed in the attack, possibly held by the victim while the attack was taking place. Unfortunately, the memory card of the camera was damaged in the alleged attack as well.

The suspects (left-to-right, top-to-bottom): Daniel, Brent, Vilhelm, and Aileen.

Background

As computer scientists without direct ties to the CIS graduate students, you have been tasked by the campus police with trying to extract data from the damaged flash card found at the scene of the murder. While we can't directly load the data on the card, we can still use our hacking skills to extract the evidence. To do so, we first need to know some things about JPEGs and how they are stored on the flash card.

JPEG headers

Most JPEGs have a unique signature or "header" that distinguishes them from other types of files. More specifically, the first four bytes of most JPEGs are either

0xff 0xd8 0xff 0xe0

0xff 0xd8 0xff 0xe1

where we read the bytes from left to right, first to fourth. If you scan the raw bytes of the flash card and come across these patterns of bytes, it is highly likely that you have found a JPEG.
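As a minimal sketch, the signature test might look like this in C++, assuming the four candidate bytes are already sitting in a char array. The helper name isJpegHeader is hypothetical. The casts matter because whether plain char is signed is implementation-defined.

```cpp
// Hypothetical helper: returns true if the four bytes at p match either
// JPEG signature listed above (ff d8 ff e0 or ff d8 ff e1).
bool isJpegHeader(const char* p) {
    // Convert each byte to unsigned char before comparing. Where plain char
    // is signed, a byte with bit pattern 0xff holds -1, which is not equal to
    // the int literal 0xff (255).
    const unsigned char b0 = static_cast<unsigned char>(p[0]);
    const unsigned char b1 = static_cast<unsigned char>(p[1]);
    const unsigned char b2 = static_cast<unsigned char>(p[2]);
    const unsigned char b3 = static_cast<unsigned char>(p[3]);
    return b0 == 0xff && b1 == 0xd8 && b2 == 0xff && (b3 == 0xe0 || b3 == 0xe1);
}
```

A four-byte window that starts with ff d8 ff but continues with some other marker (for example db) is rejected, because the text only counts e0 and e1 as headers.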

FAT and storing JPEGs on the flash card

Even though we can find the beginning of a JPEG, this is not necessarily the end of the story. The way that a JPEG (more generally, any file) is stored on the flash card is also important. For example, if a JPEG is stored in many non-contiguous blocks of the flash card (e.g., due to fragmentation), then a naive sequential scan will erroneously stitch together blocks that are adjacent on the card but belong to different files. The result may be a corrupt or unreadable image.

Luckily for us, digital cameras only write to the flash card to save a photo. Also, since the camera was new, our victim probably did not get a chance to delete any files from it. Thus, it is safe to assume that the photos are stored contiguously on the card.

Furthermore, many flash card/camera systems use a FAT file system where blocks have a size of 512 bytes. This means that cameras only write to the flash card in blocks of 512 bytes. So, for example, a file that is 520 bytes uses the same number of blocks (2 blocks) on the memory card (and thus the same amount of storage) as a file that takes 1024 bytes. This unused space in the first case (504 bytes) is called slack space. Since the camera and flash card are new, the slack space should be all 0s (so it shouldn't hurt if we appended this "data" to the end of a .jpg file).
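The block arithmetic in the 520-byte example can be checked with two small helpers. This sketch assumes the 512-byte FAT block size stated above; blocksNeeded and slackBytes are hypothetical names.

```cpp
#include <cstddef>

// Block size of the FAT file system described above.
constexpr std::size_t kBlockSize = 512;

// Hypothetical helper: number of whole blocks a file of `size` bytes
// occupies, rounding up with integer division.
std::size_t blocksNeeded(std::size_t size) {
    return (size + kBlockSize - 1) / kBlockSize;
}

// Hypothetical helper: unused bytes ("slack space") left in the last block.
std::size_t slackBytes(std::size_t size) {
    return blocksNeeded(size) * kBlockSize - size;
}
```

With these definitions a 520-byte file needs 2 blocks and leaves 504 bytes of slack, while a 1024-byte file fills its 2 blocks exactly.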

Instructions

Extracting Evidence

With all this in mind, we can come up with a basic strategy for recovering the JPEGs from the damaged flash card. We can iterate over the bytes of the flash card and look for JPEG headers. Once we find a header, we can open a new file and start filling that file with bytes from the flash card. Once we find a new header, we can close the previous file and open a new file for writing, continuing in this manner until we reach the end of the flash card.
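The strategy above can be sketched in C++ under one simplifying assumption: the card image has already been loaded into a std::vector<char>. This is an illustration of the idea, not a required structure for recover, and headerAt and splitAtHeaders are hypothetical names. Each header starts a new image. Bytes up to the next header, including zero-filled slack, go into the current image, and any bytes before the first header are discarded.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: true if a JPEG signature (ff d8 ff e0 or ff d8 ff e1)
// starts at data[i]. Bytes are cast to unsigned char so the comparison does
// not depend on whether plain char is signed.
bool headerAt(const std::vector<char>& data, std::size_t i) {
    if (i + 4 > data.size()) {
        return false;  // not enough bytes left for a full header
    }
    auto byte = [&data, i](std::size_t k) {
        return static_cast<unsigned char>(data[i + k]);
    };
    return byte(0) == 0xff && byte(1) == 0xd8 && byte(2) == 0xff &&
           (byte(3) == 0xe0 || byte(3) == 0xe1);
}

// Hypothetical helper: splits a raw card image into candidate JPEGs.
// Each header starts a new image ("close the previous file, open a new one");
// every later byte is appended to the current image until the next header or
// the end of the data. Bytes before the first header are discarded.
std::vector<std::vector<char>> splitAtHeaders(const std::vector<char>& data) {
    std::vector<std::vector<char>> images;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (headerAt(data, i)) {
            images.emplace_back();  // start a new, empty image
        }
        if (!images.empty()) {
            images.back().push_back(data[i]);
        }
    }
    return images;
}
```

Each element of the result would then be written to its own ###.jpg file through an ofstream opened with std::ios::binary. Holding the whole card in memory keeps the logic simple. For a very large card, you could instead stream it and keep a small window of recent bytes to detect headers.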

Since I can't give everyone in the class a physical copy of the flash card, I've made an image of the flash card for you to use. The image only contains the portion of the flash card that we suspect contains the data in question. It can be found here:

evidence.raw

Given this file, create a program called recover that asks the user for the name of the raw data file and extracts all JPEGs from that file. The extracted JPEGs should be named ###.jpg where "###" is a three-digit decimal number starting at 000. For example, if there are three files in the card image, then the program should create and write to files 000.jpg, 001.jpg, and 002.jpg.
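One way to build the ###.jpg names uses the ostringstream class together with <iomanip> manipulators. This is a sketch, and jpegName is a hypothetical helper.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: builds the output name for the n-th recovered JPEG,
// zero-padded to three digits (for example, 0 -> "000.jpg").
std::string jpegName(int n) {
    std::ostringstream name;
    // setw(3) applies only to the next inserted value (n);
    // setfill('0') stays in effect for the rest of the stream's life.
    name << std::setw(3) << std::setfill('0') << n << ".jpg";
    return name.str();
}
```

The returned string can be passed directly to an ofstream constructor opened with std::ios::binary, so jpegName(7) would produce the file 007.jpg.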

Finding the Killer

After you've extracted the photos, take a look at them and figure out the mystery. Who killed Poor G. Student? As in any good crime, there must be means, motive, and opportunity. Create a text file, solutions.txt, that describes:

Who committed the crime?
How did they commit the crime (i.e., what weapon did they use)?
Why did they commit the crime?

You will get full credit for simply attempting to answer these questions. But don't forget to show off your sleuthing skills, and be creative! Remember that graduate students are silly, superficial beings, so don't be afraid to let that factor into your conclusion.

Tips

Since this assignment focuses on streams, you will have to use many stream-related functions from the C++ library. In particular, you should keep in mind the classes ifstream, ofstream, ostringstream, and the manipulators from <iomanip>.
You are encouraged to use the cplusplus.com website for reference.
Unfortunately, before C++17 there is no dedicated "byte" type in C++ (C++17 adds std::byte). Instead, you can use char, which is defined to be exactly one byte in size. Remember, however, that whether plain char is signed is implementation-defined, and on many common platforms (for example, x86 with GCC) it is signed. A byte with bit pattern 0xff then holds the value -1, while the literal 0xff you compare it against is the int 255, so the comparison fails. Cast each byte to unsigned char before comparing to sort this out.
Keep in mind that operator>> performs formatted input and skips bytes it treats as whitespace (for example 0x20 and 0x0a). It will therefore silently drop bytes when iterating over the contents of the raw files. Use unformatted input such as get() or read() instead, and open the file in binary mode (std::ios::binary).
You will need to open up and inspect the images to verify that your program is doing the right thing (and to solve the mystery!). The easiest way to do this is to download the images from your eniac fileshare onto your computer and view them in Firefox or some other image viewer. You can use an FTP program (such as FileZilla) to accomplish this. See the CETS article on transferring files for Windows and Mac (with Fetch). sftp is a command-line option that functions similarly to ssh, with added get and put commands for retrieving files from the server.
In the interest of avoiding unnecessary consts, you are allowed to hard-code the JPEG header bytes (0xff, etc.).
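Here is a sketch of reading the raw file with unformatted input. It assumes the whole image fits in memory, and readCard is a hypothetical helper. Opening with std::ios::binary avoids newline translation on platforms such as Windows. istream::get(char&) returns every byte unchanged, whereas operator>> skips bytes it classifies as whitespace.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: loads the whole card image into memory.
// std::ios::binary disables line-ending translation, and get(char&) is
// unformatted input, so whitespace bytes such as 0x20 (space) or 0x0a
// (newline) are kept instead of being skipped as operator>> would skip them.
// Sets ok to false if the file cannot be opened.
std::vector<char> readCard(const std::string& path, bool& ok) {
    std::vector<char> bytes;
    std::ifstream card(path, std::ios::binary);
    ok = card.is_open();
    char c;
    while (card.get(c)) {
        bytes.push_back(c);
    }
    return bytes;
}
```

Reading 512-byte chunks with istream::read and checking gcount() would work just as well and matches the FAT block size. The get loop is used here only because it is the shortest correct version.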

Example Output

Here is an example of my recover program on a test card image:

> ls
example_evidence.raw  Makefile  recover  recover.cpp
> ./recover
What is the name of the file storing the flash card data? example_evidence.raw
Recovered 3 jpegs from example_evidence.raw...
> ls
000.jpg  001.jpg  002.jpg  Makefile  example_evidence.raw  recover  recover.cpp
>

You do not need to duplicate the console output from my program, but your program should give the user helpful information as it works.
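The following sketches the console interaction, with the prompt and summary modeled on the sample run. askForCardName and statusLine are hypothetical helpers. Reading with std::getline accepts file names containing spaces, which operator>> would cut off at the first space.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: prompts for the card image's file name and reads a
// whole line. Taking the streams as parameters lets the same code run
// against std::cin/std::cout or against in-memory streams.
std::string askForCardName(std::istream& in, std::ostream& out) {
    out << "What is the name of the file storing the flash card data? ";
    std::string name;
    std::getline(in, name);
    return name;
}

// Hypothetical helper: builds a summary line in the style of the sample run.
std::string statusLine(int count, const std::string& cardName) {
    std::ostringstream line;
    line << "Recovered " << count << " jpegs from " << cardName << "...";
    return line.str();
}
```

In the real program, you would call askForCardName(std::cin, std::cout). If the file cannot be opened, you should print a clear error message rather than silently recovering nothing.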

Challenge (3 extra-credit points)

Make your own raw file conforming to these specifications. When your recover program is run on this file, it should extract at least 3 images. (The images do not have to tell a story, but the TAs would certainly enjoy a laugh during their long, lonely hours of grading!) Please name this file something other than evidence.raw (we don't want the police investigators to get confused).
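For the challenge, one possible approach (an assumption, not a required method) is to concatenate existing JPEG files and pad each one with zero bytes up to the next 512-byte boundary. This mirrors the FAT layout and zero-filled slack described earlier. buildCardImage is a hypothetical helper, and it skips any input that cannot be opened.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper for the challenge: writes each input file to outPath,
// padding it with zero bytes up to the next 512-byte boundary so every image
// occupies whole FAT-sized blocks. Inputs that cannot be opened are skipped.
// Returns the number of inputs copied.
int buildCardImage(const std::vector<std::string>& inputs,
                   const std::string& outPath) {
    const std::size_t blockSize = 512;
    std::ofstream out(outPath, std::ios::binary);
    if (!out) {
        return 0;
    }
    int copied = 0;
    for (const std::string& path : inputs) {
        std::ifstream in(path, std::ios::binary);
        if (!in) {
            continue;
        }
        std::size_t size = 0;
        char c;
        while (in.get(c)) {  // copy the JPEG byte for byte
            out.put(c);
            ++size;
        }
        // Zero-filled slack, like a fresh card; 0 bytes if already aligned.
        const std::size_t slack = (blockSize - size % blockSize) % blockSize;
        for (std::size_t k = 0; k < slack; ++k) {
            out.put('\0');
        }
        ++copied;
    }
    return copied;
}
```

Check that each input JPEG actually begins with one of the two signatures listed earlier. Some JPEG files start with other markers (for example ff d8 ff db) and would not be found by recover.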

Deliverables

recover.cpp
Makefile -- running the make command should compile your recover.cpp code and create an executable called recover
solutions.txt -- your answers to the murder mystery questions
Optional: your raw file for the challenge problem (make sure you name it something other than evidence.raw)