This homework will focus heavily on the use of the different streams that are available in the C++ libraries. (This homework is based on an assignment created by David J. Malan.)
Philadelphia, PA (DP) --- The body of a graduate student was found yesterday in the graduate student offices of the CIS department in Levine Hall. Campus police said that graduate students in adjacent offices heard noises coming from the office in question earlier in the day. However, since all graduate students are anti-social and never leave their desks because they're constantly slaving away at research, no one actually witnessed the incident or saw the body. The janitorial staff reported the incident to campus police when they found the body during their nightly rounds.
Police released the identity of the graduate student as one Poor G. Student. They also stated that Student was found dead in their own office although the police did not release how they were murdered.
Campus police have arrested four graduate students under suspicion of murder. All four graduate students are inhabitants of the office. However, insider sources have claimed that the police have no evidence that these students have committed the acts, as the office was found by campus police to be clean, with no sign of the weapons that were used in the attack. Our insider sources also claim that the only piece of evidence found at the scene was a digital camera destroyed in the attack, possibly held by the victim while the attack was taking place. Unfortunately, the memory card of the camera was damaged in the alleged attack as well.
As computer scientists without direct ties to the CIS graduate students, you have been tasked by the campus police with trying to extract data from the damaged flash card found at the scene of the murder. While we can't directly load the data on the card, we can still use our hacking skills to extract the evidence. To do so, we first need to know some things about JPEGs and how they are stored on the flash card.
Most JPEGs have a unique signature or "header" that distinguishes them from other types of files. More specifically, the first four bytes of most JPEGs are either
0xff 0xd8 0xff 0xe0
or
0xff 0xd8 0xff 0xe1
where we read the bytes from left to right, first to fourth. If you scan the raw bytes of the flash card and come across these patterns of bytes, it is highly likely that you have found a JPEG.
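To make the signature concrete, here is a minimal sketch of a header check. The function name looks_like_jpeg and its parameters are hypothetical, not part of the assignment; treat it as one possible way to express the test rather than the required approach.

```cpp
#include <cstddef>

// Hypothetical helper (name and signature are assumptions for illustration).
// Returns true when the first four bytes of `data` match one of the two
// JPEG signatures: ff d8 ff e0 or ff d8 ff e1.
bool looks_like_jpeg(const unsigned char* data, std::size_t size) {
    if (size < 4) {
        return false;  // too short to hold a full signature
    }
    return data[0] == 0xff && data[1] == 0xd8 && data[2] == 0xff &&
           (data[3] == 0xe0 || data[3] == 0xe1);
}
```

Note the use of unsigned char. If the bytes were stored in a plain char and char is signed on your platform, the byte 0xff would compare as -1 and never equal the integer 0xff.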
Even though we can find the beginning of a JPEG, this is not necessarily the end of the story. The way that a JPEG (more generally, any file) is stored on the flash card is also important. For example, if the JPEG is stored on many different, non-contiguous memory blocks of the flash card (e.g., due to fragmentation), then a simple, naive scan of the flash card will erroneously put these unrelated blocks together. This may result in a corrupt or unreadable file.
Luckily for us, digital cameras only write to the flash card to save a photo. Also, since the camera was new, our victim probably did not get a chance to delete any files from it. Thus, it is safe to assume that each photo is stored contiguously on the card.
Furthermore, many flash card/camera systems use a FAT file system where blocks have a size of 512 bytes. This means that cameras only write to the flash card in blocks of 512 bytes, and each file begins at the start of a block. So, for example, a file that is 520 bytes uses the same number of blocks (2 blocks) on the memory card (and thus the same amount of storage) as a file that takes 1024 bytes. The unused space in the 520-byte case (1024 - 520 = 504 bytes) is called slack space. Since the camera and flash card are new, the slack space should be all 0s (so it shouldn't hurt if we append this "data" to the end of a .jpg file).
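The block arithmetic above can be checked with a few lines of code. This is only an illustrative sketch: kBlockSize, blocks_needed, and slack_bytes are made-up names, and the 512-byte block size is the one assumed in this assignment.

```cpp
#include <cstddef>

// Block size assumed by this assignment (FAT with 512-byte blocks).
constexpr std::size_t kBlockSize = 512;

// Number of whole blocks needed to hold a file of `file_size` bytes
// (integer division, rounded up).
constexpr std::size_t blocks_needed(std::size_t file_size) {
    return (file_size + kBlockSize - 1) / kBlockSize;
}

// Unused bytes at the end of the file's last block (the slack space).
constexpr std::size_t slack_bytes(std::size_t file_size) {
    return blocks_needed(file_size) * kBlockSize - file_size;
}
```

With these definitions, blocks_needed(520) and blocks_needed(1024) are both 2, slack_bytes(520) is 504, and slack_bytes(1024) is 0, matching the example in the text.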
With all this in mind, we can come up with a basic strategy for recovering the JPEGs from the damaged flash card. We can iterate over the bytes of the flash card and look for JPEG headers. Once we find a header, we can open a new file and start filling that file with bytes from the flash card. Once we find a new header, we can close the previous file and open a new file for writing, continuing in this manner until we reach the end of the flash card.
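As a starting point for the scanning part of this strategy, the sketch below reads a stream one 512-byte block at a time and records which blocks begin with a JPEG header. It deliberately stops short of writing any output files. The names find_jpeg_starts and kBlockSize are hypothetical, and the code assumes that every JPEG begins at the start of a block, which follows from the camera writing whole blocks but is still an assumption about the card image.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

constexpr std::size_t kBlockSize = 512;  // block size assumed by the assignment

// Hypothetical helper: returns the indices of the blocks that start with a
// JPEG signature. Assumes headers only occur at block boundaries.
std::vector<std::size_t> find_jpeg_starts(std::istream& card) {
    std::vector<std::size_t> starts;
    unsigned char block[kBlockSize];
    std::size_t index = 0;
    while (true) {
        // read() works on char*, so cast the unsigned buffer for the call.
        card.read(reinterpret_cast<char*>(block), kBlockSize);
        const std::streamsize got = card.gcount();
        if (got <= 0) {
            break;  // nothing left to read
        }
        // A final partial block is still inspected if it holds 4+ bytes.
        if (got >= 4 && block[0] == 0xff && block[1] == 0xd8 &&
            block[2] == 0xff && (block[3] == 0xe0 || block[3] == 0xe1)) {
            starts.push_back(index);
        }
        ++index;
    }
    return starts;
}

// Convenience overload for in-memory data, handy for small experiments.
std::vector<std::size_t> find_jpeg_starts(const std::string& bytes) {
    std::istringstream card(bytes);
    return find_jpeg_starts(card);
}
```

To run this on a real card image, you would pass a std::ifstream opened with std::ios::binary so that the bytes are not altered while reading. Turning the list of starting blocks into recovered files (opening, filling, and closing each output file) is the part left for your recover program.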
Since I can't give everyone in the class a physical copy of the flash card, I've made an image of the flash card for you to use. The image only contains the portion of the flash card that we suspect contains the data in question. It can be found here:
Given this file, create a program called recover that asks the user for the name of the raw data file and extracts all JPEGs from that file. The extracted JPEGs should be named ###.jpg where "###" is a three-digit decimal number starting at 000. For example, if there are three files in the card image, then the program should create and write to files 000.jpg, 001.jpg, and 002.jpg.
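Since this homework is about streams, one way to build the ###.jpg names is with a string stream and the standard I/O manipulators. The function name jpeg_name is hypothetical; this is a small sketch of the formatting step only, and other approaches work just as well.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a file index as a zero-padded, three-digit
// name such as "000.jpg". Intended for indices in the range 0..999.
std::string jpeg_name(int index) {
    std::ostringstream name;
    // setw applies only to the next insertion (the number); setfill persists
    // but has no effect on ".jpg" because no width is set for it.
    name << std::setfill('0') << std::setw(3) << index << ".jpg";
    return name.str();
}
```

For example, jpeg_name(0) yields "000.jpg" and jpeg_name(12) yields "012.jpg".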
After you've extracted the photos, take a look at them and figure out the mystery. Who killed Poor G. Student? Like any good crime, there must be means, motive, and opportunity. Create a text file, solutions.txt, that describes:
You will get full credit for simply attempting to answer these questions. But don't forget to show off your sleuthing skills -- and be creative! Remember that graduate students are silly, superficial beings so don't be afraid to let that factor into your conclusion.
> ls
example_evidence.raw Makefile recover recover.cpp
> ./recover
What is the name of the file storing the flash card data? example_evidence.raw
Recovered 3 jpegs from example_evidence.raw...
> ls
000.jpg 001.jpg 002.jpg Makefile example_evidence.raw recover recover.cpp
>
You do not need to duplicate the console output from my program, but your program should give the user helpful information as it works.
Make your own raw file conforming to these specifications. When your recover program is run on this file, it should extract at least 3 images. (The images do not have to tell a story, but the TAs certainly would enjoy a laugh during their long, lonely hours of grading!) Please name this file something other than evidence.raw (we don't want the police investigators to get confused).