CIT 5950 (Spring 2023) HW 01: File Readers

Implementing C++ classes that abstract away POSIX file I/O.

Goals

Collaboration

In CIT 5950, you will complete each assignment on your own. You may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. If you are worried about whether something violates academic integrity, please post on Ed or contact the instructor.

Overview

A commonly used feature in programming is file I/O (Input/Output). In past programming experiences, you have likely used a standard library implementation that provided many useful features. In this homework assignment, you will implement two file readers that resemble other higher-level file I/O implementations. You will start with a simpler file reader, and then move on to create a file reader that implements more features and uses caching to improve performance.

Each file reader will be its own class, meaning we will have to define the constructor, destructor, and methods for each implementation.

Note that our user-defined programs don’t have direct access to files and disk. These are protected by the operating system. As a result, both of our readers will make use of system calls through POSIX (Portable Operating System Interface X) to perform the file I/O. You will likely find the following POSIX functions necessary for your implementation:

For each of the functions, note that we provided a link to some documentation. You can also access that documentation through the terminal by typing in the link text as a command (e.g. typing in man 2 open in the terminal).
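
As a rough illustration of how such system calls fit together, the sketch below assumes that open, read, and close are among the functions meant (only open is named explicitly in this section). The helper read_prefix is hypothetical and is not part of the starter code; it only shows the general open / read-in-a-loop / close pattern.

```cpp
#include <fcntl.h>    // open, O_RDONLY
#include <unistd.h>   // read, close
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the starter code): returns up to
// max_bytes read from the start of the file at path, or an empty
// string if the file cannot be opened.
std::string read_prefix(const char* path, size_t max_bytes) {
  assert(path != nullptr);
  int fd = open(path, O_RDONLY);
  if (fd < 0) {
    return "";  // open failed; errno describes why
  }
  std::string result;
  char chunk[64];
  while (result.size() < max_bytes) {
    size_t want = max_bytes - result.size();
    if (want > sizeof(chunk)) {
      want = sizeof(chunk);
    }
    ssize_t got = read(fd, chunk, want);
    if (got > 0) {
      result.append(chunk, static_cast<size_t>(got));
    } else if (got == 0) {
      break;  // end of file
    } else if (errno != EINTR) {
      break;  // unrecoverable error
    }
    // On EINTR, fall through and retry the read.
  }
  close(fd);
  return result;
}
```

Note that read() may return fewer bytes than requested, which is why the sketch loops until it has enough bytes, reaches the end of the file, or hits an error.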

SimpleFileReader

This is the first file reader that you will implement as part of the homework. SimpleFileReader, as its name suggests, is a simple wrapper around the POSIX file I/O implementation. One can open a file, close a file, read one or more characters, and use a few other simple features. Most of the work done in these functions will be handled by POSIX, but it is up to you to figure out how to do this. See the Instructions section below for details on the code files.

There are also further notes about this reader in SimpleFileReader.h. We recommend that you read through this file before you start your implementation.

BufferedFileReader

This is the second (and last) file reader you will implement for this homework. You will notice some similarities between BufferedFileReader and SimpleFileReader, but there is some added complexity.

BufferedFileReader implements a caching strategy. To understand this strategy, consider the following example:

BufferedFileReader bf (some_file_name);
char c = bf.get_char();

If we had used SimpleFileReader in this case, get_char() would simply call read() requesting one character from the file. Instead, BufferedFileReader maintains an internal buffer (a char array) of size 1024 and uses this array to minimize calls to POSIX read(). In the code example above, when get_char() is called on the BufferedFileReader, it does not read just 1 character; instead, it calls read() and tries to fill the entire buffer of 1024 characters. While this is more work initially, it means that on subsequent calls to get_char(), BufferedFileReader can simply return the next char in its internal buffer rather than calling POSIX read() again. Once you have reached the end of the buffer and need to process more characters, the BufferedFileReader should attempt to repopulate the buffer with the next 1024 characters of the file. Other reading functions, like get_token() and get_line(), will also make use of the buffer. Maintaining the buffer requires multiple fields, which are described in BufferedFileReader.h.
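
The caching idea can be sketched with a small stand-alone class. TinyBufferedReader and all of its field names are hypothetical and chosen only for illustration; they are not the fields declared in BufferedFileReader.h, and the read_calls() counter exists only to make the effect of the buffer visible.

```cpp
#include <fcntl.h>    // open, O_RDONLY
#include <unistd.h>   // read, close
#include <cassert>
#include <cstddef>
#include <cstdio>     // EOF

// Hypothetical illustration class; names are assumptions, not the
// ones used in the assignment's header.
class TinyBufferedReader {
 public:
  static constexpr size_t kBufSize = 1024;

  explicit TinyBufferedReader(const char* path)
      : fd_(open(path, O_RDONLY)), len_(0), pos_(0), read_calls_(0) {}

  ~TinyBufferedReader() {
    if (fd_ >= 0) {
      close(fd_);
    }
  }

  TinyBufferedReader(const TinyBufferedReader&) = delete;
  TinyBufferedReader& operator=(const TinyBufferedReader&) = delete;

  // Returns the next character, or EOF (as a char) when nothing is left.
  char get_char() {
    if (pos_ == len_ && !refill()) {
      return static_cast<char>(EOF);
    }
    assert(pos_ < len_);
    return buf_[pos_++];
  }

  // Number of POSIX read() calls made so far (for demonstration only).
  int read_calls() const { return read_calls_; }

 private:
  // Tries to load the next chunk of the file; false on EOF or error.
  // (A fuller version would also retry when read() fails with EINTR.)
  bool refill() {
    if (fd_ < 0) {
      return false;
    }
    ssize_t got = read(fd_, buf_, kBufSize);
    ++read_calls_;
    if (got <= 0) {
      return false;
    }
    len_ = static_cast<size_t>(got);
    pos_ = 0;
    return true;
  }

  int fd_;
  char buf_[kBufSize];
  size_t len_;  // number of valid bytes currently in buf_
  size_t pos_;  // index of the next byte to hand out
  int read_calls_;
};
```

With a regular file of at least 1024 bytes, the first 1024 calls to get_char() on this sketch trigger a single read(); the next call triggers a second one.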

BufferedFileReader also implements the ability to read “tokens” from the file. A token is a sequence of characters whose end is marked by a delimiter character and which does not itself contain any delimiters. Note that the “end of file” character can be thought of as a delimiter. Also note that a token can be the empty string. For example, if the file had the contents “hi,there,,aaaaa,!0 fds” and the delimiters for the file were specified to be only the ‘,’ character, then the tokens in the file would be “hi”, “there”, “” (the empty string), “aaaaa”, and “!0 fds”.
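
The token rule described above can be sketched on an in-memory string. split_tokens is a hypothetical helper, not part of the assignment's interface; it treats the end of the text as a final delimiter. How a real reader should behave when the file ends immediately after a delimiter is not stated here, so the trailing empty token this sketch would produce in that case is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits text into tokens. Every character found
// in delims ends the current token; the end of the text ends the last.
std::vector<std::string> split_tokens(const std::string& text,
                                      const std::string& delims) {
  std::vector<std::string> tokens;
  std::string current;
  for (char c : text) {
    if (delims.find(c) != std::string::npos) {
      tokens.push_back(current);  // may be the empty string
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  tokens.push_back(current);  // end of input acts as the last delimiter
  assert(!tokens.empty());    // there is always at least one token
  return tokens;
}
```

Calling split_tokens("hi,there,,aaaaa,!0 fds", ",") yields five tokens, the third of which is empty because two delimiters are adjacent.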

There are also further notes about this reader in BufferedFileReader.h. We recommend that you read through this file before you start your implementation.

Instructions

You will find the starter code in Codio, where you will also submit the assignment. Among these files, a few stand out:

We recommend reading all of these files before you start, especially the .h files listed. Note: you are only allowed to modify the following files: SimpleFileReader.cc, BufferedFileReader.cc, and BufferedFileReader.h. This means that your code should work with the other files as they were initially supplied. Although you are allowed to modify BufferedFileReader.h, it must still work with the tests, so any modifications you make would likely be limited to adding new private data members or private helper methods. We even suggest a few private helper methods in the Hints section below. Additionally, you can modify the test files if you like (for debugging purposes), but we will be testing your readers against unmodified versions of the files.

Suggested Approach

Below we have provided a suggested approach to this homework. Note that you are not required to follow this ordering if you believe another approach would work better for you. Also note that you can gradually check your progress, and run specific tests. Look at the gtest section below for more details on running individual tests.

  1. Start by populating SimpleFileReader.cc and BufferedFileReader.cc with “empty” definitions of every member function. Afterwards, make sure that you can compile successfully. For example, you would write the following “empty” function in SimpleFileReader.cc:
int SimpleFileReader::tell() {
  return -1;
}
  2. Implement the constructor, destructor, open_file(), and close_file() of SimpleFileReader. Then make sure you pass the Test_SimpleFileReader.open_close test.

  3. Implement get_char() and tell() and make sure you pass all SimpleFileReader tests except for complex and get_chars.

  4. Implement get_chars(), rewind(), and good() and make sure you pass all SimpleFileReader tests. Note that for good(), you may have to add on to the functions you have already written.

  5. Implement the constructor, destructor, open_file(), and close_file() of BufferedFileReader. Then make sure you pass the Test_BufferedFileReader.open_close test.

  6. Implement get_char() and make sure you now pass the Test_BufferedFileReader.Basic test.

  7. Implement tell(), rewind(), and good(). Make sure you pass the Test_BufferedFileReader.get_char test.

  8. Implement get_token() and make sure you pass the Test_BufferedFileReader.get_token test.

  9. Implement the get_line() function and make sure you pass all tests in the test_suite.

  10. Run your test_suite under valgrind and fix any valgrind errors.

Hints

Here are a few hints and tips that you may find useful when approaching this homework.

  1. Don’t wait until the end to test. As mentioned in the suggested approach above, you can create “empty” definitions for functions, just enough to compile, so that you can test your other functions.
  2. There are many utilities of the string class that you may find useful, such as string concatenation, and the find() member function.
  3. Valid file descriptors are never negative, so you can use a negative number to represent an invalid file descriptor.
  4. Create some helper functions to aid in your implementation of BufferedFileReader. Some example functions include
    • bool is_delim(char c); which checks to see if a character is a delimiter.
    • void fill_buffer(); which refills the internal buffer with up to 1024 characters and sets the data members so that the next time a character is retrieved from the buffer, it is retrieved from the start of the buffer.
  5. You may find it useful to have get_token() and get_line() call the get_char() member function, and just have get_char() manage the buffer and its related data members.
  6. You can add new data members (fields) to BufferedFileReader.h if that would be useful. However, you MUST make use of the data member buffer.
  7. EOF is a literal value that represents the end of file as a character. Example: char end = EOF;
  8. get_line() returns a variable sized array that is on the heap. Since you don’t know the size of the array you will need, it is intended that you re-allocate the size of the array to be bigger each time you run out of space in the array. However, it is acceptable to simply allocate a large array and assume no line will contain enough tokens to exceed that length. What number is large enough would be for you to figure out.
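
The re-allocation strategy mentioned in the last hint can be sketched as follows. Everything here is hypothetical: the struct GrowableTokens, the helper names, the choice of std::string as the element type, and the starting capacity of 4 are assumptions for illustration, and the actual return type of get_line() is whatever BufferedFileReader.h specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical illustration type: a heap array of strings plus the
// bookkeeping needed to grow it. Not part of the starter code.
struct GrowableTokens {
  std::string* data = nullptr;
  size_t size = 0;      // number of elements in use
  size_t capacity = 0;  // number of elements allocated
};

// Appends token, doubling the allocation whenever the array is full.
void push_token(GrowableTokens& arr, const std::string& token) {
  if (arr.size == arr.capacity) {
    size_t new_cap = (arr.capacity == 0) ? 4 : arr.capacity * 2;
    std::string* bigger = new std::string[new_cap];
    for (size_t i = 0; i < arr.size; ++i) {
      bigger[i] = arr.data[i];  // copy the existing elements over
    }
    delete[] arr.data;  // release the old, smaller array
    arr.data = bigger;
    arr.capacity = new_cap;
  }
  assert(arr.size < arr.capacity);
  arr.data[arr.size++] = token;
}

// Releases the heap array so valgrind reports no leak.
void free_tokens(GrowableTokens& arr) {
  delete[] arr.data;
  arr.data = nullptr;
  arr.size = 0;
  arr.capacity = 0;
}
```

Doubling the capacity keeps the number of re-allocations small compared to growing by one element at a time, and pairing every new[] with a delete[] is what keeps the valgrind run clean.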

Grading and Testing

Compilation

We have supplied you with a makefile that can be used for compiling your code into an executable. To do this, open the terminal in Codio (this can be done by selecting Tools -> Terminal) and then type in make.

You may need to resolve any compiler warnings and compiler errors that show up. Once all compiler errors have been resolved, if you ls in the terminal, you should be able to see an executable called test_suite. You can then run this by typing in ./test_suite to see the evaluation of your code.

Note that your submission will be partially evaluated on the number of compiler warnings. You should eliminate ALL compiler warnings in your code.

Valgrind

We will also test your submission for memory errors and memory leaks using valgrind. To check this yourself, you should try running: valgrind --leak-check=full ./test_suite

If everything is correct, you should see the following towards the bottom of the output:

 ==1620== All heap blocks were freed -- no leaks are possible
 ==1620==
 ==1620== For counts of detected and suppressed errors, rerun with: -v
 ==1620== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If you do not see something similar to the above in your output, valgrind will have printed out details about where the errors and memory leaks occurred.

Note: It is expected to take a while for valgrind to run on the whole test_suite. You can possibly go faster by running valgrind on only specific tests. See the gtest section below for information on running individual tests.

gtest

As with hw0, you can compile your implementation by using the make command. This will result in several output files, including an executable called test_suite.

After compiling your solution with make, you can run all of the tests for the homework by invoking:

./test_suite

You can also run only specific tests by passing command line arguments into test_suite.

For example, to only run the SimpleFileReader tests, you can type in:

./test_suite --gtest_filter=Test_SimpleFileReader.*

If you only want to test open and close from SimpleFileReader, you can type in:

./test_suite --gtest_filter=Test_SimpleFileReader.open_close

You can specify which tests are run for any of the tests in the assignment. You just need to know the names of the tests, and you can do this by running:

./test_suite --gtest_list_tests

These settings can be helpful for debugging specific parts of the assignment, especially since test_suite can be run with these settings through valgrind and gdb!

Submission

Please submit your completed SimpleFileReader.cc, BufferedFileReader.cc, and BufferedFileReader.h to Gradescope.

Note: It is expected to take a while for this to run. The sample solution takes close to 4 minutes to finish running on the autograder.