Goals
- Practice with objects in C++
- Gain familiarity with using POSIX, specifically for FILE I/O
- See the relationship between user-level and system level libraries
- Analyze the performance gain from utilizing buffering/caching when doing I/O
Collaboration
For assignments in CIT 5950, you will complete each assignment on your own. You may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. If you are worried about whether something violates academic integrity, please post on Ed or contact the instructor.
Contents
Overview
A commonly used feature in programming is file I/O (Input/Output). In past programming experiences, you have likely used a standard library implementation that provided many useful features. In this homework assignment, you will implement two file readers that resemble other, higher-level file I/O implementations. You will start with a simpler file reader and then move on to a file reader that implements more features and uses caching to improve performance.
Each file reader will be its own class, meaning we will have to define the constructor, destructor, and methods for our implementation.
Note that our user-defined programs don’t have direct access to files and disk. These are protected by the operating system. As a result, both of our readers will make use of system calls through POSIX (Portable Operating System Interface X) to perform the file I/O. You will likely find the following POSIX functions necessary for your implementation:
- int open(const char *pathname, int flags); (man 2 open)
- int close(int fd); (man 2 close)
- ssize_t read(int fd, void *buf, size_t count); (man 2 read)
- off_t lseek(int fd, off_t offset, int whence); (man 2 lseek)
For each of the functions, note that we provided a link to some documentation.
You can also access that documentation through the terminal by typing in the link text as a command (e.g. typing in man 2 open in the terminal).
Note: You are required to use these POSIX functions. If we detect that you are using something other than POSIX to perform the file I/O (e.g. fopen, ifstream, etc.), then we will award 0 points.
SimpleFileReader
This is the first file reader that you will implement as part of the homework. SimpleFileReader, as its name suggests, is a simple wrapper around the POSIX file I/O implementation. One can open a file, close a file, read one or more characters, and use a few other simple features. Most of the work in these functions will be handled by POSIX, but it is up to you to figure out how to do this. See the Instructions section below for details on the code files.
There are also further notes about this reader in SimpleFileReader.h.
We recommend that you read through this file before you start your implementation.
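As a rough illustration of what "a wrapper around POSIX" can look like as a class, the sketch below ties a file descriptor to an object's lifetime: the constructor opens the file and the destructor closes it. FdOwner and its member names are hypothetical and are not the interface you must implement. The real declarations are in SimpleFileReader.h.

```cpp
#include <fcntl.h>   // open, O_RDONLY
#include <unistd.h>  // close

#include <cassert>

// Hypothetical illustration class; NOT the interface from SimpleFileReader.h.
class FdOwner {
 public:
  explicit FdOwner(const char *path) : fd_(-1) {
    assert(path != nullptr);
    fd_ = open(path, O_RDONLY);  // stays -1 if the open fails
  }

  // The destructor releases the descriptor if one is still open.
  ~FdOwner() { close_file(); }

  // Copying is disabled so two objects never close the same descriptor.
  FdOwner(const FdOwner &) = delete;
  FdOwner &operator=(const FdOwner &) = delete;

  bool is_open() const { return fd_ >= 0; }

  void close_file() {
    if (fd_ >= 0) {
      close(fd_);
      fd_ = -1;  // mark as invalid so closing again is a no-op
    }
  }

 private:
  int fd_;  // a negative value means "no file is open"
};
```

Resetting the descriptor to a negative value after close() is one simple way to make both a repeated close and the destructor safe.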
BufferedFileReader
We highly recommend you have SimpleFileReader pass all tests before attempting BufferedFileReader.
This is the second (and last) file reader you will implement for this homework.
You will notice some similarities between BufferedFileReader and SimpleFileReader, but there is some added complexity.
BufferedFileReader implements a caching strategy. To understand this strategy, consider the following example:
BufferedFileReader bf (some_file_name);
char c = bf.get_char();
If we had used SimpleFileReader in this case, get_char() would simply call read(), requesting one character from the file.
Instead, BufferedFileReader maintains an internal buffer (a char array) of size 1024 and uses this array to minimize calls to POSIX read().
In the code example above, when get_char() is called on the BufferedFileReader, it does not read just 1 character. Instead, it calls read() and tries to fill the entire buffer of 1024 characters.
While this is more work initially, it means that on subsequent calls to get_char(), BufferedFileReader can simply return the next char in its internal buffer rather than having to call POSIX read().
Once you have reached the end of the buffer and need to process more characters, the BufferedFileReader should attempt to repopulate the buffer with the next 1024 characters of the file.
Other functions for reading, like get_token and get_line, will also make use of the buffer.
Maintaining the buffer requires multiple fields, which are described in BufferedFileReader.h.
It is important for the testing code that, whenever you need to fill the buffer, your code attempts to fill all 1024 characters of the buffer if possible.
BufferedFileReader also implements the ability to read “tokens” from the file. A token is a sequence of characters that contains no delimiters and whose end is marked by a delimiter character. Note that the end of the file can be thought of as a delimiter. Also note that a token can be the empty string. For example, if the file had the contents “hi,there,,aaaaa,!0 fds” and the only delimiter specified for the file was the ‘,’ character, then the tokens in the file would be:
- “hi”
- “there”
- “”
- “aaaaa”
- “!0 fds”
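The token rules above can be checked with a small sketch that works on an in-memory string instead of a file. split_tokens is a hypothetical helper, and it uses std::vector purely for illustration; your get_token() reads from the buffer one token at a time and does not need a container. How the sketch treats input that ends in a delimiter (it reports a final empty token) is our assumption, so defer to BufferedFileReader.h and the tests for the required behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper for illustration only: splits `contents` into tokens.
// Any character in `delims` ends the current token, and the end of the
// input ends the final token.
std::vector<std::string> split_tokens(const std::string &contents,
                                      const std::string &delims) {
  assert(!delims.empty());

  std::vector<std::string> tokens;
  std::string current;
  for (char c : contents) {
    if (delims.find(c) != std::string::npos) {
      tokens.push_back(current);  // may be the empty string
      current.clear();
    } else {
      current += c;
    }
  }
  tokens.push_back(current);  // end of input acts as the last delimiter
  return tokens;
}
```

Calling split_tokens("hi,there,,aaaaa,!0 fds", ",") yields the five tokens listed above, including the empty token between the two adjacent commas.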
There are also further notes about this reader in BufferedFileReader.h.
We recommend that you read through this file before you start your implementation.
Instructions
You will find the starter code in Codio, where you will also submit the assignment. Among these files, a few stand out:
- SimpleFileReader.h: Contains the declarations and extensive comments detailing the functions you will have to implement for the SimpleFileReader.
- SimpleFileReader.cc: Where you will be writing all of your code for the SimpleFileReader. You will have to implement all of the functions specified in SimpleFileReader.h.
- BufferedFileReader.h: Contains the declarations and extensive comments detailing the functions you will have to implement for the BufferedFileReader.
- BufferedFileReader.cc: Where you will be writing all of your code for the BufferedFileReader. You will have to implement all of the functions specified in BufferedFileReader.h.
- test_simplefilereader.cc: The tests that we will be using to evaluate your SimpleFileReader.
- test_bufferedfilereader.cc: The tests that we will be using to evaluate your BufferedFileReader.
- test_performance.cc: Contains a single test used to compare the performance of the two readers.
We recommend reading all of these files before you start, especially the .h files listed.
Note: You are only allowed to modify the following files: SimpleFileReader.cc, BufferedFileReader.cc, and BufferedFileReader.h.
This means that your code should work with the other files as they are when initially supplied.
Note that you are allowed to modify BufferedFileReader.h, but it still must work with the tests.
This means that any modifications you may make would likely be for adding new private data members or private helper methods.
We even suggest a few private helper methods in the Hints section below.
Additionally, you can modify the test files if you like (for debug purposes), but we will be testing your readers against un-modified versions of the files.
There are many aspects of C++ that have not been covered in the course, and this assignment was designed to only take advantage of what has been covered. Students who have used topics that haven’t been talked about in the course (like STL containers) have generally found that it made their logic more complicated. As a result, we recommend mostly sticking with things covered in the course.
Suggested Approach
Below we have provided a suggested approach to this homework. Note that you are not required to follow this ordering if you believe another approach would work better for you. Also note that you can gradually check your progress, and run specific tests. Look at the gtest section below for more details on running individual tests.
- Start by populating SimpleFileReader.cc and BufferedFileReader.cc with “empty” definitions of every member function. Afterwards, make sure that you can compile successfully. For example, you would write the following “empty” function in SimpleFileReader.cc:
int SimpleFileReader::tell() {
return -1;
}
- Implement the constructor, destructor, open_file(), and close_file() of SimpleFileReader. Then make sure you pass the Test_SimpleFileReader.open_close test.
- Implement get_char() and tell() and make sure you pass all SimpleFileReader tests except for complex and get_chars.
- Implement get_chars(), rewind(), and good() and make sure you pass all SimpleFileReader tests. Note that for good(), you may have to add on to the functions you have already written.
- Implement the constructor, destructor, open_file(), and close_file() of BufferedFileReader. Then make sure you pass the Test_BufferedFileReader.open_close test.
- Implement get_char() and make sure you now pass the Test_BufferedFileReader.Basic test.
- Implement tell(), rewind(), and good(). Make sure you pass the Test_BufferedFileReader.get_char test.
- Implement get_token() and make sure you pass the Test_BufferedFileReader.get_token test.
- Implement the get_line() function and make sure you pass all tests in the test_suite.
- Run your test_suite under valgrind and fix any valgrind errors.
NOTE: Just because you pass a test mentioned in one of the steps above doesn’t guarantee that the function is now correct. Other tests may reveal errors that you will have to go back and fix.
Hints
Here are a few hints and tips that you may find useful when approaching this homework.
- Don’t wait until the end to test. As mentioned in the suggested approach above, you can create “empty” definitions for functions, just enough to compile, so that you can test your other functions.
- There are many utilities of the string class that you may find useful, such as string concatenation and the find() member function.
- File descriptors returned by open() are non-negative, so you can use a negative number to represent an invalid file descriptor.
- Create some helper functions to aid in your implementation of BufferedFileReader. Some example functions include:
  - bool is_delim(char c); which checks to see if a character is a delimiter.
  - void fill_buffer(); which refills the internal buffer with up to 1024 characters and sets data members so that the next time a character is retrieved from the buffer, it is retrieved from the start of the buffer.
- You may find it useful to have get_token() and get_line() call the get_char() member function, and just have get_char() manage the buffer and its related data members.
- You can add new data members (fields) to BufferedFileReader.h if that would be useful. However, you MUST make use of the data member buffer.
- EOF is a literal value that represents the end of file as a character. Example: char end = EOF;
- get_line() returns a variable-sized array that is on the heap. Since you don’t know the size of the array you will need, it is intended that you re-allocate the array to be bigger each time you run out of space in it. However, it is acceptable to simply allocate a large array and assume no line will contain enough tokens to exceed that length. What number is large enough is for you to figure out.
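The re-allocation strategy from the last hint can be sketched as below. The element type (std::string), the StringArray struct, and the append() helper are all assumptions made for this illustration, so check BufferedFileReader.h for what get_line() actually returns. The idea is to double the capacity whenever the array is full, copy the old elements over, and free the old array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical bookkeeping struct for a growable heap array of strings.
struct StringArray {
  std::string *data;  // heap array, or nullptr when capacity is 0
  size_t size;        // number of elements in use
  size_t capacity;    // number of elements allocated
};

// Hypothetical helper: appends `item`, growing the array when it is full.
void append(StringArray *arr, const std::string &item) {
  assert(arr != nullptr);

  if (arr->size == arr->capacity) {
    size_t new_capacity = (arr->capacity == 0) ? 4 : arr->capacity * 2;
    std::string *bigger = new std::string[new_capacity];
    for (size_t i = 0; i < arr->size; ++i) {
      bigger[i] = arr->data[i];  // copy the existing elements
    }
    delete[] arr->data;  // deleting nullptr is a no-op
    arr->data = bigger;
    arr->capacity = new_capacity;
  }
  arr->data[arr->size++] = item;
}
```

Whoever ends up owning the array must eventually release it with delete[], otherwise valgrind will report a leak.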
Grading and Testing
Compilation
We have supplied you with a makefile that can be used for compiling your code into an executable. To do this, open the terminal in Codio (this can be done by selecting Tools -> Terminal) and then type in make.
You may need to resolve any compiler warnings and compiler errors that show up. Once all compiler errors have been resolved, if you run ls in the terminal, you should be able to see an executable called test_suite. You can then run this by typing in ./test_suite to see the evaluation of your code.
Note that your submission will be partially evaluated on the number of compiler warnings. You should eliminate ALL compiler warnings in your code.
Valgrind
We will also test your submission for memory errors and memory leaks using valgrind. To check this yourself, you should try running:
valgrind --leak-check=full ./test_suite
If everything is correct, you should see the following towards the bottom of the output:
==1620== All heap blocks were freed -- no leaks are possible
==1620==
==1620== For counts of detected and suppressed errors, rerun with: -v
==1620== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
If you do not see something similar to the above in your output, valgrind will have printed out details about where the errors and memory leaks occurred.
Note: It is expected to take a while for valgrind to run on the whole test_suite. You can possibly speed this up by running valgrind on only specific tests. See the gtest section below for information on running individual tests.
gtest
As with hw0, you can compile your implementation by using the make command. This will result in several output files, including an executable called test_suite.
After compiling your solution with make, you can run all of the tests for the homework by invoking:
./test_suite
You can also run only specific tests by passing command line arguments into test_suite.
For example, to only run the SimpleFileReader tests, you can type in:
./test_suite --gtest_filter=Test_SimpleFileReader.*
If you only want to test open and close from SimpleFileReader, you can type in:
./test_suite --gtest_filter=Test_SimpleFileReader.open_close
You can specify which tests are run for any of the tests in the assignment. You just need to know the names of the tests, and you can do this by running:
./test_suite --gtest_list_tests
These settings can be helpful for debugging specific parts of the assignment, especially since test_suite can be run with these settings through valgrind and gdb!
Submission
Please submit your completed SimpleFileReader.cc, BufferedFileReader.cc, and BufferedFileReader.h to Gradescope.
Note: It is expected to take a while for the autograder to run. The sample solution takes close to 4 minutes to finish running on the autograder.