Goals
- Practice with objects in C++
- Gain familiarity with using POSIX, specifically for FILE I/O
- See the relationship between user-level and system level libraries
- Analyze the performance gain from utilizing buffering/caching when doing I/O
Collaboration
For assignments in CIT 5950, you will complete each assignment on your own. You may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. If you are worried about whether something violates academic integrity, please post on Ed or contact the instructor.
Contents
Setup
For this assignment, you need to set up a Linux C++ development environment. You can use the same environment as the one used in the previous assignment. The instructions are here if you need them: Environment setup
You can download the starter files into your Docker container by running the following command:
curl -o readers.zip https://www.seas.upenn.edu/~cit5950/current/projects/code/readers.zip
You can also download the files manually here if you would like: readers.zip
Next, extract the files by running
unzip readers.zip
You can then open the project in either Vim or VSCode.
For Vim, you just need to run
cd readers
vim SimpleFileReader.cpp
For VSCode, you will have to follow steps similar to those we used to open check_setup in the setup document.
Overview
A commonly utilized feature in programming is file I/O (Input/Output). In past programming experiences, you have likely used a standard library implementation that provided many useful features. In this homework assignment, you will be implementing two file readers that have similarities with other higher-level file I/O implementations. You will start with a simpler file reader, and then move on to create a file reader that implements more features and utilizes caching to aid performance.
Each file reader will be its own class, meaning we will have to define the constructor, destructor, and methods for our implementation.
Note that our user-defined programs don’t have direct access to files and disk. These are protected by the operating system. As a result, both of our readers will make use of system calls through POSIX (Portable Operating System Interface X) to perform the file I/O. You will likely find the following POSIX functions necessary for your implementation:
- int open(const char *pathname, int flags);
  man 2 open
- int close(int fd);
  man 2 close
- ssize_t read(int fd, void *buf, size_t count);
  man 2 read
- off_t lseek(int fd, off_t offset, int whence);
  man 2 lseek
For each of the functions, note that we provided a link to some documentation.
You can also access that documentation through the terminal by typing in the link text as a command (e.g. typing man 2 open in the terminal).
Note: You are required to use these POSIX functions. If we detect that you are using something other than POSIX to perform the file I/O (e.g. fopen, ifstream, etc.), then we will be awarding 0 points.
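To make the call sequence concrete, here is a minimal sketch of how the four POSIX functions listed above fit together. The helper name read_bytes_at and its parameters are hypothetical and are not part of the starter code; it only illustrates the order of calls and their return values, not the required design of either reader.

```cpp
#include <fcntl.h>   // open, O_RDONLY
#include <unistd.h>  // read, lseek, close

#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the starter code): returns up to `count`
// bytes starting at byte `offset` of the file at `path`. Returns an empty
// string if the file cannot be opened, the seek fails, or nothing is read.
std::string read_bytes_at(const char* path, off_t offset, size_t count) {
  assert(path != nullptr);
  int fd = open(path, O_RDONLY);
  if (fd == -1) {  // open() returns -1 on failure
    return "";
  }
  std::string result;
  if (lseek(fd, offset, SEEK_SET) != -1) {  // move to the requested offset
    std::vector<char> buf(count);
    // read() may return fewer bytes than requested, 0 at end of file,
    // or -1 on error.
    ssize_t num_read = read(fd, buf.data(), count);
    if (num_read > 0) {
      result.assign(buf.data(), static_cast<size_t>(num_read));
    }
  }
  close(fd);  // always release the descriptor once it was opened
  return result;
}
```

For instance, read_bytes_at("notes.txt", 0, 5) would return the first five bytes of a file named notes.txt, assuming such a file exists in the current directory.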
SimpleFileReader
This is the first file reader that you will implement as part of the homework. SimpleFileReader, as its name suggests, is a simple wrapper around the POSIX file I/O implementation. One can open a file, close a file, read one or more characters, and use a few other simple features. Most of the work done in these functions will be handled by POSIX, but it is up to you to figure out how to do this. See the Instructions section below for details on the code files.
There are also further notes about this reader in SimpleFileReader.hpp.
We recommend that you read through this file before you start your implementation.
BufferedFileReader
We highly recommend you have SimpleFileReader pass all tests before attempting BufferedFileReader.
This is the second (and last) file reader you will implement for this homework.
You will notice some similarities between BufferedFileReader and SimpleFileReader, but there is some added complexity.
BufferedFileReader implements a caching strategy. To understand this strategy, consider the following example:
BufferedFileReader bf (some_file_name);
char c = bf.get_char();
If we had used SimpleFileReader in this case, get_char() would simply call read(), requesting one character from the file.
Instead, BufferedFileReader maintains an internal buffer (char array) of size 1024, and will use this array to minimize calls to POSIX read().
In the code example above, when get_char() is called on the BufferedFileReader, it will not just read 1 character. Instead, it will call read() and try to fill in the entire buffer of 1024 characters.
While this is more work initially, this means that on subsequent calls to get_char(), BufferedFileReader can simply return the next char in its internal buffer, rather than having to call POSIX read().
Once you have reached the end of the buffer and need to process more characters, the BufferedFileReader should attempt to repopulate the buffer with the next 1024 characters of the file.
Other functions for reading, like get_token and get_line, will also make use of the buffer.
Maintaining the buffer requires multiple fields, which are described in BufferedFileReader.hpp.
It is important for the testing code that whenever you need to fill the buffer, your code attempts to fill in all 1024 characters of the buffer if possible.
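The caching idea described above can be sketched in a stand-alone form as follows. This is only an illustration under stated assumptions: the struct TinyBuffer, its field names, and the function next_byte are all made up for this example and are not the fields or methods declared in the assignment's header. The sketch assumes the file descriptor was already opened elsewhere.

```cpp
#include <unistd.h>  // read

#include <cassert>
#include <cstddef>

// Hypothetical stand-alone illustration of the caching idea. The names here
// are invented; the real fields are described in the assignment's header.
struct TinyBuffer {
  static constexpr size_t kSize = 1024;
  int fd = -1;          // descriptor that was already opened elsewhere
  char data[kSize];     // cached bytes from the file
  size_t len = 0;       // how many bytes of `data` are valid
  size_t pos = 0;       // index of the next unread byte in `data`
  long read_calls = 0;  // counts read() calls, for demonstration only
};

// Returns the next byte as a value in 0..255, or -1 at end of file or error.
int next_byte(TinyBuffer& buf) {
  if (buf.pos == buf.len) {  // cache exhausted: one read() tries to refill it
    ssize_t num_read = read(buf.fd, buf.data, TinyBuffer::kSize);
    ++buf.read_calls;
    if (num_read <= 0) {
      return -1;
    }
    buf.len = static_cast<size_t>(num_read);
    buf.pos = 0;
  }
  assert(buf.pos < buf.len);
  return static_cast<unsigned char>(buf.data[buf.pos++]);
}
```

With this scheme, consuming a 3000-byte input would typically take four read() calls (three that return data and one that reports end of file), whereas requesting one character per call would take 3001.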
BufferedFileReader also implements the ability to read “tokens” from the file. A token is a sequence of characters that contains no delimiters and whose end is marked by a delimiter character. Note that the “end of file” character can be thought of as a delimiter. Also note that a token can be the empty string. For example: if the file had the contents “hi,there,,aaaaa,!0 fds” and the delimiters for the file were specified to be only the ‘,’ character, then the tokens in the file would be:
- “hi”
- “there”
- “”
- “aaaaa”
- “!0 fds”
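The token rule above can be checked with a small in-memory sketch. The function split_tokens is hypothetical and works on a std::string rather than a file, so it says nothing about how get_token should manage the buffer; it only demonstrates which tokens the rule produces, including the empty token between two adjacent delimiters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper for illustration only: splits `text` into tokens,
// treating every character in `delims` as a delimiter and the end of the
// input as a final delimiter.
std::vector<std::string> split_tokens(const std::string& text,
                                      const std::string& delims) {
  std::vector<std::string> tokens;
  std::string current;
  for (char letter : text) {
    if (delims.find(letter) != std::string::npos) {
      tokens.push_back(current);  // a delimiter ends the current token
      current.clear();            // adjacent delimiters yield an empty token
    } else {
      current += letter;
    }
  }
  tokens.push_back(current);  // end of input acts as the last delimiter
  assert(!tokens.empty());    // there is always at least one token
  return tokens;
}
```

Calling split_tokens("hi,there,,aaaaa,!0 fds", ",") yields the five tokens listed above, with the third one empty. How the real reader should behave when the file ends immediately after a delimiter is a detail to confirm in the header comments, not something this sketch decides.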
There are also further notes about this reader in BufferedFileReader.hpp.
We recommend that you read through this file before you start your implementation.
Instructions
You will find the starter code in Codio, where you will also submit the assignment. Among these files, a few stand out:
- SimpleFileReader.hpp: Contains the declarations and extensive comments detailing the functions you will have to implement for the SimpleFileReader.
- SimpleFileReader.cpp: Where you will be writing all of your code for the SimpleFileReader. You will have to implement all of the functions specified in SimpleFileReader.hpp.
- BufferedFileReader.hpp: Contains the declarations and extensive comments detailing the functions you will have to implement for the BufferedFileReader.
- BufferedFileReader.cpp: Where you will be writing all of your code for the BufferedFileReader. You will have to implement all of the functions specified in BufferedFileReader.hpp.
- test_simplefilereader.cpp: The tests that we will be using to evaluate your SimpleFileReader.
- test_bufferedfilereader.cpp: The tests that we will be using to evaluate your BufferedFileReader.
- test_performance.cpp: Contains a single test used to compare the performance of the two readers.
We recommend reading all of these files before you start, especially the .hpp files listed.
Note: you are only allowed to modify the following files: SimpleFileReader.cpp, BufferedFileReader.cpp, and BufferedFileReader.hpp.
This means that your code should work with the other files as they are when initially supplied.
Note that you are allowed to modify BufferedFileReader.hpp, but it still must work with the tests.
This means that any modifications you make would likely be for adding new private data members or private helper methods.
We even suggest a few private helper methods in the Hints section below.
Additionally, you can modify the test files if you like (for debugging purposes), but we will be testing your readers against unmodified versions of the files.
Suggested Approach
Below we have provided a suggested approach to this homework. Note that you are not required to follow this ordering if you believe another approach would work better for you. Also note that you can gradually check your progress, and run specific tests. Look at the catch2 section below for more details on running individual tests.
- Start by populating SimpleFileReader.cpp and BufferedFileReader.cpp with “empty” definitions of every member function. Afterwards, make sure that you can compile successfully. For example, you would write the following “empty” function in SimpleFileReader.cpp:
  int SimpleFileReader::tell() const {
    return -1;
  }
- Implement the constructor, destructor, open_file(), and close_file() of SimpleFileReader. Then make sure you pass the Test_SimpleFileReader.open_close test.
- Implement get_char() and tell() and make sure you pass all SimpleFileReader tests except for complex and get_chars.
- Implement get_chars(), rewind(), and good() and make sure you pass all SimpleFileReader tests. Note that for good(), you may have to add on to the functions you have already written.
- Implement the constructor, destructor, open_file(), and close_file() of BufferedFileReader. Then make sure you pass the Test_BufferedFileReader.open_close test.
- Implement get_char() and make sure you now pass the Test_BufferedFileReader.Basic test.
- Implement tell(), rewind(), and good(). Make sure you pass the Test_BufferedFileReader.get_char test.
- Implement get_token() and make sure you pass the Test_BufferedFileReader.get_token test.
- Implement the get_line() function and make sure you pass all tests in the test_suite.
- Run your test_suite under valgrind and fix any valgrind errors.
NOTE: Just because you pass a test mentioned in one of the steps above doesn’t guarantee that the function is now correct. Other tests may reveal errors that you may have to go back and fix.
Hints
Here are a few hints and tips that you may find useful when approaching this homework.
- Don’t wait until the end to test. As mentioned in the suggested approach above, you can create “empty” definitions for functions, just enough to compile, so that you can test your other functions.
- There are many utilities of the string class that you may find useful, such as string concatenation and the find() member function.
- Valid file descriptors are non-negative, so you can use a negative number to represent an invalid file descriptor.
- Create some helper functions to aid in your implementation of BufferedFileReader. Some example functions include:
  - bool is_delim(char c); which checks to see if a character is a delimiter.
  - void fill_buffer(); which refills the internal buffer with up to 1024 characters and sets data members so that the next time a character is retrieved from the buffer, it is retrieved from the start of the buffer.
- You may find it useful to have get_token() and get_line() call the get_char() member function, and just have get_char() manage the buffer and its related data members.
- You can add new data members (fields) to BufferedFileReader.hpp if that would be useful. However, you MUST make use of the data member buffer.
- EOF is a literal value that represents the end of file as a character. Example: char end = EOF;
- get_line() returns a variable-sized array that is on the heap. Since you don’t know the size of the array you will need, it is intended that you re-allocate the array to be bigger each time you run out of space in it. However, it is acceptable to simply allocate a large array and assume no line will contain enough tokens to exceed that length. What number is large enough is for you to figure out.
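The re-allocation idea in the last hint can be sketched as a doubling strategy. Everything here is an assumption made for illustration: the struct TokenArray, the functions append and release, the std::string element type, and the starting capacity of 4 are not taken from the starter code, and the actual return type of get_line() is defined in the header.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical growable heap array; names and element type are assumptions.
struct TokenArray {
  std::string* items = nullptr;  // heap array, owned by this struct
  size_t size = 0;               // number of slots in use
  size_t capacity = 0;           // number of slots allocated
};

// Appends `token`, doubling the allocation whenever the array is full.
void append(TokenArray& arr, const std::string& token) {
  if (arr.size == arr.capacity) {
    size_t new_capacity = (arr.capacity == 0) ? 4 : arr.capacity * 2;
    std::string* bigger = new std::string[new_capacity];
    for (size_t i = 0; i < arr.size; ++i) {
      bigger[i] = std::move(arr.items[i]);  // carry over existing tokens
    }
    delete[] arr.items;  // free the old, smaller array
    arr.items = bigger;
    arr.capacity = new_capacity;
  }
  assert(arr.size < arr.capacity);
  arr.items[arr.size++] = token;
}

// Frees the heap array so that valgrind reports no leak.
void release(TokenArray& arr) {
  delete[] arr.items;
  arr = TokenArray{};
}
```

Doubling keeps the number of re-allocations small as the array grows. Whoever ends up owning the array must eventually delete[] it, which is the kind of leak valgrind would otherwise report.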
Grading and Testing
We will be evaluating your submission mostly automatically. Specifically, we are checking for the following automatically:
- that you have no compilation errors or warnings
- that catch2 passes all tests
- that you have no valgrind errors
- that clang-tidy and clang-format report no errors
More details on each of these are in the sections below.
We will also briefly check submissions by hand to make sure you did not do anything unusual to pass the tests with an incorrect solution.
Compilation
We have supplied you with a Makefile that can be used for compiling your code into an executable. To do this, open the terminal in Codio (this can be done by selecting Tools -> Terminal) and then type in make.
You may need to resolve any compiler warnings and compiler errors that show up. Once all compiler errors have been resolved, if you run ls in the terminal, you should be able to see an executable called test_suite. You can then run this by typing in ./test_suite to see the evaluation of your code.
Note that your submission will be partially evaluated on the number of compiler warnings. You should eliminate ALL compiler warnings in your code.
Valgrind
We will also test your submission for memory errors and memory leaks, using valgrind. To check this yourself, you should try running:
valgrind --leak-check=full ./test_suite
If everything is correct, you should see the following towards the bottom of the output:
==1620== All heap blocks were freed -- no leaks are possible
==1620==
==1620== For counts of detected and suppressed errors, rerun with: -v
==1620== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
If you do not see something similar to the above in your output, valgrind will have printed out details about where the errors and memory leaks occurred.
Note: It is expected to take a while for valgrind to run on the whole test_suite. You can possibly go faster by running valgrind on only specific tests. See the catch2 section below for information on running individual tests.
catch2
As with hw0, you can compile your implementation by using the make command. This will result in several output files, including an executable called test_suite.
After compiling your solution with make, you can run all of the tests for the homework by invoking:
./test_suite
You can also run only specific tests by passing command line arguments into test_suite.
For example, to only run the SimpleFileReader tests, you can type in:
./test_suite [Test_SimpleFileReader]
Note: you may have to type in ./test_suite \[Test_SimpleFileReader\] for it to work, since some shells treat square brackets as pattern characters.
If you only want to test open and close from SimpleFileReader, you can type in:
./test_suite [Test_SimpleFileReader] open_close
You can specify which tests are run for any of the tests in the assignment. You just need to know the names of the tests, and you can do this by running:
./test_suite --list-tests
These settings can be helpful for debugging specific parts of the assignment, especially since test_suite can be run with these settings through valgrind and gdb!
Clang Format and Clang Tidy
The makefile we provided with this assignment is configured to help make sure your code is properly formatted and follows good (modern) C++ coding conventions.
To do this, we make use of two tools: clang-format and clang-tidy.
To make sure your code indentation is nice, we have clang-format. All you need to do to use this utility is to run make format, which will run the tool and indent your code properly.
Code that is turned in is expected to follow the specified style. Code that is ill-formatted must be fixed and re-submitted.
clang-tidy is a more complicated tool. Part of it is checking for style and readability issues, but that is not all it does. Examples of readability issues include:
Not using curly braces around if statements and loops:
if (condition) // clang-tidy will complain about missing curly braces
  cout << "hello!" << endl;
Declaring variables or parameters with names that are too short:
void foo(char c) { // clang-tidy will complain about the name `c`
  // does something
}
Having functions that are too complex and long. The tool calculates the “cognitive complexity” of your code and will complain about anything that is too complex.
This means you should think about how to break your code into helpers, because if you don’t, clang-tidy will complain and you will face a deduction.
More on this specific error can be found here: Cognitive Complexity
clang-tidy is also useful for noticing some memory errors and pointing out bad practices when writing C++ code.
Because of all this, we are enforcing that your code does not produce any clang-tidy errors. You can run clang-tidy on your code by running: make tidy-check.
Whenever you compile your code using make, it should also re-run clang-tidy to check your code for errors.
Note that you will have to fix any compiler errors before clang-tidy will run (and be useful).
Code that has clang-format, clang-tidy, or compiler errors will face a deduction.
If you have any questions about understanding an error, please ask on Ed discussion and we will happily assist you.
Submission
Please submit your completed SimpleFileReader.cpp, BufferedFileReader.cpp, and BufferedFileReader.hpp to Gradescope.
Note: It is expected to take a while for this to run. The Sample solution takes close to 4 minutes to finish running on the autograder.