Goals
In completing this assignment, you will learn how to:
- Share data between threads in a program
- Gain more practice writing C++ code to solve “normal” problems
Collaboration
For assignments in CIT 5950, you will complete each of them on your own or solo. However, you may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. If you are worried about whether something violates academic integrity, please post on Ed or contact the instructor.
Contents
Setup
For this assignment, you need to setup a Linux C++ development environment. You can use the same environment as the one used in the previous assignment. The instructions are here if you need them: Enivronment setup
You can downlowd the starter files into your docker container by running the following command:
curl -o grep.zip https://www.seas.upenn.edu/~cit5950/current/projects/code/grep.zip
You can also download the files manually here if you would like: grep.zip
From here, you need to extract the files by running
unzip grep.zip
From here you can either open the project in Vim or VSCode.
For Vim, you just need to run
cd grep
vim grep_sequential.cpp
For VSCode you will have to follow steps similar to what we did to open chcek_setup
in the setup document.
Overview
You must use pthreads for this assignment. You are not allowed to use C++ threads.
grep sequentially
The first thing you need to do in this assignment is finish grep_sequential.cpp
. Most of the code is written for you except for the split function and the grep
function.
grep
is a built in utility in unix shells that is usually used by: giving a file name and giving one or more search terms, and then it prints every line in the file that contains all of the given search terms.
Grep can be used for other things as well, but we will focus on implementing a basic version of this for our program.
For our implementation it only needs to implement the functionality that is given above. We will also simply things by saying that we treat each “term” as just a string of characters split on whitespace. e.g. “hello world!” is two terms: “hello” and “world!” (notably, the exclamation point !
is still within the world! term) and we would only search for lines that contain both the terms “hello” and “world!”
A brief example of this program running is:
$ ./grep_sequential
test_files/Hello.txt Hello
test_files/Hello.txt: [Hello]
Hello World!
test_files/Bye.txt bye
test_files/Bye.txt: [bye]
test_files/Bye.txt Goodbye
test_files/Bye.txt: [Goodbye]
Goodbye world
Goodbye, Goodbye, Goodbye
Notably, the lines:
test_files/Hello.txt Hello
test_files/Bye.txt bye
test_files/Bye.txt Goodbye
are all typed in by the user. The lines following each user input are printed by your program.
- First it prints the query which is just the filename followed by the list or arguments
- It prints out each line that contains all words in the request in the order the line shows up in the file.
For the last example: test_files/Bye.txt: [Goodbye]
the lines printed ar:
Goodbye world
Goodbye, Goodbye, Goodbye
Since those are the lines in test_files/Bye.txt
that contain the term Goodbye
and these two lines MUST be printed in that order since Goodbye world
shows up earlier in the file than Goodbye, Goodbye, Goodbye
.
grep with threads
In this part of the assignment, you will write a simple multithreaded version of the program you wrote in the previous setp in which one thread blocks/waits to read queries from a user while another thread will receive those values and perform the grep operation on what the user has typed in.
You may be tempted to use a single global grep_query
variable that can be used to store the values communicated by the reader thread to the other thread.
However, if we only used a single request
variable, it would not support the case where the reader thread reads in multiple requests before the other thread that uses those requests has a chance to get the values.
So, before we can write a program support this, you will need to implement a datastructure to handle the transfer of information from one thread to another.
This data structure will be called a RequestQueue
.
You should not use any global variables for any of the code you write in this assignment.
See the Instructions section below for details on the code files and how you should write some programs to do this
Instructions
Step 1
You should start by finishing the sequential version of our grep program which can be found in grep_sequential.cpp
.
More information on testing it can be found in the testing section below.
Step 2
Before we write our overall multi-threaded program, we must first write a data structure to facilitate the communication of data (grep_request
s) between threads.
To do this, you will have to implement the class RequestQueue
.
You MUST use the <pthread.h>
library for synchronization.
RequestQueue
acts as a queue of request
s, but it is thread-safe. We have specified the public declarations of the RequestQueue
class, and a declaration for a private struct type that may be useful for your implementation. Note that we do not provide any data members for you to use, you should modify RequestQueue.hpp
to add any private data members you wish to use (hint: one of them should be of type pthread_mutex_t
). More information on the object and behavior can be found in RequestQueue.hpp
.
We have provided a test_suite
for the RequestQueue so that you can verify the implementation works correctly, but you may want to run the test_suite
under helgrind before moving on to the next step.
Note that to have RequestQueue
be thread safe, you MUST use a lock in your data structure. It is up to you to decide when to use it. You may also find the use of a condition variable to be useful, but it is not required to get the solution working.
Step 3
Create a C++ program in a file called grep.cpp that consists of two threads:
- The first thread continuously loops, prompting the user to input a grep request via the keyboard, sending the number to the second thread, and then sleeping for 1 second.
- The second thread gets the request that was read in by the first thread, and prints out the result of that grep request.
To facilitate the creation of such a program, you will have to use the class RequestQueue
implemented in the previous step.
You MUST use the <pthread.h>
library to create, start, and join threads.
You can look back at your gre_sequential.cpp
as a starting point for this program.
We highly suggest that you take inspiration from that when writing your multithreaded version.
If the user enters an input that is not a valid request (e.g. not at least a file name and at least one query), your program should simply ignore it (but should not crash, of course!). This is handled for you. If the second thread processes a request on a file that doesn’t exist, it should still print out the request, but it should not print anything else out. Although you may of course ask for clarification on this, please do not spend too much time worrying about the different possible inputs, as that is not the focus of this assignment.
The program should end gracefully (like, not crash) when the user inputs the EOF signifier (ctrl + d) in the terminal. You should make sure that both threads know to clean up and terminate once the EOF has been found.
How you organize your code is up to you, in terms of what the main thread does and whether you create one or two additional threads, but you should not continuously be starting new threads.
That is, there should only be two executions of pthread_create
at most, and each should run its own loop: one is waiting for input from the user or sleeping; the other is waiting for new values and then executing the request.
You should also make sure that the reader and writer threads are running at the same time, you cannot just create a reader thread, join it, and then create a writer thread.
You should not use any global variables in this assignment. Instead remember that you can give threads access to a shared resources via the arguments you give to a pthread when you create it. One of these things you share should be a RequestQueue.
To make sure that the autograder interacts with your code correctly, your reader thread must sleep()
for 1 second after adding a grep_request
to the shared RequestQueue
.
To check whether your code is threadsafe, use Helgrind http://valgrind.org/docs/manual/hg-manual.html to check that you do not have any concurrency errors. We will be running Helgrind against your program during grading, and it should pass with no problems reported.
See the Helgrind section below for more details.
Suggested Approach
Below we have provided a suggested approach to this homework. Note that you are not required to follow this ordering if you believe another approach would work better for you. Also note that you can gradually check your progress, and run specific tests. Look at the catch and Testing section below for more details on running individual tests.
- Start by populating
RequestQueue.cpp
with “empty” definitions of every member function. Afterwards, make sure that you can compile successfully. For example, you would write the following “empty” function in RequestQueue.cpp:optional<grep_request> RequestQueue::wait_remove() { return std::nullopt; }
-
Create an empty
grep.cpp
file, you can start by copyinggrep_sequential.cpp
. At this point, you should make sure that everything compiles successfully when you runmake
. -
Implement
grep_sequential.cpp
and make sure you can get the same behaviour as the sample tests. Make sure your code also properly handles the case when the user types inctrl + d
. -
Implement all of
RequestQueue.cpp
except forwait_remove()
and make sure you pass theTest_RequestQueue.add_remove
tests. -
Implement
RequestQueue::wait_remove()
and make sure you pass the wholetest_suite
. Make sure that you also pass thetest_suite
under helgrind with no errors. - Implement
grep.cpp
and make sure that you can run it with no helgrind errors and that the program terminates properly when the user types inctrl + d
.
NOTE: Just becuase you pass a test mentioned in one of the steps above doesn’t guarantee that the function is now correct. Other tests may reveal errors that you may have to go back and fix.
Hints
Here are a few hints and tips that you may find useful when approaching this homework.
-
Don’t wait until the end to test. As mentioned in the suggested approach above, you can create “empty” definitions for functions, just enough to compile, so that you can test your other functions.
-
In
grep.cpp
the reader thread needs to let the printer thread know when the EOF is read in, so that both threads can exit. However, the printer thread may be waiting using theRequestQueue::wait_remove
function. Don’t forget about that thread.
Grading and Testing
Compilation
We have supplied you with a Makefile
that can be used for compiling your code into an executable.
You may need to resolve any compiler warnings and compiler errors that show up. Once all compiler errors have been resolved, if you ls
in the terminal, you should be able to see various executable listed.
One of which is called test_suite
and the others grep
and grep_sequential
.
How to run these executables is described below.
Note that your submission will be partially evaluated on the number of compiler warnings. You should eliminate ALL compiler warnings in your code.
Valgrind
We will also test your submission on whether there are any memory errors or memory leaks. We will be using valgrind to do this. To do this, you should try running:
valgrind --leak-check=full ./test_suite
You will also want to run valgrind on the other programs you write for this assignment, you can do this by running:
valgrind --leak-check=full ./grep
valgrind --leak-check=full ./gep_sequential
If everything is correct, you should see the following towards the bottom of the output:
==1620== All heap blocks were freed -- no leaks are possible
==1620==
==1620== For counts of detected and suppressed errors, rerun with: -v
==1620== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
If you do not see something similar to the above in your output, valgrind will have printed out details about where the errors and memory leaks occurred.
Helgrind
We will also test your submission on whether there are any thread errors.
We will be using the tool helgrind to do this, which is a tool under valgrind
To do this, you should try running something like:
valgrind --tool=helgrind ./test_suite
If everything is correct, you should see the following towards the bottom of the output:
==14004== For counts of detected and suppressed errors, rerun with: -v
==14004== Use --history-level=approx or =none to gain increased speed, at
==14004== the cost of reduced accuracy of conflicting-access information
==14004== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 216 from 42)
Your output may look slightly different, but the most important thing here is that it says 0 errors from 0 contexts
.
Testing
Catch2
As with hw0, you can compile the your implementation by using the make
command. This will result in several output files, including an executable called test_suite
.
After compiling your solution with make
, You can run all of the tests for RequestQueue
by invoking:
./test_suite
You can also run only specific tests by passing command line arguments into test_suite
There are only three tests in this test suite so your only options are:
./test_suite add_remove
./test_suite wait_remove
and
./test_suite close
These settings can be helpful for debugging specific parts of the assignment, especially since test_suite
can be run with these settings through valgrind
and gdb
!
Testing Grep
To test your program grep
and grep_sequential
, you can compare the behaviour/output of it to the sample inputs and outputs in tests
that you can use for testing purposes.
To run the code normally you can do:
./grep_sequential
and then type whatever inputs you want and make sure the outputs match.
If you want to make it so that it runs with one of the sample inputs you can run:
./grep_sequential < tests/simple.txt
You can replace sequential.txt
with the other sample input if you want.
You can then compare your program’s output to the accompanying example output tests/simple_expected.txt
to make sure they are correct.
This can be automated sorta by doing:
./grep_sequential < tests/simple.txt > out.txt
This will make your program read input from tests/simple.txt
and print its output to out.txt
.
You can then run the commmand:
diff out.txt tests/simple_expected.txt
which will compare the two files, one of which is what your program printed and the other is the expected output. If nothing is printed then the files are the same!
NOTE: if you have fully implemented the grep.cpp
program and output looks correct when you run the program, but it fails on the autograder, make sure that the reading thread sleeps for 1 second after adding a request to the request queue.
Submission:
Please submit your completed grep_sequential.cpp
, grep.cpp
, RequestQueue.hpp
and RequestQueue.cpp
to Gradescope
Note: In addition to the autograder, we will be manually reviewing the submissions for this homework assignment to ensure that you are using threads and that you are using them as described in the write-up. Just because you get full points on the test suite does not mean that you will get full points after we review your submissions manually.
Note: It is expected to take a while for this to run.