CIT 5950 (Spring 2025) Home Schedule Assignments Tools & Refs HW 09: Multithreaded Data Sharing

Threads & Synchronization in C++.

Goals

In completing this assignment, you will learn how to:

Collaboration

For assignments in CIT 5950, you will complete each of them on your own or solo. However, you may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. If you are worried about whether something violates academic integrity, please post on Ed or contact the instructor.

Contents

Setup

For this assignment, you need to setup a Linux C++ development environment. You can use the same environment as the one used in the previous assignment. The instructions are here if you need them: Enivronment setup

You can downlowd the starter files into your docker container by running the following command:

curl -o grep.zip https://www.seas.upenn.edu/~cit5950/current/projects/code/grep.zip

You can also download the files manually here if you would like: grep.zip

From here, you need to extract the files by running

unzip grep.zip

From here you can either open the project in Vim or VSCode.

For Vim, you just need to run

cd grep
vim grep_sequential.cpp

For VSCode you will have to follow steps similar to what we did to open chcek_setup in the setup document.

Overview

grep sequentially

The first thing you need to do in this assignment is finish grep_sequential.cpp. Most of the code is written for you except for the split function and the grep function. grep is a built in utility in unix shells that is usually used by: giving a file name and giving one or more search terms, and then it prints every line in the file that contains all of the given search terms. Grep can be used for other things as well, but we will focus on implementing a basic version of this for our program.

For our implementation it only needs to implement the functionality that is given above. We will also simply things by saying that we treat each “term” as just a string of characters split on whitespace. e.g. “hello world!” is two terms: “hello” and “world!” (notably, the exclamation point ! is still within the world! term) and we would only search for lines that contain both the terms “hello” and “world!”

A brief example of this program running is:

$ ./grep_sequential
test_files/Hello.txt Hello
test_files/Hello.txt: [Hello]
Hello World!
test_files/Bye.txt bye
test_files/Bye.txt: [bye]
test_files/Bye.txt Goodbye
test_files/Bye.txt: [Goodbye]
Goodbye world
Goodbye, Goodbye, Goodbye

Notably, the lines:

are all typed in by the user. The lines following each user input are printed by your program.

  1. First it prints the query which is just the filename followed by the list or arguments
  2. It prints out each line that contains all words in the request in the order the line shows up in the file.

For the last example: test_files/Bye.txt: [Goodbye] the lines printed ar:

Goodbye world
Goodbye, Goodbye, Goodbye

Since those are the lines in test_files/Bye.txt that contain the term Goodbye and these two lines MUST be printed in that order since Goodbye world shows up earlier in the file than Goodbye, Goodbye, Goodbye.

grep with threads

In this part of the assignment, you will write a simple multithreaded version of the program you wrote in the previous setp in which one thread blocks/waits to read queries from a user while another thread will receive those values and perform the grep operation on what the user has typed in. You may be tempted to use a single global grep_query variable that can be used to store the values communicated by the reader thread to the other thread. However, if we only used a single request variable, it would not support the case where the reader thread reads in multiple requests before the other thread that uses those requests has a chance to get the values. So, before we can write a program support this, you will need to implement a datastructure to handle the transfer of information from one thread to another.

This data structure will be called a RequestQueue.

See the Instructions section below for details on the code files and how you should write some programs to do this

Instructions

Step 1

You should start by finishing the sequential version of our grep program which can be found in grep_sequential.cpp. More information on testing it can be found in the testing section below.

Step 2

Before we write our overall multi-threaded program, we must first write a data structure to facilitate the communication of data (grep_requests) between threads. To do this, you will have to implement the class RequestQueue.

You MUST use the <pthread.h> library for synchronization.

RequestQueue acts as a queue of requests, but it is thread-safe. We have specified the public declarations of the RequestQueue class, and a declaration for a private struct type that may be useful for your implementation. Note that we do not provide any data members for you to use, you should modify RequestQueue.hpp to add any private data members you wish to use (hint: one of them should be of type pthread_mutex_t). More information on the object and behavior can be found in RequestQueue.hpp.

We have provided a test_suite for the RequestQueue so that you can verify the implementation works correctly, but you may want to run the test_suite under helgrind before moving on to the next step.

Note that to have RequestQueue be thread safe, you MUST use a lock in your data structure. It is up to you to decide when to use it. You may also find the use of a condition variable to be useful, but it is not required to get the solution working.

Step 3

Create a C++ program in a file called grep.cpp that consists of two threads:

To facilitate the creation of such a program, you will have to use the class RequestQueue implemented in the previous step.

You MUST use the <pthread.h> library to create, start, and join threads.

You can look back at your gre_sequential.cpp as a starting point for this program. We highly suggest that you take inspiration from that when writing your multithreaded version.

If the user enters an input that is not a valid request (e.g. not at least a file name and at least one query), your program should simply ignore it (but should not crash, of course!). This is handled for you. If the second thread processes a request on a file that doesn’t exist, it should still print out the request, but it should not print anything else out. Although you may of course ask for clarification on this, please do not spend too much time worrying about the different possible inputs, as that is not the focus of this assignment.

The program should end gracefully (like, not crash) when the user inputs the EOF signifier (ctrl + d) in the terminal. You should make sure that both threads know to clean up and terminate once the EOF has been found.

How you organize your code is up to you, in terms of what the main thread does and whether you create one or two additional threads, but you should not continuously be starting new threads. That is, there should only be two executions of pthread_create at most, and each should run its own loop: one is waiting for input from the user or sleeping; the other is waiting for new values and then executing the request. You should also make sure that the reader and writer threads are running at the same time, you cannot just create a reader thread, join it, and then create a writer thread.

You should not use any global variables in this assignment. Instead remember that you can give threads access to a shared resources via the arguments you give to a pthread when you create it. One of these things you share should be a RequestQueue.

To make sure that the autograder interacts with your code correctly, your reader thread must sleep() for 1 second after adding a grep_request to the shared RequestQueue.

To check whether your code is threadsafe, use Helgrind http://valgrind.org/docs/manual/hg-manual.html to check that you do not have any concurrency errors. We will be running Helgrind against your program during grading, and it should pass with no problems reported.

See the Helgrind section below for more details.

Suggested Approach

Below we have provided a suggested approach to this homework. Note that you are not required to follow this ordering if you believe another approach would work better for you. Also note that you can gradually check your progress, and run specific tests. Look at the catch and Testing section below for more details on running individual tests.

  1. Start by populating RequestQueue.cpp with “empty” definitions of every member function. Afterwards, make sure that you can compile successfully. For example, you would write the following “empty” function in RequestQueue.cpp:
    optional<grep_request> RequestQueue::wait_remove() {
      return std::nullopt;
    }
    
  2. Create an empty grep.cpp file, you can start by copying grep_sequential.cpp. At this point, you should make sure that everything compiles successfully when you run make.

  3. Implement grep_sequential.cpp and make sure you can get the same behaviour as the sample tests. Make sure your code also properly handles the case when the user types in ctrl + d.

  4. Implement all of RequestQueue.cpp except for wait_remove() and make sure you pass the Test_RequestQueue.add_remove tests.

  5. Implement RequestQueue::wait_remove() and make sure you pass the whole test_suite. Make sure that you also pass the test_suite under helgrind with no errors.

  6. Implement grep.cpp and make sure that you can run it with no helgrind errors and that the program terminates properly when the user types in ctrl + d.

Hints

Here are a few hints and tips that you may find useful when approaching this homework.

  1. Don’t wait until the end to test. As mentioned in the suggested approach above, you can create “empty” definitions for functions, just enough to compile, so that you can test your other functions.

  2. In grep.cpp the reader thread needs to let the printer thread know when the EOF is read in, so that both threads can exit. However, the printer thread may be waiting using the RequestQueue::wait_remove function. Don’t forget about that thread.

Grading and Testing

Compilation

We have supplied you with a Makefile that can be used for compiling your code into an executable.

You may need to resolve any compiler warnings and compiler errors that show up. Once all compiler errors have been resolved, if you ls in the terminal, you should be able to see various executable listed. One of which is called test_suite and the others grep and grep_sequential.

How to run these executables is described below.

Note that your submission will be partially evaluated on the number of compiler warnings. You should eliminate ALL compiler warnings in your code.

Valgrind

We will also test your submission on whether there are any memory errors or memory leaks. We will be using valgrind to do this. To do this, you should try running:

valgrind --leak-check=full ./test_suite

You will also want to run valgrind on the other programs you write for this assignment, you can do this by running:

valgrind --leak-check=full ./grep

valgrind --leak-check=full ./gep_sequential

If everything is correct, you should see the following towards the bottom of the output:

 ==1620== All heap blocks were freed -- no leaks are possible
 ==1620==
 ==1620== For counts of detected and suppressed errors, rerun with: -v
 ==1620== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If you do not see something similar to the above in your output, valgrind will have printed out details about where the errors and memory leaks occurred.

Helgrind

We will also test your submission on whether there are any thread errors. We will be using the tool helgrind to do this, which is a tool under valgrind To do this, you should try running something like: valgrind --tool=helgrind ./test_suite

If everything is correct, you should see the following towards the bottom of the output:

==14004== For counts of detected and suppressed errors, rerun with: -v
==14004== Use --history-level=approx or =none to gain increased speed, at
==14004== the cost of reduced accuracy of conflicting-access information
==14004== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 216 from 42)

Your output may look slightly different, but the most important thing here is that it says 0 errors from 0 contexts.

Testing

Catch2

As with hw0, you can compile the your implementation by using the make command. This will result in several output files, including an executable called test_suite.

After compiling your solution with make, You can run all of the tests for RequestQueue by invoking:

./test_suite

You can also run only specific tests by passing command line arguments into test_suite

There are only three tests in this test suite so your only options are:

./test_suite add_remove
./test_suite wait_remove

and

./test_suite close

These settings can be helpful for debugging specific parts of the assignment, especially since test_suite can be run with these settings through valgrind and gdb!

Testing Grep

To test your program grep and grep_sequential, you can compare the behaviour/output of it to the sample inputs and outputs in tests that you can use for testing purposes.

To run the code normally you can do:

./grep_sequential

and then type whatever inputs you want and make sure the outputs match.

If you want to make it so that it runs with one of the sample inputs you can run:

./grep_sequential < tests/simple.txt

You can replace sequential.txt with the other sample input if you want.

You can then compare your program’s output to the accompanying example output tests/simple_expected.txt to make sure they are correct.

This can be automated sorta by doing:

./grep_sequential < tests/simple.txt > out.txt

This will make your program read input from tests/simple.txt and print its output to out.txt. You can then run the commmand:

diff out.txt tests/simple_expected.txt

which will compare the two files, one of which is what your program printed and the other is the expected output. If nothing is printed then the files are the same!

Submission:

Please submit your completed grep_sequential.cpp, grep.cpp, RequestQueue.hpp and RequestQueue.cpp to Gradescope

Note: In addition to the autograder, we will be manually reviewing the submissions for this homework assignment to ensure that you are using threads and that you are using them as described in the write-up. Just because you get full points on the test suite does not mean that you will get full points after we review your submissions manually.

Note: It is expected to take a while for this to run.