Note that the assignment is designed to be done in pairs. If you want to work with someone, you must be in a student group in Canvas. People not in a group will be manually assigned.
Goals
- Practice C++, including objects and STL
- Implement network code utilizing the POSIX sockets API
- Write a multi-threaded server to receive HTTP requests from clients and respond to them
Collaboration
For almost all assignments in CIT 5950, you will complete the work on your own. You may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden.
In this assignment, you are allowed to work on the project with a partner; however, collaboration outside of that is not allowed. As with solo assignments, you may only discuss high-level ideas with people outside your project “team”, and viewing, sharing, copying, or dictating of code is forbidden.
If you are unsure about whether something violates academic integrity, please post on Ed or contact the instructor.
Setup
For this assignment, you need to set up a Linux C++ development environment. You can use the same environment as the one used in the previous assignment. The instructions are here if you need them: Environment setup
Download Starter Files
You can download the starter files into your Docker container by running the following command:
curl -o searchserver.zip https://www.seas.upenn.edu/~cit5950/current/projects/code/searchserver.zip
You can also download the files manually here if you would like: searchserver.zip
From here, you need to extract the files by running
unzip searchserver.zip
From here, you can open the project in either Vim or VSCode.
Once this is done, running make should successfully compile the test_suite.
Code Sharing
Since this is a group assignment, you will have to have some way to work with your partner. There are two ways we endorse:
One is to use the GitHub repository that we will create for you. We have prepared an autograder that automatically invites you into the course's GitHub Organization and creates a GitHub repository for you to organize your code.
To have your GitHub repo set up for your group, you should:
Step 1. Create/Prepare your GitHub account
Please set up a GitHub account with your primary upenn.edu e-mail address (e.g. @seas.upenn.edu).
Note: If you already have a GitHub account with a non-Penn email, you should add your Penn email address as a secondary email address. You should not have to create a separate GitHub account for this class!
Step 2. account.txt
After you have done this, please create a text file named “account.txt”. Write your e-mail address on the first line and your GitHub username on the second line. Then repeat this for your partner so that the file is 4 lines total. One of you should then submit it to the autograder.
For example, if your e-mail is abc@seas.upenn.edu and your GitHub account is xyz, and your partner's e-mail is qrs@seas.upenn.edu and their GitHub account is jkl, then account.txt should be as follows:
abc@seas.upenn.edu
xyz
qrs@seas.upenn.edu
jkl
Shortly afterwards, a repo should be created and your group should be invited to it. If you are having issues, please post on Ed.
Please only use the GitHub repository created for you by the course staff. Do not work in public GitHub repositories! Please avoid publishing this project at any time, even post-submission, to observe course integrity policies.
The other way is to use VSCode Live Share (https://code.visualstudio.com/learn/collaboration/live-share). The linked page should have enough information to get you set up.
Overview
In this assignment, you will implement a multi-threaded web server that provides simple searching and file viewing utilities. This assignment is broken up into three parts. In Part A, you will finish implementing code that allows the server to read files and parse them to record the words that appear in each file. In Part B, you will implement the parts of the web server that accept connections and create threads. In Part C, you will combine your work from Parts A and B into a fully functioning web server.
We HIGHLY recommend you read through this entire document. At the very least, please read the overview associated with each part before you start working on it.
Part A
In Part A, you will implement four files: HttpUtils.cpp, WordIndex.cpp, WordIndex.hpp, and finally, CrawlFileTree.cpp.
HttpUtils
HttpUtils provides some utilities that we will need for our search server. Most of it is implemented for you, but you should read through it to understand which utilities may be useful to you later in this assignment. One function you will need to implement is split(). This is a function that you have implemented many times before, but having access to it in this assignment will make your life a lot easier. We would give this function to you if it weren’t for the fact that people could re-open earlier assignments and use it as a solution.
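As a reminder of the behavior split() typically has, here is a minimal sketch. It assumes a hypothetical signature that takes the input string and a set of delimiter characters, and it treats runs of delimiters as a single separator (so no empty tokens are produced). The real declaration in HttpUtils.hpp may differ, including in how empty tokens are handled, so follow the starter file's comments over this example.

```cpp
#include <string>
#include <vector>

// Hypothetical signature: the real declaration in HttpUtils.hpp may differ.
// Splits `input` on any character in `delims`; runs of delimiters count as
// one separator, so empty tokens are dropped.
std::vector<std::string> split(const std::string& input, const std::string& delims) {
  std::vector<std::string> tokens;
  std::string::size_type start = input.find_first_not_of(delims);
  while (start != std::string::npos) {
    std::string::size_type end = input.find_first_of(delims, start);
    if (end == std::string::npos) {
      tokens.push_back(input.substr(start));  // last token runs to the end
      break;
    }
    tokens.push_back(input.substr(start, end - start));
    start = input.find_first_not_of(delims, end);  // skip the delimiter run
  }
  return tokens;
}
```

For example, splitting "GET /index.html HTTP/1.1" on " " yields the three parts of an HTTP request line.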
WordIndex
In the next part of the assignment, you need to implement WordIndex.hpp and WordIndex.cpp. In these files, you need to implement a data structure that records which documents contain each word and how many times each word appears in each document. You will likely need to use STL containers to implement this, and it is up to you to decide which are most appropriate to use.
Note: Your choice of structure can actually have a significant impact later in the assignment when you run the HTTP server. Unoptimized solutions can take more than 20 minutes to get the server started, making it very tedious to debug. Better implementations can shorten this to around 20 seconds.
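One possible layout, shown as a minimal sketch: a hash map from each word to a second hash map from document name to occurrence count. The class name SimpleWordIndex and its methods record, num_docs_with, and count are hypothetical; your WordIndex must implement whatever interface the starter WordIndex.hpp and the tests expect.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical class; not the starter WordIndex interface.
class SimpleWordIndex {
 public:
  // Record one occurrence of `word` in the document named `doc_name`.
  void record(const std::string& word, const std::string& doc_name) {
    ++index_[word][doc_name];  // operator[] creates missing entries with count 0
  }

  // Number of distinct documents that contain `word`.
  std::size_t num_docs_with(const std::string& word) const {
    auto it = index_.find(word);
    return it == index_.end() ? 0 : it->second.size();
  }

  // How many times `word` appears in `doc_name` (0 if it never does).
  int count(const std::string& word, const std::string& doc_name) const {
    auto it = index_.find(word);
    if (it == index_.end()) return 0;
    auto doc_it = it->second.find(doc_name);
    return doc_it == it->second.end() ? 0 : doc_it->second;
  }

 private:
  // word -> (document name -> occurrence count)
  std::unordered_map<std::string, std::unordered_map<std::string, int>> index_;
};
```

Because both levels are hash maps, recording an occurrence and finding the documents for a word are average constant-time operations, with no scan over every document. Whether this alone keeps startup near the faster end of the range above depends on the rest of your implementation.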
CrawlFileTree
You will need to implement almost everything in CrawlFileTree.cpp. As a hint, we provide the declarations of two helper functions. You should use these two helper functions so that when a user calls crawl_filetree, it reads all files in the specified directory (and files in subdirectories of that directory) and populates a WordIndex with the words found in every file it reads. There is a function in HttpUtils that is absolutely necessary for you to use to implement this file.
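The recursive traversal at the heart of crawl_filetree can be sketched with the POSIX directory API. This is an illustration under assumptions: collect_files is a hypothetical helper that only gathers file paths, and it does not match the two helper declarations in the starter code, which you should use instead.

```cpp
#include <dirent.h>
#include <sys/stat.h>
#include <string>
#include <vector>

// Hypothetical helper: recursively appends the paths of all regular files
// under `dir` to `out`. Returns false if `dir` itself cannot be opened.
bool collect_files(const std::string& dir, std::vector<std::string>* out) {
  DIR* d = opendir(dir.c_str());
  if (d == nullptr) return false;
  struct dirent* entry;
  while ((entry = readdir(d)) != nullptr) {
    std::string name = entry->d_name;
    if (name == "." || name == "..") continue;  // avoid infinite recursion
    std::string path = dir + "/" + name;
    struct stat st;
    // stat() follows symlinks; use lstat() instead if you want to skip them.
    if (stat(path.c_str(), &st) != 0) continue;  // skip entries we cannot stat
    if (S_ISDIR(st.st_mode)) {
      collect_files(path, out);  // descend into the subdirectory
    } else if (S_ISREG(st.st_mode)) {
      out->push_back(path);
    }
  }
  closedir(d);  // release the directory stream on every successful open
  return true;
}
```

In your implementation, each collected file would then be read and its words recorded in the WordIndex.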
Part B
In Part B, you will implement three files: ServerSocket.cpp, HttpSocket.cpp, and ThreadPool.cpp.
ServerSocket
This file contains a more user-friendly C++ class for creating a server-side listening socket and accepting new connections from connecting clients. We’ve provided the class declaration in ServerSocket.hpp, but you will need to implement it in ServerSocket.cpp.
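The sequence of POSIX calls a listening socket needs (socket, setsockopt, bind, listen) can be sketched as a free function. This is a minimal sketch assuming an IPv4-only server; make_listening_socket is hypothetical and not part of the ServerSocket API, whose declaration in ServerSocket.hpp determines what you actually implement.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the ServerSocket API): returns a listening IPv4
// TCP socket bound to `port` on all interfaces, or -1 on failure.
int make_listening_socket(uint16_t port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd == -1) return -1;

  // Allow restarting the server immediately instead of failing in bind()
  // while old connections linger in TIME_WAIT.
  int optval = 1;
  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));

  struct sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);  // accept on every interface
  addr.sin_port = htons(port);               // network byte order

  if (bind(fd, reinterpret_cast<struct sockaddr*>(&addr), sizeof(addr)) == -1 ||
      listen(fd, SOMAXCONN) == -1) {
    close(fd);
    return -1;
  }
  return fd;
}
```

Each new client is then obtained with accept(listen_fd, ...), which returns a separate file descriptor for that connection; in this assignment, that descriptor is what an HttpSocket would wrap.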
HttpSocket
HttpSocket handles reading HTTP requests over a network connection and writing responses back over that connection. This will largely involve string manipulation to read and parse the HTTP requests. HttpSockets are created by ServerSocket when it accepts an incoming connection, so you should write this code with the understanding that you are reading from the network.
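Reading from the network means a single read() may return only part of a request, or the end of one request plus the start of the next. A common approach, sketched below, keeps a per-connection buffer and reads until the blank line ("\r\n\r\n") that ends an HTTP header block. read_one_request is hypothetical and not the starter get_request signature, and it assumes requests carry no body, as with simple GET requests.

```cpp
#include <unistd.h>
#include <cerrno>
#include <string>

// Hypothetical helper. `buffer` must persist across calls on one connection,
// because it may already hold bytes belonging to the next request.
// Returns false on EOF or error before a complete header block arrives.
bool read_one_request(int fd, std::string* buffer, std::string* request) {
  static const std::string kEnd = "\r\n\r\n";
  while (true) {
    std::string::size_type pos = buffer->find(kEnd);
    if (pos != std::string::npos) {
      *request = buffer->substr(0, pos + kEnd.size());
      buffer->erase(0, pos + kEnd.size());  // keep any bytes of the next request
      return true;
    }
    char chunk[1024];
    ssize_t n = read(fd, chunk, sizeof(chunk));
    if (n == -1) {
      if (errno == EINTR) continue;  // interrupted by a signal: retry
      return false;
    }
    if (n == 0) return false;  // peer closed the connection
    buffer->append(chunk, static_cast<std::string::size_type>(n));
  }
}
```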
ThreadPool
ThreadPool is an object that maintains a collection of N threads. These N threads are spawned during construction of the object and wait for work.
Users of the threadpool can then send a ThreadPool::Task struct to the threadpool (via dispatch), which will then be handled by one of the threads.
Once implemented, a threadpool is a convenient way to give some work to a thread whenever work becomes available.
Note that for your implementation you are required to use pthreads and meet the requirements of ThreadPool outlined here and in the threadpool source files. We will double check these when we manually grade your code.
Also note that the threadpool class has many public members; this is so that threads can easily access these variables.
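The core pattern of a pthreads threadpool (a shared queue guarded by a mutex, with idle workers blocked on a condition variable) can be sketched as follows. SketchPool is hypothetical: it takes std::function tasks instead of ThreadPool::Task structs, omits error checking for brevity, and does not claim to meet the requirements in the ThreadPool source files.

```cpp
#include <pthread.h>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical pool: N pthreads wait on a condition variable and run queued
// tasks. The destructor lets workers drain the queue, then joins them.
class SketchPool {
 public:
  explicit SketchPool(std::size_t n) {
    pthread_mutex_init(&lock_, nullptr);
    pthread_cond_init(&cv_, nullptr);
    threads_.resize(n);
    // Error checking of pthread_create omitted for brevity.
    for (auto& t : threads_) pthread_create(&t, nullptr, &SketchPool::worker, this);
  }

  ~SketchPool() {
    pthread_mutex_lock(&lock_);
    killed_ = true;
    pthread_cond_broadcast(&cv_);  // wake every idle worker so it can exit
    pthread_mutex_unlock(&lock_);
    for (auto& t : threads_) pthread_join(t, nullptr);
    pthread_cond_destroy(&cv_);
    pthread_mutex_destroy(&lock_);
  }

  void dispatch(std::function<void()> task) {
    pthread_mutex_lock(&lock_);
    queue_.push_back(std::move(task));
    pthread_cond_signal(&cv_);  // wake one waiting worker
    pthread_mutex_unlock(&lock_);
  }

 private:
  static void* worker(void* arg) {
    SketchPool* pool = static_cast<SketchPool*>(arg);
    while (true) {
      pthread_mutex_lock(&pool->lock_);
      // A loop, not an if: guards against spurious wakeups.
      while (pool->queue_.empty() && !pool->killed_) {
        pthread_cond_wait(&pool->cv_, &pool->lock_);
      }
      if (pool->queue_.empty()) {  // shutting down and no work left
        pthread_mutex_unlock(&pool->lock_);
        return nullptr;
      }
      std::function<void()> task = std::move(pool->queue_.front());
      pool->queue_.pop_front();
      pthread_mutex_unlock(&pool->lock_);
      task();  // run outside the lock so other workers can proceed
    }
  }

  pthread_mutex_t lock_;
  pthread_cond_t cv_;
  bool killed_ = false;
  std::deque<std::function<void()>> queue_;
  std::vector<pthread_t> threads_;
};
```

Running each task after releasing the mutex is the key design choice: a long-running task (such as serving one client connection) must not block other threads from taking work off the queue.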
Part C
In this part, you will use your work from Parts A and B to implement a web server. Your main function should be in searchserver.cpp.
In this step we are giving you almost no source code. It is your job to figure out how to combine these pieces into a full program. However, since we know this can be a lot, we have provided some guidance below on what the code you write should do:
- main should take in a port and a directory as command-line arguments (e.g. ./searchserver 5950 ./test_tree)
- Populate a word index with all the files in the specified directory
- Construct a ServerSocket and a ThreadPool
- Continually accept new connections, delegating each new connection as a new task to the threadpool.
- We advise you to have your program print something when it is ready to accept connections, similar to what we display in lecture. This will make testing your code easier for both you and the graders.
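The first step, validating the command-line arguments, might look like the sketch below. parse_args is a hypothetical helper; the range check assumes the port must be a nonzero 16-bit value.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper for main(): validates "./searchserver <port> <dir>".
// Returns false, leaving the outputs untouched, if the arguments are malformed.
bool parse_args(int argc, char** argv, uint16_t* port, std::string* dir) {
  if (argc != 3) return false;
  char* end = nullptr;
  errno = 0;
  long value = std::strtol(argv[1], &end, 10);
  // Reject non-numeric text, trailing junk, overflow, and out-of-range ports.
  if (errno != 0 || end == argv[1] || *end != '\0' || value < 1 || value > 65535) {
    return false;
  }
  *port = static_cast<uint16_t>(value);
  *dir = argv[2];
  return true;
}
```

When parse_args fails, main would typically print a usage message and return EXIT_FAILURE before building the index or opening any sockets.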
Each thread should:
- Continually read requests from the user
- Parse the HTTP request string to figure out what the user wants
- Perform the action (either a query request or a file request)
- Write the response to the user
- Repeat until the client closes the connection or sends an HTTP request containing the header:
connection: close
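Detecting the last condition requires looking through the request's header lines. HTTP header names are case-insensitive, so a robust check should not depend on capitalization. The sketch below uses a hypothetical helper, wants_close, and simplifies by accepting only the exact value "close" (not a comma-separated list of values).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if the header block of `request` contains a
// "Connection: close" header, compared case-insensitively.
bool wants_close(const std::string& request) {
  std::string lower = request;
  std::transform(lower.begin(), lower.end(), lower.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  std::string::size_type pos = 0;
  // Skip the request line; examine each following "\r\n"-terminated line.
  while ((pos = lower.find("\r\n", pos)) != std::string::npos) {
    pos += 2;  // start of the next header line
    std::string::size_type eol = lower.find("\r\n", pos);
    std::string line =
        lower.substr(pos, eol == std::string::npos ? std::string::npos : eol - pos);
    if (line.compare(0, 11, "connection:") == 0) {
      std::string value = line.substr(11);
      value.erase(0, value.find_first_not_of(" \t"));  // trim leading whitespace
      value.erase(value.find_last_not_of(" \t") + 1);  // trim trailing whitespace
      if (value == "close") return true;
    }
  }
  return false;
}
```

In a per-connection loop, a thread would write its response first and then stop reading if wants_close returned true for that request.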
While the description above helps, a lot is left out. To help you grasp what you need to do, we are including some sample HTTP requests and the corresponding HTTP responses produced by the staff solution. You are not required to, but you are heavily encouraged to add new .hpp and .cpp files as needed for the project. You may need to edit the Makefile to support this, but it shouldn’t be very complicated (add the name of the corresponding .o to COMMON_OBJS, the .cpp to CPP_SOURCE_FILES, and the .hpp to HPP_SOURCE_FILES).
To run your completed searchserver binary, try running the following command from the terminal:
./searchserver 5950 ./test_tree/
After it prints accepting connections..., you should be able to connect to it by opening a web browser and going to the address localhost:5950. From here, you can test your server to make sure it behaves correctly.
Suggested Approach
Below we have provided a suggested approach to this homework. Note that you are not required to follow this ordering if you believe another approach would work better for you. Also note that you can gradually check your progress, and run specific tests. Look at the Catch2 section below for more details on running individual tests.
Also note that the only ordering you must follow is that you do Part C last and that you implement WordIndex before you implement CrawlFileTree.
- Implement the split function in HttpUtils.cpp (and read through the file to see what utilities we provide you).
- Implement WordIndex.cpp and WordIndex.hpp, and then make sure you pass the word index tests.
- Implement CrawlFileTree.cpp, making sure you pass the tests for it.
- Implement all of ServerSocket.cpp, making sure you pass the tests for it.
- Implement get_request and write_response for HttpSocket.cpp, and then make sure you pass those tests.
- Implement ThreadPool.cpp and make sure you pass the tests for it. At this point, you should pass the entire test_suite.
- Debug any valgrind errors in the code tested by the test_suite.
- Implement searchserver.cpp and test it as described above.
Sample HTTP
To help you understand how to parse HTTP requests, we have given some practice in lecture and recitation.
We have also provided a folder called sample_http that contains some requests and their corresponding responses.
We advise you to look at these for guidance on what your server should roughly expect as requests from the user and what responses it should send back.
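Every HTTP request begins with a request line of three space-separated parts: method, URI, and version (for example, "GET /index.html HTTP/1.1"). A minimal sketch of splitting it follows; RequestLine and parse_request_line are hypothetical names, and HttpUtils may already provide related helpers, so check it before writing your own. Deciding whether a URI is a query request or a file request should follow the formats shown in sample_http.

```cpp
#include <sstream>
#include <string>

// Hypothetical struct holding the three parts of an HTTP request line.
struct RequestLine {
  std::string method;
  std::string uri;
  std::string version;
};

// Parses the first line of `request`. Returns false unless it has exactly
// three whitespace-separated tokens and the last one starts with "HTTP/".
bool parse_request_line(const std::string& request, RequestLine* out) {
  std::string::size_type eol = request.find("\r\n");
  std::istringstream line(request.substr(0, eol));  // npos keeps the whole string
  std::string extra;
  if (!(line >> out->method >> out->uri >> out->version)) return false;
  if (line >> extra) return false;  // more than three tokens
  return out->version.compare(0, 5, "HTTP/") == 0;
}
```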
Grading & Testing
Compilation
We have supplied you with a Makefile that can be used to compile your code into an executable.
You will need to resolve any compiler warnings and errors that show up. Once all compiler errors have been resolved, if you run ls in the terminal, you should see an executable called test_suite. You can then run it by typing ./test_suite to see the evaluation of your code.
Note that your submission will be partially evaluated on the number of compiler warnings. You should eliminate ALL compiler warnings in your code.
Valgrind
We will also test your submission for memory errors and memory leaks using valgrind. To check your code yourself, try running:
valgrind --leak-check=full ./test_suite
If everything is correct, you should see the following towards the bottom of the output:
==1620== All heap blocks were freed -- no leaks are possible
==1620==
==1620== For counts of detected and suppressed errors, rerun with: -v
==1620== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
If you do not see something similar to the above in your output, valgrind will have printed out details about where the errors and memory leaks occurred.
Note that you should avoid memory errors in your searchserver.cpp, but there isn’t a good way to test this since we can’t make the server exit gracefully. As a result, you do not need to run valgrind on the ./searchserver binary.
Catch2
As with previous homework assignments, you can compile your implementation by using the make command. This will produce several output files, including an executable called test_suite.
After compiling your solution with make, you can run all of the tests for the homework by invoking:
./test_suite
You can also run only specific tests by passing command-line arguments to test_suite. For example, to run only the HttpSocket tests, you can type in:
./test_suite [Test_HttpSocket]
Note: you may have to type in ./test_suite \[Test_HttpSocket\] for it to work.
If you only want to test write_response from HttpSocket, you can type in:
./test_suite [Test_HttpSocket] write_response
You can specify which tests are run for any of the tests in the assignment. You just need to know the names of the tests, and you can do this by running:
./test_suite --list-tests
These settings can be helpful for debugging specific parts of the assignment, especially since test_suite can be run with these settings through valgrind and gdb!
SearchServer Testing
Note that a big part of the assignment is completing the last part, searchserver. To test your server, follow the instructions elsewhere in this specification to run it and connect to it via a browser; it should behave as shown in the demo in lecture.
We just want to emphasize that passing the test_suite does not mean you are done with the assignment. You still need to complete the last part of the assignment.
Submission:
Please submit your completed files FileReader.cpp, WordIndex.hpp, WordIndex.cpp, CrawlFileTree.cpp, ServerSocket.cpp, HttpSocket.cpp, ThreadPool.cpp, HttpUtils.cpp, Makefile, searchserver.cpp, and any other files you created to Gradescope.
Each individual in a group must submit the code to Gradescope.