Note that the assignment is designed to be completable by one person, but you may work in pairs. Regardless of whether you work in pairs, you MUST fill out the Github & Partner Survey on Canvas to indicate your partner status and how to locate your project grade.
Goals
- Practice C++, including objects and STL
- Implement network code utilizing the POSIX sockets API
- Write a multi-threaded server to receive HTTP requests from clients and respond to them
Collaboration
For almost all assignments in CIT 5950, you will complete the work on your own. You may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. If you are worried about whether something violates academic integrity, please post on Ed or contact the instructor.
In this assignment, you are allowed to work on the project with a partner; however, collaboration outside of that is not allowed. As with solo assignments, you may only discuss high-level ideas with people outside your project “team”; viewing, sharing, copying, or dictating of code is forbidden.
If you are unsure about whether something violates academic integrity, please post on Ed or contact the instructor.
Contents
Setup
Setup for this assignment is different from the others, so please be sure to read this section completely.
For this assignment, you need to set up a Linux C++ development environment. You can use mostly the same environment as the one used in the previous assignment. The instructions are here if you need them: Environment setup
Registering a New Port
Our project will need to use the network, so we need to configure a few things before we can run it.
First, we need to make sure our container (which most of you named jammy-container at the beginning of the semester) is turned off.
Once it is, you should hit the three dots next to the container and select copy docker run
Once you have done this, open the terminal where you would normally run docker exec -it jammy-container zsh, but DO NOT RUN THAT COMMAND. Instead, paste what you copied above and then modify it.
From here, you need to modify your command to add one thing: -p 5950:5950
For example, when I clicked copy docker run and pasted it in the terminal, I got:
docker run --hostname=dbd8f45192c1 --env=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin --volume=/home/tqmcgaha/5950/docker_setup_testing/jammy:/root/workspace --workdir=/root/workspace --restart=no --label='desktop.docker.io/wsl-distro=Ubuntu' --label='org.opencontainers.image.ref.name=ubuntu' --label='org.opencontainers.image.version=22.04' --runtime=runc -t -d tqmcgaha/docker-env
I need to modify it so that it says:
docker run --hostname=dbd8f45192c1 --env=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin --volume=/home/tqmcgaha/5950/docker_setup_testing/jammy:/root/workspace --workdir=/root/workspace --restart=no --label='desktop.docker.io/wsl-distro=Ubuntu' --label='org.opencontainers.image.ref.name=ubuntu' --label='org.opencontainers.image.version=22.04' --runtime=runc -p 5950:5950 -t -d tqmcgaha/docker-env
The only thing I did was add -p 5950:5950 near the end of it. Your command should look similar.
Once you have modified your command, run it. If your OS asks for permission to add the port, grant it for private networks at least.
Again: DO NOT COPY THIS COMMAND. YOU SHOULD HAVE YOUR OWN COMMAND, COPIED FROM YOUR DOCKER, THAT YOU MODIFY.
After running the command, you should now see a new container registered in your Docker Desktop, and it should show your port. The name should be somewhat randomly generated and different from jammy-container. For example, mine looks like: [screenshot of the new container in Docker Desktop]
From here, you should use this container instead of jammy-container.
If you are using VS Code, you may need to attach to the new container.
Download Starter Files
You can download the starter files into your Docker container by running the following command:
curl -o searchserver.zip https://www.seas.upenn.edu/~cit5950/current/projects/code/searchserver.zip
You can also download the files manually here if you would like: searchserver.zip
From here, you need to extract the files by running
unzip searchserver.zip
From here you can either open the project in Vim or VSCode.
For Vim, you just need to run
cd searchserver
vim FileReader.cpp
For VSCode, you will have to follow steps similar to what we did to open check_setup in the setup document.
Installing Boost
DO NOT SKIP THIS PART
Once you have downloaded and configured everything you will need to install the boost library. We have made this easier for you by providing a script that will do it.
All you need to do is navigate to the searchserver directory and run the following commands:
chmod +x ./setup.sh
./setup.sh
Note: if you are on another environment, you may have to run sudo ./setup.sh instead.
Once you do this, it will print a lot of output and it may take a long time to run. This is perfectly normal; it took Travis 18 minutes for this script to run on his laptop.
Once this is done, running make should successfully compile the test_suite.
Code Sharing
Since this is a group assignment, you will need some way to work with your partner. There are two ways we endorse.
One is to use the GitHub repository that we will create for you. If you filled out the survey on Canvas, we will create a repo for the two of you to use.
The other way is to make use of VSCode Live Share: https://code.visualstudio.com/learn/collaboration/live-share There should be enough information there to get you set up.
Overview
In this assignment, you will implement a multi-threaded web server that provides simple searching and file viewing utilities. This assignment is broken up into three parts. In Part A, you will finish implementing some code that allows the server to read files and parse them to record the words that show up in them. In Part B, you will implement parts of the web server to accept connections and handle HTTP requests. In Part C, you will combine what you did in Parts A and B to finish implementing a functioning web server.
We HIGHLY recommend you read through this entire document. At the very least, please read the overview associated with each part before you start working on it.
Part A
In Part A, you will implement four files: FileReader.cpp, WordIndex.cpp, WordIndex.hpp, and finally, CrawlFileTree.cpp.
FileReader
In this file, you will be implementing a simple file reader. The name of the file will be supplied at the time of construction, and the function read_file will read the entire contents of the file into a single string. You may use POSIX, the C FILE interface, a C++ stream, or whatever you think works best to implement the reader.
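As a rough illustration of the C++ stream option, here is a minimal sketch that reads a whole file into a string. The function name read_whole_file and its signature are made up for this example and are not the FileReader interface from the starter code; treat it as one possible approach under those assumptions.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper, NOT the starter code's FileReader interface.
// Reads the whole file in binary mode so that every byte, including any
// 0 bytes, ends up in *out. Returns false if the file cannot be opened.
bool read_whole_file(const std::string& path, std::string* out) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    return false;  // file is missing or unreadable
  }
  // istreambuf_iterator walks the raw bytes without skipping whitespace.
  out->assign(std::istreambuf_iterator<char>(in),
              std::istreambuf_iterator<char>());
  return true;
}
```

Opening in binary mode and copying raw bytes avoids any text-mode translation, which matters if the file is not plain text.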
WordIndex
In the next part of the assignment, you need to implement WordIndex.hpp and WordIndex.cpp. In these files, you need to implement a data structure that records which documents contain a word and how many times that word appears in each of those documents. You will likely need to use STL containers to implement this, and it is up to you to decide which are most appropriate to use.
Note: Your choice of structure can actually have a significant impact later in the assignment when you run the HTTP server. Unoptimized solutions can take 20 minutes or more to get the server started, making it very tedious to debug. Better implementations can shorten this to around 20 seconds.
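To make the idea of "which documents contain a word, and how often" concrete, here is one possible shape for such a structure, sketched with nested hash maps. The class name TinyIndex and its methods record and lookup are hypothetical; the real WordIndex interface is whatever the starter code and tests expect, and other container choices may work as well or better.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical class for illustration only; not the assignment's WordIndex.
class TinyIndex {
 public:
  typedef std::pair<std::string, std::size_t> Hit;  // (document, count)

  // Record one occurrence of `word` in document `doc`.
  void record(const std::string& word, const std::string& doc) {
    ++counts_[word][doc];
  }

  // Return the documents containing `word`, highest count first.
  std::vector<Hit> lookup(const std::string& word) const {
    std::vector<Hit> result;
    auto it = counts_.find(word);
    if (it == counts_.end()) {
      return result;  // word never recorded
    }
    result.assign(it->second.begin(), it->second.end());
    std::sort(result.begin(), result.end(),
              [](const Hit& a, const Hit& b) { return a.second > b.second; });
    return result;
  }

 private:
  // word -> (document name -> occurrence count)
  std::unordered_map<std::string,
                     std::unordered_map<std::string, std::size_t>> counts_;
};
```

Hash maps give average constant-time updates per recorded word, which is the kind of property that tends to matter when a large file tree is indexed at server startup.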
CrawlFileTree
Most of CrawlFileTree.cpp has already been implemented; you just need to implement the last function at the bottom of the file, called HandleFile. This function takes in a file name and a WordIndex. It is up to this function to read the specified file and record each word that is found in it. This file is the core of our file processing, which we will later use to produce search results for the server.
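The "record each word" step boils down to splitting file contents into words. Below is a small standard-library sketch of that idea. The helper name split_words is invented here, and the rule it uses (lowercase everything, treat every non-alphabetic character as a delimiter) is an assumption for illustration; check the starter code comments and tests for what actually counts as a word.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` into lowercase words, treating every
// non-alphabetic character as a delimiter (an assumed rule, see above).
std::vector<std::string> split_words(const std::string& text) {
  std::vector<std::string> words;
  std::string current;
  for (unsigned char c : text) {
    if (std::isalpha(c)) {
      current.push_back(static_cast<char>(std::tolower(c)));
    } else if (!current.empty()) {
      words.push_back(current);  // a delimiter ends the current word
      current.clear();
    }
  }
  if (!current.empty()) {
    words.push_back(current);  // the text ended in the middle of a word
  }
  return words;
}
```

Each word produced this way could then be recorded in the index together with the name of the file it came from.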
Part B
In Part B, you will implement three files: ServerSocket.cpp, HttpConnection.cpp, and HttpUtils.cpp.
ServerSocket
This file contains a helpful class for creating a server-side listening socket and accepting new connections from connecting clients. We’ve provided you with the class declaration in ServerSocket.hpp, but you will need to implement it in ServerSocket.cpp.
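For orientation, the usual POSIX sequence for a listening socket is getaddrinfo, socket, bind, then listen. The sketch below shows that sequence as a free function; the name open_listening_socket is made up, and the ServerSocket class declared in ServerSocket.hpp has its own interface and requirements that this sketch does not try to match.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstring>

// Hypothetical free function, NOT the assignment's ServerSocket class.
// Returns a listening TCP socket bound to `port`, or -1 on failure.
int open_listening_socket(const char* port) {
  struct addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;      // allow IPv4 or IPv6 candidates
  hints.ai_socktype = SOCK_STREAM;  // TCP
  hints.ai_flags = AI_PASSIVE;      // wildcard address, suitable for bind()

  struct addrinfo* results = nullptr;
  if (getaddrinfo(nullptr, port, &hints, &results) != 0) {
    return -1;
  }

  int listen_fd = -1;
  for (struct addrinfo* rp = results; rp != nullptr; rp = rp->ai_next) {
    int fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
    if (fd == -1) {
      continue;  // this candidate is unusable, try the next one
    }
    // Let the port be reused quickly after the server restarts.
    int optval = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));

    if (bind(fd, rp->ai_addr, rp->ai_addrlen) == 0 &&
        listen(fd, SOMAXCONN) == 0) {
      listen_fd = fd;
      break;
    }
    close(fd);
  }
  freeaddrinfo(results);
  return listen_fd;
}
```

A caller would then loop on accept() with the returned descriptor and must eventually close() it.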
HttpConnection
HttpConnection handles the reading in of an HTTP request over a network connection, parsing such requests into an object, and also the writing of responses back over the connection. This will largely deal with string manipulation to read and parse the HTTP requests.
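To give a feel for the string manipulation involved, here is a standard-library sketch that parses the header portion of a request: the request line followed by "Name: value" lines, each ending in \r\n. The struct ParsedRequest and the function parse_request_text are hypothetical names, and lowercasing header names is a choice made for this example; the assignment defines its own request type and expectations.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical struct; the assignment defines its own request type.
struct ParsedRequest {
  std::string method;
  std::string uri;
  std::string version;
  std::map<std::string, std::string> headers;  // names stored lowercased
};

// Returns false if the request line does not have exactly three parts.
bool parse_request_text(const std::string& text, ParsedRequest* out) {
  std::istringstream lines(text);
  std::string line;

  if (!std::getline(lines, line)) return false;
  if (!line.empty() && line.back() == '\r') line.pop_back();
  std::istringstream first(line);
  std::string extra;
  if (!(first >> out->method >> out->uri >> out->version) ||
      (first >> extra)) {
    return false;
  }

  while (std::getline(lines, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();
    if (line.empty()) break;  // a blank line ends the headers
    std::size_t colon = line.find(':');
    if (colon == std::string::npos) continue;  // skip malformed lines
    std::string name = line.substr(0, colon);
    for (char& c : name) {
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    std::size_t start = line.find_first_not_of(" \t", colon + 1);
    out->headers[name] =
        (start == std::string::npos) ? std::string() : line.substr(start);
  }
  return true;
}
```

The same splitting could be done with the Boost string functions mentioned later in this document; the version above sticks to the standard library so it stands alone.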
HttpUtils
HttpUtils provides some utilities that we will need for our search server. In particular, there are two functions that you will need to implement to make sure that our server handles some security concerns. You will only need to implement those two functions (escape_html and is_path_safe). You may still want to take a look at the other functions declared in HttpUtils.hpp, since they will likely help you with implementing Part C.
The function is_path_safe is used to make sure that anyone using the server can only access files under the specified static files directory. If we don’t implement and use this function, an attacker could request any file on our computer with something called a directory traversal attack.
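To show what a directory traversal looks like, the sketch below walks a relative path component by component and reports whether it ever climbs above its starting directory. This is a purely lexical check written for illustration: the name stays_under_root is made up, it does not resolve symbolic links, and it is not the approach or the signature that is_path_safe is expected to use.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, NOT the assignment's is_path_safe. Returns true if
// `relative_path`, interpreted relative to some root directory, never
// climbs above that root. Lexical only: symbolic links are not resolved.
bool stays_under_root(const std::string& relative_path) {
  if (!relative_path.empty() && relative_path[0] == '/') {
    return false;  // an absolute path ignores the root entirely
  }
  int depth = 0;  // how many directories below the root we currently are
  std::istringstream parts(relative_path);
  std::string part;
  while (std::getline(parts, part, '/')) {
    if (part.empty() || part == ".") {
      continue;  // "a//b" and "a/./b" do not change the depth
    }
    if (part == "..") {
      if (--depth < 0) {
        return false;  // climbed above the root
      }
    } else {
      ++depth;
    }
  }
  return true;
}
```

A request such as ../../etc/passwd fails this check on its first component, which is exactly the kind of input a traversal attack relies on.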
The other function, escape_html, is used to prevent a “cross-site scripting” flaw. See this for background if you’re curious: http://en.wikipedia.org/wiki/Cross-site_scripting
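The general technique behind this kind of protection is to replace characters that HTML treats specially with their entity equivalents before echoing user input back in a page. Here is a minimal sketch; the name escape_for_html and the exact set of characters it handles are assumptions for illustration, so rely on the comments in HttpUtils.hpp and the tests for what escape_html must actually do.

```cpp
#include <string>

// Hypothetical helper showing the general technique only; the character
// set required by the assignment's escape_html may differ.
std::string escape_for_html(const std::string& input) {
  std::string out;
  out.reserve(input.size());
  for (char c : input) {
    switch (c) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&#39;";  break;
      default:   out += c;        break;
    }
  }
  return out;
}
```

With this in place, a search query containing a script tag is displayed as text in the results page rather than being run by the browser.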
Part C
In this part, you will take what you did in Part A and Part B to implement a web server in HttpServer.cpp. Most of the file is filled out for you, but there are a few places that you will need to implement. In particular, you will need to finish the thread function (HttpServer_ThrFn), where each thread handles a connection, and two helper functions to serve the two types of requests the server may get: requests to see a file (ProcessFileRequest) and requests to process a search query (ProcessQueryRequest).
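The "each thread handles a connection" idea can be sketched in isolation with std::thread. In the example below, plain integers stand in for accepted connections, and the function name serve_each_on_own_thread is invented; how HttpServer.cpp actually creates and manages its threads is determined by the provided code, which may differ from this.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical dispatcher: runs `handler` once per connection id, each on
// its own thread, then waits for all of them to finish. A real server
// would instead loop on accept() and keep serving until shutdown.
void serve_each_on_own_thread(const std::vector<int>& connection_ids,
                              const std::function<void(int)>& handler) {
  std::vector<std::thread> workers;
  workers.reserve(connection_ids.size());
  for (int id : connection_ids) {
    workers.emplace_back(handler, id);  // each thread gets its own copy of id
  }
  for (std::thread& t : workers) {
    t.join();
  }
}
```

Any state that the handlers share must be safe to touch from several threads at once, for example by using std::atomic or a mutex.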
Once you have them working, test your httpd binary to see if it works. Make sure you exercise both the web search functionality as well as the static file serving functionality. You can look at the source of pages that our solution binary serves and emulate that HTML, if you would like to get the same “look and feel” to your server as ours. However, as long as you mimic the same behaviour (have a search bar, process files and queries correctly, and show their results similarly), you are free to modify the look of your site. In the past, some students implemented “dark mode”, had a Shrek theme, etc.
To run your completed httpd binary, try running the following command from the terminal: ./httpd 3000 ./test_tree/
After it prints accepting connections..., hit the “Open server on port 3000” button at the top of Codio. From here, you can test your server.
Suggested Approach
Below we have provided a suggested approach to this homework. Note that you are not required to follow this ordering if you believe another approach would work better for you. Also note that you can gradually check your progress, and run specific tests. Look at the Catch2 section below for more details on running individual tests.
Also note that the only ordering constraints in this homework are that HttpServer must be implemented last, and that you need to implement FileReader and WordIndex before you implement CrawlFileTree.
- Start by implementing FileReader::read_file and making sure you pass the provided tests.
- Implement WordIndex.cpp and WordIndex.hpp and then make sure you pass the word index tests.
- Implement the handle_file function in CrawlFileTree.cpp, making sure you pass the tests for it.
- Implement all of ServerSocket.cpp, making sure you pass the tests for it.
- Implement get_request and parse_request from HttpConnection.cpp, and then make sure you pass those tests.
- Implement write_response in HttpConnection.cpp and make sure you pass the tests for it.
- Implement both incomplete functions in HttpUtils.cpp. At this point, you should have passed the entire test_suite.
- Debug any valgrind errors with the code tested by the test_suite.
- Implement HttpServer.cpp and test it as described above.
solution_binaries
With this assignment, we are providing a compiled solution that you can use as a reference while implementing your search server. You can run the solution from the command line with: ./solution_binaries/httpd 5950 ./test_tree/
You may want to compare your implementation's behaviour to it so that you can ensure yours is correct.
Boost
In this homework, we have installed the external C++ library called Boost. Boost is a widely used library and contains many useful functions. In particular, the string functions split(), replace_all(), and trim() will likely be the most useful to you. You can take a look at the Boost reference for strings here: https://www.boost.org/doc/libs/1_78_0/doc/html/string_algo.html We also have one of the recitations dedicated to going over the project and getting you practice with those Boost functions.
Hints
- As mentioned above, the boost library is available to you in this homework, and it will make your life a lot easier with this project if you know how to use replace_all, split, and/or trim from boost.
- In particular for the function split, it is helpful to know that you are not required to use is_any_of as the predicate for splitting. You can also use functions like isalpha, which will split on any alphabetic character, or write your own function that takes in a character and returns a boolean, true iff that character is a delimiter.
- We HIGHLY recommend you take inspiration from the provided lecture code server_accept_rw_close for your implementation of ServerSocket.cpp.
- There is a function that will make your implementation of is_path_safe a lot easier. Read the comment left for you in HttpUtils.cpp and do a little digging online to see if you can find it.
- For FileReader, we need to handle binary files which may contain a 0 byte; as a result, you may find the two-argument string constructor to be useful here.
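The last hint is easy to see in a tiny example: constructing a std::string from a char pointer alone stops at the first 0 byte, while the two-argument form (pointer plus length) keeps every byte. The wrapper from_buffer below is a hypothetical name used only to make the point.

```cpp
#include <cstddef>
#include <string>

// Hypothetical wrapper: builds a string from a raw buffer of known length.
// Passing the length explicitly keeps bytes that follow an embedded '\0'.
std::string from_buffer(const char* data, std::size_t length) {
  return std::string(data, length);
}
```

For a three-byte buffer holding 'a', a 0 byte, and 'b', from_buffer returns a string of size 3, whereas std::string(buffer) would produce a string of size 1.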
Grading & Testing
Compilation
We have supplied you with a makefile that can be used for compiling your code into an executable. To do this, open the terminal in Codio (this can be done by selecting Tools -> Terminal) and then type in make.
You may need to resolve any compiler warnings and compiler errors that show up. Once all compiler errors have been resolved, if you run ls in the terminal, you should be able to see an executable called test_suite. You can then run this by typing in ./test_suite to see the evaluation of your code.
Note that your submission will be partially evaluated on the number of compiler warnings. You should eliminate ALL compiler warnings in your code.
Valgrind
We will also test your submission on whether there are any memory errors or memory leaks. We will be using valgrind to do this. To do this, you should try running:
valgrind --leak-check=full ./test_suite
If everything is correct, you should see the following towards the bottom of the output:
==1620== All heap blocks were freed -- no leaks are possible
==1620==
==1620== For counts of detected and suppressed errors, rerun with: -v
==1620== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
If you do not see something similar to the above in your output, valgrind will have printed out details about where the errors and memory leaks occurred.
Note that you should avoid memory errors in your HttpServer.cpp, but there isn’t a good way to test this since we can’t make the server exit gracefully. As a result, you do not need to run valgrind on the ./httpd binary.
Catch2
As with previous homework assignments, you can compile your implementation by using the make command. This will result in several output files, including an executable called test_suite.
After compiling your solution with make, you can run all of the tests for the homework by invoking:
./test_suite
You can also run only specific tests by passing command line arguments into test_suite. For example, to only run the HttpConnection tests, you can type in:
./test_suite [Test_HttpConnection]
Note: you may have to type in ./test_suite \[Test_HttpConnection\] for it to work.
If you only want to test write_response from HttpConnection, you can type in:
./test_suite [Test_HttpConnection] write_response
You can specify which tests are run for any of the tests in the assignment. You just need to know the names of the tests, and you can do this by running:
./test_suite --list-tests
These settings can be helpful for debugging specific parts of the assignment, especially since test_suite can be run with these settings through valgrind and gdb!
Partners
Note that the assignment is designed to be completable by one person, but you may work in pairs. Regardless of whether you work in pairs, you MUST fill out the partner sign up form on Canvas to indicate your partner status and how to locate your project grade.
Submission:
Please submit your completed files FileReader.cpp, WordIndex.hpp, WordIndex.cpp, CrawlFileTree.cpp, ServerSocket.cpp, HttpConnection.cpp, HttpUtils.cpp, and HttpServer.cpp to Gradescope.
Each individual in a group must submit the code to Gradescope.