Food Recommender

Summary of Deliverables

By the end of this, here’s what you’ll need to submit to Gradescope:

  • food_recommender.py
  • test_food_recommender_student.py
  • readme_food.txt

0. Getting Started

All files should be available to you in the “Food Recommender” assignment on Codio. If you need to download the starter files again, you can do so here.

A. Background & Goals

When we look for restaurants, we often consider all of the ones in our area and then filter some of them out based on certain criteria. These criteria could be how far away the restaurants are, what cuisine they serve, how expensive they are, and whether they are recommended by others. These criteria can heavily influence which restaurants we go to, and thus can affect the viability of a restaurant as a business. They may also affect us as users by framing what makes a restaurant good for us. The underlying data may also be imperfect and lead us to false conclusions about places we don’t know much about.

We have gathered some data about restaurants near UPenn for you to analyze, but it is not the same data that Yelp or Google Maps may use to recommend restaurants. In our next assignment you will handle “real” data, but for now we want to give you an entry point into data science. This assignment also aims to show you how we can do data science without the specialized data science tools that we will use in the next assignment.

What we want you to do in this assignment is write code to help us parse restaurant data and, in the process, gain familiarity with sets, maps, and unit testing.

B. What You Will Do

In this assignment you will start by filling out a function that reads a CSV file into a dictionary structure. Afterwards you will write various functions that use that structure to extract information about the restaurants stored in it. Along the way you will also write unit tests for the code you write in food_recommender.py, to make sure that it is working properly!

1. Reading Data

For the first part, you will implement the load_restaurant_csv function. This function takes in a string with the file path of the CSV file we want to read. The CSV file should be laid out something like this:

Name,Cuisine,Price,Distance,1100 Staff Endorsements
Masala Kitchen,Indian,$,0.8,15
Hangry Joes,Chicken,$,0.4,12
# ... more rows with more restaurants

This is a CSV (Comma-Separated Values) file, and you can think of it almost as a table. If we reason about this file like a table, it would look like:

Name            Cuisine  Price  Distance  1100 Staff Endorsements
Masala Kitchen  Indian   $      0.8       15
Hangry Joes     Chicken  $      0.4       12


Note that the first line in our CSV file is special: it holds the name of each column in the table. Every line after the first is a comma-separated list describing one restaurant. The first line of our CSV should always be the same, while the rest of the file can vary (but should still follow the format shown here). You can assume that all files you process will be in this format.

Your first job is to write the function load_restaurant_csv() so that it reads this file and stores all of its values in a dict object properly.

The dictionary returned by this function must follow a specific structure. In particular, you should return a dictionary object where:

  • The keys are strings, each string is a different restaurant name
  • The values are tuples containing four different values in the following order:
    • A string containing the cuisine of that restaurant
    • An int representing the cost level of the restaurant. $ would become the integer 1 and refers to relatively cheap restaurants. $$ would become the integer 2. The higher the number (and thus the more $), the more expensive the restaurant is.
    • A float that represents the distance (in miles) the restaurant is from Levine Hall
    • An int that represents the number of endorsements it has from CIS 1100 Course Staff :)

If we were to take the example data above, properly parse it into the desired structure, and print it we would get:

{'Masala Kitchen': ('Indian', 1, 0.8, 15), 'Hangry Joes': ('Chicken', 1, 0.4, 12)}

If we indent this into a more readable state we get:

{
    'Masala Kitchen': ('Indian', 1, 0.8, 15),
    'Hangry Joes': ('Chicken', 1, 0.4, 12)
}
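
To make the structure concrete, here is a small sketch of how values can be read back out of a dictionary shaped like the one above. The variable names are our own choices for illustration; nothing here is required by the assignment.

```python
# Sample data copied from the example above.
restaurants = {
    "Masala Kitchen": ("Indian", 1, 0.8, 15),
    "Hangry Joes": ("Chicken", 1, 0.4, 12),
}

# Look up one restaurant and read a field by its position in the tuple.
info = restaurants["Masala Kitchen"]
cuisine = info[0]        # index 0 is the cuisine
endorsements = info[3]   # index 3 is the endorsement count

# Or unpack all four fields at once while looping over the dictionary.
for name, (cuisine_name, price, distance, endorsed) in restaurants.items():
    print(name, cuisine_name, price, distance, endorsed)
```

Indexing is handy when you need a single field; unpacking tends to read better when you need several.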

This may be a lot to do at once, so we have a suggested path to figuring out how to do this:

  • First you should try modifying load_restaurant_csv() so that it reads all of the lines from the specified file and prints out every line except for the first one.

  • Second you should modify your code so that you split() every line read, identify the Name, Cuisine, Price, Distance and 1100 Staff Endorsements on each line, and print them. Remember that you can pass an argument into the split() function to specify what you want to split on. For example "aaBacBa".split("B") returns ["aa", "ac", "a"].

  • Lastly, you should now take each line you have read and store it in a dictionary object in the structure described above. Be sure you properly convert each item into its proper type!
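
The splitting and type-conversion steps above might look roughly like the sketch below. It works on an in-memory string instead of a file, so opening and reading the file is left out. It also assumes the price level is simply the number of $ characters, which matches the examples above but is our reading of the format. The names sample and parsed are made up for illustration.

```python
# The example rows from above, held in a string instead of a file.
sample = """Name,Cuisine,Price,Distance,1100 Staff Endorsements
Masala Kitchen,Indian,$,0.8,15
Hangry Joes,Chicken,$,0.4,12"""

parsed = []
lines = sample.split("\n")
for line in lines[1:]:                 # skip the header line
    fields = line.split(",")           # one string per column
    name = fields[0]
    cuisine = fields[1]
    price = len(fields[2])             # assumption: "$" -> 1, "$$" -> 2, ...
    distance = float(fields[3])        # "0.8" -> 0.8
    endorsements = int(fields[4])      # "15" -> 15
    parsed.append((name, cuisine, price, distance, endorsements))
    print(name, cuisine, price, distance, endorsements)
```

When reading from a real file, each line usually ends with a newline character, so you may want to call strip() on a line before splitting it.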

Once you are sure that your dictionary works, you should try running:

python -m unittest test_food_recommender.TestLoadCSV -v

and make sure that it passes the tests. If it does not, then you should fix your code until it works properly.

2. Processing the Dictionary

You will now write 5 more functions, each of which will interact with the same dictionary structure that you built up when reading from a file.

a. Basic Processing

These first four functions are all required, and together they support the basic functionality of our food recommender. You may notice that some of them are very similar to each other.

We highly recommend that you test your code along the way and that you write your own tests as you go. See the section below on writing unit tests for more.

Note: For each of these you will need to write a function header comment that gives an overview of what it does.

get_cuisines

Takes in a dictionary in the same format as we built up in load_restaurant_csv() and returns a set of all the different cuisines that are available from restaurants in that dictionary.

You should be able to use the TestGetCuisines class to test this:

python -m unittest test_food_recommender.TestGetCuisines -v
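
As a reminder of why a set fits this kind of task, the short sketch below shows that a set keeps only one copy of each value. The list of values is made up for illustration.

```python
# Illustrative values only; note that "Indian" appears twice.
values = ["Indian", "Chicken", "Indian"]

unique = set()          # start with an empty set
for value in values:
    unique.add(value)   # adding a value that is already present does nothing

print(len(unique))      # 2
```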

max_distance

Takes in a dictionary in the same format as we built up in load_restaurant_csv() and a float representing a distance in miles. The function returns a set of strings, where each string is the name of a restaurant in the passed-in dictionary whose distance is less than or equal to the passed-in distance.

You should be able to use the TestMaxDistance class to test this. Simply modify the command used for testing get_cuisines and load_restaurant_csv so that it uses the correct class name. E.g. replace TestGetCuisines with TestMaxDistance in the command and then run it.
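
Several functions in this section follow the same pattern: loop over the dictionary, check one field of each tuple, and collect the matching names in a set. The sketch below shows that pattern on the price field. Note that max_price is a hypothetical function that is not part of the assignment, and "Fancy Place" is a made-up restaurant.

```python
def max_price(restaurants, limit):
    """Hypothetical example, not part of the assignment.

    Returns the set of restaurant names whose price level is
    less than or equal to limit.
    """
    matches = set()
    for name, info in restaurants.items():
        price = info[1]          # index 1 is the price level
        if price <= limit:
            matches.add(name)
    return matches


sample = {
    "Masala Kitchen": ("Indian", 1, 0.8, 15),
    "Fancy Place": ("French", 3, 1.2, 4),   # made-up restaurant for illustration
}
print(max_price(sample, 2))   # {'Masala Kitchen'}
```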

ta_endorsements

Takes in a dictionary in the same format as we built up in load_restaurant_csv() and an integer. The function returns a set of strings, where each string is the name of a restaurant in the passed-in dictionary whose endorsement count is greater than or equal to the passed-in endorsement requirement. This function should be VERY similar to the previous function.

You should be able to use the TestTAEndorsements class to test this.

filter_cuisine

Takes in a dictionary in the same format as we built up in load_restaurant_csv() and a string representing a specific cuisine. The function returns a set of strings, where each string is the name of a restaurant in the passed-in dictionary that serves the specified cuisine. Note that your code should match cuisines if they contain the same letters but differ in case (uppercase vs lowercase). For example, the cuisines chicken and CHIckeN should be considered the same. You may want to take advantage of str.upper() or str.lower() for this function.

You should be able to use the TestFilterCuisine class to test this.
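
Here is a quick illustration of the case-insensitive comparison mentioned above, using the two example strings from the description. The variable names are arbitrary.

```python
a = "chicken"
b = "CHIckeN"

print(a == b)                   # False: == is case-sensitive
print(a.lower() == b.lower())   # True: both sides become "chicken"
```

Converting both sides to the same case before comparing is what makes the match work. Converting only one side would still fail whenever the other side contains uppercase letters.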

b. Advanced Processing

For the next part of code writing, you will have to implement only one of the two following functions. If you do both, we will ignore one of them. The one ignored will be the one with a lower score.

Note: For whichever function you implement, you will need to write a function header comment that gives an overview of what it does.

The first option takes in a dictionary in the same format as we built up in load_restaurant_csv(). The function returns a set of strings, where each string is the name of the most endorsed restaurant in the dictionary for each cuisine present in the dictionary. This means the set will contain as many restaurant names as there are cuisines in the dictionary.

You can assume that no restaurant has an endorsement count less than or equal to -10.

If there are two restaurants in a specific cuisine that tie for most endorsements, choose the one that your program found in the dictionary first.

You should be able to use the TestMostRecommendedDiverse class to test this.
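
The tie-breaking rule and the -10 assumption above both fit a common "running maximum" pattern, sketched below on made-up (name, score) pairs. This is only the core idea for a single group. Tracking a separate best entry for each cuisine is left to you.

```python
# Made-up (name, score) pairs; "b" and "c" tie for the highest score.
entries = [("a", 3), ("b", 5), ("c", 5)]

best_name = None
best_score = -10   # lower than any allowed score, so the first entry always wins
for name, score in entries:
    if score > best_score:   # strict ">" keeps the earlier entry on a tie
        best_name = name
        best_score = score

print(best_name)   # b
```

Using >= instead of > would keep the last of the tied entries rather than the first.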

The second option takes in a dictionary in the same format as we built up in load_restaurant_csv(). The function returns a string that represents the cuisine with the highest average endorsements across its restaurants.

You can assume that no restaurant has an endorsement count less than or equal to -10.

If there is a tie for which cuisine has the highest average endorsements, you may return any of the cuisines that tie.

You should be able to use the TestMostRecommendedAverageCuisine class to test this.
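
One way to compute an average per group is to keep a running total and a running count for each group in two dictionaries. The sketch below does this on made-up (group, value) pairs. It is not the required approach, and choosing the group with the highest average is left out.

```python
# Made-up (group, value) pairs for illustration.
pairs = [("x", 4), ("y", 10), ("x", 8)]

totals = {}
counts = {}
for group, value in pairs:
    # dict.get returns the default (0) the first time a group is seen
    totals[group] = totals.get(group, 0) + value
    counts[group] = counts.get(group, 0) + 1

averages = {}
for group in totals:
    averages[group] = totals[group] / counts[group]

print(averages)   # {'x': 6.0, 'y': 10.0}
```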

3. Writing Unit Tests

For each of the 5 functions you completed in the previous section (This does NOT include load_restaurant_csv), you will write one test case to test your code. These test cases should be written in test_food_recommender_student.py.

a. Motivation

While we’ve been using print statements in previous homeworks, for this homework we’re going to write actual unit tests with Python’s unittest, which is an industry-grade framework used by companies like Amazon and Google to test their code. unittest is valuable because it lets us maintain enormous test suites that run automatically and tell us if changes to our code break any of our functionality. This becomes crucial as the projects you’ll be making in your CIS career grow in size and complexity. In other words, testing is very important for CS, even outside of academics.

b. unittest Reminders

As a review, unittest works by comparing what is actually returned by your functions (or the resulting object state) against what you expect. Here is an example:

import unittest
import food_recommender 
# above line imports the code we want to test

class ExampleTest(unittest.TestCase):

    def test_get_cuisine_example(self):
        """
        Test that get_cuisines called on a dictionary with just one
        restaurant returns a set with just that restaurant's cuisine.
        """
        # declare what we will use as input for the test
        test_input = {
            "Harry and Travis's Chicken Tendie Surprise": ("Chicken", 1, 0.005, -8),
        }

        # get what the actual output is by calling the code we want to test
        actual = food_recommender.get_cuisines(test_input)

        # declare the expected output:
        expected = { "Chicken" }

        self.assertEqual(actual, expected)

if __name__ == '__main__':
    unittest.main()

You’ll find the following functions from the unittest library useful:

assertTrue(someCondition)
assertFalse(someCondition)
assertEqual(firstValue, secondValue)

Note that you can put multiple assert statements in one test. With multiple assert statements in a test, if a single one of them fails, the whole test will fail. However, for this homework, the use of multiple assert statements is neither expected nor needed. Multiple asserts will be useful in future homeworks.
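
For reference, here is a small self-contained sketch that uses all three assert functions in one test. The class name, test name, and data are made up, and the test checks a plain Python set rather than anything in food_recommender.py.

```python
import unittest


class ExampleAsserts(unittest.TestCase):

    def test_set_membership(self):
        """Test membership and size of a small hand-written set."""
        cuisines = {"Indian", "Chicken"}

        self.assertTrue("Indian" in cuisines)    # passes if the condition is True
        self.assertFalse("Pizza" in cuisines)    # passes if the condition is False
        self.assertEqual(len(cuisines), 2)       # passes if both values are equal
```

If any one of the three asserts fails, unittest reports test_set_membership as failed and does not run the asserts after it in that test.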

c. Writing Tests

To start, open test_food_recommender.py and take a look at the test cases that have already been given to you. The test cases that you write yourself must be different than the given cases - credit will not be given to a repeated case.

Write your test cases in test_food_recommender_student.py. You are only required to write one new test per function, but we encourage you to write as many tests as you need to feel confident that your code works. Once you’ve written these, run the tests (directions described below) and make sure they pass. We recommend testing your functions with multiple edge cases and varied input arguments to ensure that your code works for all possible scenarios.

Instead of a detailed function header comment, please write a short (about one-line) summary of what is being tested in each case. You can see that we do something like this in the provided test_food_recommender.py.

d. Running your Tests

To run your tests, all you need to do is go to the terminal where we would normally run our programs and run the following command:

python -m unittest test_food_recommender_student -v

If you want to run the test functions that we give you, all you need to do is replace one part of the command. Instead of test_food_recommender_student it would just be test_food_recommender:

python -m unittest test_food_recommender -v

If you want to run only some specific tests, you can do that by being more specific in the command. For example, if I want to run all tests that are in the TestLoadCSV class of test_food_recommender.py, then I could run:

python -m unittest test_food_recommender.TestLoadCSV -v

If I wanted to be even more specific and run a single test function within a certain class, I could do that too. Let’s say I want to run test_checkpoint_0_loading_empty_dataset within the TestLoadCSV class. I could do:

python -m unittest test_food_recommender.TestLoadCSV.test_checkpoint_0_loading_empty_dataset -v

4. Using the whole program

Once everything is done and working, you can use the main program of our food recommender! The program is designed to analyze the restaurants that CIS 1100 course staff recommend :) To run it, all you need to do is run:

python food_recommender_main.py restaurants.csv

From here it should start running the application. Try typing “help” and then hitting enter for a list of commands. Once you do this, try running some of the commands to have some restaurants recommended to you. We will ask you in the readme for what recommendations you got :)

Hopefully you get a good restaurant recommendation from it!

5. Readme & Submission

A. Readme

Complete readme_food.txt in the same way that you have done for previous assignments.

B. Submission

Submit food_recommender.py, test_food_recommender_student.py and readme_food.txt on Gradescope.

Your code will be tested for compilation and checkstyle errors upon submission.

Important: Don’t forget to write comments for your function headers and test cases as mentioned earlier in the specification.

Important: Remember to delete any print statements you added to your code before submitting.

If you encounter any autograder-related issues, please make a private post on Ed.