CIT 5950 (Spring 2024) Home Schedule Assignments Tools & Refs HW 00: SimpleKV

This assignment aims to be a thorough introduction to some fundamental types in C++

Goals

Collaboration

For assignments in CIT 5950, you will complete each of them on your own or solo. However, you may discuss high-level ideas with other students, but any viewing, sharing, copying, or dictating of code is forbidden. If you are worried about whether something violates academic integrity, please post on Ed or contact the instructor.

Setup

For this assignment, you need to setup a Linux C++ development environment. Please follow the instructions here: Enivronment setup

Instructions

You can downlowd the starter files into your docker container by running the following command:

curl -o simplekv.zip https://www.seas.upenn.edu/~cit5950/current/projects/code/simplekv.zip

You can also download the files manually here if you would like: simplekv.zip

From here, you need to extract the files by running

unzip simplekv.zip

From here you can either open the project in Vim or VSCode.

For Vim, you just need to run

cd simplekv
vim SimpleKV.hpp

For VSCode you will have to follow steps similar to what we did to open chcek_setup in the setup document.

Here is a brief overview of the files:

Files that you will edit:

Compilation & Test Files

We recommend reading all of the files for a part before your start that part, especially SimpleKV.hpp. All of the code you will submit for this assignment should be contained in SimpleKV.cpp and SimpleKV.hpp. This means that your code should work with the other files as they are when initially supplied. Note that you can modify the other files if you like, but we will be testing your SimpleKV.cpp and SimpleKV.hpp against un-modified versions of the files.

Overview

In this assignment we will be implementing a C++ object called SimpleKV that supports the functionality of a Key-Value store. This object will be very to a “Dictionary” or “Map” data structure but with some slight alterations.

The core of what we are implementing is a map<string, string> (A map from string keys to string values). There is more functionality on top of this, but this is the base of it. You should be familar with the Map data structure from CIT 5910 already, and I recommend youbut I will breifly go over it again here just in case. If you feel comfortable with the idea of a map<string, string> then you can move on to the next section: Name Spaces, but I will breifly go over what a map is here.

In a key value store, we store pairs called key value pairs. That is, we store key and values. We can then look up values based on their key, similarly to how we can look up values in an array or list based on their index.

Consider the following example code that uses a map: We want to store assignments and their corresponding grade, so we can store that information in a map (or unordered_map);

// creates a map to store assignment names
// and their associated score is
map<string, string> grades {};

// add new assignments and their scores to the map
grades["hw0"] = "B+";
grades["hw1"] = "A-";
grades["hw2"] = "C-";
grades["hw3"] = "C+;
grades["hw4"] = "F";
grades["project"] = "A+";

// a key can only exist once in a map, so this next line
// updates the score for "hw4" to be a new value and
// does not insert a new key called "hw4"
grades["hw4"] = "A+";

// check to see if there is an assignment called "pennos"
// inside the map
bool pennos_exists = grades.contains("pennos"); // should return false

// print out all of the assignments and their grades
// hw0: B+
// hw1: A-
// hw2: C-
// hw3: C+
// hw4: A+
// project: A+
for (auto& assignment: grades) {
  cout << assignment.first << ": " << assignment.second << endl;
}

for more information on how maps work, you can take a look at these slides from CIT 5910 that covered maps

Namespaces

Each key-value pair we store in the SimpleKV object will also belong to a specific “namespace”.

A namespace is a concept that exists beyond this assignment. Generally namespaces are a way to separate names so that items can be uniquely identified by their name. Namespaces are a way to prevent conflicts when multiple items have the same name.

Consider the following C++ code:

void foo() {
  int x = 3;
  int x = 5;
}

In this example, the code would not compile because we try to declare two integer variables of the same name. We have a name conflict in the name of these variables. In this example, we could say we are trying to put the same name twice in the same namespace, which would be a conflict.

Consider the following similar code:

int x = 1;

void foo() {
  int x = 3;
}

void bar() {
  int x = 10;
}

Again we try and delcare the variable x multiple times, but each time we do it in a separate scope. This is similar to the idea of separate namespaces, where we have the variable “x” but each declaration is in a different namespace. The compiler (and programmer) can distinguish between different “x” variables based on whether it is “the x in foo” or “the x in bar”.

Lets see how this applies to our idea of a map data structure. Consider we again wanted to store assignments and their corresponding grade, but this time, we want to store not only the grades for CIT 5950, but also for another course CIT 5940.

// creates a map to store assignment names
// and their associated score
map<string, string> grades {};

// add all of the assignments for CIT 5950:
grades["hw0"] = "B+";
grades["hw1"] = "A-";
grades["hw2"] = "C-";
grades["hw3"] = "C+;
grades["hw4"] = "F";
grades["project"] = "A+";

// add all of the assignments for CIT 5940:
grades["hw0"] = "B+";
grades["hw1"] = "A-";
grades["hw2"] = "A-";
grades["hw3"] = "B;
grades["hw4"] = "D";
grades["hw5"] = "A";

// print out all of the assignments and their grades
// hw0: B+
// hw1: A-
// hw2: A-
// hw3: B
// hw4: D
// hw5: A
// project: A+
for (auto& assignment: grades) {
  cout << assignment.first << ": " << assignment.second << endl;
}

In this example, we try to store the assignments for both classes in the same map, and we have name conflicts for some of the assignments, meaning that some of the assignments for 5950 have their grade overwritten. We can fix this by introducing a namespace. Here I will demonstrate how we would solve this problem using the SimpleKV object you will be implementing in this assignment

SimpleKV kv{};

// add all the CIT 5950 grades to the
// "CIT 5950" namespace
// sset takes in a namespace, a key and a value
kv.sset("CIT 5950", "hw0", "B+");
kv.sset("CIT 5950", "hw1", "A-");
kv.sset("CIT 5950", "hw2", "C-");
kv.sset("CIT 5950", "hw3", "C+");
kv.sset("CIT 5950", "hw4", "F");
kv.sset("CIT 5950", "project", "A+");

// add all the CIT 5940 grades to the
// "CIT 5940" namespace
// sset takes in a namespace, a key and a value
kv.sset("CIT 5940", "hw0", "B+");
kv.sset("CIT 5940", "hw1", "A-");
kv.sset("CIT 5940", "hw2", "A-");
kv.sset("CIT 5940", "hw3", "B");
kv.sset("CIT 5940", "hw4", "D");
kv.sset("CIT 5940", "hw5", "A");

In SimpleKV, we store our key-values in an associated namespace to avoid the name conflict issue. This is a feature you will have to support when you implement SimpleKV.

List Values

Another feature that we support with SimpleKV, is that the values can be either strings or lists of strings. Note that this is an exclusive “or”, a value cannot be both a string and a list of strings. Keys and namespace names can only be strings. This adds complication since you will need some way of storing values that are of either type and supporting operations on either type.

Documentation for what exactly can be done with a list type can be found in the comments in SimpleKV.hpp

Handling two differnt types of values

Your kv needs to be able to hold values that are either string or list (never both at the same time). Keys and namespace names can only be strings, only values may be lists or strings. This can be a bit complicated to do, and there are many ways to do it. One way is to maintain two different private data structures that keep track of list values and string values separately. Another is to use the variant type that was briefly discussed in class.

You are free to do either of these (or something else), but this is something you should take into consideration when desigining your code.

Allowed Objects & Functions

You are restricted to using the #include’s mentioned in lecture. you may ask if a specific include is allowed on Ed if you are unsure.

Suggested Ordering

One of the first things you will have to do in this assignment is choose which data structures you will need to implement the SimpleKV. We provide a header file for you, but it does not contain any data members, and you will have to fill that out.

We highly recommend you read all of the spec and read through most of the header file to get an idea for what you need to do in the assignment. This will also help you decide which data structure you can use. As a hint, I recommend taking a look at the Allowed Objects & Functions section above. You are limited to those data structure, think about wht combination of them will you need. (remember that you can put data structures inside of other data structures, e.g. map with vector values)

From here, we recommend implementing a couple functions in combination: sget, sset, namespaces, keys, ns_exists, key_exists, and del.

You should be able to pass a some of the basic tests, in particular, running

$ ./test_suite [basic]

should pass all tess.

Once you have that passing, implement lpush, lmembers and type. You should then pass a few more tests

$ ./test_suite [basic list]

From here, gradually implement functions untill you are passing the [complex list], and [stress] tests. From here, you should be able to pass all tests in the testsuite if you just run ./test_suite

Grading, Testing and Checking

We will be evaluating your submission mostly automatically. Specifically, we are checking for the following automatically:

More details on each of these are in the below sections

We will also briefly check submission by hand to make sure you do not do anything weird to pass the tests with an incorrect solution.

Compilation

We have supplied you with a makefile that can be used for compiling your code into an executable. To do this, open the terminal and then type in make.

You may need to resolve any compiler warnings and compiler errors that show up. Once all compiler errors have been resolved, if you ls in the terminal, you should be able to see an executable called test_suite. You can then run this by typing in ./test_suite to see the evaluation of your code.

Note that your submission will be partially evaluated on the number of compiler warnings. You should eliminate ALL compiler warnings in your code.

Valgrind

We will also test your submission on whether there are any memory errors or memory leaks. We will be using valgrind to do this. To do this, you should try running: valgrind --leak-check=full ./test_suite

If everything is correct, you should see the following towards the bottom of the output:

 ==1620== All heap blocks were freed -- no leaks are possible
 ==1620==
 ==1620== For counts of detected and suppressed errors, rerun with: -v
 ==1620== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If you do not see something similar to the above in your output, valgrind will have printed out details about where the errors and memory leaks occurred.

Note: It is expected to take a while for valgrind to run on the whole test_suite. You can possible go faster by only running valgrind on tests. See the testing section above for information on running individual tests.

Clang Format and Clang Tidy

The makefile we provided with this assignment is configured to help make sure your code is properly formatted and follows good (modern) C++ coding conventions. To do this, we make use of two tools: clang-format and clang-tidy

To make sure your code identation is nice, we have clang format. All you need to do to use this utility, is to run make format, which will run the tool and indent your code propely. Code that is turned in is expected to follow the specified style, code that is ill-formated must be fixed and re-submitted.

clang-tidy is a more complicated tool. Part of it is checking for style and readability issues but that is not all it does. Examples of readability issues include:

Not using curly braces around if statements and loops:

if (condition)  // clang-tidy will complain about missing curly braces
  cout << "hello!" << endl;

Declaring variables or parmaters with names that are too short:

void foo(char c) { // clang-tidy will complain about the name `c`
  // does something
}

Having functions that are too complex and long. The tool calculates “cognitive complexity” of your code and will complain about anything that is too complex. This means you should think about how to break your code into helpers, because if you don’t, clang-tidy will complain and you will face a deduction. More on this specific error can be found here: Cognitive Complexity

clang-tidy is also useful for noticing some memory errors and pointing out bad practices when writing C++ code. Because of all this, we are enforcing that your code does not produce any clang-tidy errors. You can run clang-tidy on your code by running: make tidy-check. Whenever you compile your code using make then it should also re-run clang-tidy to check your code for errors. Note that you will have to fix any compiler errors before clang-tidy will run (and be useful).

If you have any questions about understanding an error, please ask on Ed discussion and we will happily assist you.

Submission

Please submit your completed SimpleKV.hpp, and SimpleKV.cpp to Gradescope