cis5480-25sp (Spring 2025) Home Schedule Assignments Tools & Refs Project 0: penn-vec

Operating Systems Design and Implementation (Spring 2025)

Goals

Updates

Contents

Collaboration

This is an individual assignment. You may not work with others. Please regularly check EdStem throughout this course for project specification clarifications. You are required to version control your code, but please only use the GitHub repository created for you by the staff. Do not work in public GitHub repositories! Please avoid publishing this project at any time, even post-submission, to observe course integrity policies.

Overview

Homework 00 is designed to help you get familiarized in writing good C code. Successful completion of this assignment will get you to better understand C programming topics such as pointer arithmetic, structures, and memory allocation. In this assignment, you will implement a vector data structure, named penn-vector, that allows a user to create a vector that can hold any pointer type. Additionally, the vector data structure must have a predifined capacity and an optional custom element deletion/free function that you will provide. A more robust specification of this data structure can be seen below. You will complete this assignment using only a specified set of C library functions.

Setup

For this assignment, you need to setup a development environment. Please follow the instructions below to setup your account:

Once you have installed and setup your docker container, execute the following commands in the terminal of your container to get the source files.

curl -o penn-vec.zip https://www.seas.upenn.edu/~cis5480/current/code/00/penn-vec.zip
unzip penn-vec.zip

First are the two main starter code files, Vec.c and Vec.h. Read through them carefully. Note how the struct and it’s members are defined.

Next is the Makefile used to compile your code. Later sections describe how you can run the Makefile. The last is the test_suite.cpp, test_basic.cpp and test_panic.cpp, which you will be running to check your vectors’s functionality. More information in section 3.4

Specification

1. Penn Vector

The Penn Vector is a dynamically resizable array that can store pointers to any type of data. While higher-level languages often provide built-in data structures for dynamic arrays, C requires us to implement our own. Thankfully, in C, every pointer is the same size! Thus, the Penn Vector, called Vec in the code, holds an array of void * elements, making it flexible enough for pointers to different data types. It also supports an optional element destructor function to clean up memory for each pointer if needed.

typedef void* ptr_t;
typedef void(*ptr_dtor_fn)(ptr_t); //function pointer that takes a ptr_t as argument, and returns nothing
typedef struct vec_st {
  ptr_t* data;
  size_t length;
  size_t capacity;
  ptr_dtor_fn ele_dtor_fn;
} Vec;

The Vec struct holds the following information:

1.1 Data Storage

The vector allocates space for pointers (i.e., void *) on the heap. As its size grows beyond its current capacity, the vector re-allocates its internal array (more on this below) with more space and copies over existing elements.

Because every pointer in C is the same size, storing them in an array of void * allows us to sidestep more complex generic programming approaches. For example, suppose we have a custom struct MyStruct; we can create a Vec of MyStruct * without needing to rewrite the data structure for different pointer types.

1.2 Element Destructor

One distinguishing feature of this vector is the optional element destructor function pointer, ele_dtor_fn. If provided, this function is called on each element (i.e., each pointer stored in the vector) before it is removed or when the vector is destroyed. This allows safe cleanup of dynamically allocated memory or any other special behavior that must happen when an element is removed. Again, not all vectors will need ele_dtor_fn, as this implies there is something special about how these pointers should be discarded (e.g. they are heap-allocated themselves or if they point to a struct containing heap-allocated memory, etc.). As it implies, the destrictor funciton is something you must provide if necessary. Onto the methods.

1.3 Penn Vector Methods

A Vec supports the following core operations:

  1. Creation with an initial capacity (vec_new).
  2. Insertion, erasure, and resizing:
    • vec_push_back adds an element to the end, resizing if needed.
    • vec_pop_back removes the last element (and calls the destructor if available).
    • vec_insert inserts an element at a specified position, shifting the subsequent elements up.
    • vec_erase removes an element from a specified position, shifting subsequent elements down.
    • vec_resize changes the capacity of the vector if the current capacity is not sufficient.
  3. Access and modification:
    • vec_get retrieves the element at a given index.
    • vec_set modifies the element at a given index.
    • vec_capacity retrieves the current capacity of the vector.
    • vec_len retrieves the current length of the vector.
    • vec_is_empty checks if the vector is empty.
  4. Cleanup:
    • vec_clear removes all elements but leaves capacity as-is.
    • vec_destroy deallocates all internal storage and calls element destructors if applicable.

1.3 Overall

Your implementation of Vec and its methods will do the following:

  1. Allocate memory for the array of pointers that serves as the ‘vector’.
  2. Maintain the inner state of the Vec, updating its fields accordingly.
  3. Be resizable and mutable.
  4. Destroy itself once done, with no memory leakage.

An example use case of the Vec data structure is shown below:

  Vec v = vec_new(10, NULL); //no dstr function needed.
  int fst = 10;
  int snd = 11;
  int trd = 12;
  vec_push_back(&v, &fst);
  vec_push_back(&v, &snd);
  vec_push_back(&v, &trd);
  int *ptr_int = vec_get(&v, 0); // returns a pointer to fst
  vec_pop_back(&v);
  vec_pop_back(&v);
  vec_pop_back(&v); // empty vec with capacity for 10 pointers.
  size_t len = vec_len(&v);
  printf("Length is %d.\n", len); //should be 0
  vec_destroy(&v); // data storage deallocated

There are multiple ways to implement the methods for this data structure, and the details are entirely up to you.

2. Implementation

The main work occurs in the .c file where the vector’s methods will be defined. We provide you a template for one of the methods, leaving you to do the rest. Below is an overview of each function and its role in maintaining the Vec’s internal state. Refer to the .h file for detailed information, particularly when a function is being used incorrectly.

2.1 Creating a Vector (vec_new)

Vec vec_new(size_t initial_capacity, ptr_dtor_fn ele_dtor_fn);

2.1.1 Utility Macros for Vec

Something unique about these “functions” is that they shouldn’t be implemented as actual C functions. Instead, they should be created using preprocessor directives. Keep in mind that in C, #define effectively replaces every occurrence of the defined token with the text on the right-hand side. For instance, take a look at the following macro:

#define true 1

Anywhere true appears in the source file, the compiler will replace it with the value 1.

We can also have function-like macros. For example we can do the following to define a macro “function” to add two things together.

#define ADD(x, y) ((x) + (y))

int main() {
  int z = ADD(3, 5);
  ...
}

The compiler will see this code as the following, which will properly add the two “inputs” to the “function”.

int main() {
  int z = ((3) + (5));
  ...
}

With that in mind, let’s explore the function-like macros needed for our Penn Vector.

#define vec_capacity(vec) ... 
#define vec_len(vec) ....
#define vec_is_empty(vec) ...

2.2 Getting and Setting Elements

ptr_t vec_get(Vec* self, size_t index);
void  vec_set(Vec* self, size_t index, ptr_t new_ele);

2.3 Adding and Removing Elements

void vec_push_back(Vec* self, ptr_t new_ele);
bool vec_pop_back(Vec* self);

2.4 Inserting and Erasing at Arbitrary Positions

void vec_insert(Vec* self, size_t index, ptr_t new_ele);
void vec_erase(Vec* self, size_t index);

2.5 Resizing the Vector

void vec_resize(Vec* self, size_t new_capacity);

2.6 Clearing and Destroying the Vector

void vec_clear(Vec* self);
void vec_destroy(Vec* self);

2.7 Incorrect Usage of Methods

We provide two other files, panic.c and it’s corresponding header file. However, you don’t have to modify these files at all. The functionality for our panic function is defined here and allows you to throw an error message when necessary. See the Vec.h file for more detailed instructions for when certain methods should panic and when they should not. Its’ usage can be seen below.

void vec_method(Vec* self){
  if(/*used incorrectly*/){
    panic("Panic message here.\n");
  }
}

3. Testing Your Penn Vector

3.1 Creating Test Cases

We strongly encourage you to write extra test cases that exercise each vector function. The tests we provide are not all encompassing. For example, test what happens when you:

3.2 Verifying Memory Safety

It is critical to run tools like Valgrind to check for memory leaks. The vec_destroy function must free all allocated memory, and the destructor function must clean up each element’s own allocated data if present. See the section below for more!

3.3 Integration with Future Assignments

Remember, you will be using this Vec in future assignments that require storing pointers to various dynamically allocated structs. Ensuring your vector is robust and leak-free now will save you considerable time and effort in the future.

3.4 Testing your vector with test-suite

Testing will be done through the C++ Catch2 test suite. The provided test file checks for the basic functionality of the vector. Once you are done implementing the vector, you can run the following commands in the docker terminal.

make
./test_suite

This will output the test results to your terminal. We encourage you to write your own test cases to check for more edge cases as the gradescope submission will test for more cases than the test suite file we have provided.

Tip: Make sure you study the test cases we provide you. They are important indicators of your vector’s functionality

3.5 clang-tidy and clang-format

The makefile we provided with this assignment is configured to help make sure your code is properly formatted and follows good C coding conventions. To do this, we make use of two tools: clang-format and clang-tidy

To make sure your code identation is nice, we have clang format. All you need to do to use this utility, is to run make format, which will run the tool and indent your code propely. Code that is turned in is expected to follow the specified style, code that is ill-formated must be fixed and re-submitted.

clang-tidy is a more complicated tool. Part of it is checking for style and readability issues but that is not all it does. Examples of readability issues include:

Not using curly braces around if statements and loops:

if (condition)  // clang-tidy will complain about missing curly braces
  printf("hello!\n");

Declaring variables or parmaters with names that are too short:

void foo(char c) { // clang-tidy will complain about the name `c`
  // does something
}

Having functions that are too complex and long. The tool calculates “cognitive complexity” of your code and will complain about anything that is too complex. This means you should think about how to break your code into helpers, because if you don’t, clang-tidy will complain and this will result in point deductions. More on this specific error can be found here: Cognitive Complexity

One common thing that adds to cognitive complexity is to do the following:

bool foo(char* param) {
  if ( !error1 ) {
    if ( !error2 ) {
      if ( !error3 ){
        // do some computation
        return true;
      }
    }
  }
  return false;
}

If you have a lot of nested code, then it contributes more to cognitive complexity. Rewriting the code as shown below will reduce the cognitive complexity.

bool foo(char* param) {
  if ( error1 ) {
    return false;
  }
  if ( error2 ) {
    return false;
  }
  if ( error3 ) {
    return false;
  }
  // do some computation
  return true;
}

clang-tidy is also useful for noticing some memory errors and pointing out bad practices when writing C code. Because of all this, we are enforcing that your code does not produce any clang-tidy errors. You can run clang-tidy on your code by running: make tidy-check. Whenever you compile your code using make then it should also re-run clang-tidy to check your code for errors. Note that you will have to fix any compiler errors before clang-tidy will run (and be useful).

If you have any questions about understanding an error, please ask on Ed discussion and we will happily help you.

4. Miscellaneous

4.1 Code Organization

Clean code organization is critical for all software. You must try to modularize your code as much as you can, separating repeated parts of the code into functions, etc.

Also make sure that your source code (.c and .h files) is in the top-level directory and can be built by the autograder’s Makefile.

Your code should adhere to DRY (Don’t Repeat Yourself). If you are writing code that is used in more than one place, you should most likely write a function or a macro.

Note that this is actually somewhat enforced by the autograder, as it will check to make sure you have no functions that are too long. You must modularize your code well or face a deduction.

4.2 Memory Errors

You are required to check your code for memory errors. This is a nontrivial task, but an extremely important one. Code with memory leaks and memory violations will incur deductions. Arguably, code that has memory access errors should be considered incorrect and not receive points for any test cases that face those errors. Fortunately, there is a very nice tool valgrind that is available to help you. You must find and fix any bugs that valgrind locates, but there is no guarantee it will find all memory errors in your code. Below is a sample run of the test_suite in valgrind:

root@57fb0c278533:~/cis5480/course_projects_5480/p0_vector# valgrind --leak-check=full --show-leak-kinds=all ./test_suite
==493== Memcheck, a memory error detector
==493== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==493== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==493== Command: ./test_suite
==493== 
Randomness seeded to: 1363030919
===============================================================================
All tests passed (187 assertions in 7 test cases)

==493== 
==493== HEAP SUMMARY:
==493==     in use at exit: 0 bytes in 0 blocks
==493==   total heap usage: 3,991 allocs, 3,991 frees, 566,298 bytes allocated
==493== 
==493== All heap blocks were freed -- no leaks are possible
==493== 
==493== For lists of detected and suppressed errors, rerun with: -s
==493== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
root@57fb0c278533:~/cis5480/course_projects_5480/p0_vector# 

If you do not see the text ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0), you almost certainly have an issue that needs to be fixed. In the case you do have errors, valgrind should output some error messages when you ran it. Read these to try and diagnose the issue.

4.3 Acceptable System Calls and Library Functions

In this assignment you may use only the following library functions. The man pages should describe how a function indicates error.

Developing Your Code

In general, there are two environments for you to develop projects in this course: (1) Docker Container or (2) SpecLab servers for remote developing. We recommend the first environment.

For the first choice, we provide a Docker Image for your convenience to create a course-standardized container. You can find the set-up instructions on Ed. We highly recommend you to use it as your developing environment because all grading will be done on it. Please do not develop on macOS directly as there are significant differences between macOS and Unix variants such as Ubuntu Linux. If you decide to develop on a different environment anyway, you must compile and test your code on the course-standardized environment to ensure your code runs as you expect during grading.

For the second choice, SpecLab will be available to you as students, however, we discourage its usage unless your personal machine has super, super weak performance specifications such as a single core machine with 2GB of RAM or less. SpecLab is a collection of older desktops that run the same Linux variant as eniac, and most importantly, you can crash them as much as you want without causing too much chaos. You access SpecLab remotely over ssh. There are roughly 10 SpecLab machines up at any time. When you are ready to develop, choose a random number between 01 and 50, or so, and then issue this command:

ssh specDD.seas.upenn.edu

where DD is replaced with the number you chose. If that machine is not responding, then add one to that number and try again. Not all SpecLab machines are currently up, but most are. You must be on SEASnet to directly ssh into SpecLab. The RESnet (your dorms) is not on SEASnet, but the machines in the Moore computer lab are. If you are not on SEASnet, you may still remotely access SpecLab by first ssh-ing into eniac and then ssh-ing into SpecLab.

Tips and Suggested Order

For this course we mostly want to let you come up with the implementation from scratch. This will apply more in the next assignments, but since this is a C refresher, we will provide more guidance. The principles guiding this advice is to take it one step at a time, think about your design beforehand, and modularize your code into functions. We recommend keeping these principles in mind for the future assignments.

Our first tip is to consider what functions you will need for the assignment. To get inspiration, you should take a look at the allowed functions

Another tip is to test gradually and gradually add functionality. Start with trying to implement the vec_new and vec_destroy methods, and then making sure your code has no clang-tidy or clang-format errors and passes the test section:

./test_suite "Construction and Destruction"

Progressively add more functionality while still checking for clang-tidy errors. The next test case is to make sure you handle push methods correctly:

./test_suite "Push Operations"

And you can continually go through every test in test_vector.cpp :)

Finally, keep going through the remaining test sections in your test_vector.cpp (or equivalent test file) until all of your methods have been verified. Throughout this process, remember to:

By taking this incremental approach, you will find that debugging is easier and your final implementation will be more robust.

Submission

First, organize your homework directory so that you have all the code and Makefile in a subdirectory called /penn-vector. Once you are done, your directory should look something like this

─25sp-cis5480-student1
 └─penn-vector
   ├─Vec.c
   ├─Vec.h
   ├─test_vector.cpp
   ├─test_suite.cpp
   ├─catch.cpp
   ├─catch.hpp
   ├─main.c
   ├─vector.h
   ├─test_macro.cpp
   └─Makefile

After you are done, commit and push your code to github.

Finally, you will be submitting your github repo to gradescope. On gradescope, when prompted to submit your assignment, select the “Github” option from Submission Method. The auto-grader will run once submitted, and you may move on if you get a 100% on the auto-grader. You have unlimited submission attempts before the deadline.

Grading Guidelines

Please note that general deductions may occur for a variety of programming errors including memory violations, lack of error checking, poor code organization, etc

You may use any comment style you wish, but we highly suggest using DOxygen style comments as they promote fully documenting function inputs, outputs, and error cases. A useful guide to this comment style can be found here: https://www.doxygen.nl/manual/docblocks.html

Additionally, the staff have come up with a guideline for style for this assignmnet and beyond. If ever in doubt about a stylistic change, feel free to ask on Ed or durring Office Hours! Check out the document here: Style Guideline

Your programs will be graded on the course-standardized environments and must execute as specified there. Although you may develop and test on your local machine, you should always test that your program functions properly there.

5.0 Extra Credit: Implementing a Macro-Based Generic Vector

For this extra credit assignment, you will create a generic, type-safe vector using preprocessor directives (i.e. Macros). The goal is to enable the creation of vectors for any type, such as vector(int) or vector(char*), using your implementation of the Penn Vector as inspiration. The implementation relies on macros to create flexible and efficient code that works with any data type. If you choose to do so, you will implement your generic vector in a header file called vector.h.

Summary

Your task is to design a vector library using macros to replace the type-specific implementations. Unlike the original implementation in Vec.h, which uses a void* to store generic pointers, this version will create type-safe vectors at compile time. This approach avoids runtime type-checking while still enabling flexible and reusable code. See the Vector.h file for more information!

Notes

In addition to the basic macros used in Vector.h, you will have to make use of statement expressions and typeof.

If you want to test your code, you will need to build the tests with the command make test_macro and that will build a new executable called test_macro that you can run. Note that at time of publishing, the test_macro is incomplete. More tests will be added to it later and will be announced in lecture when it is updated.

This extra credit will challenge your understanding of both preprocessor macros and the intricacies of memory management in C. A big part of this extra credit is figuring things out for yourself. Minimal help will be given on the extra credit on Ed or in Office Hours.

Attribution

This can be a complex assignment, using APIs that are not always the most intuitive to use. You should be able to complete this with what was covered in lecture and in past courses (namely CIS 2400). You can use C refernces such as the man pages, cppreference.com or cplusplus.com (yes they are C++ website, but they should have documentation on C functions as well). That said, we do expect you to struggle some, which is how you will learn about systems programming, by doing things yourself. Please feel free to refer to work you have done in previous Penn courses (such as CIS 2400); it can be a helpful reminder for things like Makefiles and general C syntax.

The primary rule to keep you safe from plagiarism/cheating in this project is to attribute in your doc- umentation any outside sources you use. This includes both resources used to help you understand the concepts, and resources containing sample code you incorporated in your assignment. The course text and APUE need only be cited in the latter case. You should also use external code sparingly. Using most of an example from the pipe(2) man page is ok; using the ParseInputAndCreateJobAndHandleSignals() function from Foo’s Handy Write Your Own Shell Tutorial is not (both verbatim, and as a closely followed template on structuring your code).