Goals
- Refamiliarizing yourself with C programming (Pointers, structs, output parameters, memory allocation, etc)
- Getting used to writing ‘good’ C code
- Introduction to UNIX terminal and commands
Updates
We’ve now created a style guide that we expect all students to follow for assignments in this course. Check it out here: Style Guideline
Contents
- Goals
- Collaboration
- Overview
- Setup
- Specification
- Developing Your Code
- Tips and Suggested Order
- Submission
- Grading Guidelines
- Attribution
Collaboration
This is an individual assignment. You may not work with others. Please regularly check EdStem throughout this course for project specification clarifications. You are required to version control your code, but please only use the GitHub repository created for you by the staff. Do not work in public GitHub repositories! Please avoid publishing this project at any time, even post-submission, to observe course integrity policies.
Overview
Homework 00 is designed to help you get familiarized in writing good C code. Successful completion of this assignment will get you to better understand C programming topics such as pointer arithmetic, structures, and memory allocation. In this assignment, you will implement a vector data structure, named penn-vector, that allows a user to create a vector that can hold any pointer type. Additionally, the vector data structure must have a predifined capacity and an optional custom element deletion/free function that you will provide. A more robust specification of this data structure can be seen below. You will complete this assignment using only a specified set of C library functions.
Setup
Task 1. Register github account with the course and setup development environment.
For this assignment, you need to setup a development environment. Please follow the instructions below to setup your account:
Once you have installed and setup your docker container, execute the following commands in the terminal of your container to get the source files.
curl -o penn-vec.zip https://www.seas.upenn.edu/~cis5480/current/code/00/penn-vec.zip
unzip penn-vec.zip
First are the two main starter code files, Vec.c
and Vec.h
. Read through them carefully. Note how the struct and it’s members are defined.
Next is the Makefile used to compile your code. Later sections describe how you can run the Makefile.
The last is the test_suite.cpp
, test_basic.cpp
and test_panic.cpp
, which you will be running to check your vectors’s functionality. More information in section 3.4
Specification
Task 2. Read the specificaiton, seriously, read it all. If you have questions, ask in Office Hours or Ed.
Task 3.
Design & Implement penn-vector
1. Penn Vector
The Penn Vector is a dynamically resizable array that can store pointers to any type of data. While higher-level languages often provide built-in data structures for dynamic arrays, C requires us to implement our own. Thankfully, in C, every pointer is the same size! Thus, the Penn Vector, called Vec
in the code, holds an array of void *
elements, making it flexible enough for pointers to different data types. It also supports an optional element destructor function to clean up memory for each pointer if needed.
typedef void* ptr_t;
typedef void(*ptr_dtor_fn)(ptr_t); //function pointer that takes a ptr_t as argument, and returns nothing
typedef struct vec_st {
ptr_t* data;
size_t length;
size_t capacity;
ptr_dtor_fn ele_dtor_fn;
} Vec;
The Vec
struct holds the following information:
- data
- Serves as the generic double pointer that points to heap-allocated memory marking the start of the vector’s elements or ‘null’ if there are none.
- Reminder: it is a double pointer because this is a vector of pointers.
- length
- Contains the current number of elements in the vector.
- capacity
- Contains the current capacity of the vector in terms of the number of elements the vector can contain, not the number of bytes.
1.1 Data Storage
The vector allocates space for pointers (i.e., void *
) on the heap. As its size grows beyond its current capacity, the vector re-allocates its internal array (more on this below) with more space and copies over existing elements.
Because every pointer in C is the same size, storing them in an array of void *
allows us to sidestep more complex generic programming approaches. For example, suppose we have a custom struct MyStruct
; we can create a Vec
of MyStruct *
without needing to rewrite the data structure for different pointer types.
1.2 Element Destructor
One distinguishing feature of this vector is the optional element destructor function pointer, ele_dtor_fn
. If provided, this function is called on each element (i.e., each pointer stored in the vector) before it is removed or when the vector is destroyed. This allows safe cleanup of dynamically allocated memory or any other special behavior that must happen when an element is removed. Again, not all vectors will need ele_dtor_fn
, as this implies there is something special about how these pointers should be discarded (e.g. they are heap-allocated themselves or if they point to a struct containing heap-allocated memory, etc.). As it implies, the destrictor funciton is something you must provide if necessary. Onto the methods.
1.3 Penn Vector Methods
A Vec
supports the following core operations:
-
Creation with an initial capacity (
vec_new
). -
Insertion, erasure, and resizing:
-
vec_push_back
adds an element to the end, resizing if needed. -
vec_pop_back
removes the last element (and calls the destructor if available). -
vec_insert
inserts an element at a specified position, shifting the subsequent elements up. -
vec_erase
removes an element from a specified position, shifting subsequent elements down. -
vec_resize
changes the capacity of the vector if the current capacity is not sufficient.
-
-
Access and modification:
-
vec_get
retrieves the element at a given index. -
vec_set
modifies the element at a given index. -
vec_capacity
retrieves the current capacity of the vector. -
vec_len
retrieves the current length of the vector. -
vec_is_empty
checks if the vector is empty.
-
-
Cleanup:
-
vec_clear
removes all elements but leaves capacity as-is. -
vec_destroy
deallocates all internal storage and calls element destructors if applicable.
-
1.3 Overall
Your implementation of Vec
and its methods will do the following:
- Allocate memory for the array of pointers that serves as the ‘vector’.
- Maintain the inner state of the
Vec
, updating its fields accordingly. - Be resizable and mutable.
- Destroy itself once done, with no memory leakage.
An example use case of the Vec
data structure is shown below:
Vec v = vec_new(10, NULL); //no dstr function needed.
int fst = 10;
int snd = 11;
int trd = 12;
vec_push_back(&v, &fst);
vec_push_back(&v, &snd);
vec_push_back(&v, &trd);
int *ptr_int = vec_get(&v, 0); // returns a pointer to fst
vec_pop_back(&v);
vec_pop_back(&v);
vec_pop_back(&v); // empty vec with capacity for 10 pointers.
size_t len = vec_len(&v);
printf("Length is %d.\n", len); //should be 0
vec_destroy(&v); // data storage deallocated
There are multiple ways to implement the methods for this data structure, and the details are entirely up to you.
It is crucial that your vector works 100% correctly, as you will be updating and using this vector in future assignments.
2. Implementation
The main work occurs in the .c
file where the vector’s methods will be defined. We provide you a template for one of the methods, leaving you to do the rest. Below is an overview of each function and its role in maintaining the Vec
’s internal state. Refer to the .h
file for detailed information, particularly when a function is being used incorrectly.
2.1 Creating a Vector (vec_new
)
Vec vec_new(size_t initial_capacity, ptr_dtor_fn ele_dtor_fn);
- Allocates a block of memory (
malloc
) to storeinitial_capacity
number of pointers (void*
). - Initializes the
length
to 0 and setscapacity
to the requested capacity. - Capacity must be non-negative.
- Stores an optional destructor function,
ele_dtor_fn
, if provided.
2.1.1 Utility Macros for Vec
Something unique about these “functions” is that they shouldn’t be implemented as actual C functions. Instead, they should be created using preprocessor directives. Keep in mind that in C, #define
effectively replaces every occurrence of the defined token with the text on the right-hand side. For instance, take a look at the following macro:
#define true 1
Anywhere true
appears in the source file, the compiler will replace it with the value 1
.
We can also have function-like macros. For example we can do the following to define a macro “function” to add two things together.
#define ADD(x, y) ((x) + (y))
int main() {
int z = ADD(3, 5);
...
}
The compiler will see this code as the following, which will properly add the two “inputs” to the “function”.
int main() {
int z = ((3) + (5));
...
}
With that in mind, let’s explore the function-like macros needed for our Penn Vector.
#define vec_capacity(vec) ...
#define vec_len(vec) ....
#define vec_is_empty(vec) ...
-
vec_capacity
: Retrieves the current capacity of the vector. -
vec_len
: Retrieves the current length of the vector. -
vec_is_empty
: Checks if the vector contains no elements.
2.2 Getting and Setting Elements
ptr_t vec_get(Vec* self, size_t index);
void vec_set(Vec* self, size_t index, ptr_t new_ele);
-
vec_get
checks if the requestedindex
is within bounds and returns the pointer at that location. -
vec_set
also checks the bounds and overwrites the element at the given index withnew_ele
.
2.3 Adding and Removing Elements
void vec_push_back(Vec* self, ptr_t new_ele);
bool vec_pop_back(Vec* self);
-
vec_push_back
:- Checks if the vector is at capacity.
- If so, resizes the vector to expand it’s capacity by doubling it. If the capacity is 0, make it 1.
- Appends
new_ele
to the end if the vector.
-
vec_pop_back
:- Checks if the vector is empty.
- If not, pops the last element and returns true if successful. False otherwise.
2.4 Inserting and Erasing at Arbitrary Positions
void vec_insert(Vec* self, size_t index, ptr_t new_ele);
void vec_erase(Vec* self, size_t index);
-
vec_insert
:- Ensures
index
is within[0, length]
. - Resizes if at capacity, again it does this by doubling or by setting it to 1 if it was 0.
- Shifts elements as necessary.
- Inserts the new element at
index
.
- Ensures
-
vec_erase
:- Ensures
index
is in range. - Calls the destructor on the element at
index
, if needed. - Shifts elements after
index
down by one. - Decreases the length by one.
- Ensures
2.5 Resizing the Vector
void vec_resize(Vec* self, size_t new_capacity);
- Allocates a new array of pointers with size
new_capacity
. - Copies the existing elements into the new array.
- Updates self accordingly.
2.6 Clearing and Destroying the Vector
void vec_clear(Vec* self);
void vec_destroy(Vec* self);
-
vec_clear
:- Clears all of the elements from the vector.
-
vec_destroy
:- Clears all of the elements from the vector.
- Deallocate memory as neccessary.
- Sets
data
to NULL andcapacity
to 0.
2.7 Incorrect Usage of Methods
We provide two other files, panic.c
and it’s corresponding header file. However, you don’t have to modify these files at all. The functionality for our panic
function is defined here and allows you to throw an error message when necessary. See the Vec.h
file for more detailed instructions for when certain methods should panic and when they should not. Its’ usage can be seen below.
void vec_method(Vec* self){
if(/*used incorrectly*/){
panic("Panic message here.\n");
}
}
3. Testing Your Penn Vector
3.1 Creating Test Cases
We strongly encourage you to write extra test cases that exercise each vector function. The tests we provide are not all encompassing. For example, test what happens when you:
- Insert at the front, middle, and end of the vector.
- Pop from an empty vector or a vector with 1000 elements.
- Ensure state changes correctly within the struct.
- Resize to a smaller capacity (no-op) and to a bigger capacity.
- Call
vec_clear
after adding several elements, ensuring all destructors are properly called.
3.2 Verifying Memory Safety
It is critical to run tools like Valgrind to check for memory leaks. The vec_destroy
function must free all allocated memory, and the destructor function must clean up each element’s own allocated data if present. See the section below for more!
3.3 Integration with Future Assignments
Remember, you will be using this Vec
in future assignments that require storing pointers to various dynamically allocated structs. Ensuring your vector is robust and leak-free now will save you considerable time and effort in the future.
3.4 Testing your vector with test-suite
Testing will be done through the C++ Catch2 test suite. The provided test file checks for the basic functionality of the vector. Once you are done implementing the vector, you can run the following commands in the docker terminal.
make
./test_suite
This will output the test results to your terminal. We encourage you to write your own test cases to check for more edge cases as the gradescope submission will test for more cases than the test suite file we have provided.
Tip: Make sure you study the test cases we provide you. They are important indicators of your vector’s functionality
3.5 clang-tidy and clang-format
The makefile we provided with this assignment is configured to help make sure your code is properly formatted and follows good C coding conventions.
To do this, we make use of two tools: clang-format
and clang-tidy
To make sure your code identation is nice, we have clang format
. All you need to do to use this utility, is to run make format
, which will run the tool and indent your code propely.
Code that is turned in is expected to follow the specified style, code that is ill-formated must be fixed and re-submitted.
clang-tidy
is a more complicated tool. Part of it is checking for style and readability issues but that is not all it does. Examples of readability issues include:
Not using curly braces around if statements and loops:
if (condition) // clang-tidy will complain about missing curly braces
printf("hello!\n");
Declaring variables or parmaters with names that are too short:
void foo(char c) { // clang-tidy will complain about the name `c`
// does something
}
Having functions that are too complex and long. The tool calculates “cognitive complexity” of your code and will complain about anything that is too complex.
This means you should think about how to break your code into helpers, because if you don’t, clang-tidy
will complain and this will result in point deductions.
More on this specific error can be found here: Cognitive Complexity
One common thing that adds to cognitive complexity is to do the following:
bool foo(char* param) {
if ( !error1 ) {
if ( !error2 ) {
if ( !error3 ){
// do some computation
return true;
}
}
}
return false;
}
If you have a lot of nested code, then it contributes more to cognitive complexity. Rewriting the code as shown below will reduce the cognitive complexity.
bool foo(char* param) {
if ( error1 ) {
return false;
}
if ( error2 ) {
return false;
}
if ( error3 ) {
return false;
}
// do some computation
return true;
}
clang-tidy
is also useful for noticing some memory errors and pointing out bad practices when writing C code.
Because of all this, we are enforcing that your code does not produce any clang-tidy
errors. You can run clang-tidy on your code by running: make tidy-check
.
Whenever you compile your code using make
then it should also re-run clang-tidy
to check your code for errors.
Note that you will have to fix any compiler errors before clang-tidy will run (and be useful).
Code that has clang-format
, clang-tidy
or any compiler errors will face a deduction.
If you have any questions about understanding an error, please ask on Ed discussion and we will happily help you.
4. Miscellaneous
4.1 Code Organization
Clean code organization is critical for all software. You must try to modularize your code as much as you can, separating repeated parts of the code into functions, etc.
Also make sure that your source code (.c and .h files) is in the top-level directory and can be built by the autograder’s Makefile.
Your code should adhere to DRY (Don’t Repeat Yourself). If you are writing code that is used in more than one place, you should most likely write a function or a macro.
Note that this is actually somewhat enforced by the autograder, as it will check to make sure you have no functions that are too long. You must modularize your code well or face a deduction.
4.2 Memory Errors
You are required to check your code for memory errors. This is a nontrivial task, but an extremely important
one. Code with memory leaks and memory violations will incur deductions. Arguably, code that has memory access
errors should be considered incorrect and not receive points for any test cases that face those errors.
Fortunately, there is a very nice tool valgrind that is available to help you. You must find and fix any bugs
that valgrind locates, but there is no guarantee it will find all memory errors in your code.
Below is a sample run of the test_suite
in valgrind:
root@57fb0c278533:~/cis5480/course_projects_5480/p0_vector# valgrind --leak-check=full --show-leak-kinds=all ./test_suite
==493== Memcheck, a memory error detector
==493== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==493== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==493== Command: ./test_suite
==493==
Randomness seeded to: 1363030919
===============================================================================
All tests passed (187 assertions in 7 test cases)
==493==
==493== HEAP SUMMARY:
==493== in use at exit: 0 bytes in 0 blocks
==493== total heap usage: 3,991 allocs, 3,991 frees, 566,298 bytes allocated
==493==
==493== All heap blocks were freed -- no leaks are possible
==493==
==493== For lists of detected and suppressed errors, rerun with: -s
==493== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
root@57fb0c278533:~/cis5480/course_projects_5480/p0_vector#
If you do not see the text ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
, you almost certainly have an issue that needs to be fixed. In the case you do have errors, valgrind should output some error
messages when you ran it. Read these to try and diagnose the issue.
4.3 Acceptable System Calls and Library Functions
In this assignment you may use only the following library functions. The man pages should describe how a function indicates error.
- malloc (
*
) - calloc (
*
) - realloc (
*
) - free
Developing Your Code
In general, there are two environments for you to develop projects in this course: (1) Docker Container or (2) SpecLab servers for remote developing. We recommend the first environment.
For the first choice, we provide a Docker Image for your convenience to create a course-standardized container. You can find the set-up instructions on Ed. We highly recommend you to use it as your developing environment because all grading will be done on it. Please do not develop on macOS directly as there are significant differences between macOS and Unix variants such as Ubuntu Linux. If you decide to develop on a different environment anyway, you must compile and test your code on the course-standardized environment to ensure your code runs as you expect during grading.
For the second choice, SpecLab will be available to you as students, however, we discourage its usage unless your personal machine has super, super weak performance specifications such as a single core machine with 2GB of RAM or less. SpecLab is a collection of older desktops that run the same Linux variant as eniac, and most importantly, you can crash them as much as you want without causing too much chaos. You access SpecLab remotely over ssh. There are roughly 10 SpecLab machines up at any time. When you are ready to develop, choose a random number between 01 and 50, or so, and then issue this command:
ssh specDD.seas.upenn.edu
where DD is replaced with the number you chose. If that machine is not responding, then add one to that number and try again. Not all SpecLab machines are currently up, but most are. You must be on SEASnet to directly ssh into SpecLab. The RESnet (your dorms) is not on SEASnet, but the machines in the Moore computer lab are. If you are not on SEASnet, you may still remotely access SpecLab by first ssh-ing into eniac and then ssh-ing into SpecLab.
Do not develop your code on eniac. students are notorious for crashing eniac in creative ways, normally via a “fork-bomb.” This always seems to happen about two hours before a homework deadline, and the result is chaos, not just for our class, but for everyone else using eniac. Students who are caught running any assignment directly on eniac will be penalized.
Tips and Suggested Order
For this course we mostly want to let you come up with the implementation from scratch. This will apply more in the next assignments, but since this is a C refresher, we will provide more guidance. The principles guiding this advice is to take it one step at a time, think about your design beforehand, and modularize your code into functions. We recommend keeping these principles in mind for the future assignments.
Our first tip is to consider what functions you will need for the assignment. To get inspiration, you should take a look at the allowed functions
Another tip is to test gradually and gradually add functionality.
Start with trying to implement the vec_new
and vec_destroy
methods, and then making sure your code has no clang-tidy or clang-format errors and passes the test section:
./test_suite "Construction and Destruction"
Progressively add more functionality while still checking for clang-tidy errors. The next test case is to make sure you handle push methods correctly:
./test_suite "Push Operations"
And you can continually go through every test in test_vector.cpp
:)
Finally, keep going through the remaining test sections in your test_vector.cpp
(or equivalent test file) until all of your methods have been verified. Throughout this process, remember to:
- Check for memory leaks or other issues using tools like Valgrind (if available).
- Run
clang-tidy
andclang-format
to keep your code clean and consistent. - Make sure you thoroughly understand each step before moving on.
By taking this incremental approach, you will find that debugging is easier and your final implementation will be more robust.
Submission
First, organize your homework directory so that you have all the code and Makefile in a subdirectory called /penn-vector. Once you are done, your directory should look something like this
─25sp-cis5480-student1
└─penn-vector
├─Vec.c
├─Vec.h
├─test_vector.cpp
├─test_suite.cpp
├─catch.cpp
├─catch.hpp
├─main.c
├─vector.h
├─test_macro.cpp
└─Makefile
After you are done, commit and push your code to github.
Finally, you will be submitting your github repo to gradescope. On gradescope, when prompted to submit your assignment, select the “Github” option from Submission Method. The auto-grader will run once submitted, and you may move on if you get a 100% on the auto-grader. You have unlimited submission attempts before the deadline.
Grading Guidelines
- 70% Autograded Functionality
- 15% Valgrind
- 10% Style
- 5% No Compiler Warnings
Please note that general deductions may occur for a variety of programming errors including memory violations, lack of error checking, poor code organization, etc
You may use any comment style you wish, but we highly suggest using DOxygen style comments as they promote fully documenting function inputs, outputs, and error cases. A useful guide to this comment style can be found here: https://www.doxygen.nl/manual/docblocks.html
Additionally, the staff have come up with a guideline for style for this assignmnet and beyond. If ever in doubt about a stylistic change, feel free to ask on Ed or durring Office Hours! Check out the document here: Style Guideline
Your programs will be graded on the course-standardized environments and must execute as specified there. Although you may develop and test on your local machine, you should always test that your program functions properly there.
5.0 Extra Credit: Implementing a Macro-Based Generic Vector
For this extra credit assignment, you will create a generic, type-safe vector using preprocessor directives (i.e. Macros). The goal is to enable the creation of vectors for any type, such as vector(int)
or vector(char*)
, using your implementation of the Penn Vector as inspiration. The implementation relies on macros to create flexible and efficient code that works with any data type. If you choose to do so, you will implement your generic vector in a header file called vector.h
.
Summary
Your task is to design a vector library using macros to replace the type-specific implementations. Unlike the original implementation in Vec.h
, which uses a void*
to store generic pointers, this version will create type-safe vectors at compile time. This approach avoids runtime type-checking while still enabling flexible and reusable code. See the Vector.h file for more information!
Notes
- Use your implementation of Penn Vector as a guide for structuring your macros.
- Pay close attention to memory management, particularly when resizing or freeing vectors.
- Your macros should generate compile-time errors for invalid types or arguments, ensuring type safety.
- Structuring macros to be syntactically valid in any context where a C expression is allowed can be challenging. Ensure your macros maintain proper syntactic compatibility.
- Aim for simplicity and clarity, but ensure robustness in edge cases (e.g., empty vectors or resizing failures).
In addition to the basic macros used in Vector.h, you will have to make use of statement expressions and typeof
.
If you want to test your code, you will need to build the tests with the command make test_macro
and that will build a new executable called test_macro
that you can run. Note that at time of publishing, the test_macro
is incomplete. More tests will be added to it later and will be announced in lecture when it is updated.
This extra credit will challenge your understanding of both preprocessor macros and the intricacies of memory management in C. A big part of this extra credit is figuring things out for yourself. Minimal help will be given on the extra credit on Ed or in Office Hours.
Attribution
This can be a complex assignment, using APIs that are not always the most intuitive to use. You should be able to complete this with what was covered in lecture and in past courses (namely CIS 2400). You can use C refernces such as the man pages, cppreference.com or cplusplus.com (yes they are C++ website, but they should have documentation on C functions as well). That said, we do expect you to struggle some, which is how you will learn about systems programming, by doing things yourself. Please feel free to refer to work you have done in previous Penn courses (such as CIS 2400); it can be a helpful reminder for things like Makefiles and general C syntax.
The primary rule to keep you safe from plagiarism/cheating in this project is to attribute in your doc-
umentation any outside sources you use. This includes both resources used to help you understand the
concepts, and resources containing sample code you incorporated in your assignment. The course text and APUE
need only be cited in the latter case. You should also use external code sparingly. Using most of an example
from the pipe(2)
man page is ok; using the ParseInputAndCreateJobAndHandleSignals()
function from Foo’s Handy Write Your Own Shell Tutorial is not (both verbatim, and as a closely followed
template on structuring your code).