Easy-to-use, header-only, macro-generated, generic and type-safe Data Structures in C.
- Project Structure
- Available Collections
- Features
- Overall To-Do
- Design Decisions
- What to use
- How to use
- benchmarks - Where all benchmarks are hosted
- docs - A folder hosting the documentation generated by mdBook
- documentation - The markdowns used by mdBook to generate the website
- examples - Examples using the C Macro Collections Library
- src - All headers that are part of the C Macro Collections Library
  - cmc - The main C Macro Collections Library
  - cor - Core includes in the C Macro Collections Library
  - dev - The main C Macro Collections Library for development (containing logging)
  - sac - Statically Allocated Collections
  - utl - Utilities such as ForEach macros, logging, etc.
  - macro_collections.h - Master header containing all collections and utilities
- tests - Where all tests are hosted
- Linear Collections
  - List, LinkedList, Deque, Stack, Queue, SortedList
- Sets
  - HashSet, TreeSet, HashMultiSet
- Maps
  - HashMap, TreeMap, HashMultiMap
- Bidirectional Maps
  - HashBidiMap
- Heaps
  - Heap, IntervalHeap
- WIP
  - BitSet, Matrix, TreeBidiMap, TreeMultiMap, TreeMultiSet
The following table is an overview of all the currently available or upcoming data structures:
All collections come with a two-way iterator. You can move backwards and forwards in constant time and access elements in constant time.
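The following is a minimal sketch of why a two-way iterator over an array-backed collection is constant time in both directions and for element access. Only a cursor index changes, and reading an element is a direct array index. The names `int_list`, `int_list_iter` and the `iter_*` functions are hypothetical and do not reflect the library's actual iterator API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical array-backed list of ints (not the library's struct). */
struct int_list {
    int *buffer;
    size_t count;
};

/* A two-way iterator is just a pointer to the list plus a cursor. */
struct int_list_iter {
    const struct int_list *target;
    size_t cursor;
};

static struct int_list_iter iter_start(const struct int_list *l) {
    struct int_list_iter it = { l, 0 };
    return it;
}

static struct int_list_iter iter_end(const struct int_list *l) {
    struct int_list_iter it = { l, l->count ? l->count - 1 : 0 };
    return it;
}

/* O(1): only the cursor moves. Returns false if already at the last element. */
static bool iter_next(struct int_list_iter *it) {
    if (it->cursor + 1 >= it->target->count)
        return false;
    it->cursor++;
    return true;
}

/* O(1): returns false if already at the first element. */
static bool iter_prev(struct int_list_iter *it) {
    if (it->cursor == 0)
        return false;
    it->cursor--;
    return true;
}

/* O(1) access. Assumes the list is not empty. */
static int iter_value(const struct int_list_iter *it) {
    return it->target->buffer[it->cursor];
}
```

Node-based collections such as a doubly linked list get the same constant-time stepping by following `next`/`prev` pointers instead of adjusting an index.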
All collections have a `cmc_alloc_node`, which provides pointers to the four dynamic memory allocation functions in C: `malloc`, `calloc`, `realloc` and `free`. These pointers can be customized for each individual collection created, or a default can be used, as specified in `cmc_alloc_node_default`.
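To illustrate the idea, here is a sketch of a struct holding the four allocation function pointers, with a default instance that forwards to the C standard library and a custom instance that counts live allocations. The names `alloc_node`, `malloc_fn` and the other members, `alloc_default`, `alloc_counting` and the `counting_*` functions are hypothetical; check `cmc_alloc_node` in the library headers for its real member names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical struct in the spirit of cmc_alloc_node. */
struct alloc_node {
    void *(*malloc_fn)(size_t);
    void *(*calloc_fn)(size_t, size_t);
    void *(*realloc_fn)(void *, size_t);
    void (*free_fn)(void *);
};

/* Default: forward directly to the C standard library. */
static struct alloc_node alloc_default = { malloc, calloc, realloc, free };

/* Custom allocator that tracks how many blocks are currently alive. */
static long live_allocations = 0;

static void *counting_malloc(size_t size) {
    void *p = malloc(size);
    if (p)
        live_allocations++;
    return p;
}

static void *counting_calloc(size_t n, size_t size) {
    void *p = calloc(n, size);
    if (p)
        live_allocations++;
    return p;
}

/* realloc(NULL, n) behaves like malloc. A size of 0 is not handled here. */
static void *counting_realloc(void *ptr, size_t size) {
    void *p = realloc(ptr, size);
    if (p && !ptr)
        live_allocations++;
    return p;
}

static void counting_free(void *ptr) {
    if (ptr)
        live_allocations--;
    free(ptr);
}

static struct alloc_node alloc_counting = {
    counting_malloc, counting_calloc, counting_realloc, counting_free
};
```

Passing such a struct to each collection lets you swap allocators, for example to detect leaks in tests, without changing any collection code.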
Every function that operates on a collection can be classified into one of 5 types: Create, Read, Update, Delete and (an extra one besides CRUD) Resize. You can define one callback function for each operation. Check out the documentation to see when each callback function is called.
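Here is a sketch of how a table of per-operation callbacks can be wired into a collection. A push fires the Create hook, a pop fires Delete, and growing the buffer fires Resize. The names `callbacks`, `on_create` and the other members, `int_stack`, `stack_push`, `stack_pop` and the counting hooks are hypothetical and are not the library's own callback API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback table: one optional hook per operation type. */
struct callbacks {
    void (*on_create)(void);
    void (*on_read)(void);
    void (*on_update)(void);
    void (*on_delete)(void);
    void (*on_resize)(void);
};

/* Minimal growable stack of ints that fires the hooks. */
struct int_stack {
    int *buffer;
    size_t capacity;
    size_t count;
    const struct callbacks *callbacks; /* NULL means "no callbacks" */
};

static bool stack_push(struct int_stack *s, int value) {
    if (s->count == s->capacity) {
        size_t new_capacity = s->capacity ? s->capacity * 2 : 4;
        int *grown = realloc(s->buffer, new_capacity * sizeof *grown);
        if (!grown)
            return false;
        s->buffer = grown;
        s->capacity = new_capacity;
        if (s->callbacks && s->callbacks->on_resize)
            s->callbacks->on_resize(); /* Resize */
    }
    s->buffer[s->count++] = value;
    if (s->callbacks && s->callbacks->on_create)
        s->callbacks->on_create(); /* Create */
    return true;
}

static bool stack_pop(struct int_stack *s, int *out) {
    if (s->count == 0)
        return false;
    *out = s->buffer[--s->count];
    if (s->callbacks && s->callbacks->on_delete)
        s->callbacks->on_delete(); /* Delete */
    return true;
}

/* Example hooks that count how often each operation happens.
 * Read and Update are left NULL because these two functions never trigger them. */
static int created, deleted, resized;
static void count_create(void) { created++; }
static void count_delete(void) { deleted++; }
static void count_resize(void) { resized++; }

static const struct callbacks counting_callbacks = {
    count_create, NULL, NULL, count_delete, count_resize
};
```

Pushing five values into an empty stack with these hooks triggers five Create calls and two Resize calls (capacity 0 to 4, then 4 to 8).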
The functions table is a `struct` of function pointers containing 'methods' for a custom data type. Some methods are optional, while others are needed in order for a collection to operate. They are:
A comparator function is used in sorted collections or when equality is being checked, like when trying to find a certain element in a list. It is responsible for taking two arguments of the same data type and comparing them. The return value is an `int` with the following definitions:
- Return `1` if the first argument is greater than the second;
- Return `0` if the first argument equals the second;
- Return `-1` if the first argument is less than the second.
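As an example, here is a comparator for a custom point type that orders by `x` and then by `y`, returning exactly `1`, `0` or `-1`. It assumes elements are passed to the comparator by value. The type `point` and the function `point_cmp` are hypothetical.

```c
/* Hypothetical custom data type. */
struct point {
    int x;
    int y;
};

/* Orders points by x, then by y.
 * Comparing explicitly instead of returning a.x - b.x avoids integer
 * overflow and keeps the result within {-1, 0, 1}. */
static int point_cmp(struct point a, struct point b) {
    if (a.x != b.x)
        return a.x > b.x ? 1 : -1;
    if (a.y != b.y)
        return a.y > b.y ? 1 : -1;
    return 0;
}
```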
A copy function is used when a collection is being copied. It can be used to make a deep copy of your custom data type. It must take a single parameter and return a new copy of that same data type. If this function is absent (`NULL`), the data type will be copied by assignment (for pointers, this is a shallow copy).
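For a collection of C strings (`char *`), assignment would copy only the pointer, so both collections would share the same buffers. Here is a sketch of a deep-copy function that duplicates the characters instead. The name `str_copy` is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Deep copy of a C string: allocates a new buffer and copies the bytes,
 * including the terminating '\0'. Returns NULL if allocation fails. */
static char *str_copy(char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy)
        memcpy(copy, s, len);
    return copy;
}
```

After copying, modifying the original string no longer affects the copy. With a plain pointer assignment (a shallow copy), the change would be visible through both.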
A string function is responsible for taking a `FILE` pointer and a custom data type and writing the string representation of that data to the stream, returning a `bool` indicating success or failure. It is useful for debugging.
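Here is a sketch of a string function for a hypothetical point type. It writes `(x, y)` to the given stream and uses the return value of `fprintf` (negative on error) to report success. The names `point` and `point_str` are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical custom data type. */
struct point {
    int x;
    int y;
};

/* Writes a readable representation of p to fptr.
 * fprintf returns a negative value on an output error. */
static bool point_str(FILE *fptr, struct point p) {
    return fprintf(fptr, "(%d, %d)", p.x, p.y) >= 0;
}
```

Calling `point_str(stdout, p)` with `p = {3, -4}` prints `(3, -4)`.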
The free function is called when a collection is cleared (all elements removed) or freed (all elements removed and freed from memory) and it is responsible for completely freeing all resources that are usually acquired by your data type.
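Here is a sketch of an element type that owns heap memory, its free function, and a simplified clear operation that calls the free function on every element before forgetting them. The names `person`, `person_free`, `people_clear` and the `freed_count` counter (present only to make the calls observable) are hypothetical. This is not how the library implements clearing.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type that owns a heap-allocated buffer. */
struct person {
    char *name;
};

static size_t freed_count = 0; /* only here to make the calls observable */

/* Free function: releases every resource the element owns. */
static void person_free(struct person p) {
    free(p.name);
    freed_count++;
}

/* Simplified "clear": call the free function (if any) on each element,
 * then mark the collection as empty. */
static void people_clear(struct person *items, size_t *count,
                         void (*free_fn)(struct person)) {
    if (free_fn)
        for (size_t i = 0; i < *count; i++)
            free_fn(items[i]);
    *count = 0;
}
```

If no free function is given, the elements are simply dropped, so any memory they own would leak unless it is released elsewhere.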
A hash function receives a custom data type as parameter and returns a `size_t` hash of that data. It is used in hash tables.
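As an example, here is a hash function for C strings based on the well-known djb2 algorithm (start from 5381, then `hash * 33 + c` for each byte). Equal strings always produce equal hashes, which is the property a hash table depends on. The name `str_hash` is hypothetical.

```c
#include <stddef.h>

/* djb2 string hash. Unsigned arithmetic wraps on overflow, which is fine
 * for hashing. */
static size_t str_hash(const char *s) {
    size_t hash = 5381;
    unsigned char c;
    while ((c = (unsigned char)*s++) != 0)
        hash = hash * 33 + c;
    return hash;
}
```

For instance, `str_hash("")` is 5381 and `str_hash("a")` is 5381 * 33 + 97 = 177670.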
A priority function works much like the comparator function, except that it compares the priority of two elements. It is used in collections whose structure is based on the priority of elements rather than on their general comparison.
- Return `1` if the first argument has a greater priority than the second;
- Return `0` if the first argument has the same priority as the second;
- Return `-1` if the first argument has a lower priority than the second.
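Here is a sketch of a priority function for a hypothetical task type. Only the `urgency` field is compared, so two different tasks with equal urgency have the same priority, even though a full comparator would tell them apart. The names `task`, `urgency` and `task_pri` are hypothetical.

```c
/* Hypothetical element type. */
struct task {
    const char *name;
    int urgency; /* higher value means more urgent */
};

/* Compares priority only, ignoring the rest of the element. */
static int task_pri(struct task a, struct task b) {
    if (a.urgency > b.urgency)
        return 1;
    if (a.urgency < b.urgency)
        return -1;
    return 0;
}
```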
The following table shows which functions are required, optional or never used for each Collection:
Collection | CMP | CPY | STR | FREE | HASH | PRI |
---|---|---|---|---|---|---|
Deque | ||||||
HashMap | ||||||
HashBidiMap | ||||||
HashMultiMap | ||||||
HashMultiSet | ||||||
HashSet | ||||||
Heap | ||||||
IntervalHeap | ||||||
List | ||||||
LinkedList | ||||||
Queue | ||||||
SortedList | ||||||
Stack | ||||||
TreeMap | ||||||
TreeSet | ||||||
Color | Label |
---|---|
Required for basic functionality. | |
Required for specific functions. | |
Required for non-core specific functions. | |
Optional. | |
Not Used. | |
In the long term, these are the steps left for the completion of this library:
- Complete the implementation of all the functions in the scope of the TODO file for the main collections;
- Reorganize and complete all tests for the `cmc` collections;
- Make an exact copy of all collections in `dev` with extensive logging utilities, to be used during development;
- Port all of these collections to be statically allocated and be part of the `sac` library;
- Complete all tests for `sac`.
Currently, all collections need to be allocated on the heap. Iterators can be allocated either way, but allocating them on the stack is encouraged since they don't require dynamic memory.
Yes, you can use a Deque as a Queue or a List as a Stack without any major cost, but the idea is to have the least amount of code to fulfill the needs of a collection.
Take, for example, the Stack. It is simple, small and doesn't have many functions. If you generate a List (one of the bulkiest collections) to be used only as a Stack, you'll end up with a lot of code generated and compiled for nothing.
The Deque versus Queue situation is a little less problematic, but again, you need to be careful when generating a lot of code as compilation times might go up to 15 seconds even with modern ultra-fast compilers.
Another example is using a HashMap/TreeMap as a HashSet/TreeSet (with a dummy value that is never used), but I just think that this is a bad thing to do and you would be wasting some memory. Also, the sets generate a lot of code related to set theory, whereas maps don't.
You can use LinkedLists as Stacks, Queues and Deques, but with modern memory hierarchies, array-based data structures are significantly faster thanks to better cache locality, so I didn't bother to have specific linked-list implementations of those collections.
Modifying a collection may invalidate all iterators currently initialized on it. Currently, the only collection that allows modification while iterating is the LinkedList (using the node-based functions, not the iterator).
The following table shows how each collection is implemented and how well they do when using as common abstract data types.
- Ideal - The collection correctly implements the abstract data type;
- Not Ideal - The implementation is fulfilled, but some functionalities are either not part of the ADT or not present;
- Bad - It can be done, but it's a bad idea.
To generate a collection, all you need to do is include the necessary header files. You can include the containers you want to use individually, or you can include the master header, `macro_collections.h`, which contains the entire C-Macro-Collections library.
Note here that `SNAME` represents the uppercase name of the collection.
Every collection is split into two parts:
- `HEADER` - Contains all struct definitions and function declarations.
- `SOURCE` - Contains all function implementations.
All collections have three main macros:
- `CMC_GENERATE_SNAME` - Generates `CMC_GENERATE_SNAME_HEADER` and `CMC_GENERATE_SNAME_SOURCE`.
Or you can generate each part individually:
- `CMC_GENERATE_SNAME_HEADER` - Generates all struct definitions and function declarations.
- `CMC_GENERATE_SNAME_SOURCE` - Generates all function implementations.
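Here is a sketch of the header/source split pattern for a tiny fixed-capacity stack. The `_HEADER` macro expands to the struct and the prototypes, which you would place in a `.h` file. The `_SOURCE` macro expands to the function bodies, which you would place in exactly one `.c` file. The combined macro expands both. All names (`MY_GENERATE_STACK*`, `ds`, `double_stack`) are hypothetical. This illustrates the technique and is not the library's generated code.

```c
#include <stdbool.h>
#include <stddef.h>

/* HEADER part: struct definition and function declarations. */
#define MY_GENERATE_STACK_HEADER(PFX, SNAME, V)              \
    struct SNAME {                                           \
        V buffer[16];                                        \
        size_t count;                                        \
    };                                                       \
    bool PFX##_push(struct SNAME *s, V value);               \
    bool PFX##_pop(struct SNAME *s, V *out);

/* SOURCE part: function implementations. */
#define MY_GENERATE_STACK_SOURCE(PFX, SNAME, V)              \
    bool PFX##_push(struct SNAME *s, V value) {              \
        if (s->count == 16)                                  \
            return false;                                    \
        s->buffer[s->count++] = value;                       \
        return true;                                         \
    }                                                        \
    bool PFX##_pop(struct SNAME *s, V *out) {                \
        if (s->count == 0)                                   \
            return false;                                    \
        *out = s->buffer[--s->count];                        \
        return true;                                         \
    }

/* Convenience macro that expands both parts at once. */
#define MY_GENERATE_STACK(PFX, SNAME, V)                     \
    MY_GENERATE_STACK_HEADER(PFX, SNAME, V)                  \
    MY_GENERATE_STACK_SOURCE(PFX, SNAME, V)

/* Generates struct double_stack plus ds_push and ds_pop. */
MY_GENERATE_STACK(ds, double_stack, double)
```

Splitting the two parts lets several translation units share one set of declarations while the implementations are compiled only once.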
When including `macro_collections.h` in your source code, you gain access to a macro called `CMC_COLLECTION_GENERATE` with the following parameters:
- C - Container name in uppercase (BIDIMAP, DEQUE, HASHMAP, HASHSET, HEAP, INTERVALHEAP, LINKEDLIST, LIST, MULTIMAP, MULTISET, QUEUE, SORTEDLIST, STACK, TREEMAP, TREESET).
- PFX - Functions prefix or namespace.
- SNAME - Structure name (`struct SNAME`).
- K - Key type. Only used in HASHMAP, TREEMAP, MULTIMAP and BIDIMAP; ignored by others.
- V - Value type. Primary type for most collections, or value to be mapped by HASHMAP, TREEMAP, MULTIMAP and BIDIMAP.
In fact, all macros follow this pattern. So whenever you see a macro with a bunch of parameters and you don't know what they are, you can check out the above list.
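A dispatcher like this can be built with token pasting: the container name `C` is glued onto a generator's name, and every generator shares the same `(PFX, SNAME, K, V)` signature. Generators that do not need a key simply ignore `K`. Here is a sketch with two toy generators. All names here (`MY_COLLECTION_GENERATE`, `MY_GENERATE_BOX`, `MY_GENERATE_PAIR`, `ib`, `sp`) are hypothetical, and this is not the library's actual implementation.

```c
#include <stddef.h>

/* BOX ignores K, like non-map collections do. */
#define MY_GENERATE_BOX(PFX, SNAME, K, V)                          \
    struct SNAME { V value; };                                     \
    static V PFX##_get(const struct SNAME *b) { return b->value; }

/* PAIR uses both K and V, like a map entry. */
#define MY_GENERATE_PAIR(PFX, SNAME, K, V)                         \
    struct SNAME { K key; V value; };                              \
    static K PFX##_key(const struct SNAME *p) { return p->key; }

/* Dispatcher: MY_GENERATE_##C becomes MY_GENERATE_BOX, MY_GENERATE_PAIR, ... */
#define MY_COLLECTION_GENERATE(C, PFX, SNAME, K, V) \
    MY_GENERATE_##C(PFX, SNAME, K, V)

/* K may be left empty when the generator ignores it (valid since C99). */
MY_COLLECTION_GENERATE(BOX, ib, int_box, , int)
MY_COLLECTION_GENERATE(PAIR, sp, str_int_pair, const char *, int)
```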
There are 2 for-each macros:
- `CMC_FOREACH` - Starts at the start of the collection and moves towards the end.
- `CMC_FOREACH_REV` - Starts at the end of the collection and moves towards the start.
Both take the following parameters:
- PFX - Functions prefix or namespace.
- SNAME - Structure name.
- TARGET - The variable name of the collection you wish to iterate over.
- ITERNAME - Iterator variable name.
For `CMC_FOREACH` and `CMC_FOREACH_REV`, you can name the iterator variable through the `ITERNAME` parameter.
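Here is a sketch of how for-each macros with these four parameters can be built on top of iterator functions named after `PFX`, with the iterator type named after `SNAME`. The list type, the `il_iter_*` functions and the `MY_FOREACH`/`MY_FOREACH_REV` macros are hypothetical and are not the library's actual definitions of `CMC_FOREACH` and `CMC_FOREACH_REV`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical array-backed list and its iterator. */
struct int_list { int *buffer; size_t count; };
struct int_list_iter { const struct int_list *target; size_t cursor; bool done; };

static struct int_list_iter il_iter_start(const struct int_list *l) {
    struct int_list_iter it = { l, 0, l->count == 0 };
    return it;
}
static struct int_list_iter il_iter_end(const struct int_list *l) {
    struct int_list_iter it = { l, l->count ? l->count - 1 : 0, l->count == 0 };
    return it;
}
static bool il_iter_done(const struct int_list_iter *it) { return it->done; }
static void il_iter_next(struct int_list_iter *it) {
    if (it->cursor + 1 < it->target->count) it->cursor++;
    else it->done = true; /* stepped past the last element */
}
static void il_iter_prev(struct int_list_iter *it) {
    if (it->cursor > 0) it->cursor--;
    else it->done = true; /* stepped past the first element */
}
static int il_iter_value(const struct int_list_iter *it) {
    return it->target->buffer[it->cursor];
}

/* ITERNAME is declared in the for-init, so it is scoped to the loop. */
#define MY_FOREACH(PFX, SNAME, TARGET, ITERNAME)                      \
    for (struct SNAME##_iter ITERNAME = PFX##_iter_start(&(TARGET));  \
         !PFX##_iter_done(&ITERNAME); PFX##_iter_next(&ITERNAME))

#define MY_FOREACH_REV(PFX, SNAME, TARGET, ITERNAME)                  \
    for (struct SNAME##_iter ITERNAME = PFX##_iter_end(&(TARGET));    \
         !PFX##_iter_done(&ITERNAME); PFX##_iter_prev(&ITERNAME))
```

With a list `l`, `MY_FOREACH(il, int_list, l, it) { ... il_iter_value(&it) ... }` visits the elements front to back, and `MY_FOREACH_REV` visits them back to front.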
Check out some code reviews that cover some parts of the project:
About | Link |
---|---|
Unit Test ./utl/test.h | |
Interval Heap ./cmc/intervalheap.h | |
Hash Set ./cmc/hashset.h | |
Linked List ./cmc/linkedlist.h | |
Others | |