Encapsulation in C
Introduction
Encapsulation is one of the most important concepts in software design. It is an important tool in being able to modularize your code, and protect against changes in dependencies, and to give yourself the flexibility to choose and experiment with different implementations.
Encapsulation is the notion that data is hidden from consumers. In this case, consumers are the other pieces of software using the piece you are using. Encapsulation requires a separation of code into two part: the interface and the implementation.
The interface is the contract to the outside, and it is the only thing that consumers interact with. The implementation is what is under the covers. In object orientated languages, this interface is very explicit, typically as public methods of a class, or to even further encapsulate, as an abstract base class returned through some kind of generic construction pattern, such as a factory.
C is able to emulate basic encapsulation pretty easily, but it is in a very different way than other languages. There are three facets to handling an encapsulated interface in C: Opaque Pointers, Static Functions, and the use of multiple Compilation Units.
That is what I’ll go over here. In the future, we’ll continue this and talk about other things we can do. Our driving example will be to implement a basic stack. The code is in the stack sample here: https://bitbucket.org/wgunther/learn-c
The Interface
We will maintain two compilation units. The first will be for the stack implementation, and the second will be for the main function, which will just run our unit tests.
Our first step is to write the interface. This will be in stack.h
, and it
represents both the functionality we have to implement, and the contract with
the consuming code. This will be the only way the consuming code will interact
with stacks.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#ifndef STACK_H
#define STACK_H
#include <stdbool.h>
// Opaque Stack object.
typedef struct Stack Stack;
// Value type for the stack.
typedef int StackValue;
// Makes a new stack. Returns NULL if not able to create.
Stack* Stack_New();
// Relinquishes a stack, and fixes dangling pointer.
void Stack_Delete(Stack** stack);
// Pushes value on the stack. It will be the next value returned in pop (unless
// another value is pushed on of course).
void Stack_Push(Stack* stack, StackValue value);
// Whether stack is empty.
bool Stack_Empty(Stack* stack);
// Pops the last value off the stack, and returns it. Popping an empty stack
// yields undefined behavior.
StackValue Stack_Pop(Stack* stack);
#endif
Writing Towards a Test
Now we can write some tests. Here, I’m just using some basic macros that check things and can print the result,including the stringified line, and will record the result in globals. For the sake of brevity, I’m omitting the unit test compilation unit.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#include "stack.h"
#include "../unit-test.h"
int main() {
// perform test multiple time to guard against flakiness (if undefined
// behavior, for example).
for (int test_num = 0; test_num < 10; ++test_num) {
Stack* stack = Stack_New();
AssertTrue(Stack_Empty(stack));
Stack_Push(stack, 10);
AssertTrue(!Stack_Empty(stack));
Stack_Push(stack, 20);
AssertTrue(!Stack_Empty(stack));
AssertTrue(Stack_Pop(stack) == 20);
AssertTrue(Stack_Pop(stack) == 10);
AssertTrue(Stack_Empty(stack));
// stress test
for (int i = 0; i < 1000; ++i) {
Stack_Push(stack, i);
}
for (int i = 999; i >=0; --i) {
AssertTrue(Stack_Pop(stack) == i);
}
AssertTrue(Stack_Empty(stack));
Stack_Delete(&stack);
}
PrintTestSummary();
}
I’m going to guess our coverage is generally pretty good here. We’ll missing some boundary cases, but this is good enough to get started and write towards passing a test.
If you’re worried about the boundary case where we pop and empty stack, you
shouldn’t be. In our contract, we passed the buck of making sure that doesn’t
happen to the consumer, and documented it would result in undefined behavior.
Since C lacks the ability to handle exceptions very gracefully without building
a lot of infrastructure, it can be best when writing library code have this
kind of undefined behavior. Even in C++, which does have exceptions, great
lengths is taken to avoid throwing them in the standard library; calling
std::vector::pop_back
on an empty vector, for instance, is undefined
behavior.
Some Takeways
In our consuming code, an instance of Stack
cannot exist. Stack
can only
exist as a pointer (apart from a sufficiently crafty person making a type cast,
anyway). This has to be this way based on where the definition of struct
Stack
is. If you think about it from the perspective of the compiler, in
compilation units not containing the stack implementation, it cannot know
sizeof(Stack)
and therefore can never know how much memory to allocate for
Stack
on the stack. Like you, the compiler cannot know if we’ve chosen to
implement Stack
using a linked list, an array, or something else.
This organization as pros and cons. The primary pro, and the point of this
whole lesson, is that Stack
is encapsulated, and consumers cannot muck with
the implementation. So they cannot use private data and break our ability to
change the code in the future, or, even worse, if the consumer could modify our
private data they could break our invariants and assumptions.
Another pro is with regards to compilation. The Stack
library will be
compiled separately from the stack tests, and any other consumers. This means
that if the owners of the stack library makes non-interface changing changes,
it will not require any re-compiling of the consuming code, only linking. In
big projects, this can be a really huge speed up.
There are cons for this, of course. The biggest con is that consumers cannot possibly create their own stacks, and all stacks will be heap allocated. Another con is that all functions must be externally defined and linked, which means that the compiler avoids the ability to make any inlining optimizations.
Implementation
Now we’ll implement the stack behavior.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
#include <stdlib.h>
#include "stack.h"
// StackNode things: this is an internal helper class.
typedef struct StackNode StackNode; // forward declaration.
struct StackNode {
StackValue data;
StackNode* next;
};
static StackNode* StackNode_New(StackValue data) {
StackNode* node = malloc(sizeof(StackNode));
if (node) {
node->data = data;
node->next = NULL;
}
return node;
}
static void StackNode_Delete(StackNode** node) {
free(*node);
*node = NULL;
}
// Stack things.
struct Stack {
StackNode* head;
};
Stack* Stack_New() {
Stack* stack = malloc(sizeof(Stack));
if (stack) {
stack->head = NULL;
}
return stack;
};
void Stack_Delete(Stack** stack) {
StackNode* head = (*stack)->head;
while (head != NULL) {
StackNode* next = head->next;
StackNode_Delete(&head);
head = next;
}
free(*stack);
}
void Stack_Push(Stack* stack, StackValue value) {
StackNode* new_node = StackNode_New(value);
new_node->next = stack->head;
stack->head = new_node;
}
StackValue Stack_Pop(Stack* stack) {
StackNode* old_node = stack->head;
StackValue value = old_node->data;
stack->head = old_node->next;
StackNode_Delete(&old_node);
return value;
}
bool Stack_Empty(Stack* stack) {
return stack->head == NULL;
}
Some takeaways
I chose to implement this using a linked list. You could easily implement it as a dynamic array as well, which resizes when it grows too large. The consumer won’t care.
You’ll notice functions involving StackNode
s are marked static
. This means
that they are ‘local’ to this compilation unit; they do not have external
linkage, so they cannot be reference in other compilation units, and they can’t
stomp on names that other people choose in other compilation units. If you are
familiar with C++, this is usually accomplished there by putting functions in
unnamed namespaces. Generally, there should be 1 header file per
compilation unit, and all functions and variables local to that unit should be
static. This avoids obscure linking errors and dependencies that
can exist in code that should genuinely be local.
Another design decision here that you saw from the interface is to enforce
there aren’t dangling pointers. Generally, this is a good technique for
avoiding flakiness in behavior: set pointers pointing to bad memory to NULL
and check for NULL
to determine whether an invariant is broken and to fail
gracefully. I didn’t really need to do it for the local StackNode
helper
class, but it doesn’t hurt so much. I also omitted a NULL
check in
Stack_Push
just for simplicity, but it would be better to return a bool
to
determine the success of the operation, rather than breaking completely :-)
.
Comments