Introduction

Encapsulation is one of the most important concepts in software design. It is an important tool in being able to modularize your code, and protect against changes in dependencies, and to give yourself the flexibility to choose and experiment with different implementations.

Encapsulation is the notion that data is hidden from consumers. In this case, consumers are the other pieces of software using the piece you are using. Encapsulation requires a separation of code into two part: the interface and the implementation.

The interface is the contract to the outside, and it is the only thing that consumers interact with. The implementation is what is under the covers. In object orientated languages, this interface is very explicit, typically as public methods of a class, or to even further encapsulate, as an abstract base class returned through some kind of generic construction pattern, such as a factory.

C is able to emulate basic encapsulation pretty easily, but it is in a very different way than other languages. There are three facets to handling an encapsulated interface in C: Opaque Pointers, Static Functions, and the use of multiple Compilation Units.

That is what I’ll go over here. In the future, we’ll continue this and talk about other things we can do. Our driving example will be to implement a basic stack. The code is in the stack sample here: https://bitbucket.org/wgunther/learn-c

The Interface

We will maintain two compilation units. The first will be for the stack implementation, and the second will be for the main function, which will just run our unit tests.

Our first step is to write the interface. This will be in stack.h, and it represents both the functionality we have to implement, and the contract with the consuming code. This will be the only way the consuming code will interact with stacks.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#ifndef STACK_H
#define STACK_H

#include <stdbool.h>

// Opaque Stack object.
typedef struct Stack Stack;

// Value type for the stack.
typedef int StackValue;

// Makes a new stack. Returns NULL if not able to create.
Stack* Stack_New();

// Relinquishes a stack, and fixes dangling pointer.
void Stack_Delete(Stack** stack);

// Pushes value on the stack. It will be the next value returned in pop (unless
// another value is pushed on of course).
void Stack_Push(Stack* stack, StackValue value);

// Whether stack is empty.
bool Stack_Empty(Stack* stack);

// Pops the last value off the stack, and returns it. Popping an empty stack
// yields undefined behavior.
StackValue Stack_Pop(Stack* stack);

#endif

Writing Towards a Test

Now we can write some tests. Here, I’m just using some basic macros that check things and can print the result,including the stringified line, and will record the result in globals. For the sake of brevity, I’m omitting the unit test compilation unit.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#include "stack.h"
#include "../unit-test.h"

int main() {
  // perform test multiple time to guard against flakiness (if undefined
  // behavior, for example).
  for (int test_num = 0; test_num < 10; ++test_num) {
    Stack* stack = Stack_New();
    AssertTrue(Stack_Empty(stack));

    Stack_Push(stack, 10);
    AssertTrue(!Stack_Empty(stack));
    Stack_Push(stack, 20);
    AssertTrue(!Stack_Empty(stack));
    AssertTrue(Stack_Pop(stack) == 20);
    AssertTrue(Stack_Pop(stack) == 10);
    AssertTrue(Stack_Empty(stack));

    // stress test
    for (int i = 0; i < 1000; ++i) {
      Stack_Push(stack, i);
    }
    for (int i = 999; i >=0; --i) {
      AssertTrue(Stack_Pop(stack) == i);
    }
    AssertTrue(Stack_Empty(stack));

    Stack_Delete(&stack);
  }
  PrintTestSummary();
}

I’m going to guess our coverage is generally pretty good here. We’ll missing some boundary cases, but this is good enough to get started and write towards passing a test.

If you’re worried about the boundary case where we pop and empty stack, you shouldn’t be. In our contract, we passed the buck of making sure that doesn’t happen to the consumer, and documented it would result in undefined behavior. Since C lacks the ability to handle exceptions very gracefully without building a lot of infrastructure, it can be best when writing library code have this kind of undefined behavior. Even in C++, which does have exceptions, great lengths is taken to avoid throwing them in the standard library; calling std::vector::pop_back on an empty vector, for instance, is undefined behavior.

Some Takeways

In our consuming code, an instance of Stack cannot exist. Stack can only exist as a pointer (apart from a sufficiently crafty person making a type cast, anyway). This has to be this way based on where the definition of struct Stack is. If you think about it from the perspective of the compiler, in compilation units not containing the stack implementation, it cannot know sizeof(Stack) and therefore can never know how much memory to allocate for Stack on the stack. Like you, the compiler cannot know if we’ve chosen to implement Stack using a linked list, an array, or something else.

This organization as pros and cons. The primary pro, and the point of this whole lesson, is that Stack is encapsulated, and consumers cannot muck with the implementation. So they cannot use private data and break our ability to change the code in the future, or, even worse, if the consumer could modify our private data they could break our invariants and assumptions.

Another pro is with regards to compilation. The Stack library will be compiled separately from the stack tests, and any other consumers. This means that if the owners of the stack library makes non-interface changing changes, it will not require any re-compiling of the consuming code, only linking. In big projects, this can be a really huge speed up.

There are cons for this, of course. The biggest con is that consumers cannot possibly create their own stacks, and all stacks will be heap allocated. Another con is that all functions must be externally defined and linked, which means that the compiler avoids the ability to make any inlining optimizations.

Implementation

Now we’ll implement the stack behavior.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
#include <stdlib.h>

#include "stack.h"

// StackNode things: this is an internal helper class.

typedef struct StackNode StackNode; // forward declaration.

struct StackNode {
  StackValue data;
  StackNode* next;
};

static StackNode* StackNode_New(StackValue data) {
  StackNode* node = malloc(sizeof(StackNode));
  if (node) {
    node->data = data;
    node->next = NULL;
  }
  return node;
}

static void StackNode_Delete(StackNode** node) {
  free(*node);
  *node = NULL;
}

// Stack things.

struct Stack {
  StackNode* head;
};

Stack* Stack_New() {
  Stack* stack = malloc(sizeof(Stack));
  if (stack) {
    stack->head = NULL;
  }
  return stack;
};

void Stack_Delete(Stack** stack) {
  StackNode* head = (*stack)->head;
  while (head != NULL) {
    StackNode* next = head->next;
    StackNode_Delete(&head);
    head = next;
  }
  free(*stack);
}

void Stack_Push(Stack* stack, StackValue value) {
  StackNode* new_node = StackNode_New(value);
  new_node->next = stack->head;
  stack->head = new_node;
}

StackValue Stack_Pop(Stack* stack) {
  StackNode* old_node = stack->head;
  StackValue value = old_node->data;
  stack->head = old_node->next;
  StackNode_Delete(&old_node);
  return value;
}

bool Stack_Empty(Stack* stack) {
  return stack->head == NULL;
}

Some takeaways

I chose to implement this using a linked list. You could easily implement it as a dynamic array as well, which resizes when it grows too large. The consumer won’t care.

You’ll notice functions involving StackNodes are marked static. This means that they are ‘local’ to this compilation unit; they do not have external linkage, so they cannot be reference in other compilation units, and they can’t stomp on names that other people choose in other compilation units. If you are familiar with C++, this is usually accomplished there by putting functions in unnamed namespaces. Generally, there should be 1 header file per compilation unit, and all functions and variables local to that unit should be static. This avoids obscure linking errors and dependencies that can exist in code that should genuinely be local.

Another design decision here that you saw from the interface is to enforce there aren’t dangling pointers. Generally, this is a good technique for avoiding flakiness in behavior: set pointers pointing to bad memory to NULL and check for NULL to determine whether an invariant is broken and to fail gracefully. I didn’t really need to do it for the local StackNode helper class, but it doesn’t hurt so much. I also omitted a NULL check in Stack_Push just for simplicity, but it would be better to return a bool to determine the success of the operation, rather than breaking completely :-).