All too often I see programmers stumped trying to lay out the folder structure for their embedded C project.
My best advice is that folder structure is not the real problem. it is just one symptom of dependency problems. If we fix the underlying dependencies a pragmatic folder structure for your project will probably be obvious due to the design being sound.
On Modularity in general
Writing modular code is not nearly as easy as it sounds.
Trying it out for real we quickly discover that simply distributing bits of code across a number of files does not solve much of our problems. This is because modularity is about Architecture and Design and, as such, there is a lot more to it. To determine if we did a good job we need to first look at WHY. WHY exactly do we desire the code to be modular, or to be more specific - what exactly are we trying to achieve by making the code modular?
A lot can be said about modularity but to me, my goals are usually as follows:
- Reduce working set complexity through divide and conquer.
- Avoid duplication by re-using code in multiple projects (mobility).
- Adam Smith-like division of labor.
When code is broken down into team-sized modules we can construct and maintain it more efficiently. Teams can have areas of specialization and everyone does not have to understand the entire problem in order to contribute.
In engineering, functional decomposition is a process of breaking a complex system into a number of smaller subsystems with clear distinguishable functions (responsibilities). The purpose of doing this is to apply the divide and conquer strategy to a complex problem. This is also often called Separation of Concerns.
If we keep that in mind we can test for modularity during code review by using a couple of simple core concepts.
- Separation: Is the boundary of every module clearly distinguishable? This requires every module to be in a single file, or else - if it spans multiple files - a single folder which encapsulates the contents of the module into a single entity.
- Independent and Interchangeable: This implies that we can also use the module in another program with ease, something Robert C Martin calls Mobility. A good test is to imagine how you would manage the code using version control systems if the module you are evaluating had to reside in a different repository, have its own version number and its own independent documentation.
- Individually testable: If a module is truly independent it can be used by itself in a test program without bringing a string of other modules along. Testing of the module should follow the Open-Closed principle which means that we can create our tests without modifying the module itself in any way.
- Reduction in working set Complexity: If the division is not making the code easier to understand it is not effective. This means that modules should perform abstraction - hiding as much of the complexity inside the module and exposing a simplified interface one layer of abstraction above the module function.
Software Architecture is in the end all about Abstraction and Encapsulation, which means that making your code modular is all about Architecture.
By dividing your project into a number of smaller, more manageable problems, you can solve each of these individually. We should be able to give each of these to a different autonomous team that has its own release schedule, it's own code repository and it's own version number.
Exploring some program file structures
Now that we have established some ground rules for testing for modularity, let's look at some examples and see if we can figure out which ones are no good and which ones can work based on what we discussed above.
Example 1: The Monolith
I would hope that we can all agree that this fails the modularity test on all counts.
If you have a single file like this there really is only one way to re-use any of the code, and that is to copy and paste it into your other project. For a couple of lines this could still work, but normally we want to avoid duplicating code in multiple projects as this means we have to maintain it in multiple places and if a bug was found in one copy there would be no way to tell how many times the code has been copied and where else we would have to go fix things.
I think what contributes to the problem here is that little example projects or demo projects (think about that hello world application) often use this minimalistic structure in the interest of simplifying it down to the bare minimum. This makes sense if we want to really focus on a very specific concept as an example, but it sets a very poor example of how real projects should be structured.
Example 2: The includible main
In this project, main.c grew to the point where the decision was made to split it into multiple files, but the code was never redesigned, so the modules still have dependencies back to main. That is usually when we see questions like this on Stack Overflow.
Of course main.c cannot call into module.c without including module.h, and the module is really the only candidate for including main.h, which means that you have what we call a circular dependency. This mutual dependency indicates that we do not actually have 2 modules at all. Instead, we have one module which has been packaged into 2 different files.
Your program should depend on the modules it uses, it does not make sense for any of these modules to have a reverse dependency back to your program, and as such it does not make any sense to have something like main.h. Instead, just place anything you are tempted to place in main.h at the top of main.c instead!
If you do have definitions or types that you think can be used by more than one module then make this into a proper module, give it a proper name and let anything which uses this include this module as a proper dependency.
Always remember that header files are the public interfaces into your C translation unit. Any good Object Oriented programming book will advise you to make as little as possible public in your class. You should never expose the insides of your module publically if it does not form part of the public interface for the class. If your definitions, types or declarations are intended for internal use only they should not be in your public header file, placing them at the top of your C file most likely the best.
A good example is device configuration bits. I like to place my configuration bit definitions in a file by itself called device_config.h, which contains only configuration bit settings for my project. This module is only used by main, but it is not called main.h. Instead, it has a single responsibility which is easy to deduce from the name of the file. To keep it single responsibility I will never put other things like global defines or types in this file. It is only for setting up the processor config bits and if I do another project where the settings should be the same (e.g. the bootloader for the project) then it is easy for me to re-use this single file.
In a typical project, you will want to have an application that depends on a number of libraries, something like this. Importantly we can describe the program as an application that uses WiFi, TempSensors, and TLS. There should not be any direct dependencies between modules. Any dependencies between modules should be classified as configuration which is injected by the application, and the code that ties all of this together should be part of the application, not the modules. It is important that we adhere to the Open-Closed principle here. We cannot inject dependencies by modifying the code in the libraries/modules that we use, it has to be done by changing the application. The moment we change the libraries to do this we have changed the library in an application-specific way and we will pay the price for that when we try to re-use it.
It is always critical that the dependencies here run only in one direction, and that you can find all the code that makes up each module on your diagram in a single file or in a folder by itself to enable you to deal with the module as a whole.
Example 3: The Aggregate or generic header file
Projects often use an aggregate header file called something like "includes.h". This quickly leads to the pattern where every module depends on every other and is also known as Spaghetti Code. It becomes obvious if you look at the include graph or when you try and re-use a module in your project by itself for e.g. a test. When any header file is changed you have to re-test every module now.
This fails the test of having clearly distinguishable boundaries and clear and obvious dependencies between modules.
In MCC there is a good (or should I say bad?) example of such an aggregate header file called mcc.h. I created a minimal project using MCC for the PIC16F18877 and only added the Accel3 click to the project as a working example for this case.
The include graph generated using Doxygen looks as follows.
There is no indication from this graph that the Accelerometer is the one using the I2C driver, and although main never calls to I2C itself it does look like that dependency exists. The noble intention here is of course to define a single external interface for MCC generated code, but it ends up tying all of the MCC code together into a single monolithic thing. This means my application does not depend on the Accelerometer, it now depends instead on a single monolithic thing called "everything inside of MCC", and as MCC grows this will become more and more painful to manage.
If you remove the aggregate header then main no longer includes everything and the kitchen sink, and the include graph reduces to something much more useful as follows:
This works better because now the abstractions are being used to simplify things effectively, and the dependency of the sensor on I2C is hidden from the application level. This means we could change the sensor from I2C to SPI without having any impact on the next layer up.
Another version of this anti-pattern is called "One big Header File", where instead of making one header that includes all the others, we just place all the contents of all those headers into a single global file. This file is often called "common.h" or "defs.h" or "global.h" or something similar. Ward Cunningham has a good comprehensive list of the problems caused by this practice on his wiki.
Example 4: The shared include folder
This is a great example of Cargo Culting something that sometimes works in a library, and applying it everywhere without understanding the consequences. The mistake here is to divide the project into sources and headers instead of dividing it into modules. Sources and headers are hopefully not the division that comes to mind when I ask you to divide code into modules!
In the context of a library, where the intention is very much to have an external interface (include) separated from its internal implementation (src), this segregation can make sense, but your program is not a library.
When you look at this structure you should ask how would this work in the following typical scenarios:
- If one of the libraries grows enough that we need to split it into multiple files? How will you now know which headers and/source belong to which library?
- If two libraries end up with identically named files? Typical examples of collisions are types.h, config.h, hal.h, callbacks.h or interface.h.
- If I have to update a library to a later version, how will I know which files to replace if they are all mixed into the same folder?
- How do I know which files are part of my project, and as such, I should maintain them locally, as opposed to which files are part of a library and should be maintained at the library project location which is used in many projects?
This structure is bad because it breaks the core architectural principles of cohesion and encapsulation which dictates that we keep related things together, and encapsulate logical or functional groupings into clearly identifiable entities.
If you do not get this right it leads to library files being copied into every project, and that means multiple copies of the same file in revision control. You also end up with files that have nothing to do with each other grouped together in the same folder.
Example 5: A better way
I am not saying this is the one true way to structure your project, but with this arrangement, we can get the libraries from revision control and simply replace an entire folder when we do. It is also obvious which files are part of each library, and which ones belong to my project. We can see at a glance that this project has it's own code and depends on 3 libraries. The structure embodies information about the project which helps us manage it, and this information is not duplicated requiring us to keep data from different places in sync.
We can now include these libraries into this, or any other project, by simply telling Git to fetch the desired version of each of these folders from its own repository. This makes it easy to update the version of any particular library, and name collisions between libraries are no longer an issue.
Additionally, as the library grows it will be easy to distinguish in my code which library I have a dependency on, and exactly which types.h file I am referring to when I refer to the header files as follows.
Many different project directory structures could work for your project. We are in no way saying that this is "the one true structure". What we are saying is that when the time comes to commit your project to a structure, do remember the pros and cons of each of these examples we discussed. That way you will at least know the coming consequences of your decisions before you are committed to them.
Robert C. Martin, aka Uncle Bob, wrote a great article back in 2000 describing the SOLID architectural principles. SOLID is focussed on managing dependencies between software modules. Following these principles will help create an architecture that manages the dependencies between modules well.
A SOLID design will naturally translate into a manageable folder structure for your embedded C project.