
How to struct - lessons on Structures in C

Orunmila

Structures in the C Programming Language

Structures in C are one of the most misunderstood concepts. We see a lot of questions about the use of structs, often simply about the syntax and portability. I want to explore both of these and look at some best practice use of structures in this post as well as some lesser known facts. Covering it all will be pretty long so I will start off with the basics, the syntax and some examples, then I will move on to some more advanced stuff. If you are an expert who came here for some more advanced material please jump ahead using the links supplied.

Throughout I will refer often to the C99 standard (ISO/IEC 9899:1999), which can be downloaded from the link in the references. If you are not using a C99 compiler some things like designated initializers may not be available. I will try to point out where something is not available in older compilers that only support C89 (also known as C90). C99 is supported in XC8 from v2.0 onwards.

Advanced topics handled lower down

  1. Scope
  2. Designated Initializers
  3. Declaring Volatile and Const
  4. Bit-Fields
  5. Padding and Packing of structs and Alignment
  6. Deep and Shallow copy of structures
  7. Comparing Structs

Basics

A structure is a compound type in C which is known as an "aggregate type". Structures allow us to use sets of variables together like a single aggregate object. This allows us to pass groups of variables into functions, assign groups of variables to a destination location as a single statement and so forth.

Structures are also very useful when serializing or de-serializing data over communication ports. If you are receiving a complex packet of data it is often possible to define a structure specifying the layout of the variables e.g. the IP protocol header structure, which allows more natural access to the members of the structure.

Lastly structures can be used to create register maps, where a structure is aligned with CPU registers in such a way that you can access the registers through the corresponding structure members.

The C language has only 2 aggregate types namely structures and arrays. A union is notably not considered an aggregate type as it can only have one member object (overlapping objects are not counted separately). [Section "6.2.5 Types" of C99]

Syntax

The basic syntax for defining a structure follows this pattern.

struct [structure tag] {
   member definition;
   member definition;
   ...
} [one or more structure variables];

As indicated by the square brackets both the structure tag (or name) and the structure variables are optional. This means that I can define a structure without giving it a name. You can also just define the layout of a structure without allocating any space for it at the same time.

What is important to note here is that if you are going to use a structure type throughout your code the structure should be defined in a header file and the structure definition should then NOT include any variable definitions. If you do include the structure variable definition part in your header file this will result in a separate variable with an identical name being defined in every C file that includes the header! This kind of mistake is often masked by the fact that many toolchains will quietly merge these variables into one at link time, but this kind of behavior can cause really hard to find bugs in your code, so never do that!

Declare the layout of your structures in a header file and then create the instances of your variables in the C file they belong to. Use extern declarations in the header if you want a variable to be accessible from multiple C files as usual.
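The header/source split described above could look roughly like the sketch below. The file names point.h and point.c, the variable origin and the helper originIsZero are all made up for illustration, and everything is shown in one listing with comments marking where each file would start.

```c
/* ---- point.h (hypothetical header): layout and declarations only ---- */
#ifndef POINT_H
#define POINT_H

struct point {
   int x;
   int y;
};

/* Declaration only: no storage is reserved here */
extern struct point origin;

#endif /* POINT_H */

/* ---- point.c (hypothetical source file): the one and only definition ---- */
/* #include "point.h" would go here in a real project */
struct point origin = {0, 0};

/* ---- any other C file that includes point.h can now use origin ---- */
int originIsZero(void)
{
   return origin.x == 0 && origin.y == 0;
}
```

Because the header only declares the variable, including it from ten C files still leaves exactly one origin in the final image.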

Let's look at some examples.

Example 1 - Declare an anonymous structure (no tag name) containing 2 integers, and create one instance of it. This means storage space in RAM is allocated for one instance of this structure (on the stack if the definition is inside a function, in static memory if it is at file scope).

struct {
   int i;
   int j;
}  myVariableName;

This structure type does not have a name, so it is an anonymous struct, but we can access the variables via the variable name which is supplied. The structure type may not have a name but the variable does. When you declare a struct like this it is not possible to declare a function which will accept this type of structure by name.

Example 2 - Declare a type of structure which we will use later in our code. Do not allocate any space for it.

struct myStruct {
   int i;
   int j;
};

If we declare a structure like this we can create instances or define variables of the struct type at a later stage as follows. (According to the standard "A declaration specifies the interpretation and attributes of a set of identifiers. A definition of an identifier is a declaration for that identifier that causes storage to be reserved for that object" - 6.7)

struct myStruct myVariable1;
struct myStruct myVariable2;

Example 3 - Declare a type of structure and define a type for this struct.

typedef struct myStruct {
   int i;
   int j;
} myStruct_t;  // Not to be confused with a variable declaration
               // typedef changes the syntax here - myStruct_t is part of the typedef, NOT the struct definition!

// This is of course equivalent to
struct myStruct {
   int i;
   int j;
};  // Now if you placed a name here it would allocate a variable
typedef struct myStruct myStruct_t;

The distinction here is a constant source of confusion for developers, and this is one of many reasons why using typedef with structs is NOT ADVISED. I have added in the references a link to some archived conversations which appeared on usenet back in 2002. In these messages Linus Torvalds explains much better than I can why it is generally a very bad idea to use typedef with every struct you declare as has become a norm for so many programmers today. Don't be like them!

In short typedef is used to achieve type abstraction in C, this means that the owner of a library can at a later time change the underlying type without telling users about it and everything will still work the same way. But if you are not using the typedef exactly for this purpose you end up abstracting, or hiding, something very important about the type. If you create a structure it is almost always better for the consumer to know that they are dealing with a structure and as such it cannot be compared using == and it is also not safe to copy the struct using = due to deep copy problems (later on I describe these). By letting the user of your structs know explicitly they are using structs when they are you will avoid a lot of really hard to track down bugs in the future. Listen to the experts!

This all means that the BEST PRACTICE way to use structs is as follows,

Example 4 - How to declare a structure, instantiate a variable of this type and pass it into a function. This is the BEST PRACTICE way.

struct point {  // Declare a cartesian point data type
   int x;
   int y;
};

void pointProcessor(struct point p) // Declare a function which takes struct point as parameter by value
{
   int temp = p.x;
   ...  // and the rest
}

void main(void)
{
   // local variables
   struct point myPoint = {3,2}; // Allocate a point variable and initialize it at declaration.

   pointProcessor(myPoint);
}

As you can see we declare the struct and it is clear that we are defining a new structure which represents a point. Because we are using the structure correctly it is not necessary to call this point_struct or point_t because when we use the structure later it will be accompanied by the struct keyword which will make its nature perfectly clear every time it is used.

When we use the struct as a parameter to a function we explicitly state that this is a struct being passed, this acts as a caution to the developers who see this that deep/shallow copies may be a problem here and need to be considered when modifying the struct or copying it. We also explicitly state this when a variable is declared, because when we allocate storage is the best time to consider structure members that are arrays or pointers to characters or something similar which we will discuss later under deep/shallow copies and also comparisons and assignments.

Note that this example passes the structure to the function "By Value" which means that a copy of the entire structure is made on the parameter stack and this is passed into the function, so changing the parameter inside of the function will not affect the variable you are passing in, you will be changing only the temporary copy.
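To make the by-value behavior concrete, here is a minimal sketch that contrasts it with passing a pointer. The function names moveByValue and moveByPointer are invented for this illustration.

```c
struct point {
   int x;
   int y;
};

/* Receives a copy: the caller's variable is not modified */
void moveByValue(struct point p)
{
   p.x += 10;   /* changes only the temporary copy */
}

/* Receives the address: the caller's variable is modified */
void moveByPointer(struct point *p)
{
   p->x += 10;
}
```

Calling moveByValue(myPoint) leaves myPoint.x untouched, while moveByPointer(&myPoint) adds 10 to it.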

Example 5 - HOW NOT TO DO IT! You will see lots of examples on the web to do it this way, it is not best practice, please do not do it this way!

// This is an example of how NOT to do it
// This does the same as example 4 above, but doing it this way abstracts the type in a bad way
// This is what Linus Torvalds warns us against!
typedef struct point_tag {  // Declare a cartesian point data type
   int x;
   int y;
} point_t;

void pointProcessor(point_t p) 
{
   int temp = p.x;
   ...  // and the rest
}

void main(void)
{
   // local variables
   point_t myPoint = {3,2}; // Allocate a point variable and initialize it at declaration.

   pointProcessor(myPoint);
}

Of course now the tag name of the struct has no purpose as the only thing we ever use it for is to declare yet another type with another name, this is a source of endless confusion to new C programmers as you can imagine! The mistake here is that the typedef is used to hide the nature of the variable.

Initializers

As you saw above it is possible to assign initial values to the members of a struct at the time of definition of your variable. There are some interesting rules related to initializer lists which are worth pointing out. The standard requires that initializers be applied in the order that they are supplied, and that all members for which no initializer is supplied shall be initialized to 0. This applies to all aggregate types. This is all covered in the standard section 6.7.8.

I will show a couple of examples to clear up common misconceptions here. Descriptions are all in the comments.

struct point {
   int x;
   int y;
};

void function(void)
{
   int myArray1[5];            // This array has indeterminate (random) values because there is no initializer
   int myArray2[5] = { 0 };    // Has all its members initialized to 0
   int myArray3[5] = { 5 };    // Has first element initialized to 5, all other elements to 0
   int myArray4[5] = { };      // Also all 0, but empty braces are a compiler extension before C23; prefer { 0 }
 
   struct point p1;            // x and y both indeterminate (random) values
   struct point p2 = {1, 2};   // x = 1 and y = 2
   struct point p3 = { 1 };    // x = 1 and y = 0;

   // Code follows here
}

These rules about initializers are important when you decide in which order to declare your members of your structures. We saw a great example of how user interfaces can be simplified by placing members to be initialized to 0 at the end of the list of structure members when we looked at the examples of how to use RTCOUNTER in another blog post.

More details on Initializers such as designated initializers and variable length arrays, which were introduced in C99, are discussed in the advanced section below.

Assignment

Structures can be assigned to a target variable just the same as any other variable. The result is the same as if you used the assignment operator on each member of the structure individually. In fact one of the enhancements of the "Enhanced Midrange" core in all PIC16F1xxx devices is the capability to do shallow copies of structures faster through specialized instructions!

struct point {  // Declare a cartesian point data type
   int x;
   int y;
};

void main(void) {
   struct point p1 = {4,2};   // p1 initialized through an initializer-list
   struct point p2 = p1;      // p2 is initialized through assignment
  
   // At this point p2.x is equal to p1.x and so is p2.y equal to p1.y
  
   struct point p3;
  
   p3 = p2;  // And now all three points have the same value
}

Be careful though, if your structure contains external references such as pointers you can get into trouble as explained later under Deep and Shallow copy of structures.

Basic Limitations

Before we move on to advanced topics, a word on limits. As you may have suspected there are some limitations to how much of each thing you can have in C. The C standard calls these limits Translation Limits. They specify the minimum capabilities a compiler must have in order to call itself compliant with the standard. This ensures that your code will compile on all compliant compilers as long as you do not exceed these limits.

The Translation Limits applicable to structures  are:

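  1. External identifiers (such as a global structure variable visible to the linker) must use at most 31 significant characters. Structure tags and member names are internal identifiers, for which C99 guarantees 63 significant characters, but C89 guaranteed only 31, so staying within 31 unique characters is the portable choice.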
  2. At most 1023 members in a struct or union
  3. At most 63 levels of nested structure or union definitions in a single struct-declaration-list

Advanced Topics

Scope

Structure, union, and enumeration tags have scope that begins just after the appearance of the tag in a type specifier that declares the tag. When you use typedef's however the type name only has scope after the type declaration is complete. This makes it tricky to define a structure which refers to itself when you use typedef's to define the type, something which is important to do if you want to construct something like a linked list.

I regularly see people tripping themselves up with this because they are using the BAD way of using typedef's. Just one more reason not to do that!

Here is an example.

// Perfectly fine declaration which compiles as myList has scope inside the curly braces
struct myList {
   struct myList* next;
};

// This DOES NOT COMPILE ! 
// The reason is that myList_t only has scope after the curly brace when the type name is supplied.
typedef struct myList {
   myList_t* next;
} myList_t;

As you can see above, a structure member can easily be a pointer to the structure itself when you stay away from typedef's, but how do you handle the more complex case of two separate structures referring to each other?

In order to solve that one we have to make use of incomplete struct types.

Quote

A structure or union type of unknown content (as described in 6.7.2.3) is an incomplete type. It is completed, for all declarations of that type, by declaring the same structure or union tag with its defining content later in the same scope.
-- C99 - 6.2.5-22

Below is an example of how this looks in practice.

struct a;   // Incomplete declaration of a
struct b;   // Incomplete declaration of b

struct a {         // Completing the declaration of a with member pointing to still incomplete b
   struct b * myB;
};

struct b {         // Completing the declaration of b with member pointing to now complete a
   struct a * myA;
};

This is an interesting example from the standard on how scope is resolved.

Quote

EXAMPLE 3 The following obscure constructions

      typedef signed int t;
      typedef int plain;
      struct tag {
            unsigned t:4;
            const t:5;
            plain r:5;
      };

declare
   a typedef name t with type signed int,
   a typedef name plain with type int,
   and a structure with three bit-field members, one named t that contains values in the range [0, 15], an unnamed const-qualified bit-field which (if it could be accessed) would contain values in either the range [−15, +15] or [−16, +15], and one named r that contains values in one of the ranges [0, 31], [−15, +15], or [−16, +15]. (The choice of range is implementation-defined.)

The first two bit-field declarations differ in that unsigned is a type specifier (which forces t to be the name of a structure member), while const is a type qualifier (which modifies t which is still visible as a typedef name). 
 

Designated Initializers (introduced in C99)

Example 4 above used initializer-lists to initialize the members of our structure, but we were only able to omit members at the end, which limited us quite severely. If we could omit any member from the list, or rather include members by designation, we could supply the initializers we need and let the rest be set safely to 0. This was introduced in C99.

This addition had a bigger impact on unions however. There is a rule for unions which states that initializer-lists shall be applied solely to the first member of the union. It is easy to see why this was necessary: since the structs which comprise a union do not have to have the same number of members, it would be impossible to apply a list of constants to an arbitrary member of the union. In many cases this means that designated initializers are the only way that unions can be initialized consistently.

Examples with structs.

struct multi {
  int x;
  int y;
  int a;
  int b;
};

struct multi myVar = {.a = 5, .b = 3};  // Initialize the struct to { 0, 0, 5, 3 }
   

Examples with a Union.

struct point { int x; int y; };

struct circle {
   struct point center;
   int radius;
};

struct line {
   struct point start;
   struct point end;
};

union shape {
   struct circle  mCircle;
   struct line    mLine;
};

void main(void)
{
   volatile union shape shape1 = {.mLine = {{1,2}, {3,4}}};   // Initialize the union using the line member
   volatile union shape shape2 = {.mCircle = {{1,2}, 10}};    // Initialize the union using the circle member
   
   ...
}

The type of initialization of a union using the second member of the union was not possible before C99, which also means if you are trying to port C99 code to a C89 compiler this will require you to write initializer functions which are functionally different and your port may end up not working as expected.

Initializers with designations can be combined with compound literals. Structure objects created using compound literals can be passed to functions without depending on member order. Here is an example.

struct point {
  int x;
  int y;
};

// Passing 2 anonymous structs into a function without declaring local variables
drawline( (struct point){.x=1, .y=1}, (struct point){.y=3, .x=4});
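Since the call above is only a fragment, here is a small compilable sketch of the same idea. drawline is not defined in this post, so a hypothetical stand-in named lineLengthSquared is used instead, together with a made-up wrapper called example.

```c
struct point {
   int x;
   int y;
};

/* Hypothetical stand-in for drawline(): returns the squared length of the line */
int lineLengthSquared(struct point start, struct point end)
{
   int dx = end.x - start.x;
   int dy = end.y - start.y;
   return dx * dx + dy * dy;
}

int example(void)
{
   /* The designators make the order of the members irrelevant */
   return lineLengthSquared((struct point){.x = 1, .y = 1},
                            (struct point){.y = 3, .x = 4});
}
```

The second compound literal lists y before x, yet the function still receives the point (4, 3), which is exactly the independence from member order that designators give you.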

Volatile and Const Structure declarations

When declaring structures it is often necessary for us to make the structure volatile, this is especially important if you are going to overlay the structure onto registers (a register map) of the microprocessor. It is important to understand what happens to the members of the structure in terms of volatility depending on how we declare it.

This is best explained using the examples from the C99 standard.

struct s {  // Struct declaration
  int i; 
  const int ci; 
};

// Definitions 
struct s s;
const struct s cs;
volatile struct s vs;

// The various members have the types:
s.i      // int
s.ci     // const int
cs.i     // const int
cs.ci    // const int
vs.i     // volatile int
vs.ci    // volatile const int

Bit Fields

It is possible to include in the declaration of a structure how many bits each member should occupy. This is known as "Bit Fields". It can be tricky to write portable code using bit-fields if you are not aware of their limitations.

Firstly the standard states that "A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type." Further to this it also states that "As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int, then it is implementation-defined whether the bit-field is signed or unsigned."

This means effectively that unless you use _Bool, signed int or unsigned int your structure is not guaranteed to be portable to other compilers or platforms; with plain int even the signedness of the member is up to the compiler. The recommended way to declare portable and robust bitfields is as follows.

struct bitFields {
   unsigned enable : 1;  
   unsigned count  : 3;
   unsigned mode   : 4;
};

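When you use any of the members in an expression they will be promoted to a full sized integer during the expression evaluation. Note that a bit-field narrower than int is promoted to (signed) int rather than unsigned int, because int can represent all of its values. When assigning back to the members values will be truncated to the allocated size.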
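The truncation on assignment can be seen in a small sketch like the one below. The helper storeCount is invented for this illustration and simply writes a value into the 3-bit member and reads it back.

```c
struct bitFields {
   unsigned enable : 1;
   unsigned count  : 3;
   unsigned mode   : 4;
};

/* Stores value in the 3-bit count member and returns what was actually kept */
unsigned storeCount(unsigned value)
{
   struct bitFields f = {0};
   f.count = value;   /* only the low 3 bits survive the assignment */
   return f.count;
}
```

With this layout storeCount(5) gives back 5, while storeCount(9) gives back 1 because 9 does not fit in 3 bits.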

It is possible to use anonymous bitfields to pad out your structure so you do not need to use dummy names in a struct if you build a register map with some unimplemented bits. That would look like this:

struct bitFields {
   unsigned int enable : 1;  
   unsigned : 3;
   unsigned int mode   : 4;
};

This declares a structure type which is at least 8 bits in size and has 3 padding bits between the members "enable" and "mode".

The caveat here is that the standard does not specify how the bits have to be packed into the structure, and different systems do in fact pack bits in different orders (e.g. some may pack from LSB while others will pack from MSB first). This means that you should not rely on specific bits in your struct being in specific locations. All you can rely on is that in 2 structs of the same type the bits will be packed in corresponding locations. When you are dealing with communication systems and sending structures containing bitfields over the wire you may get a nasty surprise if bits are in a different order on the receiver side. And this also brings us to the next possible inconsistency - packing.

This means that for all the syntactic sugar offered by bitfields, it is still more portable to use shifting and masking. By doing so you can select exactly where each bit will be packed, and on most compilers this will result in the same amount of code as using bitfields.
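As a rough sketch of the shifting and masking alternative, the listing below packs the same three fields into one byte. The bit positions, the macro names and the helpers setField and getField are assumptions chosen for this example, not something taken from a particular device.

```c
#include <stdint.h>

/* Assumed layout: bit 0 = enable, bits 1..3 = count, bits 4..7 = mode */
#define ENABLE_SHIFT 0u
#define ENABLE_MASK  (0x01u << ENABLE_SHIFT)
#define COUNT_SHIFT  1u
#define COUNT_MASK   (0x07u << COUNT_SHIFT)
#define MODE_SHIFT   4u
#define MODE_MASK    (0x0Fu << MODE_SHIFT)

/* Clears the field selected by mask, then writes value into it */
uint8_t setField(uint8_t reg, uint8_t mask, unsigned shift, uint8_t value)
{
   return (uint8_t)((reg & (uint8_t)~mask) | ((uint8_t)(value << shift) & mask));
}

/* Extracts the field selected by mask and moves it down to bit 0 */
uint8_t getField(uint8_t reg, uint8_t mask, unsigned shift)
{
   return (uint8_t)((reg & mask) >> shift);
}
```

Because the positions are spelled out in the macros, the byte looks the same on every compiler, which is what you want when it goes over the wire.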

Padding, Packing and Alignment

This is going to be less applicable on a PIC16, but if you write portable code or work with larger processors this becomes very important.

Typically padding will happen when you declare a structure that has members which are smaller than the fastest addressable unit of the processor. The standard allows the compiler to place padding, or unused space, in between your structure members so that each member is aligned for the fastest access, in exchange for using more RAM. This is called "Alignment". On embedded applications RAM is usually in short supply so this is an important consideration.

You will see e.g. on a 32-bit processor that the size of structures containing an int will increment in multiples of 4.

The following example shows the definition of some structures and their sizes on a 32-bit processor (my i7 in this case running macOS). And yes it is a 64 bit machine but I am compiling for 32-bit here.

// This struct will likely result in sizeof(struct iAmPadded) == 12
struct iAmPadded {
   char c;
   int i;
   char c2;
};

// Same members, reordered (and renamed so both can live in one file).
// This results in sizeof(struct iAmLessPadded) == 8 (on GCC on my i7 Mac) or it could be 12 depending on the compiler used.
struct iAmLessPadded {
   char c;
   char c2;
   int i;
};

Many compilers/linkers will have settings with regards to "Packing" which can be set either globally or per structure. Packing will instruct the compiler to avoid padding in between the members of a structure if possible and can save a lot of memory. It is also critical to understand packing and padding if you are making register overlays or constructing packets to be sent over communication ports.

If you are using GCC packing is going to look like this:

// This struct on gcc on a 32-bit machine has sizeof(struct iAmPadded) == 6
struct  __attribute__((__packed__)) iAmPadded {
   char c;
   int i;
   char c2;
};

// OR this has the same effect for GCC (use one form or the other, not both in the same file)
#pragma pack(1)
struct iAmPadded {
   char c;
   int i;
   char c2;
};
#pragma pack()   // Restore the default packing for the declarations that follow

If you are writing code on e.g. an AVR which uses GCC and you want to use the same library on your PIC32 or your Cortex-M0 32-bitter then you can instruct the compiler to pack your structures like this and save loads of RAM.

Note that taking the address of members of a packed structure may result in problems on architectures which do not support unaligned memory access, such as SPARC. Also it is not allowed to take the address of a bitfield inside of a structure.

One last note on the use of the sizeof operator.  When applied to an operand that has structure or union type, the result is the total number of bytes in such an object, including internal and trailing padding.
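If you want to see how much padding your own compiler adds, sizeof can be combined with offsetof from stddef.h. The sketch below is one way to do it; the helper names memberBytes, paddingBytes and gapBeforeI are made up, and the actual numbers depend on your compiler and target.

```c
#include <stddef.h>

struct iAmPadded {
   char c;
   int i;
   char c2;
};

/* Bytes occupied by the members themselves */
size_t memberBytes(void)
{
   return sizeof(char) + sizeof(int) + sizeof(char);
}

/* Bytes of padding the compiler added (internal plus trailing) */
size_t paddingBytes(void)
{
   return sizeof(struct iAmPadded) - memberBytes();
}

/* Padding inserted between c and i */
size_t gapBeforeI(void)
{
   return offsetof(struct iAmPadded, i) - sizeof(char);
}
```

On a typical 32-bit target I would expect gapBeforeI() to report 3 and paddingBytes() to report 6, but treat those as illustrative values rather than guarantees.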

Deep and Shallow copy

Another one of those areas where we see countless bugs. Structures made up of standard integer and float types do not suffer from this problem, but when you start using pointers in your structures this can turn into a problem real fast.

Generally it is perfectly fine to create copies of structures by passing them into functions or using the assignment operator "=".

Example

struct point {
   int a;
   int b;
};

void function(void)
{
    struct point point1 = {1,2};
    struct point point2;

    point2 = point1;   // This will copy all the members of point1 into point2
}

Similarly when we call a function and pass in a struct a copy of the structure will be made on the parameter stack in the same way.

When the structure however contains a pointer we must be careful because the process will copy the address stored in the pointer but not the data which the pointer is pointing to. When this happens you end up with 2 structures containing pointers pointing to the same data, which can cause some very strange behavior and hard to track down bugs. Such a copy, where only the pointers are copied, is called a "shallow copy" of the structure. The alternative is to allocate memory for members being pointed to by the structure and create what is called a "deep copy" of the structure which is the safe way to do it.
 
We probably see this with strings more often than with any other type of pointer e.g.

struct person {
   char* firstName;
   char* lastName;
};

// Function to read person name from serial port
void getPerson(struct person*  p);

void f(void)
{
   struct person myClient = {"John", "Doe"};  // The structure now points to the constant strings

   // Read the person data
   getPerson(&myClient);
}

// The intention of this function is to read 2 strings and assign the names of the struct person
void getPerson(struct person*  p)
{
     char first[32];
     char last[32];
  
     Uart1_Read(first, 32);
     Uart1_Read(last, 32);
  
     p->firstName = first;   // p is a pointer, so the members are accessed with ->
     p->lastName = last;
}

// The problem with this code is that it is easy for it to look like it works.

The problem with this code is that it will very likely pass most tests you throw at it, but it is tragically broken. The 2 buffers, first and last, are allocated on the stack and when the function returns the memory is freed, but still contains the data you received. Until another function is called AND that function allocates auto variables on the stack the memory will remain intact. This means at some later stage the structure will become invalid and you will not be able to understand how. If you call the function twice you will later find that both variables you passed in contain the same names.

Always double check and be mindful where the pointers are pointing and what the lifetime of the memory allocated is. Be particularly careful with memory on the stack which is always short-lived.

For a deep copy you would have to allocate new memory for the members of the structure that are pointers and copy the data from the source structure to the destination structure manually. Be particularly careful when structures are passed into a function by value as this makes a copy of the structure which points to the same data, so in this case if you re-allocate the pointers you are updating the copy and not the source structure! For this reason it is best to always pass structures by reference (function should take a pointer to a structure) and not by value. Besides if data is worth placing in a structure it is probably going to be bigger than a single pointer and passing the structure by reference would probably be much more efficient!
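A deep copy of the struct person used earlier might be sketched as follows. This assumes a heap is available (malloc is often avoided or missing on small 8-bit parts, where statically allocated buffers would be used instead), and the names copyString, personDeepCopy and personFree are invented for this example.

```c
#include <stdlib.h>
#include <string.h>

struct person {
   char *firstName;
   char *lastName;
};

/* Hypothetical helper: returns a heap-allocated copy of s, or NULL */
static char *copyString(const char *s)
{
   size_t len = strlen(s) + 1;
   char *copy = malloc(len);
   if (copy != NULL) {
      memcpy(copy, s, len);
   }
   return copy;
}

/* Deep copy: dst gets its own storage. Returns 0 on success, -1 on failure */
int personDeepCopy(struct person *dst, const struct person *src)
{
   dst->firstName = copyString(src->firstName);
   dst->lastName  = copyString(src->lastName);
   if (dst->firstName == NULL || dst->lastName == NULL) {
      free(dst->firstName);   /* free(NULL) is harmless */
      free(dst->lastName);
      dst->firstName = NULL;
      dst->lastName = NULL;
      return -1;
   }
   return 0;
}

/* Every deep copy must eventually be released */
void personFree(struct person *p)
{
   free(p->firstName);
   free(p->lastName);
   p->firstName = NULL;
   p->lastName = NULL;
}
```

Note that both functions take pointers to the structures, so the copy they fill in is the caller's own variable and not a temporary.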

Comparing Structs

Although it is possible to assign structs using "=" it is NOT possible to compare structs using "==". The most common solution people go for is to use memcmp with sizeof(struct) to try and do the comparison. This is however not a safe way to compare structures and can lead to really hard to track down bugs!

The problem is that structures can have padding as described above, and when structures are copied or initialized there is no obligation on the compiler to set the values of the locations that are just padding, so it is not safe to use memcmp to compare structures. Even if you use packing the structure may still have trailing padding after the data to meet alignment requirements. The only time using memcmp is going to be safe is if you used memset or calloc to clear out all of the memory yourself, but always be mindful of this caveat.
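A safer alternative is to write a small comparison function that looks only at the members. Here is a minimal sketch for the padded structure from earlier; the function name iAmPaddedEqual is made up for this example.

```c
#include <stdbool.h>

struct iAmPadded {
   char c;
   int i;
   char c2;
};

/* Compares only the members, so padding bytes never take part */
bool iAmPaddedEqual(const struct iAmPadded *a, const struct iAmPadded *b)
{
   return a->c == b->c && a->i == b->i && a->c2 == b->c2;
}
```

If the structure contains pointers you also have to decide whether two structs are equal when the pointers match or when the data they point to matches, which is the same deep versus shallow question as with copying.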

Conclusion

Structs are an important part of the C language and a powerful feature, but it is important that you ensure you fully understand all the intricacies involved in structs. There is as always a lot of bad advice and bad code out there in the wild wild west known as the internet so be careful when you find code in the wild, and just don't use typedef on your structs!

References

As always the Wikipedia page is a good resource

Link to a PDF of the committee draft which was later approved as the C99 standard

Linus Torvalds explains why you should not use typedef everywhere for structs

Good write-up on packing of structures

Bit Fields discussed on Hackaday

 

 



2 Comments


Recommended Comments

Wandering to here, just want to ask some questions :      
1.
"The type of initialization of a union using the second member of the union was not possible before C99"  why? 

2.
typedef signed int t;

      typedef int plain;
      struct tag {
            unsigned t:4;
"one named t that contains values"  All I see is a unname member.


3. Why on earth can't I use navigation keys and Enter here???

4 hours ago, witz said:

Wandering to here, just want to ask some questions :      
1.
"The type of initialization of a union using the second member of the union was not possible before C99"  why? 

Because in C89 this would be a syntax error. The syntax did not exist until it was introduced in C99 together with named initializers. In C89 it was not possible to initialize a union by its second member because it was not possible to name the target member. This is important because many compilers today are still not fully C99 compliant and support only some of its constructs, which means that if you use named initializers your code may be less portable because some compilers may still choke on that syntax.

4 hours ago, witz said:

2.
typedef signed int t;

      typedef int plain;
      struct tag {
            unsigned t:4;
"one named t that contains values"  All I see is a unname member.

This example is verbatim from the C99 standard section 6.7.7 paragraph 6. The answer to your question is right there in the last sentence "The first two bit-field declarations differ in that unsigned is a type specifier (which forces t to be the name of a structure member), while const is a type qualifier (which modifies t which is still visible as a typedef name). "

So in other words because of the "unsigned" the t is forced to be the name of the member and it is NOT the type of the member as you may expect. This means that when used like that the member is indeed not unnamed, it is named as t of type unsigned and the typedef from above is not applicable at all. I know, that is why even in the standard they refer to this as "obscure"!

4 hours ago, witz said:

3. Why on earth can't I use navigation keys and Enter here???

I have no idea, navigation keys and Enter work just fine for me. I am using Google Chrome, perhaps it is the browser or a setting. Which browser are you using?

 
