Initializing Strings in C
Embedded applications are hard for a large number of reasons, but one of the main issues is memory. Today I want to talk about how our C variables get initialized and a few assumptions we make as we use C to write embedded software.
Let us take a few simple declarations such as we might make every day.
char *string1 = "string1";
const char *string2 = "string2";
char const *string3 = "string3";
char * const string4 = "string4";
char const * const string5 = "string5";
In C99 these will all compile just fine, but they are very different. In C++11, two of them will produce a warning. We shall discuss them in order.
Duration & Scope
The first thing to notice is these variables are all declared outside of a function. That affects them in the following ways:
- External Linkage - i.e. they are global.
- Static Storage Duration - i.e. they are always active.
- Allocated and initialized before control goes to main().
Of course, if the keyword static had been placed before one of the declarations, the external linkage would have changed to internal linkage, preventing the variable from being accessed outside the compilation unit.
Because they have static storage duration, they will be initialized. If no initializer is specified, they will be initialized to 0 or NULL.
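As a minimal sketch of that default-initialization rule, the following uses hypothetical variable names (they are not part of the declarations discussed in this post) and simply checks that file-scope objects without an initializer start out as 0 or NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, chosen only for this illustration. */
char *unset_pointer;            /* external linkage, no initializer: NULL */
int unset_counter;              /* external linkage, no initializer: 0    */
static char *private_pointer;   /* internal linkage, still NULL           */

/* Checks the values the objects hold before anything has written to them. */
void check_defaults(void)
{
    assert(unset_pointer == NULL);
    assert(unset_counter == 0);
    assert(private_pointer == NULL);
}
```

Calling check_defaults() as the first statement of main() should pass on a conforming implementation, because the zeroing happens before main() gets control.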
In each case, these have an initializer. So what is being initialized? Each variable is some type of character pointer, so the value stored in the variable will be an address: the address of a string of characters. The image on the right shows a data segment with the initialized variables.
C does not actually specify where these strings need to be located. In GCC, they will be placed in the data segment and the variable will get the address of the string. However, in XC8, they will be placed in the text segment (in FLASH). In the Arduino (GCC) environment you can force a string into FLASH by adding PROGMEM or using the F (FLASH) macro like this: F("String").
If the string is NOT in FLASH memory, then CINIT will copy the string from the text segment into the data segment. Then the variables will be initialized with pointers to the strings.
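To make the "the variable only holds an address" point concrete, here is a small sketch. The helper name check_pointer_vs_text is an assumption made for this example. It shows that the variable has the size of a pointer while the characters it refers to are stored elsewhere.

```c
#include <assert.h>
#include <string.h>

const char *string2 = "string2";

/* The variable is just a pointer; the seven characters live somewhere else. */
void check_pointer_vs_text(void)
{
    assert(sizeof string2 == sizeof(const char *)); /* size of an address  */
    assert(strlen(string2) == 7);                   /* length of the text  */
    assert(string2[0] == 's');                      /* first character     */
}
```

Where the characters end up (RAM or FLASH) is decided by the toolchain, which is why the sketch checks only sizes and contents and not addresses.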
String 1
The declaration for string 1 is simply a char *. This is a pointer to character(s). The pointer itself may be changed at run time, so the following is legal:
string1 = string2;
Of course, you will get a warning because string2 is a const char *.
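A sketch of that assignment is shown here with an explicit cast, which is my addition and not something the post recommends. The cast only makes the discarded const visible and silences the warning. It does not make it safe to write through string1 afterwards.

```c
#include <assert.h>
#include <string.h>

char *string1 = "string1";
const char *string2 = "string2";

void repoint_string1(void)
{
    /* Without the cast, the compiler warns that the const qualifier
       on the pointed-to characters is being discarded. */
    string1 = (char *)string2;

    /* Only the pointer changed; both variables now refer to "string2". */
    assert(strcmp(string1, "string2") == 0);
    assert(string1 == string2);
}
```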
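The compiler will also accept this:
string1[3] = 'a';
On a target where the string lives in RAM, this changes the original string into "strang1". Be aware, though, that string1 points at a string literal, and C99 section 6.4.5 paragraph 6 makes modifying a string literal undefined behavior, so this only works where the environment happens to allow it. It is certainly NOT permissible to do the following: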
string1 = string2;
string1[3] = 'a';
If you DO do this, it is likely to compile, but there could be a few problems: string2 is declared as a pointer to const characters, so the string it points to must not be modified.
Here is my compiler reluctantly obeying me and then the program crashes on the write. Just imagine that the string is in FLASH so writing is impossible without specific write sequences which the compiler probably does not know.
In some environments you can get away with this. For example, I was using a Cypress WiFi device that loads the entire FLASH into RAM and then executes it. This code will run and it will not crash. Be very careful in such circumstances, because in a few years you will be tasked to port the program to something else and life will be made hard because you did not fix the warnings. It turns out that the C99 standard leaves the behavior of line 22 undefined: section 6.4.5 paragraph 6 covers attempts to modify a string literal, and section 6.7.3 paragraph 5 covers modifying an object defined as const through a non-const lvalue. Your environment can choose to do anything it wants.
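If the goal is a string that really can be edited at run time, one common approach (my suggestion, not something from the screenshots above) is to declare an array instead of a pointer. The array owns its characters in RAM, so writing to it is well defined. The names buffer and edit_buffer are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* An array, not a pointer: the literal is only the initial contents,
   and the eight bytes (seven characters plus the terminator) belong
   to 'buffer' itself. */
char buffer[] = "string1";

void edit_buffer(void)
{
    buffer[3] = 'a';                        /* well defined: writable RAM */
    assert(strcmp(buffer, "strang1") == 0);
}
```

The cost is RAM: the whole string occupies the data segment for the life of the program, which matters on small embedded parts.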
String 2 & 3
The declaration for string 2 is a const char * and string 3 is char const *. These are IDENTICAL. This is in section 6.7.3 paragraph 9: "the order of type qualifiers within a list of specifiers or qualifiers does not affect the specified type". So these are both pointers to constant character(s). In a nice compiler, these characters would be stored in FLASH memory and never copied into RAM. That would be the most memory efficient. However, GCC will copy these from FLASH into RAM and then use the address of these strings to initialize the variables.
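One way to let the compiler confirm that the two spellings name the same type is a C11 _Generic selection. This is only a sketch: it needs a C11-capable compiler (the post itself talks about C99), and the macro name IS_PTR_TO_CONST_CHAR is made up for the example.

```c
#include <assert.h>

char *string1 = "string1";
const char *string2 = "string2";
char const *string3 = "string3";

/* Evaluates to 1 when the expression has type 'const char *', else 0.
   The choice is made at compile time from the static type. */
#define IS_PTR_TO_CONST_CHAR(x) _Generic((x), const char *: 1, default: 0)

void check_same_type(void)
{
    assert(IS_PTR_TO_CONST_CHAR(string2) == 1);
    assert(IS_PTR_TO_CONST_CHAR(string3) == 1);  /* same type as string2 */
    assert(IS_PTR_TO_CONST_CHAR(string1) == 0);  /* plain char * differs */
}
```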
String 4
This declares a const pointer. That is, the pointer value cannot change but the data pointed to by the pointer CAN change.
Note the ERROR on line 22. Line 21 is perfectly fine. The data pointed to by the pointer is NOT const so it is allowed to change.
In C++11, the original declaration will have a warning because a char * is being initialized from a string literal, which is const in C++. Never mind that the pointer is const.
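The sketch below shows the same kind of pointer aimed at a writable array instead of a string literal, so that the write is well defined. The array name storage4 is an assumption added for this example. The reassignment is left commented out because it does not compile.

```c
#include <assert.h>
#include <string.h>

char storage4[] = "string4";        /* writable array (hypothetical name) */
char * const string4 = storage4;    /* const pointer to non-const char    */

void edit_through_string4(void)
{
    string4[3] = 'a';               /* allowed: the data is not const     */
    /* string4 = storage4 + 1; */   /* error: the pointer itself is const */
    assert(strcmp(storage4, "strang4") == 0);
}
```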
String 5
This is a combination of both sorts of const: a const pointer pointing at const character(s). It can be initialized from a const string just fine, but you will not be allowed to change either the pointer or the data it points to.
Both line 21 and line 22 have errors and not simply warnings.
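A short sketch of what is still allowed: reading. The two rejected statements are kept as comments so the block compiles, and the helper name check_string5 is hypothetical.

```c
#include <assert.h>
#include <string.h>

char const * const string5 = "string5";

void check_string5(void)
{
    /* string5 = "other"; */    /* error: the pointer is const         */
    /* string5[3] = 'a';  */    /* error: the pointed-to data is const */

    /* Reading through the pointer is always fine. */
    assert(strlen(string5) == 7);
    assert(string5[3] == 'i');
}
```

Because nothing can legally write to either the pointer or the characters, this is the form a toolchain is most free to keep entirely in FLASH.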
We will do more of these variable initializer posts. The language rules are very clear but there are a few constructions that we don't see very often. And even worse, the assumptions we make about the syntax work often enough that we end up with some very strange notions on what the language allows.
A good resource for testing your knowledge about strange C declarations is this website:
Good Luck