Initializing Strings in C


N9WXU


Embedded applications are hard for a large number of reasons, but one of the main issues is memory.  Today I want to talk about how our C variables get initialized and a few assumptions we make as we use C to write embedded software.

Let us take a few simple declarations such as we might make every day.

char *string1 = "string1";
const char *string2 = "string2";
char const *string3 = "string3";
char * const string4 = "string4";
char const * const string5 = "string5";

 

In C99 these will all compile just fine, but they mean very different things. In C++11, two of them (string1 and string4) produce a warning, because in C++ a string literal is an array of const char. We shall discuss them in order.

Duration & Scope

The first thing to notice is that these variables are all declared outside of any function. That gives them the following properties:

  • External linkage, i.e. they are global.
  • Static storage duration, i.e. they exist for the entire run of the program.
  • They are allocated and initialized before control passes to main().

[Figure: a data segment holding the initialized variables]

Of course, had the keyword static been placed before one of the declarations, its external linkage would have become internal linkage, preventing the variable from being accessed outside its compilation unit. Its storage duration would still be static.

Because they have static storage duration, they will always be initialized. If no initializer is specified, they are initialized to 0 (or NULL for pointers).

In each case, these have an initializer. So what is being initialized? Each variable is some type of character pointer, so the value stored in the variable is an address: the address of a string of characters. The image on the right shows a data segment with the initialized variables.

C does not actually specify where these string literals must be located. With GCC on AVR (the Arduino toolchain), for example, they are placed in the data segment and the variable gets the address of the string. In XC8, however, they are placed in the text segment (in FLASH). In the Arduino (GCC) environment you can force a string into FLASH by adding PROGMEM or by using the F (FLASH) macro like this: F("String").

If the string is NOT kept in flash memory, the startup code (CINIT) copies the string from the text segment into the data segment. Then the variables are initialized with pointers to the strings.

String 1

The declaration for string 1 is simply a char *, a pointer to character(s). The pointer itself may be changed at run time; for example, the following is legal:

string1 = string2;

Of course, you will get a warning, because string2 is a const char * and the assignment discards the const qualifier.

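The compiler will also accept this:

string1[3] = 'a';

which attempts to change the original string into "strang1". Be careful, though: string1 points at a string literal, and modifying a string literal is undefined behavior (C99 section 6.4.5 paragraph 6), so whether this works depends on where your toolchain put the literal. What is certainly NOT permissible is the following: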

string1 = string2;
string1[3] = 'a';

If you DO do this, it is likely to compile (with the warning above), but there will be problems: string2 was declared as pointing to const characters, so they must not be modified.

[Screenshot: the compiler warns, then the program crashes on the write]

Here my compiler reluctantly obeys me, and then the program crashes on the write. Now imagine that the string is in FLASH: writing is impossible without specific write sequences, which the compiler probably does not know.

In some environments you can get away with this. For example, I was using a Cypress WiFi device that loads the entire FLASH into RAM and then executes it; there this code runs and does not crash. Be very careful in such circumstances, because in a few years you will be tasked with porting the program to something else, and life will be hard because you did not fix the warnings. The C99 standard makes the behavior of line 22 undefined (section 6.7.3 paragraph 5 covers modifying a const-qualified object through a non-const lvalue, and section 6.4.5 paragraph 6 covers modifying a string literal). Your environment can choose to do anything it wants.

String 2 & 3

The declaration for string 2 is const char * and for string 3 is char const *. These are IDENTICAL. Section 6.7.3 paragraph 9 of C99 says "the order of type qualifiers within a list of specifiers or qualifiers does not affect the specified type". So both are pointers to constant character(s). A nice compiler would store these characters in FLASH memory and never copy them into RAM, which would be the most memory-efficient choice. However, GCC (on AVR, for instance) will copy them from FLASH into RAM and then use the addresses of those copies to initialize the variables.

String 4

This declares a const pointer. That is, the pointer value cannot change, but the data pointed to by the pointer CAN change.

[Screenshot: compiler output for string4]

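Note the ERROR on line 22. Line 21 is perfectly fine as far as the compiler is concerned: the data pointed to by the pointer is NOT const, so it is allowed to change. Keep in mind, though, that string4 was initialized with a string literal, so actually performing that write at run time is the same undefined behavior as with string 1.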

In C++11, the original declaration produces a warning because a char * is being initialized with a string literal, which in C++ is an array of const char. It does not matter that the pointer itself is const.

String 5

This is a combination of both kinds of const: a const pointer pointing at const character(s). It can be initialized from a string literal just fine, but you will not be allowed to change either the pointer or the data it points to.

[Screenshot: compiler output for string5]

Both line 21 and line 22 produce errors, not just warnings.

We will do more of these variable initializer posts. The language rules are very clear, but there are a few constructions that we don't see very often. Even worse, the assumptions we make about the syntax work often enough that we end up with some very strange notions of what the language allows.

A good resource for testing your knowledge about strange C declarations is this website:

https://www.cdecl.org

[Screenshot: cdecl.org]

Good Luck

Recommended Comments

It is also permissible to do this:

string1[3] = 'a';

Which will change the original string into "striag1".

I believe the original string will change to "strang1"

13 hours ago, IamGroot said:

It is also permissible to do this:


string1[3] = 'a';

Which will change the original string into "striag1".

I believe the original string will change to "strang1"

Thanks for pointing out the mistake, we have updated the text accordingly.

12 hours ago, IamGroot said:

string1[3] = 'a';

This gave a segmentation fault on compilation 

 

Looks like it gave the segfault when running? And yes, that is what I would expect, because on a PC your code is trying to write to memory that does not permit writes. On an embedded system the behavior will depend a lot on the underlying system. Some systems will actually crash in some way; others, like XC8-based PIC microcontrollers, copy the read-only section into RAM, so the code will actually work. This is why it is so dangerous: the behavior depends on the target and the toolchain, and when the code is one day tried on another system it could be a real challenge to find the mistake, because it is so easily masked.
