Jump to content
 

Leaderboard


Popular Content

Showing content with the highest reputation since 12/23/2018 in Blog Entries

  1. 3 points
    When comparing CPU's and architectures it is also a good idea to compare the frameworks and learn how the framework will affect your system. In this article I will be comparing a number of popular Arduino compatible systems to see how different "flavors" of Arduino stack up in the pin toggling test. When I started this effort, I thought it would be a straight forward demonstration of CPU efficiency, clock speed and compiler performance on the one side against the Arduino framework implementation on the other. As is often the case, if you poke deeply into even the most trivial of systems you will always find something to learn. As I look around my board stash I see that there are the following Arduino compatible development kits: Arduino Nano Every (ATMega 4809 @ 20MHz AVR Mega) Mini Nano V3.0 (ATMega 328P @ 16MHz AVR) RobotDyn SAMD21 M0-Mini (ATSAMD21G18A @ 48MHz Cortex M0+) ESP-12E NodeMCU (ESP8266 @ 80MHz Tenselica) Teensy 3.2 (MK20DX256VLH7 @ 96MHz Cortex M4) ESP32-WROOM-32 (ESP32 @ 240MHz Tenselica) And each of these kits has an available Arduino framework. Say what you will about the Arduino framework, there are some serious advantages to using it and a few surprises. For the purpose of this testing I will be running one program on every board. I will use vanilla "Arduino" code and make zero changes for each CPU. The Arduino framework is very useful for normalizing the API to the hardware in a very consistent and portable manner. This is mostly true at the low levels like timers, PWM and digital I/O, but it is very true as you move to higher layers like the String library or WiFi. Strangely, there are no promises of performance. For instance, every Arduino program has a setup() function where you put your initialization and a loop() function that is called very often. With this in mind it is easy to imagine the following implementation: extern void setup(void); extern void loop(void); void main(void) { setup(); while(1) { loop(); } } And in fact when you dig into the AVR framework you find the following code in main.cpp int main(void) { init(); initVariant(); #if defined(USBCON) USBDevice.attach(); #endif setup(); for (;;) { loop(); if (serialEventRun) serialEventRun(); } return 0; } There are a few "surprises" that really should not be surprises. First, the Arduino environment needs to be initialized (init()), then the HW variant (initVariant()), then we might be using a usb device so get USB started (USBDevice.attach()) and finally, the user setup() function. Once we start our infinite loop. Between calls to the loop function the code maintains the serial connection which could be USB. I suppose that other frameworks could implement this environment a little bit differently and there could be significant consequences to these choices. The Test For this test I am simply going to initialize 1 pin and then set it high and low. Here is the code. void setup() { pinMode(2,OUTPUT); } void loop() { digitalWrite(2,HIGH); digitalWrite(2,LOW); } I am expecting this to make a short high pulse and a slightly longer low pulse. The longer low pulse is to account for the extra overhead of looping back. This is not likely to be as fast as the pin toggles Orunmila did in the previous article but I do expect it to be about half as fast. Here are the results. The 2 red lines at the bottom are the best case optimized raw speed from Orunmila's comparison. That is a pretty interesting chart and if we simply compare the data from the ATMEGA 4809 both with ASM and Arduino code, you see a 6x difference in performance. Let us look at the details and we will summarize at the end. Nano 328P So here is the first victim. The venerable AVR AT328P running 16MHz. The high pulse is 3.186uS while the low pulse is 3.544uS making a pulse frequency of 148.2kHz. Clearly the high and low pulses are nearly the same so the extra check to handle the serial ports is not very expensive but the digitalWrite abstraction is much more expensive that I was anticipating. Nano Every The Nano Every uses the much newer ATMega 4809 at 20Mhz. The 4809 is a different variant of the AVR CPU with some additional optimizations like set and clear registers for the ports. This should be much faster. The high pulse is 1.192uS and the low pulse is 1.504uS. Again the pulses are almost the same size so the additional overhead outside of the loop function must be fairly small. Perhaps it is the same serial port test. Interestingly, one of the limiting factors of popular Arduino 3d printer controller projects such as GRBL is the pin toggle rate for driving the stepper motor pulses. A 4809 based controller could be 2x faster for the same stepper code. Sam D21 Mini M0 Now we are stepping up to an ARM Cortex M0 at 48Mhz. I actually expect this to be nearly 2x performance as the 4809 simply because the instructions required to set pins high and low should be essentially the same. Wow! I was definitely NOT expecting the timing to get worse than the 4809. The high pulse width is 1.478uS and the low pulse width is 1.916uS making the frequency 294.6kHz. Obviously toggling pins is not a great measurement of CPU performance but if you need fast pin toggling in the Arduino world, perhaps the SAMD21 is not your best choice. Teensy 3.2 This is a NXP Cortex M4 CPU at 96 MHz. This CPU is double the clock speed as the D21 and it is a M4 CPU which has lots of great features, though those features may not help toggle pins quickly. Interesting. Clearly this device is very fast as shown by the short high period of only 0.352uS. But, this framework must be doing quite a lot of work behind the scenes to justify the 2.274uS of loop delay. Looking a little more closely I see a number of board options for this hardware. First, I see that I can disable the USB. Surely the USB is supported between calls to the loop function. I also see a number of compiler optimization options. If I turn off the USB and select the "fastest" optimizations, what is the result? Teensy 3.2, No USB and Fastest optimizations Making these two changes and re-running the same C code produces this result: That is much better. It is interesting to see the compiler change is about 3x faster for this test (measured on the high pulse) and the lack of USB saves about 1uS in the loop rate. This is not a definitive test of the optimizations and probably the code grew a bit, but it is a stark reminder that optimization choices can make a big difference. ESP8266 The ESP8266 is a 32-bit Tenselica CPU. This is still a load/store architecture so its performance will largely match ARM though undoubtedly there are cases where it will be a bit different. The 8266 runs at 80Mhz so I do expect the performance to be similar to the Teensy 3.2. The wildcard is the 8266 framework is intended to support WiFI so it is running FreeRTOS and the Arduino loop is just one thread in the system. I have no idea what that will do to our pin toggle so it is time to measure. Interesting. It is actually quite slow and clearly there is quite a bit of system house-keeping happening in the main loop. The high pulse is only 0.948uS so that is very similar to Nano Every at 1/4th the clock speed. The low pulse is simply slow. This does seem to be a good device for IoT but not for pin toggling. ESP32 The ESP32 is a dual core very fast machine, but it does run the code out of a cache. This is because the code is stored in a serial memory. Of course our test is quite short so perhaps we do not need to fear the cache miss. Like the ESP8266, the Arduino framework is built upon a FreeRTOS task. But this has a second CPU and lots more clock speed so lets look at the results: Interesting, the toggle rate is about 2x the Teensy while the clock speed is about 3x. I do like how the pulses are nearly symmetrical. A quick peek at the source code for the framework shows the Arduino running as a thread but the thread updates the watchdog timer and the serial drivers on each pass through the loop. Conclusions It is very educational to make measurements instead of assumptions when evaluating an MCU for your next project. A specific CPU may have fantastic specifications and even demonstrations but it is critical to include the complete development system and code framework in your evaluation. It is a big surprise to find the 16MHz AVR328P can actually toggle a pin faster than the ESP8266 when used in a basic Arduino project. The summary graph at the top of the article is duplicated here: In this graph, the Pin Toggling Speed is actually only 1/(the high period). This was done on purpose so only the pin toggle efficiency is being compared. In the test program, the low period is where the loop() function ends and other housekeeping work can take place. If we want to compare the CPU/CODE efficiency, we should really normalize the pin toggling frequency to a common clock speed. We can always compensate for inefficiency with more clock speed. This graph is produced by dividing the frequency by the clock speed and now we can compare the relative efficiencies. That Cortex M4 and its framework in the Teensy 3.2 is quite impressive now. Clearly the ESP-32 is pretty good but using its clock speed for the win. The Mega 4809 has a reasonable framework just not enough clock speed. All that aside, the ASM versions (or even a faster framework) could seriously improve all of these numbers. The poor ESP8266 is pretty dismal. So what is happening in the digitalWrite() function that is making this performance so slow? Put another way, what am I getting in return for the low performance? There are really 3 reasons for the performance. Portability. Each device has work to adapt to the pin interface so the price of portability is runtime efficiency Framework Support. There are many functions in the framework that could be affected by the writing to the pins so the digitalWrite function must modify other functions. Application Ignorance. The framework (and this function) cannot know how the system is constructed so they must plan for the worst. Let us look at the digitalWrite for the the AVR void digitalWrite(uint8_t pin, uint8_t val) { uint8_t timer = digitalPinToTimer(pin); uint8_t bit = digitalPinToBitMask(pin); uint8_t port = digitalPinToPort(pin); volatile uint8_t *out; if (port == NOT_A_PIN) return; // If the pin that support PWM output, we need to turn it off // before doing a digital write. if (timer != NOT_ON_TIMER) turnOffPWM(timer); out = portOutputRegister(port); uint8_t oldSREG = SREG; cli(); if (val == LOW) { *out &= ~bit; } else { *out |= bit; } SREG = oldSREG; } Note the first thing is a few lookup functions to determine the timer, port and bit described by the pin number. These lookups can be quite fast but they do cost a few cycles. Next we ensure we have a valid pin and turn off any PWM that may be active on that pin. This is just safe programming and framework support. Next we figure out the output register for the update, turn off the interrupts (saving the interrupt state) set or clear the pin and restore interrupts. If we knew we were not using PWM (like this application) we could omit the turnOffPWM function. If we knew all of our pins were valid we could remove the NOT_A_PIN test. Unfortunately all of these optimizations require knowledge of the application which the framework cannot know. Clearly we need new tools to describe embedded applications. This has been a fun bit of testing. I look forward to your comments and suggestions for future toe-to-toe challenges. Good Luck and go make some measurements. PS: I realize that this pin toggling example is simplistic at best. There are some fine Arduino libraries and peripherals that could easily toggle pins much faster than the results shown here. However, this is a simple Apples to Apples test of identical code in "identical" frameworks on different CPU's so the comparisons are valid and useful. That said, if you have any suggestions feel free to enlighten us in the comments.
  2. 2 points
    Structures in the C Programming Language Structures in C is one of the most misunderstood concepts. We see a lot of questions about the use of structs, often simply about the syntax and portability. I want to explore both of these and look at some best practice use of structures in this post as well as some lesser known facts. Covering it all will be pretty long so I will start off with the basics, the syntax and some examples, then I will move on to some more advanced stuff. If you are an expert who came here for some more advanced material please jump ahead using the links supplied. Throughout I will refer to the C99 ANSI C standard often, which can be downloaded from the link in the references. If you are not using a C99 compiler some things like designated initializers may not be available. I will try to point out where something is not available in older complilers that only support C89 (also known as C90). C99 is supported in XC8 from v2.0 onwards. Advanced topics handled lower down Scope Designated Initializers Declaring Volatile and Const Bit-Fields Padding and Packing of structs and Alignment Deep and Shallow copy of structures Comparing Structs Basics A structure is a compound type in C which is known as an "aggregate type". Structures allows us to use sets of variables together like a single aggregate object. This allows us to pass groups of variables into functions, assign groups of variables to a destination location as a single statement and so forth. Structures are also very useful when serializing or de-serializing data over communication ports. If you are receiving a complex packet of data it is often possible to define a structure specifying the layout of the variables e.g. the IP protocol header structure, which allows more natural access to the members of the structure. Lastly structures can be used to create register maps, where a structure is aligned with CPU registers in such a way that you can access the registers through the corresponding structure members. The C language has only 2 aggregate types namely structures and arrays. A union is notably not considered an aggregate type as it can only have one member object (overlapping objects are not counted separately). [Section "6.5.2 Types" of C99] Syntax The basic syntax for defining a structure follows this pattern. struct [structure tag] { member definition; member definition; ... } [one or more structure variables]; As indicated by the square brackets both the structure tag (or name) and the structure variables are optional. This means that I can define a structure without giving it a name. You can also just define the layout of a structure without allocating any space for it at the same time. What is important to note here is that if you are going to use a structure type throughout your code the structure should be defined in a header file and the structure definition should then NOT include any variable definitions. If you do include the structure variable definition part in your header file this will result in a different variable with an identical name being created every time the header file is included! This kind of mistake is often masked by the fact that the compiler will co-locate these variables, but this kind of behavior can cause really hard to find bugs in your code, so never do that! Declare the layout of your structures in a header file and then create the instances of your variables in the C file they belong to. Use extern definitions if you want a variable to be accessible from multiple C files as usual. Let's look at some examples. Example1 - Declare an anonymous structure (no tag name) containing 2 integers, and create one instance of it. This means allocate storage space in RAM for one instance of this structure on the stack. struct { int i; int j; } myVariableName; This structure type does not have a name, so it is an anonymous struct, but we can access the variables via the variable name which is supplied. The structure type may not have a name but the variable does. When you declare a struct like this it is not possible to declare a function which will accept this type of structure by name. Example 2 - Declare a type of structure which we will use later in our code. Do not allocate any space for it. struct myStruct { int i; int j; }; If we declare a structure like this we can create instances or define variables of the struct type at a later stage as follows. (According to the standard "A declaration specifies the interpretation and attributes of a set of identifiers. A definition of an identifier is a declaration for that identifier that causes storage to be reserved for that object" - 6.7) struct myStruct myVariable1; struct myStruct myVariable2; Example 3 - Declare a type of structure and define a type for this struct. typedef struct myStruct { int i; int j; } myStruct_t; // Not to be confused by a variable declaration // typedef changes the syntax here - myStruct_t is part of the typedef, NOT the struct definition! // This is of course equivalent to struct myStruct { int i; int j; }; // Now if you placed a name here it would allocate a variable typedef struct myStruct myStruct_t; The distinction here is a constant source of confusion for developers, and this is one of many reasons why using typedef with structs is NOT ADVISED. I have added in the references a link to some archived conversations which appeared on usenet back in 2002. In these messages Linus Torvalds explains much better than I can why it is generally a very bad idea to use typedef with every struct you declare as has become a norm for so many programmers today. Don't be like them! In short typedef is used to achieve type abstraction in C, this means that the owner of a library can at a later time change the underlying type without telling users about it and everything will still work the same way. But if you are not using the typedef exactly for this purpose you end up abstracting, or hiding, something very important about the type. If you create a structure it is almost always better for the consumer to know that they are dealing with a structure and as such it is not safe to to comparisons like == to the struct and it is also not safe to copy the struct using = due to deep copy problems (later on I describe these). By letting the user of your structs know explicitly they are using structs when they are you will avoid a lot of really hard to track down bugs in the future. Listen to the experts! This all means that the BEST PRACTICE way to use structs is as follows, Example 4- How to declare a structure, instantiate a variable of this type and pass it into a function. This is the BEST PRACTICE way. struct point { // Declare a cartesian point data type int x; int y; }; void pointProcessor(struct point p) // Declare a function which takes struct point as parameter by value { int temp = p.x; ... // and the rest } void main(void) { // local variables struct point myPoint = {3,2}; // Allocate a point variable and initialize it at declaration. pointProcessor(myPoint); } As you can see we declare the struct and it is clear that we are defining a new structure which represents a point. Because we are using the structure correctly it is not necessary to call this point_struct or point_t because when we use the structure later it will be accompanied by the struct keyword which will make its nature perfectly clear every time it is used. When we use the struct as a parameter to a function we explicitly state that this is a struct being passed, this acts as a caution to the developers who see this that deep/shallow copies may be a problem here and need to be considered when modifying the struct or copying it. We also explicitly state this when a variable is declared, because when we allocate storage is the best time to consider structure members that are arrays or pointers to characters or something similar which we will discuss later under deep/shallow copies and also comparisons and assignments. Note that this example passes the structure to the function "By Value" which means that a copy of the entire structure is made on the parameter stack and this is passed into the function, so changing the parameter inside of the function will not affect the variable you are passing in, you will be changing only the temporary copy. Example 5 - HOW NOT TO DO IT! You will see lots of examples on the web to do it this way, it is not best practice, please do not do it this way! // This is an example of how NOT to do it // This does the same as example 4 above, but doing it this way abstracts the type in a bad way // This is what Linus Torvalds warns us against! typedef struct point_tag { // Declare a cartesian point data type int x; int y; } point_t; void pointProcessor(point_t p) { int temp = p.x; ... // and the rest } void main(void) { // local variables point_t myPoint = {3,2}; // Allocate a point variable and initialize it at declaration. pointProcessor(myPoint); } Of course now the tag name of the struct has no purpose as the only thing we ever use it for is to declare yet another type with another name, this is a source of endless confusion to new C programmers as you can imagine! The mistake here is that the typedef is used to hide the nature of the variable. Initializers As you saw above it is possible to assign initial values to the members of a struct at the time of definition of your variable. There are some interesting rules related to initializer lists which are worth pointing out. The standard requires that initializers be applied in the order that they are supplied, and that all members for which no initializer is supplied shall be initialized to 0. This applies to all aggregate types. This is all covered in the standard section 6.7.8. I will show a couple of examples to clear up common misconceptions here. Desctiptions are all in the comments. struct point { int x; int y; }; void function(void) { int myArray1[5]; // This array has random values because there is no initializer int myArray2[5] = { 0 }; // Has all its members initialized to 0 int myArray3[5] = { 5 }; // Has first element initialized to 5, all other elements to 0 int myArray3[5] = { }; // Has all its members initialized to 0 struct point p1; // x and y both indeterminate (random) values struct point p2 = {1, 2}; // x = 1 and y = 2 struct point p3 = { 1 }; // x = 1 and y = 0; // Code follows here } These rules about initializers are important when you decide in which order to declare your members of your structures. We saw a great example of how user interfaces can be simplified by placing members to be initialized to 0 at the end of the list of structure members when we looked at the examples of how to use RTCOUNTER in another blog post. More details on Initializers such as designated initializers and variable length arrays, which were introduced in C99, are discussed in the advanced section below. Assignment Structures can be assigned to a target variable just the same as any other variable. The result is the same as if you used the assignment operator on each member of the structure individually. In fact one of the enhancements of the "Enhanced Midrange" code in all PIC16F1xxx devices is the capability to do shallow copies of structures faster thought specialized instructions! struct point { // Declare a cartesian point data type int x; int y; }; void main(void) { struct point p1 = {4,2}; // p1 initialized though an initializer-list struct point p2 = p1; // p2 is initialized through assignment // At this point p2.x is equal to p1.x and so is p2.y equal to p1.y struct point p3; p3 = p2; // And now all three points have the same value } Be careful though, if your structure contains external references such as pointers you can get into trouble as explained later under Deep and Shallow copy of structures. Basic Limitations Before we move on to advanced topics. As you may have suspected there are some limitations to how much of each thing you can have in C. The C standard calls these limits Translation Limits. They are a requirement of the C standard specifying what the minimum capabilities of a compiler has to be to call itself compliant with the standard. This ensures that your code will compile on all compliant compilers as long as you do not exceed these limits. The Translation Limits applicable to structures are: External identifiers must use at most 31 significant characters. This means structure names or members of structures should not exceed 31 unique characters. At most 1023 members in a struct or union At most 63 levels of nested structure or union definitions in a single struct-declaration-list Advanced Topics Scope Structure, union, and enumeration tags have scope that begins just after the appearance of the tag in a type specifier that declares the tag. When you use typedef's however the type name only has scope after the type declaration is complete. This makes it tricky to define a structure which refers to itself when you use typedef's to define the type, something which is important to do if you want to construct something like a linked list. I regularly see people tripping themselves up with this because they are using the BAD way of using typedef's. Just one more reason not to do that! Here is an example. // Perfectly fine declaration which compiles as myList has scope inside the curly braces struct myList { struct myList* next; }; // This DOES NOT COMPILE ! // The reason is that myList_t only has scope after the curly brace when the type name is supplied. typedef struct myList { myList_t* next; } myList_t; As you can see above we can easily refer a member of the structure to a pointer of the structure itself when you stay away from typedef's, but how do you handle the more complex case of two separate structures referring to each other? In order to solve that one we have to make use of incomplete struct types. Below is an example of how this looks in practice. struct a; // Incomplete declaration of a struct b; // Incomplete declaration of b struct a { // Completing the declaration of a with member pointing to still incomplete b struct b * myB; }; struct b { // Completing the declaration of b with member pointing to now complete a struct a * myA; }; This is an interesting example from the standard on how scope is resolved. Designated Initializers (introduced in C99) Example 4 above used initializer-lists to initialize the members of our structure, but we were only able to omit members at the end, which limited us quite severely. If we could omit any member from the list, or rather include members by designation, we could supply the initializers we need and let the rest be set safely to 0. This was introduced in C99. This addition had a bigger impact on Unions however. There is a rule in a union which states that initializer-lists shall be applied solely to the first member of the union. It is easy to see why this was necessary, since the members of a each struct which comprizes a union do not have to be the same number of members, it would be impossible to apply a list of constants to an arbitraty member of the union. In many cases this means that designated initializers are the only way that unions can be initialized consistently. Examples with structs. struct multi { int x; int y; int a; int b; }; struct multi myVar = {.a = 5, .b = 3}; // Initialize the struct to { 0, 0, 5, 3 } Examples with a Union. struct point { int x; int y; }; struct circle { struct point center; int radius; }; struct line { struct point start; struct point end; }; union shape { struct circle mCircle; struct line mLine; }; void main(void) { volatile union shape shape1 = {.mLine = {{1,2}, {3,4}}}; // Initialize the union using the line member volatile union shape shape2 = {.mCircle = {{1,2}, 10}}; // Initialize the union using the circle member ... } The type of initialization of a union using the second member of the union was not possible before C99, which also means if you are trying to port C99 code to a C89 compiler this will require you to write initializer functions which are functionally different and your port may end up not working as expected. Initializers with designations can be combined with compound literals. Structure objects created using compound literals can be passed to functions without depending on member order. Here is an example. struct point { int x; int y; }; // Passing 2 anonymous structs into a function without declaring local variables drawline( (struct point){.x=1, .y=1}, (struct point){.y=3, .x=4}); Volatile and Const Structure declarations When declaring structures it is often necessary for us to make the structure volatile, this is especially important if you are going to overlay the structure onto registers (a register map) of the microprocessor. It is important to understand what happens to the members of the structure in terms of volatility depending on how we declare it. This is best explained using the examples from the C99 standard. struct s { // Struct declaration int i; const int ci; }; // Definitions struct s s; const struct s cs; volatile struct s vs; // The various members have the types: s.i // int s.ci // const int cs.i // const int cs.ci // const int vs.i // volatile int vs.ci // volatile const int Bit Fields It is possible to include in the declaration of a structure how many bits each member should occupy. This is known as "Bit Fields". It can be tricky to write portable code using bit-fields if you are not aware of their limitations. Firstly the standard states that "A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type." Further to this it also statest that "As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int, then it is implementation-defined whether the bit-field is signed or unsigned." This means effectively that unless you use _Bool or unsigned int your structure is not guaranteed to be portable to other compilers or platforms. The recommended way to declare portable and robust bitfields is as follows. struct bitFields { unsigned enable : 1; unsigned count : 3; unsigned mode : 4; }; When you use any of the members in an expression they will be promoted to a full sized unsigned int during the expression evaluation. When assigning back to the members values will be truncated to the allocated size. It is possible to use anonymous bitfields to pad out your structure so you do not need to use dummy names in a struct if you build a register map with some unimplemented bits. That would look like this: struct bitFields { unsigned int enable : 1; unsigned : 3; unsigned int mode : 4; }; This declares a variable which is at least 8 bits in size and has 3 padding bits between the members "enable" and "mode". The caveat here is that the standard does not specify how the bits have to be packed into the structure, and different systems do in fact pack bits in different orders (e.g. some may pack from LSB while others will pack from MSB first). This means that you should not rely on the postion of specific position of bits in your struct being in specific locations. All you can rely on is that in 2 structs of the same type the bits will be packed in corresponding locations. When you are dealing with communication systems and sending structures containing bitfields over the wire you may get a nasty surprize if bits are in a different order on the receiver side. And this also brings us to the next possible inconsitency - packing. This means that for all the syntactic sugar offered by bitfields, it is still more portable to use shifting and masking. By doing so you can select exactly where each bit will be packed, and on most compilers this will result in the same amount of code as using bitfields. Padding, Packing and Alignment This is going to be less applicable on a PIC16, but if you write portable code or work with larger processors this becomes very important. Typically padding will happen when you declare a structure that has members which are smaller than the fastest addressible unit of the processor. The standard allows the compiler to place padding, or unused space, in between your structure members to give you the fastest access in exchange for using more RAM. This is called "Alignment". On embedded applications RAM is usually in short supply so this is an important consideration. You will see e.g. on a 32-bit processor that the size of structures will increment in multiples of 4. The following example shows the definition of some structures and their sizes on a 32-bit processor (my i7 in this case running macOS). And yes it is a 64 bit machine but I am compiling for 32-bit here. // This struct will likely result in sizeof(iAmPadded) == 12 struct iAmPadded { char c; int i; char c2; } // This struct results in sizeof(iAmPadded) == 8 (on GCC on my i7 Mac) or it could be 12 depending on the compiler used. struct iAmPadded { char c; char c2; int i; } Many compilers/linkers will have settings with regards to "Packing" which can either be set globally. Packing will instruct the compiler to avoid padding in between the members of a structure if possible and can save a lot of memory. It is also critical to understand packing and padding if you are making register overlays or constructing packets to be sent over communication ports. If you are using GCC packing is going to look like this: // This struct on gcc on a 32-bit machine has sizeof(struct iAmPadded) == 6 struct __attribute__((__packed__)) iAmPadded { char c; int i; char c2; } // OR this has the same effect for GCC #pragma pack(1) struct __attribute__((__packed__)) iAmPadded { char c; int i; char c2; } If you are writing code on e.g. an AVR which uses GCC and you want to use the same library on your PIC32 or your Cortex-M0 32-bitter then you can instruct the compiler to pack your structures like this and save loads of RAM. Note that taking the address of structure members may result in problems on architectures which are not byte-addressible such as a SPARC. Also it is not allowed to take the address of a bitfield inside of a structure. One last note on the use of the sizeof operator. When applied to an operand that has structure or union type, the result is the total number of bytes in such an object, including internal and trailing padding. Deep and Shallow copy Another one of those areas where we see countless bugs. Making structures with standard integer and float types does not suffer from this problem, but when you start using pointers in your structures this can turn into a problem real fast. Generally it is perfectly fine to create copies of structures by passing them into functions or using the assignement operator "=". Example struct point { int a; int b; }; void function(void) { struct point point1 = {1,2}; struct point point2; point2 = point1; // This will copy all the members of point1 into point2 } Similarly when we call a function and pass in a struct a copy of the structure will be made into the parameter stack in the same way. When the structure however contains a pointer we must be careful because the process will copy the address stored in the pointer but not the data which the pointer is pointing to. When this happens you end up with 2 structures containing pointers pointing to the same data, which can cause some very strange behavior and hard to track down bugs. Such a copy, where only the pointers are copied is called a "shallow copy" of the structure. The alternative is to allocate memory for members being pointed to by the structure and create what is called a "deep copy" of the structure which is the safe way to do it. We probably see this with strings more often than with any type of pointer e.g. struct person { char* firstName; char* lastName; } // Function to read person name from serial port void getPerson(struct person* p); void f(void) { struct person myClient = {"John", "Doe"}; // The structure now points to the constant strings // Read the person data getPerson(&myClient) } // The intention of this function is to read 2 strings and assign the names of the struct person void getPerson(struct person* p) { char first[32]; char last[32]; Uart1_Read(first, 32); Uart1_Read(last, 32); p.firstName = first; p.lastName = last; } // The problem with this code is that it is easy for to look like it works. The probelm with this code is that it will very likely pass most tests you throw at it, but it is tragically broken. The 2 buffers, first and last, are allocated on the stack and when the function returns the memory is freed, but still contains the data you received. Until another function is called AND this function allocates auto variables on the stack the memory will reamain intact. This means at some later stage the structure will become invalid and you will not be able to understand how, if you call the function twice you will later find that both variables you passed in contain the same names. Always double check and be mindful where the pointers are pointing and what the lifetime of the memory allocated is. Be particularly careful with memory on the stack which is always short-lived. For a deep copy you would have to allocate new memory for the members of the structure that are pointers and copy the data from the source structure to the destination structure manually. Be particularly careful when structures are passed into a function by value as this makes a copy of the structure which points to the same data, so in this case if you re-allocate the pointers you are updating the copy and not the source structure! For this reason it is best to always pass structures by reference (function should take a pointer to a structure) and not by value. Besides if data is worth placing in a structure it is probably going to be bigger than a single pointer and passing the structure by reference would probably be much more efficient! Comparing Structs Although it is possible to asign structs using "=" it is NOT possible to compare structs using "==". The most common solution people go for is to use memcmp with sizeof(struct) to try and do the comparison. This is however not a safe way to compare structures and can lead to really hard to track down bugs! The problem is that structures can have padding as described above, and when structures are copied or initialized there is no obligation on the compiler to set the values of the locations that are just padding, so it is not safe to use memcmp to compare structures. Even if you use packing the structure may still have trailing padding after the data to meet alignment requirements. The only time using memcmp is going to be safe is if you used memset or calloc to clear out all of the memory yourself, but always be mindful of this caveat. Conclusion Structs are an important part of the C language and a powerful feature, but it is important that you ensure you fully understand all the intricacies involved in structs. There is as always a lot of bad advice and bad code out there in the wild wild west known as the internet so be careful when you find code in the wild, and just don't use typdef on your structs! References As always the WikiPedia page is a good resource Link to a PDF of the comittee draft which was later approved as the C99 standard Linus Torvalds explains why you should not use typedef everywhere for structs Good write-up on packing of structures Bit Fields discussed on Hackaday
  3. 2 points
    Comparing raw pin toggling speed AVR ATmega4808 vs PIC 16F15376 Toggling a pin is such a basic thing. After all, we all start with that Blinky program when we bring up a new board. It is actually a pretty effective way to compare raw processing speed between any two microcontrollers. In this, our first Toe-to-toe showdown, we will be comparing how fast these cores can toggle a pin using just a while loop and a classic XOR toggle. First, let's take a look at the 2 boards we used to compare these cores. These were selected solely because I had them both lying on my desk at the time. Since we are not doing anything more than toggling a pin we just needed an 8-bit AVR core and an 8-bit PIC16F1 core on any device to compare. I do like these two development boards though, so here are the details if you want to repeat this experiment. In the blue corner, we have the AVR, represented by the ATmega4808, sporting an AtMega core (AVRxt in the instruction manual) Clocking at a maximum of 20MHz. We used the AVR-IOT WG Development Board, part number AC164160. This board can be obtained for $29 here: https://www.microchip.com/Developmenttools/ProductDetails/AC164160 Compiler: XC8 v2.05 (Free) In the red corner, we have the PIC, represented by the 16F15376, sporting a PIC16F1 Enhanced Midrange core. Clocking at a maximum of 32MHz. We used the MPLAB® Xpress PIC16F15376 Evaluation Board, part number DM164143. This board can be obtained at $12 here: https://www.microchip.com/developmenttools/ProductDetails/DM164143 Compiler: XC8 v2.05 (Free) Results This is what we measured. All the details around the methodology we used and an analysis of the code follows below and attached you will find all the source code we used if you want to try this at home. The numbers in the graph are pin toggling frequency in kHz after it has been normalized to a 1MHz CPU clock speed. How we did it (and some details about the cores) Doing objective comparisons between 2 very different cores is always hard. We wanted to make sure that we do an objective comparison between the cores which you can use to make informed decisions on your project. In order to do this, we had to deal with the fact that the maximum clock speed of these devices is not the same and also that the fundamental architecture of these two cores is very different. In principle, the AVR is a Load-store Architecture machine with a 1 stage pipeline. This basically means that all ALU operations have to be performed between CPU registers and the RAM is used to load from and store results to. The PIC, on the other hand, uses a Register Memory Architecture, which means in short that some ALU operations can be performed on RAM locations directly and that the machine has a much smaller set of registers. On the PIC all instructions are 1 word in length, which is 14-bits wide, while the data bus is 8-bits in size and all results will be a maximum of 8-bits in size. On the AVR instructions can be 16-bit or 32-bit wide which results in different execution times depending on the instruction. Both processors have a 1 stage pipeline, which means that the next instruction is fetched while the current one is being executed. This means branching causes an incorrect fetch and results in a penalty of one instruction cycle. One major difference is that the AVR, due to its Load-store Architecture, is capable of completing the instruction within as little as just one clock cycle. When instructions need to use the data bus they can take up to 5 clock cycles to execute. Since the PIC has to transfer data over the bus it takes multiple cycles to execute an instruction. In keeping with the RISC paradigm of highly regular instruction pipeline flow, all instructions on the PIC take 4 clock cycles to execute. All of this just makes it tricky and technical to compare performance between these processors. What we decided to do is rather take typical tasks we need the CPU to perform which occurs regularly in real programs and simply measure how fast each CPU can perform these tasks. This should allow you to work backwards from what your application will be doing during maximum throughput pressure on the CPU and figure out which CPU will perform the best for your specific problem. Round 1: Basic test For the first test, we used virtually the same code on both processors. Since both of these are supported by MCC it was really easy to get going. We created a blank project for the target CPU Fired up MCC Adjusted the clock speed to the maximum possible Clicked in the pin manager to make a single pin on PORTC an output Hit generate code. After this all we added was the following simple while loop: PIC AVR while (1) { LATC ^= 0xFF; } while (1) { PORTC.OUT ^= 0xFF; } The resulting code produced by the free compilers (XC8 v2.05 in both cases) was as follows, interestingly enough both loops had the same number of instructions (6 in total) including the loop jump. This is especially interesting as it will show how the execution of a same-length loop takes on each of these processors. You will notice that without optimization there is some room for improvement, but since this is how people will evaluate the cores at first glance we wanted to go with this. PIC AVR .tg {border-collapse:collapse;border-spacing:0;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg .tg-0lax{text-align:left;vertical-align:top} Address Hex Instruction 07B3 30FF MOVLW 0xFF 07B4 00F0 MOVWF __pcstackCOMMON 07B5 0870 MOVF __pcstackCOMMON, W 07B6 0140 MOVLB 0x0 07B7 069A XORWF LATC, F 07B8 2FB3 GOTO 0x7B3 Address Hex Instruction 017D 9180 LDS R24, 0x00 017E 0444 017F 9580 COM R24 0180 9380 STS 0x00, R24 0181 0444 0182 CFFA RJMP 0x17D We used my Saleae logic analyzer to capture the signal and measure the timing on both devices. Since the Saleae is thresholding the digital signal and the rise and fall times are not always identical you will notice a little skew in the measurements. We did run everything 512x slower to confirm that this was entirely measurement error, so it is correct to round all times to multiples of the CPU clock in all cases here. PIC AVR Analysis For the PIC The clock speed was 32MHz. We know that the PIC takes 4 clock cycles to execute one instruction, which gives us an expected instruction rate of one instruction every 125ns. Rounding for measurement errors we see that the PIC has equal low and high times of 875ns. That is 7 instruction cycles for each loop iteration. To verify if this makes sense we can look at the ASM. We see 6 instructions, the last of which is a GOTO, which we know will take 2 instruction cycles to execute. Using that fact we can verify that the loop repeats every 7 instruction cycles as expected (7 x 125ns = 875ns.) For the AVR The clock speed was 20MHz. We know that the AVR takes 1 clock cycle per instruction, which gives us an expected instruction rate of one instruction every 50ns. Rounding for measurement errors we see that the AVR has equal low and high times of 400ns. That is 8 instruction cycles for each loop iteration. To verify if this makes sense we again look at the ASM. We see 4 instructions, the last of which is an RJMP, which we know will take 2 instruction cycles to execute. We also see one LDS which takes 3 cycles because it is accessing sram, and one STS instruction which will each take 2 cycles and a Complement instruction which takes 1 more. Using those facts we can verify that the loop should repeat every 8 instruction cycles as expected (8 x 50ns = 400ns.) Comparison Since the 2 processors are not running at the same clock speed we need to do some math to get a fair comparison. We think 2 particular approaches would be reasonable. Compare the raw fastest speed the CPU can do, this gives a fair benchmark where CPU's with higher clock speeds get an advantage. Normalize the results to a common clock speed, this gives us a fair comparison of capability at the same clock speed. In the numbers below we used both methods for comparison. The numbers AVR PIC Notes Clock Speed 20MHz 32MHz Loop Speed 400ns 875ns Maximum Speed 2.5Mhz 1.142MHz Loop speed as a toggle frequency Normalized Speed 125kHz 35.7kHz Loop frequency normalized to a 1MHz CPU clock ASM Instructions 4 6 Loop Code Size 12 bytes 12 bytes 4 instructions 6 words 10.5 bytes 6 instructions Due to the nuances here we compared this 3 ways Total Code Size 786 bytes 101 words 176.75 bytes Round 2: Expert Optimized test For the second round, we tried to hand-optimize the code to squeeze out the best possible performance from each processor. After all, we do not want to just compare how well the compilers are optimizing, we want to see what is the absolute best the raw CPU's can achieve. You will notice that although optimization doubled our performance, it made little difference to the relative performance between the two processors. For the PIC we wrote to LATC to ensure we are in the right bank, and pre-set the W register, this means the loop reduces to just a XORF and a GOTO. For the AVR we changed the code to use the Toggle register instead doing an XOR of the OUT register for the port. The optimized code looked as follows. PIC AVR LATC = 0xFF; asm ("MOVLW 0xFF"); while (1) { asm ("XORWF LATC, F"); } asm ("LDI R30,0x40"); asm ("LDI R31,0x04"); asm ("SER R24"); while (1){ asm ("STD Z+7,R24"); } The resulting ASM code after these changes now looked as follows. Note we did not include the instructions outside of the loop here as we are really just looking at the loop execution. PIC AVR .tg {border-collapse:collapse;border-spacing:0;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg .tg-0lax{text-align:left;vertical-align:top} Address Hex Instruction 07C1 069A XORWF LATC, F 07C2 00F0 GOTO 0x7C1 Address Hex Instruction 0180 8387 STD Z+7,R24 0181 CFFE RJMP 0x180 Here are the actual measurements: PIC AVR Analysis For the PIC we do not see how we could improve on this as the loop has to be a GOTO which takes 2 cycles and 1 instruction is the least amount of work we could possibly do in the loop so we are pretty confident that this is the best we can do, and when measuring we see 3 instruction cycles which we think is the limit here. Note: N9WXU did suggest that we could fill all the memory with XOR instructions and let it loop around forever and in doing so save the GOTO, but we would still have to set W to FF every second instruction to have consistent timing, so this would still be 2 instructions per "loop" although it would use all the FLASH and execute in 250ns. Novel as this idea was, since that means you can do nothing else we dismissed that idea as not representative. For the AVR we think we are also at the limits here. The toggle register lets us toggle the pin in 1 clock cycle which cannot be beaten, and the RJMP unavoidably adds 2 more. We measure 3 cycles for this. AVR PIC Notes Clock Speed 20MHz 32MHz Loop Speed 150ns 375ns Maximum Speed 6.667Mhz 2.667MHz Loop speed as a toggle frequency Normalized Speed 333.3kHz 83.3kHz Loop frequency normalized to a 1MHz CPU clock ASM Instructions 2 2 Loop Code Size 4 bytes 4 bytes 2 words 3.5 bytes At this point, we can do a raw comparison of absolute toggle frequency performance after the hand optimization. Comparing this way gives the PIC the advantage of running at 32MHz while the AVR is limited to 20MHz. Interestingly the PIC gains a little as expected, but the overall picture does not change much. The source code can be downloaded here: PIC MPLAB-X Project file MicroforumToggleTestPic16f1.zip AVR MPLAB-X Project file MicroforumToggleTest4808.zip What next? For our next installment, we have a number of options. We could add more cores/processors to this test of ours, or we can take a different task and cycle through the candidates on that. We could also vary the tools by using different compilers and see how they stack up against each other and across the architectures. Since our benchmarks will all be based on real-world tasks it should not matter HOW the CPU is performing the task or HOW we created the code, the comparison will simply be how well the job gets done. Please do post any ideas or requests in the comments and we will see if we can either improve this one or oblige with another Toe-to-toe comparison. Updates: This post was updated to use the 1 cycle STD instruction instead of the 2 cycle STS instruction for the hand-optimized AVR version in round 2
  4. 2 points
    Comments I was musing over a piece of code this week trying to figure out why it was doing something that seemed to not make any sense at first glance. The comments in this part of the code were of absolutely no help, they were simply describing what the code was doing. Something like this: // Add 5 to i i += 5; // Send the packet sendPacket(&packet); // Wait on the semaphore sem_wait(&sem); // Increment thread count Threadcount++; These comments just added to the noise in the file, made the code not fit on one page, harder to read and did not tell me anything that the code was not already telling me. What was missing was what I was grappling with. Why was it done this way, why not any other way? I asked a colleague and to my frustration his answer was that he remembered that there was some discussion about this part of the code and that it was done this way for a very good reason! My first response was of course "well why is that not in the comments!?" I remember having conversations about comments being a code smell many times in the past. There is an excellent talk by Kevlin Henney about this on youtube. Just like all other code smells, comments are not universally bad, but whenever I see a comment in a piece of code my spider sense starts tingling and I immediate look a bit deeper to try and understand why comments were actually needed here. Is there not a more elegant way to do this which would not require comments to explain, where reading the code would make what it is doing obvious? WHAT vs. WHY Comments We all agree that good code is code which is properly documented, referring to the right amount of comments, but there is a terrible trap here that programmers seem to fall in all of the time. Instead of documenting WHY they are doing things a particular way, they instead put in the documentation WHAT the code is doing. As Henney explains English, or whatever written language for that matter, is not nearly as precise a language as the programming language used itself. The code is the best way to describe what the code is doing and we hope that someone trying to maintain the code is proficient in the language it is written in, so why all of the WHAT comments? I quite like this Codemanship video, which shows how comments can be a code smell, and how we can use the comments to refactor our code to be more self-explanatory. The key insight here is that if you have to add a comment to a line or a couple of lines of code you can probably refactor the code into a function which has the comment as the name. If you have a line which only calls a function that means that the function is probably not named well enough to be obvious. Consider taking the comment and using it as the name of the function instead. This blog has a number of great examples of how NOT to comment your code, and comical as the examples are the scary part is how often I actually see these kinds of comments in production code! It has a good example of a "WHY" comment as follows. /* don't use the global isFinite() because it returns true for null values */ Number.isFinite(value) So what are we to do, how do we know if comments are good or bad? I would suggest the golden rule must be to test your comment by asking whether is it explaining WHY the code is done this way or if it is stating WHAT the code is doing. If you are stating WHAT the code is doing then consider why you think the comment is necessary in the first place. First, consider deleting the comment altogether, the code is already explaining what is being done after all. Next try to rename things or refactor it into a well-named method or fix the problem in some other way. If the comment is adding context, explaining WHY it was done this way, what else was considered and what the trade-offs were that led to it being done this way, then it is probably a good comment. Quite often we try more than one approach when designing and implementing a piece of code, weighing various metrics/properties of the code to settle finally on the preferred solution. The biggest mistake we make is not to capture any of this in the documentation of the code. This leads to newcomers re-doing all your analysis work, often re-writing the code before realizing something you learned when you wrote it the first time. When you comment your code you should be capturing that kind of context. You should be documenting what was going on in your head when you were writing the code. Nobody should ever read a piece of your code and ask out loud "what were they thinking when they did this?". What you were thinking should be there in plain sight, documented in the comments. Conclusion If you find that you need to find the right person to maintain any piece of code in your system because "he knows what is going on in that code" or even worse "he is the only one that knows" this should be an indication that the documentation is incomplete and more often than not you will find that the comments in this code are explaining WHAT it is doing instead of the WHY's. When you comment your code avoid at all costs explaining WHAT the code is doing. Always test your comments against the golden rule of comments, and if it is explaining what is happening then delete that comment! Only keep the WHY comments and make sure they are complete. And make especially sure that you document the things you considered and concluded would be the wrong thing to do in this piece of code and WHY that is the case.
  5. 2 points
    Assembly language may no longer be the mainstream way to write code for embedded systems, however it is the best way to learn how a specific CPU works without actually building one. Assembly language is simply the raw instruction set of a specific CPU broken into easy to remember pneumonics with a very basic syntax. This enables you full control of everything the CPU does without any translation provided by a compiler. Sometimes this is the only reasonable way to do something that cannot be represented by a higher level language. Here is an example from a project I was working on today. Today I wanted to create a 128-bit integer (16 bytes). That means I will need to add, subtract, multiply, etc. on my new 128-bit datatype. I was writing for a 32-bit CPU so this would require 4 32-bit values concatenated together to form the 128-bit value. If we consider the trivial problem of adding two of these numbers together, lets consider the following imaginary code. int128_t foo = 432123421234; int128_t bar = 9873827438282; int128_t sum = foo + bar; But my 32-bit CPU does not understand int128_t so I must fake it. How about this idea. int32_t foo[] = {0x00112233, 0x44556677, 0x8899AABB, 0xCCDDEEFF}; int32_t bar[] = {0xFFEEDDCC, 0xBBAA9988, 0x77665544, 0x33221100}; int32_t sum[4]; sum[0] = foo[0] + bar[0]; sum[1] = foo[1] + bar[1]; sum[2] = foo[2] + bar[2]; sum[3] = foo[3] + bar[3]; But back in grade school I learned about the 10's place and how I needed to carry a 1 when the sum of the one's place exceeded 10. It seems that it is possible that FOO[0] + BAR[0] could exceed the maximum value that can be stored in an int32_t so there will be a carry from that add. How do I add carry into the next digit? In C I would need to rely upon some math tricks to determine if there was a carry. But the hardware already has a carry flag and there are instructions to use it. We could easily incorporate some assembly language and do this function in the most efficient way possible. So enough rambling. Let us see some code. First, we need to configure MPLAB to create an ASM project. Create a project in the normal way, but when you get to select a compiler you will select MPASM. Now you are ready to get the basic source file up and running. Here is a template to cut/paste. #include "p16f18446.inc" ; CONFIG1 ; __config 0xFFFF __CONFIG _CONFIG1, _FEXTOSC_ECH & _RSTOSC_EXT1X & _CLKOUTEN_OFF & _CSWEN_ON & _FCMEN_ON ; CONFIG2 ; __config 0xFFFF __CONFIG _CONFIG2, _MCLRE_ON & _PWRTS_OFF & _LPBOREN_OFF & _BOREN_ON & _BORV_LO & _ZCD_OFF & _PPS1WAY_ON & _STVREN_ON ; CONFIG3 ; __config 0xFF9F __CONFIG _CONFIG3, _WDTCPS_WDTCPS_31 & _WDTE_OFF & _WDTCWS_WDTCWS_7 & _WDTCCS_SC ; CONFIG4 ; __config 0xFFFF __CONFIG _CONFIG4, _BBSIZE_BB512 & _BBEN_OFF & _SAFEN_OFF & _WRTAPP_OFF & _WRTB_OFF & _WRTC_OFF & _WRTD_OFF & _WRTSAF_OFF & _LVP_ON ; CONFIG5 ; __config 0xFFFF __CONFIG _CONFIG5, _CP_OFF ; GPR_VAR UDATA Variable RES 1 SHR_VAR UDATA_SHR Variable2 RES 1 ;******************************************************************************* ; Reset Vector ;******************************************************************************* RES_VECT CODE 0x0000 ; processor reset vector pagesel START ; the location of START could go beyond 2k GOTO START ; go to beginning of program ISR CODE 0x0004 ; interrupt vector location ; add Interrupt code here RETFIE ;******************************************************************************* ; MAIN PROGRAM ;******************************************************************************* MAIN_PROG CODE ; let linker place main program START ; initialize the CPU LOOP ; do the work GOTO LOOP END The first thing you will notice is the formatting is very different than C. In assembly language programs the first column in your file is for a label, the second column is for instructions and the third column is for the parameters for the instructions. In this code RES_VECT, ISR, MAIN_PROG, START and LOOP are all labels. In fact, Variable and Variable2 are also simply labels. The keyword CODE tells the compiler to place code at the address following the keyword. So the RES_VECT (reset vector) is at address zero. We informed the compiler to place the instructions pagesel and GOTO at address 0. Now when the CPU comes out of reset it will be at the reset vector (address 0) and start executing these instructions. Pagesel is a macro that creates a MOVLP instruction with the bits <15:11> of the address of START. Goto is a CPU instruction for an unconditional branch that will direct the program to the address provided. The original PIC16 had 35 instructions plus another 50 or so special keywords for the assembler. The PIC16F1xxx family (like the PIC16F18446) raises that number to about 49 instructions. You can find the instructions in the instruction set portion of the data sheet documented like this: The documentation shows the syntax, the valid range of each operand, the status bits that are affected and the work performed by the instruction. In order to make full use of this information, you need one more piece of information. That is the Programmers Model. Even C has a programmers model but it does not always match the underlying CPU. In ASM programming the programmers model is even more critical. You can also find this information in the data sheet. In the case of the PIC16F18446 it can be found in chapter 7 labeled Memory Organization. This chapter is required reading for any aspiring ASM programmers. Before I wrap up we shall modify the program template above to have a real program. START banksel TRISA clrf TRISA banksel LATA loop bsf LATA,2 nop bcf LATA,2 GOTO loop ; loop forever END This program changes to the memory bank that contains TRISA and clears TRISA making all of PORT A an output. Next is changes to the memory bank that contains the LATCH register for PORT A and enters the loop. BSF is the pneumonic for Bit Set File and it allows us to set bit 2 of the LATA register. NOP is for No OPeration and just lets the bit set settle. BCF is for Bit Clear File and allows us to clear bit 2 and finally we have a branch to loop to do this all over again. Because this is in assembly we can easily count up the instruction cycles for each instruction and determine how fast this will run. Here is the neat thing about PIC's. EVERY instruction that does not branch takes 1 instruction cycle (4 clock cycles) to execute. So this loop is 5 cycles long. We can easily add instructions if we need to produce EXACTLY a specific waveform. I hope this has provided some basic getting started information for assembly language programming. It can be rewarding and will definitely provide a deeper understanding on how these machines work. Good Luck
  6. 2 points
    Time for part 2! Last time, I gave you the homework of downloading and installing MPLAB and finding a Curiosity Nano DM164144 . Once you have done your homework, it is time for STEP 3, get that first project running. Normally my advice would be to breakout Mplab Code Configurator and get the initialization code up and running, but I did not assign that for homework! So we will go old school and code straight to the metal. Fortunately, our first task is to blink an LED. Step 1: Find the pin with the LED. A quick check of the schematic finds this section on page 3. This section reveals that the LED is attached to PORT A bit 2. With the knowledge of the LED location, we can get to work at blinking the LED. The first step is to configure the LED pin as an output. This is done by clearing bits in the TRIS register. I will cheat and simply clear ALL the bits in this register. Next we go into a loop and repeatedly set and clear the the PORT A bit 2. #include <xc.h> void main(void) { TRISA = 0; while(1) { PORTA = 0; PORTA = 0x04; } return; } Let us put this together with MPLAB and get it into the device. First we will make a new project: Second, we will create our first source file by selecting New File and then follow the Microchip Embedded -> XC8 Compiler -> main.c give your file a name (I chose main.c) And you are ready to enter the program above. And this is what it looks like typed into MPLAB. But does it work? Plug in your shiny demo board and press this button: And Voila!, the LED is lit... but wait, my code should turn the LED ON and OFF... Why is my LED simply on? To answer that question I will break out my trusty logic analyzer. That is my Saleae Logic Pro 16. This device can quickly measure the voltage on the pins and draw a picture of what is happening. One nice feature of this device is it can show both a simple digital view of the voltage and an analog view. So here are the two views at the same time. Note the LED is on for 3.02µs (microseconds for all of you 7'th graders). That is 0.00000302 seconds. The LED is off for nearly 2µs. That means the LED is blinking at 201.3kHz. (201 thousand times per second). That might explain why I can't see it. We need to add a big delay to our program and slow it down so humans can see it. One way would be to make a big loop and just do nothing for a few thousand instructions. Let us make a function that can do that. Here is the new program. #include <xc.h> void go_slow(void) { for(int x=0;x<10000;x++) { NOP(); } } void main(void) { TRISA = 0; while(1) { PORTA = 0; go_slow(); PORTA = 0x04; go_slow(); } return; } Note the new function go_slow(). This simply executes a NOP (No Operation) 10,000 times. I called this function after turning the LED OFF and again after turning the LED ON. The LED is now blinking at a nice rate. If we attach the saleae to it, we can measure the new blink. Now is is going at 2.797 times per second. By adjusting the loop from 10,000 to some other value, we could make the blink anything we want. To help you make fast progress, please notice the complete project Step_3.zip attached to this post. Next time we will be exploring the button on this circuit board. For your homework, see if you can make your LED blink useful patterns like morse code. Good Luck Step_3.zip
  7. 1 point
    What every embedded programmer should know about ADC measurement, accuracy and sources of error ADC's you encounter will typically be specified as 8, 10 or 12-bit. This is however rarely the accuracy that you should expect from your ADC. It seems counter-intuitive at first, but once you understand what goes on under the hood this will be much clearer. What I am going to do today is take a simple evaluation board for the PIC18F47K40 (MPLAB® Xpress PIC18F47K40 Evaluation Board ) and determine empirically (through experiments and actual measurements) just how accurate you should expect ADC measurements to be. Feel free to skip ahead to a specific section if you already know the basics. Here is a summary of what we will cover with links for the cheaters. Units of Measurement of Errors Measurement Setup Sources and Magnitude of Errors Voltage Reference Noise Offset Gain Error Missing Codes and DNL Integral Nonlinearity (INL) Sampling Error Adding up Errors Vendor Comparison Final Notes Units of Measurement of Errors When we talk about ADC's you will often see the term LSB used. This term refers to the voltage represented by the least significant bit of the ADC, so in reality it is the voltage for which you should read 1 on the ADC. This is a convenient measure for ADC's since the reference voltage is often not fixed and the size of 1 LSB in volts will depend on what you have the reference at, while most errors caused by the transfer function will scale with the reference. For a 10-bit ADC with 3v3 of range one LSB will be 3.3/(2^10) = 3.3/1024 = 3.22mV. An error of 1% on a 10-bit converter would represent 1%*1024 = 10.24x the size of one LSB, so we will refer to this as 10LSB of error, which means our measurement could be off by 32.2mV or ten times the size of 1 LSB. When I have 10 LSB in error I really should be rounding my results to the nearest 10 LSB, since the least significant bits of my measurement will be corrupted by this error. 10LSB will take 3.32 bits to represent. This means that my lowest 3 bits are possibly incorrect and I can only be confident in the values represented by the 7 most significant bits of my result. This means that the effective number of bits (ENOB) for my system is only 7, even though my ADC is taking a 10-bit measurement. The lower 3 bits are affected by the measurement error and cannot be relied upon so they should be discarded if I am trying to make an absolute voltage measurement accurately. We can always work out exactly how many bits of accuracy we are losing, or to how many bits we need to round to, using the calculation: log(#LSB error)/log(2) Note that this calculation will give us fractional numbers of bits. If we have 10LSB error the error does not quite affect a full 4 bits (that happens only at 16LSB), but we can not say it removes only 3 bits, because that already happened at 8LSB, so this is somewhere in between. In order to compare errors meaningfully we will work with fractions of bits in these cases, so 10LSB of error reduces our accuracy by 3.32 bits. This is especially useful when errors are additive because we can add up all the fractional LSB of errors to get the total error to the nearest bit. At this point I would like to encourage you to take your oscilloscope and try to measure how much noise you can detect on your lines. You will probably be surprized that most desk oscilloscopes can only measure signals down to 20mV, which means that 1LSB on a 10-bit ADC with a 3V3 reference will be close to 10x smaller than the smallest signal your digital scope can measure! If you can see noise on the scope (which you probably can) then that means it is probably at least 20mV or 10LSB of error. It turns out that our intuition about how accurate an ADC should be, as well as how accurate our scope can measure is seldom correct ... Measurement Setup I am using my trusty Saleae Logic Pro 8 today. It has a 12-bit ADC and measures +-10V on the analog channel and is calibrated to be accurate to between 9 and 10 ENOB of absolute accuracy. This means that 1LSB of error will be roughly 4.8mV, which for my 2V system with a 10-bit ADC is already the size of 2LSB of measurement error. When I ground the Saleae input and take a measurement we can see how much noise to expect on the input during our measurements. As you will see later we actually want to see 2-3LSB of noise so that we can improve accuracy by digital filtering, if you do not have enough noise this is not possible, so this looks really good. Using the software to find the maximum variation for me you can see that I have about 15.64mV of noise on my line. Since the range is +-10V this is only 15.6/20000 = 0.08% of error, but this is going to be, for my target 2V range, 15.6/2048*1024 = 8LSB of error to start with on my measurement equipment! For an experiment we are going to need an analog voltage source to measure using the ADC. It so happens that this device has a DAC, so why not just use that! You would think that this was a no-brainer, but it turns out, as always, that it is not quite as simple as that would seem! What I will do first is set the DAC and ADC to use the same reference (this has the added benefit that Vref inaccuracy will be cancelled out, nice!).We expect that if we set the DAC to give us 1.024V (50% of full range) and we then measure this using the 10-bit ADC, that we would measure half of the ADC range, or 512 right? For the test I made a simple program that will just measure the ADC every 1 second and print the result to the UART. Well here is the result of the measurement (to the right). Not what you expected ?! Not only are the 1st two readings appallingly bad, but the average seems to be 717, which is a full 40% more than we expect! How is this possible? Well this is how. Not only is the ADC inaccurate here, but the DAC even more so! The DAC is only 5 bits and it is specified to be accurate to 5LSB. That is already a full 320mV of error, but that is still not nearly enough to explain why we are measuring 717/1024*2.048 = 1.434V instead of 1.024V... So what is really going on here? To see I connected my trusty Saleae and changed the application to cycle the DAC though all 32 values, 1s per value, and make a plot for us to look at. On the Saleae we see this. It turns out that the DAC is such a weak source that anything you connect to it's output (like even an ADC input leakage or simply an I/O pin with nothing connected to it!), will load down the DAC and skew the output! This has been the cause of consternation for many a soul actually (see e.g. this post on the Microchip forum) Wow, so that makes sense, but is there anything we can do about this? On this device unfortunately there is not much we can do. There are devices with on-board op-amps you can use to buffer the DAC output like the 16F170x family, but this device does not have op-amps so we are out of luck! I will blog all about DAC's and about what the reasons for this shape is on another occasion, this blog is about ADC after all! So all I am going to do is adjust the DAC setting to give us about the voltage we need by measuring the result using the Saleae and call it a day. Turns out I needed to subtract 6 from the output setting to get close. We now see a measurement of 520 and this is what we see while taking measurements with the Saleae. 10.37mV of noise on just about 1V and we are in business! Sources and Magnitude of Error When measurement errors are uncorrellated it means that they will all add up to form the total worst case error. For example if I have 2LSB of noise, and I also have 2LSB of reference error this means the reading can be 2LSB removed from the correct value as a result of the reference, and an additional 2LSB as a result of the noise, to give 4LSB of total error. This means that these two types of errors are not correllated and the bits contributed by each to the total error are additive. At this point I want to mention that customers often come to me demanding a 16bit-ADC because they feel that the 10-bit one they have is not adequate for their application. They can seldom explain to me why they need 31uV of accuracy or what advanced layout techniques they are applying to keep the noise levels to even remotely this range, and most of the time the real problem turns out to be that their 10bit-ADC is really being used so badly that they are hardly getting 5 bits of accuracy from the converter. I also often see calculations which effectively discard the lower 4 bits of the ADC measurement, leaving you with only 8-bits of effective measurement, so if you do that getting more bits in the ADC is obviously only going to buy you only disappointment! That said, lets look at all of the most significant sources of error one by one in more detail. There are quite a few so we will give them headings and numbers. 1. Voltage Reference To get us started, lets look at the voltage reference and see how many LSB this contributes to our measurement error. If you are using a 1% reference, then please do not insist that you need a 16-bit or even a 12-bit ADC, because your reference alone is contributing errors into the 7th most significant bit and 8 bits is all you are going to get anyway! The datasheet for our evaluation board chip (PIC18F47K40) shows that the voltage reference will be accurate to +-4% when we set it to 2.048V like we did. People are always surprized when they realize how many LSB they are losing due to the voltage reference! 4% of 1024 = 51.2, which means that the reference alone can contribute up to 51.2 LSB of error to our system! Using an expensive off-chip reference also complicates things for us. Using that we would now have to be very careful with the layout to not introduce noise at the reference pin, and also take care of any signals coupling into this pin. Even then the reference will likely be something like a TL431 which will only be accurate to 1% which will be 10 LSB reducing our 10-bit ADC to less than 8 ENOB. We must note that reference errors are not equally distributed. At the maximum scale 1% represents 10LSB of error, but at the lower end of the scale 1% will represent only 1% of 1LSB. Since we are looking for the worst-case error we have to work with 10LSB due to the 1% error over the full ADC range. In your application you may be able to adjust the contribution of this error down to better represent the range you are expecting to measure. For example - at mid range, where our test signal is, the reference error will only contribute 5LSB of error with a 1% reference, or 25LSB for our 4% internal reference. The reference error is something which we could calibrate out if we knew what the error was and many manufacturers discard it stating simply that you should calibrate it out. Sadly these references tend to drift over time, temperature and supply voltages, so you usually cannot just calibrate it in the factory and compensate for the error in software and forget it. To revisit our 16-bit ADC scenario, if I want to measure accurately to 31uV (16 bits on a 2V reference) that reference would have to be accurate to 31uV/2V = 0.0015%. Let's look on Digikey for a price on a Voltage reference with the best specs we can find. Best candidate I can find is this one at $128.31 a piece, and even that gives me only 0.1% with up to 0.6ppm/C drift. This means from 0 to 100C I will have 0.006% of temp drift (2LSB) on top of the 0.1% tolerance (which is another 33LSB). Now to be fair if I am building a control system I am more interested in perturbations from a setpoint and a 16-bit ADC may be valuable even if my reference is off, because I am not trying to take an absolute measurement, but still maintaining noise levels below 30uV is more of a challenge than it sounds, especially if I am driving some power which adds noise to the equation. This is of course the difference between accuracy and resolution. Accuracy gives me the minimum absolute error while resolution gives me the smallest relative unit of measure. 2. Noise Noise is of course the problem we all expect. It can often be a pretty large contributor to your measurement errors, and digital circuits are known for producing lots of noise that will couple into your inputs, but as we will see noise is not all bad and can be essential if you want to improve the results through digital post processing. We have seen that every 2mV of noise will add 1LSB to the error on our system as we have a 2V reference, and 1024 steps of measurement. As you have now seen this 2mV is probably much smaller than we can measure with a typical oscilloscope, so we cannot be sure how much noise we really have if we simply look at it on our scope. For most systems the recommendation would be to place the microcontroller in lowest power sleep mode and avoid toggling any output pins during the sampling of the ADC measurement to get the measurement with the lowest noise level. A simple experiment will show how much noise we could be coupling into the measurement when an adjacent pin is being toggled. I updated our program from before to simply toggle the pin next to the ADC input constantly and measured with the Saleae to see what the effect is. On the left is the signal zoomed out and on the right is one of the transitions zoomed in so you can get a better look. That glitch on the measurement line is 150mV or 75 LSB of noise due to an adjacent pin toggling, and the dev board I have does not even have long traces which would have made this much worse! It seems like a good idea to filter all this noise using analog low-pass filters like filter capacitors, but this is not always wise. We can make small amounts of noise work to our advantage, as long as it is white noise which is uncorrellated with our signal and other errors. When we do post-processing like taking multiple samples and averaging the result we can potentially increase the overall accuracy of our measurement. Using this technique it is possible to increase the ENOB (effective number of bits) of your measurements by simply taking more samples and averaging them. Without getting too deep into the math there, if you oversample a signal by a factor of N you will improve the SNR by a factor of sqrt(N), which means oversampling 256 times and taking the average will result in an increase of 16x the SNR, which represents an additional 4 bits of resolution of the ADC. Of course this is where having uncorrellated white noise of at least +-1LSB is important. If you have no noise on your signal you would likely just sample the same value 256 times and the average would not add any improvement to the resolution. If you had white noise added to the signal however you would sample a variety of values with the average lying somewhere in between the LSB you can measure, and the ratio of difference would represent the value of the signal more accurately. For a detailed discussion on this topic you can take a look at this application note by Silicon Labs https://www.silabs.com/documents/public/application-notes/an118.pdf 3. Offset The internal circuitry in the ADC will cause some offset error added into the conversion. This error will move all measurements either up or down by an equal amount. The Offset is a critical parameter for an ADC and should be specified in the datasheet for your device. For the PIC18F47K40 device the error due to offset is specified as 2 LSB. Of course if we know what the offset is we could easily subtract this from the results, so many specifications will exclude the offset error and claim that you could easily "calibrate out" the offset. This may be possible, even easy to do, but if you do not write the code for it and do the actual math you will have to include the offset error in your accuracy calculations, and measuring what the current offset is can be a real challenge in a real-world system which is not located on your laboratory bench. If you do decide to measure the offset on the factory floor and calibrate it out using software you need to be careful to use an accurate reference, avoid noise and other sources of error and make sure that the offset will remain constant over the operating range of voltage and temperature and also that it will not drift over time. If any of these are true your calibration will be met with limited success. Offset is often hard to calibrate out since many ADC's are not accurate close to the extremes (at Vref or 0V). If they were you could take a measurement with the input on Vref+ and on Vref- and determine the offset, but we knew it was never going to be this easy! The offset will also be different from device to device, so it is not possible to calibrate this out with fixed values in your code, you will have to actively measure this on every device in the factory and adjust as the offset changes. Some manufacturers will actually calibrate out the offset on an ADC for you during their manufacturing process. If this is the case you will probably see a small offset error of +-1 LSB which means that it is calibrated to be within this range. On our device the datasheet specifies a typical offset error of 0.5 LSB with a max error of 2 LSB, so this device is factory calibrated to remove the offset error, but even after this we should still expect up to 2 LSB of drift in the offset around the calibrated value. 4. Gain Error Similar to the offset the internal transfer function of the ADC is designed to be as close as possible to ideal but there is always some error. Gain error will cause the slope of the transfer function to be changed. Depending on the offset this can cause an error which is at maximum either at the top or bottom end of the measurement scale as shown in the figure below. Like the offset it is also possible to calibrate out the gain error, as long as we have enough reference points to use for the calibration. If the transfer function is perfectly linear this would mean we would only require 2 measurement points. For our device the datasheet spec is typically 0.2LSB of gain error with a max error of 1.5LSB. This means that we cannot gain much from attempting to calibrate out the gain on this one. For other manufacturers you can easily find gain and offset errors in the tens of LSB, which makes calibration and compensation for the gain and offset worth the effort. The PIC18F47K40 is not only compensated for drift with temperature but also individually calibrated in the factory, so it seems that any additional calibration measurements will be at most accurate to 1LSB and the device is already specified to typically have less than this error, so calibration will probably gain us nothing. 5. Missing Codes and DNL We expect that every time the code increments by 1LSB that the input voltage has increased by exactly 1LSB in size. For an ADC the DNL error is a measure of how close to this ideal we are in reality. It represents the largest single step error that exists for the entire range of the ADC. If the DNL is stated at 0.5LSB this means that it can take anything from 0.5LSB to 1.5LSB of input voltage change to get the output code to increment by 1. When the DNL is more than 1LSB it means that we can move the input voltage by 2LSB and only get a single count of the converter. When this happens it is possible that it causes the next bit to be squeezed down to 0LSB, which can cause the converter to skip that code entirely as shown below. Most converters will specify that the result will monotonically increase as the voltage increases and that it will have no missing codes as you scan through the range, but you still have to be careful, because this is under ideal conditions and when you add in the other errors it is possible that some codes get skipped, so when you are comparing the output of the converter never check for a specific conversion value. Always look for a value in a range around the limit you are checking. 6. Integral Nonlinearity - INL INL is another of the critical parameters for all ADC's and will be stated in your datasheet if the ADC is any good. For our example the INL is specified as 3.5LSB. The tearm INL refers to the integral of the differential nonlinearity. In effect it represents what the maximum deviation from the ideal transfer function of the ADC will be as shown in the picture below. The yellow line represents the ideal transfer function while the blue line represents the actual transfer function. As you can see the INL is defined as the size of the maximum error through the range of the ADC. Since the INL can happen at any location along the curve it is not possible to calibrate this out. It is also uncorrellated with the other errors we have examined. We just have to live with this one! 7. Sampling error A SAR ADC will consist of a sampling capacitor which holds the voltage we are converting during the conversion cycle. We must take care when we take a sample that we allow enough time for this sampling capacitor to charge to the level of accuracy we want to see in our conversion. Effectively we end up with a circuit that has some serial impedance through which the sampling capacitor is charged. The simplified circuit for the PIC18F47K40 looks as follows (from the datasheet). As you can see the series impedance (Rs) together with the sampling writch and passgate impedance (RIC + Rss) will form a low-pass RC filter to charge Chold. A detailed calculation of the sampling time required to be within 1LSB of the desired sampling value is shown in the ADC section of the device datasheet. If we leave too little time for the sample to be acquired this will directly result in a measurement error. In our case this means that if we have 10K Rs and we wait for 462us after the sampling mux turns to the input we are measuring, the capacitor will be charged to within 0.5LSB of our target voltage. The ADC on the PIC18F47L40 has a built-in circuit that can keep the sampling switch closed for us for a number of Tadc periods. This can be set by adjusting the register ADACQ or using the provided API generated by MCC to achieve this. That first inaccurate result we saw in the conversion was a direct result of the channel not being given enough time to charge the sampling cap since the acquisition time was set to the default value of 0. Of course since we are not switching channels the capacitor is closer to the correct value when to take subsequent samples so the error seems to be going away over time! I have seen customers just throw away the first ADC sample as inaccurate, but if you do not understand why you can easily get yourselfs into a lot of trouble when you need to switch channels! We can re-do the measurement and this time use an acquisition time of 4xTadc = 6.8us. This is the result. NOTE : There is another Errata on this device that you have to wait at least 1 instruction cycle before reading the ADGO bit to see if the conversion is complete after setting the ADGO bit. At first I was just doing what the datasheet suggested, set ADGO and then wait while(ADGO); for the conversion to complete. Due to this errata however the ADGO bit will still read 0 the first time you read it and you will think the conversion is done while it has not started, resulting in an ADC reading of 0 ! After adding the required NOP() to the generated MCC code as follows the incorrect first reading is gone: adc_result_t ADCC_GetSingleConversion(adcc_channel_t channel, uint8_t acquisitionDelay) { // Turn on the ADC module ADCON0bits.ADON = 1; // select the A/D channel ADPCH = channel; //Set the Acquisition Delay ADACQ = acquisitionDelay; //Disable the continuous mode. ADCON0bits.ADCONT = 0; // Start the conversion ADCON0bits.ADGO = 1; NOP(); // NOP workaround for ADGO silicon issue // Wait for the conversion to finish while (ADCON0bits.ADGO) { } // Conversion finished, return the result return (adc_result_t)(((adc_result_t)ADRESH << 8) + ADRESL); } Uncorrelated errors I will leave the full analysis up to the reader, but all of these errors are uncorrellated and thus additive, so for our case the worst case error will be when all of these errors align, the offset is in the same direction as the gain error, as the noise, as the INL error, etc. Of course when we test on the bench it is unlikely that we will encounter a situation where all of these are 100% aligned, but if we have manufactured thousands of units in the field running for years it is definitely going to happen and much more often that you would like, so we have no choice but to design for the worst-case error we are likely to see in the wild. For our exampe the different sources of error add up as follows: Voltage Reference = 4% [41 LSB] Noise [8 LSB] Offset [2.5 LSB] Gain [1.5 LSB] INL [3.5 LSB] For a total of 56.5 LSB of potential absolute error in measurement. This reduces our effective number of bits by log(56.5)/log(2) = 5.8 bits, which means that our 10-bit LSB can have absolute errors running into the 6th bit, giving us only 4 ENB (effective number of bits) when we are looking for absolute accuracy. We can improve this to 26.5 LSB by suing a 1% off-chip reference, which will make the ENB = 5 bits. If we look at the measurement we get using the Saleae we measure 0.99V on the line which should result in 0.99V/2.045V *1024 = 495 but our measurement is in fact 520, which is off by 25LSB. So as we can see our 1-board sample does not hit the worst case error at the center of the sampling range here, but our error extended at least into the 5th bit of the result as our 25LSB error requires more than 4 bits to represent. Nevertheless 25LSB is quite a bit better than the worst-case value of 56.5 LSB of error which we calculated, so this particular sample is not doing too badly! I am going to get my hands on a hair dryer in the week and take some measurements at an elevated temperature and then I will come back and update this for your reading pleasure 🙂 Comparison I recently compared some ADC's from different vendors. I was actually looking more at the other features but since I was busy with this I also noted down the specs. Not all of the datasheets were perfectly clear so do reach out to me if I made a mistake somewhere, but this was how they matched up in terms of ADC performance. As fas as I could find them I used the worst case specifications and not the typical ones. Some manufacturers only specify typical results, so this comparison is probably not fair to those who make better specifications with better information. Let me know in the comments how you feel about this. I will go over the numbers again and maybe come update all of these to typical values for a more fair comparison if someone asks me for this ... Manufacturer -> Device -> Xilinx XC7Z010 Microchip PIC32MZ EF Texas Instruments CC3220SF Espressif ESP32 ST Micro STM32L475 Renesas R65N V2 INL 2 3 2.5 12 2.5 3 DNL 1 1 4 7 1.5 2 Offset 8 2 6(1) 25(1) 2.5 3.5 Gain 0.5 8 82(1) ?(2) 4.5 3.5 Total Error (INL+Offset+Gain) 10.5 13 90.5 37+ 9.5 10 I noted that many of these manufacturers specify their ADC only at one temperature point (25C) so you probably have to dig a little deeper to ensure that the specs will not vary greatly over temperature. (1) These settings were specified in the datasheet as an absolute voltage and I converted them to LSB for max range and best resolution of the ADC. Specifically for the TI device the offset was specified as 2mV and gain error as 20mV on a 1.4V range, and for ESP32 the offset is specified as 60mV but for a wider voltage range of 2.45v (2) For ESP32 I was not able to determine the gain error clearly form the datasheet. Final Notes We can conlcude a couple of very important points from this. If the datasheet claims a 12-bit ADC we should not expect 12-bits of accuracy. First we need to calculate what to expect from our entire system, and we should expect the reference to add the most to our error. All 12-bit converters are not equal, so when comparing devices do not just look at how many bits the converters provide, also compare their performance! The same sytem can yield between 5 and 10 bits of accuracy depending on the specs of the converter, so do not be fooled! Many of the vendors specified their ADC only at a very specific temperature and reference voltage at maximum, take care not to be fooled by this - shall we call it "creative" specmanship and be sure to compare apples with apples when looking for absolute accuracy. Source Code For those who have this board or device I attach the test code I used for download here : ADC_47K40.zip
  8. 1 point
    It has been a busy few weeks but finally I can sit down and write another installment to this series on embedded programming. Today's project will be another peripheral and a visualization tool for our development machines. We are going to learn about the UART. Then we are going to write enough code to attach the UART to a basic C printing library (printf) and finally we are going to use a program called processing to visualize data sent over the UART. This should be a fun adventure so let us get started. UART (USART?) The word UART is an acronym that stands for Universal Asynchronous Receiver Transmitter. Some UART's have an additional feature so they are Universal Synchronous Asynchronous Receiver Transmitter or USART. We are not going to bother with the synchronous part so let us focus on the asynchronous part. The purpose of the UART is to send and receive data between our micro controller and some other device. Most frequently the UART is used to provide some form of user interface or simply debugging information live on a console. The UART signaling has special features that allow it to send data at an agreed upon rate (the baud rate) and an agreed upon data format (most typically 8 data bits, no parity and 1 stop bit). I am not going to go into detail on the UART signaling but you can easily learn about it by looking in chapter 36 of the PIC16F18446 datasheet. Specifically look at figure 36-3 if you are interested. UARTs are a device that transfer bytes of data (8-bits in a byte) one bit at a time across a single wire (one wire for transmit and one for receive) and a matching UART puts the bits back into a byte. Each bit is present on the wire for a brief but adjustable amount of time. Sending the data slowly is suitable for long noisy wires while sending the data fast is good for impatient engineers. A common baud rate for user interfaces is 9600 baud which is sends 1 letter every millisecond (0.001 seconds). Many years ago, before the internet, I had a 300 baud modem for communicating with other computers. When I read my e-mail the letters would arrive slightly slower than I could read. Later, after the internet was invented and we used modems to dial into the internet, I had a 56,600 baud modem so I could view web-pages and pictures more quickly. Today I use UARTS to send text data to my laptop and with the help of the processing program we can make a nice data viewer. Enough history... let us figure out the UART. Configuring the UART The UART is a peripheral with many options so we need to configure the UART for transmitting and receiving our data correctly. To setup the UART we shall turn to page 571 of our PIC16F18446 data sheet. The section labeled 36.1.1.7 describes the settings required to transmit asynchronous data. And now for some code. Here is the outline of our program. void init_uart(void) { } int getch(void) { } void putch(char c) { } void main(void) { init_uart(); while(1) { putch(getch()+1); } } This is a common pattern for designing a program. I start by writing my "main" function simply using simple functions to do the tricky bits. That way I can think about the problem and decide how each function needs to work. What sort of parameters each function will require and what sort of values each function will return. This program will be very simple. After initializing the UART to do its work, the program will read values from the UART, add 1 to each value and send it back out the UART. Since every letter in a computer is a simple value it will return the next letter in each series that I type. If I type 'A' it will return 'B' and so on. I like this test program for UART testing because it will demonstrate the UART is working and not simply shorting the receive pin to the transmit pin. Now that we have an idea for each function. Let us write them one a time. init_uart() This function will simply write all the needed values into the special function registers to activate the UART. To do that, we shall follow the checklist in the data sheet. We need to configure both the transmit and the receive function. Step 1 is to configure the baud rate. We shall use the BRG16 because that mode has the simplest baud rate formula. Of course, it is even easier when you use the supplied tables. I always figure that if you are not cheating, you are not trying so lets just find the 32MHz column in the SYNC=0, BRGH=0, BRG16 = 1 table. I like 115.2k baud because that makes the characters send much faster. So the value we need for the SPBRG is 16. Step 2 is to configure the pins. We shall configure TX and RX. To do that, we need the schematics. The key portion of the schematics is this bit. The confusing part is, the details of the TX and RX pin. If I had a $ for every time I got these backwards, I would have retired already. Fortunately the team at Microchip who drew these schematics were very careful to label the CDC TX connecting to the UART RX and the CDC RX connecting to the UART TX. They also labeled the UART TX as RB4 and UART RX as RB6. This looks pretty simple. Now we need to steer these signals to the correct pins via the PPS peripheral. NOTE: The PPS stands for Peripheral Pin Select. This feature allows the peripherals to be steered to almost any I/O pin on the device. This makes laying out your printed circuit boards MUCH easier as you can move the signals around to make all the connections straight through. You can also steer signals to more than one output pin enabling debugging by steering signals to your LED's or a test point. After steering the functions to the correct pins, it is time to clear the ANSEL for RB6 (making the pin digital) and clear TRIS for RB4 (making the pin an output). The rest of the initialization steps involve putting the UART in the correct mode. Let us see the code. void init_uart(void) { // STEP 1 - Set the Baud Rate BAUD1CONbits.BRG16 = 1; TX1STAbits.BRGH = 0; SPBRG = 16; // STEP 2 - Configure the PPS // PPS unlock sequence // this should be in assembly because the hardware counts the instruction cycles // and will not unlock unless it is exactly right. The C language cannot promise to // make the instruction cycles exactly what is below. asm("banksel PPSLOCK"); asm("movlw 0x55"); asm("movwf PPSLOCK"); asm("movlw 0xAA"); asm("movwf PPSLOCK"); asm("bcf PPSLOCK,0"); RX1PPSbits.PORT = 1; RX1PPSbits.PIN = 6; RB4PPS = 0x0F; // Step 2.5 - Configure the pin direction and digital mode ANSELBbits.ANSB6 = 0; TRISBbits.TRISB4 = 0; // Step 3 TX1STAbits.SYNC = 0; RC1STAbits.SPEN = 1; // Step 4 TX1STAbits.TX9 = 0; // Step 5 - No needed // Step 1 from RX section (36.1.2.1) RC1STAbits.CREN = 1; // Step 6 TX1STAbits.TXEN = 1; // Step 7 - 9 are not needed because we are not using interrupts // Step 10 is in the putchar step. } To send a character we simply need to wait for room in the transmitter and then add another character. void putch(char c) { while(!TX1IF); // sit here until there is room to send a byte TX1REG = c; } And to receive a character we can just wait until a character is present and retrieve it. int getch(void) { while(!RC1IF); // sit here until there is a byte in the receiver return RC1REG; } And that completes our program. Here is the test run. Every time I typed a letter or number, the following character is echoed. Typing a few characters is interesting and all but I did promise graphing. We shall make a simple program that will stream a series of numbers one number per row. Then we will introduce processing to convert this stream into a graph. The easiest way to produce a stream of numbers is to use printf. This function will produce a formatted line of text and insert the value of variables where you want them. The syntax is as follows: value = 5; printf("This is a message with a value %d\n",value); To get access to printf you must do two things. 1) You must include the correct header file. So add the following line near the top of your program. (by the xc.h line) #include <stdio.h> 2) You must provide a function named putch that will output a character. We just did that. So here is our printing program. #include <stdio.h> #include <math.h> /// Lots of stuff we already did void main(void) { double theta = 0; init_uart(); while(1) { printf("%f\r\n",sin(theta)); theta += 0.01; if(theta >= 2 * 3.1416) theta = 0; } } Pictures or it didn't happen. And now for processing. Go to http://www.processing.org and download a suitable version for your PC. Attached is a processing sketch that will graph whatever appears in the serial port from the PIC16. And here is a picture of it working. I hope you found this exercise interesting. As always post your questions & comments below. Quite a lot was covered in this session and your questions will guide our future work. Good Luck exercise_5a.zip exercise_5.zip graph_data.pde
  9. 1 point
    Today I am going to introduce a peripheral. So far we have interacted with the I/O pins on the micro controller. This can get you pretty far and if you can manipulate the pins quickly enough you can produce nearly any signal. However, there are times when the signal must be generated continuously and your MCU must perform other tasks. Today we will be generating a PWM signal. PWM stands for Pulse Width Modulation. The purpose of PWM is to produce a fixed frequency square wave but the width of the high and low pulses can be adjusted. By adjusting the percentage of the time the signal is high versus low, you can change the amount of energy in a system. In a motor control application, changing the energy will change the speed. In an LED application, changing the energy will change the brightness. We will use the PWM to change the brightness of the LED on our demo board. Here is a sample of a PWM signal. The PWM frequency is 467.3hz and the duty cycle is 35%. One way to produce such a signal is to execute a bit of code such as this one. do_pwm macro banksel pwm_counter movlw pwm_reload decfsz pwm_counter,f movf pwm_counter,w movwf pwm_counter banksel pwm subwf pwm,w banksel LATA movf LATA,w andlw 0xFB btfss STATUS,C iorlw 0x04 movwf LATA endm This code does not help you if you are writing in C but it will serve to show the challenges of this technique. This is a simple macro that must be executed numerous times to produce the entire PWM cycle. The PWM frequency is determined by how often this macro can be called. So one option is quite simply this: loop do_pwm goto loop But if you do this the MCU will not be able to perform any other functions. loop do_pwm banksel reload incfsz reload goto loop banksel pwm incf pwm,f goto loop Here is one that updates the PWM duty cycle every time the reload variable overflows. This is not a lot of additional work but the MCU is still 100% consumed toggling a pin and deciding when to update the duty cycle. Surely there is a better way. Enter a Peripheral MCU Peripherals are special hardware functions that can be controlled by the software but do some tasks automatically. One common peripheral is the PWM peripheral. Different MCU have different peripherals that can produce a PWM signal. For instance, older PICmicrocontrollers have the CCP (Capture, Compare, PWM) peripheral and newer PICmicrocontrollers have dedicated PWM's. To demonstrate the use of this simple peripheral I will switch to C and give some code. The good side to using a peripheral is the work it takes on. The down side is the additional effort required to setup the additional hardware. Microchip provides a tool call MCC to do this work but in this example, we will do the work ourselves. Fortunately, Microchip provided a section in the data sheet that provides an 8 step checklist for configuring the PWM. Time for doing the steps // Steps from Chapter 30.9 of the datasheet // Step 1 TRISAbits.TRISA2 = 1; // Step 2 PWM6CON = 0; // Step 3 T2PR = 0xFF; // Step 4 PWM6DC = 358 << 6; // Step 5 T2CLKCONbits.CS = 1; T2CONbits.CKPS = 0; T2CONbits.ON = 1; // Step 6 // Step 7 TRISAbits.TRISA2 = 0; RA2PPS = 0x0D; // Step 8 PWM6CONbits.EN = 1; And here is 35% just like before, except last time the period of the entire wave was 2.14ms and now it is 0.254ms. That is about 10x faster than before. This time the main loop is doing absolutely nothing making it possible to do all kinds of neat things like implement a flicker to make the LED look like a candle. while(1) { __delay_ms(5); PWM6DC = rand() % 1024 << 6; } So here is a candle. Through honestly it is not the best looking candle. Perhaps you can do a better job. Peripherals can be a huge time saver and can certainly provide more CPU in your application for the real "secret sauce". Most of the data sheet covers peripherals so we will go through a few of them in the next few weeks. Good Luck Step_4.zip step_4_asm.zip
  10. 1 point
    It is now time for my favorite introductory lab. This lab is my favorite because it.... nope I will not spoil the surprise! Today we will do two labs and we will learn about digital inputs, and pushbuttons. First the digital input. The PIC16F18446 has 20 pins spread across 3 ports. PORTA, PORTB and PORTC. Last time we did not learn about the digital I/O pins but we did learn how to drive a pin as an output. It turns out that making a pin an output is pretty simple... Just clear the appropriate TRIS bit to zero and that pin is an output. It turns out that all the pins are inputs by default when the chip starts up. Additionally, many of the pins have an alternate analog function and the pin is configured as an analog input by default in order to keep the port in the lowest power state that cannot conflict with outside circuitry. To get a digital input, we need to leave the digital input function enabled (TRIS is 1) and turn the analog function off (ANSEL is 0). If we leave the analog function enabled, the digital function will always read a zero. Here is the port diagram from the data sheet. You can see in the diagram that if the ANSELx signal is set to a 1 the leading to the data bus AND gate will always output a 0. (Remember, both inputs to an AND gate must be 1 for the output to be 1 and the little circle on the ANSELx signal inverts 1's to 0's). From this diagram we learn that ANSELx must be 0 and TRISx must be 1 or the output buffer connected to the TRISx signal will drive the pin. It is time for some code. We will make a simple program that lights the LED when the button on our Curiosity is pressed and when the button is released, the LED will turn off. A quick peak at the schematic shows the button is on PORTC bit 2. The Program Please: void main(void) { TRISAbits.TRISA2 = 0; ANSELCbits.ANSC2 = 0; WPUCbits.WPUC2 = 1; while(1) { if(PORTCbits.RC2) { LATAbits.LATA2 = 1; } else { LATAbits.LATA2 = 0; } } } A quick surprise, the WPUC register is the weak pull up control register for PORTC. This is not shown in the generic diagram but the explanation is simple. The pushbutton will connect the RC2 pin to ground which will create a 0 on the input. The weak pull up is required to make a 1 when the button is NOT pressed. Setting the WPUC2 bit will enable the weak pull up for RC2. Programming.... Testing... Viola! it works, But this is pretty boring after all, we could replace the Curiosity with a wire and save some money. It is time to make that computer earn its place in the design. We are going to make a small change to the program so it will toggle the LED each time the button is pressed. void main(void) { TRISAbits.TRISA2 = 0; ANSELCbits.ANSC2 = 0; WPUCbits.WPUC2 = 1; while(1) { if(PORTCbits.RC2) { if(LATAbits.LATA2 == 1) LATAbits.LATA2 = 0; else LATAbits.LATA2 = 1; } } } Well that is strange.... it seems like it works when I press the button. AH HA, RC2 is a 1 when the button is NOT pressed.... stand by... void main(void) { TRISAbits.TRISA2 = 0; ANSELCbits.ANSC2 = 0; WPUCbits.WPUC2 = 1; while(1) { if(PORTCbits.RC2==0) { if(LATAbits.LATA2 == 1) LATAbits.LATA2 = 0; else LATAbits.LATA2 = 1; } } } Well darn, it almost works again. It seems like it is dim when I hold the button and when I release it, the LED is randomly on or off. AH HA!, It is toggling the LED as long as I hold the pin. We need to add a variable and only trigger the LED change when the pin changes state. __bit oldRC2 = 0; void main(void) { TRISAbits.TRISA2 = 0; ANSELCbits.ANSC2 = 0; WPUCbits.WPUC2 = 1; while(1) { if(PORTCbits.RC2==0 && oldRC2 == 1) { if(LATAbits.LATA2 == 1) LATAbits.LATA2 = 0; else LATAbits.LATA2 = 1; } oldRC2 = PORTCbits.RC2; } } I added a variable called oldRC2. I used the compiler built-in type __bit to represent a single bit of data (matching the data size of a single pin) and the LED is triggered when the button is pressed (RC2 == 0) AND the button was previously NOT pressed (oldRC2 == 1). The value of oldRC2 is set to be RC2 after the testing. Well Heck... It is getting closer but something is still not quite right. I press the button and the LED changes state... usually... sometimes... I see the problem. The pin RC2 is sampled twice. Once at the beginning where it is compared to the oldRC2 and once a bit later to make a new copy for oldRC2. What if the value on RC2 changed between these two samplings. That would mean that we could miss an edge. The solution is simple. __bit oldRC2 = 0; __bit newRC2 = 0; void main(void) { TRISAbits.TRISA2 = 0; ANSELCbits.ANSC2 = 0; WPUCbits.WPUC2 = 1; while(1) { newRC2 = PORTCbits.RC2; if(newRC2==0 && oldRC2 == 1) { if(LATAbits.LATA2 == 1) LATAbits.LATA2 = 0; else LATAbits.LATA2 = 1; } oldRC2 = newRC2; } } We will just create a new variable and sample RC2 once into the variable newRC2. Then we will use that for the comparison and the assignment. Ah, this seems to be pretty good. Time for some more extensive testing... This is strange. If I press this many times, every so often it looks like a button press was skipped. Let us put the logic analyzer on this problem. Ok, the top trace is RC2(the button) and the bottom trace is RA2 (the LED). Every falling edge of RC2 the LED changes state (on to off, or off to on). But look right in the middle. It appears that the LED changed state on the rising edge as well as the falling edge. Perhaps we should look at that spot more closely. Look at that. An extra transition on the button. It turns out that buttons are made of two pieces of metal that are pressed together. Sometimes the metal will bounce and break the connection. If we measure this bounce we find that it is nearly 20useconds wide ( 0.000020 seconds). That is pretty fast but the PIC16F18446 detected the bounce and changed the LED state. If you research button bouncing you will find that this phenomenon is on ALL buttons, but some buttons are much worse than others. This button is actually pretty good and it took a large number of tries before I was caught by it. This button is very good. So I will do the simplest button debouncer I have ever done. __bit oldRC2 = 0; __bit newRC2 = 0; void main(void) { TRISAbits.TRISA2 = 0; ANSELCbits.ANSC2 = 0; WPUCbits.WPUC2 = 1; while(1) { newRC2 = PORTCbits.RC2 && PORTCbits.RC2; if(newRC2==0 && oldRC2 == 1) { if(LATAbits.LATA2 == 1) LATAbits.LATA2 = 0; else LATAbits.LATA2 = 1; } oldRC2 = newRC2; } } I will simply read PORTC twice and let it read a one if it reads high on BOTH reads. Note the AND function between the two reads of RC2. That is all for today. Your homework is to do some research on different ways to solve the debounce problem. Next week we will introduce our first peripheral.
  11. 1 point
    I realize now that I have a new pet peeve. The widespread and blind inclusion, by Lemmings calling themselves embedded C programmers, of extern "C" in C header files everywhere. To add insult to injury, programmers (if I can call them that) who commit this atrocity tend to gratuitously leave a comment in the code claiming that this somehow makes the code "compatible" with C++ (whatever that is supposed to mean - IT DOES NOT !!!). It usually looks something like this (taken from MCC produced code - uart.h) #ifdef __cplusplus // Provide C++ Compatibility extern "C" { #endif So why does this annoy me so much? (I mean besides the point that I am unaware of any C++ compiler for the PIC16, which I generated this code for - which in itself is a hint that this has NEVER been tested on any C++ compiler ! ) First and foremost - adding something to your code which claims that it will "Provide C++ Compatibility" is like entering into a contract with the user that they can now safely expect the code to work with C++. Well I have bad news for you buddy - it takes a LOT more than wrapping everything in extern "C" to make your code "Compatible with C++" !!! If you are going to include that in your code you better know what you are doing and your code better work (as in actually being tested) with a real C++ compiler. If it does not I am going to call out the madness you perpetrated by adding something which only has value when you mix C and C++, but your code cannot be used with C++ at all in the first place - because you are probably doing a host of things which are incompatible with C++ ! In short, as they say - your header file comment is writing cheques your programming skills cannot cash. Usually a conversation about this starts with me asking the perpetrator if they can explain to me what "That thing" actually does, and when do you think you need this to use the code with a C++ compiler, so let's start there. I can count on one hand the people who have been able to explain to me how this works correctly. Extern "C" - What it does and what it is meant for This construct is used to tell a C++ compiler that it should set a property called langage linkage which affects mangling of names as well as possible calling conventions. It does not guarantee that you can link to C code compiled with a different compiler. This is a good point to quote the C++14 specification, section 7.5 on Language linkage That said, if you are e.g. using GCC for your C as well as your C++ compiler you will have a very good chance of being lucky enough that the implementation-defined linkage will be compatible! A C++ compiler will use "name mangling" to ensure that symbols with identical names can be distinguished from each other by passing additional semantic information to the linker. This becomes important when you use e.g. function overloading, or namespaces. This mangling of names is a feature of C++ (which allows duplicate names across the program, but does NOT exist in C (which does not allow duplicate symbols to be linked at all). When you place an extern "C" wrapper around the declaration of a symbol you are telling the C++ compiler to disable this name mangling feature and also to alter it's calling convention when calling a function to be more likely to link with C code, although calling conventions are a topic all in it's own. As you may have deduced by now there is a reason that this is not called "extern C++"! This is required to make your C++ calls match the names and conventions in the C object files, which does not contain mangled names. If you ARE going to place a comment next to your faux pas, at least get it right and say "For compatibility with C naming and calling conventions" instead of claiming incorrectly that this is required for C++ compatibility somehow! There is one very specific use case, one which we used to encounter all the time in what would now be called "the olden days" of Windows 3.1 when everything was shared through dynamic link libraries (.dll files). It turns out that in order to use a dll which was written using C++ from your C program you had to wrap the publicly exported function declarations in extern "C". Yes that is correct, this is used to make C++ code compatible with C, NOT THE OTHER WAY AROUND! So when do I really need this then? Let's take a small step back. If you want to use your C code with a C++ compiler, you can simply compile it all with the C++ compiler! The compiler will resolve the names just fine after mangling them all, and since you are properly using name spaces with your C++ code this will reduce the chances of name collisions and you will be all the happier for doing the right thing. No need for disabling name mangling there. If you do not need this to get your code to compile with a C++ compiler, then when DO you need it? Ahhh, now you are beginnng to see why this has become a pet peeve of mine ... but the plot still has some thickening to do before we are done... Pretty much the only case where it makes sense to do this is when you are going to compile some of your code with a C compiler, then compile some other code with a C++ compiler, and in the end take the object files from the two compilers and link them together, probably using the C++ linker. Of course when you feed your .c files and your .cpp files to a typical C++ compiler it will actually run a C compiler on the .c files and the C++ compiler on the .cpp files by default, and mangling will be an issue and you will exclaim "you see! He was wrong!", but not that fast ... it is simple enough to tell the compiler to compile all files with the C++ compiler, and this remains the best way to use C libraries in source form with your C++ code. If you are going to compile the C code into a binary library and link it in (which sometimes happens with commercial 3rd party libraries - like .dll's - DUH ! ) then there is a case for you to do this, but most likely this does not apply to you at all as you will have the source available all the time and you are working on an embedded system where you are making a single binary. To help you ease the world of hurt you are probably in for - you should read up on how to tell your compiler to use a particlar calling convention which has a chance of being compatible with the linker. If you are using GCC you can start here. To be clear, if you add extern "C" to your C code and then compiles it into an object file to be linked with your C++ program the extern "C" qualifier is entirely ignored by the C compiler. Yes that is right, all the way to producing the object file this has no effect whatsoever. It is only when you are producing calls to the C code from the C++ code that the C++ code is altered to match the C naming and calling conventions, so this means that the C++ code is altered to be compatible with C. In the end There is a hell of a lot of things you will need to consider if you want to mix C and C++. I promise that I will write another blog on that next time around if I get anybody asking for it in the comments. Adding extern "C" to your code is NOT required to make it magically "compatible with C++". In order to do that you need to heed with extreme caution the long list of incompatibilities between C and C++ and you will probably have to be much more specific than just stating C and C++. Probably something like C89 and C++11 if I had to wager a guess to what will be most relevant. And remember the reasons why it is almost always just plain stupid to use extern "C" in your C headers. It does not do what you think it does. It especially does not do what your comment claims it does - it does not provide C++ compatability! Don't let your comments write cheques your code cannot cash! - Before you even think of adding that, make sure you KNOW AND FOLLOW ALL the C++ compatability rules first. For heaven's sake TEST your code with a C++ compiler ! If you want to use a "C" library with your C++ code simply compile all the code with your C++ compiler. QED - no mess no fuss! If it is not possible to do 5 above, then compile the C code normally (without that nonsense) and place extern "C" around the #include of the C library only! (example below). After all, this is for providing C linkage compatability to C++ compiled code! If you are producing a binary library using C to be used/linked with a C++ compiler, then please save us all and just compile the library with both C and C++ and provide 2 binaries! If all of the above fail, beacuase you really just hit that one in a million case where you think you need this, then for Pete's sake educate yourself before you attempt something that hard, hopefully in the process you will realize that it is just a bad idea after all! Now please just STOP IT! I feel very much like pulling a Jeff Atwood and saying after all that only a moron would use extern "C" in their C headers (of course he was talking about using tabs). Orunmila Oh - I almost forgot - the reasonable way of using extern "C" looks like this: #include <string> // Yes <string>, not <string.h> - this is C++ ! #include <stdio> // Include C libraries extern "C" { #include "clib1.h" #include "clib2.h" } using namespace std; // Because the cuticles on my typing finger hurt if I have to type out "std::" all the time! // This is a proper C++ class because I am too patrician to use "C" like that peasant library writer! class myClass { private: int myPrivateInt; ... ... Appeals to Authority Dan Saks CppCon talk entitled “extern c: Talking to C Programmers about C++” A very good answer on Stackoverflow Some decent answers on this Stackoverflow question Fairly OK post on geeksforgeeks The dynamic loading use case with examples Note: That first one is a bit of a red herring as it does not explain extern C at all - nevertheless it is a great talk which all embedded programmers can benefit from 🙂
  12. 1 point
    Whenever I start a new project I always start off reaching for a simple while(1) "superloop" architecture https://en.wikibooks.org/wiki/Embedded_Systems/Super_Loop_Architecture . This works well for doing the basics but more often than not I quickly end up short and looking to employ a timer to get some kind of scheduling going. MCC makes this pretty easy and convenient to set up. It contains a library called "Foundation Services" which has 2 different timer implementations, TIMEOUT and RTCOUNTER. These two library modules have pretty much the same interface, but they are implemented very differently under the hood. For my little "Operating System" I am going to prefer the RTCOUNTER version as keeping time accurately is more important to me than latency. The TIMEOUT module is capable of providing low latency reaction times whenever a timer expires by adjusting the timer overflow point so that an interrupt will occur right when the next timer expires. This allows you to use the ISR to call the action you want to happen directly and immediately from the interrupt context. Nice as that may be in some cases, it always increases the complexity, the code size and the overall cost of the system. In our case RTCOUNTER is more than good enough so we will stick with that. RTCOUNTER Operation First a little bit more about RTCOUNTER. In short RTCOUNTER keeps track of a list of running timers. Whenever you call the "check" function it will compare the expiry time of the next timer and call the task function for that timer if it has expired. It achieves this by using a single hardware timer which will operate in "Free Running" mode. This means the hardware timer will never be "re-loaded" by the code, it will simply overflow back to it's starting value naturally, and every time this happens the module will count that another overflow has happened in an overflow counter. The count of the timer is made up by a combination of the actual hardware timer and the overflow counter. By "hiding" or abstracting the real size of the hardware timer like this the module can easily be switched over to use any of the PIC timers, regardless if they count up or down or how many bits they implement in hardware. Mode 32-bit Timer value General (32-x)-bit of g_rtcounterH x-bit Hardware Timer Using TMR0 in 8-bit mode 24-bits of g_rtcounterH TMR0 (8-bit) Using TMR1 16-bits of g_rtcounterH TMR1H (8-bit) TMR1L (8-bit) RTCOUNTER is compatible with all the PIC timers, and you can switch it to a different timer later without modifying your application code which is nice. Since all that happens when the timer overflows is updating the counter the Timer ISR is as short and simple as possible. // We only increment the overflow counter and clear the flag on every interrupt void rtcount_isr(void) { g_rtcounterH++; PIR4bits.TMR1IF = 0; } When we run the "check" function it will construct the 32-bit time value by combining the hardware timer and the overflow counter (g_rtcounterH). It will then compare this value to the expiry time of the next timer in the list to expire. By keeping the list of timers sorted by expiry time it saves time during the checking (which happens often) by doing the sorting work during creation of the timer (which happens infrequently). How to use it Using it is failry straight-forward. Create a callback function which returns the "re-schedule" time for the timer. Allocate memory for your timer/task and tie it to your callback function Create the timer (which starts it) specifying how long before it will first expire Regularly call the check function to check if the next timer has expired, and call it's callback if it has. In C the whole program may look something like this example: #include "mcc_generated_files/mcc.h" int32_t ledFlasher(void* p); rtcountStruct_t myTimer = {ledFlasher}; void main(void) { SYSTEM_Initialize(); INTERRUPT_GlobalInterruptEnable(); INTERRUPT_PeripheralInterruptEnable(); rtcount_create(&myTimer, 1000); // Create a new timer using the memory at &myTimer // This is my main Scheduler or OS loop, all tasks are executed from here from now on while (1) { // Check if the next timer has expired, call it's callback from here if it did rtcount_callNextCallback(); } } int32_t ledFlasher(void* p) { LATAbits.RA0 ^= 1; // Toggle our pin return 1000; // When we are done we want to restart this timer 1000 ticks later, return 0 to stop } Example with Multiple Timers/Tasks Ok, admittedly blinking an LED with a timer is not rocket science and not really impressive, so let's step it up and show how we can use this concept to write an application which is more event-driven than imperative. NOTE: If you have not seen it yet I recommend reading Martin Fowler's article on Event Sourcing and how this design pattern reduces the probability of errors in your system on his website here. By splitting our program into tasks (or modules) which each perform a specific action and works independently of other tasks, we can generate code modules which are completely independent and re-usable quite easily. Independent and re-usable (or mobile as Uncle Bob says) does not only mean that the code is maintainable, it also means that we can test and debug each task by itself, and if we do this well it will make the code much less fragile. Code is "fragile" when you are fixing something in one place and something seemingly unrelated breaks elsewhere ... that will be much less likely to happen. For my example I am going to construct some typical tasks which need to be done in an embedded system. To accomplish this we will Create a task function for each of these by creating a timer for it. Control/set the amount of CPU time afforded to each task by controlling how often the timer times out Communicate between tasks only through a small number of shared variables (this is best done using Message Queue's - we will post about those in a later blog some time) Let's go ahead and construct our system. Here is the big picture view This system has 6 tasks being managed by the Scheduler/OS for us. Sampling the ADC to check the battery level. This has to happen every 5 seconds Process keys, we are looking at a button which needs to be de-bounced (100 ms) Process serial port for any incoming messages. The port is on interrupt, baud is 9600. Our buffer is 16 bytes so we want to check it every 10ms to ensure we do not loose data. Update system LCD. We only update the LCD when the data has changed, we want to check for a change every 100ms Update LED's. We want this to happen every 500ms Drive Outputs. Based on our secret sauce we will decide when to toggle some pins, we do this every 1s These tasks will work together, or co-operate, by keeping to the promise never to run for a long time (let's agree 10ms is a long time, tasks taking longer than that needs to be broken into smaller steps). This arrangement is called Co-operative Multitasking . This is a well-known mechanism of multi-tasking on a microcontroller, and has been implemented in systems like "Windows 3.1" and "Windows 95" as well as "Classic Mac-OS" in the past. By using the Scheduler and event driven paradigm here we can implement and test each of these subsystems independently. Even when we have it all put together we can easily replace one of these subsystems with a "Test" version of it and use that to generate test conditions for us to ensure everything will work correctly under typical operation conditions. We can "disable" any part of the system by simply commenting out the "create" function for that timer and it will not run. We can also adjust how often things happen or adjust priorities by modifying the task time values. As before we first allocate some memory to store all of our tasks. We will initialize each task with a pointer to the callback function used to perform this task as before. The main program= ends up looking something like this. void main(void) { SYSTEM_Initialize(); INTERRUPT_GlobalInterruptEnable(); INTERRUPT_PeripheralInterruptEnable(); rtcount_create(&adcTaskTimer, 5000); rtcount_create(&keyTaskTimer, 100); rtcount_create(&serialTaskTimer, 10); rtcount_create(&lcdTaskTimer, 100); rtcount_create(&ledTaskTimer, 500); rtcount_create(&outputTaskTimer, 1000); // This is my main Scheduler or OS loop, all tasks are executed from events while (1) { // Check if the next timer has expired, call it's callback from here if it did rtcount_callNextCallback(); } } As always the full project incuding the task functions and the timer variable declarations can be downloaded. The skeleton of this program which does initialize the other peripherals, but runs the timers completely compiles to only 703 words of code or 8.6% on this device, and it runs all 6 program tasks using a single hardware timer.
  13. 1 point
    This week we will discuss a different strategy for tokenizing our words. This strategy will be to convert our word to a number in an unambiguous manner. Then we can simply see if our number is one we recognize. How hard can that be? The process of converting something into an unambiguous number is hashing. Hash algorithms come in all shapes and sizes depending upon their application needs. One nice hash is the Pearson hash which can make an 8-bit number of a string. For a reasonably comprehensive list of hash algorithms check out this List of hash functions. Generally hash functions are intended to convert some string into a unique number that is very unlikely to represent any other string. How unlikely depends upon the algorithm and the size of the number. For example, the Pearson hashes are quick and easy. They produce an 8-bit value using an XOR table. However, there are more than 256 words in the English language, much less the rest of the worlds languages, so the odds of a hash collision are quite high with a large set of words. However, it is relatively easy to manipulate the XOR tables so that there are no collisions for a small word set. Assuming we can apply some hash algorithm that will uniquely identify our words, it should be pretty easy to apply this technique to find our tokens. For example, in the first example we had the following table of keywords: const char *wordList[] = {"GPGGA","GNGSA","GPGSV","GPBOD","GPDBT","GPDCN"}; enum wordTokens {NO_WORD = -1,GPGGA,GNGSA,GPGSV,GPBOD,GPDBT,GPDCN}; What if, that changed to this: const int hashList[] = {<hash_of_GPGGA>,<hash_of_GNGSA>,<hash_of_GPGSV>,<hash_of_GPBOD>,<hash_of_GPDBT>,<hash_of_GPDCN>}; enum wordTokens {NO_WORD = -1,GPGGA,GNGSA,GPGSV,GPBOD,GPDBT,GPDCN}; It is not hard to imagine hashing the incoming word into an integer and then scanning the hashList looking for a match. This would be 2.5x smaller in memory if we did not need to store the master wordList and the comparison would be a 2 byte compare (or a 1 byte compare if we used a smaller hash). It would only require 2N comparisons where N is the number of words to check for. Of course hashing the incoming word creates an up-front cost but that cost could be buried inside the character receive function. The hash methods are not perfect. With any one-way algorithm on unknown input, there is the possibility of a collision. That is where two words have the same computed hash value. This could mean that two of the keywords have the same value, or it could mean that a randomly chosen word (or random characters) matches a valid input. In a system where the keyword list is static and known at compile time, it is possible to develop a "perfect hash". That is a hash that guarantees all valid inputs are unique. If your system is concerned about random "noise" being treated as valid data, there are at least two ways to solve this. Keep a list of the original words and do a final byte-for-byte compare one time. Add a checksum to the input and make sure the checksum has to be valid in addition to the hash match. For NMEA strings, this is already available. Can we go faster still? The integer compare search method works very well, but there are a few ways to go even faster. Sort the hashes in the list and use a search algorithm like a binary search to find the match. This reduces the the time from O(n) to O(log(n)). Much faster. Use the hash as an index into an array of the tokens. This reduces the time from O(n) to O(1). Much Much faster but makes a potentially HUGE array (2^<hash bit length>) Use the hash%<word count> to create a minimally sized table. This works but requires a minimal perfect hash. That is, a hash with the property of producing an N record table for N words. These algorithms are hard to find. Use the hash as the token in the rest of the system. Why bother looking up a token if you already have a nice number. This is a good solution but assumes that you never need to use the tokens as indices of other arrays. The idea that you can use the hash as an index into an array of records is the bases of a data structure called a Hash Table. Accessing a hash table is typically O(1) until a collision is detected and then it goes to O(m) where m is the number of items in the collisions. Typically the system implements the hash table as a sparse array with additional arrays holding the matching hash items. That is a lot of words. I think it is time for a few examples. First let us implement a basic hash search using our short word list. That was pretty easy but it is easy to see how finding a perfect hash function can get more and more difficult as we add words. Fortunately for us, there is a tool that is part of the GCC compiler suite called Gperf. Gperf's job is to find a perfect hash algorithm for a list of words and produce the C code to process it. Sounds perfect, so here is an example of how that works. First we must prepare a word list. The word list below shows a structure that will be used to store the word list in the C program. This is followed by the list of words and indices that will be used to populate an array of the structure. struct keyword_s {const char *name; int index;}; %% GPGGA,0 GNGSA,1 GPGSV,2 GPBOD,3 GPDBT,4 GPDCN,5 GPRMC,6 GPBWC,7 The word list is converted into C code with the following command line: gperf -t -I --output-file=hashTable.c keywordlist.txt This command will create a file called hashTable.c. Inside this file is one public function called in_word_set. Below you can see where I modified the NMEA_findToken function to use the in_word_set function supplied by gperf. enum wordTokens {NO_WORD = -1,GPGGA,GNGSA,GPGSV,GPBOD,GPDBT,GPDCN, GPRMC, GPBWC}; struct keyword_s {const char *name; int index;}; extern struct keyword_s *in_word_set (register const char *str, register size_t len); enum wordTokens NMEA_findToken(char *word) { enum wordTokens returnValue = NO_WORD; struct keyword_s *kw; kw = in_word_set(word,5); if(kw) returnValue = kw->index; return returnValue; } Compile and Run and you get the following results. Note how the hash search spends a fixed amount of time computing the 8-bit Pearson hash of the keyword. Then it spends a small amount of time searching for the hash value in the list of hash keys. This search is a brute force linear search. A binary search would likely be faster with a large word set, but with only 8 words in the word set most o the time its spent computing the hash. The GPERF code is very interesting. Notice how the function is fixed time if the word is present and is much faster when the word is absent. There are a number of options to GPERF to allow it to produce code with different strategies. If you look at the code produced by GPERF you will notice that there is a final test using strcmp (or optionally strncmp). This will eliminate the possibility of a collision. If we don't care about collisions, look how much faster this gets. STRNCMP IF-ELSE RAGEL -G2 Hash Search (Hash/Search) Hash GPERF Hash GPERF no compare GNGSA 399 121 280 326 167/159 374 126 GPGSV 585 123 304 288 167/121 374 126 GLGSV 724 59 225 503 167/336 113 113 GPRMC 899 83 299 536 167/369 374 126 GPGGA 283 113 298 440 167/273 374 126 So far I would have to say that the GPEF solution is easily the best way to decipher the sentences from my GPS. I know the GPS will deliver good strings so I would feel pretty comfortable stripping off the final compare. However, even with the string compare the GPERF solution is pretty good. It is only consistently beat by the hand-crafted if-else which will be a challenge to maintain. Perhaps we should consider writing a code generator for that method. Hashing is a very interesting topic with a lot of ratholes behind it. Some of the fastest algorithms and search methods take advantage of hashes to some degree. Our passwords also use hashes so that passwords can be compared without knowing what the actual password is. I hope this has been as informative to you as it was to write. As usual, take a look at the attached MPLAB projects to try out the different ideas. Good Luck. example4-PearsonSearch.zip example4 gperf.zip example4 gperf no compare.zip
  14. 1 point
    “Code generation, like drinking alcohol, is good in moderation.” — Alex Lowe This episode we are going to try something different. The brute force approach had the advantage of being simple and easy to maintain. The hand-crafted decision tree had the advantage of being fast. This week we will look at an option that will hopefully combine the simplicity of the string list and the speed of the decision tree. This week we will use a code generator to automatically create the tokenizing state machine. I will leave it to you to decide if we use generation in moderation. Let me introduce RAGEL http://www.colm.net/open-source/ragel/. I discovered RAGEL a few years ago when I was looking for a quick and dirty way to build some string handling state machines. RAGEL will construct a complete state machine that will handle the parsing of any regular expression. It can do tokenizing and it can do parsing. Essentially, you define the rules for the tokens and the functions to call when each token is found. For instance, you can write a rule to handle any integer and when an integer is found it can call your doInteger() method. For our simple example of identifying 6 words, the RAGEL code will be a bit overkill but it will be MUCH faster than a brute force string search and in the same ball park as the hand crafted decision tree. Let us get started. First let us get the housekeeping out of the way. This part of the code you have seen before. It is identical to the first two examples I have already provided. There are two differences. First, this only LOOKS like C code. In fact, it is a RAGEL file (I saved it with a .rl extension) and you will see the differences in a moment. When I use a code synthesizer, I like to place the needed command line at the top of the file in comments. While comments are a smell, this sort of comment is pretty important. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 // compile into C with ragel // ragel -C -L -G2 example3.rl -o example3.c // #include <string.h> #include <stdio.h> #include "serial_port.h" char * NMEA_getWord(void) { static char buffer[7]; memset(buffer,0,sizeof(buffer)); do { serial_read(buffer,1); } while(buffer[0] != '$'); for(int x=0;x<sizeof(buffer)-1;x++) { serial_read(&buffer[x], 1); if(buffer[x]==',') { buffer[x] = 0; break; } } return buffer; } enum wordTokens {NO_WORD = -1,GPGGA,GNGSA,GPGSV,GPBOD,GPDBT,GPDCN, GPRMC, GPBWC}; RAGEL is pretty nice in that they choose some special symbols to identify the RAGEL bits so the generator simply passes all input straight to the output until it finds the RAGEL identifiers and then it gets to work. This architecture allows you to simply insert RAGEL code directly into your C (or other languages) and add the state machines in place. The first identifiers we find are the declaration of a state machine (foo seemed traditional). You can define more than one machine so it is important to provide a hint to the generator about which one you want to define. After the machine definition, I specified the location to place all the state machine data tables. There are multiple ways RAGEL can produce a state machine. If the machine requires data, it will go at the write data block. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 %% machine foo; %% write data; enum wordTokens NMEA_findToken(char *word) { const char *p = word; const char *pe = word + strlen(word); int cs; enum wordTokens returnValue = NO_WORD; %%{ action gpgga { returnValue = GPGGA; fbreak; } action gngsa { returnValue = GNGSA; fbreak; } action gpgsv { returnValue = GPGSV; fbreak; } action gpbod { returnValue = GPBOD; fbreak; } action gpdbt { returnValue = GPDBT; fbreak; } action gpdcn { returnValue = GPDCN; fbreak; } action gpbwc { returnValue = GPBWC; fbreak; } action gprmc { returnValue = GPRMC; fbreak; } gpgga = ('GPGGA') @gpgga; gngsa = ('GNGSA') @gngsa; gpgsv = ('GPGSV') @gpgsv; gpbod = ('GPBOD') @gpbod; gpdbt = ('GPDBT') @gpdbt; gpdcn = ('GPDCN') @gpdcn; gpbwc = ('GPBWC') @gpbwc; gprmc = ('GPRMC') @gprmc; main := ( gpgga | gngsa | gpgsv | gpbod | gpdbt | gpdcn | gpbwc | gprmc )*; write init; write exec noend; }%% return returnValue; } Next is the C function definition starting at line 4 above. I am keeping the original NMEA_findToken function as before. No sense in changing what is working. At the beginning of the function is some RAGEL housekeeping defining the range of text to process. In this case the variable p represents the beginning of the test while pe represents the end of the text. The variable cs is a housekeeping variable and the token is the return value so initialize it to NO_WORD. The next bit is some RAGEL code. The %%{ defines a block of ragel much like /* defines the start of a comment block. The first bit of ragel is defining all of the actions that will be triggered when the strings are identified. Honestly, these actions could be anything and I held back simply to keep the function identical to the original. It would be easy to fully define the NMEA data formats and fully decode each NMEA sentence. These simply identify the return token and break out of the function. If we had not already sliced up the tokens we would want to keep store our position in the input strings so we could return to the same spot. It is also possible to feed the state machine one character a time like in an interrupt service routine. After the actions, line 21 defines the search rules and the action to execute when a rule is matched. These rules are simply regular expressions (HA! REGEX and SIMPLE in the same sentence). For this example, the expressions are simply the strings. But if your regular expressions were more complex, you could go crazy. Finally, the machine is defined as matching any of the rules. The initialization and the actual execute code are placed and the RAGEL is complete. Whew! Let us look at what happened when we compile it. One of my favorite programming tools is graphviz. Specifically DOT. It turns out that RAGEL can produce a dot file documenting the produced state machine. Lets try it out. bash> ragel -C -L -V example3.rl -o example3.dot bash> dot example3.dot -T png -O It would be nicer if all the numbers on the arrows were the characters rather than the ASCII codes but I suppose I am nitpicking. Now you see why I named my actions after the sentences. The return arrow clearly shows which action is being executed when the words are found. It also shows that the action triggers when the last letter is found rather than a trailing character. I suppose if you had the word gpgga2, then you would need to add some additional REGEX magic. The dotted arrow IN leading to state 17 refers to any other transition not listed. That indicates that any out-of-place letter simply goes back to 17 without triggering an ACTION. It is possible to define a “SYNTAX ERROR” action to cover this case but I did not care. For my needs, failing quietly is a good choice. This all looks pretty good so far. What does the C look like? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 /* #line 1 "example3.rl" */ // compile into C with ragel // ragel -C -L -G2 example3.rl -o example3.c // #include < string.h > #include < stdio.h > #include "serial_port.h" char * NMEA_getWord(void) { static char buffer[7]; memset(buffer, 0, sizeof(buffer)); do { serial_read(buffer, 1); } while (buffer[0] != '$'); for (int x = 0; x < sizeof(buffer) - 1; x++) { serial_read( & buffer[x], 1); if (buffer[x] == ',') { buffer[x] = 0; break; } } return buffer; } enum wordTokens { NO_WORD = -1, GPGGA, GNGSA, GPGSV, GPBOD, GPDBT, GPDCN, GPRMC, GPBWC }; /* #line 34 "example3.rl" */ /* #line 39 "example3.c" */ static const int foo_start = 17; static const int foo_first_final = 17; static const int foo_error = 0; static const int foo_en_main = 17; /* #line 35 "example3.rl" */ enum wordTokens NMEA_findToken(char * word) { const char * p = word; const char * pe = word + strlen(word); int cs; enum wordTokens returnValue = NO_WORD; /* #line 57 "example3.c" */ { cs = foo_start; } /* #line 62 "example3.c" */ { switch (cs) { tr5: /* #line 45 "example3.rl" */ { returnValue = GNGSA; { p++; cs = 17; goto _out; } } goto st17; tr12: /* #line 47 "example3.rl" */ { returnValue = GPBOD; { p++; cs = 17; goto _out; } } goto st17; tr13: /* #line 50 "example3.rl" */ { returnValue = GPBWC; { p++; cs = 17; goto _out; } } goto st17; tr16: /* #line 48 "example3.rl" */ { returnValue = GPDBT; { p++; cs = 17; goto _out; } } goto st17; tr17: /* #line 49 "example3.rl" */ { returnValue = GPDCN; { p++; cs = 17; goto _out; } } goto st17; tr20: /* #line 44 "example3.rl" */ { returnValue = GPGGA; { p++; cs = 17; goto _out; } } goto st17; tr21: /* #line 46 "example3.rl" */ { returnValue = GPGSV; { p++; cs = 17; goto _out; } } goto st17; tr23: /* #line 51 "example3.rl" */ { returnValue = GPRMC; { p++; cs = 17; goto _out; } } goto st17; st17: p += 1; case 17: /* #line 101 "example3.c" */ if (( * p) == 71) goto st1; goto st0; st0: cs = 0; goto _out; st1: p += 1; case 1: switch (( * p)) { case 78: goto st2; case 80: goto st5; } goto st0; st2: p += 1; case 2: if (( * p) == 71) goto st3; goto st0; st3: p += 1; case 3: if (( * p) == 83) goto st4; goto st0; st4: p += 1; case 4: if (( * p) == 65) goto tr5; goto st0; st5: p += 1; case 5: switch (( * p)) { case 66: goto st6; case 68: goto st9; case 71: goto st12; case 82: goto st15; } goto st0; st6: p += 1; case 6: switch (( * p)) { case 79: goto st7; case 87: goto st8; } goto st0; st7: p += 1; case 7: if (( * p) == 68) goto tr12; goto st0; st8: p += 1; case 8: if (( * p) == 67) goto tr13; goto st0; st9: p += 1; case 9: switch (( * p)) { case 66: goto st10; case 67: goto st11; } goto st0; st10: p += 1; case 10: if (( * p) == 84) goto tr16; goto st0; st11: p += 1; case 11: if (( * p) == 78) goto tr17; goto st0; st12: p += 1; case 12: switch (( * p)) { case 71: goto st13; case 83: goto st14; } goto st0; st13: p += 1; case 13: if (( * p) == 65) goto tr20; goto st0; st14: p += 1; case 14: if (( * p) == 86) goto tr21; goto st0; st15: p += 1; case 15: if (( * p) == 77) goto st16; goto st0; st16: p += 1; case 16: if (( * p) == 67) goto tr23; goto st0; } _out: {} } /* #line 66 "example3.rl" */ return returnValue; } int main(int argc, char ** argv) { if (serial_open() > 0) { for (int x = 0; x < 24; x++) { char * w = NMEA_getWord(); enum wordTokens t = NMEA_findToken(w); printf("word %s,", w); if (t >= 0) printf("token %d\n", t); else printf("no match\n"); } } serial_close(); return 0; } And this is why we use a code generator. The code does not look too terrible. i.e., I could debug it if I thought there were some bugs and it does follow the state chart in a perfectly readable way. BUT, I hope you are not one of those programmers who finds GOTO against their religion. (Though Edsger Dijkstra did allow an exception for low level code when he wrote EWD215 https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/EWD215.html ) So how does this perform? STRNCMP IF-ELSE RAGEL -G2 GNGSA 399 121 280 GPGSV 585 123 304 GLGSV 724 59 225 GPRMC 899 83 299 GPGGA 283 113 298 And for the code size MPLAB XC8 in Free mode on the PIC16F1939 shows 2552 bytes of program and 1024 bytes of data. Don’t forget that printf is included. But this is comparable to the other examples because I am only changing the one function. So our fancy code generator is usually faster than the brute force approach, definitely slower than the hand-crafted approach and is fairly easy to modify. I think I would use the string compare until I got a few more strings and then make the leap to RAGEL. Once I was committed to RAGEL, I think I would see how much of the string processing I could do with RAGEL just to speed the development cycles and be prepared for that One Last Feature from Marketing. Next week we will look at another code generator and a completely different way to manage this task. Good Luck. example3.X.zip
  15. 1 point
    When we left off we had just built a test framework that allowed us to quickly and easily try out different ways to identify NMEA keywords. The first method shown was a brute force string compare search. For this week, I promised to write about an if-else decoder. The brute force search was all about applying computing resources to solve the problem. This approach is all about applying human resources to make life easy on the computer. So this solution will suck. Let us press on. The big problem with the string compare method is each time we discard a word, we start from scratch on the second word. Consider that most NMEA strings from a GPS start with the letters GP. It would be nice to discard every word that does not begin with a G and only look at each letter once. Consider this state machine: I did simplify the drawing… every invalid letter will transfer back to state 1 but that would clutter the picture. This would require the smallest number of searches to find the words. So one way to build this is to write a big IF-ELSE construct that covers all the choices. This will step through the letters and end up with a decision on what keyword was found. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 enum wordTokens NMEA_findToken(char *word) { enum wordTokens returnValue = NO_WORD; char c = *word++; if(c == 'G') { c = *word++; if(c == 'P') { c = *word++; if(c == 'G') // gpGga or gpGsv { c = *word++; if(c == 'G') // gpgGa { c = *word++; if(c == 'A') { if(*word == 0) // found GPGGA { returnValue = GPGGA; } } } else if(c == 'S') // gpgSv { c = *word++; if(c == 'V') { if(*word == 0) // found GPGSV { returnValue = GPGSV; } } } } else if(c == 'B') // gpBod { c = *word++; if(c == 'O') { c = *word++; if(c == 'D') { if(*word == 0) { returnValue = GPBOD; } } } } else if(c == 'D') // gpDcn or gpDbt { c = *word++; if(c == 'C') { c = *word++; if(c == 'N') { if(*word == 0) { returnValue = GPDCN; } } } else if(c == 'B') { c = *word++; if(c == 'T') { if(*word == 0) { returnValue = GPDBT; } } } } } else if(c == 'N') // gNgsa { c = *word++; if(c == 'G') { c = *word++; if(c == 'S') { c = *word++; if(c == 'A') { if(*word == 0) { returnValue = GNGSA; } } } } } } return returnValue; } And it is just that easy. This is fast, small and has only one serious issue in my opinion. I hope you are very happy with the words chosen, because making changes is expensive in programmer time. This example only has 6 6-letter words and is 100 lines of code. They are easy lines, but almost all of them will require rewriting if you change even one word. Here are the stats so you can compare with last weeks string compare. STRNCMP IF-ELSE GNGSA 399 121 GPGSV 585 123 GLGSV 724 59 GPRMC 899 83 GPGGA 283 113 These are the CPU cycles required on a PIC16F1939. You can verify in the simulator. That is all for now. Stay tuned, next time we will show a nice way to manage this maintenance problem. Good Luck example2.c example2.X.zip
  16. 1 point
    This is the first of a 5 part article where I will explore some different ways to tokenize keywords. This is a simple and common task that seems to crop up when you least expect it. We have all probably done some variation of the brute force approach in this first posting, but the posts that follow should prove interesting. Here is the sequence: Part 1 : STRCMP Brute Force and the framework Part 2 : IF-ELSE for speed Part 3 : Automating the IF-ELSE for maintenance Part 4 : Perfect Hash Maps Part 5 : Trie's Feel free to skip around once they are all published if there is an interesting topic that you want to learn more about. In this first posting I will be building a complete testable example so we can do science rather than expounding upon our opinions. About The Test Data I am currently working on a GPS project so I am being subjected to NMEA sentences. (https://en.wikipedia.org/wiki/NMEA_0183). These sentences are quite easy to follow and you generally get a burst of text data every second from your GPS module. The challenge is to quickly & efficiently extract this information while keeping your embedded system performing all the tasks required and do this with the smallest possible program. As far as stream data is concerned, NMEA sentences are remarkably well behaved. Here are a few sample messages from my new Bad Elf GPS Pro +. $GNGSA,A,3,32,14,01,20,18,08,31,11,25,10,27,,1.19,0.61,1.02*16 $GNGSA,A,3,65,66,88,75,81,82,76,67,,,,,1.19,0.61,1.02*13 $GPGSV,4,1,14,14,73,336,25,32,60,023,31,18,52,294,20,51,51,171,30*7A $GPGSV,4,2,14,31,47,171,34,10,44,087,33,11,30,294,26,01,26,316,15*78 $GPGSV,4,3,14,20,20,108,34,22,19,304,16,08,13,247,21,27,08,218,18*7A The pattern is: $<IDENTIFIER>,<COMMA SEPERATED DATA>*<CHECKSUM> The work of separating out the parts of a sentence is called parsing. We will not be fully parsing the NMEA sentence but if we wanted to, there are many useful tools like BISON that can automate this job. Each of the automated tools has a minimum price of entry in terms of code size and CPU cycles. This brute force approach has the benefit of simplicity, maintainability and a low initial size/speed cost. So let us press on and establish a performance baseline. Our Test Program The program we are about to construct works like this: Like all instructors I will concentrate on the necessary parts of the lesson and leave all the processing as an exercise for the student. Once you have identified the sentence is easy to apply a suitable sscanf to or strtok to process each sentence individually. The actual part we are going to measure is Identify The Keyword but Find a Keyword is a necessary first step and a little bit of wrapper/measurement code is needed to complete the experiment. Parsing out our Keywords To get the identifier all I need to do is discard characters until I see a '$'. Then collect characters until I see a ','. That is easy code to write so here is a quick version: char * NMEA_getWord(void) { static char buffer[7]; memset(buffer,0,sizeof(buffer)); do { serial_read(buffer,1); } while(buffer[0] != '$'); for(int x=0;x<sizeof(buffer)-1;x++) { serial_read(&buffer[x], 1); if(buffer[x]==',') { buffer[x] = 0; break; } } return buffer; } I did not try to collect the entire NMEA sentence, validate the checksum, handle line noise. So this is not a complete application. It does fill a buffer with the NMEA sentence quickly and legibly. One more note: the buffer is 7 bytes large because my Bad Elf GPS sends PELFID as the FIRST sentence upon connection. If I wanted to decode this sentence, the buffer needs another byte. This simple function will be used for all the testing in this series of articles so it is worth understanding. So here is a description of each line of code: the function will take no parameters and return a pointer to a NMEA sentence. It will block until it has such a sentence so it assumes a continuous data stream. There is a static buffer of 7 characters. A pointer to this buffer will be sent to the caller with every iteration. The buffer is 7 characters to allow for the 6 character message from my GPS and a null terminator. Because this is a static buffer we cannot initialize it in the definition. So every time this function is called, we will clear out the buffer to 0's The do/while loop reads single characters from the serial port (provided by the user) until a '$' character is received. The getchar() pattern may have been a better abstraction for the serial port but I was originally planning to receive 5 bytes after the $ with a single call. The 6 byte PELFID changed my mind. The for loop simply reads UP-TO sizeof(buffer) -1 bytes unless the received byte is a ','. I prefer to use sizeof whenever possible to make the compiler do the math to determine how big arrays are. if a ',' is received, we convert it to a 0 and break. lastly we return the buffer pointer. The next piece of housekeeping we should discuss is the main function. This is a simple test application that will provide the framework for evaluating ways of detecting text strings and timing their impact. int main(int argc, char **argv) { if(serial_open()>0) { for(int x = 0; x < 24; x ++) { char *w = NMEA_getWord(); enum wordTokens t = NMEA_findToken(w); printf("word %s,",w); if(t >= 0) printf("token %d\n",t); else printf("no match\n"); } } serial_close(); return 0; } Like before, I will give a quick line-by-line explanation This is the standard main declaration for ALL C programs. For embedded systems we often ignore the parameters and return values but I like to do it for test programs simply because I generally prototype as command line programs on my Mac. serial_open is a simple abstraction that is easy to adapt to any system. It will prepare the serial input from my GPS. The version attached to this project will work on a Linux or Mac and open a specific /dev/cu.<device> that is hardcoded. This is appropriate for such a test application where knowledgeable users can easily edit the code. Obviously a robust version would pass the serial device in as a command line parameter and an embedded system would just activate the hardware. This program will decode 24 keywords. That makes a simple limit to run in the simulator or on my Mac and is sufficient for these tests. Now we run the test. First, get a keyword Second find the keyword and return the token Third display some output Repeat Finally we close the serial port. Often I will skip this step but on a PC program, leaving the port open will cause an error on the second run of the program until the port times out and closes automatically. By closing I save time for the user. For an embedded system, the serial_close function is usually left empty. For the simulator I would add a while(1) line before the return 0. This will freeze the simulation and allow us to study the results. Otherwise, XC8 may simply restart our application from the beginning and fill up my log file with repeat runs. This covers all the code except for the bit that we are interested in. For these type of experiments I like to be as agile as possible. This is about failing fast so we can discard bad ideas as quickly as possible. We should never invest so much of our energy in a bad idea that we emotionally feel obligated to "make it work at all costs". I have seen too many bad ideas last for too long due to this human failing. So lets briefly talk about findToken The function NMEA_findToken takes a pointer to a possible keyword (as identified by NMEA_getWord) and returns an integer (enum really) that identifies it as a number to the rest of the code. I will leave as an exercise to the student the work of doing something useful with a token id. In the future we can replace this function with different techniques and so long as the interface stays constant, we can simply insert the code and measure the performance. For today's test, we will simply apply a brute force strategy. We have a list of words: const char *wordList[] = {"GPGGA","GNGSA","GPGSV","GPBOD","GPDBT","GPDCN"}; and a matching enumeration: enum wordTokens {NO_WORD = -1,GPGGA,GNGSA,GPGSV,GPBOD,GPDBT,GPDCN}; All we need to do is compare our test word to each word in the list until a match is found. The C standard library provides a useful string compare function strncmp. But XC8 does not include it (it arrived with C99) so we will use its predecessor strcmp and press on. We will just be very careful not to loose our terminating character. enum wordTokens NMEA_findToken(char *word) { enum wordTokens retValue = NO_WORD; for(int x=0; x < sizeof(wordList)/sizeof(*wordList);x++) { if(strcmp(word,wordList[x])==0) { retValue = x; break; } } return retValue; } Here is the blow-by-blow: This function will take a pointer to a string and return a token. Initialize the return value to the no_word value Step through the word list Test each word looking for a match on a match, set the return value and leave the loop Return the value to the caller I ran it on my Mac and it runs fine identifying GPS strings from my GPS. I also ran it in the PIC16F1939 simulator so I could measure the CPU cycles and code size for a baseline. For the testing I am looking for 6 words which were carefully chosen to have some missing characters, overlapping characters, and matching words at the beginning and end of the list. Analysis Before I show the measurements we should make this scientific and formulate a hypothesis. My test sequence is GNGSA, GPGSV, GLGSV, GPRMC, GPGGA The words in the list are in this order : GPGGA, GNGSA, GPGSV, GPBOD, GPDBT, GPDCN So I expect the first word to fail at the P in GPGGA, then pass GNGSA. This will be a total of 7 tests to execute. The second word will fail GPGGA on the third test, fail GNGSA on the second test and then pass for a total of 10 tests to execute. GLGSV will require 12 tests to fail. In every word it will fail the second test. GPRMC will require 14 tests to fail GPGGA will require 5 tests to pass. The most number of tests required to pass or fail is equal to the number of characters in the entire list. That assumes that the list is constructed so the only differences are in the last letter so a possible word must go through the entire word of the entire list before determining if it is a match or fail. For our little test, that is not possible, but if the list were different it would take 30 tests. So I expect it to be O(n) where n is the total character count in the word list. Results Here is the actual data for executing NMEA_findToken in cpu instruction cycles on a PIC16F1939. The data was taken using the MPLAB X simulator. Word STRCMP Cycles Notes GNGSA 399 Second word in the word list GPGSV 585 Third word in the word list GLGSV 724 Not in word list GPRMC 899 Not in word list GPGGA 283 First word in the word list The program did include printf and some buffers so the program did end up large at 2153 bytes of flash memory. Improvements: If you decided that this was the strategy for you, then you can make some improvements simply by carefully ordering the list so the most frequent words are first. But after looking things over, it would be nice if we could do our search where we did not need to repeat letters. i.e. all the words start with G, so if the first letter is not a G we can stop the entire process. We will explore one way to accomplish this optimization next time. For now, take a look at the attached files and I welcome's any suggestions for improvements to this strategy. Good Luck example1.c serial_port.h nmea_stimiulus.txt serial_pic.c
  17. 1 point
    What every embedded programmer should know about … Floating Point Numbers "Obvious is the most dangerous word in mathematics." E. T. Bell “Floating Point Numbers” is one of those areas that seem obvious at first, but do not be fooled! Perhaps when we learn how to do algebra and arithmetic we are never bound in our precision in quite the same way a computer is, and when we encounter computers we are so in awe by how “smart” they are that it becomes hard to fathom how limited floating point numbers on computers truly are. The first time I encountered something like this it had me completely baffled: void example1(void) { volatile double dval1; volatile bool result; dval1 = 0.6; dval1 += 0.1; if (dval1 == (0.6+0.1)) result = true; // Even though these are “obviously” the same else result = false; // This line is in fact what happens! } In this example we can see first hand one of the effects of floating point rounding errors. We are adding 0.6 to 0.1 in two slightly different ways and the computer is telling us that the result is not the same! But how is this possible? Surely, something must be broken here? Rest assured, there is nothing broken. This is simply how floating point numbers work, and as a programmer it is very important to know and understand why! In this particular case the compiler is adding up 0.6 and 0.1 at compile time in a different way than the microcontroller does at runtime. 0.7 represented as a binary fraction does not round to the same value as 0.6 + 0.1. This of course is merely scratching the surface, there is an impressive number of pitfalls when using floating point numbers in computer programs which result in incorrect logic, inaccurate calculations, broken control systems and unexpected division by 0 errors. Let’s take a more detailed look at these problems, starting at the beginning. Binary Fractions For decades now computers represent floating point numbers predominantly in a consistent way (IEEE 754-1985). [See this fascinating paper on it’s history by one of it’s architects, Prof W. Kahan] In order to understand the IEEE 754 representation better, we first have to understand how fractional numbers are represented when using binary. We are all familiar with the way binary numbers work. It should come as no surprise that when you go right of the decimal point it works just the same as decimal numbers, you keep on dividing by 2, so in binary 0.1 = 1/2 and 0.01 = 1/4 and so on. Let’s look at an example. The representation of 10.5625 in binary is 1010.1001. ( online calculator if you want to play ) Position 8's 4's 2's 1's 1/2's 1/4's 1/8's 1/16's Value 1 0 1 0 . 1 0 0 1 To get the answer we have to add up all of the digits, so we have 1 x8 + 1 x2 + 1 x 1/2 + 1 x 1/16 = 8 + 2 + 0.5 + 0.0625 = 10.5625 That seems easy enough but there are some peculiarities which are not obvious, one of these is that just like 1/3 is represented by a repeating decimal for decimal numbers, in binary entirely different numbers end up being repeating fractions, like 1/10, which is not a repeating decimal is in fact a repeating binaries fraction. The representation of the decimal number 0.1 in binary turns out to be 0.00011 0011 0011 0011 0011 0011 … When you do math with a repeating fraction like 1/3 or 1/6 or even 1/11 you know to take care as the effects of rounding will impact your accuracy, but it feels counter intuitive to apply the same caution to numbers like 0.1 or 0.6 or 0.7 which we have been trusting for accuracy all of our lives. IEEE 754 Standard Representation Next we need to take a closer look at the IEEE 754 standard so that we can understand what level of accuracy is reasonable to expect. 32 Bit IEEE 754 floating point number Sign (1bit) Exponent (8-bits) Significand (1.xxxxxxxx) (23 bits) The basic representation is fairly easy to understand. (for more detail follow the links from the WikiPedia page ) Standard (or Single Precision) floating point numbers are represented using 32 bits. The number is represented as a 24 bit binary number raised to an 8-bit exponent The 24 bit number is called the significand. The significand is represented as a binary fraction with an implied leading 1,meaning we really only use 23 bits to represent it as the MSB is always 1. 1 bit, the MSB, is always used as a sign bit Some bit patterns have special meanings, e.g. since there is always a leading 1 for the significand representing exactly 0 requires a special rule. There are a lot of special bit patterns with special meanings in the standard e.g. Infinity, NaN and subnormal numbers. NOTE: As you can imagine all these special bit patterns adds quite a lot of overhead to the code dealing with floating point numbers. The Solaris manual has a very good description of the special cases The decimal number 0.25 is for example represented using IEEE 754 floating point standard presentation as follows: Firstly the sign bit is 0 as the number is positive. Next up is the significand. We want to represent ¼ - in binary that would be 0.01, but the rule for is that we have to have a leading 1 in front of the decimal point, so we have to shift the decimal point by 2 positions to give us 1.0 * 2^-2 = 0.25. (This rule is meant to ensure that we will not have 2 different representations of the same number). The exponent is constructed by taking the value and adding a bias of 127 to it, this has some nice side effects, we can represent anything from -127 to +128 without sacrificing another sign bit for the exponent, and we can also sort numbers by simply treating the entire number as a signed integer. In the example we need 2 to the power of (-2) so we take 127 + (-2) which gives 125. In IEEE 754 that would look as follows: Sign (0) Exponent (125) Significand (1.0) 0 0111 1101 000 0000 0000 0000 0000 0000 If you would like to see this all done by hand you can look at this nice youtube video which explains how this works in more detail with an example - Video Link NOTE: You can check these numbers yourself using the attached project and the MPLAB-X debugger to inspect them. If you right-click in the MPLAB-X watch window you can change the representation of the numbers to binary to see the individual bits) Next we will look at what happens with a number like 0.1, which we now know has a repeating binary fraction representation. Remember that 0.1 in binary is 0.00011001100110011..., which is encoded as 1.100110011001100… x 2^(-4) (because the rule demands a leading 1 in the fraction). Doing the same as above this leads to: Sign (0) Exponent (123) Significand (1.100110011...) 0 0111 1011 100 1100 1100 1100 1100 1101 The only tricky bit there is the last digit which is a 1, as we are required to round off to the nearest 23 binary digits and since the next digit would be a 1 that requires us to round up so the last digit becomes a 1. For more about binary fractional rounding see this post on the angular blog This rounding process is one of the primary causes of inaccuracy when we work with floating point numbers! In this case 0.1 becomes exactly 0.100000001490116119384765625, which is very close to 0.1 but not quite exact. To make things worse the IDE will try to be helpful by rounding the displayed number to 8 decimal digits for you, so looking at the number in the watch window will show you 0.1 exactly! Precision Before we look at how things can break, let’s get the concept of precision under the knee first. If we have 24 bits to represent a binary significand that equates to 7.22472 decimal digits of precision (since log(2^24) = 7.22472). This means that the number is by design only accurate to the 7th digit and sometimes it can be accurate to the 8th digit, but further digits should be expected to be only approximations. The smallest absolute value normal 32-bit floating point number would be 1.0 x 2 ^ (-126) = 1.17549453E-38 represented as follows: Sign (0) Exponent (1) Significand (1.0) 0 0000 0001 000 0000 0000 0000 0000 0000 NOTE: Numbers with exponent 0 represent “Subnormal” numbers which can be even smaller but are treated slightly differently, this is out of scope for this discussion. (see https://en.wikipedia.org/wiki/Denormal_number for more info) The absolute precision of floating point numbers vary wildly over their range as a consequence of the point floating with the 7 most significant digits. This means that the largest numbers, in the range of 10^38 have only 7 accurate digits, which means the smallest distance between 2 subsequent numbers will be 10^31, or 10 Quintillion for those who know big numbers. This means that a rounding error can mean we are off by 1 Quintillion in error! To put that in context, if we used 32-bit floating point numbers to store bank balances they would have only 7 digits of accuracy, which means any balance of $100,000.00 or more will not be 100% correct and the bigger the balance gets the larger the error would be. As the numbers get smaller the error floats down with the decimal point so for the smallest numbers the error shrinks to a miniscule 10^-45. Also these increments do not match the increments we would have for decimal digits, which means converting between the two is imprecise and involves choosing the closest representation instead of the exact number (known as rounding). The following figure shows a decimal and binary representation of numbers on the same scale. Note that the binary intervals are divisible only by powers of 2 while the decimals are divided by 10’s, which means the lines to not align (follow the blue line down e.g.) These number lines also show that there are lots of values in-between which cannot be represented and we have to round them to the nearest representable number. Implications and Errors We know now that floating point numbers are imprecise and we have already seen one example of how rounding of these numbers can cause exact comparisons to fail unexpectedly. This will make up our first problem. Example PROBLEM 1 - Comparison of floating point numbers The first typical problem we will look at is the example that we started with. PROBLEM 1 : Floating point numbers are imprecise so we cannot reliably check them for equality. SOLUTION 1 : Never do exact comparisons for equality on floating point numbers, use inequalities like greater than or smaller than instead, and when doing this be sensitive to the fact that the imprecision and rounding can cause unexpected behavior at the boundaries. If you are really forced to look for a specific value consider looking for a small range e.g. check if Temp > 99.0 AND Temp < 101.0 if you are trying to find 100. void example1(void) { // If you REALLY HAVE to compare these then do something like this. // The range you pick needs to be appropriate to cover the maximum // possible rounding error for the calculation. // If we added up only 1 pair the most this error can be is 7 decimal // digits of accuracy, so these values are appropriate if ((dval1 > 0.6999999) && (dval1 < 0.7000001)) { result = true; // This time we get the truth ! } else { result = false; // And this does NOT happen ... } } The attached source code project contains more examples of this. Example PROBLEM 2 - Addition of numbers of different scale The next problem happens often when we implement something like a PID controller where we have to make thousands of additions over a time period intended to move the control variable by a small increment every cycle. Because of the limitation of 7.22 digits of precision we can run into trouble when our sampling rate is high and the value of Pi (integral term) requires more than 7.22 digits of precision. Example 2 shows this case nicely. void example2(void) { volatile double PI = 1E-5; volatile double controlValue = 300; controlValue += 1 * PI; controlValue += 1 * PI; controlValue += 1 * PI; } This failure case happens quite easily when we have e.g. a PWM duty cycle we are controlling, and the current value is 300, but the sampling rate is very high, in this case 100kHz. We want to add a small increment to the controlValue at every sample, but we want these to add up to 1, so that the controlValue becomes 301, after 1 second of control, but since 1E-5 is 8 decimal places down from the value 300 the entire error we are trying to add gets rounded away. If you run this code in the simulator you will see that controlValue remains the same no matter how many times you add this small value to it. When you change the control value to 30 it works though, which means that you can test this as well as you like, and it will look like it is working but it can fail in production! When you change the controlValue to 30 you will see that the first 2 additions work, and then it starts losing precision and it gets worse as you attempt to approach 300 An example of where this happened for real and people actually died is the case of the Patriot Missile software bug - yes this really happened! (The full story here and more about it here ) PROBLEM 2 : When doing addition or subtraction, rounding can cause systems to fail. SOLUTION 2 : Always take care that 7 digits of precision will be enough to do your math, if not make sure you store the integral or similar part in a separate variable to compensate for the precision mismatch. Imprecision and rounding of interim values can also cause all kinds of problems like giving different answers depending on the order the numbers are added together or underflows to 0 for interim values causing division by zero incorrectly when doing multiplication or division. There is a nice example of this when calculating quadratic roots in the attached source code project Example PROBLEM 3 - Getting consistent absolute precision across the range Our third problem is very typical in financial systems, where no matter how large the balance gets, for the books to balance we must always be accurate to 1 cent. The floating point number system ensures that we always get consistent precision as a ratio or percentage of the number we deal with, for single precision numbers that is 7.22 digits or roughly 0.000001 % of accuracy, but as the numbers get really big the absolute precision gets worse as numbers get bigger. If we want to be accurate to 1mm or 1cm or 1 cent this system will not work, but there are ways to get consistent absolute precision, and even avoid the speed penalty of dealing with floating point numbers entirely! PROBLEM 3 : Precision of the smallest increment has to remain constant for my application but floating point numbers break things due to lack of precision when my numbers get large. SOLUTION 3: In cases like this floating point numbers are just not appropriate to store the number. Use a scaled integer to represent your number. For money this is a simple as storing cents as integer instead of dollars. For distances this may mean storing millimeters as an integer instead of storing meters as a floating point, or for time storing milliseconds as an integer instead of storing seconds as a floating point. Conclusion It is quite natural to think that how Floating Point numbers work is obvious, but as we have seen it is everything but. This blog covered only the basics of how floating point numbers are handled on computers, why they are imprecise and the 3 most common types of problems programmers have to deal with, including typical solutions to these problems. There are a lot more things that can go wrong when you start getting numbers close to zero (subnormal), floating point underflow to 0, dealing with infinity etc. during calculations. A lot of these are nicely described by David Goldberg in this paper , but they are probably much less likely to affect you unless you are doing serious number crunching. Please do read that paper if you ever have to build something using any floating point calculations where accuracy is critical like a Patriot Missile System! References Why we needed the IEEE 754 standard by Prof W. Kahan An online Converter between Binary and Decimal The WikiPedia page on the standard Great description in the Solaris Manual Reproduction of the David Goldberg paper Youtube video of how to convert a decimal floating point number to IEEE754 format How rounding of binary fractions work About subnormal or denormal numbers Details on the Patriot Missile disaster - Link 1 Details on the Patriot Missile disaster - Link 2 Online IEEE754 calculator so you can check your compilers and hardware Source Code
  18. 0 points
    The story of Mel Kaye is perhaps my favorite bit of Lore. Mel is often referred to as a "Real Programmer", of course a parody of the famous essay Real Men Don't Eat Quiche by Bruce Feirstein which led to the publication of a precious tongue-in-cheek book on stereotypes about masculinity, which in turn led Ed Post to speculate about the properties of the "Real Programmer" in his essay Real Programmers Don't use Pascal, but I digress. The story itself is famous enough to have it's own Wikipedia page. Even though this was written 36 years ago back in 1983 it still resonates well. The story shows us, by the mistakes of others, that doing something in an impressively complex way may very well demonstrate impressive skill and truly awe onlookers, but it is really not the best way to do things in the real world where our code needs to be maintained later. A good friend once told me that Genius is like a fertile field. However much it's potential to grow fantastic crops, it is also capable of growing the most impressive weeds if left untended, as Mel has shown us first hand here. Without further ado, here it is for your reading pleasure: The story of Mel A recent article devoted to the *macho* side of programming made the bald and unvarnished statement: Real Programmers write in Fortran. Maybe they do now, in this decadent era of Lite beer, hand calculators and "user-friendly" software but back in the Good Old Days, when the term "software" sounded funny and Real Computers were made out of drums and vacuum tubes, Real Programmers wrote in machine code. Not Fortran. Not RATFOR. Not, even, assembly language. Machine Code. Raw, unadorned, inscrutable hexadecimal numbers. Directly. Lest a whole new generation of programmers grow up in ignorance of this glorious past, I feel duty-bound to describe, as best I can through the generation gap, how a Real Programmer wrote code. I'll call him Mel, because that was his name. I first met Mel when I went to work for Royal McBee Computer Corp., a now-defunct subsidiary of the typewriter company. The firm manufactured the LGP-30, a small, cheap (by the standards of the day) drum-memory computer, and had just started to manufacture the RPC-4000, a much-improved, bigger, better, faster -- drum-memory computer. Cores cost too much, and weren't here to stay, anyway. (That's why you haven't heard of the company, or the computer.) I had been hired to write a Fortran compiler for this new marvel and Mel was my guide to its wonders. Mel didn't approve of compilers. "If a program can't rewrite its own code," he asked, "what good is it?" Mel had written, in hexadecimal, the most popular computer program the company owned. It ran on the LGP-30 and played blackjack with potential customers at computer shows. Its effect was always dramatic. The LGP-30 booth was packed at every show, and the IBM salesmen stood around talking to each other. Whether or not this actually sold computers was a question we never discussed. Mel's job was to re-write the blackjack program for the RPC-4000. (Port? What does that mean?) The new computer had a one-plus-one addressing scheme, in which each machine instruction, in addition to the operation code and the address of the needed operand, had a second address that indicated where, on the revolving drum, the next instruction was located. In modern parlance, every single instruction was followed by a GO TO! Put *that* in Pascal's pipe and smoke it. Mel loved the RPC-4000 because he could optimize his code: that is, locate instructions on the drum so that just as one finished its job, the next would be just arriving at the "read head" and available for immediate execution. There was a program to do that job, an "optimizing assembler", but Mel refused to use it. "You never know where its going to put things", he explained, "so you'd have to use separate constants". It was a long time before I understood that remark. Since Mel knew the numerical value of every operation code, and assigned his own drum addresses, every instruction he wrote could also be considered a numerical constant. He could pick up an earlier "add" instruction, say, and multiply by it, if it had the right numeric value. His code was not easy for someone else to modify. I compared Mel's hand-optimized programs with the same code massaged by the optimizing assembler program, and Mel's always ran faster. That was because the "top-down" method of program design hadn't been invented yet, and Mel wouldn't have used it anyway. He wrote the innermost parts of his program loops first, so they would get first choice of the optimum address locations on the drum. The optimizing assembler wasn't smart enough to do it that way. Mel never wrote time-delay loops, either, even when the balky Flexowriter required a delay between output characters to work right. He just located instructions on the drum so each successive one was just *past* the read head when it was needed; the drum had to execute another complete revolution to find the next instruction. He coined an unforgettable term for this procedure. Although "optimum" is an absolute term, like "unique", it became common verbal practice to make it relative: "not quite optimum" or "less optimum" or "not very optimum". Mel called the maximum time-delay locations the "most pessimum". After he finished the blackjack program and got it to run, ("Even the initializer is optimized", he said proudly) he got a Change Request from the sales department. The program used an elegant (optimized) random number generator to shuffle the "cards" and deal from the "deck", and some of the salesmen felt it was too fair, since sometimes the customers lost. They wanted Mel to modify the program so, at the setting of a sense switch on the console, they could change the odds and let the customer win. Mel balked. He felt this was patently dishonest, which it was, and that it impinged on his personal integrity as a programmer, which it did, so he refused to do it. The Head Salesman talked to Mel, as did the Big Boss and, at the boss's urging, a few Fellow Programmers. Mel finally gave in and wrote the code, but he got the test backwards, and, when the sense switch was turned on, the program would cheat, winning every time. Mel was delighted with this, claiming his subconscious was uncontrollably ethical, and adamantly refused to fix it. After Mel had left the company for greener pa$ture$, the Big Boss asked me to look at the code and see if I could find the test and reverse it. Somewhat reluctantly, I agreed to look. Tracking Mel's code was a real adventure. I have often felt that programming is an art form, whose real value can only be appreciated by another versed in the same arcane art; there are lovely gems and brilliant coups hidden from human view and admiration, sometimes forever, by the very nature of the process. You can learn a lot about an individual just by reading through his code, even in hexadecimal. Mel was, I think, an unsung genius. Perhaps my greatest shock came when I found an innocent loop that had no test in it. No test. *None*. Common sense said it had to be a closed loop, where the program would circle, forever, endlessly. Program control passed right through it, however, and safely out the other side. It took me two weeks to figure it out. The RPC-4000 computer had a really modern facility called an index register. It allowed the programmer to write a program loop that used an indexed instruction inside; each time through, the number in the index register was added to the address of that instruction, so it would refer to the next datum in a series. He had only to increment the index register each time through. Mel never used it. Instead, he would pull the instruction into a machine register, add one to its address, and store it back. He would then execute the modified instruction right from the register. The loop was written so this additional execution time was taken into account -- just as this instruction finished, the next one was right under the drum's read head, ready to go. But the loop had no test in it. The vital clue came when I noticed the index register bit, the bit that lay between the address and the operation code in the instruction word, was turned on-- yet Mel never used the index register, leaving it zero all the time. When the light went on it nearly blinded me. He had located the data he was working on near the top of memory -- the largest locations the instructions could address -- so, after the last datum was handled, incrementing the instruction address would make it overflow. The carry would add one to the operation code, changing it to the next one in the instruction set: a jump instruction. Sure enough, the next program instruction was in address location zero, and the program went happily on its way. I haven't kept in touch with Mel, so I don't know if he ever gave in to the flood of change that has washed over programming techniques since those long-gone days. I like to think he didn't. In any event, I was impressed enough that I quit looking for the offending test, telling the Big Boss I couldn't find it. He didn't seem surprised. When I left the company, the blackjack program would still cheat if you turned on the right sense switch, and I think that's how it should be. I didn't feel comfortable hacking up the code of a Real Programmer." -- Source: usenet: utastro!nather, May 21, 1983.
 


  • Newsletter

    Want to keep up to date with all our latest news and information?
    Sign Up
×
×
  • Create New...