Jump to content


  • Content Count

  • Joined

  • Last visited

  • Days Won


Everything posted by Orunmila

  1. In the comments of our blog on structures in c a member asked me a specific question about what they observed. As this code is a beautiful example of a number of problems we often see we thought it a good idea to make an entry just to discuss this as there really is a lot going on here. We will cover the following: What allocation and freeing of memory on the stack means and the lifetime of objects In which direction the stack usually grows (note - the C standard does not contain the word "stack" so this is compiler-specific) Another look at deep vs. shallow copies of c strings inside structures In order to keep this all generic, I am going to be using the LLVM compiler on my MAC to do all my examples. The examples are all standard C and you can play with the code on your favorite compiler, but since the details of memory allocation are not mandated by the C standard your results may not look exactly like mine. I will e.g. show how the results I get changes when I modify the optimization levels. The Question The OP @zakasterwas asking this: Here is the code snippet they provided: struct person { char* name; }; void get_person(struct person* p) { char new_name[20]; // on stack, gets freed when function returned printf("input new name for person:"); scanf("%s", &new_name); p->name = new_name; printf("address of new_name = %p\n", &new_name[0]); } void eg_test_copy2(void) { struct person p = {"alex"}; get_person(&p); printf("p name = %s\n", p.name); char dummy[20] = { 0 }; printf("address of dummy = %p\n", &dummy[0]); printf("p name = %s\n", p.name); } Variable Allocation When you declare a variable the compiler will only reserve a memory location to be used by the variable. This process will not actually clear the memory unless the variable has static linkage, the standard states that only variables with static linkage (in simple terms this means global variables) shall be initialized to 0. If you want a variable to be initialized you have to supply an initializer. What actually happens before your main function starts running is that something generally referred to as "c-init" will run. This is a bit of code that will do the work needed by the C standard before your code runs, and one of the things it will do is to clear, usually using a loop, the block of memory which will contain statically linked variables. Other things that may be in here are setting up interrupt vectors and other machine registers and of course copying the initial values of global variables that do have initializers over the locations reserved for these variables. When a variable goes "out of scope" the memory is no longer reserved. This simply means that it is free for others to use, it does not mean that the memory is cleared when it is no longer reserved. This is very important to note. This phenomenon often leads to developers testing their code after having a pointer that points to memory which is no longer reserved, and the code seems to work fine until the new owner of that part of memory modifies it, then the code inexplicably breaks! No, it was actually broken all along and you just got lucky that the memory was not used at the time you were accessing this unreserved piece of memory! The classic way this manifests can be seen in our first test (test1) below. #include <stdio.h> char* get_name() { char new_name[20]; // on stack, gets freed when function returned printf("Enter Name:"); scanf("%s", new_name); return new_name; } int main(void) { char* theName; theName = get_name(); printf("\r\nThe name was : %s\r\n", theName); return 0; } I compile and run this and get : > test1 Enter Name:Orunmila The name was : Orunmila Note: Let me mention here that I was using "gcc test1.c -O3" to compile that, when I use the default optimization or -O1 it prints junk instead. When you do something which is undefined in the C standard the behavior will not be guaranteed to be the same on all machines. So I can easily be fooled into thinking this is working just fine, but it is actually very broken! On LLVM I actually get a compiler warning when I compile that as follows: test1.c:7:12: warning: address of stack memory associated with local variable 'new_name' returned [-Wreturn-stack-address] return new_name; ^~~~~~~~ 1 warning generated. Did I mention that I do love LLVM?! We can quickly see how this breaks down if we call the function more than once in a row like this (test2): #include <stdio.h> char* get_name() { char new_name[20]; // on stack, gets freed when function returned printf("Enter Name:"); scanf("%s", new_name); return new_name; } int main(void) { char* theName; char* theSecondName; theName = get_name(); theSecondName = get_name(); printf("\r\nThe first name was : %s\r\n", theName); printf("The second name was : %s\r\n", theSecondName); return 0; } Now we get the following obviously wrong behavior Enter Name:N9WXU Enter Name:Orunmila The first name was : Orunmila The second name was : Orunmila This happens because the declarations of theName and theSecondName in the code only reserve enough memory to store a pointer to a memory location. When the function returns it does not actually return the string containing the name, it only returns the address of the string, the name of the memory location which used to contain the string inside of the function get_name(). At the time when I print the name, the memory is no longer reserved, but as nobody else has used it since I called the function (I did perform any other operation which makes use of the stack in other words). The code is still printing the name, but both name pointers are pointing the same location in memory (which is actually just a coincidence, the compiler would have been within its rights to place the two in different locations). If you call a function that has a local variable between fetching the names and printing them the names will be overwritten by these variables and it will print something which looks like gibberish instead of the names I was typing. We will leave it to the reader to play with this and see how/why this breaks. I would encourage you to also add this to the end, these print statements will clearly show you where the variables are located and why they print the same thing - you will notice that the values of both pointers are the same! printf("Location of theName : %p\r\n", &theName); // This prints the location of the first pointer printf("Location of theSecondName : %p\r\n", &theSecondName); // This prints the location of the second pointer printf("Value of theName : %p\r\n", theName); // This prints the value of the first pointer printf("Value of theSecondName : %p\r\n", theSecondName); // This prints the value of the second pointer This all should answer the question asked, which was "I can still access the old name, is this weird?". The answer is no, this is nor weird at all, but it is undefined and if you called some other functions in between you would see the memory which used to hold the old name being overwritten in weird and wonderful ways as expected. How does the stack grow? Now that we have printed out some pointers this brings us to the next question. Our OP noticed that "the address for `dummy` actually starts after the address of new_name + 4 bytes x 20". We need to be careful here, the C standard requires pointers to be byte-addressable, which means that the address being 20x4 away makes no sense by itself, and in this case it is a pure coincidence. A couple of things should be noted here: The stack usually grows downwards in memory The size of a char[20] buffer will always be 20 and never 4x20 (specified in section of the C99 standard) In the example question the address of new_name was at 0x61FD90, which is actually smaller than 0x61FDE0, and in other words it was placed on the stack AFTER dummy. Here is a diagram which shows a typical layout that a C compiler may choose to use. The reason there was a gap of 80 between the pointers was simply due to the way the compiler decided to place the variables on the stack. It was probably creating some extra space on the stack for passing parameters around and this just happened to be exactly 60 bytes, which resulted in a gap of 80. The C standard only defines the scope of the variables, it does not mandate how the compiler must place them in memory. This can even vary for the same compiler when you add more code as the linker may move things around and will probably change when you change the optimization settings for the compiler. I did some tests with LLVM and if I look at the addresses in the example they will differ significantly when I am using optimization O1, but when I set it to O3 the difference between the two pointers is exactly 20 bytes for the example code. Getting back to Structures and Strings Looking at the intent of the OP's code we can now get back to how structures and strings work in C. With our interface like this struct person { char* name; }; void get_person(struct person* p); What we have is a struct which very importantly does NOT contain a string, it only contains the address of a string. That person struct will reserve (typically) the 4 bytes of RAM required to store a 32-bit address which will be the location where a string exists in memory. If you use it like this you will most often find that the address of "name" will be exactly the same as the address of the person struct you are passing in, so if our OP tested the following this would have been clear: struct person p = {"alex"}; printf("Address of p = %p\n", &p); printf("Address of p.name = %p\n", &p.name); These two addresses must be the same because the struct has only one member! When we want to work with a structure that contains the name of a person we have 2 choices and they both have pro's and con's. Let the struct contain a pointer and use malloc to allocate memory for the string on the heap. (not recommended for embedded projects!) Let the struct contain an array of chars that can contain the name. For option 1 the declaration is fine, but the getName function would have to look as follows: void get_person(struct person* p) { char* new_name = malloc(20); // on heap, so remember to check if it returns NULL ! printf("input new name for person:"); scanf("%s", new_name); p->name = new_name; printf("address of new_name = %p\n", new_name); } Of course, now you have to check and handle the case where we run out of memory and malloc returns NULL, we also have to be cognisant of heap fragmentation and most importantly we now have to be very careful to ensure that the memory gets freed or we will have a memory leak! For option 2 the structure and the function has to change to something like the following: struct person { char name[20]; } void get_person(struct person* p) { printf("input new name for person:"); scanf("%s", p->name); printf("address of p->name = %p\n", p->name); } Of course now we use 20 bytes of memory regardless how long the name is, but on the upside we do not have to worry about freeing the memory, when the instance goes out of scope the compiler will take care of that for us. Also now we can assign one person struct to another which will actually copy the entire string and we still have the option of passing it by reference by using the address of the object! Conclusion Be careful when using C strings in structures, there are a lot of ways these can get you into trouble. Memory leaks and shallow copies, where you make a copy of the pointer but not the string, are very likely to catch you sooner rather than later.
  2. Your example is actually quite complex and it contains a number of really typical and very dangerous mistakes. I think it will take up way too much space here in the comments to answer this properly so I will write another blog just about your question here and post that over the weekend when I get some time to put it together. A hint to what is wrong with the code: The buffer you are using in get_person() is allocated on the stack, but the space is made available for use after the function returns. After this point you have a struct which contains a pointer to the buffer on the stack, but the memory no longer belongs to you and when you call any other function it will get overwritten. There is no rule that the compiler must allocate variables on the stack consecutively, and actually XC8 uses a "compiled stack" which means that variables are placed permanently where they will not overlap. You can probably get behavior closer to what you expect if you change the stack model to use a software stack. The last time this happened to someone in my team they were calling get_person for the first name, and then for the second name, and after calling it twice they tried to print out both names and both the structs had the same string, if you call it many times all the people will have the last name you entered. Try this with your code: struct person p1 = {"nobody"}; struct person p2 = {"nobody"}; get_person(&p1); get_person(&p2); printf("p1 name = %s\n", p1.name); printf("p2 name = %s\n", p2.name); You will enter a new name in get_person for each one, and after that the printing should not print nobody but the 2 different names you have entered, and both names should be different. Let me know if that behaves as you expected? Read my upcoming blog on stack usage and pointers to see why 🙂
  3. Thanks for pointing that out! I looked over the code 3 times before I realized you were referring to the Table 🙂 And yes I agree there are a lot of ways to optimize what happens before the loop, we decided not to even bother with making that smaller as it did not form part of our focus here. For example I also noticed that the Start/MCC generated code for the AVR was actually quite large, more than 4x what the size was for the PIC code - in fact it took more than 800 instructions to just make a loop that toggles a pin. In ASM I can do all that in almost 100x less, but if this was going to be a reasonably fast read we had to be laser focussed. The idea in the longer run is to add a lot of small posts that focus on one specific aspect every time, and we will be sure to cover the advantages of OUT in the future sometime based on your recommendation.
  4. We just started a new blog which aims to create some benchmarks for comparing performance of small microcontrollers. This is something that we always needed to do ourselves as part of processor selection, so we decided to share our experiments and results here for everybody to use.
  5. Comparing raw pin toggling speed AVR ATmega4808 vs PIC 16F15376 Toggling a pin is such a basic thing. After all, we all start with that Blinky program when we bring up a new board. It is actually a pretty effective way to compare raw processing speed between any two microcontrollers. In this, our first Toe-to-toe showdown, we will be comparing how fast these cores can toggle a pin using just a while loop and a classic XOR toggle. First, let's take a look at the 2 boards we used to compare these cores. These were selected solely because I had them both lying on my desk at the time. Since we are not doing anything more than toggling a pin we just needed an 8-bit AVR core and an 8-bit PIC16F1 core on any device to compare. I do like these two development boards though, so here are the details if you want to repeat this experiment. In the blue corner, we have the AVR, represented by the ATmega4808, sporting an AtMega core (AVRxt in the instruction manual) Clocking at a maximum of 20MHz. We used the AVR-IOT WG Development Board, part number AC164160. This board can be obtained for $29 here: https://www.microchip.com/Developmenttools/ProductDetails/AC164160 Compiler: XC8 v2.05 (Free) In the red corner, we have the PIC, represented by the 16F15376, sporting a PIC16F1 Enhanced Midrange core. Clocking at a maximum of 32MHz. We used the MPLAB® Xpress PIC16F15376 Evaluation Board, part number DM164143. This board can be obtained at $12 here: https://www.microchip.com/developmenttools/ProductDetails/DM164143 Compiler: XC8 v2.05 (Free) Results This is what we measured. All the details around the methodology we used and an analysis of the code follows below and attached you will find all the source code we used if you want to try this at home. The numbers in the graph are pin toggling frequency in kHz after it has been normalized to a 1MHz CPU clock speed. How we did it (and some details about the cores) Doing objective comparisons between 2 very different cores is always hard. We wanted to make sure that we do an objective comparison between the cores which you can use to make informed decisions on your project. In order to do this, we had to deal with the fact that the maximum clock speed of these devices is not the same and also that the fundamental architecture of these two cores is very different. In principle, the AVR is a Load-store Architecture machine with a 1 stage pipeline. This basically means that all ALU operations have to be performed between CPU registers and the RAM is used to load from and store results to. The PIC, on the other hand, uses a Register Memory Architecture, which means in short that some ALU operations can be performed on RAM locations directly and that the machine has a much smaller set of registers. On the PIC all instructions are 1 word in length, which is 14-bits wide, while the data bus is 8-bits in size and all results will be a maximum of 8-bits in size. On the AVR instructions can be 16-bit or 32-bit wide which results in different execution times depending on the instruction. Both processors have a 1 stage pipeline, which means that the next instruction is fetched while the current one is being executed. This means branching causes an incorrect fetch and results in a penalty of one instruction cycle. One major difference is that the AVR, due to its Load-store Architecture, is capable of completing the instruction within as little as just one clock cycle. When instructions need to use the data bus they can take up to 5 clock cycles to execute. Since the PIC has to transfer data over the bus it takes multiple cycles to execute an instruction. In keeping with the RISC paradigm of highly regular instruction pipeline flow, all instructions on the PIC take 4 clock cycles to execute. All of this just makes it tricky and technical to compare performance between these processors. What we decided to do is rather take typical tasks we need the CPU to perform which occurs regularly in real programs and simply measure how fast each CPU can perform these tasks. This should allow you to work backwards from what your application will be doing during maximum throughput pressure on the CPU and figure out which CPU will perform the best for your specific problem. Round 1: Basic test For the first test, we used virtually the same code on both processors. Since both of these are supported by MCC it was really easy to get going. We created a blank project for the target CPU Fired up MCC Adjusted the clock speed to the maximum possible Clicked in the pin manager to make a single pin on PORTC an output Hit generate code. After this all we added was the following simple while loop: PIC AVR while (1) { LATC ^= 0xFF; } while (1) { PORTC.OUT ^= 0xFF; } The resulting code produced by the free compilers (XC8 v2.05 in both cases) was as follows, interestingly enough both loops had the same number of instructions (6 in total) including the loop jump. This is especially interesting as it will show how the execution of a same-length loop takes on each of these processors. You will notice that without optimization there is some room for improvement, but since this is how people will evaluate the cores at first glance we wanted to go with this. PIC AVR .tg {border-collapse:collapse;border-spacing:0;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg .tg-0lax{text-align:left;vertical-align:top} Address Hex Instruction 07B3 30FF MOVLW 0xFF 07B4 00F0 MOVWF __pcstackCOMMON 07B5 0870 MOVF __pcstackCOMMON, W 07B6 0140 MOVLB 0x0 07B7 069A XORWF LATC, F 07B8 2FB3 GOTO 0x7B3 Address Hex Instruction 017D 9180 LDS R24, 0x00 017E 0444 017F 9580 COM R24 0180 9380 STS 0x00, R24 0181 0444 0182 CFFA RJMP 0x17D We used my Saleae logic analyzer to capture the signal and measure the timing on both devices. Since the Saleae is thresholding the digital signal and the rise and fall times are not always identical you will notice a little skew in the measurements. We did run everything 512x slower to confirm that this was entirely measurement error, so it is correct to round all times to multiples of the CPU clock in all cases here. PIC AVR Analysis For the PIC The clock speed was 32MHz. We know that the PIC takes 4 clock cycles to execute one instruction, which gives us an expected instruction rate of one instruction every 125ns. Rounding for measurement errors we see that the PIC has equal low and high times of 875ns. That is 7 instruction cycles for each loop iteration. To verify if this makes sense we can look at the ASM. We see 6 instructions, the last of which is a GOTO, which we know will take 2 instruction cycles to execute. Using that fact we can verify that the loop repeats every 7 instruction cycles as expected (7 x 125ns = 875ns.) For the AVR The clock speed was 20MHz. We know that the AVR takes 1 clock cycle per instruction, which gives us an expected instruction rate of one instruction every 50ns. Rounding for measurement errors we see that the AVR has equal low and high times of 400ns. That is 8 instruction cycles for each loop iteration. To verify if this makes sense we again look at the ASM. We see 4 instructions, the last of which is an RJMP, which we know will take 2 instruction cycles to execute. We also see one LDS which takes 3 cycles because it is accessing sram, and one STS instruction which will each take 2 cycles and a Complement instruction which takes 1 more. Using those facts we can verify that the loop should repeat every 8 instruction cycles as expected (8 x 50ns = 400ns.) Comparison Since the 2 processors are not running at the same clock speed we need to do some math to get a fair comparison. We think 2 particular approaches would be reasonable. Compare the raw fastest speed the CPU can do, this gives a fair benchmark where CPU's with higher clock speeds get an advantage. Normalize the results to a common clock speed, this gives us a fair comparison of capability at the same clock speed. In the numbers below we used both methods for comparison. The numbers AVR PIC Notes Clock Speed 20MHz 32MHz Loop Speed 400ns 875ns Maximum Speed 2.5Mhz 1.142MHz Loop speed as a toggle frequency Normalized Speed 125kHz 35.7kHz Loop frequency normalized to a 1MHz CPU clock ASM Instructions 4 6 Loop Code Size 12 bytes 12 bytes 4 instructions 6 words 10.5 bytes 6 instructions Due to the nuances here we compared this 3 ways Total Code Size 786 bytes 101 words 176.75 bytes Round 2: Expert Optimized test For the second round, we tried to hand-optimize the code to squeeze out the best possible performance from each processor. After all, we do not want to just compare how well the compilers are optimizing, we want to see what is the absolute best the raw CPU's can achieve. You will notice that although optimization doubled our performance, it made little difference to the relative performance between the two processors. For the PIC we wrote to LATC to ensure we are in the right bank, and pre-set the W register, this means the loop reduces to just a XORF and a GOTO. For the AVR we changed the code to use the Toggle register instead doing an XOR of the OUT register for the port. The optimized code looked as follows. PIC AVR LATC = 0xFF; asm ("MOVLW 0xFF"); while (1) { asm ("XORWF LATC, F"); } asm ("LDI R30,0x40"); asm ("LDI R31,0x04"); asm ("SER R24"); while (1){ asm ("STD Z+7,R24"); } The resulting ASM code after these changes now looked as follows. Note we did not include the instructions outside of the loop here as we are really just looking at the loop execution. PIC AVR .tg {border-collapse:collapse;border-spacing:0;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg .tg-0lax{text-align:left;vertical-align:top} Address Hex Instruction 07C1 069A XORWF LATC, F 07C2 00F0 GOTO 0x7C1 Address Hex Instruction 0180 8387 STD Z+7,R24 0181 CFFE RJMP 0x180 Here are the actual measurements: PIC AVR Analysis For the PIC we do not see how we could improve on this as the loop has to be a GOTO which takes 2 cycles and 1 instruction is the least amount of work we could possibly do in the loop so we are pretty confident that this is the best we can do, and when measuring we see 3 instruction cycles which we think is the limit here. Note: N9WXU did suggest that we could fill all the memory with XOR instructions and let it loop around forever and in doing so save the GOTO, but we would still have to set W to FF every second instruction to have consistent timing, so this would still be 2 instructions per "loop" although it would use all the FLASH and execute in 250ns. Novel as this idea was, since that means you can do nothing else we dismissed that idea as not representative. For the AVR we think we are also at the limits here. The toggle register lets us toggle the pin in 1 clock cycle which cannot be beaten, and the RJMP unavoidably adds 2 more. We measure 3 cycles for this. AVR PIC Notes Clock Speed 20MHz 32MHz Loop Speed 150ns 375ns Maximum Speed 6.667Mhz 2.667MHz Loop speed as a toggle frequency Normalized Speed 333.3kHz 83.3kHz Loop frequency normalized to a 1MHz CPU clock ASM Instructions 2 2 Loop Code Size 4 bytes 4 bytes 2 words 3.5 bytes At this point, we can do a raw comparison of absolute toggle frequency performance after the hand optimization. Comparing this way gives the PIC the advantage of running at 32MHz while the AVR is limited to 20MHz. Interestingly the PIC gains a little as expected, but the overall picture does not change much. The source code can be downloaded here: PIC MPLAB-X Project file MicroforumToggleTestPic16f1.zip AVR MPLAB-X Project file MicroforumToggleTest4808.zip What next? For our next installment, we have a number of options. We could add more cores/processors to this test of ours, or we can take a different task and cycle through the candidates on that. We could also vary the tools by using different compilers and see how they stack up against each other and across the architectures. Since our benchmarks will all be based on real-world tasks it should not matter HOW the CPU is performing the task or HOW we created the code, the comparison will simply be how well the job gets done. Please do post any ideas or requests in the comments and we will see if we can either improve this one or oblige with another Toe-to-toe comparison. Updates: This post was updated to use the 1 cycle STD instruction instead of the 2 cycle STS instruction for the hand-optimized AVR version in round 2
  6. I think with these drivers it always depends on what you need and if this is a good match for your requirements. If you say that this seems very complicated it sounds like this driver probrably is doing a lot more than you need, so my first question would be "what do you expect instead"? I know N9WXU has talked to much of this already and also said it depends on whether you require a ring buffer. What I can offer you is what the thinking was for this implementation. For this driver the requirements were: We have to be able to receive data on interrupt, this ensures we never miss any bytes due to data overrun (next byte received before prior byte is read out). The application does NOT want to process the data inside of the ISR, so we need to store the data temporarily We need to be able to store multiple bytes, which means we may have 7 or 8 interrupts before we process the data. This means the application can safely take much longer to process the data before we lose data. If you serial port is running 115200 it means one character arrives every 87us. If we have a buffer of 16 bytes it means we only need to service the serial port at least once every 1.4ms to be sure we never lose any data, this extra time can be very important to us, and we can make the buffer bigger to get even more time. If this matches your situation you need to do the following. In the ISR you need to: 1. Store the data in a temp buffer // use this default receive interrupt handler code eusart2RxBuffer[eusart2RxHead++] = RCREG2; 2. Handle wrapping of the buffer when you reach the end of the array if(sizeof(eusart2RxBuffer) <= eusart2RxHead) { eusart2RxHead = 0; } 3. And keep track of how many bytes we have ready for processing eusart2RxCount++; When the app processes the data you need to: 1. If you try to read 1 byte before the data is available you need to wait until there is something to read uint8_t readValue = 0; while(0 == eusart2RxCount) { } 2. Remove one byte from the ring buffer and again handle the wrapping at the end of the array readValue = eusart2RxBuffer[eusart2RxTail++]; if(sizeof(eusart2RxBuffer) <= eusart2RxTail) { eusart2RxTail = 0; } 3. Reduce the number of bytes in the buffer by one (we need to disable the ISR or a collision can happen here), and then return the retrieved byte PIE3bits.RC2IE = 0; eusart2RxCount--; PIE3bits.RC2IE = 1; return readValue; I think this is kind of as simple as you can ever do this without reducing the functionality. Of course if you just wanted to without interrupts read the single byte all you need is to return RCREG, which BTW is what you will get if you uncheck the interrupt checkbox. Also if you do not like all of the ring buffer stuff you can enable interrupts and replace the ISR with the same thing you get when you do not need a buffer, then it is simpler but more likely to lose data. PS. I did not describe the eusart2RxLastError line as I cannot see all the code for that here and I cannot remember the details about that one line. What it is doing is updating a global (eusart2RxLastError) to indicate if there was any error. From the looks of this code snippet that part of the code may have some bugs in it as the last error is not updated after the byte is read out, but I may just be missing the start of the ISR ...
  7. If there is an explanation of the numbers right there they are not rightfully magical anymore 🙂 I can embrace that kind of description, as long as you have the discipline to keep the comments and the values in sync. But I still prefer describing the register instead of the value, if I want to change the prescaler or the CCP1CON value I now need to reach for the datasheet (although the convenient reference will tell me where!) I would prefer having the entire CCP1CON described here but especially on 32bit machines this can be entirely unpractical and then I will be OK with that scheme.
  8. I do not like the comments describing the magic number's value, I much prefer the comments describing the register here, to help you not having to go to the datasheet. The comments should make the static information like the register design clear here so that the reader of the code can understand what the number means by just reading the comment. This is subtly different from the comment claiming the meaning. If you do this the comments will not go out of sync with the code and also help you if you want to change the value to something it is not currently set to. Comments should describe WHY we are doing things, these motivations should be design, assumptions or external constraints.
  9. The driver is more like a framework for you to inject your own functions to change the behavior. The framework is pretty solid but the injected stuff is not complete (as you have discovered). The way to do that is to add a counter to the code we just did. When you start a transaction set it to X and count down on every retry. If you enable the NAK polling callback that will cause a retry, so you just have to modify that address nak function to retry only a limited number of times and that would be it 🙂
  10. Ok, it is never that easy right :) Couple of small oversights up to now. 1. The do_I2C_SEND_STOP did not clear the busy flag 2. My code I posted above has a bug, doing a post-decrement and checkign for 0 later, which does not work. 3. NAK polling is enabled without any timeout, we need to disable this. After fixing these 3 things it should work as expected. This is how: 1. There is a bug in the driver at do_I2C_SEND_STOP, change that to look like this. It is setting the state to idle here but missing the clearing of the busy flag. static i2c_fsm_states_t do_I2C_SEND_STOP(void) { i2c1_driver_stop(); i2c_status.busy = 0; return I2C_IDLE; } 2. There is a bug in the code I posted above, I was doing to--, and after that checking for 0, but the post-decrement will reduce to to -1 by the time I was checking, so that should be like the following, main difference being : inline int mssp1_waitForEvent(uint16_t *timeout) { uint16_t to = 50000; // This will cause a 50ms timeout on I2C if(PIR3bits.SSP1IF == 0) { while(--to) { if(PIR3bits.SSP1IF) break; __delay_us(1); // We reduce this to 1us, the while loop will add some delay so the timeout will be at least 50ms } if (to == 0) { return -1; } } return 0; } 3. At this point the driver will operate in the default mode which is NAK Polling, this means that every time we get a NAK the driver will retry, assuming that the device we are talking to is busy, and we need to retry when a NAK happens. This is activated by setting the addressNAK callback to do a restart-write. We do not want this for your case, we want to fail and return when there is no answer. We can do this by commenting out the line in i2c_simple_master which enables this as follows: void i2c_write1ByteRegister(i2c_address_t address, uint8_t reg, uint8_t data) { while(!i2c_open(address)); // sit here until we get the bus.. i2c_setDataCompleteCallback(wr1RegCompleteHandler,&data); i2c_setBuffer(&reg,1); //i2c_setAddressNACKCallback(i2c_restartWrite,NULL); //NACK polling? i2c_masterWrite(); while(I2C_BUSY == i2c_close()); // sit here until finished. } Once you have done these three things the driver should do what you need, it will go back to idle after an address NAK with i2c_status.error set to I2C_FAIL.
  11. So there seems to be a couple of bugs here, firstly that it is not realizing that the address was a NAK, that is pretty standard and should work correctly, but after that it switches the command from write to read, which is also not correct. I would expect it to retry, but it does not make sense to retry using read if the initial attempt was to write to the device... I don't know how soon I will have time to debug this so here are some tips for you on how the driver works. The state machine and all decisions happen in i2c_masterOperation which is in i2c_master.c. In that function it will check if you are using the interrupt driven driver, in which case it will just return and let the state change happen in the IRQ, else it will run the poller, which will poll for the change in state and run the interrupt if the state change has happened. Essentially the poller is just polling the interrupt flag SSP1IF and calling the ISR when this is set. So there are 2 possible ways this could be going wrong. Case 1 is if the IF is never set, you can check this in the debugger but from what you describe it is not timing out which means that the IF seems to be set when the NAK happens. The other option is that NAK check is disabled or not successfully checked in the ISR. You should check here : void i2c_ISR(void) { mssp1_clearIRQ(); // NOTE: We are ignoring the Write Collision flag. // the write collision is when SSPBUF is written prematurely (2x in a row without sending) // NACK After address override Exception handler if(i2c_status.addressNACKCheck && i2c1_driver_isNACK()) { i2c_status.state = I2C_ADDRESS_NACK; // State Override } i2c_status.state = fsmStateTable[i2c_status.state](); } That if check there should be true. From what I can see, if you are not seeing a timeout, you must be seeing this ISR function being called from the poller, and this must set the state to I2C_ADDRESS_NACK, if this does not happen we can investigate if it is because the check is disabled or the ACKSTAT register has the wrong value. If it goes in there and the state is set the next place to look is where the NAK is processed, which should be here: // TODO: probably need 2 addressNACK's one from read and one from write. // the do NACK before RESTART or STOP is a special case that a new state simplifies. static i2c_fsm_states_t do_I2C_DO_ADDRESS_NACK(void) { i2c_status.addressNACKCheck = 0; i2c_status.error = I2C_FAIL; switch(i2c_status.callbackTable[i2c_addressNACK](i2c_status.callbackPayload[i2c_addressNACK])) { case i2c_restart_read: case i2c_restart_write: return do_I2C_SEND_RESTART(); default: return do_I2C_SEND_STOP(); } }
  12. Yes that is what I meant when I said "I should actually check why you are not getting one". I would expect that you would send out the address and receive back a NAK when there is nothing connected to the bus. It would be important that you do have the pull-up resistors on the bus though. Looks like I am going to have to crack out that hardware and look at the signals on my Saleae to see what exactly is going on. I left my Saleae at the office so I can only do that tomorrow. Stand by and I will let you know what I find.
  13. Ok this is slightly more complex because the status is one layer up so you have to pass the fact that there was a timeout up one layer, but that function does not return anything, so you have to change that. 3 easy steps. Start in i2c_driver.h and change the prototype of the function like this: INLINE int mssp1_waitForEvent(uint16_t*); Change the mssp code to this: inline int mssp1_waitForEvent(uint16_t *timeout) { uint16_t to = 50000; // This will cause a 50ms timeout on I2C if(PIR3bits.SSP1IF == 0) { while(to--) { if(PIR3bits.SSP1IF) break; __delay_us(1); // We reduce this to 1us, the while loop will add some delay so the timeout will be at least 50ms } if (to == 0) { return -1; } } return 0; } And then lastly catch the error in i2c_master.c inline void i2c_poller(void) { while(i2c_status.busy) { if (mssp1_waitForEvent(NULL) == -1) { i2c_status.state = I2C_ADDRESS_NACK; // State Override for timeout case } i2c_ISR(); } } So what this does is pass up -1 when there is a timeout, which will then advance the state machine (which happens in i2c_ISR() ) based on the status. Now as I said I just used the Address NAK behavior there, when you have no slave connected you should see an address NAK (I should actually check why you are not getting one), but in this case we are saying a timeout requires the same behavior which should work ok. If it does not we may want to add a state for timeout as I described before. But let's try the timeout using the I2C_ADDRESS_NACK method first for your board. If this does not work I will crack out a board and run it on the hardware to see exactly what is happening.
  14. It is just the other layer, it has mssp at the bottom, i2c_driver and master and simple on top of that. I just did not pay close enough attention there, I did not check if it would compile, let me check it out tomorrow and I can help you to get that to build.
  15. Last time that happened to me was yesterday!
  16. Ok, I am back up and running and I see that with that you will end up stuck in the next loop. You can force it to break out of the loop by simulating an error. The correct error for your case, and correct behavior in most cases when you get a timeout, would be to perform the AddressNAK behavior. You can trigger that by doing this: inline void mssp1_waitForEvent(uint16_t *timeout) { uint16_t to = 50000; // This will cause a 50ms timeout on I2C if(PIR3bits.SSP1IF == 0) { while(to--) { if(PIR3bits.SSP1IF) break; __delay_us(1); // We reduce this to 1us, the while loop will add some delay so the timeout will be at least 50ms } if (to == 0) { i2c_status.state = I2C_ADDRESS_NACK; // State Override for timeout case } } } If you want different behavior in the case of a timeout than you have for an address NAK you can always add an event to the state table called stateHandlerFunction, and at this location set the state to the new entry, and then copy the do_I2C_DO_ADDRESS_NACK() function and change the behavior in that to what you want to do different for a timeout. You may e.g. set the error to I2C_TIMEOUT which you could add to i2c_error_t and you probably do not want to do the do_I2C_SEND_RESTART. All of that is of course a more substantial change, but I could walk you through this if you want.
  17. You can use statically linked memory (like a global array of items) to allocate, or you can allocate it on the stack by creating the variable in the function where it is being used. In the case where you statically allocate it the case where you run out of memory will cause the code not to compile, which means of course when it does compile you are guaranteed that you can never fail in this way. If you allocate from any pool (or heap) you will always have to be excessively careful as things like fragmentation can easily catch you. FreeRTOS e.g. has a safe heap implementation which has malloc but no free, that allows you to call malloc during init to initialize all the memory you need, and get determinisitc failure if you do not have enough memory which is easy to debug. If you have a free then making it safe is substantially more difficult because you will fail at the point of maximum memory pressure and this will likely be a race condition which may have low probability of hapening during your testing. The best course of action is to design the system in such a way that it does not compile when you do not have sufficient resources, that way there is no guessing game. Proper error checking only gets you so far. I have seen a case where the init function of the UART (which was the only user interface) failed to allocate memory (linker settings were wrong of course), but the point is that error checking would not have helped much in that case. I have also seen similar cases where there is no way to recover especially in bootloaders.
  18. That delay is central to solving your problem. It is really part of some unfinished timeout code at the top of that function which you thankfully included into your question! The idea was that you can use the concept shown in the comments to produce a timeout in the driver. If you do not expect any timeouts you can safely remove the delay, it simply makes the timeout be a multiple of the delay number, so if we delay 100us, and we set timeout to 100 (the default) then we will get a 10ms timeout. What I would suggest is that you complete the timeout implementation as follows: First you need to set the timeout variable to a fixed value, and then we will reduce the unit delay to 1us (you could remove it altogether and the timeout will be 50,000 loop cycles. If this is too short you can always increase the variable to 32bit and increase the count or add a couple of NOP's to the function. It should look like this when you are done: inline void mssp1_waitForEvent(uint16_t *timeout) { uint16_t to = 50000; // This will cause a 50ms timeout on I2C if(PIR3bits.SSP1IF == 0) { while(to--) { if(PIR3bits.SSP1IF) break; __delay_us(1); // We reduce this to 1us, the while loop will add some delay so the timeout will be at least 50ms } } } MPLAB-X has died on me today, I am re-installing and will check out if this is sufficient to fix your problem as soon as I have it back up and running, in the meantime you could try the above.
  19. All too often I see programmers stumped trying to lay out the folder structure for their embedded C project. My best advice is that folder structure is not the real problem. it is just one symptom of dependency problems. If we fix the underlying dependencies a pragmatic folder structure for your project will probably be obvious due to the design being sound. In this blog I am going to first look briefly at Modularity in general, and then explore some program folder structures I see often, exploring if and why they smell. On Modularity in general Writing modular code is not nearly as easy as it sounds. Trying it out for real we quickly discover that simply distributing bits of code across a number of files does not solve much of our problems. This is because modularity is about Architecture and Design and, as such, there is a lot more to it. To determine if we did a good job we need to first look at WHY. WHY exactly do we desire the code to be modular, or to be more specific - what exactly are we trying to achieve by making the code modular? A lot can be said about modularity but to me, my goals are usually as follows: Reduce working set complexity through divide and conquer. Avoid duplication by re-using code in multiple projects (mobility). Adam Smith-like division of labor. When code is broken down into team-sized modules we can construct and maintain it more efficiently. Teams can have areas of specialization and everyone does not have to understand the entire problem in order to contribute. In engineering, functional decomposition is a process of breaking a complex system into a number of smaller subsystems with clear distinguishable functions (responsibilities). The purpose of doing this is to apply the divide and conquer strategy to a complex problem. This is also often called Separation of Concerns. If we keep that in mind we can test for modularity during code review by using a couple of simple core concepts. Separation: Is the boundary of every module clearly distinguishable? This requires every module to be in a single file, or else - if it spans multiple files - a single folder which encapsulates the contents of the module into a single entity. Independent and Interchangeable: This implies that we can also use the module in another program with ease, something Robert C Martin calls Mobility. A good test is to imagine how you would manage the code using version control systems if the module you are evaluating had to reside in a different repository, have its own version number and its own independent documentation. Individually testable: If a module is truly independent it can be used by itself in a test program without bringing a string of other modules along. Testing of the module should follow the Open-Closed principle which means that we can create our tests without modifying the module itself in any way. Reduction in working set Complexity: If the division is not making the code easier to understand it is not effective. This means that modules should perform abstraction - hiding as much of the complexity inside the module and exposing a simplified interface one layer of abstraction above the module function. Software Architecture is in the end all about Abstraction and Encapsulation, which means that making your code modular is all about Architecture. By dividing your project into a number of smaller, more manageable problems, you can solve each of these individually. We should be able to give each of these to a different autonomous team that has its own release schedule, it's own code repository and it's own version number. Exploring some program file structures Now that we have established some ground rules for testing for modularity, let's look at some examples and see if we can figure out which ones are no good and which ones can work based on what we discussed above. Example 1: The Monolith I would hope that we can all agree that this fails the modularity test on all counts. If you have a single file like this there really is only one way to re-use any of the code, and that is to copy and paste it into your other project. For a couple of lines this could still work, but normally we want to avoid duplicating code in multiple projects as this means we have to maintain it in multiple places and if a bug was found in one copy there would be no way to tell how many times the code has been copied and where else we would have to go fix things. I think what contributes to the problem here is that little example projects or demo projects (think about that hello world application) often use this minimalistic structure in the interest of simplifying it down to the bare minimum. This makes sense if we want to really focus on a very specific concept as an example, but it sets a very poor example of how real projects should be structured. Example 2: The includible main In this project, main.c grew to the point where the decision was made to split it into multiple files, but the code was never redesigned, so the modules still have dependencies back to main. That is usually when we see questions like this on Stack Overflow. Of course main.c cannot call into module.c without including module.h, and the module is really the only candidate for including main.h, which means that you have what we call a circular dependency. This mutual dependency indicates that we do not actually have 2 modules at all. Instead, we have one module which has been packaged into 2 different files. Your program should depend on the modules it uses, it does not make sense for any of these modules to have a reverse dependency back to your program, and as such it does not make any sense to have something like main.h. Instead, just place anything you are tempted to place in main.h at the top of main.c instead! If you do have definitions or types that you think can be used by more than one module then make this into a proper module, give it a proper name and let anything which uses this include this module as a proper dependency. Always remember that header files are the public interfaces into your C translation unit. Any good Object Oriented programming book will advise you to make as little as possible public in your class. You should never expose the insides of your module publically if it does not form part of the public interface for the class. If your definitions, types or declarations are intended for internal use only they should not be in your public header file, placing them at the top of your C file most likely the best. A good example is device configuration bits. I like to place my configuration bit definitions in a file by itself called device_config.h, which contains only configuration bit settings for my project. This module is only used by main, but it is not called main.h. Instead, it has a single responsibility which is easy to deduce from the name of the file. To keep it single responsibility I will never put other things like global defines or types in this file. It is only for setting up the processor config bits and if I do another project where the settings should be the same (e.g. the bootloader for the project) then it is easy for me to re-use this single file. In a typical project, you will want to have an application that depends on a number of libraries, something like this. Importantly we can describe the program as an application that uses WiFi, TempSensors, and TLS. There should not be any direct dependencies between modules. Any dependencies between modules should be classified as configuration which is injected by the application, and the code that ties all of this together should be part of the application, not the modules. It is important that we adhere to the Open-Closed principle here. We cannot inject dependencies by modifying the code in the libraries/modules that we use, it has to be done by changing the application. The moment we change the libraries to do this we have changed the library in an application-specific way and we will pay the price for that when we try to re-use it. It is always critical that the dependencies here run only in one direction, and that you can find all the code that makes up each module on your diagram in a single file or in a folder by itself to enable you to deal with the module as a whole. Example 3: The Aggregate or generic header file Projects often use an aggregate header file called something like "includes.h". This quickly leads to the pattern where every module depends on every other and is also known as Spaghetti Code. It becomes obvious if you look at the include graph or when you try and re-use a module in your project by itself for e.g. a test. When any header file is changed you have to re-test every module now. This fails the test of having clearly distinguishable boundaries and clear and obvious dependencies between modules. In MCC there is a good (or should I say bad?) example of such an aggregate header file called mcc.h. I created a minimal project using MCC for the PIC16F18877 and only added the Accel3 click to the project as a working example for this case. The include graph generated using Doxygen looks as follows. There is no indication from this graph that the Accelerometer is the one using the I2C driver, and although main never calls to I2C itself it does look like that dependency exists. The noble intention here is of course to define a single external interface for MCC generated code, but it ends up tying all of the MCC code together into a single monolithic thing. This means my application does not depend on the Accelerometer, it now depends instead on a single monolithic thing called "everything inside of MCC", and as MCC grows this will become more and more painful to manage. If you remove the aggregate header then main no longer includes everything and the kitchen sink, and the include graph reduces to something much more useful as follows: This works better because now the abstractions are being used to simplify things effectively, and the dependency of the sensor on I2C is hidden from the application level. This means we could change the sensor from I2C to SPI without having any impact on the next layer up. Another version of this anti-pattern is called "One big Header File", where instead of making one header that includes all the others, we just place all the contents of all those headers into a single global file. This file is often called "common.h" or "defs.h" or "global.h" or something similar. Ward Cunningham has a good comprehensive list of the problems caused by this practice on his wiki. Example 4: The shared include folder This is a great example of Cargo Culting something that sometimes works in a library, and applying it everywhere without understanding the consequences. The mistake here is to divide the project into sources and headers instead of dividing it into modules. Sources and headers are hopefully not the division that comes to mind when I ask you to divide code into modules! In the context of a library, where the intention is very much to have an external interface (include) separated from its internal implementation (src), this segregation can make sense, but your program is not a library. When you look at this structure you should ask how would this work in the following typical scenarios: If one of the libraries grows enough that we need to split it into multiple files? How will you now know which headers and/source belong to which library? If two libraries end up with identically named files? Typical examples of collisions are types.h, config.h, hal.h, callbacks.h or interface.h. If I have to update a library to a later version, how will I know which files to replace if they are all mixed into the same folder? How do I know which files are part of my project, and as such, I should maintain them locally, as opposed to which files are part of a library and should be maintained at the library project location which is used in many projects? This structure is bad because it breaks the core architectural principles of cohesion and encapsulation which dictates that we keep related things together, and encapsulate logical or functional groupings into clearly identifiable entities. If you do not get this right it leads to library files being copied into every project, and that means multiple copies of the same file in revision control. You also end up with files that have nothing to do with each other grouped together in the same folder. Example 5: A better way On the other hand, if you focus on cohesion and encapsulation you should end up with something more like this I am not saying this is the one true way to structure your project, but with this arrangement, we can get the libraries from revision control and simply replace an entire folder when we do. It is also obvious which files are part of each library, and which ones belong to my project. We can see at a glance that this project has it's own code and depends on 3 libraries. The structure embodies information about the project which helps us manage it, and this information is not duplicated requiring us to keep data from different places in sync. We can now include these libraries into this, or any other project, by simply telling Git to fetch the desired version of each of these folders from its own repository. This makes it easy to update the version of any particular library, and name collisions between libraries are no longer an issue. Additionally, as the library grows it will be easy to distinguish in my code which library I have a dependency on, and exactly which types.h file I am referring to when I refer to the header files as follows. Conclusion Many different project directory structures could work for your project. We are in no way saying that this is "the one true structure". What we are saying is that when the time comes to commit your project to a structure, do remember the pros and cons of each of these examples we discussed. That way you will at least know the coming consequences of your decisions before you are committed to them. Robert C. Martin, aka Uncle Bob, wrote a great article back in 2000 describing the SOLID architectural principles. SOLID is focussed on managing dependencies between software modules. Following these principles will help create an architecture that manages the dependencies between modules well. A SOLID design will naturally translate into a manageable folder structure for your embedded C project.
  20. Some advice for Microchip: If this was my product I would stop selling development kits with A1 or A3 silicon to customers. I2C is widely used and it will create a really bad impression of the product's reliability if customers were to evaluate it with defective silicon. And please fix the Errata, your workaround for I2C Issue 1 does not work as advertized !
  21. Ok some feedback on this one. The workaround in the Errata turns out does not work. The Errata claims you can But this does not work at all. We clear BCL and wait for both S and P bits to be 0 but this never happens and we end up waiting forever. As an attempt to work around this we decided to try to reset the entire module, this means that we set the ON bit in I2CxCON to 0 to disable the module, this resets all the status bits and resets the I2C state machine, once this is done we wait 4 clock cycles (since the second workaround in the Errata suggests we should wait for 4 clock cycles) and then we set the ON bit back to a 1. This clears the BCL error condition correctly and allows us to continue using the peripheral. We have not yet tried to implement the workaround with the timeout that resets the I2C peripheral if it becomes unresponsive without warning, that will be coming up next, but it does seem like that will work fine as it will also disable the entire module when the condition happens which seems to clean out the HW state machine which it looks like is the culprit here. The I2C peripheral section 24 of the family datasheet can be found here http://ww1.microchip.com/downloads/en/devicedoc/61116f.pdf
  22. I am struggling to figure out how to work around what seems to be a silicon bug in the PIC32MZ2048EFM on A1 silicon. I am using the development kit DM320104. From MPLABX I can see that the board I have is running A1 revision silicon. Looking at the Errata for the device I found that there is a silicon Errata on the I2C peripheral and I am hitting at least 2 of the described problems. • False Error Condition 1: False Master Bus Collision Detect (Master-mode only) – The error is indicated through the BCL bit (I2CxSTAT). • False Error Condition 3: Suspended I2C Module Operations (Master or Slave modes) – I2C transactions in progress are inadvertently suspended without error indications. In both cases the Harmony I2C driver ends up in a loop never returning again. For condition 1 the ISR keeps triggering and I2C stops working and for condition 3 the driver just gets stuck. I have tried to implement the workarounds listed in that Errata but I seem to have no luck. The Errata does not have an example, only a text description so I was hoping someone on here has tried this and can help me figure out what I am doing wrong. Currently for condition 1 from the bus collision ISR we are clearing the ISR flag and the BCL bit and then setting the start bit in the I2C1STAT register, but the interrupt keeps on firing away and no start condition is happening. Any idea what we are doing wrong?
  23. Absolutley, and nice examples! Hungarian notation breaks the abstraction of having a variable name with unspecified underlying storage, so I think it is the worst way to leak implementation details!
  24. I think specifically we need to know what processor you are trying to use as this differs from device to device. The simplest and most generic answer would be to add the UART to your project and click on the checkbox to enable interrupts for the driver. After generating code you will have to set the callback which you want called when the interrupt occurs. After this you need to make sure you are enabling interrupts in your main code and it should work. If you supply us with the details above I will post some screenshots for you on how to do this. Just to show you the idea I picked the 16F18875 and added the EUSART as follows: You can see I clicked next to "Enable EUSART Interrupts" Then in my main I ensured the interrupts are enabled. When I now run the code the ISR created by MCC is executed every time a byte is received. The ISR function is called EUSART_Receive_ISR and it is located in the eusart.c file. You can edit this function or replace it by setting a different function as ISR by calling EUSART_SetRxInterruptHandler if you want to change the behavior.
  25. Yes Doxygen is great for that, it also allows you to click on the boxes and drill down into the details. I use it for this all of the time!
  • Create New...