Jump to content
 

Search the Community

Showing results for tags 'pic'.

  • Search By Tags

    Type tags separated by commas.
  • Search By Author

Content Type


Forums

  • Top Level
    • Announcements
    • The Lounge
    • Questions and Answers
    • Forum Help
    • Project Gallery
    • Vendor Bug Reports

Blogs

  • What every embedded programmer should know about ...
  • Programming Lore
  • Toe-to-toe

Categories

  • Files

Find results in...

Find results that contain...


Date Created

  • Start

    End


Last Updated

  • Start

    End


Filter by number of...

Joined

  • Start

    End


Group


About Me

Found 5 results

  1. I recently got my hands on a brand new PIC18F47K40 Xpress board (I ordered one after we ran into the Errata bug here a couple of weeks ago). I wanted to start off with a simple "Hello World" app which would use the built-in CDC serial port, which is great for doing printf debugging with, and interacting with the board in general since it has no LED's and no display to let me know that anything is working, but immediately I got stuck. Combing the user's guide I could not find any mention of the CDC interface or which pins to configure to make it work, so I stared at the schematic and identified a hand full of candidates which I could try as the correct pins. Eventually I figured it out, the TX pin to send data to the PC is RB6 and the RX pin which will receive data from the PC is RB7, so if you are setting up the UART using MCC it should look like this: I created a very simple little terminal application to test the board with and thought this may be something useful for others who start with this board, especially since the standard MCC UART code falls afoul of the dreaded Errata we discussed here before caught me out again even on this simple little application. So remember to set the linker up to implement the workaround for the Errata like so -> The little program simply waits for a keypress (so you have time to set up your terminal application) and then (at 115200 BAUD) sends the welcome message "Hello World". After this the program will echo every character you type, but it will enclose it in a message so that you are sure you are not just fooled by the local echo in your terminal program! Here is the whole program, the serial port and other setup is all generated using MCC. void main(void) { // Initialize the device SYSTEM_Initialize(); // Wait for a keypress before we start EUSART1_Read(); // Say Hello printf("Hello World\r\n"); while (1) { printf("Received: '%c' \r\n", EUSART1_Read()); } } The full example project can be downloaded from here ADC_47K40.zip Note: I have set the project up to automatically copy the hex file onto the board, programming it, when I build the code. This is however set up for my MAC, if you are no a PC or linux you will probably see an error when building that it could not copy the file. If you want to set this up for your platform the instructions are in the users guide for the board. I include a screenshot of the page here for convenience.
  2. Comparing raw pin toggling speed AVR ATmega4808 vs PIC 16F15376 Toggling a pin is such a basic thing. After all, we all start with that Blinky program when we bring up a new board. It is actually a pretty effective way to compare raw processing speed between any two microcontrollers. In this, our first Toe-to-toe showdown, we will be comparing how fast these cores can toggle a pin using just a while loop and a classic XOR toggle. First, let's take a look at the 2 boards we used to compare these cores. These were selected solely because I had them both lying on my desk at the time. Since we are not doing anything more than toggling a pin we just needed an 8-bit AVR core and an 8-bit PIC16F1 core on any device to compare. I do like these two development boards though, so here are the details if you want to repeat this experiment. In the blue corner, we have the AVR, represented by the ATmega4808, sporting an AtMega core (AVRxt in the instruction manual) Clocking at a maximum of 20MHz. We used the AVR-IOT WG Development Board, part number AC164160. This board can be obtained for $29 here: https://www.microchip.com/Developmenttools/ProductDetails/AC164160 Compiler: XC8 v2.05 (Free) In the red corner, we have the PIC, represented by the 16F15376, sporting a PIC16F1 Enhanced Midrange core. Clocking at a maximum of 32MHz. We used the MPLAB® Xpress PIC16F15376 Evaluation Board, part number DM164143. This board can be obtained at $12 here: https://www.microchip.com/developmenttools/ProductDetails/DM164143 Compiler: XC8 v2.05 (Free) Results This is what we measured. All the details around the methodology we used and an analysis of the code follows below and attached you will find all the source code we used if you want to try this at home. The numbers in the graph are pin toggling frequency in kHz after it has been normalized to a 1MHz CPU clock speed. How we did it (and some details about the cores) Doing objective comparisons between 2 very different cores is always hard. We wanted to make sure that we do an objective comparison between the cores which you can use to make informed decisions on your project. In order to do this, we had to deal with the fact that the maximum clock speed of these devices is not the same and also that the fundamental architecture of these two cores is very different. In principle, the AVR is a Load-store Architecture machine with a 1 stage pipeline. This basically means that all ALU operations have to be performed between CPU registers and the RAM is used to load from and store results to. The PIC, on the other hand, uses a Register Memory Architecture, which means in short that some ALU operations can be performed on RAM locations directly and that the machine has a much smaller set of registers. On the PIC all instructions are 1 word in length, which is 14-bits wide, while the data bus is 8-bits in size and all results will be a maximum of 8-bits in size. On the AVR instructions can be 16-bit or 32-bit wide which results in different execution times depending on the instruction. Both processors have a 1 stage pipeline, which means that the next instruction is fetched while the current one is being executed. This means branching causes an incorrect fetch and results in a penalty of one instruction cycle. One major difference is that the AVR, due to its Load-store Architecture, is capable of completing the instruction within as little as just one clock cycle. When instructions need to use the data bus they can take up to 5 clock cycles to execute. Since the PIC has to transfer data over the bus it takes multiple cycles to execute an instruction. In keeping with the RISC paradigm of highly regular instruction pipeline flow, all instructions on the PIC take 4 clock cycles to execute. All of this just makes it tricky and technical to compare performance between these processors. What we decided to do is rather take typical tasks we need the CPU to perform which occurs regularly in real programs and simply measure how fast each CPU can perform these tasks. This should allow you to work backwards from what your application will be doing during maximum throughput pressure on the CPU and figure out which CPU will perform the best for your specific problem. Round 1: Basic test For the first test, we used virtually the same code on both processors. Since both of these are supported by MCC it was really easy to get going. We created a blank project for the target CPU Fired up MCC Adjusted the clock speed to the maximum possible Clicked in the pin manager to make a single pin on PORTC an output Hit generate code. After this all we added was the following simple while loop: PIC AVR while (1) { LATC ^= 0xFF; } while (1) { PORTC.OUT ^= 0xFF; } The resulting code produced by the free compilers (XC8 v2.05 in both cases) was as follows, interestingly enough both loops had the same number of instructions (6 in total) including the loop jump. This is especially interesting as it will show how the execution of a same-length loop takes on each of these processors. You will notice that without optimization there is some room for improvement, but since this is how people will evaluate the cores at first glance we wanted to go with this. PIC AVR .tg {border-collapse:collapse;border-spacing:0;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg .tg-0lax{text-align:left;vertical-align:top} Address Hex Instruction 07B3 30FF MOVLW 0xFF 07B4 00F0 MOVWF __pcstackCOMMON 07B5 0870 MOVF __pcstackCOMMON, W 07B6 0140 MOVLB 0x0 07B7 069A XORWF LATC, F 07B8 2FB3 GOTO 0x7B3 Address Hex Instruction 017D 9180 LDS R24, 0x00 017E 0444 017F 9580 COM R24 0180 9380 STS 0x00, R24 0181 0444 0182 CFFA RJMP 0x17D We used my Saleae logic analyzer to capture the signal and measure the timing on both devices. Since the Saleae is thresholding the digital signal and the rise and fall times are not always identical you will notice a little skew in the measurements. We did run everything 512x slower to confirm that this was entirely measurement error, so it is correct to round all times to multiples of the CPU clock in all cases here. PIC AVR Analysis For the PIC The clock speed was 32MHz. We know that the PIC takes 4 clock cycles to execute one instruction, which gives us an expected instruction rate of one instruction every 125ns. Rounding for measurement errors we see that the PIC has equal low and high times of 875ns. That is 7 instruction cycles for each loop iteration. To verify if this makes sense we can look at the ASM. We see 6 instructions, the last of which is a GOTO, which we know will take 2 instruction cycles to execute. Using that fact we can verify that the loop repeats every 7 instruction cycles as expected (7 x 125ns = 875ns.) For the AVR The clock speed was 20MHz. We know that the AVR takes 1 clock cycle per instruction, which gives us an expected instruction rate of one instruction every 50ns. Rounding for measurement errors we see that the AVR has equal low and high times of 400ns. That is 8 instruction cycles for each loop iteration. To verify if this makes sense we again look at the ASM. We see 4 instructions, the last of which is an RJMP, which we know will take 2 instruction cycles to execute. We also see one LDS which takes 3 cycles because it is accessing sram, and one STS instruction which will each take 2 cycles and a Complement instruction which takes 1 more. Using those facts we can verify that the loop should repeat every 8 instruction cycles as expected (8 x 50ns = 400ns.) Comparison Since the 2 processors are not running at the same clock speed we need to do some math to get a fair comparison. We think 2 particular approaches would be reasonable. Compare the raw fastest speed the CPU can do, this gives a fair benchmark where CPU's with higher clock speeds get an advantage. Normalize the results to a common clock speed, this gives us a fair comparison of capability at the same clock speed. In the numbers below we used both methods for comparison. The numbers AVR PIC Notes Clock Speed 20MHz 32MHz Loop Speed 400ns 875ns Maximum Speed 2.5Mhz 1.142MHz Loop speed as a toggle frequency Normalized Speed 125kHz 35.7kHz Loop frequency normalized to a 1MHz CPU clock ASM Instructions 4 6 Loop Code Size 12 bytes 12 bytes 4 instructions 6 words 10.5 bytes 6 instructions Due to the nuances here we compared this 3 ways Total Code Size 786 bytes 101 words 176.75 bytes Round 2: Expert Optimized test For the second round, we tried to hand-optimize the code to squeeze out the best possible performance from each processor. After all, we do not want to just compare how well the compilers are optimizing, we want to see what is the absolute best the raw CPU's can achieve. You will notice that although optimization doubled our performance, it made little difference to the relative performance between the two processors. For the PIC we wrote to LATC to ensure we are in the right bank, and pre-set the W register, this means the loop reduces to just a XORF and a GOTO. For the AVR we changed the code to use the Toggle register instead doing an XOR of the OUT register for the port. The optimized code looked as follows. PIC AVR LATC = 0xFF; asm ("MOVLW 0xFF"); while (1) { asm ("XORWF LATC, F"); } asm ("LDI R30,0x40"); asm ("LDI R31,0x04"); asm ("SER R24"); while (1){ asm ("STD Z+7,R24"); } The resulting ASM code after these changes now looked as follows. Note we did not include the instructions outside of the loop here as we are really just looking at the loop execution. PIC AVR .tg {border-collapse:collapse;border-spacing:0;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg .tg-0lax{text-align:left;vertical-align:top} Address Hex Instruction 07C1 069A XORWF LATC, F 07C2 00F0 GOTO 0x7C1 Address Hex Instruction 0180 8387 STD Z+7,R24 0181 CFFE RJMP 0x180 Here are the actual measurements: PIC AVR Analysis For the PIC we do not see how we could improve on this as the loop has to be a GOTO which takes 2 cycles and 1 instruction is the least amount of work we could possibly do in the loop so we are pretty confident that this is the best we can do, and when measuring we see 3 instruction cycles which we think is the limit here. Note: N9WXU did suggest that we could fill all the memory with XOR instructions and let it loop around forever and in doing so save the GOTO, but we would still have to set W to FF every second instruction to have consistent timing, so this would still be 2 instructions per "loop" although it would use all the FLASH and execute in 250ns. Novel as this idea was, since that means you can do nothing else we dismissed that idea as not representative. For the AVR we think we are also at the limits here. The toggle register lets us toggle the pin in 1 clock cycle which cannot be beaten, and the RJMP unavoidably adds 2 more. We measure 3 cycles for this. AVR PIC Notes Clock Speed 20MHz 32MHz Loop Speed 150ns 375ns Maximum Speed 6.667Mhz 2.667MHz Loop speed as a toggle frequency Normalized Speed 333.3kHz 83.3kHz Loop frequency normalized to a 1MHz CPU clock ASM Instructions 2 2 Loop Code Size 4 bytes 4 bytes 2 words 3.5 bytes At this point, we can do a raw comparison of absolute toggle frequency performance after the hand optimization. Comparing this way gives the PIC the advantage of running at 32MHz while the AVR is limited to 20MHz. Interestingly the PIC gains a little as expected, but the overall picture does not change much. The source code can be downloaded here: PIC MPLAB-X Project file MicroforumToggleTestPic16f1.zip AVR MPLAB-X Project file MicroforumToggleTest4808.zip What next? For our next installment, we have a number of options. We could add more cores/processors to this test of ours, or we can take a different task and cycle through the candidates on that. We could also vary the tools by using different compilers and see how they stack up against each other and across the architectures. Since our benchmarks will all be based on real-world tasks it should not matter HOW the CPU is performing the task or HOW we created the code, the comparison will simply be how well the job gets done. Please do post any ideas or requests in the comments and we will see if we can either improve this one or oblige with another Toe-to-toe comparison. Updates: This post was updated to use the 1 cycle STD instruction instead of the 2 cycle STS instruction for the hand-optimized AVR version in round 2
  3. Whenever I start a new project I always start off reaching for a simple while(1) "superloop" architecture https://en.wikibooks.org/wiki/Embedded_Systems/Super_Loop_Architecture . This works well for doing the basics but more often than not I quickly end up short and looking to employ a timer to get some kind of scheduling going. MCC makes this pretty easy and convenient to set up. It contains a library called "Foundation Services" which has 2 different timer implementations, TIMEOUT and RTCOUNTER. These two library modules have pretty much the same interface, but they are implemented very differently under the hood. For my little "Operating System" I am going to prefer the RTCOUNTER version as keeping time accurately is more important to me than latency. The TIMEOUT module is capable of providing low latency reaction times whenever a timer expires by adjusting the timer overflow point so that an interrupt will occur right when the next timer expires. This allows you to use the ISR to call the action you want to happen directly and immediately from the interrupt context. Nice as that may be in some cases, it always increases the complexity, the code size and the overall cost of the system. In our case RTCOUNTER is more than good enough so we will stick with that. RTCOUNTER Operation First a little bit more about RTCOUNTER. In short RTCOUNTER keeps track of a list of running timers. Whenever you call the "check" function it will compare the expiry time of the next timer and call the task function for that timer if it has expired. It achieves this by using a single hardware timer which will operate in "Free Running" mode. This means the hardware timer will never be "re-loaded" by the code, it will simply overflow back to it's starting value naturally, and every time this happens the module will count that another overflow has happened in an overflow counter. The count of the timer is made up by a combination of the actual hardware timer and the overflow counter. By "hiding" or abstracting the real size of the hardware timer like this the module can easily be switched over to use any of the PIC timers, regardless if they count up or down or how many bits they implement in hardware. Mode 32-bit Timer value General (32-x)-bit of g_rtcounterH x-bit Hardware Timer Using TMR0 in 8-bit mode 24-bits of g_rtcounterH TMR0 (8-bit) Using TMR1 16-bits of g_rtcounterH TMR1H (8-bit) TMR1L (8-bit) RTCOUNTER is compatible with all the PIC timers, and you can switch it to a different timer later without modifying your application code which is nice. Since all that happens when the timer overflows is updating the counter the Timer ISR is as short and simple as possible. // We only increment the overflow counter and clear the flag on every interrupt void rtcount_isr(void) { g_rtcounterH++; PIR4bits.TMR1IF = 0; } When we run the "check" function it will construct the 32-bit time value by combining the hardware timer and the overflow counter (g_rtcounterH). It will then compare this value to the expiry time of the next timer in the list to expire. By keeping the list of timers sorted by expiry time it saves time during the checking (which happens often) by doing the sorting work during creation of the timer (which happens infrequently). How to use it Using it is failry straight-forward. Create a callback function which returns the "re-schedule" time for the timer. Allocate memory for your timer/task and tie it to your callback function Create the timer (which starts it) specifying how long before it will first expire Regularly call the check function to check if the next timer has expired, and call it's callback if it has. In C the whole program may look something like this example: #include "mcc_generated_files/mcc.h" int32_t ledFlasher(void* p); rtcountStruct_t myTimer = {ledFlasher}; void main(void) { SYSTEM_Initialize(); INTERRUPT_GlobalInterruptEnable(); INTERRUPT_PeripheralInterruptEnable(); rtcount_create(&myTimer, 1000); // Create a new timer using the memory at &myTimer // This is my main Scheduler or OS loop, all tasks are executed from here from now on while (1) { // Check if the next timer has expired, call it's callback from here if it did rtcount_callNextCallback(); } } int32_t ledFlasher(void* p) { LATAbits.RA0 ^= 1; // Toggle our pin return 1000; // When we are done we want to restart this timer 1000 ticks later, return 0 to stop } Example with Multiple Timers/Tasks Ok, admittedly blinking an LED with a timer is not rocket science and not really impressive, so let's step it up and show how we can use this concept to write an application which is more event-driven than imperative. NOTE: If you have not seen it yet I recommend reading Martin Fowler's article on Event Sourcing and how this design pattern reduces the probability of errors in your system on his website here. By splitting our program into tasks (or modules) which each perform a specific action and works independently of other tasks, we can generate code modules which are completely independent and re-usable quite easily. Independent and re-usable (or mobile as Uncle Bob says) does not only mean that the code is maintainable, it also means that we can test and debug each task by itself, and if we do this well it will make the code much less fragile. Code is "fragile" when you are fixing something in one place and something seemingly unrelated breaks elsewhere ... that will be much less likely to happen. For my example I am going to construct some typical tasks which need to be done in an embedded system. To accomplish this we will Create a task function for each of these by creating a timer for it. Control/set the amount of CPU time afforded to each task by controlling how often the timer times out Communicate between tasks only through a small number of shared variables (this is best done using Message Queue's - we will post about those in a later blog some time) Let's go ahead and construct our system. Here is the big picture view This system has 6 tasks being managed by the Scheduler/OS for us. Sampling the ADC to check the battery level. This has to happen every 5 seconds Process keys, we are looking at a button which needs to be de-bounced (100 ms) Process serial port for any incoming messages. The port is on interrupt, baud is 9600. Our buffer is 16 bytes so we want to check it every 10ms to ensure we do not loose data. Update system LCD. We only update the LCD when the data has changed, we want to check for a change every 100ms Update LED's. We want this to happen every 500ms Drive Outputs. Based on our secret sauce we will decide when to toggle some pins, we do this every 1s These tasks will work together, or co-operate, by keeping to the promise never to run for a long time (let's agree 10ms is a long time, tasks taking longer than that needs to be broken into smaller steps). This arrangement is called Co-operative Multitasking . This is a well-known mechanism of multi-tasking on a microcontroller, and has been implemented in systems like "Windows 3.1" and "Windows 95" as well as "Classic Mac-OS" in the past. By using the Scheduler and event driven paradigm here we can implement and test each of these subsystems independently. Even when we have it all put together we can easily replace one of these subsystems with a "Test" version of it and use that to generate test conditions for us to ensure everything will work correctly under typical operation conditions. We can "disable" any part of the system by simply commenting out the "create" function for that timer and it will not run. We can also adjust how often things happen or adjust priorities by modifying the task time values. As before we first allocate some memory to store all of our tasks. We will initialize each task with a pointer to the callback function used to perform this task as before. The main program= ends up looking something like this. void main(void) { SYSTEM_Initialize(); INTERRUPT_GlobalInterruptEnable(); INTERRUPT_PeripheralInterruptEnable(); rtcount_create(&adcTaskTimer, 5000); rtcount_create(&keyTaskTimer, 100); rtcount_create(&serialTaskTimer, 10); rtcount_create(&lcdTaskTimer, 100); rtcount_create(&ledTaskTimer, 500); rtcount_create(&outputTaskTimer, 1000); // This is my main Scheduler or OS loop, all tasks are executed from events while (1) { // Check if the next timer has expired, call it's callback from here if it did rtcount_callNextCallback(); } } As always the full project incuding the task functions and the timer variable declarations can be downloaded. The skeleton of this program which does initialize the other peripherals, but runs the timers completely compiles to only 703 words of code or 8.6% on this device, and it runs all 6 program tasks using a single hardware timer.
  4. Version 1.0.0

    101 downloads

    This zip file contains the project implementing Duff's Device using XC8 and a PIC16F18875 which goes with the Duff's Device Blog. The project can be run entirely in the simulator in MPLAB-X so no need to have the device to play with the code! See the blog for more details
  5. Version 3.0.0

    517 downloads

    This is the LAB manual for the 2018 masters class which showed how to use I2C and Timers on the HPC curiosity board. The last lab shows how to use the BLE Click board to connect to your cellphone and send the touch data over this channel.
×
×
  • Create New...