Blog Entries posted by N9WXU

  1. N9WXU
    When comparing CPUs and architectures it is also a good idea to compare the frameworks and learn how the framework will affect your system.  In this article I will be comparing a number of popular Arduino compatible systems to see how different "flavors" of Arduino stack up in the pin toggling test.  When I started this effort, I thought it would be a straightforward demonstration of CPU efficiency, clock speed and compiler performance on the one side against the Arduino framework implementation on the other.  As is often the case, if you poke deeply into even the most trivial of systems you will always find something to learn.
    As I look around my board stash I see that there are the following Arduino compatible development kits:
    Arduino Nano Every (ATmega4809 @ 20MHz AVR Mega)
    Mini Nano V3.0 (ATmega328P @ 16MHz AVR)
    RobotDyn SAMD21 M0-Mini (ATSAMD21G18A @ 48MHz Cortex M0+)
    ESP-12E NodeMCU (ESP8266 @ 80MHz Tensilica)
    Teensy 3.2 (MK20DX256VLH7 @ 96MHz Cortex M4)
    ESP32-WROOM-32 (ESP32 @ 240MHz Tensilica)
    Each of these kits has an available Arduino framework.  Say what you will about the Arduino framework, there are some serious advantages to using it and a few surprises.  For the purpose of this testing I will be running one program on every board.  I will use vanilla "Arduino" code and make zero changes for each CPU.  The Arduino framework is very useful for normalizing the API to the hardware in a very consistent and portable manner.  This is mostly true at the low levels like timers, PWM and digital I/O, but it is very true as you move to higher layers like the String library or WiFi.  Strangely, there are no promises of performance.  For instance, every Arduino program has a setup() function where you put your initialization and a loop() function that is called very often.  With this in mind it is easy to imagine the following implementation:
    extern void setup(void);
    extern void loop(void);

    void main(void)
    {
        setup();
        while(1)
        {
            loop();
        }
    }

    And in fact when you dig into the AVR framework you find the following code in main.cpp:
    int main(void)
    {
        init();
        initVariant();
    #if defined(USBCON)
        USBDevice.attach();
    #endif
        setup();
        for (;;) {
            loop();
            if (serialEventRun) serialEventRun();
        }
        return 0;
    }

    There are a few "surprises" that really should not be surprises.  First, the Arduino environment needs to be initialized (init()), then the HW variant (initVariant()), then we might be using a USB device so USB is started (USBDevice.attach()) and finally, the user setup() function is called.  Then we start our infinite loop.  Between calls to the loop function the code maintains the serial connection, which could be USB.  I suppose that other frameworks could implement this environment a little bit differently and there could be significant consequences to these choices.
    The Test
    For this test I am simply going to initialize 1 pin and then set it high and low.  Here is the code.
    void setup()
    {
        pinMode(2,OUTPUT);
    }

    void loop()
    {
        digitalWrite(2,HIGH);
        digitalWrite(2,LOW);
    }

    I am expecting this to make a short high pulse and a slightly longer low pulse.  The longer low pulse accounts for the extra overhead of looping back.  This is not likely to be as fast as the pin toggles Orunmila did in the previous article but I do expect it to be about half as fast.
    Here are the results. The 2 red lines at the bottom are the best case optimized raw speed from Orunmila's comparison.

    That is a pretty interesting chart and if we simply compare the data from the ATMEGA 4809 both with ASM and Arduino code, you see a 6x difference in performance.  Let us look at the details and we will summarize at the end.
    Nano 328P
    So here is the first victim.  The venerable AVR ATmega328P running at 16MHz.  The high pulse is 3.186uS while the low pulse is 3.544uS, making a pulse frequency of 148.2kHz.
    Clearly the high and low pulses are nearly the same, so the extra check to handle the serial ports is not very expensive, but the digitalWrite abstraction is much more expensive than I was anticipating.

    Nano Every
    The Nano Every uses the much newer ATmega4809 at 20MHz.  The 4809 is a different variant of the AVR CPU with some additional optimizations like set and clear registers for the ports.  This should be much faster.

    The high pulse is 1.192uS and the low pulse is 1.504uS.  Again, the pulses are almost the same size so the additional overhead outside of the loop function must be fairly small.  Perhaps it is the same serial port test.  Interestingly, one of the limiting factors of popular Arduino motion controller projects such as GRBL is the pin toggle rate for driving the stepper motor pulses.  A 4809-based controller could be 2x faster for the same stepper code.
    Sam D21 Mini M0
    Now we are stepping up to an ARM Cortex M0 at 48MHz.  I actually expect this to deliver nearly 2x the performance of the 4809, simply because the instructions required to set pins high and low should be essentially the same.

    Wow!  I was definitely NOT expecting the timing to get worse than the 4809.  The high pulse width is 1.478uS and the low pulse width is 1.916uS making the frequency 294.6kHz.  Obviously toggling pins is not a great measurement of CPU performance but if you need fast pin toggling in the Arduino world, perhaps the SAMD21 is not your best choice.
    Teensy 3.2
    This is an NXP Cortex M4 CPU at 96MHz.  This CPU is double the clock speed of the D21 and it is an M4 CPU, which has lots of great features, though those features may not help toggle pins quickly.

    Interesting.  Clearly this device is very fast as shown by the short high period of only 0.352uS.  But, this framework must be doing quite a lot of work behind the scenes to justify the 2.274uS of loop delay.
    Looking a little more closely I see a number of board options for this hardware.  First, I see that I can disable the USB.  Presumably the USB is serviced between calls to the loop function, costing time there.  I also see a number of compiler optimization options.  If I turn off the USB and select the "fastest" optimizations, what is the result?
    Teensy 3.2, No USB and Fastest optimizations
    Making these two changes and re-running the same C code produces this result:

    That is much better.  It is interesting to see that the optimization change is worth about 3x for this test (measured on the high pulse) and the lack of USB saves about 1uS in the loop rate.  This is not a definitive test of the optimizations, and the code probably grew a bit, but it is a stark reminder that optimization choices can make a big difference.
    ESP8266 NodeMCU
    The ESP8266 is a 32-bit Tensilica CPU.  This is still a load/store architecture so its performance will largely match ARM, though undoubtedly there are cases where it will be a bit different.  The 8266 runs at 80MHz so I do expect the performance to be similar to the Teensy 3.2.  The wildcard is that the 8266 framework is intended to support WiFi, so it is running FreeRTOS and the Arduino loop is just one thread in the system.  I have no idea what that will do to our pin toggle so it is time to measure.

    Interesting.  It is actually quite slow and clearly there is quite a bit of system house-keeping happening in the main loop.  The high pulse is only 0.948uS so that is very similar to Nano Every at 1/4th the clock speed.  The low pulse is simply slow.  This does seem to be a good device for IoT but not for pin toggling.
    ESP32
    The ESP32 is a dual core, very fast machine, but it does run the code out of a cache because the code is stored in a serial memory.  Of course our test is quite short so perhaps we do not need to fear the cache miss.
    Like the ESP8266, the Arduino framework is built upon a FreeRTOS task.  But this part has a second CPU and lots more clock speed, so let's look at the results:

    Interesting, the toggle rate is about 2x the Teensy while the clock speed is about 3x.  I do like how the pulses are nearly symmetrical.  A quick peek at the source code for the framework shows the Arduino running as a thread but the thread updates the watchdog timer and the serial drivers on each pass through the loop.
    It is very educational to make measurements instead of assumptions when evaluating an MCU for your next project.  A specific CPU may have fantastic specifications and even demonstrations, but it is critical to include the complete development system and code framework in your evaluation.  It is a big surprise to find the 16MHz ATmega328P can actually toggle a pin faster than the ESP8266 when used in a basic Arduino project.
    The summary graph at the top of the article is duplicated here:

    In this graph, the Pin Toggling Speed is actually only 1/(the high period).  This was done on purpose so only the pin toggle efficiency is being compared.  In the test program, the low period is where the loop() function ends and other housekeeping work can take place.  If we want to compare the CPU/CODE efficiency, we should really normalize the pin toggling frequency to a common clock speed.  We can always compensate for inefficiency with more clock speed.

    This graph is produced by dividing the frequency by the clock speed and now we can compare the relative efficiencies.  That Cortex M4 and its framework in the Teensy 3.2 is quite impressive now.  Clearly the ESP-32 is pretty good but using its clock speed for the win.  The Mega 4809 has a reasonable framework just not enough clock speed.  All that aside, the ASM versions (or even a faster framework) could seriously improve all of these numbers.  The poor ESP8266 is pretty dismal.
    So what is happening in the digitalWrite() function that is making this performance so slow?  Put another way, what am I getting in return for the low performance?  There are really 3 reasons for the performance.
    Portability.  Each device needs work to adapt to the pin interface, so the price of portability is runtime efficiency.
    Framework Support.  There are many functions in the framework that could be affected by writing to the pins, so the digitalWrite function must account for those other features.
    Application Ignorance.  The framework (and this function) cannot know how the system is constructed, so they must plan for the worst.
    Let us look at the digitalWrite for the AVR:
    void digitalWrite(uint8_t pin, uint8_t val)
    {
        uint8_t timer = digitalPinToTimer(pin);
        uint8_t bit = digitalPinToBitMask(pin);
        uint8_t port = digitalPinToPort(pin);
        volatile uint8_t *out;

        if (port == NOT_A_PIN) return;

        // If the pin that support PWM output, we need to turn it off
        // before doing a digital write.
        if (timer != NOT_ON_TIMER) turnOffPWM(timer);

        out = portOutputRegister(port);

        uint8_t oldSREG = SREG;
        cli();

        if (val == LOW) {
            *out &= ~bit;
        } else {
            *out |= bit;
        }

        SREG = oldSREG;
    }

    Note the first thing is a few lookup functions to determine the timer, port and bit described by the pin number.  These lookups can be quite fast but they do cost a few cycles.  Next we ensure we have a valid pin and turn off any PWM that may be active on that pin.  This is just safe programming and framework support.  Next we find the output register for the update, turn off the interrupts (saving the interrupt state), set or clear the pin, and restore interrupts.  If we knew we were not using PWM (as in this application) we could omit the turnOffPWM call.  If we knew all of our pins were valid we could remove the NOT_A_PIN test.  Unfortunately, all of these optimizations require knowledge of the application which the framework cannot have.  Clearly we need new tools to describe embedded applications.
    This has been a fun bit of testing.  I look forward to your comments and suggestions for future toe-to-toe challenges.
    Good Luck and go make some measurements.
    PS:  I realize that this pin toggling example is simplistic at best.  There are some fine Arduino libraries and peripherals that could easily toggle pins much faster than the results shown here.  However, this is a simple apples-to-apples test of identical code in "identical" frameworks on different CPUs, so the comparisons are valid and useful.  That said, if you have any suggestions feel free to enlighten us in the comments.
  2. N9WXU
    Embedded applications are hard for a large number of reasons, but one of the main issues is memory.  Today I want to talk about how our C variables get initialized and a few assumptions we make as we use C to write embedded software.
    Let us take a few simple declarations such as we might make every day.
    char *string1 = "string1";
    const char *string2 = "string2";
    char const *string3 = "string3";
    char * const string4 = "string4";
    char const * const string5 = "string5";
    In C99 these will all compile just fine but they are very different.  In C++11, 2 of these will have a warning.  We shall discuss them in order.
    Duration & Scope
    The first thing to notice is these variables are all declared outside of a function.  That affects them in the following ways:
    External Linkage - i.e. they are global.
    Static Storage Duration - i.e. they are always active, allocated and initialized before control goes to main().
    Of course, if the keyword static had been placed before one of the declarations, the external linkage would have changed to internal linkage, preventing the variable from being accessed outside the compilation unit.
    Because they have static storage class, they will be initialized.  If no initializer is specified they will be initialized to 0 or NULL.
    In each case, these have an initializer.  So what is being initialized?  Each variable is some type of a character pointer.  So the value being stored in the variable will be an address.  The address will be the address of a string of characters.  The image on the right shows a data segment with the initialized variables.  C does not actually specify where these const char strings need to be located.  In GCC, they will be placed in the data segment and the variable will get the address of the string.  However, in XC8, they will be placed in the Text segment (in FLASH).  In the Arduino (GCC) environment you can force a string into FLASH by adding PROGMEM or using the F (FLASH) macro like this : F("String").  If the string is NOT in flash memory, then CINIT will copy the string from the text segment into the data segment.  Then the variables will be initialized with pointers to the strings.
    String 1
    The declaration for string 1 is simply a char *.  This is a pointer to character(s).  It is permissible for this pointer to be changed at run-time.  i.e. the following is legal:
    string1 = string2;

    Of course, you will get a warning because string2 is a const char *.
    It is also permissible to do this:
    string1[3] = 'a';

    This will change the original string into "strang1".  However it is NOT permissible to do the following:
    string1 = string2;
    string1[3] = 'a';

    If you DO do this, it is likely to compile, but there could be a few problems because string2 is declared as a const string, so it must not be modified.

    Here is my compiler reluctantly obeying me and then the program crashes on the write.  Just imagine that the string is in FLASH so writing is impossible without specific write sequences which the compiler probably does not know.
    In some environments you can get away with this.  For example, I was using a Cypress WiFi device that loads the entire FLASH into RAM and then executes it.  This code will run and it will not crash.  Be very careful in such circumstances because in a few years you will be tasked to port the program to something else and life will be made hard because you did not fix the warnings.  It turns out that in section 6.7.3 paragraph 5 of the C99 standard the behavior of line 22 is undefined.  Your environment can choose to do anything it wants.
    String 2 & 3
    The declaration for string 2 is a const char * and string 3 is char const *.  These are IDENTICAL.  This is in section 6.7.3 paragraph 9 "the order of type qualifiers within a list of specifiers or qualifiers does not affect the specified type".  So these are pointers to a constant character(s).  In a nice compiler, these characters would be stored in FLASH memory and never copied into RAM.  That would be most memory efficient.  However, GCC will copy these from FLASH into RAM and then use the address of these strings to initialize the variables. 
    String 4
    This declaration is a const pointer.  That is, the pointer value cannot change, but the data pointed to by the pointer CAN change.

    Note the ERROR on line 22.  Line 21 is perfectly fine.  The data pointed by the pointer is NOT const so it is allowed to change.
    In C++11, the original declaration will have a warning because a char * is being initialized to point to a const char *.  Never mind that the pointer is const.
    String 5
    This is a combination of both sorts of constants.  A const pointer pointing at a const character(s).  This can be initialized from a const string just fine.  But you will not be allowed to change the pointer or the data pointed to.

    Both line 21 and line 22 have errors and not simply warnings.
    We will do more of these variable initializer posts.  The language rules are very clear but there are a few constructions that we don't see very often.  And even worse, the assumptions we make about the syntax work often enough that we end up with some very strange notions on what the language allows.
    A good resource for testing your knowledge about strange C declarations is this website:

    Good Luck
  3. N9WXU
    It has been said that software is the most complicated system humanity has created and like all complicated things we manage this complexity by breaking the problem into small problems.  Each small problem is then solved and each solution is assembled to create larger and larger solutions.  In other fields of work, humans created standardized solutions to common problems.  For example, nails and screws are common solutions to the problem of fastening wood together.  Very few carpenters worry about the details of building nails and screws, they simply use them as needed.  This practice of creating  common solutions to typical problems is also done in software.  Software Libraries can easily be used to provide drivers, and advanced functions saving a developer many hours of effort.
    To make a software library useful, the library developer must create an abstraction of the problem solved by the library.  This abstraction must interact with the library user in a simple way and hide the specialist details of the problem.  For example, if your task is to convert RGB color values into CMYK color values, you would get a library that had a function such as this one:
    struct cmyk { float cyan; float magenta; float yellow; float black; };
    struct rgb { float red; float green; float blue; };
    struct cmyk make_CMYK_from_RGB(struct rgb);

    This seems very simple and it would absolutely be simple to use.  But if you had to implement such a function yourself, you may quickly find yourself immersed in color profiles and the behavior of the human eye.  All of this complexity is hidden behind a simple function.
    In the embedded world we often work with hardware and we are very used to silicon vendors providing hardware abstraction layers.  These hardware abstraction layers are an attempt to simplify the use of a vendor's hardware, and to make it more complicated to switch the hardware to a competing system.  Let us go into a little more detail.
    Here is a typical software layer cake as drawn by a silicon vendor.  Often they will provide the bottom 3 layers and even a few demo applications.  The hope is you will create your application using their libraries.  The benefit for you is a significant time savings (you don't need to make your nails and screws).  The benefit to the silicon vendor is getting you locked into a proprietary solution.

    Here is a short story about the early "dark ages" of computing before PC's had reasonable device drivers (hardware abstraction).
    In the early days of PC gaming all PC games ran in MS-DOS.  This was to improve game performance by removing any software that was not specifically required.  The only sound these early PCs had was a simple buzzer, so a large number of companies developed a spectacular number of sound cards.  There were so many sound cards that PC games could not keep up adding sound card support.  Each game had a setup menu where the sound card was selected along with its I/O memory, IRQ, and other esoteric parameters.  We had to write the HW configuration down on a cheat sheet and each time we upgraded we had to juggle the physical configuration of our PC (with jumpers) so everything ran without conflict.  Eventually, the Sound Blaster card became the "standard" card and all other vendors either designed their HW to be compatible or wrote their drivers to look just like the Sound Blaster drivers and achieve compatibility in software.
    Hardware abstraction has the goal of creating a Hardware interface definition that allows the hardware to present the needed capabilities to the running application.  The hardware can have many additional features and capabilities but these are not important to the application so they are not part of the interface.  So abstraction provides a simplification by hiding the stuff the application does not care about.  The simplification comes from focusing just on the features the application does care about.
    So if the silicon vendors are providing these abstractions, life can be only good!... We should look a little more closely.
    Silicon is pretty cheap to make but expensive to design.  So each microcontroller has a large number of features on each peripheral in the hopes that it will find a home in a large number of applications.  Many of these features are mutually exclusive, such as synchronous vs asynchronous mode in the EUSART on a PIC microcontroller.  These features are all well documented in the data sheets, but at some point it was decided across the entire industry that if there were functions tied to each feature they would be easier to use.  Here is an example from MCC's MSSP driver in SPI mode:
    void SPI2_ClearWriteCollisionStatus(void)
    {
        SSP2CON1bits.WCOL = 0;
    }

    Now it may be useful to have a more readable name for the WCOL flag, and perhaps ClearWriteCollisionStatus does make the code easier to use.  The argument is that making this function call will be more intuitive than clearing the WCOL bit.  As you go through many of the HAL layers you find hundreds of examples of very simple functions setting or clearing a few bits.  In a few cases you will find an example where all the functions are working together to create a specific abstraction.  In most cases, you simply find the HW flags hidden behind more verbosely named functions.  Simply renaming the bits is NOT a hardware abstraction.  In fact, if the C compiler does not automatically inline these functions they are simply creating overhead.
    Sadly there is another problem in this mess.  The data sheets are very precisely written documents that accurately describe the features of the hardware.  Typically these datasheets are written with references to registers and bits.  If the vendor provides a comprehensive function interface to the hardware, the data sheet will need to be recreated with function calls and function use examples rather than the bits and registers.
    In my opinion the term HAL (Hardware Abstraction Layer) has been hijacked to represent a function call interface to all the peripheral functions.  What about Board Support Package (BSP)?  Generally the BSP is inserted in the layer cake to provide a place for all the code that enables the vendor demo code to run on the "HAL".  Arguably, the BSP is what the purist would call the HAL.
    Enough of the ranting... How does this topic affect you, the hapless developer, who is likely using vendor code?
    Silicon Vendors will continue to provide HAL's to interface the hardware, Middleware to provide customers with high function libraries and board support packages to link everything to their proprietary demo boards.  As customers, we can evaluate their offering on their systems but we can expect to write our own BSP's to get the rest of their software running on our final product hardware.
    Software Vendors will continue to provide advanced libraries, RTOS's and other forms of middleware for us to buy to speed our development.  The ability to adapt this software to our systems largely depends upon how well the software vendor defines the expected interfaces that we need to provide.  Sometimes these vendors can be contracted to get their software running on our hardware and get us going.
    FW engineers will continue to spend a significant part of the project nudging all these pieces into one cohesive system so we can layer our secret sauce on top.
    One parting comment.
    Software layers are invented to allow large systems to take advantage of the single responsibility principle.  This is great, but if you use too many layers you end up with a new problem called Lasagna code.  If you use too few layers you end up with Spaghetti code.  One day I would love to know why Italian food is used to name two of the big software smells.
    Good Luck
  4. N9WXU
    Here I am on a clear cool evening, by the fire outside with my laptop.  Tonight I will talk about a new peripheral, the timer/counter.  Periodically I will be interrupted to put another log on my fire, but it should not slow me down too much.
    Timers and counters are almost the same peripheral, with only the difference of what is causing the counter to count.  If the counter is incrementing from a clock source, it becomes a timer because each count registers the passage of a precise unit of time.  If the counter is incrementing from an unknown signal (perhaps not even a regular signal), it is simply a counter.  Clearly, the difference between these is a matter of the use-case and not a matter of design.  Though there are some technical details related to clocking any peripheral from an external "unknown" clock that is not synchronized with the clocks inside the microcontroller, we will happily ignore those details because the designers have done a good job of sorting them out.
    Let us take a peek at a very simple timer on the PIC16F18446.  Turn to page 348 of your PIC16F18446 data sheet and take a look at figure 25-1.  (shown below)

    This is the basic anatomy of a pretty vanilla timer.  Of course most timers have many more features, so this article is simply an introduction.  On the left side of this image there are a number of clock sources entering a symbol that represents a multiplexer.  A multiplexer is simply a device that can select one input and pass it to its output.  The T0CS<2:0> signal below the multiplexer is shorthand for a 3-bit signal named T0CS.  The slash 3 also indicates that it is a 3-bit signal.  Each of the possible 3-bit codes is inside the multiplexer next to one of the inputs.  This indicates the input you will select if you apply that code on the signal T0CS.  Pretty simple.  The inputs are, from top to bottom (ignoring the reserved ones): SOSC (Secondary Oscillator), LFINTOSC (Low Frequency Internal Oscillator), HFINTOSC (High Frequency Internal Oscillator), Fosc/4 (the CPU instruction clock), and an input pin (T0CKI), inverted or not inverted.
    Let us cover each section of this peripheral in a little more detail.  Of course you should go to the data sheet to read all the information.
    The secondary oscillator is a second crystal oscillator on 2 I/O pins.  A crystal oscillator is a type of clock that uses a piece of quartz crystal to produce a very accurate frequency.  The secondary oscillator is designed to operate at 32.768kHz which by some coincidence is 2^15 counts per second.  This makes keeping accurate track of seconds very easy and very low power.  You could configure the hardware to wake up the CPU every second and spend most of your time in a low power sleep mode.
    There are two internal oscillators in the PIC16F18446.  The LFINTOSC is approximately 31kHz and is intended for low power, low speed operation but not very accurate timing.  The HFINTOSC is adjustable from 1-32MHz and is better than 5% accurate, so it is often sufficient for most applications.  Because these two oscillators are directly available to the timer, the timer can run at a completely different frequency than the CPU, allowing high resolution timing of events regardless of the CPU clock.
    The Fosc/4 option is the simplest to select because most of the math you are doing for other peripherals is already at this frequency.  If you are porting software from a previous PIC16 MCU, the timer may already be assumed to be at this frequency.  Due to historical reasons, a PIC16 is often clocked at 4MHz.  This makes the instruction clock 1MHz and each timer tick is 1us.  Having a 1us tick makes many timing calculations trivial.  If you were clocking the timer at 32MHz, each tick would be just over 31ns, which is much smaller but does not divide as nicely into base-10 values.
    The external input pin (T0CKI) allows your system to measure time based upon an external clock.  You might connect the timing wheel of an engine to this input pin and compute the RPM with a separate timer.
    After the input multiplexer, there is an input prescaler.  The goal of the prescaler is to reduce the input clock frequency to a slower frequency that may be more suitable for the application.  Most prescalers are implemented as a chain of 'T' flip-flops.  A T flip-flop simply changes its output (high to low or low to high) on each rising edge of an input signal.  That makes a T flip-flop a divide-by-2 circuit for a clock.  If you have a chain of these and you use a multiplexer to decide which T flip-flop to listen to, you get a very simple divider that can divide by some power of 2, i.e. 2, 4, 8, 16... with each frequency 1/2 of the previous one.
    The synchronizer ensures that input pulses that are NOT sourced by an oscillator related to FOSC are synchronized to FOSC.  This synchronization ensures reliable pulses for counting or for any peripherals that are attached to the counter.  However, synchronization requires the FOSC/4 clock source to be operating, and that condition is not true when the CPU is saving power in sleep.  If you are building an alarm clock that must run on a tiny battery, you will want the timer to operate while the CPU is sleeping and to produce a wakeup interrupt at appropriate intervals.  To do this, you disable synchronization.  Once the CPU has been awakened, it is a good idea to activate synchronization or to avoid interacting with the counter while it is running.
    TMR0 Body
    The TMR0 body used to be a simple counter, but in more recent years it has gained 2 options.  Either the timer can be a 16-bit counter, or it can be an 8-bit counter with an 8-bit compare.  The 8-bit compare allows the timer to be reset to zero on any 8-bit value.  The 16-bit counter allows it to count for a longer period of time before an overflow.  The output from the TMR0 body depends upon the mode.  In the 8-bit compare mode, the output will be set each time there is a compare match.  In the 16-bit mode, the output will be set each time the counter rolls from 0xFFFF to 0x0000.
    The output from the core can be directed to other peripherals such as the CLC's, it can also be sent through a postscaler for further division and then create an interrupt or toggle an output on an I/O pin.  The postscaler is different than the prescaler because it is not limited to powers of two.  It is a counting divider and it can divide by any value between 1 and 16.  We shall use that feature in the example.
    Using the Timer
    Timers can be used for a great number of things but one common thing is to produce a precise timing interval that does not depend upon your code.  For instance, 2 lessons ago, we generated a PWM signal.  One way to do this was to set and clear the GPIO pin every so many instruction cycles.  Unfortunately, as we added code to execute, the PWM would get slower and slower.  Additionally, it could get less reliable because the program could take different paths through the code.  Using the PWM peripheral was the perfect solution, but another solution would be to use a timer.  For instance, you could configure the timer to set the output after an interval.  After that interval had elapsed, you could set a different interval to clear the output.  By switching back and forth between the set interval and the clear interval, you would get a PWM output.  Still more work than the PWM peripheral, but MUCH better than the pure software approach.
    For this exercise we will use the timer to force our main loop to execute at a fixed time interval.  We will instrument this loop and show that even as we add work to the loop, it still executes at the same time interval.  This type of structure is called an exec loop and it is often used in embedded programming because it ensures that all the timing operations can be simple software counting in multiples of the loop period.
    And here is the program.
    void main(void)
    {
        TRISAbits.TRISA2 = 0;    // Configure TRISA2 as an output (the LED)
        T0CON1bits.ASYNC = 0;    // Make sure the timer is synchronized
        T0CON1bits.CKPS = 5;     // Configure the prescaler to divide by 32
        T0CON1bits.CS = 2;       // Use the FOSC/4 clock for the input
                                 // The TMR0 clock should now be 250kHz
        TMR0H = 250;             // Set the counter to reset to 0 when it reaches 250 (1ms)
        TMR0L = 0;               // Clear the counter
        T0CON0bits.T0OUTPS = 9;  // Configure the postscaler to divide by 10
        T0CON0bits.T0EN = 1;     // Turn the timer on
                                 // The timer output should now tick every 10ms
        while(1)
        {
            LATAbits.LATA2 = 0;  // Turn on the LED... this allows us to measure CPU time
            __delay_ms(5);       // Do some work... could be anything
            LATAbits.LATA2 = 1;  // Turn off the LED... any extra time will be while the LED is off
            while(! PIR0bits.TMR0IF );  // Burn off the unused CPU time.  The time remaining
                                        // could be used as a CPU load indicator.
            PIR0bits.TMR0IF = 0; // Clear the overflow flag so we can detect the next interval
        }
    }
    I chose to use a delay macro to represent doing useful work.  In a "real" application, this area would be filled with all the various functions that need to be executed every 10 milliseconds.  If you needed something run every 20 milliseconds, you would execute that function every other time.  In this way, many different rates can be easily accommodated so long as the total execution time does not exceed 10 milliseconds, because exceeding it would stretch an executive cycle into the next interval and break the regular timing.
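    The "every other time" scheduling idea can be sketched in portable C.  In this sketch (my own illustration; the ran_* counters are hypothetical stand-ins for real task functions) the loop body runs once per 10ms interval and slower tasks simply count passes.

```c
#include <stdint.h>

/* Exec-loop rate scheduling: the loop body runs once per 10ms timer
   interval, and slower tasks simply count loop passes. */
static uint32_t tick;
static uint32_t ran_10ms, ran_20ms, ran_100ms;  /* stand-ins for work */

void exec_pass(void)            /* called once per 10ms interval */
{
    tick++;
    ran_10ms++;                 /* every pass       -> 10ms rate  */
    if (tick % 2 == 0)
        ran_20ms++;             /* every other pass -> 20ms rate  */
    if (tick % 10 == 0)
        ran_100ms++;            /* every 10th pass  -> 100ms rate */
}
```

    Because every rate is just a multiple of the loop period, the timing of each task stays locked to the hardware timer rather than to the code path taken.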
    Special Considerations
    One interesting effect in timers is they are often the first example of "concurrency issues" that many programmers encounter.  Concurrency issues arise when two different systems access the same resource at the same time.  Quite often you get unexpected results which can be seen as "random" behavior.  In the code above I configured the timer in 8-bit mode and took advantage of the hardware compare feature so I never needed to look at the timer counter again.  But let us imagine a slightly different scenario.  Imagine that we needed to measure the lap time of a race car.  When the car started the race we would start the timer.  As the car crossed the start line, we would read the timer BUT WE WOULD NOT STOP IT.  When the car finished the last lap, we could stop the timer and see the total time.  In this way we would have a record for every lap in the race.  Simply by subtracting the time of completion for each lap, we would have each lap interval, which would be valuable information for the race driver.  Each time we read the timer without stopping it, we have an opportunity for a concurrency issue.  For an 8-bit timer we can read the entire value with one instruction and there are no issues.  However, the race is likely to last longer than we can count on 8 bits so we need a bigger timer.  With a 16-bit timer we must perform 2 reads to get the entire value and now we encounter our problem.
    In the picture above I have shown two scenarios where TMR0 in 16-bit mode is counting 1 count per instruction cycle.  This is done to demonstrate the problem.  Slowing down the counting rate does not really solve the problem but it can affect the frequency of the issue.  In this example the blue cell indicates the first read while the red cell indicates the second read to get all 16 bits.  When the counter was 251, the reads are successful; however when the counter is 255, the value we read will be 511, which is about 2x the actual value.  If we reverse the read order we have the same problem.  One solution is to read the high, then the low and finally, read the high a second time.  With these three data points and some math, it is possible to reconstruct the exact value at the time of the first read.  Another solution is in hardware.
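    The high/low/high read trick can be sketched in portable C.  This simulation (my own illustration, not PIC code) pessimistically advances the counter between every byte read: if the two high-byte reads agree, the pair is consistent; if they differ, a rollover happened between reads, and the low byte tells us which high byte it belongs with.

```c
#include <stdint.h>

/* Simulated free-running 16-bit counter that advances between every
   byte read (a pessimistic model of reading a live timer). */
static uint16_t counter;

static uint8_t read_low(void)  { uint8_t v = counter & 0xFF; counter++; return v; }
static uint8_t read_high(void) { uint8_t v = counter >> 8;   counter++; return v; }

/* Read high, then low, then high again.  If the two high bytes agree,
   no rollover happened around the low read.  If they differ, a small
   low byte was read after the wrap (pair it with the second high byte)
   and a large one was read before it (pair it with the first). */
uint16_t safe_read16(void)
{
    uint8_t h1 = read_high();
    uint8_t lo = read_low();
    uint8_t h2 = read_high();
    if (h1 == h2)
        return ((uint16_t)h1 << 8) | lo;
    if (lo < 0x80)
        return ((uint16_t)h2 << 8) | lo;
    return ((uint16_t)h1 << 8) | lo;
}
```

    Run against the two scenarios from the picture, the counter at 251 reads back cleanly, and the counter at 255 no longer produces the bogus 511.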

    In the data sheet we see that there is some additional circuitry surrounding TMR0H.  With this circuitry, TMR0 will automatically copy the high byte of the counter into the TMR0H holding register whenever TMR0L is read.  So if you read TMR0L first and then TMR0H you will NEVER have the issue.  Now consider the following line of C.
    timerValue = TMR0;
    It is not clear from just this line of code which byte of TMR0 is read first.  If it is the low byte this line is finished and perfect.  However, if it is the high byte, then we still have a problem.  One way to be perfectly clear in the code is the following:
    timerValue = TMR0L;
    timerValue |= TMR0H << 8;
    This code is guaranteed to read the registers in the correct order and should be no less efficient.  The left shift by 8 will probably not happen explicitly because the compiler is smart enough to simply read the value of TMR0H and store it in the high byte of timerValue.
    These concurrency issues can appear in many areas of computer programming.  If your program is using interrupts then it is possible to see variables partially updated when an interrupt occurs causing the same concurrency issues.  Some bigger computers use real-time operating systems to provide multi-tasking.  Sharing variables between the tasks is another opportunity for concurrency issues.  There are many solutions, for now just be aware that these exist and they will affect your future code.
    Timer 0 is probably the easiest timer on a PIC microcontroller.  It has always been very basic and its simplicity makes it the best timer to play with as you learn how they work.  Once you feel you have mastered timer 0, spend some time with timer 1 and see what additional features it has.
    Once again, the project is in the attached files.
    Good Luck.
  5. N9WXU
    It has been a busy few weeks but finally I can sit down and write another installment to this series on embedded programming.
    Today's project will be another peripheral and a visualization tool for our development machines.  We are going to learn about the UART.  Then we are going to write enough code to attach the UART to a basic C printing library (printf) and finally we are going to use a program called Processing to visualize data sent over the UART.  This should be a fun adventure so let us get started.
    The word UART is an acronym that stands for Universal Asynchronous Receiver Transmitter.  Some UARTs have an additional feature so they are Universal Synchronous Asynchronous Receiver Transmitters, or USARTs.  We are not going to bother with the synchronous part so let us focus on the asynchronous part.  The purpose of the UART is to send and receive data between our microcontroller and some other device.  Most frequently the UART is used to provide some form of user interface or simply debugging information live on a console.  The UART signaling has special features that allow it to send data at an agreed upon rate (the baud rate) and an agreed upon data format (most typically 8 data bits, no parity and 1 stop bit).  I am not going to go into detail on the UART signaling but you can easily learn about it by looking in chapter 36 of the PIC16F18446 datasheet.  Specifically look at figure 36-3 if you are interested.  A UART transfers bytes of data (8 bits in a byte) one bit at a time across a single wire (one wire for transmit and one for receive) and a matching UART puts the bits back into a byte.  Each bit is present on the wire for a brief but adjustable amount of time.  Sending the data slowly is suitable for long noisy wires while sending the data fast is good for impatient engineers.  A common baud rate for user interfaces is 9600 baud, which sends about 1 letter every millisecond (0.001 seconds).  Many years ago, before the internet, I had a 300 baud modem for communicating with other computers.  When I read my e-mail the letters would arrive slightly slower than I could read.  Later, after the internet was invented and we used modems to dial into it, I had a 56,600 baud modem so I could view web pages and pictures more quickly.  Today I use UARTs to send text data to my laptop, and with the help of the Processing program we can make a nice data viewer.  Enough history... let us figure out the UART.
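    As a sanity check on the "1 letter every millisecond" claim: with the common 8N1 framing, each character costs 10 bit times (1 start + 8 data + 1 stop), so the character time is simply 10 divided by the baud rate.  A tiny helper (illustrative only; the function name is mine):

```c
/* Microseconds needed to send one character with 8N1 framing:
   1 start bit + 8 data bits + 1 stop bit = 10 bit times. */
unsigned long char_time_us(unsigned long baud)
{
    return 10UL * 1000000UL / baud;
}
```

    At 9600 baud this works out to about 1041 microseconds per character, which matches the roughly one-letter-per-millisecond figure; at 115200 baud it drops to about 86 microseconds.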
    Configuring the UART
    The UART is a peripheral with many options so we need to configure the UART for transmitting and receiving our data correctly.  To set up the UART we shall turn to page 571 of our PIC16F18446 data sheet, where the checklist describes the settings required to transmit asynchronous data.

    And now for some code.
    Here is the outline of our program.
    void init_uart(void)
    {
    }

    int getch(void)
    {
    }

    void putch(char c)
    {
    }

    void main(void)
    {
        init_uart();
        while(1)
        {
            putch(getch()+1);
        }
    }
    This is a common pattern for designing a program.  I start by writing my "main" function, using simple stub functions for the tricky bits.  That way I can think about the problem and decide how each function needs to work: what sort of parameters each function will require and what sort of values each function will return.  This program will be very simple.  After initializing the UART to do its work, the program will read values from the UART, add 1 to each value and send it back out the UART.  Since every letter in a computer is a simple value, it will return the next letter in each series that I type.  If I type 'A' it will return 'B' and so on.  I like this test program for UART testing because it demonstrates the UART is working and not simply shorting the receive pin to the transmit pin.  Now that we have an idea for each function, let us write them one at a time.
    This function will simply write all the needed values into the special function registers to activate the UART.  To do that, we shall follow the checklist in the data sheet.  We need to configure both the transmit and the receive function.
    Step 1 is to configure the baud rate.  We shall use the BRG16 because that mode has the simplest baud rate formula.

    Of course, it is even easier when you use the supplied tables.  I always figure that if you are not cheating, you are not trying, so let's just find the 32MHz column in the SYNC=0, BRGH=0, BRG16=1 table.  I like 115.2k baud because that makes the characters send much faster.  So the value we need for SPBRG is 16.
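    The table value can also be double-checked with the baud rate formula for this mode (SYNC=0, BRGH=0, BRG16=1: baud = FOSC / (16 x (n+1)) per the datasheet).  Solving for n and rounding to the nearest integer gives a small sketch of the arithmetic (the function name is my own):

```c
/* SPBRG value for SYNC=0, BRGH=0, BRG16=1, where
   baud = FOSC / (16 * (n + 1)).  Solving for n with rounding:
   n = round(FOSC / (16 * baud)) - 1. */
unsigned spbrg_value(unsigned long fosc, unsigned long baud)
{
    return (unsigned)((fosc + 8UL * baud) / (16UL * baud) - 1UL);
}
```

    For a 32MHz FOSC this gives 16 for 115.2k baud, agreeing with the table, and 207 for 9600 baud.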
    Step 2 is to configure the pins.  We shall configure TX and RX.  To do that, we need the schematics.  The key portion of the schematics is this bit.

    The confusing part is the details of the TX and RX pins.  If I had a $ for every time I got these backwards, I would have retired already.  Fortunately the team at Microchip who drew these schematics were very careful to label the CDC TX connecting to the UART RX and the CDC RX connecting to the UART TX.  They also labeled the UART TX as RB4 and the UART RX as RB6.  This looks pretty simple.  Now we need to steer these signals to the correct pins via the PPS peripheral.  
    NOTE: The PPS stands for Peripheral Pin Select.  This feature allows the peripherals to be steered to almost any I/O pin on the device.  This makes laying out your printed circuit boards MUCH easier as you can move the signals around to make all the connections straight through.  You can also steer signals to more than one output pin enabling debugging by steering signals to your LED's or a test point.
    After steering the functions to the correct pins, it is time to clear the ANSEL for RB6 (making the pin digital) and clear TRIS for RB4 (making the pin an output).
    The rest of the initialization steps involve putting the UART in the correct mode.  Let us see the code.
    void init_uart(void)
    {
        // STEP 1 - Set the Baud Rate
        BAUD1CONbits.BRG16 = 1;
        TX1STAbits.BRGH = 0;
        SPBRG = 16;

        // STEP 2 - Configure the PPS
        // PPS unlock sequence.
        // This should be in assembly because the hardware counts the instruction
        // cycles and will not unlock unless the sequence is exactly right.  The C
        // language cannot promise to produce the exact instruction cycles below.
        asm("banksel PPSLOCK");
        asm("movlw 0x55");
        asm("movwf PPSLOCK");
        asm("movlw 0xAA");
        asm("movwf PPSLOCK");
        asm("bcf PPSLOCK,0");
        RX1PPSbits.PORT = 1;
        RX1PPSbits.PIN = 6;
        RB4PPS = 0x0F;

        // Step 2.5 - Configure the pin direction and digital mode
        ANSELBbits.ANSB6 = 0;
        TRISBbits.TRISB4 = 0;

        // Step 3
        TX1STAbits.SYNC = 0;
        RC1STAbits.SPEN = 1;

        // Step 4
        TX1STAbits.TX9 = 0;

        // Step 5 - Not needed

        // Step 1 from the RX section
        RC1STAbits.CREN = 1;

        // Step 6
        TX1STAbits.TXEN = 1;

        // Steps 7 - 9 are not needed because we are not using interrupts
        // Step 10 is in the putch function
    }
    To send a character we simply need to wait for room in the transmitter and then add another character.
    void putch(char c)
    {
        while(!TX1IF);   // sit here until there is room to send a byte
        TX1REG = c;
    }
    And to receive a character we can just wait until a character is present and retrieve it.
    int getch(void)
    {
        while(!RC1IF);   // sit here until there is a byte in the receiver
        return RC1REG;
    }
    And that completes our program.
    Here is the test run.

    Every time I typed a letter or number, the following character is echoed.
    Typing a few characters is interesting and all but I did promise graphing.  We shall make a simple program that will stream a series of numbers one number per row.  Then we will introduce processing to convert this stream into a graph.
    The easiest way to produce a stream of numbers is to use printf.  This function will produce a formatted line of text and insert the value of variables where you want them.
    The syntax is as follows:
    value = 5;
    printf("This is a message with a value %d\n", value);
    To get access to printf you must do two things.
    1) You must include the correct header file.  So add the following line near the top of your program.  (by the xc.h line)
    #include <stdio.h>
    2) You must provide a function named putch that will output a character.  We just did that.
    So here is our printing program.
    #include <stdio.h>
    #include <math.h>

    // Lots of stuff we already did

    void main(void)
    {
        double theta = 0;
        init_uart();
        while(1)
        {
            printf("%f\r\n", sin(theta));
            theta += 0.01;
            if(theta >= 2 * 3.1416)
                theta = 0;
        }
    }
    Pictures or it didn't happen.

    And now for processing.
    Go to http://www.processing.org and download a suitable version for your PC.
    Attached is a processing sketch that will graph whatever appears in the serial port from the PIC16.

    And here is a picture of it working.
    I hope you found this exercise interesting.  As always post your questions & comments below.  Quite a lot was covered in this session and your questions will guide our future work.
    Good Luck
    exercise_5a.zip exercise_5.zip graph_data.pde
  6. N9WXU
    Today I am going to introduce a peripheral.  So far we have interacted with the I/O pins on the microcontroller.  This can get you pretty far, and if you can manipulate the pins quickly enough you can produce nearly any signal.  However, there are times when the signal must be generated continuously while your MCU performs other tasks.
    Today we will be generating a PWM signal.  PWM stands for Pulse Width Modulation.  The purpose of PWM is to produce a fixed frequency square wave but the width of the high and low pulses can be adjusted.  By adjusting the percentage of the time the signal is high versus low, you can change the amount of energy in a system.  In a motor control application, changing the energy will change the speed.  In an LED application, changing the energy will change the brightness.  We will use the PWM to change the brightness of the LED on our demo board.
    Here is a sample of a PWM signal.  The PWM frequency is 467.3Hz and the duty cycle is 35%.

    One way to produce such a signal is to execute a bit of code such as this one.
    do_pwm  macro
            banksel pwm_counter
            movlw   pwm_reload
            decfsz  pwm_counter,f
            movf    pwm_counter,w
            movwf   pwm_counter
            banksel pwm
            subwf   pwm,w
            banksel LATA
            movf    LATA,w
            andlw   0xFB
            btfss   STATUS,C
            iorlw   0x04
            movwf   LATA
            endm
    This code does not help you if you are writing in C but it will serve to show the challenges of this technique.  This is a simple macro that must be executed numerous times to produce the entire PWM cycle.  The PWM frequency is determined by how often this macro can be called.  So one option is quite simply this:
    loop
            do_pwm
            goto    loop
    But if you do this the MCU will not be able to perform any other functions.  
    loop
            do_pwm
            banksel reload
            incfsz  reload
            goto    loop
            banksel pwm
            incf    pwm,f
            goto    loop
    Here is one that updates the PWM duty cycle every time the reload variable overflows.  This is not a lot of additional work but the MCU is still 100% consumed toggling a pin and deciding when to update the duty cycle.  Surely there is a better way.
    Enter a Peripheral
    MCU peripherals are special hardware functions that can be controlled by the software but do some tasks automatically.  One common peripheral is the PWM peripheral.  Different MCUs have different peripherals that can produce a PWM signal.  For instance, older PIC microcontrollers have the CCP (Capture, Compare, PWM) peripheral and newer PIC microcontrollers have dedicated PWMs.  To demonstrate the use of this simple peripheral I will switch to C and give some code.
    The good side to using a peripheral is the work it takes on.  The down side is the additional effort required to set up the additional hardware.  Microchip provides a tool called MCC to do this work but in this example, we will do the work ourselves.  Fortunately, Microchip provided a section in the data sheet with an 8 step checklist for configuring the PWM.
    Time for doing the steps
    // Steps from Chapter 30.9 of the datasheet
    // Step 1
    TRISAbits.TRISA2 = 1;
    // Step 2
    PWM6CON = 0;
    // Step 3
    T2PR = 0xFF;
    // Step 4
    PWM6DC = 358 << 6;
    // Step 5
    T2CLKCONbits.CS = 1;
    T2CONbits.CKPS = 0;
    T2CONbits.ON = 1;
    // Step 6
    // Step 7
    TRISAbits.TRISA2 = 0;
    RA2PPS = 0x0D;
    // Step 8
    PWM6CONbits.EN = 1;
    And here is 35% just like before, except last time the period of the entire wave was 2.14ms and now it is 0.254ms.  That is about 10x faster than before.
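    Where did the 358 come from?  With T2PR = 0xFF the PWM has 4 x (T2PR + 1) = 1024 duty counts, so 35% is 0.35 x 1024 = 358.  A quick sketch of that arithmetic (the function name is mine, assuming the standard 10-bit duty relationship from the datasheet):

```c
/* Duty register counts for a given percentage, assuming the standard
   10-bit relationship: PWM resolution = 4 * (T2PR + 1) counts. */
unsigned duty_counts(unsigned percent, unsigned t2pr)
{
    return percent * 4U * (t2pr + 1U) / 100U;
}
```

    With T2PR = 255, 35% gives 358 counts, which is the value shifted into position with PWM6DC = 358 << 6.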

    This time the main loop is doing absolutely nothing making it possible to do all kinds of neat things like implement a flicker to make the LED look like a candle.
    while(1)
    {
        __delay_ms(5);
        PWM6DC = rand() % 1024 << 6;
    }
    So here is a candle.  Though honestly it is not the best looking candle.  Perhaps you can do a better job.
    Peripherals can be a huge time saver and can certainly provide more CPU in your application for the real "secret sauce".   Most of the data sheet covers peripherals so we will go through a few of them in the next few weeks.
    Good Luck
  7. N9WXU
    Time for part 2! 
    Last time, I gave you the homework of downloading and installing MPLAB and finding a Curiosity Nano DM164144 .  
    Once you have done your homework, it is time for STEP 3, get that first project running.
    Normally my advice would be to break out MPLAB Code Configurator and get the initialization code up and running, but I did not assign that for homework!  So we will go old school and code straight to the metal.  Fortunately, our first task is to blink an LED.
    Step 1: Find the pin with the LED.
    A quick check of the schematic finds this section on page 3.

    This section reveals that the LED is attached to PORT A bit 2.  
    With the knowledge of the LED location, we can get to work at blinking the LED.
    The first step is to configure the LED pin as an output.  This is done by clearing bits in the TRIS register.  I will cheat and simply clear ALL the bits in this register.
    Next we go into a loop and repeatedly set and clear PORT A bit 2.  
    #include <xc.h>

    void main(void)
    {
        TRISA = 0;
        while(1)
        {
            PORTA = 0;
            PORTA = 0x04;
        }
        return;
    }
    Let us put this together with MPLAB and get it into the device.
    First we will make a new project:

    Second, we will create our first source file by selecting New File and then follow the Microchip Embedded -> XC8 Compiler -> main.c

    give your file a name (I chose main.c)

    And you are ready to enter the program above.

    And this is what it looks like typed into MPLAB.

    But does it work?
    Plug in your shiny demo board and press this button:

    And Voila!, the LED is lit... but wait, my code should turn the LED ON and OFF... Why is my LED simply on?
    To answer that question I will break out my trusty logic analyzer.  That is my Saleae Logic Pro 16.  This device can quickly measure the voltage on the pins and draw a picture of what is happening.

    One nice feature of this device is it can show both a simple digital view of the voltage and an analog view.  So here are the two views at the same time.  Note the LED is on for 3.02µs (microseconds for all of you 7th graders).  That is 0.00000302 seconds.  The LED is off for nearly 2µs.  That means the LED is blinking at 201.3kHz (201 thousand times per second).  That might explain why I can't see it.  We need to add a big delay to our program and slow it down so humans can see it.
    One way would be to make a big loop and just do nothing for a few thousand instructions.  Let us make a function that can do that.
    Here is the new program.
    #include <xc.h>

    void go_slow(void)
    {
        for(int x=0;x<10000;x++)
        {
            NOP();
        }
    }

    void main(void)
    {
        TRISA = 0;
        while(1)
        {
            PORTA = 0;
            go_slow();
            PORTA = 0x04;
            go_slow();
        }
        return;
    }
    Note the new function go_slow().  This simply executes a NOP (No Operation) 10,000 times.  I called this function after turning the LED OFF and again after turning the LED ON.  The LED is now blinking at a nice rate.  If we attach the Saleae to it, we can measure the new blink.

    Now it is going at 2.797 times per second.  By adjusting the loop from 10,000 to some other value, we could make the blink anything we want.
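    Since the blink rate scales inversely with the loop count, the measured point (10,000 iterations gave 2.797 blinks per second) lets us estimate a count for any target rate.  A rough sketch (the exact timing still depends on the compiler and clock, so treat this as an estimate; the rate is passed in millihertz to keep the math in integers):

```c
/* Estimate the go_slow() loop count for a target blink rate, scaled
   from the measured point: 10,000 iterations -> 2.797 blinks/second.
   target_millihz is the desired rate in thousandths of a hertz. */
unsigned long count_for_rate_millihz(unsigned long target_millihz)
{
    return 10000UL * 2797UL / target_millihz;
}
```

    For example, a 1Hz blink (1000 millihertz) would call for roughly 27,970 iterations.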
    To help you make fast progress, please notice the complete project Step_3.zip attached to this post.
    Next time we will be exploring the button on this circuit board.  For your homework, see if you can make your LED blink useful patterns like morse code.
    Good Luck 

  8. N9WXU
    It is now time for my favorite introductory lab.  This lab is my favorite because it.... nope I will not spoil the surprise!
    Today we will do two labs and we will learn about digital inputs, and pushbuttons.
    First the digital input.  The PIC16F18446 has 20 pins spread across 3 ports: PORTA, PORTB and PORTC.  Last time we did not learn about the digital I/O pins in general, but we did learn how to drive a pin as an output.  It turns out that making a pin an output is pretty simple... just clear the appropriate TRIS bit to zero and that pin is an output.  It turns out that all the pins are inputs by default when the chip starts up.  Additionally, many of the pins have an alternate analog function, and each pin is configured as an analog input by default in order to keep the port in its lowest power state, one that cannot conflict with outside circuitry.  To get a digital input, we need to leave the digital input function enabled (TRIS is 1) and turn the analog function off (ANSEL is 0).  If we leave the analog function enabled, the digital function will always read a zero.  Here is the port diagram from the data sheet.

    You can see in the diagram that if the ANSELx signal is set to a 1, the AND gate leading to the data bus will always output a 0.  (Remember, both inputs to an AND gate must be 1 for the output to be 1, and the little circle on the ANSELx signal inverts 1's to 0's.)  From this diagram we learn that ANSELx must be 0 and TRISx must be 1, or the output buffer connected to the TRISx signal will drive the pin.  
    It is time for some code.  We will make a simple program that lights the LED when the button on our Curiosity is pressed and turns the LED off when the button is released.  A quick peek at the schematic shows the button is on PORTC bit 2.

    The Program Please:
    void main(void)
    {
        TRISAbits.TRISA2 = 0;
        ANSELCbits.ANSC2 = 0;
        WPUCbits.WPUC2 = 1;
        while(1)
        {
            if(PORTCbits.RC2)
            {
                LATAbits.LATA2 = 1;
            }
            else
            {
                LATAbits.LATA2 = 0;
            }
        }
    }
    A quick surprise: the WPUC register is the weak pull-up control register for PORTC.  This is not shown in the generic diagram but the explanation is simple.  The pushbutton will connect the RC2 pin to ground which will create a 0 on the input.  The weak pull-up is required to make a 1 when the button is NOT pressed.  Setting the WPUC2 bit will enable the weak pull-up for RC2.
    Programming.... Testing... Voila!  It works.
    But this is pretty boring after all, we could replace the Curiosity with a wire and save some money.  It is time to make that computer earn its place in the design.
    We are going to make a small change to the program so it will toggle the LED each time the button is pressed.
    void main(void)
    {
        TRISAbits.TRISA2 = 0;
        ANSELCbits.ANSC2 = 0;
        WPUCbits.WPUC2 = 1;
        while(1)
        {
            if(PORTCbits.RC2)
            {
                if(LATAbits.LATA2 == 1)
                    LATAbits.LATA2 = 0;
                else
                    LATAbits.LATA2 = 1;
            }
        }
    }
    Well that is strange.... it seems like it works when I press the button.  AH HA, RC2 is a 1 when the button is NOT pressed.... stand by...
    void main(void)
    {
        TRISAbits.TRISA2 = 0;
        ANSELCbits.ANSC2 = 0;
        WPUCbits.WPUC2 = 1;
        while(1)
        {
            if(PORTCbits.RC2==0)
            {
                if(LATAbits.LATA2 == 1)
                    LATAbits.LATA2 = 0;
                else
                    LATAbits.LATA2 = 1;
            }
        }
    }
    Well darn, it almost works again.  It seems like it is dim when I hold the button and when I release it, the LED is randomly on or off.  AH HA!  It is toggling the LED as long as I hold the pin.  We need to add a variable and only trigger the LED change when the pin changes state.
    __bit oldRC2 = 0;

    void main(void)
    {
        TRISAbits.TRISA2 = 0;
        ANSELCbits.ANSC2 = 0;
        WPUCbits.WPUC2 = 1;
        while(1)
        {
            if(PORTCbits.RC2==0 && oldRC2 == 1)
            {
                if(LATAbits.LATA2 == 1)
                    LATAbits.LATA2 = 0;
                else
                    LATAbits.LATA2 = 1;
            }
            oldRC2 = PORTCbits.RC2;
        }
    }
    I added a variable called oldRC2.  I used the compiler built-in type __bit to represent a single bit of data (matching the data size of a single pin) and the LED is triggered when the button is pressed (RC2 == 0) AND the button was previously NOT pressed (oldRC2 == 1).  The value of oldRC2 is set to be RC2 after the testing.
    Well Heck... It is getting closer but something is still not quite right.  I press the button and the LED changes state... usually... sometimes...
    I see the problem.  The pin RC2 is sampled twice: once at the beginning where it is compared to oldRC2, and once a bit later to make a new copy for oldRC2.  What if the value on RC2 changed between these two samplings?  That would mean that we could miss an edge.  The solution is simple.
    __bit oldRC2 = 0;
    __bit newRC2 = 0;

    void main(void)
    {
        TRISAbits.TRISA2 = 0;
        ANSELCbits.ANSC2 = 0;
        WPUCbits.WPUC2 = 1;
        while(1)
        {
            newRC2 = PORTCbits.RC2;
            if(newRC2==0 && oldRC2 == 1)
            {
                if(LATAbits.LATA2 == 1)
                    LATAbits.LATA2 = 0;
                else
                    LATAbits.LATA2 = 1;
            }
            oldRC2 = newRC2;
        }
    }
    We will just create a new variable and sample RC2 once into the variable newRC2.  Then we will use that for the comparison and the assignment.  Ah, this seems to be pretty good.
    Time for some more extensive testing... This is strange.  If I press this many times, every so often it looks like a button press was skipped.  Let us put the logic analyzer on this problem.

    Ok, the top trace is RC2(the button) and the bottom trace is RA2 (the LED).  Every falling edge of RC2 the LED changes state (on to off, or off to on).  But look right in the middle.  It appears that the LED changed state on the rising edge as well as the falling edge.  Perhaps we should look at that spot more closely.

    Look at that.  An extra transition on the button.  It turns out that buttons are made of two pieces of metal that are pressed together.  Sometimes the metal will bounce and break the connection.  If we measure this bounce we find that it is nearly 20µs wide (0.000020 seconds).

    That is pretty fast but the PIC16F18446 detected the bounce and changed the LED state.  If you research button bouncing you will find that this phenomenon is on ALL buttons, but some buttons are much worse than others.  This button is actually pretty good and it took a large number of tries before I was caught by it.
    This button is very good.  So I will do the simplest button debouncer I have ever done.
    __bit oldRC2 = 0;
    __bit newRC2 = 0;

    void main(void)
    {
        TRISAbits.TRISA2 = 0;
        ANSELCbits.ANSC2 = 0;
        WPUCbits.WPUC2 = 1;
        while(1)
        {
            newRC2 = PORTCbits.RC2 && PORTCbits.RC2;
            if(newRC2==0 && oldRC2 == 1)
            {
                if(LATAbits.LATA2 == 1)
                    LATAbits.LATA2 = 0;
                else
                    LATAbits.LATA2 = 1;
            }
            oldRC2 = newRC2;
        }
    }
    I will simply read PORTC twice and let it read a one if it reads high on BOTH reads.  Note the AND function between the two reads of RC2.
    That is all for today.  Your homework is to do some research on different ways to solve the debounce problem.  Next week we will introduce our first peripheral.
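    To get your research started, one widely used technique is a counting debouncer: sample the pin at a fixed rate and only accept a new state after it has read the same for several consecutive samples.  A portable sketch of the idea (the names and the sample count are my own choices, not from any particular library):

```c
#include <stdint.h>

/* Counting debouncer: the reported state changes only after the raw
   input has disagreed with it for DEBOUNCE_COUNT consecutive samples.
   Call at a fixed rate, say once per millisecond from the main loop. */
#define DEBOUNCE_COUNT 5

static uint8_t stable_state = 1;    /* button idles high on this board */
static uint8_t disagree_count;

uint8_t debounce(uint8_t raw)       /* returns the debounced state */
{
    if (raw == stable_state) {
        disagree_count = 0;         /* agreement resets the count */
    } else if (++disagree_count >= DEBOUNCE_COUNT) {
        stable_state = raw;         /* held long enough: accept it */
        disagree_count = 0;
    }
    return stable_state;
}
```

    A 20µs glitch like the one we captured would only ever appear in one sample, so it can never accumulate the five consecutive disagreements needed to flip the reported state.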
  9. N9WXU
    I have often been asked "how do I get started" in embedded software.  As I think about this question, I realize that the basic steps to get started with embedded software are nearly identical to the basic steps required to bring up a new MCU or new hardware.  There is always a bootstrapping process and a logical progression of steps before you are "home free" and building your product.  So here is my bootstrapping process, broken down into each step so the process is clear for those just starting out.
    Step 1 - Collect the tools.
    The tools of the trade for embedded engineering are quite simple.
    Development environment.
    Hardware to develop on.
    Measurement tools to check signals.
    Programming tool (sometimes built into a development kit).
    Serial monitoring tool (sometimes built into a development kit).
    An LED (usually built into a development kit).
    Step 2 - Install the development environment
    This process can be quite simple, like installing MPLAB IDE, or it can be quite involved, like installing the ESP-IDF environment for the ESP32.  You will be living in this environment for the duration of your project so get it right.
    Step 3 - That First Project.
    This first project is THE MOST IMPORTANT ONE.  If it goes well, you are off to the races, but if it goes poorly, you will likely regret your choice of MCU and start hunting for a different one.  The first project is to blink an LED.  The actual code is trivial, but this project will ensure the entire development workflow is working and you can program your target.
    Step 4 - Building Out
    This is where you start exploring your new world.  What are the peripherals? How do they work?  What kinds of things do folks do with these peripherals?
    Step 5 - Techniques of the experts
Now you probably know enough to bull your way to success.  I have seen some amazing projects built by folks who simply did not know how to quit.  But it would sure be nice if you could stand on the shoulders of giants and make your program easy and effective.  Back in university, one of my professors told a story of his first program.  He was a physicist and needed to run a simulation.  The idea was to write a program for the department's new computer to perform the simulation.  Like many simulations, this one involved a Newtonian solver for some of the math.  This sort of solver converges on the correct answer iteratively.  So he started his work.  After a few days, he was bugging his computer science friend about the syntax of Fortran and how to declare variables.  Eventually, he had it doing one pass through the math.  Finally, he figured he knew his tool chain (Fortran) and plowed ahead.  A week later he proudly showed off his new program to his friend.  His CS friend was impressed that this new programmer had produced such a complex application and that it was working.  His friend asked to see the program and was shocked at the 1-meter-high printout.  Scanning through the printout, he quickly discovered that while the physicist had figured out how to set up the math, he had never figured out how to write a loop.  The entire program was the same set of functions RETYPED (he did not figure out cut/paste) hundreds of times until the math had run enough times to converge on the result.  Brute force does work... but there is usually a better way.
    Enough talking.  It is time to get started.  If you want to follow along, download and install MPLAB IDE from www.microchip.com/mplab and find a Curiosity Nano Evaluation Kit DM164144 (https://new.microchipdirect.com/product/search/all/DM164144).
    Next time, we will install the IDE, setup MCC and start that all-important LED blink.
    Good Luck.
  10. N9WXU
Today we are going to explore the weird and wonderful trie (pronounced "try").  We are actually going to brute force a trie, and we will write a basic code generator to synthesize a static trie so our trusty NMEA token tester can stay in flash memory and fit in the PICmcu.  I promise this is going to be the LEAST memory-efficient trie you will ever see, but it is quick and easy to understand, so let us press on.
    First things first, what is a trie?  Simply put, the trie takes the if-else construct we built in part 2 and turns it into a data structure.  I love this concept because it means I can write the trie parser once and, by simply feeding it different data sets, it can process any language.  In fact, I try to make most of my programs data driven so the code can be well tested and the data can be the focus of our attention.
    Our basic trie will be implemented as a 2D array 26 x N.  N will be adjusted to be as small as possible.  The 26 comes from the 26 unique characters that can make up my NMEA string set.  We will ignore lower case to make this dataset as small as possible.  
    Here is a picture showing what we plan to do.  In the picture we are finding the NMEA string "GPGGA".

Each column of our 2D array represents a letter, and each row represents one step of the lookup.  If a letter is not involved at that layer of the lookup, we will fill its cell with a -1.  Each time we match, we learn the row where the next match will be found.   By matching to the row, GPGGA and GNGSA both share the first G by pointing to the second row.  On the second row, the P and the N point further into the structure.
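To make the table layout concrete, here is a tiny hand-built version in C. This is my own sketch with hypothetical names, encoding only the single word "GO"; the real 26 x N table is generated:

```c
/* One cell of the 26 x N table: index is the next row to visit
   (-1 means "this letter is invalid here") and key_index is non-zero
   only on the cell that completes a keyword. */
struct trie_cell { int index; int key_index; };

#define TRIE_ROWS 2
struct trie_cell trie[TRIE_ROWS][26];

/* Fill every cell with -1, then wire up the word "GO":
   row 0, column 'G' leads to row 1; row 1, column 'O' ends keyword 1. */
void trie_toy_init(void)
{
    for (int r = 0; r < TRIE_ROWS; r++)
        for (int c = 0; c < 26; c++)
            trie[r][c].index = -1;
    trie[0]['G' - 'A'].index = 1;
    trie[1]['O' - 'A'].index = 1;
    trie[1]['O' - 'A'].key_index = 1;
}
```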
    Here is the search code:
int trie_search(const char *s, int length)
{
    int t = 0;
    int p;
    for(int i = 0; i < length; i++)
    {
        p = s[i] - 'A';
        if((t = trie[t][p].index) == -1)
            return -1;
    }
    return trie[t][p].key_index - 1; // -1 hold over from a previous version. Generator can post increment the key
}

This looks fantastic.  Only one table lookup per letter, so this should be blindingly fast.  Let us take a look at the data.
Word    STRNCMP   IF-ELSE   RAGEL -G2   Hash Search (Hash/Search)   Hash GPERF   Hash GPERF no compare   Trie Search
GNGSA   399       121       280         326 (167/159)               374          126                     915
GPGSV   585       123       304         288 (167/121)               374          126                     871
GLGSV   724       59        225         503 (167/336)               113          113                     1395
GPRMC   899       83        299         536 (167/369)               374          126                     984
GPGGA   283       113       298         440 (167/273)               374          126                     821

Well heck! That is certainly not what we expected to see.  What is going on?
At the core, this algorithm is wonderful because it only touches each letter once.  That is just like the IF-ELSE approach, so I expected very similar performance.  We need to dig a bit deeper.  As I look at the code, the only thing that stands out is the actual table lookup.  We have a 2D table of structs, each struct holding a pair of 2-byte ints.  I wonder if this function is using a multiply to compute the table address.
9840        l1579:
9841  031A  ;example5.c: 65: if((t = trie[t][p].index)==-1)
9842  031A  082C    movf    (trie_search@p+1),w
9843  031B  00A1    movwf   (??_trie_search+0)+0+1
9844  031C  082B    movf    (trie_search@p),w
9845  031D  00A0    movwf   (??_trie_search+0)+0
9846  031E  35A0    lslf    (??_trie_search+0)+0,f
9847  031F  0DA1    rlf     (??_trie_search+0)+1,f
9848  0320  35A0    lslf    (??_trie_search+0)+0,f
9849  0321  0DA1    rlf     (??_trie_search+0)+1,f
9850  0322  082E    movf    (trie_search@t+1),w
9851  0323  00F1    movwf   (___wmul@multiplier+1)
9852  0324  082D    movf    (trie_search@t),w
9853  0325  00F0    movwf   (___wmul@multiplier)
9854  0326  3068    movlw   068h
9855  0327  00F2    movwf   (___wmul@multiplicand)
9856  0328  3000    movlw   0
9857  0329  00F3    movwf   ((___wmul@multiplicand))+1
9858                fcall   ___wmul

Here is the list file for line 65 and sure enough, there it is.  Lots of code to set up the multiply and then a brutal software multiply.  We were doomed.  This simple brute force approach needs a bit more elegance in its data structures.
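One way out, and this is my own sketch rather than anything from the project: pad each row from 26 to 32 cells so the row offset becomes a shift instead of a multiply. It wastes a little flash but lets the compiler index with shifts and adds rather than calling ___wmul:

```c
/* Sketch only (my names): flatten the table and pad each row to 32
   cells so cell (t,p) sits at t*32 + p, computable as (t << 5) + p.
   A power-of-two stride costs some wasted cells but needs no multiply. */
#define TRIE_COLS 32              /* next power of two above 26 */

struct cell { int index; int key_index; };

/* toy 2-row table encoding "GO"; unused cells default to 0 here,
   where the real generator would fill them with -1 */
const struct cell trie32[2 * TRIE_COLS] = {
    [0 * TRIE_COLS + ('G' - 'A')] = { 1, 0 },
    [1 * TRIE_COLS + ('O' - 'A')] = { 1, 1 },
};

const struct cell *cell_at(int t, int p)
{
    return &trie32[(t << 5) + p]; /* shift-and-add, no ___wmul call */
}
```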
I will include the packaged project and the generator so you can duplicate the work.  I still like this concept, so I challenge the readers to run some experiments and post the fastest/smallest version they can devise.
    Good Luck
  11. N9WXU
    This week we will discuss a different strategy for tokenizing our words.  This strategy will be to convert our word to a number in an unambiguous manner.  Then we can simply see if our number is one we recognize.  How hard can that be?
The process of converting something into an unambiguous number is hashing.  Hash algorithms come in all shapes and sizes depending upon their application needs.  One nice hash is the Pearson hash, which can make an 8-bit number from a string.  For a reasonably comprehensive list of hash algorithms check out this List of hash functions.  Generally, hash functions are intended to convert a string into a number that is very unlikely to represent any other string.  How unlikely depends upon the algorithm and the size of the number.  For example, the Pearson hashes are quick and easy.  They produce an 8-bit value using an XOR table.  However, there are more than 256 words in the English language, much less the rest of the world's languages, so the odds of a hash collision are quite high with a large set of words.  Fortunately, it is relatively easy to manipulate the XOR table so that there are no collisions for a small word set.
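To make that concrete, here is a minimal sketch of a Pearson-style hash. A real implementation uses a shuffled 256-entry permutation table; to keep the sketch short I substitute a tiny arithmetic permutation (i*7+31 mod 256, bijective because 7 is odd), so treat this as an illustration rather than the canonical algorithm:

```c
#include <stdint.h>
#include <stddef.h>

/* stand-in for the 256-entry shuffled permutation table */
static uint8_t T(uint8_t i)
{
    return (uint8_t)(i * 7u + 31u);   /* a permutation of 0..255 */
}

/* Pearson-style hash: fold each byte into the running value
   through the permutation table, yielding an 8-bit result. */
uint8_t pearson8(const char *s, size_t len)
{
    uint8_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = T(h ^ (uint8_t)s[i]);
    return h;
}
```

Tuning the table so a specific word set has no collisions amounts to picking a different permutation, which is exactly the knob mentioned above.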
    Assuming we can apply some hash algorithm that will uniquely identify our words, it should be pretty easy to apply this technique to find our tokens.  For example, in the first example we had the following table of keywords:
const char *wordList[] = {"GPGGA","GNGSA","GPGSV","GPBOD","GPDBT","GPDCN"};
enum wordTokens {NO_WORD = -1,GPGGA,GNGSA,GPGSV,GPBOD,GPDBT,GPDCN};

What if that changed to this:
const int hashList[] = {<hash_of_GPGGA>,<hash_of_GNGSA>,<hash_of_GPGSV>,<hash_of_GPBOD>,<hash_of_GPDBT>,<hash_of_GPDCN>};
enum wordTokens {NO_WORD = -1,GPGGA,GNGSA,GPGSV,GPBOD,GPDBT,GPDCN};

It is not hard to imagine hashing the incoming word into an integer and then scanning the hashList looking for a match.  This would be 2.5x smaller in memory because we no longer need to store the master wordList, and each comparison would be a 2-byte compare (or a 1-byte compare if we used a smaller hash).  It would only require 2N byte comparisons where N is the number of words to check for.  Of course hashing the incoming word has an up-front cost, but that cost could be buried inside the character receive function.
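The hashList scan described above is only a few lines of C. Here is a hedged sketch of my own (hypothetical name, made-up hash values in the test) where the position of the matching hash doubles as the token:

```c
/* Linear scan of the precomputed hash list; the index of the matching
   hash is the token value, and -1 means NO_WORD. */
int hash_find_token(const unsigned char *hashList, int count, unsigned char h)
{
    for (int i = 0; i < count; i++)
        if (hashList[i] == h)
            return i;
    return -1;
}
```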
    The hash methods are not perfect.  With any one-way algorithm on unknown input, there is the possibility of a collision. That is where two words have the same computed hash value.  This could mean that two of the keywords have the same value, or it could mean that a randomly chosen word (or random characters) matches a valid input.  In a system where the keyword list is static and known at compile time, it is possible to develop a "perfect hash".  That is a hash that guarantees all valid inputs are unique.  If your system is concerned about random "noise" being treated as valid data, there are at least two ways to solve this.
1. Keep a list of the original words and do a final byte-for-byte compare one time.
2. Add a checksum to the input and make sure the checksum is valid in addition to the hash match.  For NMEA strings, this is already available.

Can we go faster still?
    The integer compare search method works very well, but there are a few ways to go even faster.
1. Sort the hashes in the list and use a search algorithm like a binary search to find the match.  This reduces the time from O(n) to O(log(n)).  Much faster.
2. Use the hash as an index into an array of the tokens.  This reduces the time from O(n) to O(1).  Much, much faster, but it makes a potentially HUGE array (2^<hash bit length> entries).
3. Use hash % <word count> to create a minimally sized table.  This works but requires a minimal perfect hash, that is, a hash with the property of producing an N-record table for N words.  These algorithms are hard to find.
4. Use the hash as the token in the rest of the system.  Why bother looking up a token if you already have a nice number?  This is a good solution but assumes that you never need to use the tokens as indices into other arrays.

The idea that you can use the hash as an index into an array of records is the basis of a data structure called a Hash Table.  Accessing a hash table is typically O(1) until a collision is detected, and then it goes to O(m) where m is the number of items in the collision.  Typically the system implements the hash table as a sparse array with additional arrays holding the colliding hash items.
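Option 1, the binary search over a sorted hash list, is easy to sketch (hash values below are made up for illustration):

```c
#include <stdint.h>

/* Classic binary search over sorted 16-bit hashes: O(log n) probes.
   Returns the index of the match (usable as the token) or -1. */
int hash_bsearch(const uint16_t *sorted, int n, uint16_t target)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (sorted[mid] == target) return mid;
        if (sorted[mid] < target)  lo = mid + 1;
        else                       hi = mid - 1;
    }
    return -1;
}
```

One subtlety: sorting reorders the list, so the returned index no longer matches the original enum order; a small index-remap table alongside the sorted hashes fixes that.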
    That is a lot of words.  I think it is time for a few examples.  First let us implement a basic hash search using our short word list.
    That was pretty easy but it is easy to see how finding a perfect hash function can get more and more difficult as we add words.  Fortunately for us, there is a tool that is part of the GCC compiler suite called Gperf.  Gperf's job is to find a perfect hash algorithm for a list of words and produce the C code to process it.  Sounds perfect, so here is an example of how that works.  First we must prepare a word list.  The word list below shows a structure that will be used to store the word list in the C program.  This is followed by the list of words and indices that will be used to populate an array of the structure.
struct keyword_s {const char *name; int index;};
%%
GPGGA,0
GNGSA,1
GPGSV,2
GPBOD,3
GPDBT,4
GPDCN,5
GPRMC,6
GPBWC,7

The word list is converted into C code with the following command line:
    gperf -t -I --output-file=hashTable.c keywordlist.txt This command will create a file called hashTable.c.  Inside this file is one public function called in_word_set.  Below you can see where I modified the NMEA_findToken function to use the in_word_set function supplied by gperf. 
enum wordTokens {NO_WORD = -1,GPGGA,GNGSA,GPGSV,GPBOD,GPDBT,GPDCN, GPRMC, GPBWC};

struct keyword_s {const char *name; int index;};
extern struct keyword_s *in_word_set (register const char *str, register size_t len);

enum wordTokens NMEA_findToken(char *word)
{
    enum wordTokens returnValue = NO_WORD;
    struct keyword_s *kw;
    kw = in_word_set(word,5);
    if(kw)
        returnValue = kw->index;
    return returnValue;
}

Compile and run and you get the following results.  Note how the hash search spends a fixed amount of time computing the 8-bit Pearson hash of the keyword.  Then it spends a small amount of time searching for the hash value in the list of hash keys.  This search is a brute-force linear search.  A binary search would likely be faster with a large word set, but with only 8 words in the word set most of the time is spent computing the hash.
    The GPERF code is very interesting.  Notice how the function is fixed time if the word is present and is much faster when the word is absent.   There are a number of options to GPERF to allow it to produce code with different strategies.  If you look at the code produced by GPERF you will notice that there is a final test using strcmp (or optionally strncmp).  This will eliminate the possibility of a collision.  If we don't care about collisions, look how much faster this gets.
Word    STRNCMP   IF-ELSE   RAGEL -G2   Hash Search (Hash/Search)   Hash GPERF   Hash GPERF no compare
GNGSA   399       121       280         326 (167/159)               374          126
GPGSV   585       123       304         288 (167/121)               374          126
GLGSV   724       59        225         503 (167/336)               113          113
GPRMC   899       83        299         536 (167/369)               374          126
GPGGA   283       113       298         440 (167/273)               374          126

So far I would have to say that the GPERF solution is easily the best way to decipher the sentences from my GPS.  I know the GPS will deliver good strings so I would feel pretty comfortable stripping off the final compare.   However, even with the string compare the GPERF solution is pretty good.  It is only consistently beaten by the hand-crafted if-else, which will be a challenge to maintain.  Perhaps we should consider writing a code generator for that method.
    Hashing is a very interesting topic with a lot of ratholes behind it.  Some of the fastest algorithms and search methods take advantage of hashes to some degree.  Our passwords also use hashes so that passwords can be compared without knowing what the actual password is.  I hope this has been as informative to you as it was to write.
    As usual, take a look at the attached MPLAB projects to try out the different ideas.
    Good Luck.
    example4 gperf.zip
    example4 gperf no compare.zip
  12. N9WXU
    “Code generation, like drinking alcohol, is good in moderation.”
    — Alex Lowe
    This episode we are going to try something different.  The brute force approach had the advantage of being simple and easy to maintain.  The hand-crafted decision tree had the advantage of being fast.  This week we will look at an option that will hopefully combine the simplicity of the string list and the speed of the decision tree.  This week we will use a code generator to automatically create the tokenizing state machine.  I will leave it to you to decide if we use generation in moderation.
    Let me introduce RAGEL http://www.colm.net/open-source/ragel/.  I discovered RAGEL a few years ago when I was looking for a quick and dirty way to build some string handling state machines.  RAGEL will construct a complete state machine that will handle the parsing of any regular expression.  It can do tokenizing and it can do parsing.  Essentially, you define the rules for the tokens and the functions to call when each token is found.  For instance, you can write a rule to handle any integer and when an integer is found it can call your doInteger() method.  For our simple example of identifying 6 words, the RAGEL code will be a bit overkill but it will be MUCH faster than a brute force string search and in the same ball park as the hand crafted decision tree.  Let us get started.

    First let us get the housekeeping out of the way.  This part of the code you have seen before.  It is identical to the first two examples I have already provided.  There are two differences.  First, this only LOOKS like C code.  In fact, it is a RAGEL file (I saved it with a .rl extension) and you will see the differences in a moment.  When I use a code synthesizer, I like to place the needed command line at the top of the file in comments.  While comments are a smell, this sort of comment is pretty important.
// compile into C with ragel
// ragel -C -L -G2 example3.rl -o example3.c
//
#include <string.h>
#include <stdio.h>
#include "serial_port.h"

char * NMEA_getWord(void)
{
    static char buffer[7];
    memset(buffer,0,sizeof(buffer));
    do
    {
        serial_read(buffer,1);
    } while(buffer[0] != '$');
    for(int x=0;x<sizeof(buffer)-1;x++)
    {
        serial_read(&buffer[x], 1);
        if(buffer[x]==',')
        {
            buffer[x] = 0;
            break;
        }
    }
    return buffer;
}

enum wordTokens {NO_WORD = -1,GPGGA,GNGSA,GPGSV,GPBOD,GPDBT,GPDCN, GPRMC, GPBWC};

RAGEL is pretty nice in that it uses some special symbols to identify the RAGEL bits, so the generator simply passes all input straight to the output until it finds the RAGEL identifiers and then it gets to work.  This architecture allows you to simply insert RAGEL code directly into your C (or other languages) and add the state machines in place.
    The first identifiers we find are the declaration of a state machine (foo seemed traditional).  You can define more than one machine so it is important to provide a hint to the generator about which one you want to define.
    After the machine definition, I specified the location to place all the state machine data tables.  There are multiple ways RAGEL can produce a state machine.  If the machine requires data, it will go at the write data block.
 1  %% machine foo;
 2  %% write data;
 3
 4  enum wordTokens NMEA_findToken(char *word)
 5  {
 6      const char *p = word;
 7      const char *pe = word + strlen(word);
 8      int cs;
 9      enum wordTokens returnValue = NO_WORD;
10
11      %%{
12          action gpgga { returnValue = GPGGA; fbreak; }
13          action gngsa { returnValue = GNGSA; fbreak; }
14          action gpgsv { returnValue = GPGSV; fbreak; }
15          action gpbod { returnValue = GPBOD; fbreak; }
16          action gpdbt { returnValue = GPDBT; fbreak; }
17          action gpdcn { returnValue = GPDCN; fbreak; }
18          action gpbwc { returnValue = GPBWC; fbreak; }
19          action gprmc { returnValue = GPRMC; fbreak; }
20
21          gpgga = ('GPGGA') @gpgga;
22          gngsa = ('GNGSA') @gngsa;
23          gpgsv = ('GPGSV') @gpgsv;
24          gpbod = ('GPBOD') @gpbod;
25          gpdbt = ('GPDBT') @gpdbt;
26          gpdcn = ('GPDCN') @gpdcn;
27          gpbwc = ('GPBWC') @gpbwc;
28          gprmc = ('GPRMC') @gprmc;
29
30          main := ( gpgga | gngsa | gpgsv | gpbod | gpdbt | gpdcn | gpbwc | gprmc )*;
31
32          write init;
33          write exec noend;
34      }%%
35      return returnValue;
36  }

Next is the C function definition starting at line 4 above.  I am keeping the original NMEA_findToken function as before; no sense in changing what is working.  At the beginning of the function is some RAGEL housekeeping defining the range of text to process.  In this case the variable p points to the beginning of the text while pe points to the end of the text.  The variable cs is a housekeeping variable, and the token is the return value, so initialize it to NO_WORD.  The next bit is some RAGEL code.  The %%{ opens a block of RAGEL much like /* opens a comment block.  The first bit of RAGEL defines all of the actions that will be triggered when the strings are identified.  Honestly, these actions could be anything, and I held back simply to keep the function identical to the original.  It would be easy to fully define the NMEA data formats and fully decode each NMEA sentence.  These actions simply identify the return token and break out of the function.
If we had not already sliced up the tokens, we would want to store our position in the input string so we could return to the same spot.  It is also possible to feed the state machine one character at a time, like in an interrupt service routine.
    After the actions, line 21 defines the search rules and the action to execute when a rule is matched.  These rules are simply regular expressions (HA! REGEX and SIMPLE in the same sentence).  For this example, the expressions are simply the strings.  But if your regular expressions were more complex, you could go crazy.  
    Finally, the machine is defined as matching any of the rules.  The initialization and the actual execute code are placed and the RAGEL is complete.
    Whew!  Let us look at what happened when we compile it.
One of my favorite programming tools is graphviz, specifically DOT.  It turns out that RAGEL can produce a dot file documenting the produced state machine.  Let's try it out.
bash> ragel -C -L -V example3.rl -o example3.dot
bash> dot example3.dot -T png -O
    It would be nicer if all the numbers on the arrows were the characters rather than the ASCII codes but I suppose I am nitpicking.  Now you see why I named my actions after the sentences.  The return arrow clearly shows which action is being executed when the words are found.  It also shows that the action triggers when the last letter is found rather than a trailing character.  I suppose if you had the word gpgga2, then you would need to add some additional REGEX magic.  The dotted arrow IN leading to state 17 refers to any other transition not listed.  That indicates that any out-of-place letter simply goes back to 17 without triggering an ACTION.  It is possible to define a “SYNTAX ERROR” action to cover this case but I did not care.  For my needs, failing quietly is a good choice.
    This all looks pretty good so far. What does the C look like?
/* #line 1 "example3.rl" */
// compile into C with ragel
// ragel -C -L -G2 example3.rl -o example3.c
//
#include <string.h>
#include <stdio.h>
#include "serial_port.h"

char * NMEA_getWord(void)
{
    static char buffer[7];
    memset(buffer, 0, sizeof(buffer));
    do {
        serial_read(buffer, 1);
    } while (buffer[0] != '$');
    for (int x = 0; x < sizeof(buffer) - 1; x++) {
        serial_read(&buffer[x], 1);
        if (buffer[x] == ',') {
            buffer[x] = 0;
            break;
        }
    }
    return buffer;
}

enum wordTokens { NO_WORD = -1, GPGGA, GNGSA, GPGSV, GPBOD, GPDBT, GPDCN, GPRMC, GPBWC };

/* #line 34 "example3.rl" */
/* #line 39 "example3.c" */
static const int foo_start = 17;
static const int foo_first_final = 17;
static const int foo_error = 0;
static const int foo_en_main = 17;

/* #line 35 "example3.rl" */
enum wordTokens NMEA_findToken(char *word)
{
    const char *p = word;
    const char *pe = word + strlen(word);
    int cs;
    enum wordTokens returnValue = NO_WORD;

/* #line 57 "example3.c" */
    {
        cs = foo_start;
    }

/* #line 62 "example3.c" */
    {
        switch (cs) {
        tr5:
            /* #line 45 "example3.rl" */
            { returnValue = GNGSA; { p++; cs = 17; goto _out; } }
            goto st17;
        tr12:
            /* #line 47 "example3.rl" */
            { returnValue = GPBOD; { p++; cs = 17; goto _out; } }
            goto st17;
        tr13:
            /* #line 50 "example3.rl" */
            { returnValue = GPBWC; { p++; cs = 17; goto _out; } }
            goto st17;
        tr16:
            /* #line 48 "example3.rl" */
            { returnValue = GPDBT; { p++; cs = 17; goto _out; } }
            goto st17;
        tr17:
            /* #line 49 "example3.rl" */
            { returnValue = GPDCN; { p++; cs = 17; goto _out; } }
            goto st17;
        tr20:
            /* #line 44 "example3.rl" */
            { returnValue = GPGGA; { p++; cs = 17; goto _out; } }
            goto st17;
        tr21:
            /* #line 46 "example3.rl" */
            { returnValue = GPGSV; { p++; cs = 17; goto _out; } }
            goto st17;
        tr23:
            /* #line 51 "example3.rl" */
            { returnValue = GPRMC; { p++; cs = 17; goto _out; } }
            goto st17;
        st17:
            p += 1;
        case 17:
            /* #line 101 "example3.c" */
            if ((*p) == 71) goto st1;
            goto st0;
        st0:
            cs = 0;
            goto _out;
        st1:
            p += 1;
        case 1:
            switch ((*p)) {
                case 78: goto st2;
                case 80: goto st5;
            }
            goto st0;
        st2:
            p += 1;
        case 2:
            if ((*p) == 71) goto st3;
            goto st0;
        st3:
            p += 1;
        case 3:
            if ((*p) == 83) goto st4;
            goto st0;
        st4:
            p += 1;
        case 4:
            if ((*p) == 65) goto tr5;
            goto st0;
        st5:
            p += 1;
        case 5:
            switch ((*p)) {
                case 66: goto st6;
                case 68: goto st9;
                case 71: goto st12;
                case 82: goto st15;
            }
            goto st0;
        st6:
            p += 1;
        case 6:
            switch ((*p)) {
                case 79: goto st7;
                case 87: goto st8;
            }
            goto st0;
        st7:
            p += 1;
        case 7:
            if ((*p) == 68) goto tr12;
            goto st0;
        st8:
            p += 1;
        case 8:
            if ((*p) == 67) goto tr13;
            goto st0;
        st9:
            p += 1;
        case 9:
            switch ((*p)) {
                case 66: goto st10;
                case 67: goto st11;
            }
            goto st0;
        st10:
            p += 1;
        case 10:
            if ((*p) == 84) goto tr16;
            goto st0;
        st11:
            p += 1;
        case 11:
            if ((*p) == 78) goto tr17;
            goto st0;
        st12:
            p += 1;
        case 12:
            switch ((*p)) {
                case 71: goto st13;
                case 83: goto st14;
            }
            goto st0;
        st13:
            p += 1;
        case 13:
            if ((*p) == 65) goto tr20;
            goto st0;
        st14:
            p += 1;
        case 14:
            if ((*p) == 86) goto tr21;
            goto st0;
        st15:
            p += 1;
        case 15:
            if ((*p) == 77) goto st16;
            goto st0;
        st16:
            p += 1;
        case 16:
            if ((*p) == 67) goto tr23;
            goto st0;
        }
    _out: {}
    }

/* #line 66 "example3.rl" */
    return returnValue;
}

int main(int argc, char **argv)
{
    if (serial_open() > 0) {
        for (int x = 0; x < 24; x++) {
            char *w = NMEA_getWord();
            enum wordTokens t = NMEA_findToken(w);
            printf("word %s,", w);
            if (t >= 0)
                printf("token %d\n", t);
            else
                printf("no match\n");
        }
    }
    serial_close();
    return 0;
}

And this is why we use a code generator.  The code does not look too terrible; I could debug it if I thought there were some bugs, and it does follow the state chart in a perfectly readable way.  BUT, I hope you are not one of those programmers who finds GOTO against their religion.  (Though Edsger Dijkstra did allow an exception for low-level code when he wrote EWD215 https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/EWD215.html )
    So how does this perform?
Word    STRNCMP   IF-ELSE   RAGEL -G2
GNGSA   399       121       280
GPGSV   585       123       304
GLGSV   724       59        225
GPRMC   899       83        299
GPGGA   283       113       298

And for the code size, MPLAB XC8 in Free mode on the PIC16F1939 shows 2552 bytes of program and 1024 bytes of data.  Don’t forget that printf is included.  But this is comparable to the other examples because I am only changing the one function.
    So our fancy code generator is usually faster than the brute force approach, definitely slower than the hand-crafted approach and is fairly easy to modify.  I think I would use the string compare until I got a few more strings and then make the leap to RAGEL.  Once I was committed to RAGEL, I think I would see how much of the string processing I could do with RAGEL just to speed the development cycles and be prepared for that One Last Feature from Marketing.
    Next week we will look at another code generator and a completely different way to manage this task.
    Good Luck.
  13. N9WXU
    While we are on the topic of the wisdom of Dijkstra let us not forget what he said about computer architecture.
    I refer you to EWD 32, Paragraph 5.  http://www.cs.utexas.edu/users/EWD/transcriptions/EWD00xx/EWD32.html 
The first time I read that I almost fell on the floor.  It is so true.  Most of the time we are awed by how the CPU architects have clearly anticipated our needs and built the correct facilities into the MCUs.  But occasionally you get a strange oversight, like the TRMT bit without an interrupt in the EUSART on PICmcu's.  Or the advanced math options on the ADC with Computation, which work on every conversion and don't respect the channel, i.e. you are limited to operating on a single ADC channel if you use the advanced features.
We all need a good laugh, so post your favorite "features" below. 
  14. N9WXU
The PICmcu is not known for being fast or supporting large memories, but it does have one feature that can significantly simplify developing your applications.  That feature is the lack of a hardware parameter stack.
    Sacrilege you say!  But wait...
    The lack of this important feature has caused the compiler team to develop an incredibly useful alternative that I would argue is better in nearly every way than an actual stack.
    For those of you who are now wondering what I am talking about, let's take a quick diversion into stacks.
A stack is simply a data structure that arranges a number of data elements so that the last thing you inserted is the first thing you get back.  Think of a stack of plates: you add plates to the top of the stack and remove them in reverse order from the top of the stack.  This is important for a few reasons.  First, imagine your code was interrupted by a hardware interrupt.  The current address of your code is pushed onto the stack, the interrupt runs, and your address is popped from the stack so the interrupt can return to where it interrupted you.  This is a handy feature, and for "free" you can handle any number of interruptions.  In fact, a function can call itself, and so long as there is sufficient room on the stack, everything will be sorted out on the return.
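The plate analogy fits in a few lines of C. This toy sketch (mine, not anything from a compiler runtime) shows the last-in, first-out behavior that makes nested calls and interrupts unwind correctly:

```c
#define STACK_SIZE 8

struct stack { int data[STACK_SIZE]; int top; };

void push(struct stack *s, int v) { s->data[s->top++] = v; }
int  pop (struct stack *s)        { return s->data[--s->top]; }

/* two nested "calls": the second one pushed is the first one back */
int demo_lifo(void)
{
    struct stack s = { .top = 0 };
    push(&s, 1);                /* outer return address    */
    push(&s, 2);                /* nested call goes on top */
    int first  = pop(&s);       /* 2: last in, first out   */
    int second = pop(&s);       /* 1: unwound in order     */
    return first * 10 + second; /* 21 */
}
```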
Now, the PICmcu's have hardware stacks for the function calls and returns.  They simply don't have any hardware support for anything else.  If the hardware stack is so useful for keeping track of return addresses, it would also be useful for all the parameters your function will need and the values your functions will return.  This parameter stack is an important feature of most languages, most especially the C language.  The AVR, ARM, 68000, IA86, Z80, and VAX11 all have instruction-level support for implementing a parameter stack for each function.  I have written millions of lines of C code for the PIC16, so how does it do its job without this important part of the language, and why do I think this missing feature is such a strength of the CPU?
The secret to the ability of XC8 to produce reasonably efficient C code for the PIC16 and PIC18 without a stack lies in the "compiled stack" feature.  This feature analyzes the call tree of your program and determines what the stack would look like at any point in the program.  Functions that could be in scope at the same time (consider a multiply function called from "main" and from the interrupt) are duplicated so there are no parameters that need to be in two places at the same time.  Any recursive functions are detected and the user is alerted.  Finally, the complete stack is converted to absolute addresses and mapped into the physical memory of the CPU.  Then all the instructions are fixed up with those absolute addresses.  This big-picture view of the program also allows the compiler to move parameters around to minimize banking (banking is extra instructions required to reach addresses further than 128 bytes away), and the finished program is ready to run in your application.  The amazing thing is that the final memory report is the complete memory requirement of the program INCLUDING THE LOCAL VARIABLES.  This is a shocker.  In fact, when I was looking at the Arduino forums, I would frequently encounter users who added one more line of code and suddenly their application stopped working.  They were told to go buy the bigger CPU.  Imagine if your compiler could tell you whether the program would fit, including all the stack usage.  This is a game changer and I would love to see the industry apply this technology across all CPUs.
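The call-tree analysis can be modeled in miniature. This toy sketch (my own, host-side C, assuming an acyclic call graph) walks a static call graph and returns the worst-case bytes needed, the same kind of number the XC8 memory report presents up front:

```c
#include <stddef.h>

#define MAX_CALLEES 4

/* one node of the static call graph */
struct func {
    size_t locals;                          /* bytes of params + locals */
    const struct func *callee[MAX_CALLEES]; /* NULL-terminated list     */
};

/* Worst case if the deepest call path is always taken.  With no
   recursion this terminates, so the answer is known before run time. */
size_t worst_case(const struct func *f)
{
    size_t deepest = 0;
    for (int i = 0; i < MAX_CALLEES && f->callee[i] != NULL; i++) {
        size_t d = worst_case(f->callee[i]);
        if (d > deepest)
            deepest = d;
    }
    return f->locals + deepest;
}
```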
There is no reason why any CPU would not be able to operate with a compiled stack.  In fact, most CPUs are capable of operating with both kinds of stacks at the same time.  The biggest argument against this sort of operation really comes from large-project management.  Consider: you develop a library for some special feature, perhaps GPS token parsing.  Now you want to include this library in all of your applications.  You MUST NOT recompile this code because it has passed all of your certification testing and is known-good "validated" code.  (Remember, the compiler is permitted to produce a different binary on each run.)  If you cannot recompile the code, then on some architectures you cannot change the addresses, as there could be side effects (banking on a PIC16).  If your code only relies upon a stack, then no recompiling is ever required and the linker's task is dramatically simplified.
    Anyway, I must go back into the FreeRTOS world and continue my quest to find the thread with the overflowing stack.  Life would be simpler with static analysis tools for the stack.
    Until next time.
  15. N9WXU
    When we left off we had just built a test framework that allowed us to quickly and easily try out different ways to identify NMEA keywords.  The first method shown was a brute force string compare search.  For this week, I promised to write about an if-else decoder.  The brute force search was all about applying computing resources to solve the problem.  This approach is all about applying human resources to make life easy on the computer.  So this solution will suck.  Let us press on.
    The big problem with the string compare method is each time we discard a word, we start from scratch on the second word.  Consider that most NMEA strings from a GPS start with the letters GP.  It would be nice to discard every word that does not begin with a G and only look at each letter once.  Consider this state machine:

    I did simplify the drawing… every invalid letter will transfer back to state 1 but that would clutter the picture.  This would require the smallest number of searches to find the words.  So one way to build this is to write a big IF-ELSE construct that covers all the choices.  This will step through the letters and end up with a decision on what keyword was found.
    enum wordTokens NMEA_findToken(char *word)
    {
        enum wordTokens returnValue = NO_WORD;
        char c = *word++;
        if(c == 'G')
        {
            c = *word++;
            if(c == 'P')
            {
                c = *word++;
                if(c == 'G')                // gpGga or gpGsv
                {
                    c = *word++;
                    if(c == 'G')            // gpgGa
                    {
                        c = *word++;
                        if(c == 'A')
                        {
                            if(*word == 0)  // found GPGGA
                            {
                                returnValue = GPGGA;
                            }
                        }
                    }
                    else if(c == 'S')       // gpgSv
                    {
                        c = *word++;
                        if(c == 'V')
                        {
                            if(*word == 0)  // found GPGSV
                            {
                                returnValue = GPGSV;
                            }
                        }
                    }
                }
                else if(c == 'B')           // gpBod
                {
                    c = *word++;
                    if(c == 'O')
                    {
                        c = *word++;
                        if(c == 'D')
                        {
                            if(*word == 0)
                            {
                                returnValue = GPBOD;
                            }
                        }
                    }
                }
                else if(c == 'D')           // gpDcn or gpDbt
                {
                    c = *word++;
                    if(c == 'C')
                    {
                        c = *word++;
                        if(c == 'N')
                        {
                            if(*word == 0)
                            {
                                returnValue = GPDCN;
                            }
                        }
                    }
                    else if(c == 'B')
                    {
                        c = *word++;
                        if(c == 'T')
                        {
                            if(*word == 0)
                            {
                                returnValue = GPDBT;
                            }
                        }
                    }
                }
            }
            else if(c == 'N')               // gNgsa
            {
                c = *word++;
                if(c == 'G')
                {
                    c = *word++;
                    if(c == 'S')
                    {
                        c = *word++;
                        if(c == 'A')
                        {
                            if(*word == 0)
                            {
                                returnValue = GNGSA;
                            }
                        }
                    }
                }
            }
        }
        return returnValue;
    }
    And it is just that easy.  This is fast, small, and has only one serious issue in my opinion: I hope you are very happy with the words chosen, because making changes is expensive in programmer time.  This example has only six 5-letter words and is 100 lines of code.  They are easy lines, but almost all of them will require rewriting if you change even one word.
    Here are the stats so you can compare with last week's string compare.

    Word     STRCMP   IF-ELSE
    GNGSA      399       121
    GPGSV      585       123
    GLGSV      724        59
    GPRMC      899        83
    GPGGA      283       113
    These are the CPU cycles required on a PIC16F1939.  You can verify them in the MPLAB X simulator.
    That is all for now.  Stay tuned, next time we will show a nice way to manage this maintenance problem.
    Good Luck
  16. N9WXU
    This is the first of a 5 part article where I will explore some different ways to tokenize keywords.  This is a simple and common task that seems to crop up when you least expect it.  We have all probably done some variation of the brute force approach in this first posting, but the posts that follow should prove interesting.  Here is the sequence:
    Part 1 : STRCMP Brute Force and the framework
    Part 2 : IF-ELSE for speed
    Part 3 : Automating the IF-ELSE for maintenance
    Part 4 : Perfect Hash Maps
    Part 5 : Tries
    Feel free to skip around once they are all published if there is an interesting topic that you want to learn more about.  In this first posting I will be building a complete testable example so we can do science rather than expounding upon our opinions.
    About The Test Data
    I am currently working on a GPS project so I am being subjected to NMEA sentences.  (https://en.wikipedia.org/wiki/NMEA_0183).  These sentences are quite easy to follow and you generally get a burst of text data every second from your GPS module.  The challenge is to quickly & efficiently extract this information while keeping your embedded system performing all the tasks required and do this with the smallest possible program.
    As far as stream data is concerned, NMEA sentences are remarkably well behaved.  Here are a few sample messages from my new Bad Elf GPS Pro +.
    $GNGSA,A,3,32,14,01,20,18,08,31,11,25,10,27,,1.19,0.61,1.02*16
    $GNGSA,A,3,65,66,88,75,81,82,76,67,,,,,1.19,0.61,1.02*13
    $GPGSV,4,1,14,14,73,336,25,32,60,023,31,18,52,294,20,51,51,171,30*7A
    $GPGSV,4,2,14,31,47,171,34,10,44,087,33,11,30,294,26,01,26,316,15*78
    $GPGSV,4,3,14,20,20,108,34,22,19,304,16,08,13,247,21,27,08,218,18*7A

    The pattern is:
    $<IDENTIFIER>,<COMMA SEPARATED DATA>*<CHECKSUM>

    The work of separating out the parts of a sentence is called parsing.  We will not be fully parsing the NMEA sentence, but if we wanted to, there are many useful tools like BISON that can automate this job.  Each of the automated tools has a minimum price of entry in terms of code size and CPU cycles.  The brute force approach has the benefit of simplicity, maintainability and a low initial size/speed cost.  So let us press on and establish a performance baseline.
    Our Test Program
    The program we are about to construct works like this:

    Like all instructors I will concentrate on the necessary parts of the lesson and leave all the processing as an exercise for the student.  Once you have identified the sentence, it is easy to apply a suitable sscanf or strtok to process each sentence individually.  The actual part we are going to measure is Identify The Keyword, but Find a Keyword is a necessary first step, and a little bit of wrapper/measurement code is needed to complete the experiment.
    Parsing out our Keywords
    To get the identifier all I need to do is discard characters until I see a '$'.  Then collect characters until I see a ','.  That is easy code to write so here is a quick version:
    char * NMEA_getWord(void)
    {
        static char buffer[7];

        memset(buffer,0,sizeof(buffer));
        do
        {
            serial_read(buffer,1);
        } while(buffer[0] != '$');
        for(int x=0;x<sizeof(buffer)-1;x++)
        {
            serial_read(&buffer[x], 1);
            if(buffer[x]==',')
            {
                buffer[x] = 0;
                break;
            }
        }
        return buffer;
    }
    I did not try to collect the entire NMEA sentence, validate the checksum, or handle line noise, so this is not a complete application.  It does fill a buffer with the NMEA identifier quickly and legibly.  One more note: the buffer is 7 bytes large because my Bad Elf GPS sends PELFID as the FIRST sentence upon connection.  If I wanted to decode this sentence, the buffer needs another byte.
    This simple function will be used for all the testing in this series of articles so it is worth understanding.
    So here is a description of each line of code:
    1. The function takes no parameters and returns a pointer to a NMEA sentence.  It will block until it has such a sentence, so it assumes a continuous data stream.
    2. There is a static buffer of 7 characters.  A pointer to this buffer will be sent to the caller with every iteration.  The buffer is 7 characters to allow for the 6 character message from my GPS and a null terminator.
    3. Because this is a static buffer we cannot initialize it in the definition.  So every time this function is called, we clear out the buffer to 0's.
    4. The do/while loop reads single characters from the serial port (provided by the user) until a '$' character is received.  The getchar() pattern may have been a better abstraction for the serial port but I was originally planning to receive 5 bytes after the $ with a single call.  The 6 byte PELFID changed my mind.
    5. The for loop simply reads UP-TO sizeof(buffer)-1 bytes unless the received byte is a ','.  I prefer to use sizeof whenever possible to make the compiler do the math to determine how big arrays are.
    6. If a ',' is received, we convert it to a 0 and break.
    7. Lastly we return the buffer pointer.
    The next piece of housekeeping we should discuss is the main function.  This is a simple test application that will provide the framework for evaluating ways of detecting text strings and timing their impact.
    int main(int argc, char **argv)
    {
        if(serial_open()>0)
        {
            for(int x = 0; x < 24; x ++)
            {
                char *w = NMEA_getWord();
                enum wordTokens t = NMEA_findToken(w);
                printf("word %s,",w);
                if(t >= 0)
                    printf("token %d\n",t);
                else
                    printf("no match\n");
            }
        }
        serial_close();
        return 0;
    }
    Like before, I will give a quick line-by-line explanation
    1. This is the standard main declaration for ALL C programs.  For embedded systems we often ignore the parameters and return values, but I like to do it for test programs simply because I generally prototype as command line programs on my Mac.
    2. serial_open is a simple abstraction that is easy to adapt to any system.  It will prepare the serial input from my GPS.  The version attached to this project will work on a Linux or Mac and open a specific /dev/cu.<device> that is hardcoded.  This is appropriate for such a test application where knowledgeable users can easily edit the code.  Obviously a robust version would pass the serial device in as a command line parameter and an embedded system would just activate the hardware.
    3. This program will decode 24 keywords.  That makes a simple limit to run in the simulator or on my Mac and is sufficient for these tests.
    4. Now we run the test.  First, get a keyword.  Second, find the keyword and return the token.  Third, display some output.  Repeat.
    5. Finally we close the serial port.  Often I will skip this step, but on a PC program leaving the port open will cause an error on the second run of the program until the port times out and closes automatically.  By closing I save time for the user.  For an embedded system, the serial_close function is usually left empty.
    6. For the simulator I would add a while(1) line before the return 0.  This will freeze the simulation and allow us to study the results.  Otherwise, XC8 may simply restart our application from the beginning and fill up my log file with repeat runs.
    This covers all the code except for the bit that we are interested in.  For these types of experiments I like to be as agile as possible.  This is about failing fast so we can discard bad ideas as quickly as possible.  We should never invest so much of our energy in a bad idea that we emotionally feel obligated to "make it work at all costs".  I have seen too many bad ideas last for too long due to this human failing.
    So let's briefly talk about findToken.
    The function NMEA_findToken takes a pointer to a possible keyword (as identified by NMEA_getWord) and returns an integer (enum really) that identifies it as a number to the rest of the code.  I will leave as an exercise to the student the work of doing something useful with a token id.  In the future we can replace this function with different techniques and so long as the interface stays constant, we can simply insert the code and measure the performance.
    For today's test, we will simply apply a brute force strategy.  We have a list of words:
    const char *wordList[] = {"GPGGA","GNGSA","GPGSV","GPBOD","GPDBT","GPDCN"};

    and a matching enumeration:
    enum wordTokens {NO_WORD = -1,GPGGA,GNGSA,GPGSV,GPBOD,GPDBT,GPDCN};

    All we need to do is compare our test word to each word in the list until a match is found.  The C standard library provides a useful bounded string compare, strncmp, but the XC8 library does not include it, so we will use its sibling strcmp and press on.  We will just be very careful not to lose our terminating character.
    enum wordTokens NMEA_findToken(char *word)
    {
        enum wordTokens retValue = NO_WORD;
        for(int x=0; x < sizeof(wordList)/sizeof(*wordList);x++)
        {
            if(strcmp(word,wordList[x])==0)
            {
                retValue = x;
                break;
            }
        }
        return retValue;
    }

    Here is the blow-by-blow:
    1. This function will take a pointer to a string and return a token.
    2. Initialize the return value to the NO_WORD value.
    3. Step through the word list.
    4. Test each word looking for a match.
    5. On a match, set the return value and leave the loop.
    6. Return the value to the caller.
    I ran it on my Mac and it runs fine identifying GPS strings from my GPS.  I also ran it in the PIC16F1939 simulator so I could measure the CPU cycles and code size for a baseline.
    For the testing I am looking for 6 words which were carefully chosen to have some missing characters, overlapping characters, and matching words at the beginning and end of the list.
    Before I show the measurements we should make this scientific and formulate a hypothesis.
    My test sequence is GNGSA, GPGSV, GLGSV, GPRMC, GPGGA
    The words in the list are in this order : GPGGA, GNGSA, GPGSV, GPBOD, GPDBT, GPDCN
    So I expect the first word to fail at the P in GPGGA, then pass GNGSA.  This will be a total of 7 tests to execute.
    The second word will fail GPGGA on the third test, fail GNGSA on the second test and then pass for a total of 10 tests to execute.
    GLGSV will require 12 tests to fail.  In every word it will fail the second test.
    GPRMC will require 14 tests to fail
    GPGGA will require 5 tests to pass.
    The maximum number of tests required to pass or fail is equal to the number of characters in the entire list.  That assumes the list is constructed so the only differences are in the last letter, so a possible word must be compared against every letter of every word in the list before determining a match or a fail.  For our little test that is not possible, but with a different list it would take 30 tests.
    So I expect it to be O(n) where n is the total character count in the word list.
    Here is the actual data for executing NMEA_findToken in cpu instruction cycles on a PIC16F1939.  The data was taken using the MPLAB X simulator.
    Word     STRCMP Cycles   Notes
    GNGSA         399        Second word in the word list
    GPGSV         585        Third word in the word list
    GLGSV         724        Not in word list
    GPRMC         899        Not in word list
    GPGGA         283        First word in the word list

    The program did include printf and some buffers so the program did end up large at 2153 bytes of flash memory.
    If you decided that this was the strategy for you, then you can make some improvements simply by carefully ordering the list so the most frequent words are first.  But after looking things over, it would be nice if we could do our search where we did not need to repeat letters.  i.e. all the words start with G, so if the first letter is not a G we can stop the entire process.  We will explore one way to accomplish this optimization next time.
    For now, take a look at the attached files, and I welcome any suggestions for improvements to this strategy.
    Good Luck
  17. N9WXU
    Assembly language may no longer be the mainstream way to write code for embedded systems; however, it is the best way to learn how a specific CPU works without actually building one.  Assembly language is simply the raw instruction set of a specific CPU broken into easy to remember mnemonics with a very basic syntax.  This gives you full control of everything the CPU does without any translation provided by a compiler.  Sometimes this is the only reasonable way to do something that cannot be represented by a higher level language.  Here is an example from a project I was working on today.
    Today I wanted to create a 128-bit integer (16 bytes).  That means I will need to add, subtract, multiply, etc. on my new 128-bit data type.  I was writing for a 32-bit CPU so this would require 4 32-bit values concatenated together to form the 128-bit value.  Consider the trivial problem of adding two of these numbers together with the following imaginary code.
    int128_t foo = 432123421234;
    int128_t bar = 9873827438282;
    int128_t sum = foo + bar;

    But my 32-bit CPU does not understand int128_t so I must fake it.  How about this idea.
    uint32_t foo[] = {0x00112233, 0x44556677, 0x8899AABB, 0xCCDDEEFF};
    uint32_t bar[] = {0xFFEEDDCC, 0xBBAA9988, 0x77665544, 0x33221100};
    uint32_t sum[4];

    sum[0] = foo[0] + bar[0];
    sum[1] = foo[1] + bar[1];
    sum[2] = foo[2] + bar[2];
    sum[3] = foo[3] + bar[3];

    But back in grade school I learned about the 10's place and how I needed to carry a 1 when the sum of the one's place exceeded 10.  It is quite possible that foo[0] + bar[0] will exceed the maximum value that can be stored in a uint32_t, so there will be a carry out of that add.  How do I add the carry into the next digit?  In C I would need to rely upon some math tricks to determine if there was a carry.  But the hardware already has a carry flag and there are instructions to use it.  We could easily incorporate some assembly language and do this function in the most efficient way possible.
    So enough rambling.  Let us see some code.  First, we need to configure MPLAB to create an ASM project.
    Create a project in the normal way, but when you get to select a compiler you will select MPASM.

    Now you are ready to get the basic source file up and running.  Here is a template to cut/paste.
    #include "p16f18446.inc"

    ; CONFIG1
    ; __config 0xFFFF
     __CONFIG _CONFIG1, _FEXTOSC_ECH & _RSTOSC_EXT1X & _CLKOUTEN_OFF & _CSWEN_ON & _FCMEN_ON

    ; CONFIG2
    ; __config 0xFFFF
     __CONFIG _CONFIG2, _MCLRE_ON & _PWRTS_OFF & _LPBOREN_OFF & _BOREN_ON & _BORV_LO & _ZCD_OFF & _PPS1WAY_ON & _STVREN_ON

    ; CONFIG3
    ; __config 0xFF9F
     __CONFIG _CONFIG3, _WDTCPS_WDTCPS_31 & _WDTE_OFF & _WDTCWS_WDTCWS_7 & _WDTCCS_SC

    ; CONFIG4
    ; __config 0xFFFF
     __CONFIG _CONFIG4, _BBSIZE_BB512 & _BBEN_OFF & _SAFEN_OFF & _WRTAPP_OFF & _WRTB_OFF & _WRTC_OFF & _WRTD_OFF & _WRTSAF_OFF & _LVP_ON

    ; CONFIG5
    ; __config 0xFFFF
     __CONFIG _CONFIG5, _CP_OFF

    GPR_VAR   UDATA
    Variable  RES 1

    SHR_VAR   UDATA_SHR
    Variable2 RES 1

    ;*******************************************************************************
    ; Reset Vector
    ;*******************************************************************************
    RES_VECT  CODE    0x0000        ; processor reset vector
              pagesel START         ; the location of START could go beyond 2k
              GOTO    START         ; go to beginning of program

    ISR       CODE    0x0004        ; interrupt vector location
              ; add Interrupt code here
              RETFIE

    ;*******************************************************************************
    ; MAIN PROGRAM
    ;*******************************************************************************
    MAIN_PROG CODE                  ; let linker place main program

    START
              ; initialize the CPU
    LOOP
              ; do the work
              GOTO    LOOP

              END

    The first thing you will notice is the formatting is very different than C.  In assembly language programs the first column in your file is for a label, the second column is for instructions and the third column is for the parameters for the instructions.  In this code RES_VECT, ISR, MAIN_PROG, START and LOOP are all labels.  In fact, Variable and Variable2 are also simply labels.  The keyword CODE tells the assembler to place code at the address following the keyword.  So the RES_VECT (reset vector) is at address zero.  We informed the assembler to place the instructions pagesel and GOTO at address 0.
Now when the CPU comes out of reset it will be at the reset vector (address 0) and start executing these instructions.  Pagesel is a macro that creates a MOVLP instruction with the bits <15:11> of the address of START.  Goto is a CPU instruction for an unconditional branch that will direct the program to the address provided.  The original PIC16 had 35 instructions plus another 50 or so special keywords for the assembler.  The PIC16F1xxx family (like the PIC16F18446) raises that number to about 49 instructions.  You can find the instructions in the instruction set portion of the data sheet documented like this:

    The documentation shows the syntax, the valid range of each operand, the status bits that are affected and the work performed by the instruction.  In order to make full use of this information, you need one more piece of information: the Programmer's Model.  Even C has a programmer's model but it does not always match the underlying CPU.  In ASM programming the programmer's model is even more critical.  You can also find this information in the data sheet.  In the case of the PIC16F18446 it can be found in chapter 7, labeled Memory Organization.  This chapter is required reading for any aspiring ASM programmers.
    Before I wrap up we shall modify the program template above to have a real program.
    START
          banksel TRISA
          clrf    TRISA
          banksel LATA
    loop
          bsf     LATA,2
          nop
          bcf     LATA,2
          GOTO    loop          ; loop forever
          END

    This program changes to the memory bank that contains TRISA and clears TRISA, making all of PORT A an output.
    Next it changes to the memory bank that contains the LATCH register for PORT A and enters the loop.
    BSF is the mnemonic for Bit Set File and it allows us to set bit 2 of the LATA register.  NOP is for No OPeration and just lets the bit set settle.  BCF is for Bit Clear File and allows us to clear bit 2, and finally we have a branch to loop to do this all over again.  Because this is in assembly we can easily count up the instruction cycles for each instruction and determine how fast this will run.  Here is the neat thing about PICs: EVERY instruction that does not branch takes 1 instruction cycle (4 clock cycles) to execute.  So this loop is 5 cycles long (the GOTO branch takes 2).  We can easily add instructions if we need to produce EXACTLY a specific waveform.
    I hope this has provided some basic getting started information for assembly language programming.  It can be rewarding and will definitely provide a deeper understanding on how these machines work.
    Good Luck