 

N9WXU

Member
  • Content Count

    92
  • Joined

  • Last visited

  • Days Won

    32

N9WXU last won the day on March 18

N9WXU had the most liked content!

Community Reputation

49 Excellent

1 Follower

About N9WXU

  • Rank
    Contributor

Recent Profile Visitors

744 profile views
  1. N9WXU

    Home Offices!

    Time to share our home offices. Here is where I am spending my time lately.
  2. I am leaning towards adding a current transformer to the smart switch. Then I have positive feedback that the regulator is also commanding the motor to be on and I can more positively put a time limit on the motor run.
  3. A few weeks ago, I installed shop air in my garage. I was pretty proud when it held 150 psi all night. But of course I had not quite tightened a connection, and at 2 AM (or so my daughter tells me) there was a loud bang followed by a steady "compressor" noise. I did not notice until the next morning, when I wondered why there was a noise coming from the garage. That compressor was pretty hot after running for 6 hours straight.

    Of course this could be prevented by turning the compressor off each night. But I write embedded software for a living and lately I have been deep into IoT projects. Naturally, this was an ideal chance to do something about my dumb compressor.

    Ingredients

    First, I needed a way to switch the compressor on and off remotely. These Sonoff switches are almost perfect. On the plus side, they have an ESP8266 inside, so I can run Tasmota, which is a generic home automation / IoT firmware for all things 8266. On the down side, they are only good for 10A. So I added a 120VAC 2-pole relay good for 30A. The compressor has a 16.6A motor draw, so some overkill seems appropriate. I reflashed the Sonoff Basic with Tasmota and installed everything inside a metal electrical box. When I visit the switch's web page, I can turn the compressor on and off from my phone. Fantastic!

    As long as I had everything opened up, I went ahead and added 2 pressure sensors, one to the left and one to the right of the primary pressure regulator. The left side sensor goes to the compressor and lets me know what it is doing. I am now tempted to remove the mechanical hysteretic controller on the compressor and simply use the Sonoff switch and some electronic pressure sensing to do the same thing. We shall see.

    Everything is now in place to ensure the compressor can be automatically turned off, or have a maximum run limit. The only thing left is software! Good Luck.
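The missing software piece could start from something like the sketch below. It is only a sketch under stated assumptions: the names (`compressor_t`, `compressor_tick`), the one-second tick, and all three thresholds are hypothetical, and nothing here is Tasmota code. It combines the two ideas from the post: electronic hysteresis on the tank-side pressure and a maximum run time that latches the relay off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits; tune them for the real compressor. */
#define MAX_RUN_SECONDS (15u * 60u) /* assumed longest healthy run     */
#define CUT_IN_PSI      120u        /* assumed: start below this value */
#define CUT_OUT_PSI     150u        /* assumed: stop at this value     */

typedef struct {
    bool     relay_on;    /* commanded relay state              */
    bool     faulted;     /* latched after a run-time violation */
    uint32_t run_seconds; /* length of the current run          */
} compressor_t;

/* Call once per second with the tank-side pressure reading.
 * Returns the relay state that should be commanded. */
bool compressor_tick(compressor_t *c, uint32_t tank_psi)
{
    assert(c != NULL);

    if (c->faulted) {            /* stay off until someone clears the fault */
        c->relay_on = false;
        return false;
    }

    if (c->relay_on) {
        c->run_seconds++;
        if (c->run_seconds >= MAX_RUN_SECONDS) {
            /* Ran too long without reaching cut-out: probably a leak. */
            c->faulted  = true;
            c->relay_on = false;
        } else if (tank_psi >= CUT_OUT_PSI) {
            c->relay_on = false; /* normal end of a run */
        }
    } else if (tank_psi < CUT_IN_PSI) {
        c->relay_on    = true;   /* pressure dropped: start a new run */
        c->run_seconds = 0;
    }
    return c->relay_on;
}
```

Latching the fault, rather than simply restarting the timer, is a deliberate choice here: a blown fitting at 2 AM would otherwise restart the motor as soon as the limit expired.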
  4. I am glad to provide some ideas. Let us know what you find. Learning from others is generally cheaper and less painful than learning from my own mistakes.
  5. I think the C libraries can make linear optimization problems much easier to solve in a system. But they are a general purpose solution intended for the general class of optimization problems. There is a good chance that the code you wrote was far simpler and sufficiently optimal for the task at hand due to task specific optimizations. This story is often the case for embedded systems. A general purpose solution is nice and easy but too big/slow for an embedded microcontroller. As the microcontrollers get larger/faster for low costs then these generic or more complete solutions become available. Interestingly this does not always create a "better" performing solution but it often produces a more reliable, faster to market solution by leveraging more widely developed software that has more hours of operation/debugging on it.
  6. This tool does three things:

      • It solves the problem.
      • It helps you learn to think declaratively.
      • It helps you develop constraints.

    To add code to your project, look at the Google OR-Tools. In those tools you provide constraints and data sets, and the tools then do the solving. I would expect some solutions to take quite a bit of CPU time. Currently I am taking Basic Modeling for Discrete Optimization on Coursera to learn more about this topic. Training my mind to describe a problem instead of solving it is actually quite hard.
  7. I am doing some work with combinatorial optimizers. It is amazing what happens when you turn over one more rock and see what scurries out. There is a whole class of programming called declarative programming, and I have worked with Haskell enough to be slightly familiar with the concepts. I just learned about FlatZinc and an easier environment called MiniZinc, which are completely declarative and can be used to solve optimization problems by describing the constraints a valid solution fits inside.

    So here is a quick example of a program to find the smallest-area rectangle where the area is 10 times the circumference.

        var 1..1000: side1;
        var 1..1000: side2;
        var float: area;
        var float: circumference;

        constraint area = side1 * side2;
        constraint circumference = 2 * side1 + 2 * side2;
        constraint area = 10*circumference;

        solve minimize area;

        output ["side1 = \(side1)\nside2 = \(side2)\narea = \(area)\ncircumference = \(circumference)\n"];

    And here is the output showing every iteration.

        side1 = 420
        side2 = 21
        area = 8820.0
        circumference = 882.0
        ----------
        side1 = 220
        side2 = 22
        area = 4840.0
        circumference = 484.0
        ----------
        side1 = 120
        side2 = 24
        area = 2880.0
        circumference = 288.0
        ----------
        side1 = 100
        side2 = 25
        area = 2500.0
        circumference = 250.0
        ----------
        side1 = 70
        side2 = 28
        area = 1960.0
        circumference = 196.0
        ----------
        side1 = 60
        side2 = 30
        area = 1800.0
        circumference = 180.0
        ----------
        side1 = 45
        side2 = 36
        area = 1620.0
        circumference = 162.0
        ----------
        side1 = 40
        side2 = 40
        area = 1600.0
        circumference = 160.0
        ----------
        ==========
        Finished in 82msec

    Obviously this is a trivial example, but it turns out there is quite a bit of research and a number of libraries in this field, for example the Google OR-Tools, which could be incorporated into your C code. If you need to optimize something and you can describe what the answer looks like (the constraints), then these tools are pretty good. Of course these problems are NP-complete, so solutions can take some time. Good Luck.
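For contrast with the declarative model, here is what the imperative version of the same rectangle puzzle might look like in C. This is a plain brute-force search written for illustration, not something produced by MiniZinc or OR-Tools, and the names `rect_t` and `smallest_rect` are made up. Notice that the code has to spell out how to search, while the MiniZinc model only states what a valid answer looks like.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int  side1;
    int  side2;
    long area;
} rect_t;

/* Exhaustively search integer sides 1..max_side for the smallest
 * rectangle whose area equals 10 times its perimeter.
 * Returns 1 and fills *best when a solution exists, otherwise 0. */
int smallest_rect(int max_side, rect_t *best)
{
    assert(best != NULL);
    int found = 0;

    for (int s1 = 1; s1 <= max_side; s1++) {
        for (int s2 = 1; s2 <= s1; s2++) { /* s2 <= s1 skips mirrored pairs */
            long area      = (long)s1 * s2;
            long perimeter = 2L * s1 + 2L * s2;

            if (area == 10 * perimeter && (!found || area < best->area)) {
                best->side1 = s1;
                best->side2 = s2;
                best->area  = area;
                found = 1;
            }
        }
    }
    return found;
}
```

Calling `smallest_rect(1000, &r)` searches the same 1..1000 range as the model and arrives at the same 40 by 40 rectangle with an area of 1600.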
  8. I have not used Harmony or the web net server, so I have not run into this directly. But there are a few other places to check that cause resets on other systems.

      • Often the assert() functions will end in a software reset. Your code may not call the reset directly, but if you use assert in your error checks, a failed check will reset the device.
      • Some malloc libraries will fail with a reset if there is a heap failure, i.e. the stack runs into the heap. This is often detected with a no-man's land between the stack and the heap. The no-man's land is filled with a magic number. If the magic number has changed, the stack ran into the no-man's land and may have corrupted the heap.
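The no-man's land check described above can be sketched in a few lines of C. This is an illustration only: on a real target the guard region would be placed between the heap and the stack by the linker script, whereas here it is an ordinary array, and the names and the magic value are assumptions. The last function also shows how an innocent-looking assert() becomes the hidden reset path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORDS 8u
#define GUARD_MAGIC 0xDEADBEEFu /* arbitrary pattern, chosen for illustration */

/* Stand-in for the region between the stack and the heap. */
static uint32_t guard_zone[GUARD_WORDS];

/* Fill the guard zone once at startup. */
void guard_init(void)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        guard_zone[i] = GUARD_MAGIC;
    }
}

/* Returns true while every guard word still holds the magic number. */
bool guard_intact(void)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (guard_zone[i] != GUARD_MAGIC) {
            return false;
        }
    }
    return true;
}

/* On many embedded toolchains a failed assert ends in a software reset,
 * so this one line can reset the device without any explicit reset call. */
void guard_check(void)
{
    assert(guard_intact());
}
```

Searching the project and its libraries for checks of this shape is one way to find resets that the application code never requests explicitly.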
  9. Hey, I just noticed that there are some over-clock options. Here is the result when clocked at 960MHz. I could not get it to run at 1GHz. They did warn that cooling was required.
  10. More Data! I just got a Teensy 4 and it is pretty fast. Compiling with the "fastest" option at 600 MHz provides the following results. Strangely, compiling with "faster" provides slightly better results (6 ns). This is pretty fast, but I was expecting a bit more performance since the clock is about 6x faster than the Teensy 3.2 tested before. There is undoubtedly a good reason for this performance, and I expect pin toggling to be limited by wait states when writing to the GPIO peripherals. In any case this is still a fast result.
  11. When comparing CPUs and architectures it is also a good idea to compare the frameworks and learn how the framework will affect your system. In this article I will be comparing a number of popular Arduino compatible systems to see how different "flavors" of Arduino stack up in the pin toggling test. When I started this effort, I thought it would be a straightforward demonstration of CPU efficiency, clock speed and compiler performance on the one side against the Arduino framework implementation on the other. As is often the case, if you poke deeply into even the most trivial of systems you will always find something to learn.

    As I look around my board stash I see that there are the following Arduino compatible development kits:

      • Arduino Nano Every (ATMega 4809 @ 20MHz AVR Mega)
      • Mini Nano V3.0 (ATMega 328P @ 16MHz AVR)
      • RobotDyn SAMD21 M0-Mini (ATSAMD21G18A @ 48MHz Cortex M0+)
      • ESP-12E NodeMCU (ESP8266 @ 80MHz Tensilica)
      • Teensy 3.2 (MK20DX256VLH7 @ 96MHz Cortex M4)
      • ESP32-WROOM-32 (ESP32 @ 240MHz Tensilica)

    Each of these kits has an available Arduino framework. Say what you will about the Arduino framework, there are some serious advantages to using it, and a few surprises. For the purpose of this testing I will be running one program on every board. I will use vanilla "Arduino" code and make zero changes for each CPU. The Arduino framework is very useful for normalizing the API to the hardware in a very consistent and portable manner. This is mostly true at the low levels like timers, PWM and digital I/O, but it is very true as you move to higher layers like the String library or WiFi. Strangely, there are no promises of performance. For instance, every Arduino program has a setup() function where you put your initialization and a loop() function that is called very often.
    With this in mind it is easy to imagine the following implementation:

        extern void setup(void);
        extern void loop(void);

        void main(void)
        {
            setup();
            while(1)
            {
                loop();
            }
        }

    And in fact when you dig into the AVR framework you find the following code in main.cpp:

        int main(void)
        {
            init();
            initVariant();
        #if defined(USBCON)
            USBDevice.attach();
        #endif
            setup();
            for (;;) {
                loop();
                if (serialEventRun) serialEventRun();
            }
            return 0;
        }

    There are a few "surprises" that really should not be surprises. First, the Arduino environment needs to be initialized (init()), then the HW variant (initVariant()), then we might be using a USB device so USB gets started (USBDevice.attach()), and finally the user setup() function is called. Then we start our infinite loop. Between calls to the loop function the code maintains the serial connection, which could be USB. I suppose that other frameworks could implement this environment a little bit differently, and there could be significant consequences to these choices.

    The Test

    For this test I am simply going to initialize 1 pin and then set it high and low. Here is the code.

        void setup() {
            pinMode(2,OUTPUT);
        }

        void loop() {
            digitalWrite(2,HIGH);
            digitalWrite(2,LOW);
        }

    I am expecting this to make a short high pulse and a slightly longer low pulse. The longer low pulse is to account for the extra overhead of looping back. This is not likely to be as fast as the pin toggles Orunmila did in the previous article, but I do expect it to be about half as fast.

    Here are the results. The 2 red lines at the bottom are the best case optimized raw speed from Orunmila's comparison. That is a pretty interesting chart, and if we simply compare the data from the ATMega 4809 with both ASM and Arduino code, you see a 6x difference in performance. Let us look at the details and we will summarize at the end.

    Nano 328P

    So here is the first victim: the venerable AVR ATmega328P running at 16MHz. The high pulse is 3.186uS while the low pulse is 3.544uS, making a pulse frequency of 148.2kHz.
    Clearly the high and low pulses are nearly the same, so the extra check to handle the serial ports is not very expensive, but the digitalWrite abstraction is much more expensive than I was anticipating.

    Nano Every

    The Nano Every uses the much newer ATMega 4809 at 20MHz. The 4809 is a different variant of the AVR CPU with some additional optimizations like set and clear registers for the ports. This should be much faster. The high pulse is 1.192uS and the low pulse is 1.504uS. Again the pulses are almost the same size, so the additional overhead outside of the loop function must be fairly small. Perhaps it is the same serial port test. Interestingly, one of the limiting factors of popular Arduino 3D printer and CNC controller projects such as GRBL is the pin toggle rate for driving the stepper motor pulses. A 4809 based controller could be 2x faster for the same stepper code.

    SAMD21 Mini M0

    Now we are stepping up to an ARM Cortex M0+ at 48MHz. I actually expect this to have nearly 2x the performance of the 4809, simply because the instructions required to set pins high and low should be essentially the same. Wow! I was definitely NOT expecting the timing to get worse than the 4809. The high pulse width is 1.478uS and the low pulse width is 1.916uS, making the frequency 294.6kHz. Obviously toggling pins is not a great measurement of CPU performance, but if you need fast pin toggling in the Arduino world, perhaps the SAMD21 is not your best choice.

    Teensy 3.2

    This is an NXP Cortex M4 CPU at 96MHz. This CPU has double the clock speed of the D21 and it is an M4 CPU, which has lots of great features, though those features may not help toggle pins quickly. Interesting. Clearly this device is very fast, as shown by the short high period of only 0.352uS. But this framework must be doing quite a lot of work behind the scenes to justify the 2.274uS of loop delay. Looking a little more closely, I see a number of board options for this hardware. First, I see that I can disable the USB.
    Surely the USB is supported between calls to the loop function. I also see a number of compiler optimization options. If I turn off the USB and select the "fastest" optimizations, what is the result?

    Teensy 3.2, No USB and Fastest optimizations

    Making these two changes and re-running the same C code produces this result: That is much better. It is interesting to see that the compiler change is about 3x faster for this test (measured on the high pulse) and the lack of USB saves about 1uS in the loop rate. This is not a definitive test of the optimizations, and the code probably grew a bit, but it is a stark reminder that optimization choices can make a big difference.

    ESP8266

    The ESP8266 is a 32-bit Tensilica CPU. This is still a load/store architecture, so its performance will largely match ARM, though undoubtedly there are cases where it will be a bit different. The 8266 runs at 80MHz, so I do expect the performance to be similar to the Teensy 3.2. The wildcard is that the 8266 framework is intended to support WiFi, so it is running FreeRTOS and the Arduino loop is just one thread in the system. I have no idea what that will do to our pin toggle, so it is time to measure. Interesting. It is actually quite slow, and clearly there is quite a bit of system housekeeping happening in the main loop. The high pulse is only 0.948uS, so that is very similar to the Nano Every, which has 1/4th the clock speed. The low pulse is simply slow. This does seem to be a good device for IoT but not for pin toggling.

    ESP32

    The ESP32 is a very fast dual core machine, but it does run the code out of a cache. This is because the code is stored in a serial memory. Of course our test is quite short, so perhaps we do not need to fear the cache miss. Like the ESP8266, the Arduino framework is built upon a FreeRTOS task. But this has a second CPU and lots more clock speed, so let's look at the results: Interesting, the toggle rate is about 2x the Teensy while the clock speed is about 3x.
    I do like how the pulses are nearly symmetrical. A quick peek at the source code for the framework shows the Arduino running as a thread, but the thread updates the watchdog timer and the serial drivers on each pass through the loop.

    Conclusions

    It is very educational to make measurements instead of assumptions when evaluating an MCU for your next project. A specific CPU may have fantastic specifications and even demonstrations, but it is critical to include the complete development system and code framework in your evaluation. It is a big surprise to find that the 16MHz AVR 328P can actually toggle a pin faster than the ESP8266 when used in a basic Arduino project.

    The summary graph at the top of the article is duplicated here: In this graph, the Pin Toggling Speed is actually only 1/(the high period). This was done on purpose so only the pin toggle efficiency is being compared. In the test program, the low period is where the loop() function ends and other housekeeping work can take place.

    If we want to compare the CPU/CODE efficiency, we should really normalize the pin toggling frequency to a common clock speed. We can always compensate for inefficiency with more clock speed. This graph is produced by dividing the frequency by the clock speed, and now we can compare the relative efficiencies. That Cortex M4 and its framework in the Teensy 3.2 is quite impressive now. Clearly the ESP32 is pretty good but is using its clock speed for the win. The Mega 4809 has a reasonable framework, just not enough clock speed. All that aside, the ASM versions (or even a faster framework) could seriously improve all of these numbers. The poor ESP8266 is pretty dismal.

    So what is happening in the digitalWrite() function that is making this performance so slow? Put another way, what am I getting in return for the low performance? There are really 3 reasons for the performance.

      • Portability. Each device has work to do to adapt to the pin interface, so the price of portability is runtime efficiency.
      • Framework Support. There are many functions in the framework that could be affected by writing to the pins, so the digitalWrite function must modify other functions.
      • Application Ignorance. The framework (and this function) cannot know how the system is constructed, so it must plan for the worst.

    Let us look at digitalWrite for the AVR:

        void digitalWrite(uint8_t pin, uint8_t val)
        {
            uint8_t timer = digitalPinToTimer(pin);
            uint8_t bit = digitalPinToBitMask(pin);
            uint8_t port = digitalPinToPort(pin);
            volatile uint8_t *out;

            if (port == NOT_A_PIN) return;

            // If the pin that support PWM output, we need to turn it off
            // before doing a digital write.
            if (timer != NOT_ON_TIMER) turnOffPWM(timer);

            out = portOutputRegister(port);

            uint8_t oldSREG = SREG;
            cli();

            if (val == LOW) {
                *out &= ~bit;
            } else {
                *out |= bit;
            }

            SREG = oldSREG;
        }

    Note that the first thing is a few lookup functions to determine the timer, port and bit described by the pin number. These lookups can be quite fast but they do cost a few cycles. Next we ensure we have a valid pin and turn off any PWM that may be active on that pin. This is just safe programming and framework support. Next we figure out the output register for the update, turn off the interrupts (saving the interrupt state), set or clear the pin, and restore interrupts. If we knew we were not using PWM (like this application) we could omit the turnOffPWM function. If we knew all of our pins were valid we could remove the NOT_A_PIN test. Unfortunately all of these optimizations require knowledge of the application, which the framework cannot have. Clearly we need new tools to describe embedded applications.

    This has been a fun bit of testing. I look forward to your comments and suggestions for future toe-to-toe challenges. Good Luck and go make some measurements.

    PS: I realize that this pin toggling example is simplistic at best. There are some fine Arduino libraries and peripherals that could easily toggle pins much faster than the results shown here. However, this is a simple apples-to-apples test of identical code in "identical" frameworks on different CPUs, so the comparisons are valid and useful. That said, if you have any suggestions feel free to enlighten us in the comments.
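The normalization used for the efficiency graph is simple arithmetic, and a small C sketch makes it concrete. The table below uses only the high-pulse widths quoted in the text; the ESP32 and the optimized Teensy runs are left out because their numbers are not stated. The type and function names are invented for this example, and the graph in the article may have been built from different runs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *board;
    double      clock_mhz; /* CPU clock in MHz                     */
    double      high_us;   /* measured high pulse width in microseconds */
} toggle_result_t;

/* High-pulse measurements quoted in the article text. */
static const toggle_result_t results[] = {
    { "Nano 328P",                    16.0, 3.186 },
    { "Nano Every (4809)",            20.0, 1.192 },
    { "SAMD21 M0-Mini",               48.0, 1.478 },
    { "ESP8266",                      80.0, 0.948 },
    { "Teensy 3.2 (default options)", 96.0, 0.352 },
};

/* Pin toggling speed as the article defines it: 1 / high period, in MHz. */
double toggle_mhz(const toggle_result_t *r)
{
    assert(r != NULL);
    return 1.0 / r->high_us;
}

/* Clock-normalized efficiency: toggles achieved per CPU clock. */
double toggles_per_clock(const toggle_result_t *r)
{
    return toggle_mhz(r) / r->clock_mhz;
}

/* The same figure inverted: CPU clocks spent on one digitalWrite(HIGH). */
double clocks_per_write(const toggle_result_t *r)
{
    assert(r != NULL);
    return r->high_us * r->clock_mhz; /* us * MHz = clock cycles */
}
```

Expressed as clocks per write, the 328P spends roughly 51 CPU clocks on a single digitalWrite(), which gives a feel for how much work the abstraction does compared with a one or two cycle port write.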
  12. The two functions you see are the two halves of a ring buffer driver. The first function unloads the UART receive buffer and puts the bytes into the array eusart2RXBuffer. This array is indexed by eusart2RXHead. The head is always incremented, and it rolls over when it reaches the maximum value. This receiving function creates a basic ring buffer insert that sacrifices error handling for speed. There are four possible errors that can occur.

      1. UART framing error. If a bad UART signal arrives, the UART will abort reception with a framing error. It can be important to know if framing errors have occurred, and it is critical that the framing error bit be cleared if it gets set.
      2. The UART receiver is overrun. This happens if a third byte begins before any bytes are removed from the UART. With an ISR unloading the receiver this is generally not a real threat, but if the baud rate is very high and/or interrupts are disabled for too long, it can be a problem.
      3. The ring buffer head overwrites the tail. The oldest bytes will be lost, but worse, the tail is not "pushed" ahead, so the next read will return the newest data and then the oldest data. That can be a strange bug to sort out. It is better to add a check for head == tail and then increment the tail in that instance.
      4. This error is perhaps an extension of #3. The eusart2RxCount variable keeps track of the bytes in the buffer. This makes the while loop at the beginning of the read function much more efficient (probably 2 instructions on a PIC16). However, if there is a head-tail collision, the count variable will be too high, which will later cause an undetected underrun in the read function.

    The second function is to be called from your application to retrieve the data captured by the interrupt service routine. This function will block until data is available. If you do not want to block, there are other functions that indicate the number of bytes available.

    The read function does have a number of lines of code, but it is a very efficient ring buffer implementation which extends the UART buffer size and helps keep UART receive performance high. That said, not all UART applications require a ring buffer. If you turn off the UART interrupts, you should get simple polling code that blocks for a character but does not add any buffers. The application interface should be identical (read); there will simply be no interrupt or buffers supporting the read function.
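The fix suggested for errors #3 and #4 can be sketched as follows. This is not the generated driver: the names (`ring_t`, `ring_put`, `ring_get`) and the buffer size are invented, and the read is non-blocking for simplicity. On a real PIC the put side would run in the receive ISR, so the shared fields would also need to be volatile and the read side would need to guard its update of the count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 8u /* hypothetical size; a real driver configures this */

typedef struct {
    uint8_t data[RB_SIZE];
    uint8_t head;     /* next slot to write          */
    uint8_t tail;     /* next slot to read           */
    uint8_t count;    /* bytes currently stored      */
    uint8_t overruns; /* oldest bytes thrown away    */
} ring_t;

/* Insert one byte. When the buffer is full the oldest byte is dropped
 * and the tail is pushed ahead, so reads stay in arrival order and the
 * count can never exceed the buffer size. */
void ring_put(ring_t *rb, uint8_t byte)
{
    assert(rb != NULL);

    if (rb->count == RB_SIZE) {
        rb->tail = (uint8_t)((rb->tail + 1u) % RB_SIZE);
        rb->count--;
        rb->overruns++;
    }
    rb->data[rb->head] = byte;
    rb->head = (uint8_t)((rb->head + 1u) % RB_SIZE);
    rb->count++;
}

/* Remove the oldest byte. Returns false when the buffer is empty. */
bool ring_get(ring_t *rb, uint8_t *out)
{
    assert(rb != NULL && out != NULL);

    if (rb->count == 0u) {
        return false;
    }
    *out = rb->data[rb->tail];
    rb->tail = (uint8_t)((rb->tail + 1u) % RB_SIZE);
    rb->count--;
    return true;
}
```

Counting the overruns costs one extra byte of RAM and gives the application a way to find out that data was lost, which the speed-optimized version cannot report.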
  13. Excellent points, and I think we are almost completely in agreement. I have only four complaints with these helper macros, and two of them are relatively minor. Consider the following snapshot from Atmel Studio in a SAMD21 project.

      1. The <CTRL><SPACE> pattern works great when you know the keyword. If you simply start with SERCOM, you get a number of matches that are not UART related.
      2. ALL the choices will compile. See the BAUD value placed in the CTRLA register...
      3. Some of these choices are used like macros and others are not (CMODE). Placing an invalid value in these macros is completely reasonable and will compile.
      4. MISRA 19.7 disallows function-like macros (though this is not really an issue because it does not apply in this situation).

    So, these constructs are handy because they help prevent a certain sort of bookkeeping error related to sorting out all of these offsets. But the constructs allow range errors and semantic errors. Since we are talking about the SERCOM and I am using the SAMD21 in my example, here is what START produces.

        hri_sercomusart_write_CTRLA_reg(
            SERCOM0,
            1 << SERCOM_USART_CTRLA_DORD_Pos           /* Data Order: enabled */
                | 0 << SERCOM_USART_CTRLA_CMODE_Pos    /* Communication Mode: disabled */
                | 0 << SERCOM_USART_CTRLA_FORM_Pos     /* Frame Format: 0 */
                | 0 << SERCOM_USART_CTRLA_SAMPA_Pos    /* Sample Adjustment: 0 */
                | 0 << SERCOM_USART_CTRLA_SAMPR_Pos    /* Sample Rate: 0 */
                | 0 << SERCOM_USART_CTRLA_IBON_Pos     /* Immediate Buffer Overflow Notification: disabled */
                | 0 << SERCOM_USART_CTRLA_RUNSTDBY_Pos /* Run In Standby: disabled */
                | 1 << SERCOM_USART_CTRLA_MODE_Pos);   /* Operating Mode: enabled */

    This is completely different from the other examples provided. This style is defined in hri_sercom_d21.h and, a word to the wise, DO NOT BROWSE THIS INSIDE OF ATMEL START. This file must be huge, as the page is completely frozen while it populates the source viewer.

    So I do like the construct, but often it does not help me very much because I must still go through the entire register and decide my values. When I string these values together, any mistakes made will be sorted at once. All that aside, I don't really care much about the HW initialization. I want it to be short, to the point, and perfectly clear about what is placed in each register. In a typical project, only 1% of the code is some sort of HW work as you develop the application specific HW interfaces and bring up your board. Once these are tested, you are not likely to spend any more time on them. In a 9 month project, I expect to spend 2 weeks in the HW functions, so if a magic value is clear and to the point, use it. If you want to construct your values with logical operations, go for it.
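To illustrate the range-error complaint, here is one way a field helper could refuse values that do not fit, something a bare `value << SOMETHING_Pos` expression cannot do. Everything in this sketch is hypothetical: `field_t`, `field_fits`, `field_value` and the two demo fields are invented for illustration, and their positions and widths are not taken from the SAMD21 headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field descriptor: bit position and width in bits. */
typedef struct {
    uint8_t pos;
    uint8_t width;
} field_t;

/* Illustrative fields only; NOT the real SAMD21 CTRLA layout. */
static const field_t DEMO_MODE = { 2u, 3u };  /* 3-bit field at bit 2  */
static const field_t DEMO_DORD = { 30u, 1u }; /* 1-bit field at bit 30 */

/* Returns 1 when value fits inside the field, otherwise 0. */
static inline int field_fits(field_t f, uint32_t value)
{
    return (f.width >= 32u) || (value < (UINT32_C(1) << f.width));
}

/* Shift a value into place, trapping out-of-range values during
 * development instead of silently corrupting neighbouring fields. */
static inline uint32_t field_value(field_t f, uint32_t value)
{
    assert(field_fits(f, value));
    return value << f.pos;
}
```

A register value would then be built as `field_value(DEMO_DORD, 1) | field_value(DEMO_MODE, 1)`. This catches range errors but not semantic ones, such as a legal value written to the wrong register, so the clear function name and the datasheet reference still matter.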
  14. But in all of your examples you are not telling me why you are doing that bit of work. I cannot possibly determine if there is a bug if I don't know why you are configuring the SERCOM with that particular value. How about simply saying:

        void configureSerialForMyDataLink(void)
        {
            // datalink specifications found in specification 4.3.2
            // using SERCOM0 as follows:
            // - Alternate Pin x,y,z
            // - 9600 baud
            // - half duplex
            // - SamD21 datasheet page 26 for specifics
            SERCOM0 = <blah blah blah>;
        }

    Now you know why. You have a function that has a clear purpose. And if the link is invalid, you can see the intent. The specifics of the bits are in the datasheet and clearly referenced. No magic here.

    As for the special access mode for performance...

        static inline void SERCOM0_WRITE(uint32_t controlOffset, uint32_t value)
        {
            // Accessing the SERCOM via DFP offsets for high performance
            (*(volatile uint32_t *)(0x42000400 + controlOffset)) = value;
        }

    Now a future engineer has a handy helper and the details are nicely removed. And an interested engineer can debug it because the intent is clear. Obviously you need to be a DFP expert (or have the datasheet) to understand or edit it. But no magic.

    But the application should NEVER use this helper. It should be buried in the HAL. The first function is much clearer for the HAL because it conveys application level intent, i.e. the application will rarely care about the SERCOM and will always care about its DataLink. If I port the code to something without a SERCOM, the application will still need a DataLink, so this function will simply be refilled with something suitable for the other CPU. The application remains unchanged.
  15. As tempting as it is to duplicate the datasheet in your code, there is much to dislike about this strategy.

      • The Single Source of Truth should be the manufacturer's datasheet, not what you copied into the code.
      • The signal to noise ratio of the code will suck, making debugging more challenging.
      • The register description is not always enough, so this slippery slope will have you copying the entire peripheral chapter.
      • Your future maintainer will not have your knowledge of the device, so they will need the datasheet anyway.