Orunmila

Member
  1. Some advice for Microchip: if this were my product I would stop selling development kits with A1 or A3 silicon to customers. I2C is widely used, and it will create a really bad impression of the product's reliability if customers evaluate it with defective silicon. And please fix the Errata; your workaround for I2C Issue 1 does not work as advertised!
  2. OK, some feedback on this one. The workaround in the Errata, it turns out, does not work. The Errata claims you can clear the BCL bit and then wait for both the S and P bits to read 0, but this does not work at all: we clear BCL and wait for both S and P to be 0, but this never happens and we end up waiting forever. As an attempt to work around this we decided to reset the entire module: we set the ON bit in I2CxCON to 0 to disable the module, which resets all the status bits and the I2C state machine. Once this is done we wait 4 clock cycles (since the second workaround in the Errata suggests waiting 4 clock cycles) and then set the ON bit back to 1. This clears the BCL error condition correctly and allows us to continue using the peripheral. We have not yet tried to implement the workaround with the timeout that resets the I2C peripheral when it becomes unresponsive without warning; that is coming up next, but it does look like it will work, since it also disables the entire module when the condition happens, which seems to clean out the HW state machine, which appears to be the culprit here. Section 24 of the family datasheet, covering the I2C peripheral, can be found here: http://ww1.microchip.com/downloads/en/devicedoc/61116f.pdf
  3. I am struggling to figure out how to work around what seems to be a silicon bug in the PIC32MZ2048EFM on A1 silicon. I am using the development kit DM320104, and from MPLAB X I can see that the board I have is running A1 revision silicon. Looking at the Errata for the device I found that there is a silicon Errata on the I2C peripheral, and I am hitting at least 2 of the described problems:
    • False Error Condition 1: False Master Bus Collision Detect (Master-mode only) – The error is indicated through the BCL bit (I2CxSTAT).
    • False Error Condition 3: Suspended I2C Module Operations (Master or Slave modes) – I2C transactions in progress are inadvertently suspended without error indications.
In both cases the Harmony I2C driver ends up in a loop, never returning. For condition 1 the ISR keeps triggering and I2C stops working; for condition 3 the driver just gets stuck. I have tried to implement the workarounds listed in the Errata but have had no luck. The Errata has no example, only a text description, so I was hoping someone here has tried this and can help me figure out what I am doing wrong. Currently, for condition 1, from the bus collision ISR we clear the ISR flag and the BCL bit and then set the start bit in the I2C1STAT register, but the interrupt keeps on firing and no start condition is generated. Any idea what we are doing wrong?
  4. Absolutely, and nice examples! Hungarian notation breaks the abstraction of having a variable name with unspecified underlying storage, so I think it is the worst way to leak implementation details!
  5. I think specifically we need to know which processor you are trying to use, as this differs from device to device. The simplest and most generic answer would be to add the UART to your project and click on the checkbox to enable interrupts for the driver. After generating code you will have to set the callback which you want called when the interrupt occurs. After this you need to make sure you are enabling interrupts in your main code, and it should work. If you supply us with the details above I will post some screenshots for you on how to do this. Just to show you the idea, I picked the 16F18875 and added the EUSART as follows: You can see I clicked next to "Enable EUSART Interrupts". Then in my main I ensured the interrupts are enabled. When I now run the code, the ISR created by MCC is executed every time a byte is received. The ISR function is called EUSART_Receive_ISR and it is located in the eusart.c file. You can edit this function, or replace it by setting a different function as the ISR by calling EUSART_SetRxInterruptHandler if you want to change the behavior.
  6. Yes Doxygen is great for that, it also allows you to click on the boxes and drill down into the details. I use it for this all of the time!
  7. With C it can be very tricky. The linker resolves the symbols at link time, and up until then you cannot trace the dependencies in any easy way. You can try something that does static code analysis, but if you are using #defines it can be unreliable if you do not get all of the settings correct, especially if some of your #ifdefs depend on things the compiler defines for you. The best thing you can do is fully explore which files are being used, even if that means removing them one at a time and testing it all out. Always include only the files that you are really using; if you have dead code in your project it just makes it harder to understand, and that kind of rot just accumulates over time.
  8. Comments

I was musing over a piece of code this week, trying to figure out why it was doing something that seemed to make no sense at first glance. The comments in this part of the code were of absolutely no help; they simply described what the code was doing. Something like this:

    // Add 5 to i
    i += 5;
    // Send the packet
    sendPacket(&packet);
    // Wait on the semaphore
    sem_wait(&sem);
    // Increment thread count
    Threadcount++;

These comments just added noise to the file, made the code not fit on one page and harder to read, and did not tell me anything the code was not already telling me. What was missing was what I was grappling with: why was it done this way, why not any other way? I asked a colleague and, to my frustration, his answer was that he remembered there was some discussion about this part of the code and that it was done this way for a very good reason! My first response was of course "well, why is that not in the comments!?"

I remember having conversations about comments being a code smell many times in the past. There is an excellent talk by Kevlin Henney about this on YouTube. Just like all other code smells, comments are not universally bad, but whenever I see a comment in a piece of code my spider sense starts tingling and I immediately look a bit deeper to try and understand why comments were actually needed there. Is there not a more elegant way to do this which would not require comments to explain, where reading the code would make what it is doing obvious?

WHAT vs. WHY Comments

We all agree that good code is code which is properly documented, meaning the right amount of comments, but there is a terrible trap here that programmers seem to fall into all of the time. Instead of documenting WHY they are doing things a particular way, they instead document WHAT the code is doing.
As Henney explains, English, or whatever written language for that matter, is not nearly as precise as the programming language itself. The code is the best description of what the code is doing, and we hope that anyone maintaining the code is proficient in the language it is written in, so why all of the WHAT comments?

I quite like this Codemanship video, which shows how comments can be a code smell and how we can use the comments to refactor our code to be more self-explanatory. The key insight here is that if you have to add a comment to a line or a couple of lines of code, you can probably refactor the code into a function which has the comment as its name. Conversely, if you have a comment on a line which only calls a function, that means the function is probably not named well enough to be obvious; consider taking the comment and using it as the name of the function instead.

This blog has a number of great examples of how NOT to comment your code, and comical as the examples are, the scary part is how often I actually see these kinds of comments in production code! It has a good example of a "WHY" comment as follows:

    /* don't use the global isFinite() because it returns true for null values */
    Number.isFinite(value)

So what are we to do; how do we know if comments are good or bad? I would suggest the golden rule must be to test your comment by asking whether it is explaining WHY the code is done this way or stating WHAT the code is doing. If you are stating WHAT the code is doing, then consider why you think the comment is necessary in the first place. First, consider deleting the comment altogether; the code is already explaining what is being done, after all. Next, try to rename things, or refactor it into a well-named method, or fix the problem in some other way.
If the comment is adding context, explaining WHY it was done this way, what else was considered and what the trade-offs were that led to it being done this way, then it is probably a good comment. Quite often we try more than one approach when designing and implementing a piece of code, weighing various metrics/properties of the code before settling on the preferred solution. The biggest mistake we make is not capturing any of this in the documentation of the code. This leads to newcomers re-doing all your analysis work, often re-writing the code before realizing something you learned when you wrote it the first time. When you comment your code you should be capturing that kind of context. You should be documenting what was going on in your head when you were writing the code. Nobody should ever read a piece of your code and ask out loud "what were they thinking when they did this?". What you were thinking should be there in plain sight, documented in the comments.

Conclusion

If you find that you need to find the right person to maintain any piece of code in your system because "he knows what is going on in that code", or even worse "he is the only one that knows", this should be an indication that the documentation is incomplete, and more often than not you will find that the comments in that code explain WHAT it is doing instead of WHY. When you comment your code, avoid at all costs explaining WHAT the code is doing. Always test your comments against the golden rule of comments, and if a comment only explains what is happening then delete it! Only keep the WHY comments and make sure they are complete. And make especially sure that you document the things you considered and concluded would be the wrong thing to do in this piece of code, and WHY that is the case.
  9. Because in C89 this would be a syntax error. The syntax did not exist until it was introduced in C99 together with designated initializers. In C89 it was not possible to initialize a union through its second member, because it was not possible to name the target member. This is important because many compilers today are still not fully C99 compliant and support only some of its constructs, which means that if you use designated initializers your code may be less portable, because some compilers may still choke on that syntax. This example is verbatim from the C99 standard, section 6.7.7 paragraph 6. The answer to your question is right there in the last sentence: "The first two bit-field declarations differ in that unsigned is a type specifier (which forces t to be the name of a structure member), while const is a type qualifier (which modifies t which is still visible as a typedef name)." In other words, because of the "unsigned", t is forced to be the name of the member; it is NOT the type of the member as you may expect. This means that when used like that the member is indeed not unnamed: it is a member named t, of type unsigned, and the typedef from above does not apply at all. I know, that is why even the standard refers to this as "obscure"! I have no idea; navigation keys and Enter work just fine for me. I am using Google Chrome, so perhaps it is the browser or a setting. Which browser are you using?
  10. This happens from time to time, I have also had periods where nobody could post anything. If that happens please do report it here, it may just help some poor soul who is desperately looking for help 🙂
  11. Something that comes up all the time: PWM resolution. Engineers are often disappointed when they find out that the achievable resolution of a PWM is not nearly what they expected from the headline claims made in the microcontroller datasheet. Here is just one example of a typically perplexed customer. Most of the time this is not due to dishonest advertising but rather an easily overlooked property of how PWMs work, so let's clear that up so that we can avoid the disappointment.

A conventional PWM will let you set the period and the duty cycle, something like the image to the right shows. In the picture Tsys represents the clock of the PWM, and Tpwm shows the period of the PWM. In the example the period (Tpwm) of the PWM is 4x the system clock. Additionally you can set the duty cycle register, which lets you choose for how many of the Tsys clocks that fit into one Tpwm the output should remain high. In the example the duty cycle is set to 3.

It is typical for a microcontroller datasheet to advertise that it can accommodate a duty cycle of 12 bits or 16 bits or something in that order. Of course this is an achievable number of clocks for the PWM to remain high, but the range of the number is always going to be limited by the period of the PWM. That means that if we select the period to be 2^16 = 65536 clocks, then we will also be able to control the duty cycle to 16 bits. It is easy to make the mistake of believing that you will get 16 bits of resolution over the entire achievable frequency range, but this is very rarely the case.

Let's look at some real numbers as an example, using the PIC16F1778. The first page from the datasheet can be seen to the right. It advertises that the PWM on this device is 16-bit. Importantly, it also shows that the timer capability is limited to 16-bit. Looking at the PWMs on this device, we will try to see what is the highest frequency (lowest period) at which we can get 16 bits of PWM resolution.
The fastest clock this PWM can use as its time base is the system clock, which is limited to 32MHz on this device. That means, in terms of Figure 1 above, that Tsys would be the period of one clock at 32MHz = 31.25ns. If we want to achieve the full resolution of the PWM we have to run the timer at its 16-bit limit, which means that the PWM frequency will be 32MHz / 2^16 = 488Hz! So if you need the PWM frequency to be anything more than that, you will have to compromise on resolution in order to achieve the faster switching frequency.

Typically engineers will try to run at switching frequencies above 20kHz because this is roughly the upper limit of human hearing. If you switch at a lower frequency, people will hear a hum or a high-pitched tone which can be very irritating. So let's say we compromise to the lowest limit here and try to run the PWM at 20kHz; how much of the PWM resolution will we be giving up by using the higher frequency?

The easiest way to calculate this is to realize that one clock is 31.25ns and the resolution of the PWM is limited to how many times 31.25ns fits into the period of the PWM. At 488Hz the period is 1/488Hz ≈ 2ms, and we can calculate that 2ms/31.25ns = 65536. We can determine how many bits are required to represent that by taking log(65536)/log(2) = 16 bits of resolution. The number of usable duty-cycle steps at 20kHz is (1/20kHz)/31.25ns = 1600, so the resolution of the PWM is reduced to log(1600)/log(2) = 10.64 bits, which means that we achieve only slightly better than 10 bits of resolution. This is the point where people are usually unhappy that the advertised 16 bits of resolution has somehow evaporated and turned into only 10 bits! So the advice I have for you is this.
When selecting a device where you have PWM resolution requirements, you had better do the math to make sure that you can run at the resolution you need with the clocks you have available. And remember that when it comes to PWM resolution the PWM clock speed is always going to be king, so it is typically better to select the device with the higher clock speed instead of the one that claims the highest PWM resolution (at a snail's pace...). And if you feel adventurous you can always try something more exotic, like using the NCO to generate a high-resolution PWM at high frequencies as described in this application note.
  12. I2C is such a widely used standard, yet it has caused me endless pain and suffering. In principle I2C is supposed to be simple and robust, a mechanism for "Inter-Integrated Circuit" communication. I am hoping that this summary of the battle scars I have picked up from using I2C might just save you some time and suffering. I have found that despite a couple of typical initial hiccups it is generally not that hard to get I2C communication going, but making it robust and reliable can prove to be quite a challenge.

Problem #1 - Address Specification

I2C data is not represented as a bit-stream, but rather as a specific packet format with framing (start and stop conditions), preceded by an address, which encapsulates a sequence of 8-bit bytes, each followed by an ACK or NAK bit. The first byte is supposed to be the address, but right off the bat you have to deal with the first special case. How to combine this 7-bit address with the R/W bit always causes confusion. There is no consistency in the datasheets of I2C slave devices for specifying the device address, and even worse, most vendors fail to specify which approach they use, leaving users to figure it out through trial and error. This has become bad enough that I would not recommend trying to implement I2C without an oscilloscope in hand to resolve these kinds of guessing games.

Let's say the 7-bit device address is 0x76 (like the ever-popular Bosch Sensortec BME280). Sometimes this will be specified simply as 0x76, but the API in the software library, in order to save the work of shifting this value by 1 and masking in the R/W bit, will often require you to pass in 0xEC as the address (0x76 left-shifted by one). Sometimes the vendor will specify 0xEC as the "write" address and 0xED as the "read" address.
To add insult to injury, your bus analyzer or Saleae will typically show the first 8 bits as a hex value, so you will never see the actual 7-bit address as a hex number on the screen, leaving you to do bit twiddling in your head on a constant basis while trying to make sense of the traces.

Problem #2 - Multiple Addresses

To add to the confusion from above, many devices (like the BME280) have the ability to present on more than one address, so the datasheet will specify that (in the case of the BME280) if you pull down the unused SDO pin on the device its address will be 0x76, but if you pull the pin up it will be 0x77. I have seen many users leave this "unused" pin floating in their layouts, causing the device to switch erratically between the 2 addresses at runtime. This also, of course, doubles the number of possible addresses the device may end up responding to, and the specification of exactly 2 addresses fools a lot of people into thinking that the vendor is actually specifying a read and a write address as described above. This all adds to the guessing game of what the actual device address may be.

To add to the confusion, most devices have internal registers and these also have their own addresses, so it is very easy to get confused about what should go in the address byte. It is not the register address, it is the slave address; the register address goes in the data byte of the "write" that you need to perform before the "read", in order to read a register from a specific address on the slave. OK, if that is not confusing to you, I salute you, sir!

Problem #3 - 10-bit address mode

As if there was not enough address confusion already, the limitation of only 127 possible device addresses led to the inclusion of an extension called 10-bit addressing.
A 10-bit address transaction actually starts with a pre-defined 5-bit pattern, followed by the 2 most significant bits of the 10-bit address, then the R/W bit; this is acked by all the devices on the bus using 10-bit addressing that share the same 2 MSBs, after which the remaining 8 bits of the address follow, acknowledged by the one fully addressed device. So once again there is no standard way to represent the 10-bit address. Let's say the device has 10-bit address 0x123; how would this be specified? The vendor could say 0x123 (where only 10 of the 12 bits implied are the 10-bit address), or they could include the prefix and specify it as 0xF223. Of course that number contains the R/W bit in the middle somewhere, so they may specify a "read" and a "write" address as 0xF223 and 0xF323, or they could right-shift the high byte to show it as a normal 7-bit address, removing the R/W bit, and say it is 0x7123. I think you get the picture; lots of room for confusion, and we have not even received our first ACK yet!

Problem #4 - Resetting during Debugging

Since I2C is essentially transaction/packet based and does not include timeouts in the specification (SMBus does, of course, but most slave sensors conform to I2C only), there is a real chance that you are going to reset your host processor (or bus master) in the middle of such a transaction. This happens as easily as re-programming the processor during development (which you will likely be doing a lot). The problem that tends to catch everybody at some point is that a hardware reset of your host processor is entirely invisible to the slave device, which does not lose power when you toggle the master device's reset pin! The result is that the slave thinks it is in the middle of an I2C transaction and awaits the expected number of master clock pulses to complete the current transaction, but the master thinks it should be creating a start condition on the bus.
This often leads to the slave holding the data line low and the master unable to generate a start condition on the bus. When this happens you will lose the ability to communicate with the I2C sensor/slave and start debugging your code to find out what has broken. In reality there is nothing wrong with your code, and simply removing and re-applying power to the entire board will cause both the master and the slave to be reset, leaving you able to communicate again. Of course, re-applying the power typically causes the device to start running, and if you want to debug you will have to attach the debugger, which may very well leave you in a locked-up state once again. The only way around this is to use your oscilloscope or Saleae all of the time, and whenever the behavior seems strange, stare very carefully at what is happening on the data line: is the address going out, is the start condition recognized, and is the slave responding as it should? If not, you are stuck and need to reset the slave device somehow.

Problem #5 - Stuck I2C bus

The situation described in #4 above is often referred to as a "stuck bus" condition. I have tried various strategies in the past to robustly recover from such a stuck bus condition programmatically, but they all come with a number of compromises. Firstly, slave devices are essentially allowed to clock-stretch indefinitely, and if a slave device's state machine goes bonkers it is possible for a single slave device to hold the entire bus hostage indefinitely, in which case the only thing you can possibly do is remove the power from all slave devices. This is not a very common failure mode, but it is definitely possible and needs addressing for robust or critical systems. Often getting the bus "unstuck" is as simple as providing the slave device enough clocks to convince it that the last transaction is complete. Some slaves behave well, and after clocking them 8 times and providing a NAK they will abort their current transaction.
I have seen slaves, especially I2C memories, where you have to supply more than 8 clocks, e.g. 32 clocks, to be certain that the transaction terminates. I have also seen specialized slave devices that will ignore your NAKs and insist on sending even more data, e.g. 128 or more bits, before giving up on an interrupted transaction. The nasty part about getting an I2C bus "unstuck" is that you usually cannot use the I2C peripheral itself to perform this service. This means that typically you will need to disable the peripheral, change the pins to GPIO mode and bit-bang the clocks you need out of the port, after which you re-initialize the I2C peripheral and try the next transaction; if this fails, rinse and repeat the process until you succeed. This, of course, is expensive in terms of code space, especially on small 8-bit implementations.

Problem #6 - Required Repeated Start conditions

The R/W bit comes back to haunt us for this one. The presence of this bit implies that all transactions on I2C should be uni-directional, that is, they must either read or write, but in practice things are not that simple. Typically a sensor or memory will have a number of register locations inside the device, and you will have to "write" to the device to specify which location you wish to address, followed by "reading" from the device to get the data. The problem with a bus is that something may interrupt you between these two operations that form one larger transaction. In order to overcome this limitation, I2C allows you to concatenate 2 I2C operations into a single transaction by omitting the stop condition between them. You can do the write operation, and instead of completing it with a stop condition on the bus, follow it with a second start condition and the latter half of the operation, terminating the whole thing with a stop condition only when you are done. This is called a "repeated start" condition and looks as follows (from the BME280 datasheet).
It can often be quite a challenge to generate such a repeated start condition, as many I2C drivers require you to specify read/write, a pointer and a number of bytes, and do not give you the option to omit the stop condition; and many slave devices will reset their state machines at a stop condition, so without a repeated start it is not possible to communicate with these devices. Of course, I should also mention that the requirement to send the slave address twice for these transactions significantly reduces the throughput you can get through the bus.

Problem #7 - What are ACK and NAK supposed to mean?

This brings us to the next problem. It is quite clear that the address is acked by the slave, but when you are reading data, what are the exact semantics of the ACK/NAK? The BME280 datasheet is a bit unique in that it clearly distinguishes, in the figure in #6 above, whether the ack should be generated by the master or the slave (ACKS vs ACKM), but from the specification it is not immediately clear. If I read data from a slave, who is supposed to provide the ack at the end of the data: the master or the slave? What would be the purpose of the master providing an ack to the slave for data? Clearly the master is alive, as it is generating clocks, and the slave may be sending all 1's, which means it does not touch the bus at all. So what is the slave supposed to do if the master NAKs a byte? And how would a slave determine whether it should NAK your data? Since there is no checksum or CRC on it, there is no way to determine whether it is correct. None of this is clearly specified anywhere.

To add confusion, I have seen people spend countless hours looking for the bug in their BME280 code which causes the last data byte to get a NAK! When you look at the bus analyzer or oscilloscope you will be told that every byte was followed by an ACK except for the last one, where you will see a NAK.
Most people interpret this NAK as an indication that something is wrong, but no: look carefully at the image from the datasheet in section #6 above! Each byte received by the master is followed by an ACKM (acked by the master) EXCEPT for the last byte, which the master does not ACK, causing a NAK to precede the stop condition! To make this even harder, most I2C hardware peripherals will not allow you fine-grained control over whether the master will ACK or NAK; very often the peripheral will just blithely ack every byte that it reads from the slave regardless.

Problem #8 - Pull-up resistors and bus capacitance

The I2C bus is designed to be driven only through open-drain connections pulling the bus down; it is pulled up by a pair of pull-up resistors (one on the clock line and one on the data line). I have seen many a young engineer struggle with unreliable I2C communication due to pull-up resistors that are missing entirely or incorrectly sized. Yes, it is possible to communicate even without the resistors, due to parasitic pull-ups which will be much weaker than required, meaning that they pull weakly enough to get the bus high-ish and under some conditions can provoke an ack from a slave device. There is no clear specification of what the size of these pull-up resistors should be, and for good reason, but this causes a lot of uncertainty. The I2C specification does specify that the maximum bus capacitance should be 400pF. This is a pretty tricky requirement to meet if you have a large PCB with a number of devices on the bus, and it is often overlooked, so it is typical to encounter boards where the capacitance exceeds the official specification. In the end, the pull-up needs to be strong enough (that is, small enough) to pull the bus to Vdd fast enough to communicate at the required bus speed (typically 100kHz or 400kHz). The higher the bus capacitance, the stronger you will have to pull up in order to bring the bus to Vdd in time.
If you look at the oscilloscope you will see that the bus goes low fairly quickly (pulled down strongly to ground) but goes up fairly slowly, something like this:

As you can see in the trace to the right, there are a number of things to consider. If your pull-ups are too large you will get a lot of interference, as indicated by the red arrows in the trace, where the clock line is coupling through onto the data line, which has too high an impedance. This can often be alleviated with good layout techniques, but if you see this on your scope, consider lowering the pull-up value to hold the bus more steady. If the rise times through the pull-up are too slow for the bus speed you are using, you will have to either work on reducing the capacitance on the bus or pull up harder through a smaller resistor. Of course, you cannot just tie the bus to Vdd in the extreme, as it still needs to be pulled to 0 by the master and slaves. As a last consideration, the smaller the resistor you use, the more power you will consume while driving the bus.

Problem #9 - Multi-master

I have been asked many times to implement multi-master I2C. There are a large number of complications when you need multiple masters on an I2C bus, and this should only be attempted by true experts. Arbitrating the bus when multiple masters are pulling it down simultaneously presents a number of race conditions that are extremely hard to deal with robustly. I would like to point out just one case here as illustration and leave it at that. Typical schemes for multi-master require the master to monitor the bus while it is emitting the address byte. When the master is not pulling the bus low but reads it low, this is an indication that another master is trying to emit an address at the same time, and you as a master should then abort your transaction immediately, yielding the bus to the other master. A problem arises, though, when both masters are trying to read from the same slave device.
When this happens it is possible that both addresses match exactly and that the 2 masters start their transactions in close proximity. Due to the clock skew between the masters it is possible that, even though they are trying to read from different control registers on the slave, the slave will match only one of the 2 masters, yet both masters will think that they have the bus and that the slave is responding to their request. When this happens one master will end up receiving data from the wrong address. Consider e.g. a BME280 where you may get the pressure reading instead of humidity, causing you to react incorrectly. Like I said, there are many obscure ways multi-master can fail you, so beware when you go there.

Problem #10 - Clock Stretching

In the standard, slaves are allowed to stretch the clock by driving the clock line low after the master releases it. Clock-stretching slaves are a common cause of I2C buses becoming stuck, as the standard does not provide for timeouts. This is something where SMBUS has provided a large improvement over the basic I2C standard, although there can still be ambiguity around how long you really have to wait to ensure that all slaves have timed out. The idea with SMBUS is that you can safely mix it with non-SMBUS slaves, but this one aspect makes it unreliable to do so. In critical systems you will as a result very often see 2 I2C slave devices connected via different sets of pins, using I2C as a point-to-point communications channel instead of a bus, in order to isolate failure conditions to a single sensor.

Problem #11 - SMBUS voltage levels

In I2C, logic voltage levels depend on the bus voltage: above 70% of bus voltage for a 1 and below 30% for a 0. The problems here are numerous, resulting in different devices seeing a 0 or 1 at different levels. SMBUS devices do not use this mechanism but instead specify fixed thresholds at 0.8V and 2.1V.
These levels are often not supported by the microcontroller you are using, leaving some room for misinterpretation, especially if you add the effects of bus capacitance and the pull-up resistors on signal integrity. For more information about SMBUS and where it differs from the standard I2C specification take a look at this Wikipedia page.

Problem #12 - NAK Polling

NAK polling often comes into play when you are trying to read from or write to an I2C memory and the device is busy. These memory devices use the NAK mechanism to signal the master that it has to wait and retry the operation in a short while. The problem here is that many hardware I2C peripherals simply ignore ACKs and NAKs altogether or do not give you the required hooks to respond to them. Many vendors try to accelerate I2C operations by letting you pre-load a transaction for sending to the slave and doing all of the transmission in hardware using a state machine, but these implementations rarely have accommodations for retrying the byte if the slave were to NAK it. NAK polling also makes it very hard to use DMA for speeding up I2C transmissions, as once again you need to make a decision based on the ACK/NAK after every byte, and the hooks to make these decisions typically require an interrupt or callback at the end of every byte, which causes huge overhead.

Problem #13 - Bus Speeds

When starting to bring up an I2C bus I often see engineers starting with one sensor and working their way through them one by one. This can lead to a common problem where the first sensor is capable of high-speed transmission, e.g. 1MHz, but it takes only 1 sensor on the bus that is limited to 100kHz to cause all kinds of intermittent failures.
When you have more than 1 slave on the same bus, make sure that the bus is running at a speed that all the slaves can handle. This means that when you bring up the bus it is always a good idea to start things out at 100kHz and only increase the speed once you have communication established with all the slaves on the bus. The more slaves you have on the bus, the more likely you are to have increased bus capacitance and signal integrity problems.

In conclusion

I2C is quite a widely used standard. When I am given the choice between using I2C or something like SPI for communication with sensor devices I tend to prefer SPI for a number of reasons. It is possible to go much faster using SPI as the bus is driven hard to both 0 and 1, while the complexities of I2C and the problems outlined above inevitably raise the complexity significantly and present a number of challenges to achieving a robust system. To be clear, I am not saying I2C is not a robust protocol, just that it takes some real skill to use it in a truly robust way, and other protocols like SPI do not require the same effort to achieve a robust solution. So like the Plain White T's say, hate is a strong word, but I really really don't like I2C ... I am sure there are more ways I2C has bitten people, please share your additional problems in the comments below, or if you are struggling with I2C right now, feel free to post your problem and we will try to help you debug it!
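As a small postscript to Problem #12, here is roughly what the NAK-polling retry loop looks like in code. This is a sketch under assumptions: i2c_write_probe() is a hypothetical HAL call (issue a START plus the address byte, return true on ACK) - substitute whatever your driver provides - and the mock at the bottom exists only so the sketch is self-contained and runnable without hardware.

```c
#include <stdint.h>
#include <stdbool.h>

bool i2c_write_probe(uint8_t addr);  /* hypothetical HAL: START + address, true on ACK */

/* Poll the device until it ACKs its address again (e.g. an EEPROM
   finishing its internal write cycle), but give up after maxTries
   attempts so a dead bus cannot hang the caller forever. */
bool i2c_wait_ready(uint8_t addr, uint16_t maxTries)
{
    while (maxTries--) {
        if (i2c_write_probe(addr)) {
            return true;   /* device ACKed: ready for the next transfer */
        }
    }
    return false;          /* still NAKing: report a timeout to the caller */
}

/* Host-side mock so this compiles and runs without hardware:
   NAK the first two probes, then ACK. */
static uint16_t probeCount = 0;
bool i2c_write_probe(uint8_t addr)
{
    (void)addr;
    return (++probeCount >= 3);
}
```

The important part is the bounded retry count: an unbounded NAK-polling loop is exactly the kind of place where a stuck bus turns into a stuck firmware.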
13. I decided to write this up as bootloaders have pretty much become ubiquitous for 32-bit projects, yet I was unable to find any good information on the web about how to use linker scripts with XC32 and MPLAB-X. When you need to control where the linker will place which part of your code, you need to create a linker script which will instruct the linker where to place each section of the program. Before we get started you should download the MPLAB XC32 C/C++ Linker and Utilities User's Guide. There is also some useful information in the MPLAB XC32 C/C++ Compiler User's Guide for PIC32M MCUs; the appropriate version for your compiler should be in the XC32 installation folder under "docs". This business of linker scripts is quite different from processor to processor. I have recently been working quite a bit with the PIC32MZ2048EFM100, so I will target this device using the latest XC32 v2.15. This post will focus on what you need to do to get the tools to use your linker script. Since XC32 is basically a variant of the GNU C compiler you can find a lot of information on the web about how to write linker scripts; here are a couple of links. http://www.scoberlin.de/content/media/http/informatik/gcc_docs/ld_3.html https://sourceware.org/binutils/docs-2.17/ld/Scripts.html#Scripts

Adding a linker script

The default linker script for the PIC32MZ2048EFM100 can be found in the compiler folder at /xc32/v2.15/pic32mx/lib/proc/32MZ2048EFM100/p32MZ2048EFM100.ld. If you need a starting point that would be a good place. For MPLAB-X and XC32 the extension of the linker script does not have any meaning. The linker script itself is not compiled; it is passed into the linker at the final step of building your program.
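For the bootloader use case, the part of the script you typically end up editing is the MEMORY block. The fragment below is purely illustrative, not a working script: the region name for the application area and the 64k split are hypothetical, and while the 0x9D000000 origin and 2MB length match the PIC32MZ2048 family's program flash, you should take the real region and section definitions from p32MZ2048EFM100.ld and check them against your device's memory map.

```ld
/* Illustrative only - copy the full MEMORY and SECTIONS definitions
   from the device's default .ld file and adjust the regions. */
MEMORY
{
  /* Hypothetical split: reserve the first 64k of program flash for a
     bootloader and link the application into the remainder. */
  boot_mem          (rx) : ORIGIN = 0x9D000000, LENGTH = 0x10000
  kseg0_program_mem (rx) : ORIGIN = 0x9D010000, LENGTH = 0x1F0000
}

SECTIONS
{
  /* Application code and read-only data go into the application region. */
  .text : { *(.text*) *(.rodata*) } > kseg0_program_mem
}
```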
The command line should look something like this for a simple program:

"/Applications/microchip/xc32/v2.15/bin/xc32-gcc" -mprocessor=32MZ2048EFM100 -o dist/default/production/mine.X.production.elf build/default/production/main.o -DXPRJ_default=default -legacy-libc -Wl,--defsym=__MPLAB_BUILD=1,--script="myscript.ld",--no-code-in-dinit,--no-dinit-in-serial-mem,-Map="dist/default/production/mine.X.production.map",--memorysummary,dist/default/production/memoryfile.xml

The linker script is listed on the command line as --script="name". When you create a new project MPLAB-X will create a couple of "Logical Folders" for you. These folders are not actual folders on your file system, but files in them are sometimes treated differently, and Linker Files is a particular case of this. My best advice is never to rename or in any other way mess with these folders created for you by MPLAB-X. If you did edit configurations.xml or renamed any of these I suggest you just create a new project file, as there are so many ways this could go wrong that fixing it will probably take you longer than just re-creating it. I have seen cases where it all looks 100% correct but the IDE simply does not use the linker script, just ignoring it. The normal way to add files to a MPLAB-X project is to right-click on the logical folder you want the file to appear in and select the kind of file under the "New" menu. In this menu the file types you use often are shown as shortcuts; to see the entire list of possible files you need to select "Other..." at the bottom of the list. Unfortunately Microchip has not placed "Linker Script" in this list, so there is no way to discover through the IDE how to add a linker script. When it all goes according to plan (the happy path) you can simply right-click on "Linker Files" and add your script. This is also what the manual says to do, of course.
When you have added the file it should look like this (pay careful attention to the icon of the linker script file: it should NOT have a source code icon, it should just be a white block like this). If this is the case the program should compile just fine using the linker script; you can confirm that the script is being passed in by inspecting the linker command line.

Adding a linker script - Problems - when it all goes wrong!

I noticed in the IDE that the icon for the script was actually that of a .C source file. When this happens something has gone very wrong, and the compiler will attempt to compile your linker script as a C source file. You will end up getting an error similar to this, stating that there is "No rule to make target":

CLEAN SUCCESSFUL (total time: 51ms)
make -f nbproject/Makefile-default.mk SUBPROJECTS= .build-conf
make[2]: *** No rule to make target 'build/default/production/newfile.o', needed by 'dist/default/production/aaa.X.production.hex'. Stop.
make[1]: Entering directory '/Users/cobusve/MPLABXProjects/aaa.X'
make[2]: *** Waiting for unfinished jobs....
make -f nbproject/Makefile-default.mk dist/default/production/aaa.X.production.hex
make[2]: Entering directory '/Users/cobusve/MPLABXProjects/aaa.X'
make[1]: *** [.build-conf] Error 2
"/Applications/microchip/xc32/v2.15/bin/xc32-gcc" -g -x c -c -mprocessor=32MZ2048EFM100 -MMD -MF build/default/production/main.o.d -o build/default/production/main.o main.c -DXPRJ_default=default -legacy-libc
make: *** [.build-impl] Error 2
make[2]: Leaving directory '/Users/cobusve/MPLABXProjects/aaa.X'
nbproject/Makefile-default.mk:90: recipe for target '.build-conf' failed
make[1]: Leaving directory '/Users/cobusve/MPLABXProjects/aaa.X'
nbproject/Makefile-impl.mk:39: recipe for target '.build-impl' failed
BUILD FAILED (exit value 2, total time: 314ms)

I tried jumping through every hoop here, even did the hokey pokey, but nothing would work to get the IDE to accept my linker script!
I even posted a question on the forum here and got no help. At first I thought I would be clever and remove the script I had just added and simply re-add it to the project, but no luck there. So now I was following the instructions exactly: my project was building without the script, I right-clicked on "Linker Files", selected "Add Existing Item" and then selected my script, and once again it showed up as a source file and caused the project build to fail by trying to compile it as C code 😞 The next attempt was to remove the file, then close the IDE, open the IDE, build the project and only then add the existing file. Nope, still does not work 😞 I know MPLAB-X will from time to time cache information, and you can get rid of this by deleting everything from the project except for your source files, Makefile, configurations.xml and project.xml. I went ahead and deleted all these files, restarted the IDE, added the file again - nope - still does not work. So much for RTFM! Eventually, out of desperation, I tried to rename the file before adding it back in. Even this did not work until I got lucky - I changed the extension of the file to .gld (a commonly used extension for GNU linker files), tried to re-add the file, and this eventually worked! If you are having a hard time getting MPLAB-X to add your linker script, do not despair. You are probably not doing anything wrong! The right way to add a linker script to your project is indeed to just add it to "Linker Files" as they say; sometimes you just get unlucky due to some exotic bugs in the IDE. Just remove the file from your project, change the extension to something else (it seems like you can choose anything as long as it is different) and add the file back in, and it should work. If not, come back here and let me know and we can figure it out together :)
14. Ok, great news, I figured out what was going wrong! I was working with an old project file. The project was not using a linker script before. It turns out that MPLAB-X is doing all kinds of strange things in the background to figure out that it has to treat files in the logical folder with "name=LinkerScript" and "displayname=Linker Files" as linker scripts instead of C files, and once it has gotten itself confused about this there is no going back without recreating the entire project file. Since ours contained hundreds of source files we tried to avoid this, but alas, it turns out there is not really another way :( There is an example here https://www.microchip.com/forums/m651658.aspx on how to add the item back in. This seems to work only if you add it in AND rename the item BEFORE opening the project in MPLAB-X; if you open the project first you will be out of luck. For now you will have to do a lot of trial and error, or just re-create the project if you need to add a linker script, and even then good luck, the IDE can muck it up quite easily! I think I see a blog post coming on how to get a linker script into your MPLAB-X project. It seems to be harder than it should be! Edit: I have written up my experience in a blog entry here:
15. There is one problem we keep seeing bend people's minds over and over again, and that is race conditions under concurrency. Race conditions are hard to find because small changes in the code which subtly modify the timing can make the bug disappear entirely. Adding any debugging code such as a printf can cause the bug to mysteriously disappear. We spoke about this kind of "Heisenbug" before in the lore blog. In order for there to be a concurrency bug you have to either have multiple threads of execution (as you would have when using an RTOS) or, as we see more often in embedded systems, you are using interrupts. In this context an interrupt behaves the same as a parallel execution thread, and we have to take special care whenever both contexts of execution will be accessing the same variable. Usually this cannot be avoided; we e.g. receive data in the ISR and want to process it in the main loop. Whenever we have this kind of interaction, where shared data will be accessed by multiple contexts, we need to serialize access to the data to ensure that the two contexts will safely navigate it. You will see that a good design will minimise the amount of data that is shared between the two contexts and carefully manage this interaction.

Simple race condition example

The idea here is simply that we receive bytes in the ISR and process them in the main loop. When we receive a byte we increase a counter and when we process one we decrease the counter. When the counter is at 0 this means that we have no bytes left to process.
uint8_t bytesInBuffer = 0;

void interrupt serialPortInterrupt(void)
{
    enqueueByte(RCREG);  // Add received byte to buffer
    bytesInBuffer++;
}

void main(void)
{
    while(1)
    {
        if ( bytesInBuffer > 0 )
        {
            uint8_t newByte = dequeueByte();
            bytesInBuffer--;
            processByte(newByte);
        }
    }
}

The problem is that both the interrupt and the mainline code access the same memory here, and these interactions last for multiple instruction cycles. When an operation completes in a single cycle we call it "atomic", which means that it is not interruptable. There are a number of ways that instructions which seem to be atomic in C can take multiple machine cycles to complete. Some examples:

- Mathematical operations - these often use an accumulator or work register.
- Assignment - if I have e.g. a 32-bit variable on an 8-bit processor it can take 8 cycles or more to do x = y.
- Most pointer operations (indirect access). [This one can be particularly nasty btw.]
- Arrays [yes, if you do i[12] that actually involves a multiply!]

In fact this happens at two places here, the counter as well as the queue containing the bytes, but we will focus only on the first case for now, the counter "bytesInBuffer". Consider what would happen if the code "bytesInBuffer++" compiled to something like this:

MOVFW bytesInBuffer
ADDLW 1
MOVWF bytesInBuffer

Similarly the code to decrement the variable could look like this:

MOVFW bytesInBuffer
SUBLW 1
MOVWF bytesInBuffer

The race happens once the main execution thread is trying to decrement the variable. Once the value has been copied to the work register the race is on. If the mainline code can complete the calculation before an interrupt happens everything will work fine, but this will take 2 instructions. If an interrupt happens after the first instruction or after the 2nd instruction the data will be corrupted. Let's look at the case where there is 1 byte in the buffer and the main code starts processing this byte, but before the processing is complete another byte arrives.
Logically we would expect the counter to be incremented by one in the interrupt and decremented by one in the mainline code, so the end result should be bytesInBuffer = 1, and the newly received byte should be ready for processing. The execution will proceed something like this - we will simplify to ignore clearing and checking of interrupt flags etc. (W is the work register, I_W is the interrupt context W register):

// State before : bytesInBuffer = 1. Mainline is decrementing, next byte arrives during the operation
Mainline              Interrupt             // state
...                                         [bytesInBuffer = 1]
MOVFW bytesInBuffer                         [bytesInBuffer = 1, W=1]
SUBLW 1                                     [bytesInBuffer = 1, W=0]
                      MOVFW bytesInBuffer   [bytesInBuffer = 1, W=0, I_W=1]
                      ADDLW 1               [bytesInBuffer = 1, W=0, I_W=2]
                      MOVWF bytesInBuffer   [bytesInBuffer = 2, W=0, I_W=2]
MOVWF bytesInBuffer                         [bytesInBuffer = 0, W=0, I_W=2]
...                                         [bytesInBuffer = 0]

As you can see, we end up with bytesInBuffer = 0 instead of 1, and the newly received byte is never processed. This typically leads to a bug report saying that the serial port is either losing bytes randomly or double-receiving bytes, e.g. UART (or code) having different behaviors for different baudrates - PIC32MX - https://www.microchip.com/forums/m1097686.aspx#1097742

Deadlock Empire

When I get a new junior programmer in my team I always make sure they understand this concept really well by laying down a challenge. They have to defeat "The Deadlock Empire" and bring me proof that they have won the BossFight at the end. Deadlock Empire is a fun website which uses a series of examples to show just how hard proper serialization can be, and how even when you try to serialize access using mutexes and/or semaphores you can still end up with problems. You can try it out for yourself - the link is https://deadlockempire.github.io Even if you are a master of concurrency I bet you will still learn something new in the process!
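Here is a minimal C sketch of the classic fix for the counter race above: briefly disabling interrupts around the shared access. To keep the sketch runnable on a host, GIE is modelled as a plain variable; on a real PIC you would clear and set the actual global interrupt enable bit (or use whatever builtin your compiler provides), and the function name is mine.

```c
#include <stdint.h>
#include <stdbool.h>

static volatile uint8_t GIE = 1;  /* host-side stand-in for the global interrupt enable bit */
volatile uint8_t bytesInBuffer = 0;

/* Mainline consumer: the test-and-decrement of the shared counter is
   made atomic by masking interrupts for the few cycles it takes. */
bool tryTakeByte(void)
{
    bool haveByte = false;
    GIE = 0;                  /* enter critical section: the ISR cannot run */
    if (bytesInBuffer > 0) {
        bytesInBuffer--;      /* safe: no interrupt can slip in between */
        haveByte = true;
    }
    GIE = 1;                  /* leave the critical section as soon as possible */
    return haveByte;
}
```

Note that the same guard is needed around the queue itself; the counter is just the simplest piece of the shared state.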
Serialization

When we have to share data between 2 or more contexts it is critical that we properly serialize access to it. By this we mean that the one context should get to complete its operation on the data before the 2nd context can access it. We want to ensure that accesses to the variables happen in series and not in parallel, to avoid all of these problems. The simplest way to do this (and also the least desirable) is to simply disable interrupts while accessing the variable in the mainline code. That means that the mainline code will never be interrupted in the middle of an operation on the shared data, and so we will be safe. Something like this:

// State before : bytesInBuffer = 1. Mainline is decrementing, next byte arrives during the operation
Mainline                          Interrupt             // state
...                                                     [bytesInBuffer = 1]
BCF GIE /* disable interrupts */                        [bytesInBuffer = 1]
MOVFW bytesInBuffer                                     [bytesInBuffer = 1, W=1]
SUBLW 1                                                 [bytesInBuffer = 1, W=0]
MOVWF bytesInBuffer                                     [bytesInBuffer = 0, W=0]
BSF GIE /* enable interrupts */                         [bytesInBuffer = 0, W=0]
                                  MOVFW bytesInBuffer   [bytesInBuffer = 0, W=0, I_W=0]
                                  ADDLW 1               [bytesInBuffer = 0, W=0, I_W=1]
                                  MOVWF bytesInBuffer   [bytesInBuffer = 1, W=0, I_W=1]
...                                                     [bytesInBuffer = 1]

And the newly received byte is ready to process and our counter is correct. I am not going to go into fancy serialization mechanisms such as mutexes and semaphores here. If you are using an RTOS or are working on a fancy advanced processor then please do read up on the abilities of the processor and the mechanisms provided by the OS for atomic operations, critical sections, semaphores, mutexes and concurrent access in general.

Concurrency Checklist

We always want to explicitly go over a checklist when we are programming in any concurrent environment to make sure that we are guarding all concurrent access of shared data. My personal flow goes something like this:

Do you ever have more than one context of execution?
- Interrupts
- Threads
- Peripherals changing data (We see problems with reading/writing 16-bit timers on 8-bit PICs ALL OF THE TIME!)

List out all data that is accessed in more than one context.
- Look deep, sometimes it is subtle, e.g. accessing an array index may be calling multiply internally.
- Don't miss registers, especially 16-bit values like timers or ADC results.

Is it possible to reduce the amount of data that is shared?

Take a look at the ASM generated by the compiler to make 100% sure you understand what is happening with your data.
- Operations that look atomic in C are often not atomic in machine instructions.

Explicitly design the mechanism for serializing between the two contexts and make sure it is safe under all conditions.
- Serialization almost always causes one thread to be blocked. This will either slow down processing by blocking the other thread, or increase latency and jitter in the case of interrupts.

We only looked into the counter variable above. Of course the queue holding the data is also shared, and in my experience I see many more issues with queues being corrupted due to concurrent access than I see with simple counters, so do be careful with all shared data.

One last example

I mentioned 16-bit timers a couple of times; I have to show my favourite example of this, which happens when people write to the timer register without stopping the timer.
// Innocent enough looking code to update the Timer1 register
void updateTimer1(uint16_t value)
{
    TMR1 = value;
}

// This is more common, same problem
void updateTimer1_v2(uint16_t value)
{
    TMR1H = value >> 8;
    TMR1L = value & 0xFF;
}

With the code above the compiler is generally smart enough not to do actual right shifts or masking, realizing that you are working with the lower and upper byte, and this code compiles in both cases to something looking like this:

MOVFW value+1  // High byte
MOVWF TMR1H
MOVFW value    // Low byte
MOVWF TMR1L

And people always forget that when the timer is still running this can have disastrous results, as follows, when the timer increments in the middle of it all:

// We called updateTimer1(0xFFF0) - expecting a period of 16 cycles to the next overflow
// We show the register value to the right for just one typical error case
// The timer ticks on each instruction
...                [TMR1 = 0x00FD]
MOVFW value+1      [TMR1 = 0x00FE]
MOVWF TMR1H        [TMR1 = 0xFFFF]
MOVFW value        [TMR1 = 0x0000]
MOVWF TMR1L        [TMR1 = 0x00F0]
...                [TMR1 = 0x00F1]
...                [TMR1 = 0x00F2]

This case is always hard to debug because you can only catch it failing when the update overlaps with the low byte overflowing into the high byte, which happens only 1 in every 256 timer cycles, and since the update takes 2 cycles we have only about a 0.8% chance of catching it in the act. As you can see, depending on whether you clear the interrupt flag after this operation or not, you can end up either getting an immediate interrupt due to the overflow in the middle of the operation, or a period which vastly exceeds what you are expecting (by 65280 cycles!). Of course if the low byte did not overflow in the middle of this, the high byte is unaffected and we get exactly the expected behavior. When this happens we always see issues reported on the forums that sound something like "My timer is not working correctly.
Every once in a while I get a really short or really long period". When I see these issues posted in the future I will try to remember to come back and update this page with links, just for fun 🙂
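To close the loop on the timer example: the usual fix is to make sure the two byte writes cannot straddle a tick, the simplest way being to stop the timer around the update. A sketch of that, with the registers modelled as plain variables so it runs on a host (on real hardware these stand in for TMR1 and the timer-on control bit, and the function name is mine):

```c
#include <stdint.h>

/* Host-side stand-ins for the hardware registers. */
static volatile uint16_t TMR1   = 0;
static volatile uint8_t  TMR1ON = 1;   /* timer run/stop control bit */

/* Stop the timer so the two-cycle write cannot overlap a low-byte
   overflow, write both bytes, then restart. The handful of ticks lost
   while stopped can be compensated for by adding them to 'value'. */
void updateTimer1Safe(uint16_t value)
{
    uint8_t wasRunning = TMR1ON;
    TMR1ON = 0;        /* freeze the count */
    TMR1   = value;    /* both bytes written while nothing can increment */
    TMR1ON = wasRunning;
}
```

An alternative that avoids stopping the timer is to write the high byte, then the low byte, then re-check and rewrite the high byte if an overflow slipped in between; but the stop-and-restart form is much easier to get right.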