
The Problems of I2C - common problems and errors with using I2C


Orunmila


Hate is a strong word
But I really, really, really don't like you
Now that it's over
I don't even know what I liked about you
           - Plain White T's

I2C is such a widely used standard, yet it has caused me endless pain and suffering. In principle I2C is supposed to be simple and robust, a mechanism for "Inter-Integrated Circuit" communication. 

I am hoping that this summary of the battle scars I have picked up from using I2C might just save you some time and suffering. I have found that despite a couple of typical initial hiccups it is generally not that hard to get I2C communication going, but making it robust and reliable can prove to be quite a challenge.

Problem #1 - Address Specification

I2C data is not sent as a plain bit-stream but in a specific packet format: framing (start and stop conditions) encloses a sequence of 8-bit bytes, the first of which carries the address, and each byte is followed by an ACK or NAK bit.

The first byte is supposed to be the address, but right off the bat you have to deal with the first special case: that byte holds a 7-bit address combined with the R/W bit, and how to combine the two always causes confusion.

There is no consistency in datasheets of I2C slave devices for specifying the device address, and even worse most vendors fail to specify which approach they use, leaving users to figure it out through trial and error. This has become bad enough that I would not recommend trying to implement I2C without an oscilloscope in hand to resolve these kinds of guessing games.

Let's say the 7-bit device address is 0x76 (as on the ever-popular Bosch Sensortec BME280).

Sometimes this will be specified simply as 0x76, but the API of the software library, to save itself the work of shifting this value left by 1 and masking in the R/W bit, will often require you to pass in 0xEC as the address (0x76 shifted left by one).

Sometimes the vendor will specify 0xEC as the "write" address and 0xED as the "read" address.


To add insult to injury, your bus analyzer or Saleae will typically show the first 8 bits as a single hex value, so you will never see the actual 7-bit address on the screen. You are left bit-twiddling in your head constantly while trying to make sense of the traces.
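One way to stop the guessing is to keep every address in your code as the 7-bit value and convert in exactly one place. The following is a minimal sketch in C; the helper names are hypothetical and not part of any vendor library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine a 7-bit slave address with the R/W bit into the first byte on the
 * wire: address in bits 7..1, R/W in bit 0 (1 = read, 0 = write). */
static inline uint8_t i2c_addr_byte(uint8_t addr7, bool read)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (read ? 1u : 0u));
}

/* Recover the 7-bit address from the 8-bit value a bus analyzer displays. */
static inline uint8_t i2c_addr7_from_byte(uint8_t wire_byte)
{
    return (uint8_t)(wire_byte >> 1);
}

/* True if the displayed byte is a read (R/W bit set). */
static inline bool i2c_byte_is_read(uint8_t wire_byte)
{
    return (wire_byte & 1u) != 0;
}
```

With the BME280 example, `i2c_addr_byte(0x76, false)` gives 0xEC and `i2c_addr_byte(0x76, true)` gives 0xED, while `i2c_addr7_from_byte(0xEC)` turns the value shown by an analyzer back into 0x76.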

Problem #2 - Multiple Addresses

To add to the confusion from above, many devices (like the BME280) can respond on more than one address. The BME280 datasheet specifies that if you pull down the otherwise unused SDO pin, the device address will be 0x76, but if you pull the pin up it will be 0x77.

I have seen many users leave this "unused" pin floating in their layouts, causing the device to switch unpredictably between the two addresses at runtime and its behavior to look erratic. This also, of course, doubles the number of addresses the device may end up responding to, and the fact that exactly 2 addresses are specified fools a lot of people into thinking that the vendor is actually giving a read and a write address as described above. All of this adds to the guessing game of what the actual device address may be.

To add to the confusion, most devices have internal registers, and these have their own addresses, so it is very easy to get confused about what should go in the address byte. It is not the register address; it is the slave address. To read a register, you first do a "write" whose data byte is the register address, and only then do the "read". OK, if that is not confusing to you, I salute you, sir!
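Written out as the sequence of things the master puts on the bus, a register read looks roughly like the sketch below. The event types and function name are hypothetical, chosen only to show which byte carries the slave address and which carries the register address; register 0xD0 is used as an example because it is the BME280 chip-ID register.

```c
#include <stddef.h>
#include <stdint.h>

/* Kinds of things that appear on the bus, in order. */
enum i2c_event_kind { EV_START, EV_RESTART, EV_BYTE, EV_READ_BYTES, EV_STOP };

struct i2c_event {
    enum i2c_event_kind kind;
    uint8_t value; /* byte value for EV_BYTE, byte count for EV_READ_BYTES */
};

/* Fill `ev` with the sequence a master emits to read `count` bytes starting
 * at register `reg` of the slave at 7-bit address `addr7`. Returns the number
 * of events written (always 7). The slave address only appears in the two
 * address bytes; the register address is an ordinary data byte of the write. */
static size_t i2c_reg_read_sequence(uint8_t addr7, uint8_t reg, uint8_t count,
                                    struct i2c_event ev[7])
{
    size_t n = 0;
    ev[n++] = (struct i2c_event){EV_START, 0};
    ev[n++] = (struct i2c_event){EV_BYTE, (uint8_t)(addr7 << 1)};       /* slave address + W */
    ev[n++] = (struct i2c_event){EV_BYTE, reg};                          /* register address as data */
    ev[n++] = (struct i2c_event){EV_RESTART, 0};                         /* repeated start (Problem #6) */
    ev[n++] = (struct i2c_event){EV_BYTE, (uint8_t)((addr7 << 1) | 1u)}; /* slave address + R */
    ev[n++] = (struct i2c_event){EV_READ_BYTES, count};                  /* slave sends the data */
    ev[n++] = (struct i2c_event){EV_STOP, 0};
    return n;
}
```

For `i2c_reg_read_sequence(0x76, 0xD0, 1, ev)` the bytes on the wire are 0xEC, 0xD0 and then 0xED: the 0xD0 never appears in an address byte.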

Problem #3 - 10-bit address mode


As if there was not enough address confusion already, the limitation of 7-bit addressing (only 128 possible addresses, 16 of which are reserved) led to the inclusion of an extension called 10-bit addressing.

A 10-bit address is actually sent as a predefined 5-bit prefix (11110), followed by the 2 most significant bits of the 10-bit address, then the R/W bit. This first byte is ACKed by every device on the bus that uses 10-bit addressing and shares the same 2 MSBs. After that come the remaining 8 bits of the address, followed by the ACK of the one device whose full address matches.

So once again there is no standard way to represent the address. Let's say the device has the 10-bit address 0x123; how would this be specified now? The vendor could say 0x123 (where only 10 of the 12 bits implied are the address), or they could include the prefix and specify it as 0xF223. Of course, that number contains the R/W bit somewhere in the middle, so they may specify a "write" and a "read" address as 0xF223 and 0xF323, or they could right-shift the high byte to show it as a normal 7-bit address, removing the R/W bit, and say it is 0x7923.
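Assuming the layout described above (prefix 11110, address bits 9..8, R/W, then the low 8 bits), the two address bytes can be computed as in this sketch; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* First byte of a 10-bit address: fixed prefix 11110, then address bits 9..8,
 * then R/W. Every 10-bit slave with matching bits 9..8 ACKs this byte. */
static uint8_t i2c10_first_byte(uint16_t addr10, bool read)
{
    return (uint8_t)(0xF0u | ((addr10 >> 7) & 0x06u) | (read ? 1u : 0u));
}

/* Second byte: the low 8 address bits, ACKed only by the slave whose full
 * 10-bit address matches. */
static uint8_t i2c10_second_byte(uint16_t addr10)
{
    return (uint8_t)(addr10 & 0xFFu);
}
```

For 0x123 this yields 0xF2 (write) or 0xF3 (read) followed by 0x23, and shifting 0xF2 right by one gives the 0x79 seen in the 0x7923 notation.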

I think you get the picture here, lots of room for confusion and we have not even received our first ACK yet!

Problem #4 - Resetting during Debugging

Since I2C is essentially transaction/packet based and it does not include timeouts in the specification (SMBUS does of course, but most slave sensors conform to I2C only) there is a real chance that you are going to reset your host processor (or bus master) in the middle of such a transaction. This happens as easily as re-programming the processor during development (which you will likely be doing a lot).

The problem that tends to catch everybody at some point is that a hardware reset of your host processor is entirely invisible to the slave device which does not lose power when you toggle the master device's reset pin! The result is that the slave thinks that it is in the middle of an I2C transaction and awaits the expected number of master clock pulses to complete the current transaction, but the master thinks that it should be creating a start condition on the bus. This often leads to the slave holding the data line low and the master unable to generate a start condition on the bus.

When this happens you will lose the ability to communicate with the I2C sensor/slave and start debugging your code to find out what has broken. In reality, there is nothing wrong with your code and simply removing and re-applying the power to the entire board will cause both the master and slave to be reset, leaving you able to communicate again.

Of course, re-applying the power typically causes the device to start running, and if you want to debug you will have to attach the debugger which may very well leave you in a locked-up state once again.

The only way around this is to use your oscilloscope or Saleae all of the time and, whenever the behavior seems strange, stare very carefully at what is happening on the data line. Is the address going out? Is the start condition recognized? Is the slave responding as it should? If not, you are stuck and need to reset the slave device somehow.
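If the pins can be read as GPIO inputs before the I2C peripheral is initialized, software can at least tell the common cases apart at startup instead of leaving it all to the scope. A minimal sketch, assuming the caller samples the two pin levels itself; the names are hypothetical:

```c
#include <stdbool.h>

enum i2c_bus_state {
    I2C_BUS_IDLE,          /* both lines high: a START can be issued */
    I2C_BUS_SDA_STUCK_LOW, /* a slave probably still thinks a transaction is running */
    I2C_BUS_SCL_STUCK_LOW  /* the clock is held low: extra clocks cannot help */
};

/* Classify the bus from raw pin levels sampled with the I2C peripheral
 * disabled and both pins configured as inputs. */
static enum i2c_bus_state i2c_classify_bus(bool scl_high, bool sda_high)
{
    if (!scl_high)
        return I2C_BUS_SCL_STUCK_LOW;
    if (!sda_high)
        return I2C_BUS_SDA_STUCK_LOW;
    return I2C_BUS_IDLE;
}
```

An SDA-low result is the case the clocking recovery in Problem #5 is meant for; an SCL-low result usually means a slave is holding the clock, which extra clocks from the master cannot fix.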

Problem #5 - Stuck I2C bus

The situation described in #4 above is often referred to as a "stuck bus" condition. I have tried various strategies in the past to robustly recover from such a stuck bus condition programmatically, but they all come with a number of compromises. 

Firstly, slave devices are essentially allowed to clock-stretch indefinitely, and if a slave device's state machine goes bonkers, a single slave can hold the entire bus hostage indefinitely; the only thing you can then do is remove the power from all slave devices. This is not a very common failure mode, but it is definitely possible and needs addressing in robust or critical systems.

Often getting the bus "unstuck" is as simple as providing the slave device enough clocks to convince it that the last transaction is complete. Some slaves behave well, and after clocking them 8 times and providing a NAK they will abort their current transaction. I have seen slaves, especially I2C memories, where you have to supply more than 8 clocks, e.g. 32 clocks, to be certain that the transaction terminates. I have also seen specialized slave devices that ignore your NAKs and insist on sending even more data, e.g. 128 or more bits, before giving up on an interrupted transaction.

The nasty part about getting an I2C bus "unstuck" is that you usually cannot use the I2C peripheral itself to do it. Typically you will need to disable the peripheral, switch the pins to GPIO mode and bit-bang the clocks you need out of the port, after which you re-initialize the I2C peripheral and try the next transaction; if this fails, rinse and repeat until you succeed. This, of course, is expensive in terms of code space, especially on small 8-bit implementations.
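The bit-banged recovery can be sketched as follows, assuming open-drain GPIO pins behind a small set of hypothetical hooks (`set_scl`, `set_sda`, `get_sda`, `delay_half_bit`). It clocks SCL until the slave releases SDA, up to a caller-chosen limit, never drives SDA during those clocks so that any ACK slot reads as a NAK, and finishes with a STOP condition. The simulated slave at the end exists only to try the routine on a desktop machine.

```c
#include <stdbool.h>

/* Hypothetical GPIO hooks: the I2C peripheral is disabled and both pins are
 * open-drain GPIOs. "high" releases the line, "low" drives it low. */
struct i2c_gpio {
    void *ctx;
    void (*set_scl)(void *ctx, bool high);
    void (*set_sda)(void *ctx, bool high);
    bool (*get_sda)(void *ctx);
    void (*delay_half_bit)(void *ctx); /* e.g. 5 us for 100 kHz */
};

/* Clock SCL until the slave releases SDA (at most max_clocks pulses), then
 * issue a STOP. Returns the number of clocks used, or -1 if SDA stayed low. */
static int i2c_bus_recover(const struct i2c_gpio *g, int max_clocks)
{
    int clocks = 0;
    g->set_sda(g->ctx, true); /* master leaves SDA released: ACK slots read as NAK */
    g->set_scl(g->ctx, true);
    while (!g->get_sda(g->ctx)) {
        if (clocks == max_clocks)
            return -1;
        g->set_scl(g->ctx, false);
        g->delay_half_bit(g->ctx);
        g->set_scl(g->ctx, true);
        g->delay_half_bit(g->ctx);
        clocks++;
    }
    /* STOP condition: SDA goes low -> high while SCL is high. */
    g->set_scl(g->ctx, false);
    g->delay_half_bit(g->ctx);
    g->set_sda(g->ctx, false);
    g->delay_half_bit(g->ctx);
    g->set_scl(g->ctx, true);
    g->delay_half_bit(g->ctx);
    g->set_sda(g->ctx, true);
    g->delay_half_bit(g->ctx);
    return clocks;
}

/* Simulated slave for off-target testing: holds SDA low for `hold_clocks`
 * more SCL rising edges and records whether a STOP was seen afterwards. */
struct sim_slave { bool scl, sda_master; int hold_clocks; bool stop_seen; };

static void sim_set_scl(void *c, bool high)
{
    struct sim_slave *s = c;
    if (high && !s->scl && s->hold_clocks > 0)
        s->hold_clocks--; /* one rising edge clocked out */
    s->scl = high;
}

static void sim_set_sda(void *c, bool high)
{
    struct sim_slave *s = c;
    if (high && !s->sda_master && s->scl && s->hold_clocks == 0)
        s->stop_seen = true; /* SDA rose while SCL was high */
    s->sda_master = high;
}

static bool sim_get_sda(void *c)
{
    struct sim_slave *s = c;
    return s->sda_master && s->hold_clocks == 0; /* wired-AND of both drivers */
}

static void sim_delay(void *c) { (void)c; }
```

A limit of 9 clocks covers well-behaved slaves; following the experience above, memories may need 32 or more, and a slave that still holds SDA low afterwards (a return of -1) may only respond to a power cycle. Checking SCL after each release would also catch a slave that is stretching the clock; that is omitted here.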

Problem #6 - Required Repeated Start conditions

The R/W bit comes back to haunt us for this one. The presence of this bit implies that all transactions on I2C should be uni-directional, that is they must either read or write, but in practice, things are not that simple. Typically a sensor or memory will have a number of register locations inside of the device and you will have to "write" to the device to specify which location you wish to address, followed by "reading" from the device to get the data. The problem with a bus is that something may interrupt you between these two operations that form one larger transaction. 

In order to overcome this limitation, I2C allows you to concatenate 2 I2C operations into a single transaction by omitting the stop condition between them. So you can do the write operation and instead of completing it with a stop condition on the bus you can follow with a second start condition and the latter half of the operation, terminating the whole thing with a stop condition only when you are done.

This is called a "repeated start" condition and looks as follows (from the BME280 datasheet).

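[Figure: BME280 datasheet, register read using a repeated start, with each ACK marked as ACKS (by the slave) or ACKM (by the master)]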

It can often be quite a challenge to generate such a repeated start condition. Many I2C drivers require you to specify read or write, a buffer pointer and a number of bytes, and give you no option to omit the stop condition. At the same time, many slave devices reset their state machines at a stop condition, so without a repeated start it is not possible to communicate with these devices at all.
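On Linux, for example, the i2c-dev interface offers the `I2C_RDWR` ioctl, which takes several messages and sends them as one combined transaction, with a repeated start between messages and a single stop at the end. A minimal sketch of a register read with it follows; error handling is reduced to returning `-errno`, and the function name is our own.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/i2c.h>
#include <linux/i2c-dev.h>

/* Read `len` bytes starting at register `reg` in one combined transfer:
 * START, write [reg], repeated START, read [len bytes], STOP.
 * `fd` is an open /dev/i2c-N descriptor. Linux expects the plain 7-bit
 * address here (e.g. 0x76), not the shifted 0xEC. */
static int i2c_read_regs(int fd, uint16_t addr7, uint8_t reg,
                         uint8_t *buf, uint16_t len)
{
    struct i2c_msg msgs[2] = {
        { .addr = addr7, .flags = 0,        .len = 1,   .buf = &reg }, /* register pointer */
        { .addr = addr7, .flags = I2C_M_RD, .len = len, .buf = buf  }, /* data from slave */
    };
    struct i2c_rdwr_ioctl_data xfer = { .msgs = msgs, .nmsgs = 2 };

    if (ioctl(fd, I2C_RDWR, &xfer) < 0)
        return -errno;
    return 0;
}
```

`fd` would come from something like `open("/dev/i2c-1", O_RDWR)`; the bus number depends on the board, and the adapter driver has to support plain I2C transfers for this to work.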

Of course, I should also mention that the requirement to send the slave address twice for these transactions significantly reduces the throughput you can get through the bus.

Problem #7 - What is Ack and Nak supposed to be?

This brings us to the next problem. It is quite clear that the address is ACKed by the slave, but when you are reading data, what exactly are the semantics of the ACK/NAK? The BME280 datasheet is somewhat unusual in that its figure in #6 above clearly distinguishes whether the ACK is generated by the master or the slave (ACKM vs ACKS), but from the specification it is not immediately clear. If I read data from a slave, who is supposed to provide the ACK at the end of each data byte: the master or the slave?

What would be the purpose of the master acknowledging data sent by the slave? Clearly the master is alive, as it is generating clocks, and the slave may be sending all 1s, which means it does not touch the bus at all. So what is the slave supposed to do if the master NAKs a byte? And in the other direction, how would a slave decide to NAK your data? There is no checksum or CRC, so it has no way to determine whether the data is correct. None of this is clearly specified anywhere.

To add to the confusion, I have seen people spend countless hours looking for the bug in their BME280 code that causes the last data byte to get a NAK! On the bus analyzer or oscilloscope you will see that every byte is followed by an ACK except the last one, which is followed by a NAK. Most people interpret this NAK as an indication that something is wrong, but no: look carefully at the image from the datasheet in section #6 above! Each byte received by the master is followed by an ACKM (ACKed by the master) EXCEPT for the last byte, which the master does not ACK, so a NAK precedes the stop condition!

To make this even harder, most I2C hardware peripherals will not allow you fine-grained control of whether the master will ACK or NAK. Very often the peripheral will just blithely ack every byte that it reads from the slave regardless.
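Where you do control the ACK bit (for instance when bit-banging the master), the rule from the BME280 figure reduces to a one-line decision. A sketch, assuming a hypothetical `read_byte` hook that clocks in one byte and then sends the requested ACK or NAK:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The master ACKs every byte it receives except the last one; the NAK on the
 * final byte tells the slave to release SDA so the master can send STOP. */
static bool i2c_master_should_ack(size_t index, size_t count)
{
    return index + 1 < count;
}

/* Hypothetical low-level hook: clock in one byte from the slave, then pull
 * SDA low for the ACK bit (ack = true) or leave it high (NAK). */
typedef uint8_t (*i2c_read_byte_fn)(void *ctx, bool ack);

static void i2c_master_receive(i2c_read_byte_fn read_byte, void *ctx,
                               uint8_t *buf, size_t count)
{
    for (size_t i = 0; i < count; i++)
        buf[i] = read_byte(ctx, i2c_master_should_ack(i, count));
    /* the caller issues STOP (or a repeated START) next */
}
```

Reading 3 bytes therefore produces ACK, ACK, NAK, which is exactly the pattern that looks like an error on the analyzer.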

Problem #8 - Pull-up resistors and bus capacitance

The I2C bus is designed to be driven only through open-drain connections pulling the lines low; it is pulled up by a pair of pull-up resistors (one on the clock line and one on the data line). I have seen many a young engineer struggle with unreliable I2C communication due to missing or incorrectly sized pull-up resistors. Yes, it is possible to communicate even without the resistors thanks to parasitic pull-ups, but these are much larger than required: they pull weakly enough to get the bus only high-ish, which under some conditions is still enough to provoke an ACK from a slave device.

There is no clear specification of what the size of these pull-up resistors should be, and for good reason, but this causes a lot of uncertainty. The I2C specification does specify that the maximum bus capacitance should be 400 pF. This is a pretty tricky requirement to meet on a large PCB with a number of devices on the bus, and it is often overlooked, so it is common to encounter boards whose bus capacitance exceeds the official specification.

In the end, the pull-up needs to be strong enough (that is small enough) to pull the bus to Vdd fast enough to communicate at the required bus speed (typically 100kHz or 400kHz). The higher the bus capacitance is the stronger you will have to pull up in order to bring the bus to Vdd in time. If you look at the Oscilloscope you will see that the bus goes low fairly quickly (pulled down strongly to ground) but goes up fairly slowly, something like this:

[Figure: oscilloscope trace of SCL and SDA showing fast falling and slow rising edges, with red arrows marking where clock edges couple onto the data line]

As you can see in the trace, there are a number of things to consider. If your pull-ups are too large, you will get a lot of interference, as indicated by the red arrows where the clock line couples through onto the data line, which has too high an impedance. This can often be alleviated with good layout techniques, but if you see this on your scope, consider lowering the pull-up value to hold the bus more steady.

If the rise times through the pull-up are too slow for the bus speed you are using you will have to either work on reducing the capacitance on the bus or pulling up harder through a smaller resistor. Of course, you cannot just tie the bus to Vdd in the extreme as it still needs to be pulled to 0 by the master and slaves.

As a last consideration, the smaller the resistor you use, the more power you will consume while driving the bus.
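The specification does not give a single value, but NXP's I2C-bus specification (UM10204) does give bounds: Rp(min) = (VDD - VOL(max)) / IOL, so that a driver sinking IOL can still pull the line below VOL(max), and Rp(max) = tr / (0.8473 × Cb), where tr is the maximum allowed rise time and 0.8473 = ln(0.7/0.3) is the number of RC time constants needed to rise from 30% to 70% of VDD. A small C sketch of both formulas (function names are hypothetical):

```c
/* Lower bound: the strongest pull-up a driver sinking iol_amps can still
 * pull down to vol_max. */
static double i2c_rp_min_ohms(double vdd, double vol_max, double iol_amps)
{
    return (vdd - vol_max) / iol_amps;
}

/* Upper bound: the weakest pull-up that still rises from 0.3*VDD to 0.7*VDD
 * within tr_seconds on a bus of cb_farads. 0.8473 = ln(0.7 / 0.3). */
static double i2c_rp_max_ohms(double tr_seconds, double cb_farads)
{
    return tr_seconds / (0.8473 * cb_farads);
}
```

For example, with VDD = 3.3 V, VOL(max) = 0.4 V at IOL = 3 mA and a Fast-mode rise time of 300 ns, a bus capacitance of 200 pF allows roughly 967 Ω to 1770 Ω. At 400 pF the maximum drops to about 885 Ω, below that minimum, so with these 3 mA drivers there is no valid value and you need less capacitance, stronger drivers or a lower bus speed.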

Problem #9 - Multi-master

I have been asked many times to implement multi-master I2C. There are a large number of complications when you need multiple masters on an I2C bus and this should only be attempted by true experts. Arbitrating the bus when multiple masters are pulling it down simultaneously presents a number of race conditions that are extremely hard to robustly deal with.

I would like to just point out one case here as illustration and leave it at that. 

Typical schemes for multi-master will require the master to monitor the bus while it is emitting the address byte. When the master is not pulling the bus low, but it reads low this is an indication that another master is trying to emit an address at the same time and you as a master should then abort your transaction immediately, yielding the bus to the other master. A problem arises though when both masters are trying to read from the same slave device. When this happens it is possible that both addresses match exactly and that the 2 masters start their transactions in close proximity. 

Because of clock skew between the masters, if they are trying to read from different registers on the slave, the slave may follow only one of the two masters, while both masters believe that they own the bus and that the slave is responding to their request.

When this happens, one master ends up receiving incorrect data from the wrong register. Consider, e.g., a BME280, where you may get the pressure reading instead of the humidity, causing you to react incorrectly.
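To see why identical address bytes are the dangerous case, here is a sketch of the wired-AND comparison described above for two masters sending one byte. It assumes perfectly aligned clocks, which real masters do not have, and the function name is hypothetical.

```c
#include <stdint.h>

/* Wired-AND arbitration on one byte, MSB first. Each master compares what it
 * sends with what it reads back; a master that sends a 1 but reads a 0 has
 * lost and must back off. Returns 0 if neither master lost during this byte,
 * 1 if master A lost, 2 if master B lost. */
static int i2c_arbitrate_byte(uint8_t a, uint8_t b)
{
    for (int bit = 7; bit >= 0; bit--) {
        int bit_a = (a >> bit) & 1;
        int bit_b = (b >> bit) & 1;
        int line = bit_a & bit_b; /* open drain: any 0 pulls the line low */
        if (bit_a != line)
            return 1;
        if (bit_b != line)
            return 2;
    }
    return 0;
}
```

Two different address bytes such as 0xEC and 0xEE are resolved at the first differing bit (here master B loses), but for the same byte the function returns 0: neither master learns that the other exists.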

Like I said there are many obscure ways multi-master can fail you, so beware when you go there.

Problem #10 - Clock Stretching

In the standard, slaves are allowed to stretch the clock by holding the clock line low after the master releases it. Clock-stretching slaves are a common cause of I2C buses getting stuck, as the standard does not provide for timeouts. This is one area where SMBUS has provided a large improvement over the basic I2C standard, although there can still be ambiguity about how long you really have to wait to ensure that all slaves have timed out. The idea with SMBUS is that you can safely mix SMBUS and non-SMBUS slaves on one bus, but this one aspect makes it unreliable to do so.

In critical systems, you will as a result very often see 2 I2C slave devices connected via different sets of pins, using I2C as a point to point communications channel instead of a bus in order to isolate failure conditions to a single sensor.
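Even on plain I2C, a master can impose its own limit on how long it waits for a stretched clock. The sketch below borrows the SMBus tTIMEOUT maximum of 35 ms (by which time SMBus slaves that detected the timeout must have reset) as that limit; the hook names and the limit are assumptions, not something I2C itself defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: limit borrowed from the SMBus tTIMEOUT(max) of 35 ms. */
#define STRETCH_LIMIT_MS 35u

/* Unsigned subtraction keeps this correct when the counter wraps around. */
static bool i2c_stretch_timed_out(uint32_t scl_low_since_ms, uint32_t now_ms)
{
    return (uint32_t)(now_ms - scl_low_since_ms) >= STRETCH_LIMIT_MS;
}

/* Hypothetical platform hooks. */
typedef bool (*scl_is_high_fn)(void);
typedef uint32_t (*millis_fn)(void);

/* After releasing SCL, wait for it to actually go high. Returns false on
 * timeout, in which case the caller should treat the bus as stuck. */
static bool i2c_wait_scl_high(scl_is_high_fn scl_is_high, millis_fn millis)
{
    uint32_t start = millis();
    while (!scl_is_high()) {
        if (i2c_stretch_timed_out(start, millis()))
            return false;
    }
    return true;
}
```

A `false` return would then hand over to the stuck-bus handling from Problem #5 rather than waiting forever.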


Problem #11 - SMBUS voltage levels

In I2C, the logic-level thresholds depend on the bus voltage: above 70% of the bus voltage is a 1 and below 30% is a 0. The problems here are numerous, resulting in different devices seeing a 0 or 1 at different levels. SMBUS devices do not use this mechanism but instead specify fixed thresholds at 0.8 V and 2.1 V. These levels are often not supported by the microcontroller you are using, leaving some room for misinterpretation, especially once you add the effects of bus capacitance and the pull-up resistors on signal integrity.
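The difference between the two schemes is easy to show with a small classifier. This sketch uses the 30%/70% I2C thresholds and the 0.8 V/2.1 V SMBUS thresholds from the paragraph above; the function and enum names are hypothetical.

```c
enum logic_level { LEVEL_LOW, LEVEL_UNDEFINED, LEVEL_HIGH };

/* I2C: thresholds scale with the supply voltage. */
static enum logic_level i2c_level(double v, double vdd)
{
    if (v < 0.3 * vdd) return LEVEL_LOW;
    if (v > 0.7 * vdd) return LEVEL_HIGH;
    return LEVEL_UNDEFINED;
}

/* SMBUS: fixed thresholds, independent of the supply voltage. */
static enum logic_level smbus_level(double v)
{
    if (v < 0.8) return LEVEL_LOW;
    if (v > 2.1) return LEVEL_HIGH;
    return LEVEL_UNDEFINED;
}
```

On a 3.3 V bus, 2.2 V is undefined for I2C (the high threshold is 2.31 V) but a valid 1 for SMBUS, while 0.9 V is a valid I2C 0 but undefined for SMBUS, so the same waveform can be read differently by devices following different rules.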

For more information about SMBUS and where it differs from the standard I2C specification, take a look at the SMBus Wikipedia page.

Problem #12 - NAK Polling

NAK polling often comes into play when you are trying to read from or write to an I2C memory and the device is busy. These memory devices will use the NAK mechanism to signal the master that he has to wait and retry the operation in a short while.

The problem here is that many hardware I2C peripherals simply ignore ACKs and NAKs altogether, or do not give you the required hooks to respond to them. Many vendors try to accelerate I2C operations by letting you pre-load a transaction for sending to the slave and doing all of the transmission in hardware using a state machine, but these implementations rarely have accommodations for retrying a byte if the slave NAKs it.

NAK-polling also makes it very hard to use DMA for speeding up I2C transmissions as once again you need to make a decision based on the Ack/Nak after every byte, and the hooks to make these decisions typically require an interrupt or callback at the end of every byte which causes huge overhead.
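In software, NAK polling is just a retry loop around the address byte. A sketch, assuming a hypothetical `probe` hook that sends START, the address byte and STOP and reports whether the slave ACKed; the fake EEPROM (at an example address of 0x50) only exists to exercise the loop off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hook: START + address byte + STOP; true if the slave ACKed. */
typedef bool (*i2c_probe_fn)(void *ctx, uint8_t addr_byte);

/* Keep addressing the device until it ACKs, e.g. after starting an EEPROM
 * write cycle. Returns the attempt that succeeded, or -1 after max_attempts. */
static int i2c_nak_poll(i2c_probe_fn probe, void *ctx, uint8_t addr7,
                        int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (probe(ctx, (uint8_t)(addr7 << 1))) /* write address byte */
            return attempt;
        /* a short delay here avoids hammering the bus */
    }
    return -1;
}

/* Fake slave for trying this off-target: NAKs while "busy". */
struct fake_eeprom { uint8_t addr7; int busy_polls; };

static bool fake_eeprom_probe(void *ctx, uint8_t addr_byte)
{
    struct fake_eeprom *e = ctx;
    if ((addr_byte >> 1) != e->addr7)
        return false;       /* wrong address: nobody ACKs */
    if (e->busy_polls > 0) {
        e->busy_polls--;
        return false;       /* still busy: NAK */
    }
    return true;            /* write cycle finished: ACK */
}
```

A loop like this is exactly what is hard to express with a pre-loaded hardware state machine or DMA, because the decision to retry depends on the ACK bit of each attempt.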

Problem #13 - Bus Speeds

When bringing up an I2C bus, I often see engineers start with one sensor and work their way through the rest one by one. This can lead to a common problem: the first sensor may be capable of fast transmission, e.g. 1 MHz, but a single sensor on the bus that is limited to 100 kHz is enough to cause all kinds of intermittent failures.

When you have more than 1 slave on the same bus make sure that the bus is running at a speed that all the slaves can handle, this means that when you bring up the bus it is always a good idea to start things out at 100kHz and only increase the speed once you have communication established with all the slaves on the bus.
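The rule amounts to taking the minimum over all slaves. A trivial sketch, assuming you have collected each slave's maximum SCL frequency from its datasheet (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* The bus can only run as fast as its slowest slave. Returns 0 for an empty
 * list. */
static uint32_t i2c_max_bus_hz(const uint32_t *slave_max_hz, size_t n)
{
    uint32_t hz = UINT32_MAX;
    for (size_t i = 0; i < n; i++)
        if (slave_max_hz[i] < hz)
            hz = slave_max_hz[i];
    return n ? hz : 0;
}
```

For slaves rated 1 MHz, 400 kHz and 100 kHz this gives 100 kHz; bus capacitance and signal integrity may force you lower still.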

The more slaves you have on the bus the more likely you will be to have an increased bus capacitance and signal integrity problems.

In conclusion

I2C is quite a widely used standard. When I am given the choice between I2C and something like SPI for communicating with sensor devices, I tend to prefer SPI for a number of reasons. It is possible to go much faster using SPI, as the bus is driven hard to both 0 and 1, and the complexities of I2C and the problems outlined above inevitably raise the complexity of a design significantly and present a number of challenges to achieving a robust system. To be clear, I am not saying I2C is not a robust protocol, just that it takes some real skill to use it in a truly robust way, while other protocols like SPI do not require the same effort to achieve a robust solution.

So like the Plain White T's say, hate is a strong word, but I really really don't like I2C ...

I am sure there are more ways I2C has bitten people, please share your additional problems in the comments below, or if you are struggling with I2C right now, feel free to post your problem and we will try to help you debug it!

 

 

Recommended Comments

Problem #14: I2C bus repeaters.

A lot of them use minuscule voltage shifts to determine which end is driving the bus so they can drive the signal the right way. If you get them to work reliably, go out and buy some lottery tickets before it's too late.


I'm not sure which problem I'm having, actually. I have one STM32 device as a master and two peripherals (one is an OLED and the other is an STM32 device with the same part number), and I am trying to get the master to recognize both peripheral addresses. When only the OLED is connected its address is found, but when I also connect the STM32 device neither address is found and the debug session keeps running. I'm not sure if it's a bus problem (#4 or #5) or if my initialization is wrong, even though I have followed the datasheet thoroughly and set the address properly. If anyone else has experienced this and figured out a way to fix it, please let me know.

 


The way I2C works when you add a second device no changes should be required in your configuration. If the working address stops working it is usually either because of multiple devices with the same address or it is due to an electrical problem. Could be the capacitive loading on the bus or even reflections if you have a star with long lines without termination (could also be incorrect size of or even missing pullup resistors). What I would suggest is to try and reduce the speed after ensuring that you have different addresses. 

Also look at the lines with an oscilloscope and check whether the bus is being pulled down (ACK) after the address. If not, no device is recognising the address.
