API calls failing

indigo_jay · February 26, 2018, 7:44pm

I built a plugin for Indigo, our home automation server, to integrate using your API. At 2018-02-24 17:13:03 CST, all API calls began failing with:

HTTPError: 429 Client Error: Too Many Requests for url: https://api.rach.io/1/public/person/XXXXXXXXXX

and

HTTPError: 429 Client Error: Too Many Requests for url: https://api.rach.io/1/public/device/XXXXXXXX/forecast?units=US

Since they had all been working before that, I can only assume that something on your end has changed. Any thoughts on how we can fix this issue? We have a number of users who purchased Rachio hardware simply because of the plugin integration.

indigo_jay · February 26, 2018, 8:03pm

Disabled the plugin for 10 minutes, re-enabled it, and now it’s working again. Can someone point me to whatever documentation you have on your rate limiting so that I can make sure the plugin doesn’t accidentally run afoul of it?

Gene · February 26, 2018, 11:42pm

Seems that Rachio is testing rate limiting to improve API performance. I don’t have any details on the final / official position, but it would be a good idea to start implementing rate limiting within your code. I’ve tried to follow all of the ways an update can be triggered within your plugin, but being unfamiliar with Indigo I couldn’t approximate external triggers. In any case, it may be a good idea to cache the data and limit any updates to one per minute.
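As an illustration of the caching suggestion, here is a minimal Python 3 sketch, not taken from the Indigo plugin. `CachedFetcher` and its parameters are hypothetical names; `fetch` stands in for whatever function actually performs the API request.

```python
import time


class CachedFetcher:
    """Serve cached data unless at least `min_interval` seconds have passed."""

    def __init__(self, fetch, min_interval=60.0, clock=time.monotonic):
        self._fetch = fetch                # hypothetical callable that does the real API request
        self._min_interval = min_interval  # one update per minute, as suggested above
        self._clock = clock                # injectable so the logic can be tested without waiting
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._min_interval
        if stale:
            # Only here does a real API call happen; every other caller gets the cached value.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Every trigger in the plugin would then call `get()` instead of hitting the API directly, so the number of triggers no longer determines the number of requests.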

I’m not quite sure whether Rachio will operate on a daily API limit or use a shorter time interval. One thing I would recommend is that you consider adding inactive time periods to your code (save your API calls for when you need them, such as by updating less frequently outside of “active” hours).

Be careful about your weather updates being tied to your refresh rate. Anyone who sets their refresh rate to 1 (update / second) will also check weather every 10 seconds. These updates also count toward your API limit (not to mention any actions you may need to perform, such as starting / stopping any of the zones).

Gene

indigo_jay · February 27, 2018, 4:47pm

My biggest issue in the short term is really how their throttle clears. Under normal circumstances (not trying to control the sprinkler but rather just get status), we’re currently throttling at 1 request per minute, except once every 10 minutes we do 2 requests (for status and the forecast). When the sprinkler is being controlled by Indigo, it could be more frequent. But even if we did manage to trigger throttling denials, it seems that some other limit must be met before the denials stop: an interval longer than 1 request per minute, or some other incomprehensible logic.

This is why Rachio must provide throttling details so that we can make sure that we understand what they are and can plan accordingly. We know that there will be times when API calls will be more frequent - like when some external service is controlling the sprinkler (like Indigo does). We also know there will be times when we just want status (in case the Rachio is running a schedule independently). Finally, we know that once a throttle limit is hit, we will get refusals until some currently unknown threshold is crossed and we can start calling successfully again. So a single throttle time is clearly not going to be enough. We need to know the details. Otherwise, integrations like ours are pointless if we can’t predict when we might hit a threshold and risk a denial that will cause any given sprinkler run to fail.

I also have to say, one call a minute is really not very fast when trying to recover from a throttle deny.

indigo_jay · February 27, 2018, 5:18pm

Correction to the 1 request per minute - it’s more like 2, which don’t have any throttling in between. I suppose it’s possible that’s causing the initial triggering. I will continue to investigate ways of mitigating, but it definitely feels like shooting in the dark without any specifics about what’s getting throttled when and under what circumstances a throttle deny is stopped.

Seth_J · February 27, 2018, 5:22pm

Hit the same issue as well with Control4 integration. If Rachio is going to do this they need to update their documentation. I don’t mind making whatever changes are necessary but now I’ve got to field phone calls from dealers who can’t use their system until after my account is unlocked and I can get around to making the changes.

Luckily it’s winter so not too many have noticed.

indigo_jay · February 27, 2018, 5:54pm

I might also mention that we would be more than happy to hand off scheduling to the Rachio rather than do it ourselves, but the API doesn’t implement pause/resume, next/previous zone, etc., which is what our users are accustomed to using with other sprinkler controllers.

Finally, there is at least one call which should ALWAYS be accepted regardless of any throttling: stop_water. As a last resort, we should always be able to turn the sprinkler off in all cases to prevent unintentional over-watering.

indigo_jay · February 27, 2018, 7:59pm

Through trial and error, I’ve determined that (as of right now) the rate limiting block is removed after 30 minutes of API inactivity. This seems excessively long. If the block should occur after my software has instructed a zone to begin running but before we tell it to end, this could cause a zone to get over-watered by 30 minutes at least. This is unacceptable.
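Given that observation, one client-side mitigation is to stop calling entirely once a 429 is seen, so the inactivity timer is not reset by further attempts. The Python 3 sketch below assumes the 30-minute figure found by trial and error above; Rachio has not documented it, and `CooldownGate` is a hypothetical name.

```python
import time


class CooldownGate:
    """Hold back all calls for `cooldown` seconds after a 429 response."""

    def __init__(self, cooldown=30 * 60, clock=time.monotonic):
        self._cooldown = cooldown      # assumption: 30 minutes of inactivity clears the block
        self._clock = clock            # injectable for testing
        self._blocked_until = None

    def allow(self):
        """True if no cool-down is in effect and a call may be sent."""
        return self._blocked_until is None or self._clock() >= self._blocked_until

    def record(self, status_code):
        """Feed every HTTP status code back in; a 429 starts the cool-down."""
        if status_code == 429:
            self._blocked_until = self._clock() + self._cooldown
```

The plugin would check `allow()` before each request and pass the response status to `record()` afterwards. This does nothing for a zone that is already running when the block starts, which is the over-watering risk described above.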

I’ve warned our customers to stop using the plugin for the time being until we get some concrete details from Rachio about their throttling buckets. I’ve also warned any of our customers who are considering purchasing a Rachio because of the Indigo integration to put off the purchase until we get this information and can make sure our plugin won’t accidentally cause either over watering or missed watering schedules.

I really hope someone from Rachio is paying attention here.

mckynzee · February 27, 2018, 11:39pm

Hey all,

I apologize for my delay in response. @Gene was exactly right in his assumption: we are now implementing rate limiting in order to improve API performance for all users. We are only allowing 1700 calls per day, which is over 1 call per minute. We are typically only seeing this limit exceeded by integrations that are polling. If you’d like to explore a non-polling method, we do support webhooks. We are absolutely willing to work with users and discuss exceptions for specific use cases.
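To put the 1700 figure next to the polling rates mentioned earlier in the thread, here is a small Python worked example. It assumes each polling loop runs around the clock at a fixed interval; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_LIMIT = 1700  # figure quoted in the post above


def calls_per_day(*intervals_seconds):
    """Total daily calls for a set of independent fixed-interval polling loops."""
    return sum(SECONDS_PER_DAY // interval for interval in intervals_seconds)


# Evenly spaced calls that exactly use up the limit would be about 50.8 seconds apart.
min_spacing = SECONDS_PER_DAY / DAILY_LIMIT

# One status call per minute plus one forecast call every 10 minutes: 1440 + 144 = 1584.
status_plus_forecast = calls_per_day(60, 600)

# Two calls per minute: 2880, well over the limit.
two_calls_per_minute = calls_per_day(60, 60)

# Refresh rate of 1 second with weather every 10 seconds: 86400 + 8640 = 95040.
one_second_refresh = calls_per_day(1, 10)
```

So a steady one-per-minute poll with a forecast every 10 minutes fits under 1700 with little headroom for control commands, while two calls per minute does not.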

We are sorry for not informing you all earlier, we appreciate all of the involved work you guys have done to integrate with the product. We saw an immediate need to lower system usage and missed the mark on properly communicating this.

Hope this helps.

McKynzee

indigo_jay · February 28, 2018, 12:06am

McKynzee, can you provide more insight? For instance, is it 1700 calls in any given 24 hour period or 1700 calls per calendar day? This would shed some light on when a threshold violation would clear. It’ll also tell us if we need to add some time between sequential calls if there’s a shorter threshold bucket (X number of calls in 1 minute, etc.).

Giving us no notice puts the developers of these integrations in a bind: we now have to drop everything to go back and implement other strategies rather than slot the work in with everything else that we have on our roadmaps. I would respectfully request that you increase the limits for some period of time (a month perhaps) and give us a firm date for when you’ll implement the 1700 call limit. This would give us at least a little bit of time to get our integrations reworked and deployed to our users.

I’m also surprised that we’ve had users tripping over the 1700 limit. Have you guys had shorter limits in production at various times over the last several days?

Seth_J · February 28, 2018, 1:33am

It sounds like 1700 in a 24 hour period. Mine just started working again.

The 1700 calls limitation is counted in a 24 hour period. Meaning, you could use up all 1700 calls in one hour, and none for the rest of the day, and still be within satisfactory limits.
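Until the API reports something like “callsLeft” itself, an integration can keep its own count. The Python 3 sketch below assumes a rolling 24-hour window, which this thread suggests but does not confirm; `RollingBudget` and the `reserve` idea (holding back a few calls for critical commands such as stopping a zone) are hypothetical.

```python
import time
from collections import deque


class RollingBudget:
    """Client-side count of calls made in the last `window` seconds."""

    def __init__(self, limit=1700, window=24 * 60 * 60, reserve=0, clock=time.monotonic):
        self._limit = limit
        self._window = window
        self._reserve = reserve   # calls held back for critical commands only
        self._clock = clock       # injectable for testing
        self._stamps = deque()    # timestamps of calls still inside the window

    def _expire(self):
        cutoff = self._clock() - self._window
        while self._stamps and self._stamps[0] <= cutoff:
            self._stamps.popleft()

    def calls_left(self):
        self._expire()
        return self._limit - len(self._stamps)

    def try_spend(self, critical=False):
        """Record one call if the budget allows it; return False otherwise."""
        floor = 0 if critical else self._reserve
        if self.calls_left() <= floor:
            return False
        self._stamps.append(self._clock())
        return True
```

Routine polling would call `try_spend()` and skip the request on `False`, while a stop command would call `try_spend(critical=True)` so it can draw on the reserve. The local count can drift from the server’s if other integrations share the same API key.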

I agree that we should have been notified or the documentation updated to reflect the change. Also every API call should return a “callsLeft” number.

We were polling the hell out of the servers but nothing in the API docs said we shouldn’t. ¯_(ツ)_/¯

We’ll have to implement a cloud-based solution for the webhooks as our integrations typically don’t expose any openings to the outside world. Fun fun.
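For reference, the receiving end of a webhook can be quite small. The Python 3 standard-library sketch below assumes events arrive as JSON in an HTTP POST body; the actual Rachio payload format is not described in this thread, and `parse_event`, `HookHandler`, `make_server` and `received_events` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received_events = []  # stand-in for whatever the integration does with an event


def parse_event(body):
    """Decode a POST body as a JSON object; return None if it is not one."""
    try:
        event = json.loads(body)
    except ValueError:
        return None
    return event if isinstance(event, dict) else None


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = parse_event(self.rfile.read(length))
        if event is None:
            self.send_response(400)   # reject anything that is not a JSON object
        else:
            received_events.append(event)
            self.send_response(204)   # accepted, nothing to return
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(host="127.0.0.1", port=8080):
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer((host, port), HookHandler)
```

The hard part is the one described above: the endpoint has to be reachable from the internet, so for integrations that sit behind a firewall this would run on a relay in the cloud rather than on the local controller.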

Gene · February 28, 2018, 2:09am

Nothing beats a new challenge when it comes to developing new features

plainsane · February 28, 2018, 3:25am

For those with a dynamic IP, webhooks can be problematic. Does Rachio support IPv6? For now my webhook is pointed to my AWS account, but I could remove that requirement if I could point the hook to http://[203:d928…]/hook

indigo_jay · February 28, 2018, 5:00pm

I don’t mind challenges, and I certainly understand the need to adapt to scaling issues. What I do mind is having them forced on me without any warning and without any ability to schedule/predict when I need to get them done by.

Of course, all this could be avoided if we could just talk directly to the controller - no cloud infrastructure required and communication stays within the local network.

Gene · February 28, 2018, 5:40pm

Considering how little time you, Jay and Seth, have spent on the forum, I doubt you would have noticed any official messages about upcoming API changes. As a fellow developer I would like to sympathize with your situation, but in reality, thanks to your over-enthusiastic code, all of the developers (not just you) are affected. Rachio has exposed the upcoming schedule information via the API; you do not have to poll to figure out when it would run next.

I recommend that you make it clear to your users that if they wish to get the most accurate information via your application, they should use your application to start / stop their Rachio(s). Otherwise (should users decide to use Rachio’s official app), just tell them that real-time information on their actions via competing interfaces is not supported, and stick with reasonable updates of 1 update every 5 minutes normally and 1 per minute near / during the schedule run.
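The two-speed polling described above could be sketched as follows in Python. Times are plain epoch seconds, and treating “near” as the 5 minutes before a run starts is an assumption; `poll_interval` and its parameters are hypothetical names.

```python
def poll_interval(now, run_start=None, run_end=None, lead=300, idle=300, active=60):
    """Return the number of seconds to wait before the next status poll.

    now, run_start, run_end: epoch seconds; run_start/run_end describe the next
    (or current) scheduled run, or are None if no run is known.
    lead: how long before run_start to switch to the faster rate (assumed 5 minutes).
    """
    if run_start is None or run_end is None:
        return idle
    if run_start - lead <= now <= run_end:
        return active   # 1 update per minute near / during the run
    return idle         # 1 update every 5 minutes otherwise
```

For example, with a run scheduled from 10000 to 11800, `poll_interval(9000, 10000, 11800)` returns 300 and `poll_interval(9800, 10000, 11800)` returns 60. The run times themselves would come from the schedule information the API exposes, fetched occasionally rather than polled.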

You should always develop with the least impact in mind. I’m sure even 1700 calls / day is not sustainable if more and more users start taking advantage of your applications. And god forbid that the two of you share a user, which means you would need to split the 1700 limit between you (+ anyone else who wishes to interface with that user).

Gene

Seth_J · February 28, 2018, 7:18pm

With all due respect, making no effort to communicate what is a breaking change to, as you put it, “all developers” is not in any way acceptable. Even now there is still no mention of the API limit on rachio.readme.io, and the Support Link at the bottom to this forum, where I came to find out what was going on, is broken.

What I don’t need is a lesson in product design. What I do need is communication and documentation. If those are correct I can make my product work just fine.

indigo_jay · February 28, 2018, 9:35pm

Well, Gene, a company that publishes an API (particularly a cloud-based one) has some implicit if not explicit responsibilities, not the least of which is to let users know if there are backend changes that may affect their operation. Each of our customers (and probably those of other integrations as well) uses their own API key as published in the Rachio app(s), so Rachio knows from whom API calls are originating. They have the information needed to contact customers that are using offending integrations.

Saying that it’s the responsibility of an integration creator to continually monitor a forum looking for stuff that might break an integration is not practical. And since there was nothing documented about rate limits, it was left up to developers to interpret what that meant in the first place. It’s utterly ridiculous to make any claims about how “little time” we spend on the forums here.

In summary, I reject the assumptions you so unadvisedly made in your post. This problem lies solely with Rachio and their failure to think through the ramifications of implementing rate limits without any thought of communication to those who would be affected.

I would like to suggest a couple of things for Rachio to consider:

1. Implement some sort of integration developer notification mechanism. This could be as easy as allowing anyone who wants to know about API changes to sign up for an email. Correcting the issues Seth points out (broken links, incomplete docs) would also go a long way.
2. When making potentially breaking changes, contact users who are using API keys that are going to be affected. Even if I hadn’t gotten notification, I can guarantee that if any of our shared customers received such an email from Rachio they would have very quickly let us know and we could have reacted appropriately.
3. Consider opening up the API for direct communication to the controller. They could significantly mitigate scaling issues for the API if it wasn’t necessary to contact their cloud-based servers for all communication to the controller. Even if it were just for direct control commands (turn on zone, stop watering, etc) then at the very least those calls could never fail due to network or capacity issues.

Rachio makes great hardware, as many of our customers have found out. We even recommend it when customers ask about irrigation solutions. Implementing a few relatively simple things could help harden integrations and make Rachio an even better product.

Bill.Maupin · February 28, 2018, 11:35pm

Not a BIG dev guy here, but have to agree with this one.
“Consider opening up the API for direct communication to the controller. They could significantly mitigate scaling issues for the API if it wasn’t necessary to contact their cloud-based servers for all communication to the controller. Even if it were just for direct control commands (turn on zone, stop watering, etc) then at the very least those calls could never fail due to network or capacity issues.”

The lack of direct control worries me that this could easily become a forced subscription device in the future…

Holding on to my old controller just in case.

Gene · March 1, 2018, 12:50am

It is easy to be dismissive when it comes to feedback. Could Rachio have handled it better? Yes. @mckynzee even apologized for missing the opportunity to inform everyone about upcoming changes, but implying that an API comes with any sort of guarantees… I’m afraid you are in for a shock. I could go on about how Rachio has a section within their TOS about reasonable load, but I really do hope that you will take this event as a lesson about what could happen with any of your other integrations which leave it completely up to the end user to decide how often they should refresh the data.

You raise good points. Would it be nice for Rachio to provide a local interface? YES! I’d love to see this feature, especially for times when unit operation is essential but internet infrastructure may be compromised (such as during a wildfire). Is it easy to implement? No, exposing services on a local network carries a higher risk that something will go wrong. Will Rachio ever develop this functionality? I hope so, but it will be much more likely that they do so if we (as the developers) can prove that they will not be dealing with an overwhelming amount of traffic just because it is local.

I come from a different type of development. Throughout my career I was dealing with embedded programming, and now I dabble with web programming. In either application, resources are always limited and loads are not easily dismissed. I’m not a Rachio employee, nor do I have any reason to defend their position, but I do have a good idea of why they had to do what they’ve done.

Be nice to your service providers. Had your code been reasonable to begin with, you may not have even noticed the reduction in API allowance. As it is, treat this as a lesson and revisit some of your other code to reduce the load where it matters.

Gene

a0128958 · March 1, 2018, 1:20am

I see similar issues with network connectable thermostats.

Some thermostats are ‘closed’ to any local interaction. You ‘talk’ to the cloud to ‘talk’ to the tstat, or you don’t talk to the tstat at all. There’s a variety of reasons why some manufacturers do this, including marketing reasons.

Some thermostats are ‘interfaceable’ via local LAN connection. You ‘talk’ to the tstat without ever using the WAN.

Tstat manufacturers are very watchful on subjects like polled or non-polled. And if they’re polled, the manufacturers limit how often - typically the best you can get is once per minute ‘handshaking.’

It’s been interesting to read all you developers comment about need and philosophies for ‘talking’ to a sprinkler controller. They all remind me of the same subjects with tstats.

My observation is I don’t think you’ll ever get a local interface to the Iro. The architecture just isn’t set up to support this. It’s the cloud or nothing.

Interestingly, my network connected tstats are local to the LAN (controllable without need for a cloud), and polled. And I’m limited to polling no faster than 1 per minute.

Good luck you guys! And be nice to the manufacturer!

Best regards,

Bill