API calls failing

My biggest issue in the short term is really how their throttling clears. Under normal circumstances (not trying to control the sprinkler, just getting status), we’re currently throttling ourselves to 1 request per minute, except once every 10 minutes we do 2 requests (one for status and one for the forecast). When the sprinkler is being controlled by Indigo, calls could be more frequent. But even if we did manage to trigger throttling denials, it seems there is some other limit that governs when the denials stop: something longer than once per minute, or some other logic we can’t see.

This is why Rachio must provide throttling details, so that we can make sure we understand them and can plan accordingly. We know there will be times when API calls will be more frequent, like when an external service is controlling the sprinkler (as Indigo does). We also know there will be times when we just want status (in case the Rachio is running a schedule independently). Finally, we know that once a throttle limit is tripped, we will get refusals until some currently unknown threshold is crossed and we can start calling successfully again. So a single throttle time is clearly not going to be enough. We need the details. Otherwise, integrations like ours are pointless: if we can’t predict when we might hit a threshold, we risk a denial that will cause any given sprinkler run to fail.

I also have to say, one call a minute is really not very fast when trying to recover from a throttle deny.

Correction to the 1 request per minute: it’s more like 2, with no spacing in between. I suppose it’s possible that’s causing the initial triggering. I will continue to investigate ways of mitigating, but it definitely feels like shooting in the dark without any specifics about what gets throttled when, and under what circumstances a throttle deny is lifted.
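Absent specifics from Rachio, one client-side mitigation is to enforce a minimum spacing between our own calls, so back-to-back requests never go out with zero gap. A minimal sketch; the 30-second spacing is an arbitrary guess, not a documented limit:

```python
import time

class MinSpacingGate:
    """Client-side guard that enforces a minimum gap between API calls.

    This assumes nothing about Rachio's real throttle buckets; it only
    prevents our own back-to-back calls (e.g. status + forecast) from
    going out with zero spacing between them.
    """

    def __init__(self, min_interval_sec=30.0):
        self.min_interval_sec = min_interval_sec
        self._last_call = 0.0

    def wait_time(self, now=None):
        """Seconds still to wait before the next call (0 if clear)."""
        now = time.monotonic() if now is None else now
        return max(0.0, self.min_interval_sec - (now - self._last_call))

    def record_call(self, now=None):
        """Stamp the moment a call was actually sent."""
        self._last_call = time.monotonic() if now is None else now

gate = MinSpacingGate(min_interval_sec=30.0)
gate.record_call(now=100.0)
```

Before each request the caller would sleep for `gate.wait_time()` and then `record_call()`, turning the status + forecast pair into two properly spaced calls.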

Hit the same issue as well with Control4 integration. If Rachio is going to do this they need to update their documentation. I don’t mind making whatever changes are necessary but now I’ve got to field phone calls from dealers who can’t use their system until after my account is unlocked and I can get around to making the changes.

Luckily it’s winter, so not too many have noticed.

I might also mention that we would be more than happy to hand off scheduling to the Rachio rather than do it ourselves, but the API doesn’t implement pause/resume, next/previous zone, etc., which is what our users are accustomed to using with other sprinkler controllers.

Finally, there is at least one call which should ALWAYS be accepted regardless of any throttling: stop_water. As a last resort, we should always be able to turn the sprinkler off in all cases to prevent unintentional over-watering.


Through trial and error, I’ve determined that (as of right now) the rate limiting block is removed after 30 minutes of API inactivity. This seems excessively long. If the block should occur after my software has instructed a zone to begin running but before we tell it to end, this could cause a zone to get over-watered by 30 minutes at least. This is unacceptable.

I’ve warned our customers to stop using the plugin for the time being until we get some concrete details from Rachio about their throttling buckets. I’ve also warned any of our customers who are considering purchasing a Rachio because of the Indigo integration to put off the purchase until we get this information and can make sure our plugin won’t accidentally cause either over watering or missed watering schedules.

I really hope someone from Rachio is paying attention here.

Hey all,

I apologize for my delay in response. @Gene was exactly right in his assumption: we are now implementing rate limiting in order to improve API performance for all users. We are only allowing 1700 calls per day, which is over 1 call per minute. We are typically only seeing this limit exceeded by integrations that are polling. If you’d like to explore a non-polling method, we do support webhooks. We are absolutely willing to work with users and discuss exceptions for specific use cases.

We are sorry for not informing you all earlier, we appreciate all of the involved work you guys have done to integrate with the product. We saw an immediate need to lower system usage and missed the mark on properly communicating this.

Hope this helps.

McKynzee :rachio:

McKynzee, can you provide more insight? For instance, is it 1700 calls in any given 24 hour period or 1700 calls per calendar day? This would shed some light on when a threshold violation would clear. It’ll also tell us if we need to add some time between sequential calls if there’s a shorter threshold bucket (X number of calls in 1 minute, etc.).

Giving us no notice puts the developers of these integrations in a bind: we now have to drop everything to go back and implement other strategies rather than slot the work in with everything else on our roadmaps. I would respectfully request that you raise the limits for some period of time (a month, perhaps) and give us a firm date for when you’ll enforce the 1700-call limit. This would at least give us a little time to get our integrations reworked and deployed to our users.

I’m also surprised that we’ve had users tripping over the 1700 limit. Have you guys had shorter limits in production at various times over the last several days?


It sounds like 1700 in a 24 hour period. Mine just started working again.

The 1700 calls limitation is counted in a 24 hour period. Meaning, you could use up all 1700 calls in one hour, and none for the rest of the day, and still be within satisfactory limits.
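Given a rolling 24-hour window, an integration can track its own call timestamps to know how much budget remains before the API starts refusing. A sketch; the 1700 limit is Rachio’s stated number, but the bookkeeping approach here is our own:

```python
from collections import deque

class RollingDayBudget:
    """Tracks API calls against a limit over a rolling 24-hour window.

    Matches the behavior described in the thread: the limit applies to
    any 24-hour period, not a calendar day, so calls "expire" exactly
    24 hours after they were made.
    """

    WINDOW_SEC = 24 * 60 * 60

    def __init__(self, limit=1700):
        self.limit = limit
        self._calls = deque()  # timestamps of past calls, oldest first

    def _expire(self, now):
        # Drop timestamps that have aged out of the 24-hour window.
        while self._calls and now - self._calls[0] >= self.WINDOW_SEC:
            self._calls.popleft()

    def try_call(self, now):
        """Record a call if budget remains; return True if allowed."""
        self._expire(now)
        if len(self._calls) >= self.limit:
            return False
        self._calls.append(now)
        return True

    def remaining(self, now):
        """Calls still available in the current 24-hour window."""
        self._expire(now)
        return self.limit - len(self._calls)

# Tiny limit for illustration; a real integration would use 1700.
budget = RollingDayBudget(limit=3)
```

With this in place the integration can refuse to poll (or stretch its interval) before the server ever has to deny a call.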

I agree that we should have been notified, or the documentation updated to reflect the change. Also, every API call should return a “callsLeft” number.
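Until something like a “callsLeft” value exists, a defensive client can at least look for the conventional X-RateLimit-* response headers. Whether Rachio actually emits these headers is an assumption; this sketch simply degrades to None when they are absent:

```python
def parse_rate_limit_headers(headers):
    """Pull remaining-call info out of response headers, if present.

    The X-RateLimit-* names are a common convention, not something
    confirmed by Rachio's docs at the time of this thread; treat them
    as an assumption and check what the API actually returns.
    """
    def _int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "limit": _int("X-RateLimit-Limit"),
        "remaining": _int("X-RateLimit-Remaining"),
        "reset": headers.get("X-RateLimit-Reset"),  # opaque; format unknown
    }

# Example with hypothetical header values:
info = parse_rate_limit_headers({
    "X-RateLimit-Limit": "1700",
    "X-RateLimit-Remaining": "1694",
    "X-RateLimit-Reset": "2017-02-08T00:00:00Z",
})
```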

We were polling the hell out of the servers, but nothing in the API docs said we shouldn’t. ¯\_(ツ)_/¯

We’ll have to implement a cloud-based solution for the webhooks, as our integrations typically don’t expose any openings to the outside world. Fun, fun.
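A cloud-side relay for the webhooks can be quite small. A sketch using only the Python standard library; the “eventType” payload field is an assumption, so match it to whatever Rachio actually POSTs:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_payload(body):
    """Parse a webhook POST body; return (http_status, event_type).

    The 'eventType' field name is an assumption about Rachio's payload
    shape; inspect a real delivery before relying on it.
    """
    try:
        event = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return 400, None
    return 204, event.get("eventType")

class RachioWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, event_type = handle_payload(self.rfile.read(length))
        if event_type:
            # Hand off to the local integration here (message queue,
            # persistent connection back into the LAN, etc.).
            print("webhook event:", event_type)
        self.send_response(status)
        self.end_headers()

# To run on an internet-reachable host (e.g. a small cloud instance):
# HTTPServer(("0.0.0.0", 8080), RachioWebhookHandler).serve_forever()
```

The relay only acknowledges and forwards; the interesting part is how it gets events back inside the firewall, which is deployment-specific.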


Nothing beats a new challenge when it comes to developing new features :beers:

For those with a dynamic IP, webhooks can be problematic. Does Rachio support IPv6? For now my webhook is pointed at my AWS account, but I could remove that requirement if I can point the hook to http://[203:d928…]/hook


I don’t mind challenges, and I certainly understand the need to adapt to scaling issues. What I do mind is having them forced on me without any warning and without any ability to schedule/predict when I need to get them done by.

Of course, all this could be avoided if we could just talk directly to the controller - no cloud infrastructure required and communication stays within the local network.

Considering how little time you, Jay, and Seth have spent on the forum, I doubt any of you would have noticed official messages about upcoming API changes. As a fellow developer I would like to sympathize with your situation, but in reality, thanks to your over-enthusiastic code, all of the developers (not just you) are affected. Rachio has exposed the upcoming schedule information via the API; you do not have to poll to figure out when it will run next.

I recommend that you make it clear to your users that if they wish to get the most accurate information via your application, they should use your application to start/stop their Rachio(s). Otherwise (should users decide to use Rachio’s official app), just tell them that real-time information on actions taken via competing interfaces is not supported, and stick with reasonable update rates: one update every 5 minutes normally, and one per minute near or during a schedule run.
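The polling cadence suggested above can be sketched as a simple interval chooser; the 15-minute “near a run” window is an arbitrary choice for this sketch, not anything Rachio specifies:

```python
def poll_interval_sec(seconds_until_next_run, schedule_running,
                      near_window_sec=15 * 60):
    """Pick a polling interval: one update per minute near or during a
    schedule run, one per five minutes otherwise.

    seconds_until_next_run may be None when no run is scheduled.
    """
    if schedule_running:
        return 60
    if seconds_until_next_run is not None and seconds_until_next_run <= near_window_sec:
        return 60
    return 300
```

Steady 5-minute polling alone is 288 calls per day, comfortably under the 1700 limit, leaving plenty of headroom for the 1-minute bursts around runs.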

You should always develop with the least impact in mind. I’m sure even 1700 calls/day is not sustainable if more and more users start taking advantage of your applications, and God forbid you share a user: you would then need to split the 1700 limit between the two of you (plus anyone else who wishes to interface with that user).

Gene

With all due respect, making no effort to communicate what is a breaking change to, as you put it, “all developers” is not in any way acceptable. Even now there is still no mention of the API limit on rachio.readme.io, and the Support link at the bottom to this forum, where I came to find out what was going on, is broken.

What I don’t need is a lesson in product design. What I do need is communication and documentation. If those are correct, I can make my product work just fine.

Well, Gene, a company that publishes an API (particularly a cloud-based one) has some implicit if not explicit responsibilities, not the least of which is to let users know about backend changes that may affect their operation. Each of our customers (and probably those of other integrations as well) uses their own API key as published in the Rachio app(s), so Rachio knows from whom API calls are originating. They have the information needed to contact customers who are using offending integrations.

Saying that it’s the responsibility of an integration creator to continually monitor a forum looking for stuff that might break an integration is not practical. And since there was nothing documented about rate limits, it was left up to developers to interpret what that meant in the first place. It’s utterly ridiculous to make any claims about how “little time” we spend on the forums here.

In summary, I reject the assumptions you so unadvisedly made in your post. This problem lies solely with Rachio and their failure to think through the ramifications of implementing rate limits without any thought of communication to those who would be affected.

I would like to suggest a couple of things for Rachio to consider:

  • Implement some sort of integration developer notification mechanism. This could be as easy as allowing anyone who wants to know about API changes to sign up for an email. Correcting the issues Seth points out (broken links, incomplete docs) would also go a long way.
  • When making potentially breaking changes, contact users whose API keys are going to be affected. Even if I hadn’t gotten the notification, I can guarantee that if any of our shared customers received such an email from Rachio, they would have very quickly let us know and we could have reacted appropriately.
  • Consider opening up the API for direct communication to the controller. They could significantly mitigate scaling issues for the API if it wasn’t necessary to contact their cloud-based servers for all communication to the controller. Even if it were just for direct control commands (turn on zone, stop watering, etc) then at the very least those calls could never fail due to network or capacity issues.

Rachio makes great hardware, as many of our customers have found out. We even recommend it when customers ask about irrigation solutions. Implementing a few relatively simple things could help harden integrations and make Rachio an even better product.


Not a BIG dev guy here, but have to agree with this one.
“Consider opening up the API for direct communication to the controller. They could significantly mitigate scaling issues for the API if it wasn’t necessary to contact their cloud-based servers for all communication to the controller. Even if it were just for direct control commands (turn on zone, stop watering, etc) then at the very least those calls could never fail due to network or capacity issues.”

The lack of direct control worries me that this could easily become a forced subscription device in the future…

Holding on to my old controller just in case.

It is easy to be dismissive when it comes to feedback. Could Rachio have handled it better? Yes. @mckynzee even apologized for missing the opportunity to inform everyone about upcoming changes, but if you are implying that the API comes with any sort of guarantees, I’m afraid you are in for a shock. I could go on about how Rachio has a section within their TOS about reasonable load, but I really do hope that you will take this event as a lesson about what could happen with any of your other integrations that leave it completely up to the end user how often to refresh the data.

You raise good points. Would it be nice for Rachio to provide a local interface? YES! I’d love to see this feature, especially for times when unit operation is essential but internet infrastructure may be compromised (such as during a wildfire). Is it easy to implement? No; exposing services on a local network carries a higher risk that something will go wrong. Will Rachio ever develop this functionality? I hope so, but it will be much more likely if we (as the developers) can prove that they will not be dealing with an overwhelming amount of traffic just because it is local.

I come from a different type of development. Throughout my career I was dealing with embedded programming, and now I dabble in web programming. In either case, resources are always limited and loads are not easily dismissed. I’m not a Rachio employee, nor do I have any reason to defend their position, but I do have a good idea of why they had to do what they’ve done.

Be nice to your service providers; had your code been reasonable to begin with, you might not have even noticed the reduction in API allowance. As it is, treat this as a lesson and revisit some of your other code to reduce the load where it matters.

Gene

I see similar issues with network connectable thermostats.

Some thermostats are ‘closed’ to any local interaction. You ‘talk’ to the cloud to ‘talk’ to the tstat, or you don’t talk to the tstat at all. There’s a variety of reasons why some manufacturers do this, including marketing reasons.

Some thermostats are ‘interfaceable’ via a local LAN connection. You ‘talk’ to the tstat without ever using the WAN.

Tstat manufacturers are very watchful on subjects like polled or non-polled. And if they’re polled, the manufacturers limit how often - typically the best you can get is once per minute ‘handshaking.’

It’s been interesting to read all you developers comment about need and philosophies for ‘talking’ to a sprinkler controller. They all remind me of the same subjects with tstats.

My observation is I don’t think you’ll ever get a local interface to the Iro. The architecture just isn’t set up to support this. It’s the cloud or nothing.

Interestingly, my network connected tstats are local to the LAN (controllable without need for a cloud), and polled. And I’m limited to polling no faster than 1 per minute.

Good luck you guys! And be nice to the manufacturer!

Best regards,

Bill

Gene, I honestly don’t need any lectures from you about development philosophy or load issues. I’ve been a VP of engineering at a high-volume payment provider, so I understand load issues. I also understood the responsibility we had to users of our API and the critical need to ensure uptime. While I understand the circumstances are different, my goal here is to hopefully get Rachio to think more holistically about their ecosystem. Saying “sorry we didn’t communicate well” is a start, but if they are serious about integrating with other systems they should use this as an opportunity to really listen to what the community is saying. That was the goal of my post and suggestions.

As I’ve said multiple times in this thread, we are apparently barely hitting the threshold. Our users DO NOT have control over the polling interval. I do not believe that the plugin is violating any reasonable use statements in the TOS. Reasonable use is certainly in the eye of the beholder without documenting firm boundaries.

I’ve worked with thermostat providers and developers building plugins to them and, as Bill points out, have guided them to the right solution given their documented and/or declared limitations.

I hope you take this as a lesson that you may not, in fact, know everything about all circumstances around API usage.

While we’re on the topic, can someone at Rachio document the webhook responses and what each webhook event does?

[
    { "id": 5,  "name": "DEVICE_STATUS_EVENT",         "type": "WEBHOOK" },
    { "id": 10, "name": "ZONE_STATUS_EVENT",           "type": "WEBHOOK" },
    { "id": 6,  "name": "RAIN_DELAY_EVENT",            "type": "WEBHOOK" },
    { "id": 7,  "name": "WEATHER_INTELLIGENCE_EVENT",  "type": "WEBHOOK" },
    { "id": 9,  "name": "SCHEDULE_STATUS_EVENT",       "type": "WEBHOOK" },
    { "id": 11, "name": "RAIN_SENSOR_DETECTION_EVENT", "type": "WEBHOOK" },
    { "id": 8,  "name": "WATER_BUDGET",                "type": "WEBHOOK" },
    { "id": 12, "name": "ZONE_DELTA",                  "type": "WEBHOOK" },
    { "id": 14, "name": "DELTA",                       "type": "WEBHOOK" }
]

It would be nice to know, and not just guess, whether this is the recommended course of action.
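Absent documentation, a defensive integration can route on the event names in the listing above and ignore anything it doesn’t recognize. A sketch; the “name” key used for routing is an assumption about the delivered payload shape:

```python
# Dispatch table keyed on the event names from the listing above.
# What each event's payload actually contains is undocumented here,
# so handlers just receive the raw dict.

def _log_event(event):
    return "logged: " + event.get("name", "UNKNOWN")

HANDLERS = {
    "DEVICE_STATUS_EVENT": _log_event,
    "ZONE_STATUS_EVENT": _log_event,
    "SCHEDULE_STATUS_EVENT": _log_event,
    "RAIN_DELAY_EVENT": _log_event,
}

def dispatch(event):
    """Route a webhook payload to its handler, tolerating unknown events
    so new event types can't crash the integration."""
    handler = HANDLERS.get(event.get("name"))
    if handler is None:
        return "ignored: " + str(event.get("name"))
    return handler(event)
```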

+1 for local device control. Going to the cloud when unnecessary is a fail. Let us use the API interfaced directly to the controller on our own networks…PLEASE!