I have a generation 2 that keeps going “offline”. After multiple resets it will sometimes come back. This last time, though, I couldn’t get it to come back and my grass started to die in the heat, so I finally got around to taking a closer look.
Upon inspection, it connects to the Wi-Fi and network fine, gets an IP, etc. However, it shows up as offline because it can’t connect to the Rachio cloud. Specifically, it can’t resolve mqtt.rach.io. The problem isn’t the DNS server (which will resolve mqtt.rach.io just fine); the problem is that the controller is issuing a flawed DNS request, so the DNS server rejects it. I believe this to be a bug in the firmware.
When the controller connects to the network, it pulls an IP via DHCP, which also provides the DNS server addresses for the network. The controller then sends a DNS request to the DNS server. The layer 2 headers are correct, so the frame makes it there fine, BUT the destination IP address in the layer 3 (IP) header of that request is 0.0.0.0, so the DNS server rejects it. (And before you ask, the DNS server advertised was not 0.0.0.0.) So the controller can’t resolve mqtt.rach.io and thus is marked “offline”. It also can’t resolve pool.ntp.org, which suffers from the same root cause, but that is only for NTP and isn’t contributing to the offline problem.
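For anyone who wants to check their own capture for the same symptom, here is a rough Python sketch of the test I mean: pull the source and destination out of a raw IPv4 header and flag a destination of 0.0.0.0. The function names and the sample bytes are made up for illustration (the 192.168.1.50 source is a placeholder, not my controller's real address), and it assumes you have already stripped the layer 2 header off the frame.

```python
import ipaddress
import struct


def ipv4_addresses(header: bytes):
    """Return (source, destination) from a raw IPv4 header."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    if header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    # Bytes 12-15 hold the source address, bytes 16-19 the destination.
    src, dst = struct.unpack("!4s4s", header[12:20])
    return ipaddress.IPv4Address(src), ipaddress.IPv4Address(dst)


def has_unspecified_destination(header: bytes) -> bool:
    """True when the packet is addressed to 0.0.0.0."""
    _, dst = ipv4_addresses(header)
    return dst.is_unspecified


# Hypothetical 20-byte header: version 4, IHL 5, protocol 17 (UDP),
# checksum left at zero, placeholder source, destination 0.0.0.0.
sample_header = (
    bytes([0x45, 0x00, 0x00, 0x3A, 0x00, 0x00, 0x00, 0x00, 0x40, 0x11, 0x00, 0x00])
    + bytes([192, 168, 1, 50])
    + bytes([0, 0, 0, 0])
)

if __name__ == "__main__":
    print(ipv4_addresses(sample_header))
    print(has_unspecified_destination(sample_header))
```

A healthy request should carry the advertised DNS server's address in those last four bytes instead.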
AND THEN, the controller doesn’t try the other DNS servers it should know about. When the controller pulled an IP, the DHCP reply provided multiple DNS servers for the controller to use, but it seems to only use the first one and not the others! Really?
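To be clear about the behavior I expected, here is a minimal Python sketch of trying each advertised server in order. I have no idea what the firmware's resolver actually looks like, so this is only an illustration: `resolve_with_fallback`, the `query` callback, and the addresses in the example (taken from the documentation ranges) are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def resolve_with_fallback(
    name: str,
    servers: Iterable[str],
    query: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Ask each DNS server in turn and return the first answer.

    `query(name, server)` is a caller-supplied function (hypothetical here)
    that returns an address string, returns None when there is no answer,
    or raises OSError on a timeout or network error.
    """
    for server in servers:
        try:
            answer = query(name, server)
        except OSError:
            continue  # this server failed, move on to the next one
        if answer is not None:
            return answer
    return None


def fake_query(name: str, server: str) -> Optional[str]:
    """Stand-in for a real lookup: the first server never answers."""
    if server == "192.0.2.1":
        raise TimeoutError("no reply")
    return "203.0.113.10"  # made-up answer, not the real mqtt.rach.io address


if __name__ == "__main__":
    print(resolve_with_fallback("mqtt.rach.io", ["192.0.2.1", "192.0.2.2"], fake_query))
```

With a loop like this, one bad or unreachable server only costs a timeout rather than taking the whole controller offline.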
And on top of that, if you look at the DNS requests from the controller, the transaction IDs are 0x0000 (and sometimes 0x0001, I guess when it retries, but sequential transaction IDs shouldn’t be the behavior either, because predictable IDs make replies easier to spoof). Really? Something clearly isn’t right with the traffic coming from the controller. Perhaps the 0x0000 transaction ID values are intentional (or a byproduct of the poor resolver code), but from a security perspective this is an issue.
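For comparison, here is a small Python sketch of a DNS query for an A record built with an unpredictable transaction ID. It is not the controller's code, just an illustration of the header layout from RFC 1035; `build_dns_query` is a name I made up, and it only handles plain ASCII hostnames.

```python
import secrets
import struct


def build_dns_query(name: str) -> bytes:
    """Build a minimal DNS query for an A record with a random 16-bit ID."""
    txid = secrets.randbits(16)  # unpredictable, not 0x0000 or sequential
    flags = 0x0100  # standard query with the "recursion desired" bit set
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    labels = name.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def transaction_id(packet: bytes) -> int:
    """Read the transaction ID back out of the first two bytes."""
    return struct.unpack("!H", packet[:2])[0]


if __name__ == "__main__":
    packet = build_dns_query("mqtt.rach.io")
    print(hex(transaction_id(packet)), packet.hex())
```

Run it a few times and the first two bytes should change on every call, which is what I would expect to see in the capture.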
But at this point I’m willing to overlook the security issues; I just want the controller to issue proper DNS requests. If you can’t tell, I’m pretty frustrated by this. Is there a patch or firmware upgrade I can apply to fix this? I realize there may be a chicken-and-egg problem if the controller can’t connect to the cloud, though…
Let me know if you don’t believe me and need packet captures.