Generation 2 - CAN connect to Wi-Fi - CAN’T connect to Rachio AWS cloud


#1

Hey guys, hopefully someone can help me out here. Support has gone silent over the past few days.

I’ve been trying to get my Rachio Gen 2 to connect to the Rachio AWS servers for weeks now with no luck. I’ve done firmware updates and factory resets on both my network equipment and the Rachio. I’ve even tested two different Rachio Gen 2s, with the same result. I was able to show that the only device being used was the R7000 for the router and for Wi-Fi, could show that it was on a 2.4 GHz channel, and even placed it in the DMZ and could see the world trying to port scan it - so it was sitting on the internet, and still couldn’t connect. Figured maybe my ISP was blocking the required ports…but nope - tested that too, more on that below.

So, I can see that the Rachio is pulling an IP address (because I know its MAC address) and is on my LAN, but it makes no attempt to go outside of my network. I know this because I’ve done numerous Wireshark traces, all with the same result. (I used a port mirror on the switch that goes to the router; in this scenario I was not solely using the R7000, since I couldn’t do a packet capture on it.) I’ve even swapped out routers. Initially I had a Netgear R7000 and thought maybe that was the cause, since I COULD get the Rachio to connect when using my phone as a mobile hotspot (purely a test; I couldn’t actually leave it that way). The problem with that theory is that I could resolve their server name (mqtt.rach.io) through my router, using either the router itself or an external resolver such as 8.8.8.8. Since that worked, and since I could also reach TCP 8883 on those same servers, ports / connectivity didn’t appear to be the issue - something else is still the cause.

PS C:\Users\eric> Test-NetConnection -ComputerName mqtt.rach.io -Port 8883

ComputerName : mqtt.rach.io
RemoteAddress : 52.32.13.229
RemotePort : 8883
InterfaceAlias : Wi-Fi
SourceAddress : 192.168.x.x
TcpTestSucceeded : True

The same port tests worked via my ISP (Verizon FIOS) and via my Mobile Hotspot (also Verizon).
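For anyone without PowerShell, the same kind of reachability check can be sketched in Python. This is only a rough stand-in for Test-NetConnection, not a replacement: the function name `tcp_port_open` and the 5 second timeout are my own choices, and a successful connect only shows that the testing machine can reach the port, not that the controller can.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call, using the host and port quoted in this thread:
#   tcp_port_open("mqtt.rach.io", 8883)
```

Running it once on the home network and once on the hotspot gives the same comparison as the PowerShell tests above.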

Just in case though…I figured I’ve already spent so much time on it, let’s throw some money at the problem, so I bought a Ubiquiti USG Pro (since you can’t do packet captures on the R7000 natively, since I have no Netgear support, and since I wanted to buy one anyway :wink: ). Unfortunately the Rachio still couldn’t connect using the new router, which is also not blocking anything outbound, can also connect to the required ports, and can resolve their AWS cloud-based servers. In the Wireshark traces I don’t see the Rachio even attempt to leave my network. It pulls an IP address from my router, broadcasts some ARP traffic, does an IPv6 multicast, tries some more ARP traffic, and then the process starts over again. There is no attempt to query DNS, which it’d need to do to find the servers before connecting on the required ports.

The unfortunate part is that I don’t have a working Wireshark trace to use as a baseline for comparison. I’ve asked Rachio for one since it’s an easy repro, but I’ve got no response. If anyone out there wants to reset their Rachio Gen 2 to factory defaults and then do a Wireshark trace during the initial add operation, that’d be awesome - any takers? :wink:

I just need to figure out how to do a tcpdump or network trace on my Galaxy S7, so I can have that baseline.

The other unfortunate part is that there’s no way to SSH or console to the Rachio to look at any sort of logging.

The only thing that I noticed when doing the port testing via my ISP vs using my phone as a mobile hotspot is that when using the hotspot, it appeared to be using the IPv6 address - which should have no bearing from a port perspective, but maybe from a routing perspective? Not sure; I don’t have a background in networking.

Anyhow, if anyone has any ideas, I’m happy to hear them, as I’d love to figure out root cause.

Thanks,
Eric


#2

@jansenet

Whoa this is above and beyond what we expect our normal users to do!

Can you do me a favor and run a network scan and send me the code that is given to you?

:cheers:


#3

Hi Franz, yeah, I’ll have to send you guys a new one since I’m not using the R7000 now. Gimme a few and I’ll send it your way.

Also, just to give some additional details, currently I’m using my Ubiquiti Unifi AP - AC Pro version (that the Rachio is connecting to), which connects to a Ubiquiti 16 port POE switch, which goes to the Ubiquiti USG (firewall / router) - all at the latest firmware level.

Thanks,
Eric


#4

Here’s the code from RouteThis, post scan - SDW4XBB3


#5

When running the RouteThis scan, I did get an alert on my USG IPS saying Non-DNS or Non-Compliant DNS traffic on DNS port - Opcode 8 through 15, from my phone (where I ran the RouteThis scan) to 54.236.218.231:53

Don’t get that when the Rachio Controller itself tries to connect though. I could turn my IPS off and re-scan using RouteThis if needed

Thanks,
Eric


#6

From a quick glance, there might be a few issues.

  • We run into problems with 2.4g/5g networks of the same name.
  • UDP port 53 could not be reached (I don’t know what we use that for…)
  • Mesh networks (especially enterprise grade) are sometimes problematic

:cheers:

Screenshot from 2018-05-17 15-56-59


#7

Yeah, I can see why the UDP 53 traffic failed in my USG - I could fix that and re-scan, or just turn the IPS off for further testing - just got this USG put in place for testing since we couldn’t get the R7000 to work at all, no matter what we did.

As far as the Wi-Fi name, when I had the R7000, I had separate SSIDs for the 2.4 and 5 GHz bands.

Now that I’m using my Unifi APs, the SSID is shared between the bands.

I could always put the R7000 back in place to further simplify, just not sure what further troubleshooting steps you guys would like to do. Travis S has the whole history of the case to include the old RouteThis scan.

I guess what I don’t understand is that if the Rachio controller is able to get on my Wi-Fi network to pull an IP address, and I can see it on the network, ping it etc, why does the band matter? (asking because I really don’t know - not trying to be difficult).

Thanks,
Eric


#8

OK, so in an attempt to try to fix my setup to comply with what you pasted above, I’ve done the following:

  1. Disabled the IPS that was blocking UDP 53 out for DNS - even though I only see it being blocked when scanning using RouteThis - never from the Rachio Controller.
  2. I broke apart my access point mesh and dedicated one AP to the Rachio. I created a separate WLAN called RachioNet-2.4, and only allowed that SSID on the dedicated AP. I also disabled the 5GHz radio on that dedicated AP, even though the Gen 2 can only use 2.4. I also removed all other SSIDs - only RachioNet-2.4 can be used on that AP.
  3. On my other AP, I ensured that RachioNet-2.4 was disabled, so that my phone and/or controller couldn’t connect to that AP on that SSID.
  4. Tried to connect my controller while running a Wireshark trace on a mirrored port. Unfortunately it failed to connect to the cloud. The Wireshark trace showed the exact same frame sequence/pattern as before.
  5. I re-ran the RouteThis scan with the new configuration - the code remained the same - SDW4XBB3

Thanks,
Eric


#9

The sequence you should see in your Wireshark dumps looks like this:

  1. DHCP request from the controller to get an IP.
  2. DNS request for pool.ntp.org to get an address of an NTP server.
  3. NTP request to get the time.
  4. DNS request for mqtt.rach.io
  5. MQTT with TLS connection to mqtt.rach.io

I think if you’re not seeing the connection to mqtt.rach.io happen, that means the DNS result the controller got wasn’t right. There is an issue where, if one of the DNS servers configured via DHCP returns NXDOMAIN or similar, the controller takes that as the final answer and won’t query any other servers. So it could be something like that.
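To make it easier to compare a capture against that startup sequence, here is a small Python sketch that reports the first expected step that never shows up. The step labels are ones I made up for illustration; you would have to label the frames in your own capture by hand (or from a Wireshark export), and this is not how the controller firmware itself works.

```python
# Expected startup order, as described in this post. Labels are illustrative.
EXPECTED_STEPS = [
    "DHCP",
    "DNS pool.ntp.org",
    "NTP",
    "DNS mqtt.rach.io",
    "MQTT/TLS mqtt.rach.io",
]


def first_missing_step(observed):
    """Return the first expected step not seen in order, or None if all appear.

    `observed` is a list of labels in capture order; unrelated labels
    (ARP, multicast, and so on) are simply skipped.
    """
    position = 0
    for step in EXPECTED_STEPS:
        try:
            # Search only after the previous match so that order is enforced.
            position = observed.index(step, position) + 1
        except ValueError:
            return step
    return None


# A capture that only shows DHCP and ARP stalls at the first DNS lookup:
print(first_missing_step(["DHCP", "ARP", "ARP"]))
```

If the answer is the pool.ntp.org lookup, the controller stopped before DNS, which points at the DHCP lease or what the controller did with it rather than at the cloud side.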


#10

FYI, I’m running a USG-XG with IPS enabled and I haven’t had any problems. The Rachio is connected through a UAP-HD. You must have something else blocking the connection. Maybe a custom firewall rule or something with your ISP.

I forgot to mention that both of my bands 2.4 / 5 GHz are using the same SSID.


#11

Thanks for sharing your setup and confirming that you’re good regardless of both bands sharing the SSID. I was fairly certain that I was good to go in that regard since I could pull an IP. The main problem that I’m seeing is that after it pulls an IP address it never attempts to leave the network. :frowning:


#12

Are you seeing DNS activity? After DHCP that should be one of the first things to happen.


#13

Hi DPG, thank you for sending the frame sequence! I tried to do a trace on my phone since I knew the mobile hotspot add operation worked, by downloading tPacketCapture, but when running that and trying to add the Rachio, it would fail; the packet capture software sets up some sort of VPN to intercept the traffic, and the controller didn’t like it. As soon as I disabled the VPN / packet capture, then the Rachio could connect… Anyhow, definitely appreciate your response.

I left a Wireshark trace running overnight, watching only traffic from the Rachio’s MAC address with the following display filter (where the X’s are obviously not the real values):

eth.addr == f0:03:8c:xx:xx:xx

In the DHCP ACK frame I see Option 3 (Router) and Option 6 (Domain Name Server) both offering my router’s IP address (10.1).
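If you want to double-check those two options outside of Wireshark, the DHCP options field is a simple code/length/value list, so it can be decoded with a few lines of Python. This is a minimal sketch: the function names are my own, the sample bytes are made up to mirror the values above, and it assumes you have already extracted the options bytes (the part after the magic cookie) from the ACK frame.

```python
import ipaddress


def parse_dhcp_options(data: bytes) -> dict:
    """Split a raw DHCP options field into {option code: value bytes}."""
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == 255:  # End option
            break
        if code == 0:  # Pad option, single byte with no length
            i += 1
            continue
        if i + 1 >= len(data):  # Truncated field, stop rather than guess
            break
        length = data[i + 1]
        options[code] = data[i + 2 : i + 2 + length]
        i += 2 + length
    return options


def ipv4_list(value: bytes) -> list:
    """Decode a value made of packed 4-byte IPv4 addresses."""
    return [str(ipaddress.IPv4Address(value[j : j + 4])) for j in range(0, len(value), 4)]


# Made-up sample: message type 5 (ACK), router and DNS both 192.168.10.1.
sample = bytes([53, 1, 5, 3, 4, 192, 168, 10, 1, 6, 4, 192, 168, 10, 1, 255])
opts = parse_dhcp_options(sample)
print(ipv4_list(opts[3]), ipv4_list(opts[6]))
```

Option 6 can carry more than one server, which matters here because of the point raised earlier about the controller stopping at the first negative DNS answer.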

All 90 or so other devices on my network are also using 10.1 for their DNS and they work fine. When manually checking from the machine that I’m typing on right now this is the result:

Resolve-DnsName -Name mqtt.rach.io -Server 192.168.10.1

Name Type TTL Section NameHost
mqtt.rach.io CNAME 275 Answer a3bmbcwe3hybwy.iot.us-west-2.amazonaws.com
a3bmbcwe3hybwy.iot.us-west-2.amazonaws.com CNAME 275 Answer iotmoonraker.us-west-2.prod.iot.us-west-2.amazonaws.com
iotmoonraker.us-west-2.prod.iot.us-west-2.amazonaws.com CNAME 275 Answer dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : AAAA
TTL : 60
Section : Answer
IP6Address : 2620:108:700f::3423:f086

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : AAAA
TTL : 60
Section : Answer
IP6Address : 2620:108:700f::22d0:8d83

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : AAAA
TTL : 60
Section : Answer
IP6Address : 2620:108:700f::3420:de5

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : AAAA
TTL : 60
Section : Answer
IP6Address : 2620:108:700f::3420:503f

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : AAAA
TTL : 60
Section : Answer
IP6Address : 2620:108:700f::3270:8294

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : AAAA
TTL : 60
Section : Answer
IP6Address : 2620:108:700f::342a:58c4

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : A
TTL : 34
Section : Answer
IP4Address : 52.42.88.196

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : A
TTL : 34
Section : Answer
IP4Address : 52.32.13.229

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : A
TTL : 34
Section : Answer
IP4Address : 52.32.80.63

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : A
TTL : 34
Section : Answer
IP4Address : 52.35.240.134

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : A
TTL : 34
Section : Answer
IP4Address : 50.112.130.148

Name : dualstack.iotmoonraker-u-elb-1w8qnw1336zq-1186348092.us-west-2.elb.amazonaws.com
QueryType : A
TTL : 34
Section : Answer
IP4Address : 34.208.141.131

DNS itself works fine. The issue that I’m seeing is that the Rachio never even attempts to resolve anything. I see no DNS traffic.

I can send you the traces if you wanna take a look? See if I’m missing something.

Thanks,
Eric


#14

When I change the filter on the still-running capture, adding an or clause for the Rachio hostname, I only see my own machine make the query (the one I pasted above). The new filter looks like this:

eth.addr == f0:03:8c:xx:xx:xx || dns.qry.name == mqtt.rach.io
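Since the filter keeps growing, here is a tiny Python helper that builds it from a MAC address and a list of hostnames. It is just string assembly around the two Wireshark fields already used above (eth.addr and dns.qry.name); the function name is my own, and I quote the hostnames because newer Wireshark versions are stricter about unquoted string values.

```python
def build_display_filter(mac, hostnames=()):
    """Build a Wireshark display filter for one MAC plus DNS queries by name."""
    clauses = [f"eth.addr == {mac}"]
    # One dns.qry.name clause per hostname, all joined with a logical OR.
    clauses += [f'dns.qry.name == "{name}"' for name in hostnames]
    return " || ".join(clauses)


# Both names come from the startup sequence posted earlier in the thread.
print(build_display_filter("f0:03:8c:xx:xx:xx", ["pool.ntp.org", "mqtt.rach.io"]))
```

Including pool.ntp.org is worth it because, per the sequence posted earlier, that lookup should happen before the one for mqtt.rach.io.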

Thanks,
Eric


#15

I think for Android you’ll need a rooted device to get anything useful. If you have a rooted device you can download a binary of tcpdump and use that to do a packet capture.

I think if DNS isn’t happening that probably means the issue is around DHCP.

When the Gen 2 is trying to connect to your network, does the second LED go solid a few seconds after the first LED? If the third LED takes a long time to come on, it could be that your controller is going into auto-IP mode, and in a lot of situations that’s pretty useless.


#16

Nah, it’s not getting an APIPA address. I can see that it’s pulling the IP address that I reserved for it (10.250). It always makes it to the 3rd light, but just blinks and stays there.
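For reference, an APIPA / auto-IP address is anything in 169.254.0.0/16, so the check is easy to script with Python’s standard ipaddress module. This is a minimal sketch; the function name is my own and the second address in the example is made up.

```python
import ipaddress

# IPv4 link-local range used for APIPA / auto-IP.
APIPA_NET = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """Return True if the address is an IPv4 auto-IP (169.254.x.x) address."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip in APIPA_NET


print(is_apipa("192.168.10.250"))  # the reserved lease from this thread
print(is_apipa("169.254.12.34"))   # made-up auto-IP address
```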

Thanks,
Eric


#17

Sorry I meant second LED instead of third :slight_smile:
Are you able to ping the address the DHCP server assigned to it?
Getting to the third LED should mean that the controller thinks it’s on your network and is trying to connect to mqtt.rach.io.


#18

Here’s a ping to the USG which is the firewall/router/DHCP/DNS server:

PS C:\Users\eric> ping 192.168.10.1

Pinging 192.168.10.1 with 32 bytes of data:
Reply from 192.168.10.1: bytes=32 time=1ms TTL=64
Reply from 192.168.10.1: bytes=32 time=1ms TTL=64
Reply from 192.168.10.1: bytes=32 time=1ms TTL=64
Reply from 192.168.10.1: bytes=32 time=1ms TTL=64

Thanks,
Eric


#19

Can you ping the controller from the same network? It should respond on the address it got via DHCP and on its IPv6 link-local address.


#20

PS C:\Users\eric> ping 192.168.10.250

Pinging 192.168.10.250 with 32 bytes of data:
Reply from 192.168.10.250: bytes=32 time=1460ms TTL=255
Reply from 192.168.10.250: bytes=32 time=401ms TTL=255
Reply from 192.168.10.250: bytes=32 time=2ms TTL=255
Reply from 192.168.10.250: bytes=32 time=4ms TTL=255

Ping statistics for 192.168.10.250:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 2ms, Maximum = 1460ms, Average = 466ms

PS C:\Users\eric> ping 192.168.10.1

Pinging 192.168.10.1 with 32 bytes of data:
Reply from 192.168.10.1: bytes=32 time<1ms TTL=64
Reply from 192.168.10.1: bytes=32 time=1ms TTL=64
Reply from 192.168.10.1: bytes=32 time=1ms TTL=64
Reply from 192.168.10.1: bytes=32 time=1ms TTL=64

Ping statistics for 192.168.10.1:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 1ms, Average = 0ms

PS C:\Users\eric>