Looking Glasses: Debugging Network Connectivity Issues
Yesterday's Facebook outage showed yet again the fragility of the Internet's routing infrastructure. A lot has been written about various deficiencies of BGP, the Border Gateway Protocol. But all too often, the problem isn't the protocol but the people (or scripts) administering the routers. Our ISC website did suffer a couple of outages last year due to Verizon misconfiguring BGP (sadly... several times within a few days). Facebook's outage appears to be a misconfiguration as well, according to some early statements from Facebook [1].
So how do you debug these routing issues, particularly when they are beyond your control? And what do you do next if, for a change, DNS isn't the problem?
One useful tool is the "Looking Glass." These are websites that various ISPs and, in some cases, universities and others have created. They allow you to query the routing tables of various routers. Before you read any further: these tools are meant for occasional manual debugging (and most try to enforce this via captchas and rate limits). They are not meant to be used by automated scripts. If you want automated alerting about routing issues, check commercial services like BGPMon, ThousandEyes, and Kentik.
The routing table isn't the same for every router on the Internet. It is always good to query routing issues from different locations, which is why these "Looking Glass" sites are so useful.
First of all: where do you find them? There is a nice web page, http://www.bgplookingglass.com [2], that lists public looking glasses. Personally, I like the CenturyLink one (https://lookingglass.centurylink.com) [3]. It provides a wide range of locations. Also, it reminds me of Don Smith, who worked for CenturyLink. I will use the CenturyLink site for my examples here.
Let's use "DShield.org" as an example. The current IP address for DShield.org is 159.203.71.83. A quick "whois" shows that the IP address is owned by DigitalOcean and part of AS14061.
# whois.arin.net
NetRange: 159.203.0.0 - 159.203.255.255
CIDR: 159.203.0.0/16
NetName: DIGITALOCEAN-159-203-0-0
NetHandle: NET-159-203-0-0-1
Parent: NET159 (NET-159-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS14061
Organization: DigitalOcean, LLC (DO-13)
Note that the AS information in whois is not always current. But it is a good start to tell you where you *should* find that IP address.
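If you want to keep notes on what whois told you, a minimal Python sketch like the one below pulls the CIDR and OriginAS fields out of the pasted output and confirms that the address falls inside the registered block. The function name parse_whois_fields is made up for this illustration, and the parsing assumes ARIN-style "Key: value" lines with a single CIDR block; other registries format their output differently.

```python
import ipaddress

# Excerpt of the ARIN whois output shown above, pasted in by hand.
WHOIS_TEXT = """\
NetRange: 159.203.0.0 - 159.203.255.255
CIDR: 159.203.0.0/16
NetName: DIGITALOCEAN-159-203-0-0
OriginAS: AS14061
Organization: DigitalOcean, LLC (DO-13)
"""


def parse_whois_fields(text):
    """Return a dict of the 'Key: value' lines (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that carry no 'Key: value' pair
            fields[key.strip()] = value.strip()
    return fields


fields = parse_whois_fields(WHOIS_TEXT)
# Assumption: the CIDR field holds exactly one block, as it does here.
block = ipaddress.ip_network(fields["CIDR"])
address = ipaddress.ip_address("159.203.71.83")

print(fields["OriginAS"], block, address in block)
# prints: AS14061 159.203.0.0/16 True
```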
Let us start with that information and see what we get from BGP via CenturyLink:
The output you will get back is essentially what you would have gotten from the router's command line:
HOUSTON TX USA Bgp results for: 159.203.0.0/16
show router bgp routes 159.203.0.0/16 ipv4 hunt
===============================================================================
BGP Router ID:4.69.182.78 AS:3356 Local AS:3356
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP IPv4 Routes
===============================================================================
No Matching Entries Found.
===============================================================================
No Matching Entries Found? Is DShield.org down? No. And this is one of the issues: DigitalOcean owns 159.203.0.0/16, but they choose not to advertise the entire block. They may use different parts of that /16 in different datacenters. One quick way to figure out which prefix our IP is part of is to use Team Cymru's DNS service (they also operate a whois service with the same information, but I prefer the DNS version):
% dig +short 83.71.203.159.origin.asn.cymru.com TXT
"14061 | 159.203.64.0/20 | US | arin | 2015-08-10"
So it looks like DigitalOcean advertises a /20. Let's redo our query using this /20.
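The query name used with dig is just the IP address with its octets reversed, followed by origin.asn.cymru.com. As a rough sketch, the Python below builds that name and splits the TXT answer into its fields, then checks that the announced /20 sits inside the /16 from whois. It does not send any DNS query itself; the answer string is the one pasted from dig above, and the helper names and dictionary keys are my own for illustration.

```python
import ipaddress


def cymru_origin_name(ip):
    """Build the origin.asn.cymru.com query name for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + ".origin.asn.cymru.com"


def parse_cymru_txt(txt):
    """Split a TXT answer of the form 'ASN | prefix | CC | registry | date'."""
    asn, prefix, country, registry, allocated = [
        part.strip() for part in txt.strip('"').split("|")
    ]
    return {
        "asn": asn,  # kept as text; not converted to a number
        "prefix": ipaddress.ip_network(prefix),
        "country": country,
        "registry": registry,
        "allocated": allocated,
    }


name = cymru_origin_name("159.203.71.83")
# Answer pasted from the dig output above, not fetched by this script.
answer = parse_cymru_txt('"14061 | 159.203.64.0/20 | US | arin | 2015-08-10"')
registered = ipaddress.ip_network("159.203.0.0/16")

print(name)                                    # 83.71.203.159.origin.asn.cymru.com
print(answer["prefix"].subnet_of(registered))  # True: the /20 is part of the /16
print(ipaddress.ip_address("159.203.71.83") in answer["prefix"])  # True
```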
We now receive a lengthy response:
show router bgp routes 159.203.64.0/20 ipv4 hunt
===============================================================================
BGP Router ID:4.69.182.78 AS:3356 Local AS:3356
===============================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
l - leaked, x - stale, > - best, b - backup, p - purge
Origin codes : i - IGP, e - EGP, ? - incomplete
===============================================================================
BGP IPv4 Routes
===============================================================================
-------------------------------------------------------------------------------
RIB In Entries
-------------------------------------------------------------------------------
Network : 159.203.64.0/20
Nexthop : 4.69.182.68
Path Id : None
From : 4.69.182.68
Res. Protocol : LDP Res. Metric : 20000
Res. Nexthop : 4.69.200.153 (LDP)
Local Pref. : 100 Interface Name : NotAvailable
Aggregator AS : None Aggregator : None
Atomic Aggr. : Not Atomic MED : 0
AIGP Metric : None
Connector : None
Community : 3356:3 3356:22 3356:100 3356:123 3356:575
3356:901 3356:2039 3356:11352
Cluster : 4.69.182.68 0.0.7.2 0.0.7.14
Originator Id : 4.69.184.239 Peer Router Id : 4.69.182.68
Fwd Class : None Priority : None
Flags : Used Valid Best IGP Group-Best
Route Source : Internal
AS-Path : 14061
Route Tag : 0
Neighbor-AS : 14061
Orig Validation: NotFound
Source Class : 0 Dest Class : 0
Add Paths Send : Default
Last Modified : 15h03m56s
The router we selected has multiple "peers." Each peer will exchange routing information with this router, resulting in multiple "RIB-in" entries. I am only displaying one of the entries above. Discrepancies in these entries could indicate a problem with information received from a particular router. But they do not have to be identical. Sometimes, there may be a good reason for one router to advertise slightly different information. (RIB = Routing Information Base, the internal database routers use to store routing information.)
The important part for us is the "AS-Path" line ("AS-Path : 14061" in the output above). It lists the networks that the packet will pass through to reach the destination, starting with the particular router we used to issue this query. In our case, the result is pretty simple. DigitalOcean peers directly with CenturyLink. The AS "Path" in this case is just DigitalOcean's AS, which will receive the packet next.
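When a looking glass returns many RIB-In entries, it can help to pull out just the AS-Path lines from output you copied by hand (again: do not script queries against the looking glass itself). The sketch below is one way to do that in Python. It assumes the "AS-Path : ..." layout shown above, which is specific to this router's output format, and the function name extract_as_paths is only illustrative.

```python
import re

# A few lines copied from the RIB-In entry shown above.
SAMPLE = """\
Network : 159.203.64.0/20
Nexthop : 4.69.182.68
Flags : Used Valid Best IGP Group-Best
AS-Path : 14061
Neighbor-AS : 14061
"""

# Matches only lines that start with 'AS-Path', so 'Neighbor-AS' is ignored.
AS_PATH_LINE = re.compile(r"^[ \t]*AS-Path[ \t]*:[ \t]*(.*)$", re.MULTILINE)


def extract_as_paths(output):
    """Return one list of ASNs per 'AS-Path' line found in the output."""
    paths = []
    for match in AS_PATH_LINE.finditer(output):
        # Keep only the numeric tokens; anything else on the line is skipped.
        paths.append([int(tok) for tok in match.group(1).split() if tok.isdigit()])
    return paths


print(extract_as_paths(SAMPLE))  # [[14061]]
```

Comparing the resulting lists across entries gives a quick view of whether different peers report different paths for the same prefix.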
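What you should be looking for is loops (the same ASN showing up again in an AS-Path after a different ASN; back-to-back repeats of one ASN are usually just AS-path prepending and not a problem). Also look for packets passing through ASNs you did not expect (for example, in geographic locations that do not make sense).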
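A loop check is easy to sketch in Python. One caveat: an ASN repeated back-to-back is usually AS-path prepending (a network padding its own ASN to make a path look longer), so the sketch collapses consecutive repeats first and only flags an ASN that comes back after a different one. The function names and the example paths with more than one ASN are made up for illustration; only the first path matches the output above.

```python
from itertools import groupby


def collapse_prepends(path):
    """Collapse back-to-back repeats of an ASN (AS-path prepending)."""
    return [asn for asn, _ in groupby(path)]


def has_loop(path):
    """True if an ASN reappears after a different ASN has been traversed."""
    collapsed = collapse_prepends(path)
    return len(collapsed) != len(set(collapsed))


print(has_loop([14061]))                      # False: the path from the example
print(has_loop([3356, 14061, 14061, 14061]))  # False: prepending only
print(has_loop([3356, 1299, 3356, 14061]))    # True: 3356 shows up twice
```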
Are you able to get the same information via "traceroute"? Yes and no. The route displayed by traceroute should follow the route communicated via BGP. But not all routers will send ICMP errors back. Many Looking Glass sites include traceroute as an option, so you can run a traceroute from the router to confirm what you are seeing in BGP. A packet may pass through multiple routers within a single AS, so you will see more "hops" with traceroute, and traceroute may identify issues within an AS that are not necessarily visible in BGP.
[1] https://engineering.fb.com/2021/10/04/networking-traffic/outage/
[2] http://www.bgplookingglass.com
[3] https://lookingglass.centurylink.com
---
Johannes B. Ullrich, Ph.D., Dean of Research, SANS.edu