These are unedited transcripts and may contain errors.
Routing Working Group session:
18 April 2012, 2:00-3:30 p.m.
CHAIR: Hello. Good afternoon everyone. Let's get started here. It's 2 p.m. and this is the room for the Routing Working Group session. If you are looking for the other Working Group that's taking place in parallel right now, it's in the room upstairs.
We have a full agenda today, so let's start immediately. My co-chair, Rob Evans, wasn't able to be here this time, but nonetheless he is actually the person who did most of the work in putting together the agenda, so I thank him for that continued work. This afternoon we will have Robert Kisteleki of the RIPE NCC as the Jabber scribe, and others as scribes. So thank you very much for that, and of course we count on the really good help of the stenographers for this session as well.
When you walk up to the microphone to make any comment or question, please do state your name and affiliation for the benefit of the remote attendees, so they know who is speaking.
So the next item is the minutes of RIPE 63, our meeting back in Vienna. We circulated them. We have received no comments. Does anyone have comments to add at this time? No, okay. Then we can adopt those minutes.
We have three talks. The first one by Geoff Huston, the BGP DFZ in 2012. That will be followed by a talk by Randy Bush, with a different title than we had here before. And we will end with a talk by Alexander on BGP routability, and any other business, if there is any. So if Geoff is around (I saw him earlier), could he come up here.
GEOFF HUSTON: Good afternoon everyone. Who owns a router? Well, lots of you, right. When are you going to buy your next one? Tomorrow? Or would you prefer to hang on to your money and wait for a few more days? Most of you would actually like to get a fair deal of life out of your routing equipment, and the real issue for you and the vendor who makes your router is trying to figure out how big to make them. And big has many dimensions, but one of the most critical ones for a router on the Internet is trying to figure out how many entries are going to be stuffed into that router. Now, there are two reasons why you get routing entries. One is, of course, your interior network topology and customers, the iBGPs and IGPs of this world. That's very hard to measure from outside, because everyone is different. But the other thing that actually fills up space, and a lot of space, is the eBGP world, the default-free zone of the Internet, and how many entries are in that and how fast they are growing is actually what I'm looking at here. So, this is one half of the view about scaling the network and what it looks like.
We don't understand the system and we tell each other conventional lies. On this one, I actually had to review a paper a few months ago from folk who theoretically know a lot about routing, and there are two things here that I found pretty weird, because it's conventional wisdom but I'm not sure it's true. We tell each other that the default-free zone of the Internet is growing rapidly all the time, just grow, grow, grow, grow, grow. And the other thing we tell ourselves a lot of the time is that the reason why it grows is not because the Internet itself is growing so quickly at the edge; it's because everyone has bad routing practices, and they deaggregate and spew that noise all over the Internet. Is that true? How much of the growth is actually due to deaggregation and how bad is it? And then we tell ourselves you must aggregate, you must do the right thing, all this deaggregation is causing trouble. But the real question is: is that true? So, I'm going to quickly look through that and look at some data we have collected.
So I want to explore how fast the routing table is growing, what size router you should be buying if you want it to work for a few years, and then I want to turn my attention to actually look at deaggregation in the default-free zone out there, in the bits of the domain that are eBGP, and to what extent that's dominating the routing table.
This is the big picture of the routing table growth. This is the old classic, up and to the right, exponential growth. The starting date here is as far back as I can get reliable data, which is 1989, and over there is a few days ago. You can actually point to some pretty amazing things inside that data. Way back in 1991 or so, we kind of figured that there were two problems that were going to kill us. The first problem was we were going to run out of addresses, whoops. But the next problem was also just as bad: we actually thought, well, this whole routing system with class As, Bs and Cs is not going to work at all. So as well as v6, the other thing we did in the very early nineties was get rid of class-based routing. And the first time that hit product was actually at an IETF meeting in March of 1994, and the community was small enough that it could be deployed really quickly. You'll actually notice that the growth pattern, which we thought at the time was up and to the right, in retrospect is tiny. It actually managed to stop the problem for a while, and the resultant growth curve was pretty good.
But, you know, we love mania, we love excitement, we love a boom; everyone loves a boom and that was the big boom of the year 2000. And the inevitable crash happened, and that's the inevitable crash in routing, in 2001. There is broadband to the masses in 2005, 2006, and yes, Greece and Spain and Ireland managed to do their bit. There is the aftermath of the global financial crisis, just around there. We are not immune from the normal cycles of business.
Address exhaustion so far hasn't really altered things much. Let's have a look at that a bit more: January 2010 up to 2012. There are some folk who decided to actually do the right thing, and that's very good, and just around the early part, 12 months ago, that's the point where APNIC ran out, and rather than having more specifics immediately hit the routing table, the funny thing is it actually slowed the growth of the routing table down for a little while. Then we started doing it again. So, it's kind of back to the normal growth.
How many addresses are inside this routing system? This is a different view of the routing table, and I'm trying to see how much of the total address space, 256 /8s, is actually covered over the last couple of years. The first thing is that it's pretty weird that the address space sort of jumps up and down like that. That's my fault, and I am sorry. These are /8s that get advertised, and for a while there over the last couple of years, before we started allocating /8s, we actually advertised them for a few weeks, just to find out how much crap traffic was being directed at them, so when we handed you a /16 we could say with some honesty: we have tested it, and 103.25.2 is looking a bit sick, don't give it to a customer, they won't like you. Which was actually quite a useful thing to do at the time. So that's the reason why you get some of this noise. And now it's pretty clear what address runout is doing. That growth in address space slackened off as APNIC ran out of addresses. That's the /8s. The other way to measure the Internet is by the number of players. And we grew across two years from 33,000 to 40,000. This is a weird graph. This is just bizarre.
Every day we add 11 more networks. Not 20. Not 40. 11. And most of them come from this region. That's seven a day. You guys are a machine. There is just no other way to do it. It's seven today, seven the next day, and it's been going on for years. That's one of the few curves on the Internet that is almost completely straight. You take a small amount of time off around Christmas and then you're back again. No one takes a holiday, by the way, over August; you are still just working away, seven a day, well done, very consistent.
So how big does this grow? From January '11 to January '12 the number of prefixes grew by 13%, 14%. The amount of addresses grew by only 6%, and the number of ASes by around 9 to 12% in a year. That's not a lot. Moore's Law says a different kind of number in terms of scalability of silicon, so what we are seeing is a relatively modest growth of about one sixth or so per year, slowed down a bit since we ran out of addresses. Nothing terribly dramatic.
This is the last two years in v6. And I love this graph. It is just brilliant. One could argue that World IPv6 Day was a complete and stunning failure. Why? Because after it, the growth in v6 was a lot worse than the growth before it. Somehow it was the world's let's-stop-growing-v6 day. So, yeah, I am a little bit harsh on it. What I think actually happened, really, is that a lot of folk accelerated their plans, and that there was this sort of momentum in v6, but when we started talking about v6 day and it started to hit the industry around January of last year, a lot of folk brought their plans forward and started to do this earlier in the year. So that by World v6 Day we were at around about 6,500 ASes, sorry, 6,000 prefixes, a bit accelerated. After v6 day, it slackened off a bit because the folk that were going to do their job did it. And these days, you know, it is a lot slower than v4. There is the address span. This is the number of /32s: a lot. There is a lot of space in v6. But this is all weird. I did a bit of it. I advertised a /12, and Gert Döring saw it. Someone was advertising a /8. I saw it. Gert saw it. No one else did. No one else even noticed that this huge amount of space, unallocated dark space, was actually being advertised for a while in v6.
The AS count: much the same again; there is World v6 Day, same kind of explanation. The stats in v6 are much more impressive. This is like the old Internet. The growth rates are up around 80, 90, 100 percent in the year: the overall growth rate in 2009 was 50%, and the growth rate last year was higher. So, I can do a bit of extrapolation; you can see the graphs if you want. But if I just take: how long would it take v6, not to get to 400,000 entries, that's not really that interesting, but how long would it take to get from the current AS count, 5,500, to 40,000? If the industry continues at the current pace, it's not that far away: 2016, which is actually not a bad number. It is achievable.
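As a rough sanity check on that "around 2016" projection, the arithmetic is simple: start from the roughly 5,500 IPv6 ASes mentioned here and assume a sustained exponential growth rate. The 60% per year used below is an assumption picked from inside the range quoted in the talk, not a figure from it.

```python
# Rough check of the "40,000 ASes by 2016" projection.
# 60%/year is an assumed rate from the 50-100% range quoted in the talk.
import math

start, target = 5_500, 40_000
annual_growth = 0.60

# Solve start * (1 + g)^t = target for t.
years = math.log(target / start) / math.log(1 + annual_growth)
print(f"~{years:.1f} years to reach {target:,} ASes")
```

At that assumed rate the target is reached in a little over four years, which is consistent with the 2016 figure in the talk.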
So, I get back to this issue: you are buying a router tomorrow, you are busy writing the specs, how big? How many entries should it cope with? And you go, well, this is a problem. Because we are running out of addresses, and as we run out of addresses, we don't know whether the consequent transfers and trading and whatever happens will fragment the IPv4 space more or not. We just don't know. So, this is a tough one to speculate on. But we can have a go anyway. What the hell, it's just maths. So you take the growth curve and you smooth it, and then you do the first-order differential, remember that from maths; it makes that global financial crisis stick out like a sore thumb. All of a sudden, when money hit the wall, BGP growth just stopped. But after that, optimism resumed, the money came back into the market and there we are again. There is IANA exhaustion.
So, the network is kind of growing. And we can actually do a least-squares best fit. We can get to that kind of equation there, and we can drive that forward in time, and that gives us this extrapolation.
It's not that scary. In five years' time, if the industry kind of keeps on going with the current momentum, then from the roughly 400,000 entries you need now, by 2016 you'll need about half a million entries at this rate of growth. Fair enough. What about v6? Same technique. Do the growth rates. Extrapolate forward. V6 is special. I actually don't think polynomials work; I like exponentials here. I think you guys are really right into this. The growth rate is a lot higher.
Big numbers in terms of growth, but small numbers in absolute terms. Even in four years' time at this growth rate, it's still tens of thousands of entries.
Why would it be a problem? Silicon does get faster. Today's routers are bigger and quicker than yesterday's routers. When does the cost of routing, the unit cost, get more expensive? If you are paying so much for routing and you keep on buying today's state of the art, as long as that cost is the same in real terms, you are okay. The cost increases when you are growing faster than Moore's Law. Because, as you heard on Monday from Greg Hankins of Brocade, once you get past Moore's Law, life gets really, really, really crap. So, you know, Moore's Law says doubling every 18 months to two years and you're okay. And that's the kind of graph of Moore's Law. So, here is the v4 routing table. Here is where Moore's Law would drive you. Here is where the stats drive you. In the eBGP world, in the default-free zone, it's kind of hard to see that this is a problem. There may be other problems in your life, there may be lots of other problems in your life, but the growth rate of the IPv4 routing table at the eBGP level is probably not one of them at this point in time. That looks okay.
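The comparison boils down to doubling times: a table growing at the quoted ~13% a year doubles much more slowly than silicon capacity under the usual Moore's Law assumption of a doubling every two years. A quick back-of-the-envelope check:

```python
# Doubling-time comparison behind the "growth is under Moore's Law" point.
# 13%/year is the IPv4 prefix growth from the talk; the two-year capacity
# doubling is the conventional Moore's Law assumption.
import math

table_growth = 0.13                           # prefix growth per year
table_doubling = math.log(2) / math.log(1 + table_growth)

moore_doubling = 2.0                          # years per capacity doubling
moore_growth = 2 ** (1 / moore_doubling) - 1  # equivalent annual rate

print(f"table doubles every {table_doubling:.1f} years")
print(f"silicon capacity grows ~{moore_growth:.0%} per year")
```

The table needs well over five years to double while capacity grows around 41% a year, which is the gap that makes the v4 curve look comfortable.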
V6? Well, interestingly, it's kind of faster. There is Moore's Law, and there is an optimistic exponential growth, and you kind of go, well, that's scary, isn't it? Well, no, because over there, you are still dealing in tens of thousands, not millions. They are double the size. So what? So, again, you might have many problems in your life, but as far as I can see, the size of the routing table is not going to make you lose sleep. That growth rate, as far as I can see, is no great cause for alarm at this point in time.
So, that's cool. But it assumes something. I can't stop you deaggregating and advertising 100,000 prefixes. Nobody can stop you. Because the routing system has no police, and has no natural form of limitation. It actually works well because you're all, on the whole, well-behaved. Congratulations. Let's look at this and look at how well behaved we are, and part of this is: is aggregation still working? Are you still doing the same thing? That research paper said it's growing like crazy, and the reason why is that you guys just deaggregate. You just sit there and spew out more specifics. And here is a really good example. Anyone want to own up to being Relcom? AS 2118? If you are in the room, this is you. Stop it. If you want to find out if you are really in the top ten, go and look at the CIDR report. The Americans were winning when I did this at the end of the year, on 23rd December: BellSouth, Covad and Korea. If you look today and find yourself, you can see how you rate. So there are some folk who are pretty bad at this, and some folk don't figure.
Here is the routing table, same figure, up and to the right. Here is a different way of looking at it; this is now percentages. I am looking at the percentage of entries in the routing table that are more specifics. And this is since 2002. This is ten years. This isn't up and to the right. It isn't even down. It's flat. There are very few things in the Internet that are flat. Well, dead things are flat. But apart from that, there are very few things that, you know, are actively, constantly growing but at the same rate. And fascinatingly, the number of more specifics is half of the routing table. And has been for ten years. So whatever you are doing, you are just doing it consistently. And I thought, well, maybe that's just me. Maybe that's just what I can view from Australia. So I go and look at every peer of Route Views since 1998, so this is going back not just to 2002 but even further. There is the great Internet boom, and once we got over it, almost every peer of Route Views has the same view: around half the entries are more specifics of the other half. Bizarre. So I am getting a bit curious about this, and you should be too. Then I ask: how much of the address space is covered by more specifics? And that's a different curve. So, this is the address space, this is 0% up to 40%, and over the years more and more address space is covered by more specifics. The same number of prefixes, 50%, but, you know, it's covering more and more space.
Do all of you do this? Well, I do a cumulative distribution curve and I come up with the fact: no. Not all of you do this. A very small number of you do this. There are 40,000 ASes, but 1,186 do most of the deaggregation. And half of you, 20,000, do none. And the top ten ASes do 20,000 just on their own. So there are a small number of you that really, really, really like deaggregation. The rest of you do it a bit, and there are a few who don't do it at all. So it's not everybody.
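The skew described here falls out of a simple cumulative count of more specifics per origin AS. A minimal sketch, with entirely synthetic AS numbers and counts chosen to mimic the shape (a few heavy deaggregators, a long tail, and aggregates excluded from the count):

```python
# Sketch of the per-AS skew analysis: count more-specific prefixes per
# origin AS and see what share the top contributors account for.
# All AS numbers and counts here are invented.
from collections import Counter

# (origin_as, is_more_specific) pairs, as they might come from a table dump.
announcements = (
    [(65001, True)] * 900 + [(65002, True)] * 600 + [(65003, True)] * 300
    + [(n, True) for n in range(65100, 65150)]   # 50 ASes with one each
    + [(65200, False)] * 1000                    # aggregates don't count
)

per_as = Counter(asn for asn, more_specific in announcements if more_specific)
total = sum(per_as.values())

ranked = per_as.most_common()
top3_share = sum(count for _, count in ranked[:3]) / total
print(f"top 3 ASes originate {top3_share:.0%} of the more specifics")
```

With real data the same `Counter`-and-sort reduction produces the cumulative distribution curve Geoff describes: a handful of ASes carry almost all of the deaggregation.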
So there is the list. Look for yourself. It is all over the world. In this particular case: Pakistan, Korea, America, Cox Communications, VTEL, and Tata in India. So it's all over the place.
So then I kind of looked: is it the same people? What if I looked a year ago? And this is really, really weird. Over the last three years, I have looked at the worst deaggregators and plotted them all simultaneously. Some folk never learn; they just don't change. Other folk do better over time and their more specifics disappear. Other folk love deaggregating and do it a little bit every day; they add two or three more to the mix. The end result is a constant: the folk who are learning are compensated by the folk who are deliberately ignoring it, and the number of more specifics remains constant at the end. So we are getting smarter and dumber simultaneously. Some of you are picking up the message; some of you just don't have any long-term memory. Get with it.
This is weird. Why do you deaggregate? Why do you announce more specifics? Well, there are a number of reasons. Hole punching: you have a block, I have a more specific from your block, but I announce it with my originating AS. So I am punching a hole, and you can see it as a different originating AS. The other way is it's just you: you have multiple transits or upstreams, you deaggregate, and in order to load balance incoming traffic, you announce different prefixes down each upstream path. What do I see from that? The same origination but different paths. And some folk announce more specifics because they looked at the YouTube routing incident and said, ah-ha, that was a more-specific attack, let me up the ante and advertise everything I have as /24s so nobody can hijack me with a more specific. Yes, that happens. I would classify that as poor routing practice, but it happens. You can actually classify this. The red is the same path. So these are folk who are basically advertising more specifics with the same path as the aggregate. I think that's the protection folk: the "I am going to beat you to the draw, I am just going to advertise my more specifics" folk.
The green is a different origin. So these are the folk who are hole punching, and the blue is traffic engineering: same origin, different path. Those are absolute numbers. What about in percentage terms? That's kind of interesting. Because the hole punching is getting less and less. When I get a block and I have customers, I actually don't normally let them originate more specifics any more; this industry is less and less generous about that. In percentage terms, that's going down. What about the folk who just announce more specifics as a defensive mechanism? That's still very prevalent. And the folk who are doing same-origin, different-path traffic engineering: about 30% of the routes.
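The three-way split can be expressed as a small decision rule comparing a more specific's AS path with that of its covering aggregate. The paths below are hypothetical tuples with the origin AS as the last element, not a real BGP feed.

```python
# Minimal sketch of the three-way classification of a more specific
# relative to its covering aggregate. AS paths are toy tuples, origin last.
def classify(aggregate_path, specific_path):
    if specific_path == aggregate_path:
        return "defensive"            # same path as aggregate: protection folk
    if specific_path[-1] != aggregate_path[-1]:
        return "hole-punching"        # different origin AS
    return "traffic-engineering"      # same origin, different path

agg = (3356, 1299, 65001)             # hypothetical aggregate's AS path

print(classify(agg, (3356, 1299, 65001)))  # defensive
print(classify(agg, (3356, 2914, 65050)))  # hole-punching
print(classify(agg, (174, 2914, 65001)))   # traffic-engineering
```

Run over a full table dump, tallying these three labels per day is what yields the red, green and blue bands in the plot being described.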
Is it bad? Is this deaggregation bad? And the wisdom I get when I ask folk is of course it's bad, Geoff. Well why is it bad? Well there are lots of them. No there aren't. Well they are very noisy. Oh, you mean to say that if I advertise a more specific, I'm going to update it at a greater rate than the aggregate? Of course, that's what happens, isn't it?
Is it? Let's have a look. We have the data. We can search back. So let's look at the last three years of updates, and look at the update rate if you are a more specific or if you are an aggregate. If more specifics really are noisy, they should have a higher average update rate, shouldn't they? Yes. So there is a green and a red data point for every day since 2009. And if more specifics were noisier, the green would be above the red. You'd see a green band and a red band. I can't see any difference. And if you can, I'd like to know how. As far as I can see, even from the front row, they are indistinguishable. That's weird. The same update rate per prefix. Each prefix has a probability of being updated of around 0.3 a day. How about the number of prefixes themselves, irrespective of the number of updates? Can you see a difference between the green and the red? I can't. So at this point, this is weird. This is completely unexpected, because I, like you, believed that more specifics were evil, and they were evil because they were unconstrained routing noise. All this load-balancing traffic engineering just generated heaps of updates and that's really, really bad. I can't see it. No matter where I look, I can't see that. And that's really quite surprising: that we advertise and change aggregates at the same rate as more specifics.
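The comparison behind those green and red bands can be sketched as follows: simulate per-prefix daily update counts for the two classes and compare the average rates. The data is synthetic, and both classes are deliberately drawn from the same ~0.3-updates-per-day process to mirror the finding that they are indistinguishable.

```python
# Synthetic version of the green-vs-red comparison: per-prefix daily
# update probability for aggregates and more specifics. Both classes use
# the same ~0.3/day process, reflecting the measured result.
import random

random.seed(7)
DAYS, PREFIXES = 365, 1000

def daily_rate():
    # Each prefix is updated with probability ~0.3 on any given day.
    updates = sum(random.random() < 0.3 for _ in range(DAYS))
    return updates / DAYS

aggregates = [daily_rate() for _ in range(PREFIXES)]
specifics = [daily_rate() for _ in range(PREFIXES)]

mean_agg = sum(aggregates) / PREFIXES
mean_spec = sum(specifics) / PREFIXES
print(f"aggregates: {mean_agg:.3f}/day  more specifics: {mean_spec:.3f}/day")
```

On real data the interesting outcome is exactly this null result: the two averages track each other day after day, which is why the two bands overlap.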
So, is it a problem? I'm not sure. As I said, there are many reasons why you should not sleep tonight when you are worrying about the way in which your business is doing its routing, and, you know, security is one of those oh-my-God nightmare situations, but in terms of more specifics, noise, update rates and so on, I'm not sure it's a problem. We could do a whole lot better. But am I saying to you: drop everything and fix it? I'm not sure that I'm going to wave that flag at you. It would be nice if you did, and please do so, because cleaning up the world is always a good thing. But it's not doing too much damage at the moment. It's not the major problem you face in buying a new router.
So, that's the end of it at this point. As far as I can see, it's not the major problem out there in the eBGP world. We could do better. We should do better. But, what we're doing so far isn't exactly dangerous and it's certainly not killing your router. So thank you.
CHAIR: Thank you, Geoff.
AUDIENCE SPEAKER: Hi, Dave Freedman from Claranet. Excellent work. Did you manage to measure, or did you measure, the effect of drive-by deaggregations? You know, where somebody does a deagg for a short amount of time. This is quite a popular thing that people doing denial-of-service mitigation tend to use; there is a whole new industry of denial-of-service mitigation providers, and their mechanisms are based on doing these rapid deaggregations to do with sinking traffic.
GEOFF HUSTON: If they were doing it in bulk, I would expect to see, in this particular graph, little spikes, because if they were doing it in bulk it would really appear. There are a lot of more specifics, 200,000, and to try and spot in those 200,000 a new one that appears for a day or two and then disappears is difficult, and I wasn't particularly looking for that. So, the drive-by deaggregates aren't happening as a massive batch. What you do find is folk who jump up, there is one there, and it goes up again, but then they go stable. They kind of say: I have deaggregated my world, I am a happy person, I'm not going to change it again. So those spikes, we don't see them as clearly. It's something worth looking for, and thank you for the suggestion; I'll see if I can find drive-by deaggregates. I like the expression.
AUDIENCE SPEAKER: I was looking at your graph, and since you mentioned that the routing table is not growing at a particularly worrying pace...
GEOFF HUSTON: The eBGP routing table.
AUDIENCE SPEAKER: Sorry. I couldn't help but wonder if this could be taken as an indication that the supposed black market in IPv4 addresses is actually not happening, not happening now at least, because there has been a lot of talk in many places about this, how IPv4 addresses are being bought and sold and everything, and it's hard to imagine how this could happen without deaggregating.
GEOFF HUSTON: So, in general, we have seen that folk, particularly honest folk, prefer to use registries, and in APNIC in particular, as part of our transfer policy, we actually have a published log of all transfers. And if there was this kind of open, not black, open market in addresses and deaggregation, I'd be able to take the log of transferred addresses, look in BGP and find out if we are deliberately shattering up the prefixes. So far, that's not visible. But, you know, we have only been out of v4 addresses in APNIC for a year. It's early days, and many providers are still running through, if you will, the one-year supply that they got a year ago, so they have still got stock. I suspect the next couple of years are critical. And as RIPE runs out, on the 12th of August, guys, as RIPE runs out (whenever the hostmasters get to work, whatever happens in Amsterdam, ten o'clock), whenever RIPE runs out, I think you are going to find a bit more pressure on this. But we have no data. It's a new world. I can only guess. Nothing is certain.
AUDIENCE SPEAKER: So you would say that while it's not worrying, we shouldn't run and deaggregate our networks just now?
GEOFF HUSTON: Well, none of the data suggests that there will be this explosion. And the other thing about the routing system is it seems the folk who do routing, you folk, are incredibly conservative. You push back at your product managers and I think you even push back at your vendors, because there are very few events of extreme lunacy inside these numbers. Most of you do a really good job of doing the same thing tomorrow as you did yesterday. And that makes these routing systems have very consistent numbers, and it's a pleasure to watch because of that; it is a finely tuned system at work and it seems to be, on the whole, working so far, I hope.
AUDIENCE SPEAKER: A remote question. I have two questions on Jabber from James Blessing, Limelight. What about the update rates of prefixes that have more specifics versus prefixes that are whole?
GEOFF HUSTON: That are, sorry, the last word?
AUDIENCE SPEAKER: That are whole, that don't have holes in them.
GEOFF HUSTON: Oh... okay, so this is really hard. Because if I advertise a more specific into an aggregate, and it didn't have a hole but I am creating a hole, should I count that as a holed aggregate or a solid aggregate? I thought about this too, James, and actually the classification of aggregates that have more specifics and aggregates that don't, in the context of the data you are searching through, is actually difficult. So, I haven't looked, because I'm not sure I know how to.
AUDIENCE SPEAKER: Second question: your prefix numbers stop below 30K, when we are expecting to hit N per ASN, where N is greater than 1. When do you see that happening?
GEOFF HUSTON: This is again a really interesting kind of question. We are seeing some funny trends in the v6 routing table. The first thing is that the currency, no matter what your allocation policies are, is that /48s have a fair deal of global coherence. Some folk do filter, but on the whole you can advertise a /48 and the other 6,000 ASes can generally reach you. So, right now, quite a fine-grained level of detail is inside the v6 routing table. How much deaggregation is in the v6 routing table? Not a lot. Why? I suspect that a) there is no legacy swamp, and that b) the folk who are doing v6 at this point aren't at those real big edges of the network where I have got very little bandwidth, a lot more traffic than bandwidth per circuit, and I am doing the traffic-engineering juggling. I don't see a lot of that in v6. So, right now the level of N addresses per AS is actually quite low. So that's why I think, when we project v6, looking at the AS numbers rather than the prefix numbers is actually a slightly more reliable projector.
CHAIR: Thank you, Geoff. By the way, it will not be August. This is Europe.
GEOFF HUSTON: I would like to remind you that over there in the RIPE NCC, and down there in routing land, you guys don't take holidays. Over here is the AS-per-day curve. Where can I find it? The Europeans in the routing world don't take holidays. You don't stop for summer. You stop for a couple of hours at Christmas, and the rest of the time, 24 hours a day, you are just working.
RANDY BUSH: Randy Bush, IIJ. There is a 2010 paper in JSAC on this subject that does have some of the numbers James was asking for, regarding announcement rates for different flavours of this stuff.
GEOFF HUSTON: Okay. Thank you very much for that reference.
CHAIR: Could you send the URL to the list.
RANDY BUSH: Maybe I'll send it to Geoff.
CHAIR: Thank you. Don't sit down, you're next, Randy.
RANDY BUSH: So, back in '93/'94, a friend of ours, for many of us here, Sean Doran of Sprint, was under-budgeted, which of course never happens to any of the rest of us, and his routers didn't have the memory that maybe they should have. He was still running on the old routers, and he was dying because of the churn; the CPU and memory were getting eaten by BGP churn. So, in this forum those many years ago, we developed something called route flap damping. I hope most folk here know what it is, because I'm not going to try to go into it. And we deployed it and it did save Sean, but what happened was, by, I don't know, '96 or '97, a number of us, wearing my research hat instead of my ops hat, found that route flap damping was actually rather destructive, and we published some papers on the subject and stuff like that. And the real problem is that the network amplifies the bounce: the topology of the network amplifies the route flap, and so people were getting penalised for the smallest of things.
So, we all deprecated route flap damping due to the serious problems of over-damping. But we still have the behaviour that we didn't like; we are still getting a fairly bad churn rate. So, a couple of years back, a couple of us said: is there a minimal change, just a stupid hack, that we can do to try and make it a little better? Okay. Geoff, who just spoke, had a much more sophisticated hack, and sometime maybe he'll report on that, but it came out about the same, I believe. Geoff, is that fair?
GEOFF HUSTON: Yes.
RANDY BUSH: Okay. But this change required moving two constants; Geoff's required some more significant resources in the router. The problem is that, you know, one in 100 prefixes causes 10% of the updates, right; 3% of the prefixes cause 36% of the updates. So, these people are the elephants, and there are these little mice down here who create updates, but not a lot of them, and they are really, you know, the vast majority of the network. So, the elephants are the problem, but the problem was that the current techniques we have for this stuff, MRAI of course but also RFD, and the RFD parameters we have today, kill the mice too. Okay. They don't just get the elephants. We have bad weapons. So the approach we propose, and in fact I think it's in currently shipping router code, is higher suppress thresholds: save the mice, still reduce the churn, not as much, but still we are getting some benefit, and it's trivial to implement.
So I'll go into some detail. What we have is a measurement structure where we took BGP updates and we fed them towards a router, and we also recorded them all. Okay. And we did this at Equinix, Dallas, and on router R0 we watched all the updates, and the code in R0 did some funny things. It applied the damping parameters but it didn't actually damp; it assigned the penalties so we could see them. Okay. It turns out the routers have hard code in them to limit the maximum damping penalty. So, we raised it very high (by "we", by the way, I mean we were cooperating with the router vendors' BGP coders, okay, two implementations and all that stuff, very cooperative from Cisco in this case), and we raised it so we could see the rest of the curve, the whole curve as it went up. So what we did is, on the router, we retrieved the counters, snapshots every four or five minutes; occasionally we couldn't, the router was so busy it couldn't respond. So, we sampled the router and we binned the results.
So this is today's parameters. What's ugly is that with the default threshold chopped off here, we'd never have seen that, okay. So raising that threshold helped. Today's parameters, the default parameters at 2k, reduce 47% of the updates. Unfortunately, dead mice.
So, what happens is we are now looking at the proportion of the prefixes, from 0 to 1, as a cumulative distribution for some different interesting possible penalty thresholds. At 2k, 14% of the prefixes, not 14% of the noise, but 14% of the prefixes of the Internet, get damped. That's why we stopped doing route flap damping. Right.
But you just bump it to 4k and you are at 4.2%. You bump it to 6k, to 12k, and you are down to 2%, 1% and under.
So what would happen if we scaled down there? Okay. This is the number of damped prefixes as a function of the penalty threshold. So, you know, at 6k, 5,000 prefixes get damped. At 2k, look at that. So there is this very steep curve: if we just change that parameter and get down here, all those little mice live.
So, this is the percentage of churn; now we are not looking at prefixes, we are looking at the number of announcements. Okay. So, at 2k, as we said, you are suppressing 47% of the announcements, but at 6k you are still getting, you know, 15, 17, 18%, and even at 12k you are getting 13%, or whatever. And you are harming almost none of the mice. But you are getting the elephants.
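The elephant-and-mouse trade-off Randy describes can be sketched with a toy RFD simulation. The per-flap penalty of 1000 and the 15-minute half-life below are the commonly cited vendor defaults, but the flap rates are purely illustrative, not the measured data from the talk:

```python
import math

def simulate_rfd(flaps_per_hour, hours, suppress_threshold,
                 penalty_per_flap=1000, half_life_s=900.0):
    """One prefix flapping at a steady rate: the penalty decays
    exponentially with the half-life and grows by penalty_per_flap
    on each update. Returns (suppressed_updates, total_updates)."""
    decay = math.log(2) / half_life_s
    interval = 3600.0 / flaps_per_hour
    penalty = 0.0
    suppressed = total = 0
    for _ in range(int(hours * flaps_per_hour)):
        penalty *= math.exp(-decay * interval)  # decay since last flap
        penalty += penalty_per_flap             # charge this flap
        total += 1
        if penalty > suppress_threshold:        # over threshold: damped
            suppressed += 1
    return suppressed, total

# An "elephant" flapping 60 times/hour vs a "mouse" flapping 6 times/hour,
# over 24 hours, at the thresholds discussed in the talk:
for name, rate in (("elephant", 60), ("mouse", 6)):
    for threshold in (2000, 6000, 12000):
        s, t = simulate_rfd(rate, 24, threshold)
        print(f"{name} at {threshold}: {s}/{t} updates suppressed")
```

With these numbers the mouse's penalty tops out below 3k, so it gets damped at a 2k threshold but is untouched at 6k and above, while the elephant's penalty climbs past 12k and it is damped at every threshold, which is the point of raising the default.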
So, we said, well, you know, we were just measuring that in Dallas. What if this is topologically dependent? What if this is different depending upon where you are in the world? Before we start fooling with the defaults in the routers, maybe we should think about this a little.
So, what we did is we took the same experiment, except we took data from different places in Route Views, okay, the update streams, and played them into the router. So, I believe we played a Kenyan exchange point into the router, and a couple of things like that. I forget exactly what they were, but they were pretty diverse, and we got the same results.
So, the conclusion is the settings are far too aggressive, which is why it's turned off. So raise the threshold, raise the max so you can see them, and tune the default parameters to 6-15k. Now, the reason I bring this up here and I'm talking about all this is that this is where route flap damping started, and then, I don't know, three, five, seven years ago, we as a community wrote some documents that said stop route flap damping, I think it was Christian and Philip Smith, who no longer plays here very much, etc. So I'd like to suggest today that maybe we should write another document. There are also documents in the IETF on the subject, by the way, but that's the IETF, another world, and that's kind of a test of how complex you can make the document; possibly here we can make a rather simple one. But do note, I think it's even reached production code in at least one major router vendor with the change in the parameters, and shockingly to me, I think he said he is changing the defaults, which surprised me: in the IETF document we say possibly not to change the default, because if somebody is route flap damping today, very few are, but is route flap damping with the defaults, and the default changes out from under them, they are going to get a lot more noise suddenly; they are going to go from that 47% to 80%. Okay. So, be that as it may, and if people are interested I can actually find out what release trains and that kind of stuff this might be in.
And that's it.
CHAIR: Thank you, Randy.
AUDIENCE SPEAKER: It's Andy Davidson from Hurricane Electric for this question. So my question is: my understanding of a lot of the issues of route flap damping, when all the "harmful" documents came out, is the effects of undamping the flap at some point in the future. So if I, as an operator, have got a problem and a prefix is flapping for an hour and you dampen me for four hours, after you undampen me then you start obviously appearing to flap towards your customers, because all this stuff that I'm not causing is starting to appear in their routing tables, etc., etc. So, if we are doing some work on tuning the parameters, could we understand the amplification effects and maybe look if there is something in the defaults we can tweak that reduces those effects? So we could say there is a dynamic dampening duration, for example, so that as soon as it looks like my operational problem is finished, you are going to undampen me and collapse all the announcements associated together.
RANDY BUSH: Now you are into Geoff's hat.
GEOFF HUSTON: Hello.
RANDY BUSH: Now you are getting to the time domain, and Geoff has a really interesting approach. Do you have any slides on it, even, Geoff?
GEOFF HUSTON: I don't think I have slides on it, but it's really quite easy to describe; there is another way of looking at this. Randy's approach is, I don't know how to characterise it, you are saying: I see the following characteristics of updates, and if you change the route flap damping parameters you could capture more and get rid of some of the stuff that seems obvious noise. And the problem, I suppose, from my perspective, and where I'm thinking about this, is: why are they doing this? What's going on that's causing it? And part of it is that the MRAI timers that we use are trying to perform microsurgery with a mallet. 30 seconds is just this crude whack. Why 30? None of us know. It just seemed like a number that was bigger than 1 and less than a million. So the other way of doing this is to actually attach timers to every prefix. And what you do is that if you see a new advertisement within an MRAI interval that extends the AS path and makes it longer, keep it suppressed for another 30 seconds. As soon as you see a withdrawal, or as soon as you see the path get shorter, advertise immediately. So, when you get path exploration damping to withdrawal, which appears to encompass 30% of today's updates, or around 30,000 updates a day, if you have this per-prefix timer you trap the exploration to withdrawal and you react instantly to all the other events. The problem is that that approach requires recoding, more state, and a lot more timers in your router. Where Randy is, is, I think, I am changing what, 3 parameters, Randy?
RANDY BUSH: Pardon?
GEOFF HUSTON: In route flap damping, you are actually changing a very small number of parameters.
RANDY BUSH: Two. In the existing implementation, not one line of code changed. Two constants. Horrible hack. Horrible hack.
GEOFF HUSTON: But, I would say, you know, if you want to spend some more of your router's capacity actually trying to put some heuristics into the router that go: you're flapping because you are doing path exploration to withdrawal, you can trap a lot of updates as a result.
RANDY BUSH: So, and Geoff, the other thing is that you are bumping RAM.
GEOFF HUSTON: Yes.
RANDY BUSH: In order to keep that information. And so, you know, for the routers where you're getting near the thresholds, you are going to hit your RAM limit and go over.
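Geoff's per-prefix timer idea can be sketched as a small state machine. The class, method names, and the 30-second hold constant below are our own naming for a sketch of the heuristic, not any shipped implementation; the ASNs are from the private range:

```python
HOLD_S = 30.0  # the MRAI-like per-prefix hold interval Geoff mentions

class PrefixState:
    """Per-prefix timer heuristic: suppress while the AS path only gets
    longer (likely path exploration to withdrawal), release immediately
    on a withdrawal or a shorter path."""
    def __init__(self):
        self.best_len = None
        self.suppressed_until = 0.0

    def on_update(self, now, as_path):
        """Return True if this update should be propagated now."""
        if as_path is None:                      # withdrawal
            self.best_len = None
            self.suppressed_until = 0.0
            return True                          # send immediately
        if self.best_len is not None and len(as_path) > self.best_len:
            # path got longer: hold it back another HOLD_S seconds
            self.best_len = len(as_path)
            self.suppressed_until = now + HOLD_S
            return False
        # first path, or an equal/shorter path: advertise immediately
        self.best_len = len(as_path)
        self.suppressed_until = 0.0
        return True

st = PrefixState()
print(st.on_update(0.0, [64500, 64501]))         # initial announcement
print(st.on_update(1.0, [64500, 64502, 64501]))  # longer path: suppressed
print(st.on_update(2.0, None))                   # withdrawal: sent at once
```

This is the extra per-prefix state (a path length and a timer) that the exchange above is pricing in RAM.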
CHAIR: Is anyone using an open source implementation of routing software, like OpenBGPD or Quagga, to actually do some experimentation?
GEOFF HUSTON: We did this in Quagga and I have a Quagga implementation. There has been some joint work with some folk at Cisco, and I am sure if you talk to the right routing folk you'll probably find it too inside some research-y experimental stuff. So, yeah, we are looking at it. The problem is that the update rate of BGP is not getting any worse. It's kind of been flat for years, and sort of it's not a problem, so no one really wants to spend a lot of energy fixing it at the moment.
RANDY BUSH: Geoff and I each have a series of talks for the last three to five years saying, it's getting pretty boring out there. You saw one of them.
GEOFF HUSTON: Your router isn't dying because the update rate has doubled in the last year. Because it hasn't. That's part of this issue. So if it does double, we have some really good ideas to haul it back again, but at the moment, even without those ideas deployed, it's not getting dramatically worse.
ANDY DAVIDSON: I think the prefix timer idea does address my question about this dynamic flap duration, but maybe we can borrow some of these concepts and say we are only going to start a prefix timer, and more essentially only going to track state, for a small number of prefixes, like you're saying we are going to dampen a small...
RANDY BUSH: You need to see a different Geoff presentation, which looks at which prefixes are flapping. There seems to be a constant number of prefixes flapping. There are different ones every day.
GEOFF HUSTON: This is the bad boy room. The bad boy room has 20,000 seats in it, but you only get to live in the bad boy room and perform BGP updates for at most around two days, and then you get kicked out. But it's still only 20,000 big.
RANDY BUSH: And we don't know why. Cisco actually coded up Geoff's look at the time domain, okay. The problem is it was expensive in the router. You are not going to do it on a 7200 in real life, right. And lots of people are using, you know, 7200s and that kind of router for route reflectors, for da-da-da, and so, ops hat on, you know, I'm not going to suggest a hack that causes you to buy more routers, because I don't really have stock in Cisco.
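The "bad boy room" observation, a roughly constant population of noisy prefixes with high daily turnover, can be illustrated with two synthetic daily sets. The prefixes and the 50% overlap below are invented for illustration, not measured data:

```python
def turnover(day1, day2):
    """Fraction of day-2 noisy prefixes that were not noisy on day 1."""
    newcomers = day2 - day1          # set difference: new bad boys
    return len(newcomers) / len(day2)

# Two synthetic days: the room stays 100 seats big, but half the
# occupants have been swapped out.
day1 = {f"10.{i}.0.0/16" for i in range(0, 100)}
day2 = {f"10.{i}.0.0/16" for i in range(50, 150)}

print(len(day1), len(day2))     # constant room size
print(turnover(day1, day2))     # but high membership churn
```

The point of the metaphor is exactly this split: the size of the set is stable even though its membership is not, which is why per-prefix blacklisting does not help much.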
GEOFF HUSTON: If you want to play with a Quagga that has it, I have certainly got one there if you want.
ANDY DAVIDSON: Thank you.
CHAIR: Thank you.
Can we have Alexander.
ALEXANDER ASIMOV: Hello, my name is Alexander Asimov, I work in a company called HLL and I'd like to introduce my report about BGP route instability.
First, a few words about our research project. We have been studying BGP for the last three years. We started, as everyone does, with analysing the BGP convergence process, with the equations about BGP route withdrawal, and we also built a simulation test bed with the engines C-BGP and PRIME for checking these results. And during this work we, for the first time, met the main question: what is the autonomous system graph? We have already built a mathematical model which can recover route policy at every level, and this is at the state of a working prototype. At the same time we were very interested in understanding BGP route instability. So, we have divided it into three main classes.
The first class is router misconfigurations, the second class is BGP route loops, and the last class is link failures. In this report I am going to discuss the first two classes.
First I'd like to talk about the most common error at the inter-domain routing level. This error is made mostly by misunderstanding the use of the default route. For example, one autonomous system imports a default route from its upstream. At the same time, it announces some prefix to that upstream but doesn't put it on a loopback interface. So if we try to ping some unused IP address from this prefix, our request will fall into a loop over the customer-to-provider connection.
Our researchers during the last week have found more than 10,000 prefixes that are affected by this error. This is a vulnerability, because it makes it easy to attack: attackers can increase their attack magnitude with the TTL. And of course it is, at the same time, billable bandwidth.
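The TTL amplification just mentioned can be shown with simple arithmetic: a packet sent to an unused address inside the announced prefix bounces between the customer (default route up) and the upstream (covering route down) until its TTL runs out, so one attack packet crosses the link many times. This is a sketch of that reasoning, not a packet-level simulation:

```python
def link_crossings(initial_ttl):
    """Each bounce across the customer-upstream link costs one TTL
    decrement, so a looping packet crosses the link once per
    remaining TTL unit before it is finally discarded."""
    ttl, crossings = initial_ttl, 0
    while ttl > 0:
        crossings += 1   # traverse the link one more time
        ttl -= 1         # router decrements TTL and bounces it back
    return crossings

# A normally delivered packet crosses the link once; a looping packet
# crosses it TTL times, so the amplification factor is about the TTL.
print(link_crossings(64))    # typical default TTL
print(link_crossings(255))   # the worst case an attacker can set
```

This is why the talk describes it as letting attackers multiply their traffic on the victim link "with the TTL".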
Another error in router configuration could make networks work like a DDoS amplifier. For example, you send one request and you receive 10 error replies. This could be used by attackers to attack several networks at the same time: for example, the network which is subject to the vulnerability, its upstreams, and the network whose address space was used while spoofing. We detected more than five hundred prefixes which are affected by this vulnerability.
Let's move to the BGP route loops. It's well known that BGP was built with a built-in defence against route loops. But dynamic loops still take place.
They could occur after most types of policy changes; most types of policy change could put an announced prefix in a BGP route loop. This is based on the fact that BGP saves not only the paths that are used at the current moment, but also alternative paths. And this issue affects, of course, the convergence time, but it is also able to make network paths unavailable.
Let's discuss the next example. Here, autonomous system ASX has announced some prefix and the convergence process has finished. Its upstream, AS2, has a default route. And at the first moment, due to some maintenance or routing policy purpose, ASX sends a withdrawal to AS3.
At the next moment, we will have a loop between AS2 and AS3 and packet loss from AS1 to AS3.
At the next step, the situation doesn't change, and finally, after AS2 sends a withdrawal message to AS1 and a new announce message arrives from AS1, the loop is finally broken down. I want to add that, as you can see, the problem lasted about one minute, and its duration depends on the length of the customer-to-provider sequence.
From this we can see that one route withdrawal could cause an unstable dynamic loop, which could make your path unavailable.
We have discussed route withdrawal. Let us discuss the process after changing the prepend policy. Here again, ASX has already announced a prefix and the convergence process has finished. Then, for some traffic balancing purpose, ASX prepends its route with some value, for example 5. At the next step we will have an unstable dynamic loop, and there is no way through for packets from AS1 and AS4.
At the next step, autonomous systems 1 and 3 send a new route, and after that we have message racing between the messages from AS1 to AS4 and from AS3 to AS2.
And finally, after one minute, autonomous systems 1 and 2 receive a new route and the problem is finally solved; but, again, we have a more than one minute problem. The core of the problem is that, nowadays, network administrators have no opportunity to predict the traffic flow vector. And while using these BGP mechanisms, they don't understand the BGP convergence process, and this can make networks partially unavailable.
Our system has already detected 1,000 prefixes which fell into an unstable loop for more than half an hour. I want to mention that among these prefixes there were also prefixes which stayed in loops for more than several weeks. Of course, this results in packet loss and BGP message noise.
Let's discuss the last example. It is the best-known example of a BGP route loop: a BGP route policy loop. Here autonomous systems 1 to 4 have a strict priority for their upstreams. Let's say the autonomous system at the first moment announces some prefix to its upstreams. After that we, again, see message racing, and there is no guarantee which message will overtake another one. Then, route information is propagated to system 1 and finally the loop is complete. And it will last until the timers expire.
And after that, the situation will start from the beginning, and as a result we have a stable dynamic loop. We have detected more than 100 prefixes which fall into these stable dynamic loops. The vulnerability is the same as it was for the loops made by traffic balancing mechanisms. There is a first problem: you are unable to detect such loops from your side. But if you are going to attack this problem, there are several techniques to break these loops down.
The first is a route flap. As I said, all these BGP route loops are made by message racing, and there is a chance that if you just flap your route, the race won't repeat and the loop will disappear.
At the same time, there is another technique: just prepend your route and change your policy. We have faced route loops several times and this was quite enough.
And the last one is AS path poisoning, when you prepend one of the ASs which is, you know, in the loop. Here we exploit the defence of BGP against static loops and break the route loop with a guarantee.
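AS-path poisoning leans on BGP's standard loop prevention: an eBGP router rejects any route whose AS_PATH already contains its own AS number. If a particular AS is stuck in the loop, the origin can "poison" the path by inserting that AS's number itself. The ASNs below are from the private range, purely illustrative:

```python
def accepts(local_asn, as_path):
    """Standard eBGP loop check: reject any route whose AS_PATH
    already contains our own ASN."""
    return local_asn not in as_path

origin = 64510
normal_path = [origin]
poisoned_path = [origin, 64512, origin]   # poison targets AS 64512

print(accepts(64512, normal_path))     # the looping AS accepts this
print(accepts(64512, poisoned_path))   # ...but must drop the poisoned one
print(accepts(64513, poisoned_path))   # everyone else still accepts it
```

Because the check is mandatory, the poisoned route is guaranteed to be dropped by the targeted AS, which is why the talk calls this the technique that breaks the loop "with a guarantee".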
But after that, we have only one question: how not to fall into another loop? There is one killer feature that could give you an opportunity to predict route loops and have a good prediction of the traffic flow vector: you need to know how traffic comes to your autonomous system. There were a number of projects that attacked the task of building the autonomous system graph, but all of them suffer from the same problems. They don't use route policy; they build a graph only at the physical layer. They show you links that are used by someone, but not by your autonomous system. And you can say there are route registries which collect route policy data, but this data is mostly incomplete and outdated.
During our research project, we have already built a mathematical model for BGP route policy recovery. We have invented a verification process for checking the route policy which we can find in the registry. We are now at the stage of a working prototype. But we need as much data as possible, and so I hope that registrations with RIPE Atlas will improve after this meeting.
Let's go back to the statistics.
We detected more than 4,000 autonomous systems which have had problems with route stability. In this table are the biggest ones. I think some of you have already found your upstream. Most of these autonomous systems have several errors at the same time. For example, AS174 has several IP addresses which could be used as amplifiers; the maximum is 17 times. And there is a number of default route problems in this system.
We also grouped the errors among the route registries. As you can see, the most errors are in the RIPE region. But this is based on the fact that the RIPE registry, as I said, has the largest number of autonomous system objects. Relatively, it is similar in all registries: about 10%.
Now, the duration of these issues. As you can see, route misconfigurations can last for more than several months. Route flaps usually live for no more than several hours. But at the same time, there are BGP route loops which stay alive for several weeks.
Here we show the trend of errors during our monitoring, which covers eight weeks. The number of errors found rose by up to 50%, but the number of errors present at any given moment is a more stable value, and it rose about 10%.
And we tried to act like Robin Hood. We tried to warn autonomous system administrators for free and ask them to fix their problems, but we failed. The network administrators of very big autonomous systems think that problems in their client networks are not their problem. Is it a Federal law or something else? I don't know. I just looked at one such autonomous system and I want to tell you what I have found.
There are autonomous systems X and Y, and they are among the biggest autonomous systems in the world. At the connection of these autonomous systems there is the problem which I already described, the problem with default routes. So, in this example, there is the prefix 2.2.0. As far as I am concerned, this is a transnational link, and that is why it has big latency: packets live there up to 4 seconds. This gives attackers the opportunity to increase their DDoS attack magnitude by four times. But the problem isn't over. In the client networks of autonomous system X there is a number of DDoS amplifiers. The maximum DDoS amplification is more than 300 times; the minimum is about ten times. What do we get if the attacker spoofs an address from prefix 22.214.171.124 and sends traffic to the DDoS amplifiers? Here we'll get amplification which is more than 40 times. And it is no secret that there are botnets that could generate several gigabits, and there is a number of such botnets. As a result, the link will be broken, and only after that, after a number of withdrawals, when the client networks have become partly unavailable, will the network administrators look at what is going on in their client networks.
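The arithmetic behind the combined attack is that the two misconfigurations multiply: each spoofed packet crosses the looping link several times, and each delivery triggers several error replies. The factors below mirror the ones quoted in the talk (a 4x loop, 10x to 300x reflectors), but they are illustrative, not measurements:

```python
def combined_amplification(loop_factor, reflect_factor):
    """Each spoofed packet crosses the looping link loop_factor times,
    and each delivered packet triggers reflect_factor replies, so the
    victim link carries the product of the two."""
    return loop_factor * reflect_factor

print(combined_amplification(4, 10))    # 4x loop with the weakest reflector
print(combined_amplification(4, 300))   # 4x loop with the worst one seen
```

Even the weakest combination already exceeds the "more than 40 times" figure quoted above, which is why a modest botnet suffices to break the link.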
So, from this example we have several obvious conclusions. The problem in your client network is your problem. The problem in your upstream is your problem too. An unstable prefix is invisible from the origin AS. And BGP balancing without understanding the convergence process and without prediction could be harmful for the regional network. We have a monitoring system, and we are eager to give this information away for free, but we need some trusted mechanism for verification. Maybe route registries such as RIPE are the best people for this purpose.
Thank you for listening.
CHAIR: Are there any questions for Alexander? I don't know if the RIPE NCC wants to say anything or not. I remember in the past they have had things like AS alerts based on things that occur in the routing registries; maybe you should approach them and see if you can do something together.
ALEXANDER ASIMOV: Could you repeat your question, please?
CHAIR: I was just suggesting that there have been services provided by the RIPE NCC that have alerted about events in AS numbers in the past. I think some of them are still running, so probably if you talk to them you can see how to leverage each other, basically. Thank you.
Thank you. Is there any other business?
AUDIENCE SPEAKER: Hi, I am Vesna, community builder for measurement tools. This could be an interesting application for RIPE Atlas, so we are thinking of building some alerts based on the Atlas measurements. I am inviting you to come to the measurements Working Group tomorrow, where we are going to talk about Atlas, and after that there is a BoF about user-defined measurements based on Atlas, where we can talk about your needs and requests and how much we can do in Atlas for you.
ALEXANDER ASIMOV: Thank you.
CHAIR: Thank you everyone. As I said, is there any other business? If not, that means that, without setting a precedent, the Routing Working Group is done within its time slot and we are free to go to the coffee break. Thank you very much.
LIVE CAPTIONING BY MARY McKEON RPR
DOYLE COURT REPORTERS LTD, DUBLIN, IRELAND.