Mat Mathews



Your Packets will be Re-accommodated

I normally hate tortured metaphors, especially in IT, but this one had been stuck in my head even before the dreadful UA passenger incident, so here we go…

Unfortunately, it took the horrid and avoidable treatment of a fellow human being for United to finally figure out something it (and other airlines) should have figured out long ago. The process of “deadheading” (as it’s known in airline circles) leverages existing routes to reposition crew for free; basically, piggybacking on an already incurred cost. At first blush, it sounds like a perfect optimization that should make everyone happy. However, what we IT pros should recognize here is that a deadheading crew is actually a “mission critical” workload riding a “best effort” transport model. Sounds like a disconnect to me.

I am referring specifically to networks that are responsible for carrying predominantly server-to-server, server-to-storage, and storage-to-storage traffic so that applications can provide results to users. These networks (sometimes called fabrics) are generally dense collections of 10 and 40 GbE (or, increasingly, 25/50/100 GbE) links used to connect hundreds to millions of storage and server endpoints. When an application needs to use those server, storage, and network resources, we call it a “workload,” as it places a load of work on those resources.

If we look at the legacy networking technology in these environments and ask what is considered “state of the art” or “best practice,” the answer, from a network perspective, seems to be that the absolute best we can do is assume all of those workloads are the same and randomly assign them to network resources; in this case, the links between the server and storage endpoints.

So, while some workloads may represent applications that provide a direct user experience (like an e-commerce website) or carry critical back-office transactions through a financial system, they are treated exactly the same as a non-critical analytics application, a regularly scheduled backup process, or even a virus outbreak. To make matters worse, the state-of-the-art algorithm used to choose a link for transport (think of it as a specific flight) picks one at random, via network software that has no idea what that workload represents to the business. So what happens (more often than any network vendor cares to admit) is that critical workload traffic gets “bumped” (in networking terms, “dropped”) because that specific link, chosen at random, is overbooked. This forces the application to try again and hopefully get luckier the next time. In the meantime, a user may have given up on their shopping cart and gone elsewhere.
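In practice, the random assignment described above is usually ECMP-style hashing: the switch hashes a flow’s header fields and uses the result to pick one of several equal-cost links, with no knowledge of what the flow means to the business. A minimal sketch of the idea, with hypothetical addresses and link count (not any vendor’s actual implementation):

```python
import zlib

LINKS = 4  # equal-cost uplinks between two switches (hypothetical)

def pick_link(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """ECMP-style selection: hash the flow's 5-tuple, modulo the link count.
    The hash says nothing about whether the flow is critical or bulk."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    return zlib.crc32(key) % LINKS

# A critical e-commerce checkout and a bulk backup flow can easily land
# on the same link; the network has no way to tell them apart.
checkout = pick_link("10.0.1.5", "10.0.2.9", 44310, 443)
backup   = pick_link("10.0.3.7", "10.0.4.2", 51022, 873)
```

The selection is deterministic per flow but effectively random across flows, which is exactly why an “overbooked” link is a matter of luck.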

The self-aggrandizing network vendor happily says, “No problem, just upgrade your bandwidth!” or, in the metaphor, buy bigger planes. Unfortunately, the root of the problem is the random selection process, so even bigger planes get unfairly loaded while others fly empty.

With all the fancy algorithms and big-data crunching of the volumes of historical data that airlines have (which they use very effectively to make sure every flight I take has only two back-row middle seats left to choose from!), you’d think they’d have a better way to predict which routes might need seat reservations for deadheading crews and ensure those flights were not fully or over booked. It’s surely not a hard task, but it has apparently been easier to risk some unhappy customers for the benefit of a lot of free passage.

United has since announced a change to their policy. They will no longer allow the process of deadheading to displace an actual customer. No doubt this means they will apply some smarts in scheduling these critical crew transfers and ensure they have reserved space. Ironically, this process has gone on unheralded for many years, and many customers have likely been “re-accommodated” peacefully, with the impact on their travel localized to their immediate circle.
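The networking analogue of that policy change is workload-aware placement: reserve capacity for critical flows instead of hashing everything blindly. A toy sketch of the idea, with hypothetical names and logic (not Plexxi’s or any vendor’s actual algorithm):

```python
import zlib

def place_flow(flow_id, critical, link_load):
    """Toy workload-aware placement: critical flows get the least-loaded
    link, while best-effort flows are hashed across the remaining links,
    so a bulk backup can never crowd a critical transaction off its link."""
    by_load = sorted(range(len(link_load)), key=lambda i: link_load[i])
    if critical:
        choice = by_load[0]            # the "reserved seat": least-loaded link
    else:
        rest = by_load[1:] or by_load  # best effort shares what's left
        choice = rest[zlib.crc32(flow_id.encode()) % len(rest)]
    link_load[choice] += 1             # track the added load
    return choice
```

A real fabric would weigh measured utilization and business policy rather than a simple flow count, but the contrast with blind hashing is the point: the scheduler knows which flows matter.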

Similarly, many users and corporate IT application owners have had the network “re-accommodate” their workload traffic. Perhaps they complained about their user experience and wondered out loud whether a network issue was causing slow response times or session disconnects, only to be told that the network has “plenty of bandwidth” so it can’t be at fault; or maybe they just upgraded the bandwidth anyway to quiet everyone down. This can continue for many years without making headlines (just a lot of localized unhappiness) until a critical workload fails to process key transactions or a big customer is lost.

It is not hard to have a better network. It starts with working with a vendor that does not treat your workloads as random traffic to be solved by selling you more bandwidth while you pass the collateral damage on to your users and customers.

The post Your Packets will be Re-accommodated appeared first on Plexxi.


More Stories By Mat Mathews

Visionary solutions are built by visionary leaders. Plexxi co-founder and Vice President of Product Management Mat Mathews has spent 20 years in the networking industry observing, experimenting, and ultimately honing his technology vision. The resulting product — a combination of traditional networking, software-defined networking, and photonic switching — represents the best of Mat's career experiences. Prior to Plexxi, Mat held VP of Product Management roles at Arbor Networks and Crossbeam Systems. Mat began his career as a software engineer for Wellfleet Communications, building high-speed Frame Relay switches for the carrier market. Mat holds a Bachelor of Science in Computer Systems Engineering from the University of Massachusetts at Amherst.