Look to Google to fix looming data center network issues

When you think of data center networking, you almost certainly think of Ethernet switches. These gadgets have been the backbone of data center networking for decades, and there are still more Ethernet switches sold in data center applications than any other technology. Network planners, however, are beginning to see changes in applications, and these changes suggest it’s time to think a little more about data center network options. As your data center evolves, so does its network.

With the advent of the cloud and cloud-centric development, two disruptors have been introduced to our happy and comfortable picture of Ethernet switching in the data center. The first was virtualization, the notion that there was not a 1:1 relationship between a computer and an application, but rather a pool of computers sharing the hosting of applications. The second was component composition, which held that if you wrote applications as collections of logical pieces, you could run those pieces in parallel, scale them on demand, and transparently replace them if they failed. The impact of these two on traffic, and therefore on data center switching, was enormous.

Traditional monolithic applications create vertical traffic, that is, flows between users and the data center. A decade or two ago, things like service buses and inter-application coupling added horizontal traffic, flows between applications within the data center. Composition and virtualization create mesh traffic, where messages flow in a complex web among a whole series of components. Since traditional data center switches are arranged in a hierarchy, this mesh traffic stresses the traditional model and threatens to break it.

Adding computers to a hierarchical switching network, or to a more modern leaf-and-spine network, involves adding switching layers as needed. Since this provides universal connectivity, you might be wondering what the problem is. The answer is a combination of latency, blocking, and determinism:

  • Latency is the accumulated delay associated with moving from source port to destination port, which obviously grows with the number of switches you have to transit (a rough sketch follows this list).
  • Blocking is the risk of not having the capacity to support a connection due to trunk/switch congestion.
  • Determinism is a measure of performance predictability and consistency.
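
To put a rough number on the latency point, here's a minimal back-of-the-envelope sketch in Python. The per-hop and serialization figures are assumptions invented for illustration, not vendor specs; the point is simply that delay accumulates with every switch tier a flow has to cross.

```python
# Rough model of how per-switch delay accumulates along a path.
# The numbers below are illustrative assumptions, not vendor figures.

PER_HOP_LATENCY_US = 1.5   # assumed forwarding + queuing delay per switch, in microseconds
SERIALIZATION_US = 1.2     # time to serialize a 1,500-byte frame onto a 10Gbps link

def path_latency_us(switch_hops: int) -> float:
    """Accumulated one-way latency for a flow crossing switch_hops switches."""
    return switch_hops * (PER_HOP_LATENCY_US + SERIALIZATION_US)

# Same rack: 1 switch. Leaf-and-spine: 3 (leaf -> spine -> leaf).
# Three-tier hierarchy: 5 (access -> aggregation -> core -> aggregation -> access).
for label, hops in [("same rack", 1), ("leaf-and-spine", 3), ("three-tier", 5)]:
    print(f"{label:>14}: {hops} hops, ~{path_latency_us(hops):.1f} microseconds one way")
```

Every added tier also adds another place where congestion can block a path, which is where the blocking and determinism concerns come in.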

OK, so we need a new model, but which one? It turns out that data center networks must now take two new missions into account: high-performance computing (HPC) and hyperscale data centers.

In HPC, computers and application components perform incredibly complex computational functions, such as modeling the impact of a monarch butterfly migration on the global climate. This requires a set of systems operating in parallel and very tightly coupled, with very fast interconnections. That means fast and highly deterministic connections, something more like a computer bus or backplane than a network interface. Early solutions to this included InfiniBand and Fibre Channel, both of which are still in use today. Intel introduced Omni-Path as a next-generation HPC interconnect, then spun the technology off into Cornelis Networks.

In the mesh model, what we really need to support is a bunch of small, lightly loaded components used by millions of concurrent users. This is what we call hyperscale computing today. Here, different users execute different components in different orders, and there is a constant exchange of messages among these components. Mesh traffic flows evolved from the horizontal traffic we talked about earlier, traffic that drove network providers to build their own fabric switches. Because they're based on Ethernet connectivity, fabric switches were easily introduced into data centers that previously relied on switch hierarchies, and they worked fine long before we started using microservices and large resource pools. A single fabric switch works great for horizontal traffic, but it supports only a limited number of connections, and unless you're using fiber paths, there's a limit to how far you can run Ethernet connections. Imagine a data center with servers stacked like the New York skyline to keep them close to your fabric.

Of course, public cloud providers, hosting companies, and large enterprises started building data centers with more and more server racks. They needed something between an HPC switch, an Ethernet fabric, and a traditional multi-switch hierarchy, something that was really good at handling mesh traffic. Enter Google Aquila.

Aquila is a hybrid in many dimensions. It is capable of supporting HPC applications and of building a large-scale data center network. A data center is divided into dozens of cliques, each of which has up to a few thousand network ports. Within each clique, Aquila uses a lightning-fast cell-based protocol to interconnect server pods into a full mesh, so performance within a clique is very high and latency is very low. Because packets transmitted within a clique are divided into cells, higher-priority traffic can preempt lower-priority packets at any cell boundary, reducing latency and improving determinism. SDN switching is used between cliques, which means the entire data center can be traffic engineered.
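
To see why slicing packets into cells helps with determinism, here's a minimal Python sketch. The link speed, packet size, and cell size are invented for illustration and have nothing to do with Aquila's actual cell format; the point is that an urgent packet only ever waits for the cell currently on the wire, not for an entire lower-priority packet to drain.

```python
# Why cell-based links improve determinism: a high-priority packet that arrives
# mid-transmission waits for the whole in-flight packet on an ordinary packet
# link, but only for the current cell on a cell-based link.
# All sizes and speeds below are illustrative assumptions, not Aquila's parameters.

LINK_GBPS = 100          # assumed link speed
LOW_PRIO_BYTES = 9000    # assumed jumbo frame already being transmitted
CELL_BYTES = 160         # assumed cell size

def tx_time_us(n_bytes: int, gbps: float = LINK_GBPS) -> float:
    """Time to serialize n_bytes onto the link, in microseconds."""
    return n_bytes * 8 / (gbps * 1e3)

# Worst case: the urgent packet shows up just after the big packet starts.
wait_packet_boundary = tx_time_us(LOW_PRIO_BYTES)  # drain the whole packet first
wait_cell_boundary = tx_time_us(CELL_BYTES)        # drain only the current cell

print(f"preempt at packet boundaries: worst-case wait ~{wait_packet_boundary:.2f} microseconds")
print(f"preempt at cell boundaries:   worst-case wait ~{wait_cell_boundary:.4f} microseconds")
```

The worst-case wait shrinks from the size of the largest packet on the link to the size of a single cell, which is exactly the kind of predictability HPC-style workloads care about.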

Don’t run to the Google Store to buy an Aquila, though. This is a project, not a product, so it should be seen as an indication of the future direction of large-scale data center resource pools. I'm guessing here, but I think products based on the Aquila approach will probably be available in two to three years, which is about as far ahead as data center network planners should be looking today. Despite the delayed gratification Aquila represents, however, there's an important lesson you can take from it today, one you can apply to hold off for a while longer the problems Aquila will eventually solve.

Aquila defines a resource pool as a collection of subpools that are very efficient at handling horizontal traffic within themselves. It's pretty easy to keep highly interactive components together in a clique using a tool like Kubernetes, which offers things like “affinities” that let you pull components to a specific set of servers and “taints” that let you push them away. Since Google was the developer of Kubernetes, it's hard not to see Aquila's architecture as a way to structure Kubernetes resource pools in hyperscale data centers.

Now for the “Aquila hack”. You can do something similar in your data center using Ethernet switches and/or fabric switches. Create your own cliques by connecting groups of servers to a common switch or fabric, which means lower latency and more determinism for connections within the clique. Then use Kubernetes features (or features of other container orchestration or DevOps tools) to steer your components to your own cliques, as in the sketch below. You can still overflow into an adjacent clique if you run out of capacity, of course, so you maintain a large and efficient resource pool.
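
For the Kubernetes part of the hack, here's a minimal sketch using the official Python client. The node label key clique, the taint key dedicated, and the service names are hypothetical, chosen purely for illustration; the idea is to label and taint each clique's servers, then pull a component onto a clique and co-locate it with the components it exchanges the most messages with.

```python
# Minimal sketch of the "Aquila hack" in Kubernetes: pin a pod to a group of
# servers (a "clique") that share one switch/fabric, and co-locate it with the
# component it chats with most. The label key "clique", the taint key
# "dedicated", and the names below are hypothetical examples.
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="checkout", labels={"app": "checkout"}),
    spec=client.V1PodSpec(
        containers=[client.V1Container(name="checkout", image="example/checkout:1.0")],
        affinity=client.V1Affinity(
            # Node affinity pulls the pod onto servers labeled as clique "a".
            node_affinity=client.V1NodeAffinity(
                required_during_scheduling_ignored_during_execution=client.V1NodeSelector(
                    node_selector_terms=[client.V1NodeSelectorTerm(
                        match_expressions=[client.V1NodeSelectorRequirement(
                            key="clique", operator="In", values=["a"])])])),
            # Pod affinity keeps this pod in the same clique as the "cart"
            # service it exchanges the most messages with.
            pod_affinity=client.V1PodAffinity(
                required_during_scheduling_ignored_during_execution=[
                    client.V1PodAffinityTerm(
                        label_selector=client.V1LabelSelector(
                            match_labels={"app": "cart"}),
                        topology_key="clique")])),
        # A toleration lets the pod land on nodes you've tainted to push
        # unrelated workloads away from the clique.
        tolerations=[client.V1Toleration(
            key="dedicated", operator="Equal", value="clique-a",
            effect="NoSchedule")]))

# Print the manifest this object would produce; submitting it would use
# client.CoreV1Api().create_namespaced_pod("default", pod) against a cluster.
print(client.ApiClient().sanitize_for_serialization(pod))
```

In a real deployment you'd apply the clique label and taint to nodes when you cable them to the shared switch or fabric, and you could use preferred rather than required affinity rules if you want the scheduler to be free to overflow into a neighboring clique.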

Kubernetes, which as I said was developed by Google, recognizes the need to keep certain components of an application close together to optimize performance. Aquila offers a data center network architecture that can support that same capability, and while you can approach its efficiency using standard switching, it would be a good idea to consider moving to the new model if your data center relies on containerized applications built from microservices. Maybe Google is seeing something now that you won't see until later, and by then it might be too late.


Copyright © 2022 IDG Communications, Inc.

