The Art of AI Architecture
There is a common misconception that one GPU cloud is exactly the same as another. This is simply not true. They are built on different technologies and architectures, and they each have their own challenges, advantages, and disadvantages.
The most advanced AI cloud operators are currently developing new blueprints for GPU data centers that deploy NVIDIA H100s in Kubernetes or other virtualized environments to achieve new levels of AI processing performance.
To the customer, the specs look essentially the same. AI cloud computing service providers brag about NVIDIA HGX H100 arrays and the fact that they have 3.2 terabits per second of InfiniBand bandwidth. But that’s because they’re all using the same network cards. If all clouds look the same from a technical standpoint, customers will make decisions based on price.
But technical specifications alone don’t tell the whole story. You can buy a Toyota Corolla with 100 kilowatts of power, and you can buy a Mercedes with 100 kilowatts of power, but they’re not the same. The build quality is different, the cost is different, and the user experience is different.
The same goes for data centers. If the head of finance were overseeing the architecture, we would probably have the Toyota Corolla of data centers, and that’s fine for some, but given the choice, most organizations will choose the Mercedes. A data center built with cost savings in mind may work for some customers, but it will be slower and/or offer less cloud storage, and it may even be less secure.
GPU Clouds
GPU cloud construction varies greatly between data centers. A common misconception is that AI infrastructure can only be built on the NVIDIA DGX reference architecture. But that’s the easy part and the minimum viable baseline. How far organizations push beyond that is the differentiator. AI cloud providers are building highly differentiated solutions by adopting management and storage networks that can dramatically accelerate AI computing productivity.
Deploying GPU data centers as AI infrastructure is a complex and challenging task that requires a deep understanding of how to balance the component technologies to maximize throughput. High-performance management and security systems have a clear impact on the customer experience.
Another key factor determining the performance of AI clouds is the storage architecture. Using dynamically allocated WEKA architectures, non-volatile memory express (NVMe) drives, and GPUDirect Storage can improve execution speed by up to 100% for certain workloads, such as training large language models (LLMs).
WEKA’s Data Platform delivers unmatched performance and scalability, especially when feeding data to large-scale GPU environments. By transforming stagnant data silos into dynamic data pipelines, it effortlessly powers data-hungry GPUs, making them up to 20x more efficient and sustainable.
Access to storage
How fast you can access storage is critical in AI because you’re dealing with very large data sets made up of very small pieces of data; you could be looking at 100 billion small objects spread across a network. Compare that with digital media, where you’re dealing with maybe a few thousand assets, even if each one is hundreds of gigabytes: it’s a very different profile. Traditional hard drives offer good sequential speeds for digital media, whereas an AI workload is highly random by comparison; you’re grabbing a gigabyte here and there and doing it millions of times per second.
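To make that contrast concrete, here is a minimal sketch (the file path, block size, and read count are illustrative assumptions) that times a sequential streaming pass against small random reads of the same file. The operating system’s page cache will flatter both numbers, so treat it as showing the shape of the two workloads rather than as a benchmark:

```python
import os
import random
import time

PATH = "sample.bin"       # illustrative: any large local file
BLOCK = 4096              # 4 KB, a typical random-read unit
N_READS = 100_000         # number of random reads to sample

size = os.path.getsize(PATH)
fd = os.open(PATH, os.O_RDONLY)

# Sequential pass in 1 MB chunks: the digital-media profile.
start = time.perf_counter()
while os.read(fd, 1 << 20):
    pass
seq = time.perf_counter() - start

# Small reads at random offsets: the AI-workload profile.
start = time.perf_counter()
for _ in range(N_READS):
    os.pread(fd, BLOCK, random.randrange(0, max(size - BLOCK, 1)))
rand = time.perf_counter() - start
os.close(fd)

print(f"sequential: {size / seq / 1e6:.0f} MB/s")
print(f"random 4K:  {N_READS / rand:.0f} IOPS")
```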
Another important difference between AI architecture and traditional storage models is the absence of a data-caching requirement. Everything is done on a direct-request basis. The GPUs communicate directly with the disks on the network; they don’t go through the CPUs or the TCP/IP stack. Because the GPUs are connected directly to the network fabric, they bypass most of the network layers and go straight to the storage, which removes layers of network latency.
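As an illustration of that direct path, here is a minimal sketch assuming NVIDIA GPUDirect Storage accessed through the RAPIDS KvikIO library; the file name and read size are assumptions, and a real deployment needs a GDS-capable driver stack underneath:

```python
import cupy
import kvikio

PATH = "shard-0000.bin"            # illustrative dataset shard
NBYTES = 256 * 1024 * 1024         # assumed read size: 256 MB

# Destination buffer allocated directly in GPU memory.
buf = cupy.empty(NBYTES, dtype=cupy.uint8)

# With GPUDirect Storage enabled, this is a DMA transfer from NVMe
# straight into GPU memory, bypassing the CPU path; KvikIO falls
# back to a CPU bounce buffer when GDS is not available.
f = kvikio.CuFile(PATH, "r")
nread = f.read(buf)
f.close()

print(f"read {nread} bytes into GPU memory")
```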
AI Infrastructure Architecture
AI infrastructure architecture must be designed to maximize compute power for the coming wave of AI workloads. Additionally, network architectures must be designed to be completely uncontended. Many providers promise that, but you need one that is overprovisioned enough to deliver that level of guarantee, as the sketch below illustrates.
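What “uncontended” means can be made concrete with the fabric’s oversubscription ratio. A minimal sketch, with illustrative port counts and speeds rather than any particular provider’s design:

```python
# Oversubscription ratio of a leaf switch: total downlink (server-facing)
# bandwidth divided by total uplink (spine-facing) bandwidth.
# A ratio of 1.0 is a non-blocking, uncontended fabric.

def oversubscription(downlinks, down_gbps, uplinks, up_gbps):
    return (downlinks * down_gbps) / (uplinks * up_gbps)

# Illustrative leaf: 32 x 400 Gb/s ports to GPU servers,
# 32 x 400 Gb/s ports up to the spine layer.
print(oversubscription(32, 400, 32, 400))   # 1.0 -> non-blocking

# A cost-optimized design with half the uplinks is contended:
print(oversubscription(32, 400, 16, 400))   # 2.0 -> 2:1 oversubscribed
```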
Large AI users like Tesla and Meta design cloud infrastructure to meet the needs of different applications, with the ability to dynamically optimize AI cloud architectures for specific workloads. But most cloud providers don’t have the luxury of knowing exactly what they’re building for.
Returning to the car analogy, most modern transportation networks in major cities around the world were not built with current traffic volumes in mind. The problem with building a data center with a current or even projected purpose in mind is that data centers reach capacity faster than you think. Clouds need to be both overprovisioned and extremely scalable.
If you don’t know exactly what you’re building for, just build the biggest, fastest, most secure, and easiest-to-use platform possible. To optimize throughput, data centers require a highly distributed storage architecture, with hundreds of drives generating tens of millions of I/O operations per second on your servers.
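As a rough worked example (the drive count and per-drive performance are assumed figures for illustration), the aggregate capability of such a distributed storage layer scales linearly with the number of drives:

```python
# Back-of-envelope aggregate IOPS for a distributed NVMe storage layer.
drives = 400                  # "hundreds of drives"
iops_per_drive = 150_000      # assumed sustained 4K random-read IOPS per drive

total_iops = drives * iops_per_drive
print(f"{total_iops / 1e6:.0f} million IOPS")   # -> 60 million IOPS
```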
Supporting infrastructure
GPU clouds also depend on supporting infrastructure. For example, if you’re using Kubernetes, you need master nodes. You need coordination nodes, you need nodes to pull in the data, you need nodes to just log in so you can have dashboards. The cloud provider has to provision very significant amounts of non-GPU compute in the same region.
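As a small illustration, this sketch uses the official `kubernetes` Python client to tally how a cluster’s capacity splits between GPU workers and the supporting control-plane and utility nodes; the label and resource names follow common Kubernetes conventions but are assumptions about any given cluster:

```python
from collections import Counter

from kubernetes import client, config

config.load_kube_config()        # or config.load_incluster_config() in a pod
v1 = client.CoreV1Api()

roles = Counter()
for node in v1.list_node().items:
    labels = node.metadata.labels or {}
    allocatable = node.status.allocatable or {}
    if "node-role.kubernetes.io/control-plane" in labels:
        roles["control-plane"] += 1
    elif "nvidia.com/gpu" in allocatable:
        roles["gpu-worker"] += 1
    else:
        roles["support"] += 1    # logging, dashboards, data loaders, etc.

print(dict(roles))
```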
Building true clouds is not easy, and it is not cheap. Many data center providers call themselves “cloud,” but what they offer is really more of a managed hardware environment. It is certainly less risky financially to have organizations sign multi-year contracts and then build a facility that meets the requirements of the contract. And there are some advantages, particularly in security and performance. But it is not a cloud.
Cloud is self-service and API-driven: you log in, click a button, and have access to the processing power you need for as long as you need it. Many organizations don’t have the resources or the requirement for ongoing data center support; they may only need the processing power for a short period of time, and cloud gives them that option. NexGen Cloud democratizes AI by providing access to shared high-performance architectures.
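To illustrate what self-service, API-driven provisioning looks like in practice, here is a purely hypothetical sketch; the endpoint, fields, and token are invented for illustration and do not describe any particular provider’s API:

```python
import requests

# Hypothetical self-service GPU cloud API: every name here is invented.
API = "https://api.example-gpu-cloud.com/v1"
TOKEN = "..."                      # your API token (placeholder)

resp = requests.post(
    f"{API}/instances",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "gpu_type": "H100",        # assumed request parameters
        "gpu_count": 8,
        "image": "pytorch-2.3",
        "ttl_hours": 4,            # pay only for the time you need
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())                 # e.g. an instance ID and SSH endpoint
```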
A final consideration, and one that is becoming increasingly important, is energy consumption. Organizations of all sizes are being asked not only to monitor their emissions but to reduce them, not just by customers and society as a whole, but also by regulators. Google and Microsoft recently announced an agreement with Nucor for a clean energy initiative to power data centers and ultimately achieve net zero for AI processing. ESG performance is also proving to be a critical metric for shareholder value, and AI is incredibly energy-intensive.
Ultimately, organizations need to partner with a provider they can trust. A partner who can provide guidance, engineering, and support. Companies that use cloud infrastructure do so to focus on their own key differentiators. They’re not in the business of running AI cloud infrastructure; they want convenience, security, and reliability, and true cloud provides all of that on demand.
This article was produced as part of TechRadarPro’s Expert Insights channel, where we showcase the best and brightest minds in the technology sector today. The views expressed here are those of the author and do not necessarily represent those of TechRadarPro or Future plc.