MLPerf: How Intel Xeon Scalable CPUs Stack Up Against AI Performance Standards

April 01, 2022


AI-based solutions are permeating a growing list of industries, such as telecom, healthcare, retail, and fast food. Leading the way are solution providers that optimize their AI performance to set their offerings apart from the competition.

However, the key to better AI performance lies not only in the applications but also deeper, in the hardware that lets inference engines do more.

Hardware's Role in AI Performance

AI solutions have the unique challenge of processing large amounts of data at high speeds. Unfortunately, traditional business platforms aren't designed to efficiently handle this kind of load.

Therefore, even the best-coded AI software depends on a finely tuned processing environment that can ingest as much data as possible, as fast as possible, to deliver the best insights. For solution providers, this raises a practical question: how can CPUs from different manufacturers be compared fairly?

Measuring AI Performance with MLPerf

MLCommons is an industry consortium that sets benchmarks by which the hardware that runs AI applications can be compared. Without such benchmarks, software and hardware companies could make unsubstantiated claims about their products and market themselves on reputation alone.

Instead, with the even playing field that MLCommons provides, tech companies are incentivized to develop better AI-related technologies, hardware, and software.

MLPerf is a specific set of benchmarks that enables hardware vendors to test their equipment in standardized environments and make relevant, accurate comparisons. It's based on some of the most common AI workloads, such as recommendation systems, natural language processing, and computer vision. Using the MLPerf benchmarks, solution providers can compare the performance of servers and CPUs on computing tasks relevant to their solution and its needs.
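MLPerf training results, like the ones summarized later in this post, are reported as time to train: the wall-clock time a system needs to train a reference model until it reaches a fixed quality target. The Python sketch below illustrates that metric only. The `train_one_epoch` and `evaluate` callables and the toy accuracy curve are hypothetical stand-ins, not MLPerf reference code, and the real benchmark rules (timing boundaries, repeated runs, and so on) are omitted.

```python
import time
from typing import Callable, Tuple


def time_to_train(
    train_one_epoch: Callable[[], None],
    evaluate: Callable[[], float],
    quality_target: float,
    max_epochs: int = 100,
) -> Tuple[float, int]:
    """Return (elapsed_seconds, epochs) needed to reach quality_target.

    Simplified illustration of a time-to-train metric; both callables are
    hypothetical placeholders for a real training step and evaluation.
    """
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        # Stop the clock as soon as the model meets the quality target.
        if evaluate() >= quality_target:
            return time.perf_counter() - start, epoch
    raise RuntimeError("quality target not reached within max_epochs")


if __name__ == "__main__":
    # Toy "model" whose accuracy rises by 0.1 per epoch (hypothetical).
    state = {"accuracy": 0.5}

    def train_one_epoch() -> None:
        state["accuracy"] += 0.1

    def evaluate() -> float:
        return state["accuracy"]

    seconds, epochs = time_to_train(train_one_epoch, evaluate, quality_target=0.75)
    print(f"reached target after {epochs} epochs in {seconds:.6f} s")
```

Because the score is elapsed time to a fixed quality bar, a faster system cannot "win" by training a worse model.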

The MLPerf Benchmark Suite

MLPerf provides not one but a set of benchmarks to determine AI performance. Each measurement is relevant to a different AI use case and its corresponding workload. For example:

DLRM
Today, recommendation systems can be found anywhere from an Amazon shopping cart to a Burger King drive-thru. They analyze user data and buying patterns to suggest new products and services in real time. MLPerf's recommendation benchmark uses the Deep Learning Recommendation Model (DLRM) to measure the performance of large-scale recommendation systems.
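To make "large-scale recommendation" more concrete, here is a heavily simplified, pure-Python sketch of a DLRM-style forward pass: a bottom MLP for numeric features, embedding lookups for categorical features, pairwise dot-product interactions, and a top layer with a sigmoid that outputs a click probability. It assumes one layer per MLP, tiny random weights, and hypothetical feature counts and function names; it is not the MLPerf reference implementation, which is far larger.

```python
import math
import random

EMBED_DIM = 4  # hypothetical embedding width, tiny for illustration


def linear_relu(x, weights):
    """Fully connected layer followed by ReLU; weights is a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dlrm_forward(dense, sparse_ids, bottom_w, tables, top_w):
    """Simplified DLRM-style forward pass returning a click probability."""
    # Bottom MLP (one layer here): numeric features -> EMBED_DIM vector.
    z = linear_relu(dense, bottom_w)
    # Each categorical feature (e.g. user or product ID) selects one
    # row of its own embedding table.
    embs = [table[idx] for table, idx in zip(tables, sparse_ids)]
    # Feature interaction: dot product between every pair of vectors.
    vecs = [z] + embs
    inter = [dot(vecs[i], vecs[j])
             for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    # Top layer (one linear layer here) on [z, interactions], then sigmoid.
    logit = dot(top_w, z + inter)
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(0)
    n_dense, table_sizes = 3, [10, 20]  # hypothetical feature counts
    bottom_w = [[rng.uniform(-1, 1) for _ in range(n_dense)]
                for _ in range(EMBED_DIM)]
    tables = [[[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
               for _ in range(size)] for size in table_sizes]
    n_pairs = (len(tables) + 1) * len(tables) // 2
    top_w = [rng.uniform(-1, 1) for _ in range(EMBED_DIM + n_pairs)]
    p = dlrm_forward([0.2, 1.0, 0.5], [3, 17], bottom_w, tables, top_w)
    print(f"predicted click probability: {p:.3f}")
```

In production-scale models the embedding tables dominate memory and lookup traffic, which is one reason recommendation benchmarks stress hardware differently from image models.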

ResNet
AI systems are also being used to glean insights from still images and video. Examples include hospitals detecting disease in retina scans and retail stores preventing theft in real time with in-store cameras. ResNet is a family of image classification models, and MLPerf uses ResNet-50 to measure hardware performance on image classification workloads.

MiniGo
At the heart of many AI solutions is the need for continuous learning and better decision-making. The AI systems that currently play games like chess and Go may serve as building blocks for future autonomous vehicles, traffic control systems, and financial decision-making engines.

The MiniGo benchmark measures how quickly a system can train a model to play Go through self-play reinforcement learning. Its relevance goes beyond entertainment: it reflects how well an AI system learns to adapt to rules and situational complexity.

3rd Gen Intel Xeon Scalable Processors' Performance Against MLPerf Benchmarks

Intel has a long heritage of testing its products against industry-accepted benchmarks, and as AI has grown in importance, it has continued to tune and upgrade those products for AI workloads.

3rd Generation Intel Xeon Scalable processors outperform the prior generation not only in general IT computing tasks but also in AI. Intel has also tested these CPUs alongside other vendors' hardware in MLPerf.

Below is a summary of recent results:

DLRM - Using PyTorch, DLRM can be trained in approximately 2 hours on 4 sockets of Intel Xeon Platinum 8380H CPUs. With 64 sockets of Intel Xeon Platinum 8376H CPUs, the same task took 15 minutes.

ResNet - Using TensorFlow, the ResNet-50 model can be trained in under 10 hours, or overnight, on 16 sockets of Intel Xeon Platinum 8380H CPUs. With 64 sockets of Intel Xeon Platinum 8376H CPUs, the same task took 3.5 hours.

MiniGo - Using bfloat16 for the training phase and VNNI-based int8 inference for the self-play phase, MiniGo was trained within 7 hours on 32 sockets of Intel Xeon Platinum 8380H CPUs. The same operation took 4.5 hours on 64 sockets of Intel Xeon Platinum 8376H CPUs.
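These figures also allow a rough estimate of how training time scales with socket count. The sketch below computes speedup and scaling efficiency (speedup divided by the increase in sockets) for each pair of results. Treat the output as approximate: several baseline times are reported as "approximately" or "under" a value, so the ResNet-50 and MiniGo speedups are upper bounds, and the two configurations use different processor models, so the numbers mix hardware and scale effects. The `RESULTS` table and `scaling` helper are illustrative names, not part of any MLPerf tooling.

```python
# Training times taken from the results above, in hours.
# benchmark: (base_sockets, base_hours, scaled_sockets, scaled_hours)
RESULTS = {
    "DLRM": (4, 2.0, 64, 0.25),        # ~2 h on 8380H vs. 15 min on 8376H
    "ResNet-50": (16, 10.0, 64, 3.5),  # "under 10 h" used as an upper bound
    "MiniGo": (32, 7.0, 64, 4.5),      # "within 7 h" used as an upper bound
}


def scaling(base_sockets, base_hours, scaled_sockets, scaled_hours):
    """Return (speedup, socket_ratio, efficiency) for two training runs."""
    speedup = base_hours / scaled_hours
    socket_ratio = scaled_sockets / base_sockets
    # Efficiency of 1.0 would mean perfectly linear scaling.
    return speedup, socket_ratio, speedup / socket_ratio


if __name__ == "__main__":
    for name, run in RESULTS.items():
        speedup, ratio, eff = scaling(*run)
        print(f"{name:10s} {ratio:4.0f}x sockets -> "
              f"{speedup:5.2f}x faster ({eff:.0%} efficiency)")
```

By this arithmetic, DLRM gets about 8x faster on 16x the sockets (roughly 50% efficiency), ResNet-50 at most about 2.9x on 4x the sockets, and MiniGo at most about 1.6x on 2x the sockets.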

The Benefits of Intel Platforms for AI Workloads

Thanks to its long history of innovation, Intel can offer solution providers significant improvements in AI compute performance in an ecosystem they already use.

Instead of migrating to other platforms, providers that already use Intel can scale AI performance by adding processors, not necessarily servers. This is possible because Intel offers the only x86 data center CPU with built-in AI acceleration. As a result, solution providers can scale their AI performance without leaving a processing environment they already know and trust.
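The built-in AI acceleration referred to here is what Intel calls Deep Learning Boost, which includes the VNNI instructions mentioned in the MiniGo result. VNNI's core instruction, VPDPBUSD, multiplies unsigned 8-bit values by signed 8-bit values, sums each group of four products, and adds the result to a 32-bit accumulator in a single step. The Python sketch below only emulates that arithmetic to show why int8 inference works: values are quantized with a per-tensor scale, multiplied as small integers, and rescaled once at the end. The function names, scales, and sample values are illustrative assumptions; real frameworks reach VNNI through optimized libraries, not code like this.

```python
def quantize(values, scale, lo, hi):
    """Map floats to integers in [lo, hi] using one per-tensor scale."""
    return [max(lo, min(hi, round(v / scale))) for v in values]


def vnni_style_dot(a_u8, w_s8):
    """Emulate the VPDPBUSD pattern: u8 x s8 products are summed in groups
    of four and added to an accumulator. Python ints never overflow; the
    hardware accumulator is a 32-bit integer."""
    if len(a_u8) != len(w_s8) or len(a_u8) % 4:
        raise ValueError("inputs must have equal length, a multiple of 4")
    acc = 0
    for i in range(0, len(a_u8), 4):
        acc += sum(a * w for a, w in zip(a_u8[i:i + 4], w_s8[i:i + 4]))
    return acc


def int8_dot(activations, weights):
    """Approximate a float dot product with 8-bit integer arithmetic.

    Assumes non-negative activations (e.g. after ReLU) so they fit in u8,
    with at least one positive activation and one nonzero weight.
    """
    a_scale = max(activations) / 255              # u8 range 0..255
    w_scale = max(abs(w) for w in weights) / 127  # s8 range -127..127
    a_q = quantize(activations, a_scale, 0, 255)
    w_q = quantize(weights, w_scale, -127, 127)
    # Integer accumulation, then a single float rescale to real units.
    return vnni_style_dot(a_q, w_q) * a_scale * w_scale


if __name__ == "__main__":
    acts = [0.0, 0.5, 1.2, 2.0, 0.3, 0.9, 1.5, 0.1]
    wts = [0.25, -0.5, 0.75, -1.0, 0.1, 0.2, -0.3, 0.4]
    exact = sum(a * w for a, w in zip(acts, wts))
    print(f"fp32 dot: {exact:.4f}  int8 approx: {int8_dot(acts, wts):.4f}")
```

The small loss in precision is the trade-off for doing most of the work in narrow integer arithmetic, which is why int8 is commonly used for inference steps such as MiniGo's self-play.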

For Intel-Powered AI, Look No Further than UNICOM Engineering

When it comes to designing and building the hardware to support your next AI-enabled solution, there's no need to rely solely on your in-house staff.

UNICOM Engineering is proud to be a Dell Technologies Titanium OEM Partner and Intel Technology Provider. Our award-winning team is ready to enhance your AI solutions for optimal performance and reliability. Learn how UNICOM Engineering helps our customers bring their applications to market faster with hardware solutions powered by the industry-leading components from Dell Technologies and Intel. Schedule a consultation today to learn how UNICOM Engineering can assist your business.
