Think one GPU is very much like another? Think again. It turns out that there’s surprising variability in the performance delivered by chips of the same model. That can make getting your money’s worth by renting time on a GPU from a cloud provider a real roll of the dice, according to research from the College of William & Mary, Jefferson Lab, and Silicon Data.
“It’s called the silicon lottery,” says Carmen Li, founder and CEO of Silicon Data, which tracks GPU rental prices and benchmarks cloud-computing performance.
The silicon lottery’s existence has been known since at least 2022, when researchers at the University of Wisconsin tied it to variations in the performance of GPU-dependent supercomputers. Li and her colleagues figured that the effect would be even more pronounced for AI cloud customers.
Performance varies for GPU models in the cloud
So they ran 6,800 instances of the index firm’s benchmark test on 3,500 randomly selected GPUs operated by 11 cloud-computing providers. The 3,500 GPUs comprised 11 models of Nvidia GPU, the most advanced being the Nvidia H200 SXM. (The team wasn’t just picking on Nvidia; the GPU giant makes up much of the rental cloud market.)
The benchmark, called SiliconMark, is meant to provide a snapshot of a GPU’s ability to run large language models, or LLMs. It tests 16-bit floating-point computing performance, measured in trillions of operations per second, and a GPU’s internal-memory bandwidth, measured in gigabytes per second. The results showed that computing performance varied for all models, but for the 259 H100 PCIe GPUs it differed by as much as 34.5 percent, and the memory bandwidth of the 253 H200 SXM GPUs varied by as much as 38 percent.
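To make a figure such as “differed by as much as 34.5 percent” concrete, here is a minimal Python sketch of one plausible way to express spread across chips of the same model. The article does not say how the researchers computed their percentages, so the definition used here (the gap between the best and worst score, as a fraction of the best) is an assumption, and the throughput readings are made-up illustration values, not SiliconMark data.

```python
def relative_spread(scores):
    """Return the gap between the best and worst score as a fraction of the best.

    This definition is an assumption for illustration; the study may
    have measured variation differently.
    """
    if not scores:
        raise ValueError("need at least one score")
    best, worst = max(scores), min(scores)
    if best <= 0:
        raise ValueError("scores must be positive")
    return (best - worst) / best


# Hypothetical 16-bit throughput readings (trillions of operations per
# second) from four GPUs of the same model. Invented numbers.
fp16_tops = [400.0, 380.0, 300.0, 395.0]

print(f"Spread: {relative_spread(fp16_tops):.1%}")  # prints "Spread: 25.0%"
```

The same function works for memory-bandwidth readings in gigabytes per second, since it only compares scores measured in the same unit.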
SOURCE: SILICON DATA
Differences in how the GPU is cooled, how cloud operators configure their computers, and how much use the chip has seen can all contribute to differences in the performance of otherwise identical chips. But Silicon Data’s analysis showed that the real culprit was variations in the chips themselves, likely due to manufacturing issues.
Such randomness has real dollars-and-cents consequences, the researchers argue, because there’s a chance that a pricier, more advanced GPU won’t deliver better performance than an older model chip.
So what should GPU renters do? “The most practical approach is to benchmark the actual rental they receive,” says Jason Cornick, head of infrastructure at Silicon Data. “Running a benchmark tool [such as SiliconMark] allows them to compare their specific instance’s performance against a broader corpus of data.”
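The comparison Cornick describes can be sketched as a percentile lookup: take the score measured on the rented instance and see where it falls among scores from other GPUs of the same model. This is a rough Python illustration only. The function name, the 25th-percentile cutoff, and the corpus values are all hypothetical, and this is not how SiliconMark itself reports results.

```python
from bisect import bisect_left


def percentile_rank(score, corpus):
    """Return the percentage of corpus scores that fall strictly below `score`."""
    if not corpus:
        raise ValueError("corpus must not be empty")
    ordered = sorted(corpus)
    # bisect_left counts how many sorted entries are strictly less than score.
    return 100.0 * bisect_left(ordered, score) / len(ordered)


# Hypothetical reference scores for one GPU model, plus the score
# measured on the rented instance. Invented numbers.
corpus_scores = [300.0, 350.0, 380.0, 395.0, 400.0]
my_score = 360.0

rank = percentile_rank(my_score, corpus_scores)
print(f"Faster than {rank:.0f}% of the reference GPUs")  # prints 40%

# The 25th-percentile cutoff is an arbitrary assumption for this example.
if rank < 25.0:
    print("This instance sits in the bottom quarter; consider requesting another.")
```

A renter would substitute real benchmark output for `my_score` and a real reference dataset for `corpus_scores`.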
