This post originally appeared on Recode China AI.
For more than a decade, Nvidia’s chips have been the beating heart of China’s AI ecosystem. Its GPUs powered search engines, video apps, smartphones, electric vehicles, and the current wave of generative AI models. Even as Washington tightened export rules for advanced AI chips, Chinese companies kept settling for and buying “China-only” Nvidia chips stripped of their most advanced features: the H800, A800, and H20.
But by 2025, patience in Beijing had seemingly snapped. State media began labeling Nvidia’s China-compliant H20 as unsafe and possibly compromised with hidden “backdoors.” Regulators summoned company executives for questioning, while reports from The Financial Times surfaced that tech companies like Alibaba and ByteDance were quietly told to cancel new Nvidia GPU orders. The Chinese AI startup DeepSeek also signaled in August that its next model would be designed to run on China’s “next-generation” domestic AI chips.
The message was clear: China could no longer bet its AI future on a U.S. supplier. If Nvidia wouldn’t, or couldn’t, sell its best hardware in China, domestic alternatives must fill the void by designing specialized chips for both AI training (building models) and AI inference (running them).
That’s difficult; in fact, some say it’s impossible. Nvidia’s chips set the global benchmark for AI computing power. Matching them requires not just raw silicon performance but memory, interconnect bandwidth, software ecosystems, and above all, production capacity at scale.
Still, a few contenders have emerged as China’s best hope: Huawei, Alibaba, Baidu, and Cambricon. Each tells a different story about China’s bid to reinvent its AI hardware stack.
Huawei’s AI Chips Are in the Lead
Huawei is betting on rack-scale supercomputing clusters that pool thousands of chips together for massive gains in computing power. VCG/Getty Images
If Nvidia is out, Huawei, one of China’s largest tech companies, looks like the natural replacement. Its Ascend line of AI chips has matured under U.S. sanctions, and in September 2025 the company laid out a multi-year public roadmap:
- Ascend 950, expected in 2026 with a performance target of 1 petaflop in the low-precision FP8 format that’s commonly used in AI chips. It will have 128 to 144 gigabytes of on-chip memory, and interconnect bandwidth (a measure of how fast it moves data between components) of up to 2 terabytes per second.
- Ascend 960, expected in 2027, is projected to double the 950’s capabilities.
- Ascend 970 is further down the line, and promises significant leaps in both compute power and memory bandwidth.
The current offering is the Ascend 910B, launched after U.S. sanctions cut Huawei off from global suppliers. Roughly comparable to the A100, Nvidia’s top chip in 2020, it became the de facto option for companies that couldn’t get Nvidia’s GPUs. One Huawei official even claimed the 910B outperformed the A100 by around 20 percent in some training tasks in 2024. But the chip still relies on an older type of high-speed memory (HBM2E), and can’t match Nvidia’s H20: It holds about a third less data in memory and transfers data between chips about 40 percent more slowly.
The company’s latest answer is the 910C, a dual-chiplet design that fuses two 910Bs. In theory, it can approach the performance of Nvidia’s H100 chip (Nvidia’s flagship chip until 2024); Huawei showcased a 384-chip Atlas 900 A3 SuperPoD cluster that reached roughly 300 Pflops of compute, implying that each 910C can deliver just under 800 teraflops when performing calculations in the FP16 format. That’s still shy of the H100’s roughly 2,000 Tflops, but it’s enough to train large-scale models if deployed at scale. In fact, Huawei has detailed how it used Ascend AI chips to train DeepSeek-like models.
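The per-chip figure follows from simple division of the cluster total by the chip count. The sketch below reproduces that arithmetic using only the numbers quoted above; the variable and function names are illustrative, and it assumes compute is spread evenly across chips with no allowance for interconnect overhead.

```python
# Figures quoted in the article; names are illustrative, not Huawei's.
CLUSTER_PFLOPS = 300     # Atlas 900 A3 SuperPoD, approximate FP16 compute
CHIPS_IN_CLUSTER = 384   # number of 910C chips in the showcased cluster
H100_TFLOPS = 2000       # the article's rough figure for Nvidia's H100


def per_chip_tflops(cluster_pflops: float, chips: int) -> float:
    """Average throughput per chip, converting petaflops to teraflops.

    Assumes compute is divided evenly across all chips.
    """
    return cluster_pflops * 1000 / chips


ascend_910c = per_chip_tflops(CLUSTER_PFLOPS, CHIPS_IN_CLUSTER)
share_of_h100 = ascend_910c / H100_TFLOPS

print(f"910C: about {ascend_910c:.0f} Tflops per chip")
print(f"Share of the H100 figure: {share_of_h100:.0%}")
```

On these numbers each 910C works out to roughly 781 Tflops, a little under 40 percent of the H100 figure cited above.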
To address the performance gap at the single-chip level, Huawei is betting on rack-scale supercomputing clusters that pool thousands of chips together for massive gains in computing power. Building on its Atlas 900 A3 SuperPoD, the company plans to launch the Atlas 950 SuperPoD in 2026, linking 8,192 Ascend chips to deliver 8 exaflops of FP8 performance, backed by 1,152 TB of memory and 16.3 petabytes per second of interconnect bandwidth. The cluster will span a footprint larger than two full basketball courts. Looking further ahead, Huawei’s Atlas 960 SuperPoD is set to scale up to 15,488 Ascend chips.
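As a rough cross-check, the Atlas 950 SuperPoD totals can be divided by the chip count and compared with the Ascend 950 roadmap targets. This is a sketch under stated assumptions: that every chip in the cluster is an Ascend 950 (the article says only "Ascend chips"), that resources are split evenly, and that 1 TB of memory means 1,024 GB. The variable names are illustrative.

```python
# Atlas 950 SuperPoD totals as quoted in the article.
CHIPS = 8192
TOTAL_FP8_EXAFLOPS = 8
TOTAL_MEMORY_TB = 1152
TOTAL_BANDWIDTH_PBPS = 16.3  # petabytes per second

# Per-chip shares, assuming an even split across all chips.
fp8_per_chip_pflops = TOTAL_FP8_EXAFLOPS * 1000 / CHIPS          # exa -> peta
memory_per_chip_gb = TOTAL_MEMORY_TB * 1024 / CHIPS              # assumes 1 TB = 1,024 GB
bandwidth_per_chip_tbps = TOTAL_BANDWIDTH_PBPS * 1000 / CHIPS    # peta -> tera

print(f"FP8 compute per chip: {fp8_per_chip_pflops:.2f} Pflops")
print(f"Memory per chip:      {memory_per_chip_gb:.0f} GB")
print(f"Bandwidth per chip:   {bandwidth_per_chip_tbps:.2f} TB/s")
```

Under those assumptions the per-chip shares come out to about 0.98 Pflops, 144 GB, and 1.99 TB/s, which lines up with the roadmap's 1 petaflop, 128 to 144 GB, and up to 2 TB/s targets for the Ascend 950.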
Hardware isn’t Huawei’s only play. Its MindSpore deep learning framework and lower-level CANN software are designed to lock customers into its ecosystem, offering a domestic alternative to PyTorch (a popular framework from Meta) and CUDA (Nvidia’s platform for programming GPUs) respectively.
State-backed firms and U.S.-sanctioned companies like iFlytek, 360, and SenseTime have already signed on as Huawei clients. The Chinese tech giants ByteDance and Baidu also ordered small batches of chips for trial.
Yet Huawei isn’t an automatic winner. Chinese telecom operators such as China Mobile and Unicom, which are also responsible for building China’s data centers, remain wary of Huawei’s influence. They often prefer to mix GPUs and AI chips from different suppliers rather than fully commit to Huawei. Big internet platforms, meanwhile, worry that partnering too closely could hand Huawei leverage over their own intellectual property.
Even so, Huawei is better positioned than ever to take on Nvidia.
Alibaba Pushes AI Chips to Protect Its Cloud Business
Alibaba Cloud’s business depends on reliable access to training-grade AI chips. So it’s making its own. Sun Pengxiong/VCG/Getty Images
Alibaba’s chip unit, T-Head, was founded in 2018 with modest ambitions around open-source RISC-V processors and data center servers. Today, it’s emerging as one of China’s most aggressive bids to compete with Nvidia.
T-Head’s first AI chip is the Hanguang 800, an efficient chip designed for AI inference that was announced in 2019; it’s able to process 78,000 images per second and optimize recommendation algorithms and large language models (LLMs). Built on a 12-nanometer process with around 17 billion transistors, the chip can perform up to 820 trillion operations per second (TOPS) and access its memory at speeds of around 512 GB per second.
But its latest design, the PPU chip, is something else entirely. Built with 96 GB of high-bandwidth memory and support for high-speed PCIe 5.0 connections, the PPU is pitched as a direct rival to Nvidia’s H20.
During a state-backed television program featuring a China Unicom data center, the PPU was presented as capable of rivaling Nvidia’s H20. Reports suggest this data center runs over 16,000 PPUs out of 22,000 chips in total. The Information also reported that Alibaba has been using its AI chips to train LLMs.
Besides chips, Alibaba Cloud also recently upgraded its supernode server, named Panjiu, which now features 128 AI chips per rack, a modular design for easy upgrades, and full liquid cooling.
For Alibaba, the motivation is as much about cloud dominance as national policy. Its Alibaba Cloud business depends on reliable access to training-grade chips. By making its own silicon competitive with Nvidia’s, Alibaba keeps its infrastructure roadmap under its own control.
Baidu’s Big Chip Reveal in 2025
At a recent developer conference, Baidu unveiled a 30,000-chip cluster powered by its third-generation P800 processors. Qilai Shen/Bloomberg/Getty Images
Baidu’s chip story began long before today’s AI frenzy. As early as 2011, the search giant was experimenting with field-programmable gate arrays (FPGAs) to accelerate its deep learning workloads for search and advertising. That internal project later grew into Kunlun.
The first generation arrived in 2018. Kunlun 1 was built on Samsung’s 14-nm process, and delivered around 260 TOPS with a peak memory bandwidth of 512 GB per second. Three years later came Kunlun 2, a modest upgrade. Fabricated on a 7-nm node, it pushed performance to 256 TOPS for low-precision INT8 calculations and 128 Tflops for FP16, all while reducing power to about 120 watts. Baidu aimed this second generation less at training and more at inference-heavy tasks such as its Apollo autonomous cars and Baidu AI Cloud services. Also in 2021, Baidu spun off Kunlun into an independent company called Kunlunxin, which was then valued at US $2 billion.
For years, little surfaced about Kunlun’s progress. But that changed dramatically in 2025. At its developer conference, Baidu unveiled a 30,000-chip cluster powered by its third-generation P800 processors. Each P800 chip, according to research by Guosen Securities, reaches roughly 345 Tflops at FP16, putting it on the same level as Huawei’s 910B and Nvidia’s A100. Its interconnect bandwidth is reportedly close to that of Nvidia’s H20. Baidu pitched the system as capable of training “DeepSeek-like” models with hundreds of billions of parameters. Baidu’s latest multimodal models, the Qianfan-VL family with 3 billion, 8 billion, and 70 billion parameters, were all trained on its Kunlun P800 chips.
Kunlun’s ambitions extend beyond Baidu’s internal demands. This year, Kunlun chips secured orders worth over 1 billion yuan (about $139 million) for China Mobile’s AI projects. That news helped restore investor confidence: Baidu’s stock is up 64 percent this year, with the Kunlun reveal playing a central role in that rise.
Just today, Baidu announced its roadmap for its AI chips, promising to roll out a new product every year for the next five years. In 2026, the company will launch the M100, optimized for large-scale inference, and in 2027 the M300 will arrive, optimized for training and inference of large multimodal models. Baidu hasn’t yet released details about the chips’ parameters.
Still, challenges loom. Samsung has been Baidu’s foundry partner from day one, producing Kunlun chips on advanced process nodes. Yet reports from Seoul suggest Samsung has paused production of Baidu’s 4-nm designs.
Cambricon’s Chip Moves Make Waves in the Stock Market
Cambricon struggled in the early 2020s, with chips like the MLU 290 that couldn’t compete with Nvidia chips. CFOTO/Future Publishing/Getty Images
The chip company Cambricon is probably the best-performing publicly traded company on China’s domestic stock market. Over the past 12 months, Cambricon’s share price has jumped nearly 500 percent.
The company was formally spun out of the Chinese Academy of Sciences in 2016, but its roots stretch back to a 2008 research program focused on brain-inspired processors for deep learning. By the mid-2010s, the founders believed AI-specific chips were the future.
In its early years, Cambricon focused on accelerators called neural processing units (NPUs) for both mobile devices and servers. Huawei was a crucial first customer, licensing Cambricon’s designs for its Kirin mobile processors. But as Huawei pivoted to develop its own chips, Cambricon lost a flagship partner, forcing it to expand quickly into edge and cloud accelerators. Backing from Alibaba, Lenovo, iFlytek, and major state-linked funds helped push Cambricon’s valuation to $2.5 billion by 2018 and eventually landed it on Shanghai’s Nasdaq-like STAR Market in 2020.
The next few years were rough. Revenues fell, investors pulled back, and the company bled cash while struggling to keep up with Nvidia’s breakneck pace. For a while, Cambricon looked like another cautionary tale of Chinese semiconductor ambition. But by late 2024, fortunes began to change. The company returned to profitability, thanks in large part to its newest MLU series of chips.
That product line has steadily matured. The MLU 290, built on a 7-nm process with 46 billion transistors, was designed for hybrid training and inference tasks, with interconnect technology that could scale to clusters of more than 1,000 chips. The follow-up MLU 370, the last version before Cambricon was sanctioned by the United States government in 2022, can reach 96 Tflops at FP16.
Cambricon’s real breakthrough came with the MLU 590 in 2023. The 590 was built on a 7-nm process and delivered peak performance of 345 Tflops at FP16, with some reports suggesting it could even surpass Nvidia’s H20 in certain scenarios. Importantly, it introduced support for less-precise data formats like FP8, which eased memory bandwidth pressure and boosted efficiency. This chip didn’t just mark a leap; it turned Cambricon’s finances around, restoring confidence that the company could deliver commercially viable products.
Now all eyes are on the MLU 690, currently in development. Industry chatter suggests it could approach, or even rival, Nvidia’s H100 in some metrics. Expected upgrades include denser compute cores, stronger memory bandwidth, and further refinements in FP8 support. If successful, it would catapult Cambricon from “domestic alternative” status to a genuine competitor at the global frontier.
Cambricon still faces hurdles: its chips aren’t yet produced at the same scale as Huawei’s or Alibaba’s, and past instability makes buyers cautious. But symbolically, its comeback matters. Once dismissed as a struggling startup, Cambricon is now seen as proof that China’s domestic chip path can yield profitable, high-performance products.
A Geopolitical Tug-of-War
At its core, the battle over Nvidia’s place in China isn’t really about teraflops or bandwidth. It’s about control. Washington sees chip restrictions as a way to protect national security and slow Beijing’s advance in AI. Beijing sees rejecting Nvidia as a way to reduce strategic vulnerability, even if it means temporarily living with less powerful hardware.
China’s big four contenders, Huawei, Alibaba, Baidu, and Cambricon, along with smaller players such as Biren, Muxi, and Suiyuan, don’t yet offer real substitutes. Most of their offerings are barely comparable with the A100, Nvidia’s best chip five years ago, and they are working to catch up with the H100, which became available three years ago.
Each player is also bundling its chips with proprietary software stacks. This approach could force Chinese developers accustomed to Nvidia’s CUDA to spend more time adapting their AI models, which, in turn, could affect both training and inference.
DeepSeek’s development of its next AI model, for example, has reportedly been delayed. The primary reason appears to be the company’s effort to run more of its AI training or inference on Huawei’s chips.
The question is not whether Chinese companies can build chips; they clearly can. The question is whether and when they can match Nvidia’s combination of performance, software support, and trust from end users. On that front, the jury’s still out.
But one thing is certain: China no longer wants to play second fiddle in the world’s most important technology race.
