Moore Threads Unveils Huagang GPUs for Gaming and AI

Moore Threads introduced the Huagang GPU architecture and two new chips — Lushan for gaming and Huashan for AI — promising higher compute density, better energy efficiency, larger memory (up to 64GB), and MTLink scalability; launches slated for 2026.

Emma Collins Emma Collins . Comments
Moore Threads Unveils Huagang GPUs for Gaming and AI

3 Minutes

Chinese GPU maker Moore Threads revealed its next-generation Huagang ("Flowerpot") architecture at the MUSA 2025 Developer Conference, promising major lifts in both gaming and AI workloads. The company also introduced two chips built on the design — Lushan for graphics and Huashan for AI compute — with product launches targeted for 2026.

Two chips, two missions: Lushan for gamers, Huashan for AI

Huagang centers on denser, more efficient compute: redesigned compute units that Moore Threads says increase compute density by about 50% while improving energy efficiency roughly 10%. The architecture adds a new instruction set, asynchronous programming support, and smarter thread scheduling — features tuned for modern real-time rendering and large-scale AI inference.

Lushan replaces the older MTT S80 and S90 cards and is pitched at both gamers and professionals. Moore Threads claims sweeping gains: up to 15× better performance in AAA titles, 50× stronger ray tracing, and 64× higher AI compute performance in some workloads. The company also highlights big improvements in geometry processing and texture fill rates, plus a jump from 16GB to up to 64GB of memory — a tangible boost for large scenes and CAD/CAE applications. Lushan introduces the UniTE unified rendering architecture and a dedicated AI block to accelerate mixed graphics/AI tasks.

Huashan, by contrast, is built for heavy AI compute. It uses a dual-chiplet layout paired with nine HBM modules and supports FP4 and FP64 formats. Moore Threads directly compared Huashan to NVIDIA's Hopper and Blackwell families, claiming floating-point performance close to the Blackwell B200 and comparable total bandwidth, with particularly strong memory access characteristics. The chip can scale across many devices using MTLink 4.0, with a quoted interconnect speed of 1,314 GB/s and theoretical scalability to over 100,000 units.

These are company claims, and independent benchmarks will be needed to verify real-world performance against incumbents. Still, Moore Threads' focus on memory capacity, unified rendering, and dedicated AI hardware signals a push to narrow gaps in both gaming and data-center segments. Could bigger local memory and improved ray tracing performance make Lushan attractive to prosumers? Will Huashan's chiplet approach and MTLink scalability win traction in AI clusters? Expect to see the first Lushan-based consumer cards in 2026, with Huashan-based products arriving around the same time.

Source: gizmochina

“I cover emerging technologies, digital innovation, and the intersection of tech and everyday life. My goal is to make complex trends accessible and inspiring.”

Leave a Comment

Comments