Meta reveals four new MTIA chips built for AI inference — to be released on a six-month cadence
The chiplet-based accelerators are designed to run AI inference more efficiently than GPUs optimized for training workloads.
Meta today announced four successive generations of its in-house Meta Training and Inference Accelerator (MTIA) chips, all developed in partnership with Broadcom and scheduled for deployment within the next two years. “We’ve developed a competitive strategy for MTIA by prioritizing rapid, iterative development,” reads Meta’s press release, “along with an inference-first focus and frictionless adoption by building natively on industry standards.”
The four new chips are MTIA 300, 400, 450, and 500. MTIA 300 is already in production for ranking and recommendations training, while 400 is currently in lab testing ahead of data center deployment. MTIA 450 and 500 are targeted at AI inference and are scheduled for mass deployment in early 2027 and later in 2027, respectively. According to Meta's technical blog, from MTIA 300 through to MTIA 500, HBM bandwidth increases 4.5 times, and compute FLOPs increase 25 times.
Meta says MTIA 450 doubles the HBM bandwidth of MTIA 400, describing it as “much higher than that of existing leading commercial products,” or, in other words, Nvidia’s H100 and H200. MTIA 500 then adds another 50% HBM bandwidth on top of the 450, along with up to 80% more HBM capacity. Indeed, it’s HBM bandwidth, not raw FLOPs, that is the main bottleneck during the decode phase of transformer inference, and mainstream GPUs are architected to maximize FLOPs for large-scale pre-training. This means they carry a cost and power overhead that Meta says is unnecessary for inference workloads.
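To make the bandwidth argument concrete, a rough roofline-style estimate is sketched below. The model size, batch size, and GPU throughput and bandwidth figures are illustrative assumptions, not numbers from Meta's or Nvidia's announcements.

```python
# Rough, illustrative estimate of why transformer decode is memory-bound.
# All numbers are hypothetical assumptions, not vendor specifications.

params = 70e9          # model parameters (e.g., a ~70B-parameter dense model)
bytes_per_param = 2    # FP16/BF16 weights
batch = 1              # small-batch, latency-sensitive serving

# Per generated token, decode must stream essentially all weights from HBM...
bytes_per_token = params * bytes_per_param
# ...while doing roughly 2 FLOPs per parameter (one multiply, one add).
flops_per_token = 2 * params * batch

# Arithmetic intensity: FLOPs performed per byte moved from memory.
intensity = flops_per_token / bytes_per_token  # ~1 FLOP/byte at batch 1

# A training-oriented GPU with, say, ~1,000 TFLOPS of dense BF16 compute and
# ~3.35 TB/s of HBM bandwidth only breaks even at roughly 300 FLOPs per byte.
compute_flops = 1.0e15
hbm_bandwidth = 3.35e12
balance_point = compute_flops / hbm_bandwidth

print(f"decode arithmetic intensity: {intensity:.1f} FLOPs/byte")
print(f"hardware balance point:      {balance_point:.0f} FLOPs/byte")
# At ~1 FLOP/byte against a ~300 FLOPs/byte balance point, decode throughput
# is limited by HBM bandwidth, and most of the FLOP budget sits idle.
```

The gap between those two numbers is the overhead Meta is referring to: the silicon and power spent on compute that a bandwidth-bound decode loop never uses.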
Meta's approach also includes hardware acceleration for FlashAttention and mixture-of-experts feed-forward network computation, plus custom low-precision data types co-designed for inference. MTIA 450 supports the MX4 data type, delivering six times the FLOPs of FP16/BF16, with mixed low-precision computation that avoids the software overhead of data-type conversion.
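Microscaling (MX) formats of this kind store small blocks of values at very low precision with one shared scale per block. The sketch below is a simplified illustration of that idea, not Meta's hardware implementation: it assumes 32-element blocks, a shared power-of-two scale, and plain 4-bit integers rather than the real MX4 element encoding.

```python
import numpy as np

BLOCK = 32  # assumed block size for this illustration

def quantize_mx4_like(x: np.ndarray):
    """Block-scaled 4-bit quantization: one power-of-two scale per block."""
    x = x.reshape(-1, BLOCK)
    # Pick a per-block power-of-two scale that maps the block's max
    # magnitude into the signed 4-bit range [-8, 7].
    max_abs = np.abs(x).max(axis=1, keepdims=True)
    max_abs = np.where(max_abs == 0, 1.0, max_abs)
    scale = 2.0 ** np.ceil(np.log2(max_abs / 7.0))
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

x = np.random.randn(4 * BLOCK).astype(np.float32)
q, scale = quantize_mx4_like(x)
x_hat = dequantize(q, scale)
print("max abs error:", np.abs(x - x_hat).max())
```

Doing this packing, scaling, and unpacking in software on every layer is exactly the conversion overhead Meta says its mixed low-precision hardware path avoids.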
In terms of eventual deployment, MTIA 400, 450, and 500 will all use the same chassis, rack, and network infrastructure, meaning each new chip generation drops into the existing physical footprint for easy interchange. It’s this modularity, Meta says, that’s behind MTIA’s roughly six-month chip cadence, which itself is much faster than the industry’s typical one-to-two year cycle.
The software stack runs natively on PyTorch, vLLM, and Triton, with support for torch.compile and torch.export so that production models can be deployed simultaneously on both GPUs and MTIA without MTIA-specific rewrites. Meta said it has already deployed hundreds of thousands of MTIA chips across its apps for inference on organic content and ads.
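From the model author's side, the claim is that one PyTorch code path can target either backend. The sketch below shows roughly what that looks like; the "mtia" device check and fallback logic are assumptions for illustration, not confirmed details of Meta's internal stack.

```python
import torch
import torch.nn as nn

class TinyRanker(nn.Module):
    """Stand-in model; any ordinary PyTorch module would do."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def pick_device() -> str:
    # Assumed accelerator hook: fall back to CUDA or CPU if unavailable.
    if hasattr(torch, "mtia") and torch.mtia.is_available():
        return "mtia"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

device = pick_device()
model = TinyRanker().to(device).eval()
model = torch.compile(model)            # backend-specific lowering happens here
example = torch.randn(8, 256, device=device)

with torch.no_grad():
    scores = model(example)

# torch.export captures a deployable graph from the same, unmodified model.
exported = torch.export.export(TinyRanker().eval(), (torch.randn(8, 256),))
print(scores.shape, type(exported).__name__)
```

The point of this pattern is that the model definition never mentions MTIA-specific kernels; the compiler and export path decide how to lower it for whichever hardware it lands on.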
All this comes just two weeks after Meta disclosed a long-term, $100 billion AI infrastructure agreement with AMD, suggesting that there’s a broader effort at play to reduce dependence on Nvidia across different parts of Meta’s AI stack while keeping MTIA at the core of inference workloads.
Luke James is a freelance writer and journalist. Although his background is in law, he has a personal interest in all things tech, especially hardware and microelectronics, and anything regulatory.