AMD, Meta, and the End of NVIDIA’s Edge - Why I’m All In on $AMD (100X Potential).
If AMD captures even a 15% share of the projected $500 billion accelerator market by 2028, its data center revenue could surpass $50 billion—7x its 2024 baseline—with incremental margins above 60%.
Introduction: Why AMD’s Moment Is Now
For over a decade, NVIDIA has dominated the AI accelerator market — but the landscape is shifting fast. As compute demand explodes, the semiconductor industry is approaching a physical and economic breaking point. Monolithic GPU designs are hitting the wall of complexity, yield, and cost. And while NVIDIA’s new Blackwell architecture marks a cautious step toward modularity, it only underscores what AMD figured out over a decade ago: the future belongs to chiplets.
This piece is not just another comparison between two chipmakers. It’s a deep dive into why AMD’s structural design advantage — built on modular architecture, high memory density, and manufacturing agility — positions it to capture meaningful market share in a $500 billion AI accelerator market. And it’s already happening: hyperscalers like Meta and OpenAI are validating AMD’s MI300X platform in real-world production environments.
Even if AMD doesn’t surpass NVIDIA, it doesn’t need to. The TAM is massive, the margins are strong, and the architecture is sound. In this report, I lay out the technical, economic, and strategic case for why AMD isn’t just catching up — it’s poised to lead the next era of compute.
Let’s break it down.
NVIDIA–AMD AI Race:
The architecture behind NVIDIA’s new Blackwell platform actually reinforces AMD’s structural advantage on the hardware side. Unlike NVIDIA’s more cautious, transitional approach, AMD’s chiplet-based design is fully modular and highly scalable — giving it both manufacturing flexibility and cost efficiency.
My investment thesis is built on two key assumptions:
AMD has the potential to truly disrupt NVIDIA’s dominance in the AI accelerator market, thanks to its advanced chiplet architecture and ability to scale compute at lower cost.
Even if AMD doesn’t fully disrupt NVIDIA, its differentiated product roadmap and structural efficiency still position it to gain market share. And given the explosive growth in AI infrastructure spending, even modest gains in market share could translate into significant increases in AMD’s market value.
In short, AMD doesn’t need to beat NVIDIA outright to win. It just needs to keep executing — and even small wins can be worth billions.
I have argued before that the chip industry is approaching the lithographic reticle limit. As process nodes shrink, the complexity of producing large monolithic chips (NVIDIA’s focus) increases exponentially. Chiplet architecture (AMD’s focus) is considerably less complex, assuming one has the necessary expertise.
Less complexity means higher yields and therefore lower costs, which makes it likely that AMD’s GPUs will eventually achieve a competitive price-to-performance ratio. So long as NVIDIA remains set on the monolithic path, AMD is bound to catch up on the hardware side.
Gordon Moore anticipated this possibility in his 1965 paper, “Cramming More Components onto Integrated Circuits”:
“It may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected. The availability of large functions, combined with functional design and construction, should allow the manufacturer of large systems to design and construct a considerable variety of equipment both rapidly and economically.”
NVIDIA’s Shift Toward Chiplets:
Antonio Linares has written an excellent piece on this topic, which has significantly shaped my understanding.
My understanding is the following: the Blackwell architecture marks what I see as NVIDIA’s first tentative step toward chiplet-based design. This shift strongly validates the bold bet AMD made around 2013, at a time when the company was on the brink of bankruptcy, burdened with debt, and struggling under a broken business model. That chiplet bet, championed by Lisa Su (who became CEO in 2014), was a remarkable act of foresight. In hindsight, it wasn’t just a smart technical move; it was a transformational moment in semiconductor history. As an investor, I consider it one of the most extraordinary corporate turnarounds I’ve ever seen.
Following Nvidia’s GTC conference in March, there was speculation that Nvidia had finally embraced chiplet architecture, thereby reducing its risk of disruption by AMD. However, a closer inspection of the Blackwell architecture reveals a more nuanced reality. While Blackwell does feature two large dies connected together, this is not yet a full embrace of modular chiplet design in the same way AMD has implemented it.
Historically, Nvidia has used monolithic GPU designs—single, massive chips that pack all the compute and memory logic into one die. With Blackwell, the company has—for the first time—made two chips operate as a unified system at both the software and networking levels. This represents a meaningful shift: Blackwell is technically composed of two chiplets, and that marks Nvidia’s tentative first step into chiplet-style architecture.
However, there’s a crucial detail: both of Blackwell’s chiplets are as large as physically possible, right at the reticle limit. The reticle limit is the maximum area that can be printed in one pass by a photolithography machine during chip fabrication. As process nodes shrink, trying to pack more transistors into these large dies becomes exponentially harder and more expensive.
Nvidia now faces a choice. To scale compute further, it can either:
Increase complexity within each of the two existing dies—pushing against physical and manufacturing limits.
Add additional large monolithic dies and stitch them together as if they were chiplets.
The first option becomes increasingly costly and difficult. The second mimics chiplet design, but without the efficiencies. Connecting multiple massive dies leads to lower yields: if one part of a large chip fails, the entire component must be discarded. In contrast, AMD’s “pure” chiplet approach—which uses many small, modular dies—allows faulty units to be discarded with minimal cost, increasing yield and flexibility.
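To put rough numbers on the yield argument, here is a minimal sketch using the textbook Poisson yield model, where the chance that a die has zero defects falls exponentially with its area. The defect density is my own illustrative assumption, not foundry data, and the die areas are round numbers chosen to contrast a near-reticle die with a small chiplet. The model also ignores partial-die salvage (selling a chip with a few faulty blocks disabled) and packaging losses, both of which matter in practice.

```python
import math

# Assumed defect density in defects per cm^2. Illustrative only, not foundry data.
DEFECTS_PER_CM2 = 0.1


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to have zero defects (simple Poisson model)."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


# Hypothetical die sizes: one near the reticle limit, one small chiplet.
large_die = poisson_yield(800.0, DEFECTS_PER_CM2)
small_die = poisson_yield(100.0, DEFECTS_PER_CM2)

print(f"800 mm^2 die yield: {large_die:.1%}")
print(f"100 mm^2 die yield: {small_die:.1%}")
```

Under these assumptions the large die yields a little under half the time while the small chiplet yields roughly nine times out of ten, which is the mechanism behind the cost gap described above. A different defect density shifts the numbers but not the direction.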
AMD’s CDNA3 architecture (which powers the MI300 series) reflects this advantage. Each compute unit is a small chiplet, well below the reticle limit, and highly scalable. Rather than being forced to push the limits of physics, AMD can continue scaling by simply adding more chiplets—a strategy it has refined over the last decade.
In the long run, two outcomes are possible. Nvidia may continue to dominate by connecting more monolithic chips and managing the resulting complexity. Or, AMD’s chiplet strategy may prove more scalable and cost-effective, enabling it to gain market share—even at the high end.
Either way, Nvidia's partial pivot to chiplets is both a validation of AMD’s decade-long vision and a signal that the limits of monolithic chip design are finally being reached.
Summary Of NVIDIA’s Chiplet Move Versus AMD:
In sum, NVIDIA’s Blackwell architecture, particularly in the B200 GPU, consists of two large GPU dies, each carrying its own compute logic and its own stacks of high-bandwidth memory (HBM3E). The two dies are tightly integrated over NVIDIA’s high-bandwidth die-to-die interface (NV-HBI). At the software level, this allows Blackwell to behave as if it were a single unified GPU. NVLink-C2C is a separate link: it connects Blackwell GPUs to the Grace CPU in the larger package NVIDIA markets as a “superchip”.
However, it’s important to clarify what Blackwell is not. Unlike AMD’s MI300, which is built using many small chiplets, Blackwell’s dies are still very large—each one close to the reticle limit, the maximum size a lithography machine can print. This means NVIDIA hasn’t embraced the core philosophy of chiplet design, which is to use smaller, modular, and yield-efficient dies that can be mixed and matched. Nor has NVIDIA modularized its compute and cache blocks into reusable chiplets, as AMD has done in its CDNA3 architecture.
So what exactly is Blackwell? It’s best understood as a monolithic strategy split in two. While it benefits from some aspects of multi-die packaging—like improved yields and advanced interconnects—it lacks the true modularity and scalability of AMD’s chiplet-based approach. NVIDIA has taken a necessary first step toward breaking up monolithic designs, but it hasn’t yet committed to the full chiplet model.
In summary, Blackwell is a hybrid architecture: large-scale and multi-die, but not truly modular. It reflects the early stages of a shift in design philosophy. By contrast, AMD’s chiplet-based architecture remains more scalable, flexible, and efficient, particularly as compute demand grows and manufacturing constraints tighten.
Difference Between AMD & NVIDIA:
To understand the difference between Nvidia’s Blackwell architecture and AMD’s MI300 chiplet design, imagine two very different approaches to building a structure: one using massive concrete slabs, the other using small, modular Lego blocks.
Nvidia’s Blackwell is like trying to construct a building using two enormous stone slabs. Each slab is incredibly large, right at the edge of what can physically be manufactured (the reticle limit). You can bolt these slabs together with reinforced connectors, like Nvidia’s high-bandwidth die-to-die interconnect, and from the outside it may seem like one unified structure. But in reality, the slabs are still monolithic. If there’s a flaw in one of them, the entire structure is compromised, and the faulty slab must be thrown away, wasting enormous time, effort, and money. This approach works for now, but adding more slabs becomes increasingly complex and inefficient. You’re stacking at the edge of what physics and engineering can handle.
By contrast, AMD’s approach with MI300 is like building with Lego blocks. Each block—each chiplet—is small, reusable, and easy to manufacture. If a single block is defective, you simply discard it and replace it with another. You can scale up the structure by adding more blocks, in any configuration you like. Over time, the structure can grow larger, more flexible, and more powerful, without ever hitting the kind of physical or economic wall that monolithic slabs do. This modularity gives AMD a tremendous advantage in scalability, cost efficiency, and manufacturing yields.
In essence, Nvidia is still building with giant slabs, only just beginning to bolt them together. AMD, on the other hand, has spent the past decade mastering the Lego game—constructing scalable, modular architectures that are future-proof and economically resilient.
If AI workloads continue to grow in complexity and scale—which they will—AMD’s Lego-style chiplet strategy is far better suited to meet that demand. It’s more flexible, more fault-tolerant, and better aligned with the long-term direction of high-performance computing.
NVIDIA’s Approach Before Blackwell:
Before Blackwell, Nvidia designed all of its GPUs as monolithic chips. That means everything—the compute cores, memory controllers, cache, and interconnect logic—was built into a single, massive chip. This approach worked well when there was still room to grow: as long as chipmakers could keep shrinking transistors and adding more functionality without running into physical limits, monolithic design was efficient and powerful.
But over time, Nvidia began to hit a hard barrier: the reticle limit. This is a physical constraint in semiconductor manufacturing: there’s a maximum size (around 850 mm²) that a chip can be before it simply can’t be printed in one piece using current lithography equipment. Nvidia’s chips were already pushing this boundary with architectures like Ampere and Hopper. As demand for AI compute grew, especially for large language models and inference workloads, Nvidia needed more transistors, more memory bandwidth, and more performance than one chip could physically hold.
So, with Blackwell, Nvidia had to make a change. But instead of fully embracing a modular chiplet design like AMD, they took an intermediate step: they split the GPU into two giant dies, each nearly as large as physically possible. These two slabs are then connected using a high-bandwidth die-to-die interface (NV-HBI), allowing them to behave as if they were one unified chip.
This “two-slab” design allowed Nvidia to scale up performance without having to completely redesign its software and architecture. It was a practical choice: by staying close to their existing design philosophy, Nvidia could deliver more compute while maintaining compatibility with their mature CUDA ecosystem. But this approach comes with trade-offs. If either of the two giant dies has a defect, the whole unit must be discarded—an expensive failure. And while it provides more compute in the short term, it’s not easily scalable. Nvidia can’t just keep adding more massive slabs without hitting major power, cost, and complexity issues.
Meanwhile, AMD took a very different approach. Years ago, AMD committed to chiplet-based design. Instead of building one giant chip, AMD builds multiple small, modular chiplets and connects them together. This has many advantages: smaller dies are cheaper to produce, yield better, and can be mixed and matched depending on performance needs. A defective chiplet can be screened out before assembly and swapped for a good one, so a single flaw costs one small die rather than the whole unit. And because this design is inherently modular, AMD can scale performance simply by adding more chiplets.
In short, Nvidia’s move to a two-die design in Blackwell was necessary—they had hit a wall with monolithic chips. But it’s not yet a true chiplet architecture. It’s a stopgap: an attempt to buy more headroom without rethinking their entire GPU strategy. In contrast, AMD is already several years into building scalable, chiplet-based GPUs, and that gives them a structural advantage as the demand for compute continues to explode.
How Do You Make A Chip?
To make a modern computer chip, manufacturers use a process called photolithography. Imagine it like using a projector and a stencil to shine light onto a blank surface. In this case, the "surface" is a silicon wafer coated with a light-sensitive film, and the "stencil" is a precise mask that defines the microscopic patterns of transistors and wires that form the chip. Light is shone through the mask and onto the wafer, and the exposed pattern is then etched into the surface. Repeating this layer by layer builds the full chip.
But here’s the catch: the projector system (known as a lithography machine) has a hard size limit called the reticle limit. This is the maximum area the machine can print in a single pass — around 850mm². You might wonder why we can’t just make a bigger machine. The problem is, this limitation is tied to the physics of optics. Making the lenses and light systems larger introduces distortion, heat issues, and fundamental precision problems. We simply can’t print chips larger than the reticle limit with current technology.
You might think, “Why not print half the chip, then slide over and print the rest?” That’s called stitching, and while it’s possible, it’s extremely difficult. It requires perfect alignment down to the nanometre. Even the smallest misalignment can ruin the entire chip. So stitching large monolithic chips is rarely done because it's inefficient, error-prone, and costly.
Now, combine this physical limit with another trend: transistors are getting smaller with every generation. That allows for more performance, but it also makes the printing process even harder. The smaller the transistors, the more precise and complex the manufacturing needs to be. As a result, when you try to cram everything into one massive chip (monolithic design), the chances of something going wrong go way up. And if just one tiny part of the chip fails, you have to throw the whole thing away.
This is why the industry is moving toward chiplets. Instead of printing one giant chip, manufacturers print many smaller chips, each well below the reticle size. These smaller chiplets are easier to produce, cheaper to manufacture, and much more reliable. They’re then assembled together in a single package using ultra-fast connections, so they function as one big chip — but without the downsides of monolithic design.
In summary, chiplets are a smart workaround to two major problems: the physical limits of how big you can print a chip, and the increasing complexity of manufacturing smaller and smaller transistors. By splitting the chip into smaller, modular parts and assembling them later, companies like AMD have created a design that’s more scalable, efficient, and better suited for the future of computing.
Personalisation For AMD:
Because of AMD’s chiplet-based design, the chips aren’t manufactured as one large piece. Instead, they’re built from smaller, individual components that are printed separately and later assembled into a complete chip. This modular approach allows AMD to mix and match different compute engines with ease. Since they’ve spent years refining this process, the cost of combining various chiplets is relatively low. As a result, AMD can more easily adapt to changing demands in the compute market — offering greater flexibility, faster product development, and lower costs compared to traditional monolithic chip designs.
AMD’s MI300A and MI300X are built on the same modular, chiplet-based platform, but they are configured differently depending on the workload they’re targeting. The MI300X is designed purely as a GPU accelerator, featuring a full set of high-performance GPU tiles along with high-bandwidth memory. In contrast, the MI300A swaps out some of those GPU tiles and replaces them with CPU tiles based on AMD’s Zen 4 architecture. This essentially turns the MI300A into an APU — a single package that combines both CPU and GPU processing, ideal for workloads that benefit from tightly integrated compute.
Thanks to AMD’s chiplet architecture, this kind of flexibility is possible without having to redesign the entire chip. The company can simply “mix and match” different chiplet components—adding or removing GPU and CPU tiles as needed—based on customer demand or use-case requirements. Since AMD has spent years refining this chiplet assembly process, the cost of making these variations is relatively low. It’s a scalable and efficient way to build tailored chips for specific markets.
This flexibility gives AMD a significant advantage, especially as new markets like AI PCs and edge servers continue to grow. For example, the MI300A can deliver powerful compute performance for applications that need both CPU and GPU acceleration, without wasting space or cost on unnecessary GPU-only tiles. AMD’s CEO, Lisa Su, highlighted this strategic opportunity during recent earnings calls, emphasizing the role of AI-capable chips in next-generation PCs and data centers.
In summary, the MI300A and MI300X are two configurations of the same underlying platform. The difference lies in the balance of CPU and GPU chiplets — made possible by AMD’s chiplet design. This approach allows AMD to respond quickly to changing compute needs, lower production costs, and deliver specialized products across a wide range of markets.
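The mix-and-match idea can be sketched as a small data model. This is my own simplification, not AMD’s design tooling: I assume the package has four base positions, each holding either two GPU chiplets or three CPU chiplets, with 38 active compute units per GPU chiplet and 8 Zen 4 cores per CPU chiplet, which is how I read AMD’s public MI300 specifications. The class and constant names are hypothetical.

```python
from dataclasses import dataclass

# Assumptions drawn from my reading of AMD's public MI300 specs (illustrative).
TOTAL_SLOTS = 4              # base positions in the package
GPU_CHIPLETS_PER_SLOT = 2    # GPU compute chiplets per position
CPU_CHIPLETS_PER_SLOT = 3    # Zen 4 CPU chiplets per position
CUS_PER_GPU_CHIPLET = 38     # active compute units per GPU chiplet
CORES_PER_CPU_CHIPLET = 8    # CPU cores per CPU chiplet


@dataclass(frozen=True)
class PackageConfig:
    """One product variant: how many positions go to GPU vs CPU chiplets."""
    name: str
    gpu_slots: int
    cpu_slots: int

    def __post_init__(self) -> None:
        # Every position must be populated with one chiplet type or the other.
        if self.gpu_slots + self.cpu_slots != TOTAL_SLOTS:
            raise ValueError(f"{self.name}: slots must sum to {TOTAL_SLOTS}")

    @property
    def compute_units(self) -> int:
        return self.gpu_slots * GPU_CHIPLETS_PER_SLOT * CUS_PER_GPU_CHIPLET

    @property
    def cpu_cores(self) -> int:
        return self.cpu_slots * CPU_CHIPLETS_PER_SLOT * CORES_PER_CPU_CHIPLET


mi300x = PackageConfig("MI300X", gpu_slots=4, cpu_slots=0)
mi300a = PackageConfig("MI300A", gpu_slots=3, cpu_slots=1)

for cfg in (mi300x, mi300a):
    print(f"{cfg.name}: {cfg.compute_units} compute units, {cfg.cpu_cores} CPU cores")
```

The point of the sketch is that the two products differ by a single field: swapping one position from GPU to CPU chiplets turns the accelerator into an APU without touching anything else in the model.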
The Demand For Computation Is Unlimited:
I’ve previously argued that the demand for computation is, in principle, unlimited. This stems from a simple but profound idea: reality is patterned, and those patterns can be mathematically modeled—and therefore computed. As long as the world continues to operate on rules and structures—from physics to biology to economics—there will always be a need to compute those rules more deeply, more accurately, and in real time.
I’ve also outlined two core reasons why this demand compounds over time:
The move from chaos to order—whether in a factory or a battlefield—is increasingly a function of applied computation. The more we compute, the more efficiently we organize systems.
Computation is recursive. The more we use compute to power AI, the more AI demands compute in return. AMD itself exemplifies this loop—using compute to train AI models that, in turn, help design better chips. This creates a self-reinforcing flywheel where the need for compute only accelerates.
To compute reality, you need a three-layer stack:
Layer 1: Patterned Reality — The physical world, from factories to cities to biological systems, operates according to patterns.
Layer 2: Silicon — Chips and sensors capture these patterns as raw data.
Layer 3: Ontology — Software organizes that data, adds structure, and turns it into actionable insight.
This stack will be deployed across every sector: defense, logistics, healthcare, biotech, manufacturing, energy, and beyond. We’re not just digitizing industries—we’re rendering reality itself into programmable infrastructure.
Take a factory as an example.
The physical layer includes machines, workers, and workflows.
The silicon layer places edge chips throughout the facility—capturing data on temperature, movement, productivity, etc.
The ontology layer turns that data into a live digital twin—a real-time simulation that can predict issues, optimize operations, and adapt dynamically.
Now apply that same structure to cities (traffic, utilities, emergency response), hospitals (patients, equipment, staffing), military systems (logistics, drones, battlefield strategy), biotech (molecular modeling), and supply chains (inventory, freight, demand forecasting). This three-layer compute model isn’t industry-specific—it’s a universal substrate for civilization.
Because the world isn’t random—it’s structured. From atoms to institutions, everything runs on rules. Rules can be modeled, models can be simulated, and simulations can be optimized. Optimized systems scale. They grow. They win.
Enter AMD. They are building the silicon backbone of this new world. Their chiplets will live inside factories, fighter jets, hospitals, and labs—turning physical signals into digital intelligence. AMD isn’t just making chips. They’re building the sensors of civilization.
In short, the demand for compute is not just large.
It is literally unlimited.
Evolving Nature Of Compute & Structural Edge Of Chiplets:
Given the open-ended nature of the compute market, it's nearly impossible to predict exactly how it will evolve in the decades ahead. Major developments like crypto mining and large language models (LLMs) weren’t specifically anticipated by NVIDIA or anyone else — what was anticipated, however, was the broader trend: demand for compute will continue to grow exponentially.
In such an unpredictable and fast-moving environment, it's critical to bet on the company that is most agile, both in culture and in technical design. While I’ve already outlined the cultural reasons why AMD is more adaptable, the technical reasons deserve attention as well.
AMD’s chiplet-based architecture allows it to respond quickly and efficiently to new demands. Because chips are built from smaller, modular components that can be mixed and matched, AMD can adjust its designs for new markets — such as AI inference, edge computing, or future use cases we haven’t even imagined — without redesigning an entire chip. These changes can be made at marginal cost, offering a level of flexibility that monolithic designs simply can’t match.
In the long run, this results in a clear cost and speed advantage for AMD — allowing it to adapt faster and more efficiently as the compute market continues to evolve.
AMD & Meta:
AMD has quietly secured a powerful strategic position as the memory-dense alternative for hyperscale AI accelerators—a role now validated by both Meta (META) and OpenAI. This endorsement sets the stage for a multi-year, high-margin revenue opportunity that the broader market has yet to fully appreciate.
Meta has launched a full-blown talent war in pursuit of artificial general intelligence (AGI), offering bonuses reportedly nearing $100 million to recruit top researchers from OpenAI, DeepMind, and Anthropic into its new Meta Superintelligence Labs. This aggressive hiring push signals an ambitious roadmap: larger Llama models and more complex multimodal agents that require significantly more on-board memory than mainstream GPUs can economically deliver.
To support this scale, Meta has standardized on AMD’s Instinct MI300X accelerator for its 405-billion-parameter Llama 3.1 model, reportedly ordering around 170,000 units according to Omdia’s supply chain analysis. Each MI300X features 192 GB of HBM3 memory and 5.3 TB/s of bandwidth, enough for a single eight-GPU node to hold the full model’s 16-bit weights with room to spare. In contrast, Nvidia’s (NVDA) standard H100 offers just 80 GB. By anchoring the largest open-weight model to AMD silicon, Meta not only validates AMD’s CDNA 3 architecture but also kickstarts a critical feedback loop for ROCm at hyperscale.
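A quick back-of-the-envelope check shows why memory capacity matters here. The sketch below counts only the model weights, using the 405-billion-parameter size and the 192 GB and 80 GB capacities quoted above. The bytes-per-parameter values are standard for each numeric format, but the calculation deliberately ignores KV cache, activations, and framework overhead, so real deployments need more headroom than this suggests.

```python
import math

PARAMS_BILLION = 405   # Llama 3.1 405B, from the text
MI300X_HBM_GB = 192    # from the text
H100_HBM_GB = 80       # from the text


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight footprint in GB (1e9 params x bytes each / 1e9 bytes per GB)."""
    return params_billion * bytes_per_param


def min_devices(footprint_gb: float, hbm_gb: float) -> int:
    """Fewest devices whose combined HBM can hold the weights alone."""
    return math.ceil(footprint_gb / hbm_gb)


for precision, bytes_per_param in (("16-bit", 2), ("8-bit", 1)):
    footprint = weights_gb(PARAMS_BILLION, bytes_per_param)
    print(
        f"{precision}: {footprint:.0f} GB of weights -> "
        f"{min_devices(footprint, MI300X_HBM_GB)} x MI300X or "
        f"{min_devices(footprint, H100_HBM_GB)} x H100 (weights only)"
    )
```

At 16-bit precision the weights alone come to 810 GB, which fits inside one eight-GPU MI300X node but not inside one eight-GPU H100 node.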
Further strengthening AMD’s position, OpenAI announced in June 2025 that it would begin deploying the upcoming MI350 accelerators and co-develop future MI450 silicon. This adds a second flagship customer and expands ROCm’s software telemetry and volume leverage—giving AMD a serious foothold in the next wave of AI infrastructure.
Meta’s commitment represents billions in potential annual revenue for AMD, driven by the nearly linear scaling of AI workloads with context length and model size. Mark Zuckerberg has publicly declared his ambition to build a "personal super-intelligence," which will require future Llama models to be exponentially larger than today’s versions—dramatically increasing demand for memory-dense GPUs, where AMD now has a clear advantage.
Meanwhile, other large buyers and cloud providers, including Microsoft, Oracle, Samsung, and DigitalOcean, have begun deploying MI300X instances as well. This growing adoption base broadens AMD’s footprint across the AI infrastructure landscape and diversifies its revenue streams. These flagship wins help create a virtuous cycle: major customers attract more software optimization, reducing friction for new adopters and chipping away at Nvidia’s long-standing ecosystem lock-in.
Chiplet Advantage & ROCm:
This shift comes at a critical time. AI accelerator supply remains structurally tight—HBM memory is constrained, and Nvidia’s backlog stretches well into 2026. Hyperscalers are actively seeking a credible second source. AMD’s chiplet architecture provides a major advantage here: it enables the integration of more memory channels at a lower marginal silicon cost than traditional monolithic designs. This cost difference—amounting to tens of thousands of dollars per GPU at 192 GB configurations—adds up quickly across large-scale deployments and becomes even more valuable when power and data center space are constrained.
Software, once seen as AMD’s Achilles heel, is rapidly improving. ROCm 6.2 now supports vLLM and BitsAndBytes natively, adds FP8 kernels, and introduces advanced profiling tools—narrowing the performance gap with Nvidia’s CUDA for both training and inference. Crucially, Meta revealed that it is serving Llama 3.1 production traffic entirely on MI300X clusters, showing that AMD’s stack is already capable of handling the most demanding real-time inference workloads. As more open-source repositories accept HIP pull requests, the barriers to switching will continue to erode quarter by quarter.
Price transparency is beginning to emerge in the AI accelerator market. While Nvidia’s H100 still sells for around $30,000 per unit and Intel’s eight-way Gaudi 3 board is listed at approximately $125,000, AMD’s MI300X is being quoted in the low- to mid-$20,000 range. This pricing gives hyperscalers the ability to cut their dollar-per-gigabyte costs by more than 50% at the rack level—a decisive advantage, especially for inference workloads where operating expenses matter more than raw training time. These architectural and cost advantages reinforce the core of my thesis: AMD’s edge lies in memory density and efficiency, not just in peak FLOPS.
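The dollar-per-gigabyte claim is easy to sanity-check at the device level. The sketch below uses the $30,000 H100 price and the 80 GB and 192 GB capacities cited in this piece; the $20,000 to $25,000 bounds are my own reading of “low- to mid-$20,000”. It compares list price per GB of on-package memory only and says nothing about rack-level costs such as networking, power, or software effort.

```python
H100_PRICE_USD = 30_000                     # from the text
H100_HBM_GB = 80                            # from the text
MI300X_HBM_GB = 192                         # from the text
MI300X_PRICE_RANGE_USD = (20_000, 25_000)   # assumed bounds for the quoted range


def dollars_per_gb(price_usd: float, hbm_gb: float) -> float:
    """List price divided by on-package memory capacity."""
    return price_usd / hbm_gb


def saving_vs_h100(mi300x_price_usd: float) -> float:
    """Fractional reduction in $/GB relative to the H100."""
    h100 = dollars_per_gb(H100_PRICE_USD, H100_HBM_GB)
    mi300x = dollars_per_gb(mi300x_price_usd, MI300X_HBM_GB)
    return 1.0 - mi300x / h100


h100_cost = dollars_per_gb(H100_PRICE_USD, H100_HBM_GB)
for price in MI300X_PRICE_RANGE_USD:
    mi300x_cost = dollars_per_gb(price, MI300X_HBM_GB)
    print(
        f"MI300X at ${price:,}: ${mi300x_cost:,.0f}/GB vs "
        f"${h100_cost:,.0f}/GB for H100 ({saving_vs_h100(price):.0%} lower)"
    )
```

Across the assumed price range the reduction stays well above 50%, so the claim holds on a per-device basis even at the upper end of the quoted pricing.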
At its June "Advancing AI" event, AMD reported that seven of the world’s top ten AI companies are now deploying MI300-based systems. The company reiterated its roadmap to grow data center AI revenue from about $5 billion in 2024 to “tens of billions” by the end of the decade. Key partners are already rolling out infrastructure: Oracle Cloud offers GPU.MI300X.8 instances, DigitalOcean provides bare-metal access to MI300X for startups, and Dell is shipping PowerEdge XE9680 nodes optimized for Llama 4 workloads. Each deployment expands the ROCm telemetry base, feeding performance data back into AMD’s kernel autotuning loop.
Data Centre Edge & Modularity:
The AI data center accelerator market is projected to exceed $500 billion by 2028—a tenfold increase from 2023. Even if Nvidia retains 80% of that market, the remaining 20% still represents an enormous opportunity for alternative suppliers. According to Omdia, AMD’s MI300X shipments across Meta, Microsoft, Oracle, and TensorWave already surpassed 327,000 units in 2024, with Meta accounting for nearly half. Given Meta’s public roadmap, those numbers are expected to rise annually in line with the expanding size of Llama models.
AMD’s modular chiplet architecture also opens up new verticals. The MI300A, which combines Zen 4 CPU tiles with CDNA 3 GPU tiles, has been selected for the U.S. Department of Energy’s exascale-class El Capitan supercomputer. This strengthens AMD’s standing in high-performance computing and allows software optimizations to flow back into its commercial offerings. Meanwhile, AMD’s acquisition of ZT Systems enables it to provide full-rack AI infrastructure, bundling CPUs, GPUs, NICs, and UALink switches into unified, turnkey solutions.
If AMD captures even a 15% share of the projected $500 billion accelerator market by 2028, its data center revenue could surpass $50 billion—7x its 2024 baseline—with incremental margins above 60%. This would drive consolidated earnings far beyond current expectations and justify significant multiple expansion. On the May earnings call, AMD management reaffirmed this long-term vision, stating that data center AI revenue is on track to reach “tens of billions” annually by the second half of the decade. With Q1 revenue starting at $3.7 billion and double-digit year-over-year growth, that trajectory remains within reach—even after accounting for the $1.5 billion export license headwind affecting MI308 shipments.
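The share arithmetic can be laid out explicitly. The sketch below takes the $500 billion 2028 market projection as given and shows what different share levels would imply in revenue. For the growth multiple I use the roughly $5 billion 2024 data center AI figure mentioned earlier as the baseline; that is narrower than AMD’s total data center revenue, so the multiple would be smaller against the broader figure.

```python
TAM_2028_BN = 500.0        # projected accelerator market in $bn, from the text
AI_REVENUE_2024_BN = 5.0   # approximate 2024 data center AI revenue in $bn, from the text


def revenue_bn(share_pct: float) -> float:
    """Implied 2028 accelerator revenue in $bn at a given market share."""
    return TAM_2028_BN * share_pct / 100


def share_needed_pct(target_bn: float) -> float:
    """Market share (in percent) required to reach a revenue target."""
    return 100 * target_bn / TAM_2028_BN


for share_pct in (5, 10, 15, 20):
    rev = revenue_bn(share_pct)
    print(
        f"{share_pct:>2}% share -> ${rev:.0f}bn "
        f"({rev / AI_REVENUE_2024_BN:.0f}x the 2024 AI baseline)"
    )

print(f"Share needed for $50bn: {share_needed_pct(50):.0f}%")
```

On these inputs a 15% share implies about $75 billion, and the $50 billion mark is crossed at a 10% share, so the headline figure leaves some margin for the market or the share coming in lower than hoped.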
FPGAs:
The key advantage of AMD’s Xilinx FPGAs, such as the Versal platform, lies in their reconfigurable architecture. Unlike CPUs or GPUs, whose hardware logic is fixed at fabrication, FPGAs can be reprogrammed at the hardware level, in some cases even while in operation. This means the same physical chip can be adapted to perform entirely different tasks, such as switching from image processing to AI inference or encryption, without changing the hardware. This gives engineers enormous flexibility, especially in edge or embedded environments where size, latency, and power constraints are critical.
In contrast, NVIDIA GPUs are built for high-throughput, general-purpose compute workloads. They are optimized for tasks like large-scale AI training and inference, where parallelism and floating-point operations are key. However, their architecture is fixed at the time of fabrication — you can’t change the underlying hardware logic. You can change the software running on them, but you can’t reconfigure the chip itself to specialize it for a new task on the fly. This makes them extremely powerful for general-purpose AI workloads but less flexible for highly specialized or dynamic environments.
The practical benefits of Versal FPGAs are particularly clear in edge computing scenarios. For instance, a Versal chip on a drone or military device can be updated in real time to run different workloads — say, shifting from sensor fusion to threat detection — without needing new hardware. This is something NVIDIA simply cannot offer with its fixed silicon. Additionally, FPGAs tend to deliver lower latency and better power efficiency for narrow workloads compared to GPUs.
A real-world example of this flexibility in action is Microsoft’s Project Catapult, where thousands of FPGAs (Altera parts in that deployment, not Xilinx) were installed across Microsoft’s data centers to accelerate Bing search. These chips were reprogrammed to handle different parts of the search pipeline, reducing latency and improving efficiency in ways that general-purpose CPUs or GPUs couldn’t match. The example shows what the FPGA model can do at scale, which is the capability AMD acquired with Xilinx.
To be clear, FPGAs are not replacements for GPUs. They are difficult to program, lack widespread software tooling, and aren’t suitable for training massive AI models like GPT-4 or Llama 3. That’s where NVIDIA still dominates — thanks to its CUDA ecosystem and highly optimized AI stack.
In summary, AMD’s Versal FPGAs offer a level of runtime flexibility and hardware reconfigurability that NVIDIA cannot match. If your workload requires real-time adaptability, low latency, or deployment at the edge, FPGAs are a superior choice. But for large-scale AI training and inference in the cloud, NVIDIA’s GPUs remain the leader.
Both companies are making different bets: AMD is focused on flexibility, modularity, and edge intelligence, while NVIDIA is doubling down on performance, scale, and software dominance. Understanding this trade-off is critical for investors and technologists alike.
The same pattern, with AMD adapting products to market shifts by combining compute blocks at marginal cost through its chiplet structure and FPGAs, shows up in the cloud as well.
Amazon Web Services (AWS) has introduced a new class of cloud servers that combine AMD’s EPYC CPUs with Xilinx Virtex FPGAs. These servers are designed for specialized, high-performance workloads such as:
Genomics (e.g., real-time DNA sequencing)
Multimedia processing (like live video encoding)
Network security (deep packet inspection, encryption)
Cloud-based broadcasting (low-latency media delivery)
The key here is that these workloads benefit from reconfigurable hardware — and that’s exactly what Xilinx FPGAs provide. Unlike GPUs or CPUs, FPGAs can be reprogrammed at the hardware level to match specific task requirements, giving greater flexibility, efficiency, and performance for niche but compute-intensive jobs.
This launch is important because it shows that AWS is now actively using the combination of AMD + Xilinx in production — taking advantage of AMD’s general-purpose compute (EPYC) alongside the programmable logic of Xilinx FPGAs. It’s a strong signal that AMD’s Xilinx acquisition is paying off, and it allows AMD to offer something NVIDIA cannot: customizable silicon for highly specific workloads in the cloud.
In sum, FPGAs are unique chips that can reconfigure themselves in real time, enabling on-the-fly updates to support new AI tasks without changing the hardware. Only FPGAs offer this level of flexibility.
AMD reports most Xilinx products in its Embedded segment, where revenue fell in 2024 as customers worked down inventory, so the growth case here is forward-looking: FPGAs are increasingly used alongside GPUs in AI workloads. That pairing lets companies unlock multiple AI capabilities at low incremental cost, making FPGAs a scalable solution in data centers and edge environments.
Recap: Architecture, Meta Validation, Chiplet Advantage, Market Opportunity
The evolution of NVIDIA’s Blackwell architecture — while powerful — ultimately underscores the structural superiority of AMD’s chiplet-based design. While NVIDIA has taken its first steps toward modularity by splitting its GPU into two large dies, these are still monolithic in philosophy, constrained by the reticle limit and burdened by increasing manufacturing complexity. In contrast, AMD’s approach is to scale compute horizontally — with small, reusable chiplets — enabling higher yields, lower costs, and unprecedented flexibility.
This is not just theoretical. It’s already playing out in the real world. Meta selected AMD’s MI300X accelerator to serve its flagship 405-billion-parameter Llama 3.1 model, reportedly ordering around 170,000 units in 2024, according to Omdia’s supply chain analysis. Each MI300X offers 192GB of HBM3 memory and 5.3 TB/s of bandwidth, enough to hold large model shards on a single device and the full 16-bit model within one eight-GPU server. This gives Meta an enormous inference cost advantage and confirms what the market has long underestimated: AMD’s chiplet architecture enables more memory channels, denser configurations, and better economics for AI inference at hyperscale.
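One way to sanity-check the memory argument is to count how many accelerators are needed just to hold the model weights. The sketch below makes several simplifying assumptions that are not from Meta or AMD: 16-bit weights (2 bytes per parameter), decimal gigabytes, weights only (no KV cache or activations), and an 80GB H100 as the comparison point.

```python
import math

BYTES_PER_PARAM_FP16 = 2  # assumption: 16-bit weights


def weight_footprint_gb(params_billion, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Weight memory in GB: billions of parameters times bytes per parameter."""
    return params_billion * bytes_per_param


def min_devices(footprint_gb, device_memory_gb):
    """Fewest devices whose combined memory can hold the weights."""
    return math.ceil(footprint_gb / device_memory_gb)


llama_405b_gb = weight_footprint_gb(405)
mi300x_devices = min_devices(llama_405b_gb, 192)  # 192GB per MI300X, from the text
h100_devices = min_devices(llama_405b_gb, 80)     # assumption: 80GB per H100

print(f"Weights: {llama_405b_gb} GB")
print(f"MI300X needed: {mi300x_devices}, 80GB H100 needed: {h100_devices}")
```

Under these assumptions the weights alone need at least 5 MI300X devices versus 11 of the 80GB parts, so an eight-accelerator server can hold the 16-bit model in the first case but not in the second. Real deployments need extra headroom for the KV cache, and quantized weights would change the picture.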
Why is this Meta deal so vital? Because it’s the strongest public validation yet of AMD’s ability to compete — and win — at the highest levels of AI infrastructure. Meta isn’t just testing AMD chips; they’re running production Llama 3.1 inference workloads on MI300X clusters. That’s a direct, high-stakes replacement of what was once an NVIDIA monopoly. And it’s not just Meta. OpenAI has also committed to co-developing AMD’s MI350 and MI450 accelerators, adding a second flagship partner and further expanding AMD’s ROCm ecosystem.
This positions AMD not only as a technically competitive second source, but increasingly as a strategic partner of choice — especially as NVIDIA faces persistent GPU shortages, supply constraints, and rising costs.
When you combine this flagship adoption with AMD’s ability to scale flexibly — using both chiplets and Xilinx FPGAs — a broader picture emerges: AMD can address multiple AI opportunities at marginal cost, from hyperscale training clusters to real-time edge deployments. This platform-level adaptability is a structural advantage that monolithic vendors like NVIDIA cannot replicate without overhauling their entire architectural playbook.
Even if AMD doesn’t surpass NVIDIA’s market share, it doesn’t have to. The AI accelerator market is forecast to exceed $500 billion by 2028, and capturing even 15–20% of that market would drive tens of billions in annual revenue, a 7x increase from AMD’s 2024 data center baseline.
Conclusion:
In sum, AMD is uniquely positioned to scale across the rapidly evolving compute landscape, thanks to its chiplet-based architecture. Unlike monolithic designs, AMD’s modular approach allows it to “mix and match” compute components with minimal cost. For instance, the MI300X is a GPU-only accelerator, while the MI300A combines CPU and GPU tiles in a single package — all built on the same chiplet foundation. This flexibility enables AMD to tailor solutions across AI, HPC, and edge workloads without redesigning the entire chip.
This modularity has already unlocked new verticals. The MI300A powers the U.S. Department of Energy’s El Capitan supercomputer, while hyperscalers are increasingly deploying MI300X clusters for AI inference. In July, AMD confirmed that 7 of the top 10 global AI companies are now using MI300 systems. The company expects to grow its data center AI revenue from $5 billion in 2024 to “tens of billions” by decade’s end — targeting a share of a market projected to exceed $500 billion by 2028.
Meanwhile, NVIDIA’s new Blackwell architecture signals its tentative move toward chiplets — splitting its GPU into two large dies. But these dies remain near the reticle limit, which introduces yield and scaling challenges. In contrast, AMD’s chiplets are smaller, yield-efficient, and easily scalable — a design advantage refined over the past decade. Blackwell is a step forward for NVIDIA, but it lacks the modular efficiency of AMD’s CDNA3-based MI300 platform.
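The yield argument can be illustrated with the simple Poisson die-yield model, Y = exp(-D × A), where D is the defect density and A is the die area. The numbers below are illustrative assumptions, not foundry, AMD, or NVIDIA data: a defect density of 0.1 defects per cm², an 800 mm² die standing in for a near-reticle design, and a 100 mm² die standing in for a small chiplet.

```python
import math


def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under the Poisson yield model."""
    area_cm2 = area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-defects_per_cm2 * area_cm2)


DEFECT_DENSITY = 0.1   # assumed defects per cm^2 (illustrative)
LARGE_DIE_MM2 = 800.0  # assumed near-reticle die (illustrative)
CHIPLET_MM2 = 100.0    # assumed small chiplet (illustrative)

large_yield = poisson_yield(LARGE_DIE_MM2, DEFECT_DENSITY)
chiplet_yield = poisson_yield(CHIPLET_MM2, DEFECT_DENSITY)

print(f"Large die yield: {large_yield:.1%}")
print(f"Chiplet yield:   {chiplet_yield:.1%}")
```

With these inputs, roughly 45% of the large dies come out defect-free versus about 90% of the chiplets. The model ignores packaging yield and the cost of die-to-die interconnect, both of which offset part of the chiplet advantage, so it should be read as directional only.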
The most compelling validation of AMD’s architecture comes from Meta, which selected the MI300X to run its 405B-parameter Llama 3.1 model. With 192GB of memory and 5.3TB/s bandwidth, AMD’s accelerator enables Meta to run full model shards on a single device — something NVIDIA’s H100 cannot match. According to Omdia, Meta has ordered roughly 170,000 units. OpenAI has also committed to deploying AMD’s MI350 and co-developing the MI450, marking another major strategic win.
Crucially, AMD’s software gap has narrowed. ROCm 6.2 now supports widely used tools like vLLM and BitsAndBytes, with performance closing in on NVIDIA’s CUDA stack. Meta is already running Llama 3.1 production traffic entirely on MI300X, proving AMD’s inference capabilities are not only viable but leading. At a price point well below NVIDIA and Intel competitors, MI300X gives hyperscalers the ability to cut the cost per gigabyte of accelerator memory by over 50% at the rack level, a decisive advantage in cost-sensitive inference workloads.
Even modest market share gains for AMD would be transformational. If AMD captures just 15–20% of the $500B AI accelerator market, it could drive data center revenue 7x higher than today, with incremental margins above 60%. With strong execution, strategic hyperscaler partnerships, and growing software maturity, AMD doesn’t need to beat NVIDIA outright — it just needs to keep gaining ground. The upside is staggering. The thesis is intact.
This is one of the most asymmetric investment opportunities in the market today.



$AMD isn’t just chasing $NVDA, it’s playing a different game entirely.
Everyone’s focused on training benchmarks. But inference is where the real money is—and AMD’s MI355X is already beating Blackwell on both performance and cost.
The secret? Memory. AMD’s 288GB of HBM3E gives it a massive edge for large language models that need high throughput and low latency. As models get bigger, bandwidth per dollar matters more than raw compute.
And here’s what most people miss: if AMD grabs just 10–15% of the AI inference market by 2026, that could unlock billions in free cash flow that aren’t priced in yet.
Even analysts are behind. They’re still using outdated training-first assumptions. But inference, ROCm traction, and sovereign AI deals—that’s where the upside is building.
This isn’t hype. It’s a strategic shift Wall Street hasn’t caught onto yet.