Sunday, June 21, 2026 · Live

NVIDIA AI News

Every AI news story AI Briefing has published about NVIDIA — 101 articles spanning Apr 4, 2026 – Jun 21, 2026. Track NVIDIA's model releases, research papers, product launches, funding rounds, and partnerships across the AI industry, updated daily.

NVIDIA and SK Telecom to build gigawatt-scale Korean AI Cloud on DSX

NVIDIA and SK Telecom announced plans for a gigawatt-scale AI Cloud in Korea using the NVIDIA DSX platform, with the first AI factory launching in 2027. At GTC Taipei, Jensen Huang also unveiled the Vera Rubin computing platform and Vera CPU.

2026-06-21

IREN signs $3.4B NVIDIA cloud deal as AI revenue jumps 839%

Nvidia accounts for nearly 90% of AI accelerator sales, and former crypto miner IREN Limited illustrated the demand: its AI revenue rose 839% year-over-year to $33.6M in Q3 2026 (23% of total) on the back of a five-year, $3.4B cloud-services contract with NVIDIA and 5GW of secured power.

2026-06-21
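
The IREN figures above imply two numbers the summary does not state: the prior-year AI revenue and the total quarterly revenue. A minimal sketch of that arithmetic follows. It assumes that "rose 839%" means the current figure is the prior figure times (1 + 8.39), and that the 23% share refers to the same $33.6M quarter. The variable names are illustrative only.

```python
# Back-of-envelope check on the IREN figures quoted in the summary above.
# Assumption: an 839% rise means current = prior * (1 + 8.39).
ai_revenue_musd = 33.6     # Q3 2026 AI revenue in $M (from the summary)
yoy_growth = 8.39          # 839% year-over-year increase
ai_share_of_total = 0.23   # AI revenue as a share of total revenue

# Implied AI revenue in the year-ago quarter.
prior_year_ai_musd = ai_revenue_musd / (1 + yoy_growth)

# Implied total revenue for the quarter.
implied_total_musd = ai_revenue_musd / ai_share_of_total

print(f"Implied prior-year AI revenue: ${prior_year_ai_musd:.1f}M")
print(f"Implied total quarterly revenue: ${implied_total_musd:.0f}M")
```

Both results inherit the rounding in the published percentages, so they are estimates rather than reported figures.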

NVIDIA and SK Telecom to build gigawatt-scale Korean AI Cloud on DSX platform

2026-06-20

NVIDIA locks in $25B of high-grade bonds to fund AI sprint

Nvidia raised $25 billion in high-grade bonds for general corporate purposes amid its AI infrastructure buildout, after S&P upgraded it to AA. The company holds nearly 90% of AI accelerator sales as Amazon prepares custom-chip competition.

2026-06-20

NVIDIA XR AI framework enters public beta for AR glasses agents

NVIDIA announced that NVIDIA XR AI is now in public beta, giving developers a framework to build multimodal AI agents for AR glasses and XR devices. The company also scheduled its 2026 Annual Meeting of Stockholders for June 24, to be held virtually.

2026-06-18

NVIDIA XR AI enters public beta, bringing AI agents to AR glasses

NVIDIA released XR AI in public beta, a framework for building multimodal AI agents for AR glasses and XR devices. It addresses the infrastructure gap developers face when creating AI experiences for wearable hardware that is ready but lacks integration tooling.

2026-06-17

NVIDIA Blackwell GB300 posts 20x agentic-AI leap over Hopper as Rubin nears

NVIDIA's Blackwell Ultra GB300 set a performance record on AA-AgentPerf, a new agentic-AI workflow benchmark, running 20x faster than Hopper as the Rubin generation approaches. NVIDIA and SK hynix unveiled a multiyear next-gen AI memory partnership, while the price of the 96GB RTX PRO 6000 Blackwell climbed more than 50% to $13,250 amid memory shortages.

2026-06-16

NVIDIA's Blackwell Ultra GB300 Posts 20x Agentic Leap on New AA-AgentPerf Benchmark

NVIDIA reported that its Blackwell Ultra GB300 set records on the new AA-AgentPerf benchmark, running agentic AI workloads roughly 20× faster than Hopper as the next-gen Rubin architecture nears launch. NVIDIA also reported leading performance on the industry's first agentic AI coding benchmark.

2026-06-15

NVIDIA Releases DiffusionGemma 26B on Hugging Face with 1,100 Tokens/Sec on H100

NVIDIA released DiffusionGemma 26B A4B IT NVFP4 on Hugging Face — a quantized multimodal generative model developed by Google DeepMind on the Gemma 4 26B A4B Mixture-of-Experts architecture — offering text generation exceeding 1,100 tokens per second on Hopper H100 GPUs with a 256K token context window.

2026-06-15

NVIDIA and Abridge Build Healthcare-Specific Foundation Model on Blackwell

NVIDIA and Abridge are collaborating on a healthcare-specific foundation model built on Blackwell infrastructure and the Nemotron open model family, powering Abridge's clinician intelligence platform with an enterprise-wide Northwestern Medicine rollout.

2026-06-15

Nvidia builds healthcare foundation model with Abridge, Northwestern Medicine rollout

Nvidia is partnering with Abridge, maker of an ambient clinical note-taking app, to train a healthcare-specific foundation model on Nvidia's open Nemotron family and Blackwell infrastructure. The model embeds native clinical reasoning into foundation weights and launches alongside an enterprise-wide Northwestern Medicine deployment.

2026-06-14

NVIDIA's Blackwell Ultra NVL72 leads first agentic AI benchmark, AgentPerf

Artificial Analysis launched AgentPerf, the first agentic AI infrastructure benchmark, and NVIDIA's Blackwell Ultra NVL72 platform delivered leading results, running 20x more agents per megawatt and topping agentic coding performance — a key efficiency milestone for agentic workloads.

2026-06-14

Nvidia acquires Kumo AI and partners with Abridge on a healthcare model

Nvidia acquired Kumo AI to bring predictive AI to business data, extending an acquisition pattern that includes Run:ai (~$700M), Illumex, and a Groq agreement. It is also developing a healthcare AI model with ambient-listening startup Abridge and launched an AI factory manager blueprint for autonomous manufacturing.

2026-06-12

NVIDIA releases Cosmos 3 open omni-model for physical AI at GTC 2026

NVIDIA released Cosmos 3, billed as the first open omni-model for physical AI reasoning and action across video, robotics and industrial applications. Jensen Huang showcased it at GTC 2026 alongside Adobe, Cohere, Google DeepMind, Meta, Microsoft, OpenAI and Tesla, and it is hosted on Hugging Face.

2026-06-11

NVIDIA signs SK hynix memory and NAVER sovereign-AI deals as D-Matrix challenges its lead

NVIDIA announced a multiyear partnership with SK hynix to co-develop next-generation memory for AI factories, plus an expanded deal with NAVER to build sovereign AI infrastructure on the NVIDIA DSX platform. Meanwhile Microsoft-backed startup D-Matrix is ramping chip production to challenge NVIDIA in inference, though Jensen Huang maintains his company leads on low-cost inference.

2026-06-10

Robotics funding surges: Standard Bots hits $1B valuation, Nebius and NVIDIA launch Physical AI lab

Standard Bots reached a $1 billion valuation after a $200 million Series C, signaling strong investor interest in physical AI and robotics. Nebius and NVIDIA launched a Physical AI Living Lab for European robotics startups, with the first cohort starting September 2026 and applications running through the NVIDIA Inception pipeline.

2026-06-10

NVIDIA Confidential Computing powers Apple's Private Cloud Compute, expands to Google Cloud

NVIDIA GPUs with Confidential Computing are now used for confidential inference in Apple's Private Cloud Compute as it expands beyond Apple's own data centers to Google Cloud. Unveiled at WWDC, the GPUs support server-side inference for Apple Foundation Models built with Apple and Google.

2026-06-10

Nvidia banks on AI PCs and expands sovereign AI infrastructure with NAVER

Nvidia is betting on still-unproven demand for AI PCs, with Nvidia-powered Windows laptops potentially rivaling Macs on memory bandwidth — a key AI bottleneck. Separately, Korea's NAVER is expanding its sovereign AI infrastructure built on the Nvidia DSX platform to serve Korean industries and global customers with production-scale AI factories.

2026-06-09

NVIDIA brings RTX Spark Superchip with 128GB unified memory to PCs

NVIDIA unveiled the RTX Spark Superchip, combining Blackwell GPUs (up to 6,144 CUDA cores) with up to 128GB unified memory to run agentic AI locally on Windows laptops and desktops without cloud dependency. Jensen Huang detailed the chip at GTC Taipei/COMPUTEX, and NVIDIA rolled out the platform in South Korea with KRAFTON, NC and esports champions T1 across PC Bangs.

2026-06-08

NVIDIA unveils RTX Spark, putting 128GB unified memory and 120B local models on Windows PCs

At Computex/GTC Taipei 2026, NVIDIA unveiled RTX Spark, a Blackwell-based superchip with up to 6,144 CUDA cores and 128GB unified memory that can run a 120-billion-parameter model locally with roughly one petaflop of compute. It brings agentic AI to mainstream Windows laptops and desktops without cloud dependency, directly challenging Apple and AMD.

2026-06-07

NVIDIA releases Nemotron-3-Ultra 550B LatentMoE model with 1M context

NVIDIA released Nemotron-3-Ultra-550B-A55B-Base-BF16, a hybrid Latent Mixture-of-Experts model with 55B active and 550B total parameters, Multi-Token Prediction layers, pre-training on 20T tokens and support for up to 1M context length. The model debuted on Perplexity for Pro and Max users, and NVIDIA added new members to the Nemotron coalition.

2026-06-07

NVIDIA releases Nemotron 3 Ultra, an open 550B MoE hybrid Mamba-Transformer for long-running agents

NVIDIA launched Nemotron 3 Ultra, a 550-billion-parameter open Mixture-of-Experts model with a hybrid Mamba-Attention design built for long-running AI agents. NVIDIA reports up to 6x higher inference throughput than comparable open LLMs at similar accuracy, a 1M-token context window, and the highest non-hallucination score in its comparison set (78.7 on AA-Omniscience).

2026-06-06

NVIDIA launches Cosmos 3, first fully open omni-model for physical AI

NVIDIA launched Cosmos 3 at GTC Taipei, billed as the world's first fully open omni-model for physical AI, built on a mixture-of-transformers architecture combining vision reasoning, world generation and action prediction. Released in Nano (16B) and Super (64B) sizes on Hugging Face, it cuts physical-AI training cycles from months to days. NVIDIA also formed the Cosmos Coalition with robotics labs including Skild AI, Runway and Black Forest Labs.

2026-06-04

NVIDIA unveils consumer PC AI chip and DLSS 4.5 Ray Reconstruction at GTC Taipei

Alongside Cosmos 3, NVIDIA announced a new AI chip aimed at personal computers — marking a deeper push into the consumer device market — and released DLSS 4.5 Ray Reconstruction with a second-generation transformer, now supported across 1,000+ RTX games and apps. The RTX Spark was tied into Microsoft's 'unmetered intelligence' Windows vision.

2026-06-04

NVIDIA unveils RTX Spark superchip and Vera CPU at Computex

Jensen Huang announced the RTX Spark superchip — pairing a Blackwell RTX GPU (6,144 CUDA cores, one petaflop) with a custom 20-core Grace CPU built via MediaTek over NVLink and 128GB unified memory — plus the Vera CPU for data-center agentic AI, expected Q3 2026. Satya Nadella tied RTX Spark to delivering 'unmetered intelligence' on Windows.

2026-06-03

NVIDIA unveils Nemotron 3 Ultra, a 550B-parameter open model for AI agents

NVIDIA launched Nemotron 3 Ultra, a 550-billion-parameter mixture-of-experts open model built for long-running autonomous agents across coding, research and enterprise workflows, claiming up to 5x faster inference and 30% lower running costs, with weights due on Hugging Face, ModelScope and OpenRouter around June 4.

2026-06-03

NVIDIA brings agentic AI to the edge with JetPack 7.2 on Jetson

At Computex, NVIDIA announced JetPack 7.2 with agentic AI skills, NemoClaw support, CUDA 13 on Jetson Orin, Yocto project support, a performance boost on Jetson AGX Orin 32GB, and Multi-Instance GPU (MIG) support on Jetson Thor — pushing autonomous agents into physical-world edge deployments with memory-efficient inference.

2026-06-02

NVIDIA teases Nemotron 3 Ultra, a 550B-parameter MoE model for AI agents

NVIDIA announced that Nemotron 3 Ultra, a 550-billion-parameter mixture-of-experts model built for AI agents, is 'coming this week' and expected to be available June 4. The model extends NVIDIA's open-model strategy beyond physical AI into large-scale agentic reasoning.

2026-06-02

NVIDIA reports record $81.6B Q1 FY2027 revenue ahead of COMPUTEX

NVIDIA reported record Q1 FY2027 revenue of $81.6B, up 85% year-over-year, with data-center revenue reaching $75.2B. Shares closed at $211.14 on May 29 as investors positioned ahead of Jensen Huang's COMPUTEX 2026 Taipei keynote amid strong AI inference demand.

2026-06-01

AI chip startup Groq reportedly raising $650M after Nvidia's $20B deal

Following Nvidia's reported $20B not-acqui-hire, AI inference-chip startup Groq is reportedly raising $650M from existing backers to build out its inference cloud business, led by interim CEO/CFO Adam Winter.

2026-06-01

NVIDIA signals AI inference demand inflection, commits $100-150B in Taiwan

NVIDIA reported a significant inflection in AI inference demand driven by agentic AI at scale, with CEO Jensen Huang committing $100-150 billion in annual Taiwan spending for manufacturing and supply chain. The company confirmed Vera Rubin CPU delivery to leading AI firms with partner availability in H2 2026.

2026-05-31

NVIDIA releases quantized DeepSeek-V4-Pro-NVFP4 on Hugging Face

NVIDIA published DeepSeek-V4-Pro-NVFP4 on Hugging Face, an NVFP4-quantized version of the DeepSeek-V4-Pro Mixture-of-Experts model with 1.6 trillion total parameters (49B activated). Optimized via Model Optimizer for commercial and non-commercial use, it targets advanced reasoning, agentic AI, tool use and complex math/software tasks.

2026-05-31

Nvidia posts $81.6B revenue with 92% data-center growth, signals $200B push into server CPUs

Nvidia beat estimates with $1.87 EPS on $81.6B revenue (vs $78.9B expected), data center revenue up 92% YoY as CEO Jensen Huang called demand 'parabolic'. Management telegraphed an aggressive expansion into server CPUs that could squeeze AMD and Intel by roughly $200B, and the company raised its quarterly dividend from 1¢ to 25¢. New GeForce 610.47 WHQL drivers also drop legacy Control Panel support.

2026-05-27

NVIDIA posts $81.6B revenue up 85% YoY as Huang publicly warns Super Micro on chip smuggling

NVIDIA reported $81.6B in quarterly revenue, up 85% year-over-year, with Jensen Huang calling it 'the largest infrastructure expansion' in history. Huang publicly pressured Super Micro to tighten risk checks after Taiwan detained three people in an AI server smuggling case. Lenovo also confirmed it is building N1x laptops, and NVIDIA joined Google's SynthID watermarking consortium.

2026-05-26

NVIDIA posts record $81.6B Q1, teases Vera CPUs and $7.8M Vera Rubin racks

NVIDIA reported $81.6B in Q1 FY2027 revenue, dominated by AI data-center sales, and quietly removed the Gaming revenue category from its financial reports. Analysts expect Computex Taipei demos of Vera CPUs outrunning Intel/AMD x86 by 1.5x for agentic inference, with Morgan Stanley pegging a single Vera Rubin rack at $7.8M ($2M+ of it in memory).

2026-05-25

OpenBMB releases VoxCPM2 as open-source ElevenLabs alternative with 30-language support

OpenBMB released VoxCPM2, a 2B-parameter speech model supporting voice cloning, voice design, and high-quality synthesis in 30 languages without explicit language tags. The release is positioned as a free, open-source alternative to ElevenLabs and runs efficiently on NVIDIA consumer GPUs.

2026-05-25

NVIDIA posts record $58.3B profit, names Anthropic as new hyperscaler customer

NVIDIA reported adjusted EPS of $1.62 on revenue of $68.12B (up 73% YoY), record profit of $58.3B, an $80B buyback, and a dividend hike to $0.25/share. Jensen Huang said NVIDIA newly won Anthropic alongside OpenAI, xAI, Meta MSL and Microsoft, projecting $1T in Grace Blackwell and Vera Rubin sales.

2026-05-24

NVIDIA Q1: ~79% revenue growth, $42.97B adjusted profit, Anthropic added as hyperscale customer

NVIDIA reported roughly 79% revenue growth and adjusted profit up 81.8% to $42.97B for the April quarter, with Jensen Huang telling CNBC the company is gaining share among frontier-model hyperscalers and citing newly-added Anthropic alongside OpenAI, xAI, Meta MSL and Microsoft. Despite the beat, the stock failed to impress as investors look toward inference workloads and the next data-center chip cycle.

2026-05-23

Hark raises $700M+ at $6B valuation for 'personalized intelligence' AI devices

Hark Inc. announced more than $700M in Series A funding led by Parkway Venture Capital, with NVIDIA, Intel Capital, AMD Ventures, Qualcomm Ventures, and Salesforce Ventures all participating. The startup is building consumer 'personalized intelligence' hardware at a $6B valuation — a rare deal where NVIDIA, Intel, AMD and Qualcomm all sit on the same cap table.

2026-05-23

Nvidia Q1 FY27 Beats with ~79% Revenue Growth; AMD's $4K Ryzen AI Halo Targets DGX Spark

Nvidia reported a ~79% year-over-year revenue jump and 81.8% adjusted profit growth to $42.97B, beating Wall Street estimates as analysts shifted focus from training capex to inference demand and Jensen Huang's commentary on China. AMD countered with a $4,000 Ryzen AI Halo mini-PC packing 128GB on-board memory, explicitly priced to match Nvidia's $4,000 DGX Spark. Nvidia separately announced a BioNeMo deal with Qiagen for drug discovery.

2026-05-21

NVIDIA-Verified Agent Skills bring capability governance to AI agents

NVIDIA introduced Verified Agent Skills, a governance framework for portable agent capabilities used with MCP-connected tools and open models. The program aims to make autonomous agents safer and more interoperable across deployments. It arrived alongside Dell's AI Factory rollout and an earnings report that tests Nvidia's inference-era dominance.

2026-05-20

NVIDIA guides 95% revenue growth, fourth straight quarter of acceleration

NVIDIA reported 85% YoY revenue growth in the April quarter — 12 points above January — and guided to 95% growth for the current quarter. That's the fourth consecutive quarter of accelerating, not decelerating, growth, undercutting the 'AI capex is peaking' thesis that's haunted the stock since Anthropic's $200B TPU commitment.

2026-05-20

NVIDIA invests in chip-portability startup Decart at ~$4B valuation

Decart raised $300M at a ~$4B valuation with Radical Ventures leading and Nvidia, Adobe Ventures, Toyota Ventures, and Andrej Karpathy participating. Decart is building AI optimization software alongside world-model research, and notably makes it easier to switch between AI chips — an unusual bet for Nvidia given its lock-in advantage.

2026-05-19

NVIDIA stock falls 9% as Anthropic commits $200B to Google TPUs and Amazon's Trainium hits $225B in revenue commitments

NVIDIA shares dropped 9% over six sessions amid mounting custom-silicon competition. Anthropic committed roughly $200B to Google TPUs over five years, Amazon's Trainium has $225B+ in revenue commitments (1.4M chips deployed across three generations, including 500K in Project Rainier alone), and Meta is deploying homegrown silicon. NVIDIA is up only 5% YTD vs. a 55% gain for the Philly semi index. Q1 earnings land May 20. Separately, China is still blocking H200 imports despite Trump and Jensen Huang's Beijing trip.

2026-05-18

NVIDIA releases SANA-WM, open-source 2.6B world model: 60-second 720p video with 6-DoF camera control on a single RTX 5090

NVIDIA introduced SANA-WM, an open-source 2.6B-parameter camera-controlled world model producing 60-second 720p clips with precise 6-DoF camera control. Trained on 64 H100s, it runs inference on a single RTX 5090 — dramatically lowering the bar for high-quality generative video and open-source world models.

2026-05-18

NVIDIA plans mini data centers next to power substations ahead of May 20 earnings

Nvidia unveiled plans to deploy mini data centers adjacent to local power substations to tackle AI energy bottlenecks. Q4 FY26 hit $68.13B revenue with $62.31B from Data Center and 75% non-GAAP gross margins; Q1 FY27 guidance is ~$78B. Trump confirmed China still has not approved Nvidia AI chip imports.

2026-05-17

BofA raises NVIDIA target to $320, lifts AI data center TAM to $1.7T by 2030

Bank of America's Vivek Arya raised NVIDIA's price target to $320 and lifted the AI data center systems TAM to $1.7T by 2030, including $1.2T in AI accelerators. The note lands as NVIDIA stock rallied 4.5% to $236 (market cap $5.71T) into its May 20 Q1 FY27 earnings, with Polymarket pricing a 97% beat probability. Trump separately said China hasn't approved H200 imports.

2026-05-17

NVIDIA details Vera Rubin scale-up for agentic AI; mini data centers planned at substations

NVIDIA published a deep dive on how Vera Rubin handles the non-deterministic, long-trajectory runtime of agentic inference, while unveiling plans for mini data centers next to local power substations to address AI energy bottlenecks. The Hermes framework now enables self-improving agents on RTX PCs and DGX Spark.

2026-05-15

PyTorch 2.12 ships with up to 100x faster batched linalg.eigh on CUDA

PyTorch 2.12 brings major CUDA linear-algebra speedups including a reported 100x faster batched eigendecomposition. Release notes also cover additional kernel improvements and compiler updates.

2026-05-14

DeepSeek V4 Pro lands on NVIDIA build.nvidia.com — 1M context, >150 tok/s on GB200

DeepSeek V4 Pro, with 1.6T parameters (49B activated) and a 1M-token context, is now hosted on NVIDIA's build.nvidia.com. Early benchmarks on the GB200 NVL72 (Blackwell) report throughput above 150 tokens per second per user, alongside a 75% V4 Pro API price cut and 1/10 cache pricing that is already forcing competitors to recalibrate.

2026-05-14

NVIDIA powers Hermes self-improving agents on RTX, faces BIPA lawsuit

NVIDIA showcased Hermes, a self-improving local agent running on RTX PCs and DGX Spark, with Qwen 3.6 27B matching 400B-class accuracy at one-sixteenth the size. Jensen Huang's foundation bought $108M of CoreWeave compute for academics. NVIDIA also faces an Illinois BIPA suit over voice data used for AI training.

2026-05-14

Recursive Superintelligence raises $650M at $4.65B for self-improving AI

GV and Greycroft led a $650M round in Recursive Superintelligence with participation from Nvidia and AMD Ventures, valuing the startup at $4.65 billion. The company is targeting self-improving model architectures — recursive self-improvement as an explicit research goal.

2026-05-14

U.S. clears Nvidia H200 sales to ~10 Chinese firms including Alibaba, Tencent, ByteDance

The U.S. Commerce Department approved Nvidia H200 chip sales to roughly 10 Chinese companies — including Alibaba, Tencent, ByteDance and JD.com — easing AI export restrictions ahead of Trump's state visit to Beijing.

2026-05-14

Cerebras prices IPO at $185, valuing Nvidia challenger at $56B

Cerebras Systems priced its IPO at $185/share — far above the $115-125 marketed range — selling 30 million shares for an implied $56B fully diluted valuation. Backers including Foundation Capital, Benchmark, and OpenAI score major paper gains on a decade-long bet against Nvidia GPUs.

2026-05-14
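
The Cerebras pricing figures above can be cross-checked with simple arithmetic. The sketch below assumes gross proceeds are just price times shares sold (before fees and any over-allotment) and that the $56B valuation is the IPO price times the fully diluted share count. Neither assumption is stated in the summary.

```python
# Rough cross-check of the Cerebras IPO figures quoted above.
ipo_price = 185.0              # $ per share
shares_sold = 30_000_000       # shares offered in the IPO
range_high = 125.0             # top of the marketed range, $ per share
implied_valuation = 56e9       # fully diluted valuation, $

# Assumption: gross proceeds = price * shares, ignoring fees and greenshoe.
gross_proceeds = ipo_price * shares_sold

# Premium of the final price over the top of the marketed range.
premium_to_range_top = ipo_price / range_high - 1

# Assumption: valuation = price * fully diluted share count.
implied_diluted_shares = implied_valuation / ipo_price

print(f"Gross proceeds: ${gross_proceeds / 1e9:.2f}B")
print(f"Premium to top of range: {premium_to_range_top:.0%}")
print(f"Implied fully diluted shares: {implied_diluted_shares / 1e6:.1f}M")
```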

NVIDIA hits record high ahead of Vera Rubin; sued under Illinois BIPA

NVDA closed at a record with Q1 revenue expected at $78.6B (+78% YoY). The upcoming Vera Rubin platform reportedly trains models with 75% fewer GPUs than Blackwell and cuts inference token costs 90%. Separately, journalists and podcasters sued NVIDIA under Illinois BIPA over voice data used for AI training.

2026-05-13

Vera Rubin nears shipment with claims of 75% fewer GPUs for training, 90% cheaper inference

NVIDIA's next-gen Vera Rubin platform — combining Rubin GPUs, Vera CPUs, and NVLink 6 switches — is reportedly beginning shipments ahead of a May 20 catalyst, with management claiming 75% GPU reduction for training and 90% lower inference token costs vs Blackwell. NVIDIA also unveiled Fleet Intelligence for real-time visibility across large GPU estates.

2026-05-12

Nvidia commits $40B+ to AI equity deals in 2026; Vera Rubin nears shipping

Nvidia has now committed over $40 billion to AI equity deals in 2026 alone, cementing its position as the industry's biggest financial backer. Its next-gen Vera Rubin platform — combining Rubin GPU, Vera CPU, and NVLink 6 — reportedly trains models with 75% fewer GPUs than Blackwell and cuts inference token costs by 90%. Wired argues CUDA proves Nvidia is fundamentally a software company.

2026-05-11

NVIDIA-IREN strike $3.4B five-year cloud deal with $2.1B equity option for 5 GW of AI infrastructure

NVIDIA and IREN announced a strategic partnership to deploy up to 5 GW of AI infrastructure, anchored by a five-year cloud services contract worth $3.4B and an option for NVIDIA to buy up to $2.1B of IREN stock at $70/share. NVIDIA shares hit a record on the news. Blackwell Ultra is ramping, Rubin launches in 2026, and Feynman is planned for 2028.

2026-05-10

NVIDIA releases cuda-oxide, a Rust-to-PTX compiler backend

NVlabs published cuda-oxide v0.1.0, a custom rustc codegen backend that compiles `#[kernel]`-annotated Rust functions to PTX via Stable MIR → Pliron IR → LLVM IR. Single-source host+device builds run from one `cargo oxide build` command. It's NVIDIA's most serious move yet to bring Rust into the CUDA toolchain.

2026-05-10

NVIDIA Star Elastic packs 30B/23B/12B reasoning models in one checkpoint

Built on the Nemotron Elastic framework and applied to Nemotron Nano v3, Star Elastic trains three nested reasoning variants in a single 160B-token run, eliminating per-size training and storage. Zero-shot slicing recovers 23B and 12B models from the 30B parent without retraining.

2026-05-10

NVIDIA + Corning partner on US AI-infrastructure manufacturing

NVIDIA and Corning announced a long-term partnership to scale domestic US production of AI infrastructure components, including optical connectivity for data centers. Separately, an F5 report shows enterprises increasingly bringing AI inference in-house on NVIDIA stacks.

2026-05-09

Dynamo adds streaming tokens + multi-turn agentic harness

NVIDIA Dynamo gained support for multi-turn agentic exchanges that interleave reasoning with tool calls and stream tokens through structured turns, targeting agentic serving with preserved interaction state. Researchers separately showed grammar-constrained decoding sharply improves Bash command generation in small LMs.

2026-05-09

Huawei 950PR fills the NVIDIA gap as DeepSeek V4 optimizes inference for domestic Chinese silicon

Chosun reports DeepSeek's new V4 model, while trained on NVIDIA chips, is explicitly optimized to run inference on Huawei's 950PR accelerator — accelerating Chinese AI self-reliance amid US export controls. Separately, Anthropic's newly announced access to xAI's Colossus underscores how compute scarcity is reshaping NVIDIA's customer landscape.

2026-05-08

Jensen flags inference-demand inflection ahead of May 20 earnings; Rubin in 2026

Jensen Huang highlighted an inflection in inference demand as NVIDIA's next growth driver, with earnings May 20 and a Taipei GTC keynote June 1. Roadmap: Blackwell Ultra ramping, Rubin in 2026, Rubin Ultra 2027, Feynman 2028. Mistral Medium 3.5 also went live on NVIDIA NIM this week.

2026-05-07

Vera Rubin platform enters production targeting 10x lower inference cost

NVIDIA's Rubin platform — six new chips — entered production with claims of 10x lower inference token cost and 4x fewer GPUs for MoE training vs. Blackwell. Spectrum-X Photonics promises 5x better power efficiency. AWS, Google Cloud, Microsoft, and CoreWeave will ship Rubin products in H2 2026; Microsoft is deploying NVL72 in its Fairwater superfactories.

2026-05-07

NVIDIA expands Spectrum-X Ethernet with Multi-Rail Connectivity for gigascale AI

NVIDIA announced Multi-Rail Connectivity (MRC) for Spectrum-X, its open AI-native Ethernet scale-out fabric, positioning the platform as the standard for hyperscale AI factories. The announcement was co-signed by AMD, Broadcom, Intel, Microsoft, and OpenAI as a new open networking protocol to reduce wasted GPU time during large training runs.

2026-05-06

NVIDIA debuts Nemotron Omni multimodal foundation model for agents

NVIDIA released Nemotron Omni, a new multimodal foundation model in the Nemotron family designed as the 'brain' for AI agents. Coverage describes it as a substantial upgrade aimed at agentic, multimodal workloads, and it pairs with NVIDIA's expanded ServiceNow partnership around Project Arc.

2026-05-06

NVIDIA and ServiceNow partner on autonomous AI agents for enterprises

NVIDIA and ServiceNow announced a partnership to deliver autonomous AI agents purpose-built for enterprise environments, moving beyond reasoning to action. ServiceNow simultaneously expanded its AI Control Tower as the governance hub for agentic workflows across the enterprise.

2026-05-06

Blitzy raises $200M at $1.4B valuation for parallel coding-agent platform

Autonomous software development startup Blitzy raised $200M at a $1.4B valuation to scale an enterprise platform that runs thousands of coding agents in parallel. Co-founder Sid Pardeshi is a former NVIDIA master inventor; the round underscores hot investor appetite for agentic dev tooling.

2026-05-06

NVIDIA sets May 20 earnings and June 1 GTC keynote to unveil Rubin and Feynman roadmap

Jensen Huang's GTC keynote on June 1 from Taipei will detail Blackwell Ultra ramp, Rubin launching in 2026, Groq 3 LPX in H2 2026, and Feynman in 2028. NVIDIA is also broadening its open-model lineup for agentic, physical and healthcare AI, citing a major inflection in inference demand and growth in sovereign AI, enterprise AI and physical AI.

2026-05-05

Nvidia data-center revenue hits $193.7B as hyperscalers spend $710B on AI

Nvidia reported data-center revenue surging 75% YoY to $193.7B on Hopper/Blackwell demand, with CUDA lock-in keeping hyperscaler switching costs high. Roche separately deployed 3,500 Blackwell GPUs in a hybrid-cloud AI factory for drug discovery.

2026-05-04

NeMo RL adds speculative decoding for 1.8x rollout speedup at 8B

NVIDIA Research integrated speculative decoding directly into NeMo RL with a vLLM backend, delivering lossless 1.8x acceleration at 8B parameters and projecting 2.5x end-to-end speedup at 235B model scales.

2026-05-03
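
For readers unfamiliar with why speculative decoding can be "lossless", here is a toy sketch of the greedy variant. It is not NeMo RL's or vLLM's implementation, and every name in it (`greedy_decode`, `speculative_decode`, `toy_target`, `toy_draft`) is hypothetical. A cheap draft model proposes a few tokens, the target model checks them, and only tokens the target would have chosen itself are kept, so the output matches plain target decoding.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def greedy_decode(target: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: the draft proposes, the target verifies."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed position. A real system scores
        #    all positions in one batched forward pass; this loop is for clarity.
        for i, tok in enumerate(proposal):
            expected = target(seq + proposal[:i])
            if tok != expected:
                # First mismatch: keep the agreed prefix plus the target's token.
                seq.extend(proposal[:i])
                seq.append(expected)
                break
        else:
            seq.extend(proposal)  # every proposed token was accepted
    return seq


# Toy stand-ins for real models (purely illustrative).
def toy_target(seq: List[Token]) -> Token:
    return (31 * sum(seq) + len(seq)) % 50


def toy_draft(seq: List[Token]) -> Token:
    # Agrees with the target except when the context length is a multiple of 3.
    guess = toy_target(seq)
    return guess if len(seq) % 3 else (guess + 1) % 50


prompt = [1, 2, 3]
print(speculative_decode(toy_target, toy_draft, prompt, 12)
      == greedy_decode(toy_target, prompt, 12))
```

The speedup in practice comes from step 2: verifying several draft tokens costs roughly one target forward pass, so each accepted token saves a pass. Sampling-based variants use a rejection-sampling rule instead of the exact-match check shown here.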

Nemotron 3 Nano Omni unifies vision/audio/video/text in 30B MoE

NVIDIA launched Nemotron 3 Nano Omni, a 30B-parameter (A3B active) hybrid MoE multimodal model claiming up to 9x higher throughput than other open omnimodal models. Open weights ship on Hugging Face and as a NIM microservice, with day-zero availability on Amazon SageMaker JumpStart.

2026-05-01

Microsoft Shader Model 6.10 brings neural rendering to all GPUs

Microsoft announced a Shader Model 6.10 preview bringing neural rendering into mainstream graphics APIs, letting developers exploit matrix hardware on any GPU rather than NVIDIA-exclusive paths.

2026-05-01

Nemotron 3 Nano Omni: 30B/3B-active open multimodal model for edge agents

NVIDIA released Nemotron 3 Nano Omni, an open-weight multimodal model unifying vision, audio, and language in a single architecture with 30B parameters but only 3B active per inference. NVIDIA claims 9× throughput over comparable open models, leadership on six benchmarks, and licenses it for commercial use under the NVIDIA Open Model Agreement.

2026-04-29

Foxconn fast-forwards Groq 3 LPX racks for trillion-parameter inference

Foxconn became lead supplier for NVIDIA's Groq 3 LPX inference cabinet, accelerating delivery of NVIDIA's claimed 35× AI inferencing leap. The LPX targets trillion-parameter models on the Vera Rubin platform, with racks shipping ahead of schedule.

2026-04-29

NVIDIA crosses $5T as semiconductor ETF surges 40% in April

NVIDIA crossed a $5 trillion market cap on April 24 after gaining 19% in April, with the semiconductor ETF up 40.4% on the month. Q4 revenue grew 73% YoY, Data Center revenue hit a record $62.3B (up 75%), and the Blackwell Ultra ramp is in full swing with Rubin still slated for late 2026.

2026-04-28

Stanford/Berkeley/NVIDIA's LLM-as-a-Verifier beats GPT-5.5 on Terminal-Bench

Stanford, Berkeley, and NVIDIA jointly released LLM-as-a-Verifier, an agent verification framework that combines with any agent harness or model. Scaling verification compute lets the framework outperform GPT-5.5 and Claude Mythos on Terminal-Bench and SWE-Bench Verified. Lead authors include Ion Stoica (Databricks), Azalia Mirhoseini (ex-Anthropic), and Marco Pavone (NVIDIA).

2026-04-27

NVIDIA unveils Vera Rubin rack-scale platform with six new chips and 10x inference cost reduction vs. Blackwell

NVIDIA announced the Vera Rubin platform, a rack-scale AI system comprising six new chips designed to reduce AI inference token costs by up to 10x compared to the Blackwell platform. AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure will be among the first cloud providers to deploy Vera Rubin instances. NVIDIA also announced a multiyear partnership with Meta for millions of Blackwell and Rubin GPUs spanning on-premises and cloud deployment.

2026-04-25

NVIDIA Unveils Blackwell Ultra GPU and Expands NIM Microservices Catalog

NVIDIA detailed the Blackwell Ultra B300 GPU, offering up to 1.5x memory bandwidth improvement over the B200, targeting large model inference and training at scale. The company expanded its NIM microservices catalog to over 150 optimized model endpoints, including support for Llama 4 and Mistral models, with expanded partnerships with Microsoft Azure and AWS.

2026-04-23

NVIDIA Releases Nemotron 3 Super: 120B Open Model for Agentic Workloads

NVIDIA released Nemotron 3 Super, a 120-billion-parameter open hybrid MoE model designed to reduce compute costs for running AI agents at scale. The model activates only 12.7 billion parameters per inference while maintaining capabilities comparable to much larger models, and is optimized for agentic AI applications requiring sustained reasoning and tool use.

2026-04-23

NVIDIA Reports Record $215.9B Revenue; Rubin Platform Promises 10x Inference Cost Reduction

NVIDIA reported record fiscal year 2026 revenue of $215.9 billion, up 65% year-over-year, with data center revenue reaching $193.7 billion driven by massive GPU deployments. The company announced the Rubin platform, comprising six new chips that deliver up to a 10x reduction in inference token costs compared to the Blackwell generation, with H2 2026 deployments planned.

2026-04-22

NVIDIA unveils open-source Nemotron 3 series with fivefold inference boost

NVIDIA announced the open-source release of its Nemotron 3 series models, featuring significant architectural improvements that deliver up to five times faster inference performance compared to previous generations. The models are designed to run efficiently on NVIDIA's latest hardware while maintaining competitive accuracy across key benchmarks.

2026-04-21

AI startup Cursor in talks to raise $2 billion funding round at over $50 billion valuation

Artificial intelligence startup Cursor is in talks to raise $2 billion at a valuation of over $50 billion. Andreessen Horowitz is slated to co-lead the round, with NVIDIA and Thrive Capital also expected to participate. All three firms have previously backed the AI coding startup.

2026-04-21

AI Chip Competitors Attract Record $8.3B Funding as Cerebras Files for IPO

AI chip startups globally raised a record $8.3 billion in funding during 2026 as competition against NVIDIA intensifies. Cerebras disclosed its US IPO filing, aiming to challenge NVIDIA with inference-focused chips that avoid high-bandwidth memory bottlenecks, backed by a $20 billion partnership with OpenAI.

2026-04-20

NVIDIA Stock Surpasses $200 as Vera Rubin Platform Projects $1 Trillion Revenue Through 2027

NVIDIA shares surpassed $200 for the first time since November 2025, driven by CEO Jensen Huang's projections of exponential growth in agentic AI computing demands. The Vera Rubin platform is now projected to generate $1 trillion in cumulative revenue through 2027, doubling prior estimates.

2026-04-19

NVIDIA Releases Full-Stack Optimizations for Agentic Inference with Dynamo Platform

NVIDIA published optimization techniques for agentic AI inference, highlighting real-world adoption where Stripe generates 1,300+ PRs weekly via agents and Ramp attributes 30% of merged PRs to agents. The Dynamo platform addresses KV cache pressure in agent workflows with 85-97% cache hit rates.

2026-04-19

NVIDIA launches Isaac GR00T N1.7 open reasoning VLA model for humanoid robotics

NVIDIA announced Isaac GR00T N1.7, an open reasoning Vision-Language-Action (VLA) model designed for humanoid robot control and reasoning. The model advances embodied AI capabilities, enabling robots to understand visual input, reason about tasks, and execute complex actions, marking progress toward general-purpose robotic systems.

2026-04-18

NVIDIA Launches Ising: First Open-Source AI Models for Quantum Computing with Industry-Leading Accuracy

NVIDIA released Ising, described as the world's first open-source AI models for quantum computing, targeting error correction and processor calibration. The Ising Decoding model is 2.5x faster and 3x more accurate than current industry standards. NVIDIA provides accompanying workflows, training data, and NIM microservices allowing developers to fine-tune models locally while protecting proprietary data.

2026-04-16

Blackwell Ultra GB300 Inference Benchmarks Show 50% Throughput Improvement Over H100

NVIDIA published detailed inference throughput specifications for the Blackwell Ultra GB300, demonstrating 50% higher tokens-per-second than H100 SXM on standard transformer workloads. The GB300 features 288GB HBM3e memory per GPU, enabling full-parameter serving of 70B-class models without tensor parallelism. Cloud availability through AWS, Azure, and Google Cloud is expected in Q3 2026.

2026-04-16

NVIDIA NIM Microservices Updated for Latest Frontier Models with One-Click Deployment

NVIDIA announced expanded NIM microservice catalogs now covering Llama 4, Gemini, and updated Claude endpoints, enabling enterprises to deploy frontier models on-premises with pre-optimized TensorRT-LLM inference backends. The update includes new guardrails integration via NeMo Guardrails 0.11 and support for multi-node inference on H100 and B200 clusters. Developers using NVIDIA AI Enterprise can access these NIMs through the NGC catalog with SLA-backed support.

2026-04-14

Premio Validates NVIDIA RTX PRO Blackwell GPU Support for Edge AI Computing

Premio validated NVIDIA RTX PRO Blackwell GPU support across its edge computing solutions, with the new GPUs delivering up to 3,511 TOPS and up to 24,064 CUDA cores for inference and generative AI workloads. The validation enables accelerated AI capabilities at the edge, supporting real-time processing requirements for industrial and enterprise deployments. The RTX PRO Blackwell architecture improves edge AI performance while meeting the power-efficiency requirements of distributed computing environments.

2026-04-13

RISC-V Chip Designer SiFive Reaches $3.65B Valuation with NVIDIA Backing

NVIDIA-backed chip designer SiFive reached a $3.65 billion valuation in its latest funding round, representing significant growth from its March 2022 pre-money valuation of $2.33 billion when it raised $175 million. SiFive maintains open, non-proprietary chip architectures using the RISC-V instruction set, positioning itself as a neutral vendor for AI infrastructure development. The substantial valuation increase reflects growing demand for alternative processor architectures as companies seek flexibility and independence from traditional x86 and ARM-based solutions for AI workloads.

2026-04-13

NVIDIA Unveils Next-Gen 'Rubin' AI Chip Platform with Accelerated Development Timeline

NVIDIA announced its next generation of AI chips, the 'Rubin' platform, featuring updated GPUs and a new central processor called 'Vera', just months after its Blackwell model. The company has committed to a 'one-year rhythm' for releasing new AI chip models, signaling NVIDIA's accelerated pace of chip development to maintain market dominance. Meanwhile, game developer S-Game Studio publicly distanced Phantom Blade Zero from DLSS 5, citing concerns that generative AI visual tech could alter artists' original creative intent, reflecting growing pushback against AI-generated visuals in gaming.

2026-04-12

NVIDIA Launches NIM Agentic Framework and AITune Toolkit for Optimized AI Inference

NVIDIA released the NIM Agentic Framework, delivering 5x throughput gains for reasoning-heavy AI agents through speculative decoding that pairs small draft models with larger verifier models. The framework integrates with TensorRT, TensorRT-LLM, vLLM, and SGLang across cloud, data center, and RTX AI PC deployments, with a LangChain partnership signaling enterprise adoption pathways. Separately, NVIDIA launched AITune, an open-source inference toolkit that automatically identifies the fastest backend (TensorRT, Torch-TensorRT, TorchAO) for any PyTorch model while validating correctness, addressing the persistent gap between research models and production deployments. Additionally, NVIDIA unveiled its 'Rubin' AI chip platform with updated GPUs and a new 'Vera' central processor, maintaining a one-year release rhythm to solidify market dominance.

2026-04-11

NVIDIA Integrates Jetson Platform into Firefly Aerospace Lunar Mission and Delivers NIM 1.4 Performance Gains

Firefly Aerospace announced a collaboration with NVIDIA to integrate the Jetson platform into its Elytra spacecraft for processing high-resolution lunar imagery in orbit during the upcoming Blue Ghost Mission 2, scheduled for late 2026. The NVIDIA software stack, built on CUDA, will power AI models for a lunar imaging service, enabling repeated mapping and change detection. Additionally, NVIDIA's NIM 1.4 microservices update achieved roughly 2x throughput gains on H100 hardware, reaching 1,201 tokens/sec on Llama 3.1 8B versus 613 tokens/sec in standard deployments, supporting continuously updated optimized inference engines for the DeepSeek, Llama, Mistral, and SDXL model families.

2026-04-10

CoreWeave and Meta Expand AI Cloud Deal to $21 Billion with NVIDIA Rubin GPU Deployment; NVIDIA Extends into Space and Cybersecurity

CoreWeave and Meta announced an expanded $21 billion AI cloud infrastructure agreement, with CoreWeave becoming one of the first providers to deploy NVIDIA's next-generation Rubin GPUs for large-scale inference, reasoning, and agentic AI workloads. Separately, NVIDIA announced general availability of GB200 NVL72 rack-scale systems integrating 72 Blackwell GPUs delivering 1.4 exaflops of FP4 inference performance, with AWS, Azure, and Google Cloud deployments planned for Q2 2026. NVIDIA is also expanding into space through partnerships with Firefly Aerospace and Planet Labs for real-time AI processing on lunar imaging and Earth observation satellites using Jetson modules, and is a launch partner in Project Glasswing for AI-powered cybersecurity defense. Additionally, NVIDIA and Siemens unveiled a chip verification solution capable of simulating trillions of cycles in days, and details emerged of NVIDIA's upcoming N1 SoC for AI PC laptops, which features 128 GB of memory.

2026-04-09

NVIDIA Details DLSS 5 with 6.7x VRAM Compression and Releases Nemotron OCR v2 Multilingual Text Recognition Model

NVIDIA detailed DLSS 5, a neural rendering technology demonstrating compression that reduces VRAM usage from 6.5GB to 970MB for texture and material data — a roughly 6.7x reduction with significant implications for game and application developers. The company is also partnering with Siemens on an AI chip verification solution capable of simulating trillions of cycles in days, while expanding GPU-based EDA infrastructure with Synopsys and Cadence. Separately, NVIDIA released Nemotron OCR v2, a state-of-the-art production-ready multilingual OCR model integrating a detector, recognizer, and relational model for layout analysis, available commercially via the NVIDIA NeMo Retriever collection.

2026-04-08

NVIDIA Advances Agentic AI with Gemma 4 Collaboration, NIM 2x Throughput Gains, and Neural Texture Compression

NVIDIA made multiple announcements this week: in collaboration with Google, it released the Gemma 4 model family optimized for NVIDIA hardware across devices from smartphones to IoT systems, including Gemma's first MoE model for agentic, on-device AI with local real-time data processing to reduce latency and cloud dependency. NVIDIA NIM (Inference Microservices) achieved 2x throughput improvements on H100 GPUs — benchmarked at 1,201 tokens/second versus 613 without NIM on Llama 3.1 8B — supporting DeepSeek, Llama, Mistral, and SDXL across cloud, data center, and PC environments. Additionally, NVIDIA demonstrated Neural Texture Compression (NTC) reducing VRAM usage from 6.5GB to 970MB (an ~85% reduction), positioning AI-driven compression as a complement to DLSS 5 focused on efficiency rather than image reconstruction.

2026-04-07

NVIDIA to Deploy 1M+ Blackwell and Vera Rubin GPUs Across AWS Regions; Neural Compression Cuts VRAM from 6.5GB to 970MB

NVIDIA committed to deploying more than one million GPUs spanning Blackwell and next-generation Vera Rubin architectures across AWS global cloud regions throughout 2026. The rollout represents a generational step up in throughput, latency, and cost-per-token for Bedrock and SageMaker workloads, with cross-region inference failover addressing prior reliability pain points. Separately, NVIDIA demonstrated Neural Texture Compression (NTC) and Neural Materials (NM) capable of reducing VRAM usage from 6.5GB to 970MB, and DLSS 5's generative AI frame generation has divided the developer community over whether AI reconstruction overrides artistic intent. NVIDIA also released the Gemma 4 model family in collaboration with Google, with NVIDIA-optimized variants available via NIM inference microservices on Hugging Face.

2026-04-06

NVIDIA Sets New MLPerf Inference Records: 2.5M Tokens/Sec on Blackwell Ultra, 3x Speedups and 60% VRAM Reduction via Software Optimizations

NVIDIA announced PyTorch-CUDA software optimizations achieving up to 3x performance improvements and 60% VRAM reduction for video and image generative AI workloads, with native NVFP4/FP8 precision support. Blackwell Ultra submissions reached a record 2.5M tokens/sec throughput in MLPerf inference benchmarks, while RTX AI infrastructure demonstrated 35% faster inference for small language models via Ollama and llama.cpp. NVIDIA also announced optimizations for Google's Gemma 4 on RTX PCs, DGX Spark, and edge devices, and introduced new local agent models including Nemotron 3 Nano 4B and Nemotron 3 Super 120B. Separately, NVIDIA's DLSS 5 and Neural Texture Compression technology — reducing VRAM from 6.5GB to 970MB — are facing backlash from game developers who label AI-generated frames 'AI slop,' with Jensen Huang publicly defending the technology.

2026-04-04
