Technology companies are moving rapidly to turn artificial intelligence from a screen-based tool into a physical force, pouring money and engineering talent into robots that can see, reason and act in the real world.
The push, often described as embodied AI or physical AI, marks a new phase in the race that began with large language models. Instead of building systems that only generate text, code, images or video, companies are trying to connect foundation models to cameras, sensors, arms, wheels and humanoid bodies. The ambition is to create machines that can carry out tasks in factories, warehouses, hospitals, homes and public spaces with far greater flexibility than traditional industrial robots.
The shift is being driven by both opportunity and pressure. Generative AI has already become a multibillion-dollar software market, but many of its most valuable future uses may depend on machines that can operate safely in complex environments. Nvidia, Google DeepMind, Tesla, Amazon, Boston Dynamics, Figure AI, Agility Robotics and a fast-growing group of Chinese companies are among those competing to define the next platform for robotics.
Consulting firm McKinsey has estimated that venture capital funding for robotics more than tripled between 2023 and 2025, reaching $40.7 billion annually, while governments have increasingly treated humanoid and service robots as strategic technologies. China has made embodied AI a major industrial priority, supported by state-backed funds, local manufacturing clusters and a dense supply chain for motors, sensors, reducers and batteries.
From chatbots to machines that act
The appeal of embodied AI is straightforward: software intelligence becomes more economically powerful when it can manipulate objects, move through space and adapt to changing conditions. A language model can write a warehouse instruction. A robot guided by a multimodal model could, in theory, identify the correct item, avoid obstacles, grasp the package, place it on a conveyor and recover when something goes wrong.
Nvidia has positioned itself as a central supplier for this transition. The company has released Cosmos and Isaac GR00T models and tools intended to help developers train robots in simulated worlds before transferring the learned behavior to real machines. Nvidia says its Cosmos 3 system combines vision reasoning, world generation and action prediction, while its robotics platform links cloud training, simulation and edge computing hardware for deployment.
Google DeepMind is pursuing a similar goal through Gemini Robotics, which brings Gemini’s multimodal reasoning into robots of different forms. In its latest Gemini Robotics-ER 1.6 release, Google said the system added stronger embodied reasoning, including the ability to read instruments such as gauges and sight glasses, a capability developed with Boston Dynamics for industrial settings.
The corporate logic is clear. The first wave of AI assistants made it easier to search, summarize and generate digital work. The next wave aims to automate physical work that has resisted conventional robotics because it is varied, messy and hard to script. That includes warehouse picking, factory inspection, hospital support, farming, cleaning, maintenance and eventually some domestic chores.
But the challenge is far greater than adding wheels or arms to a chatbot. Robots must understand space, force, friction, lighting, sound, touch and human movement in real time. They must also be safe. A bad answer from a chatbot can be corrected; a bad action from a machine holding a tool, moving a patient or operating near workers can cause injury.
Investment is rising, but so are the hurdles
The new robotics race is not only about intelligence. It is about manufacturing, components and reliability. Humanoid robots require precise actuators, durable joints, batteries, cooling systems, sensors and control software. A technically impressive prototype may still fail as a commercial product if it is too expensive, too fragile or too difficult to maintain.
That is why suppliers are becoming as important as model developers. A joint venture announced by SKF and Chinese component maker Leaderdrive, aimed at high-precision transmission components for humanoid robot joints, illustrates how the industry is moving from demonstrations toward supply-chain preparation. The venture is expected to begin operations by the end of 2026.
Researchers also continue to warn that current systems remain brittle. A review of foundation models for mobile service robots found persistent challenges in multimodal sensor fusion, real-time decision-making, task generalization and human-robot interaction. In practice, a robot that performs well in one room, factory or warehouse can struggle when lighting, object placement, floor texture or human behavior changes.
That brittleness explains the industry’s interest in simulation and synthetic data. Training robots only in the real world is slow, costly and risky. Training them in digital environments allows developers to expose models to millions of variations before testing on hardware. But the “sim-to-real” gap remains difficult: simulated physics rarely captures every detail of real surfaces, cables, reflections, clutter or human unpredictability.
Compute is another barrier. Large multimodal models require significant processing power, yet robots need quick responses with limited onboard energy. Latency that is acceptable for a chatbot can be dangerous for a machine balancing on two legs or working near people. Companies are therefore investing heavily in edge chips and compact inference systems that can process sensor data locally rather than relying entirely on cloud connections.
China’s role is increasingly important. Domestic reporting and market analysis suggest a surge of financing into embodied intelligence companies, with robotics startups benefiting from a wider national push to commercialize AI and advanced manufacturing. The country’s advantage lies not only in software research, but in its ability to scale hardware through established electronics, battery, motor and component ecosystems.
Early deployments will be narrow, supervised and industrial
Despite the excitement around humanoid robots, the first large-scale commercial gains are likely to come from constrained settings rather than open-ended home use. Warehouses, factories, power plants and hospitals offer clearer tasks, controlled operating zones and measurable returns. Robots can be trained to inspect gauges, move bins, sort parcels, deliver supplies or support repetitive handling under human supervision.
Consumer robots remain a harder market. Startups are beginning to offer home robots for chores such as tidying, folding laundry or making beds, but these systems remain expensive and limited. A household is one of the most unpredictable environments for a robot: objects vary constantly, pets and children move unexpectedly, and users expect high reliability at consumer prices.
There are also social and ethical questions. Humanoid robots built for care, companionship or domestic work raise concerns about privacy, emotional dependency, surveillance and data collection inside intimate spaces. In China, Ubtech’s lifelike UWorld U1 robot has drawn attention not only for its human-like design, but also for plans involving 3D facial reconstruction and voice technology, raising questions about consent and the commercialization of simulated human presence.
For now, the most realistic path is gradual adoption. Companies will deploy robots in narrowly defined jobs, measure safety and productivity, then expand capabilities as models improve. Human oversight will remain central, especially in healthcare, logistics and manufacturing. The near-term future is less likely to involve fully autonomous humanoids replacing workers wholesale than mixed environments in which people supervise fleets of specialized machines.
Still, the strategic direction is unmistakable. The AI industry is expanding from language and images into movement and manipulation. If companies can overcome the engineering, safety and cost barriers, embodied AI could become one of the defining technologies of the next decade, reshaping not only how businesses automate work but how people interact with machines in everyday life.