The Dawn of Helpful Robots: How AI Revolutionized Robotics and Unleashed a Funding Boom

The landscape of robotics, long characterized by ambitious dreams and incremental progress, is undergoing a profound transformation. For decades, roboticists envisioned machines capable of mirroring the intricate complexity of the human body, only to find their careers focused on the practical, albeit less glamorous, automation of factory assembly lines. The aspiration was often for a C-3PO-like companion, a sentient being that could navigate the world, adapt to unforeseen circumstances, and interact safely with humans. Instead, many found themselves refining robotic arms for repetitive tasks or developing simpler devices like the Roomba. This historical disconnect between grand visions and practical limitations had fostered a palpable hesitancy within Silicon Valley, a reluctance to invest heavily in the promise of truly helpful robots, despite their potential to assist individuals with mobility challenges, combat loneliness, or undertake hazardous work, while simultaneously offering a source of tireless, wage-free labor.
However, a seismic shift has occurred. While the fully realized, ubiquitous helpful robot remains a future prospect, the financial momentum has irrevocably changed. In 2025 alone, companies and investors poured an astounding $6.1 billion into humanoid robot development, more than four times the $1.5 billion invested in 2024. This surge in capital is not a speculative bubble but a direct consequence of a fundamental revolution in how machines learn to perceive and interact with their environment.
The traditional approach to robotics, often referred to as "rule-based programming," demanded an exhaustive anticipation of every conceivable scenario. Imagine programming a robotic arm to fold laundry. This would necessitate meticulously crafting a complex set of rules: define acceptable fabric deformation limits, identify the shirt’s collar, specify precise gripper movements for folding sleeves, account for rotation and twists, and incorporate logic for countless other variables. While this method could yield reliable results, the sheer volume and intricacy of the required rules quickly became unmanageable, proving to be a significant bottleneck in achieving versatile robotic capabilities.
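To see why this became unmanageable, consider a caricature of the rule-based style in Python. Everything below is illustrative: the shirt is reduced to a few named features, and every branch is a hand-written rule of the kind engineers had to anticipate and maintain.

```python
from dataclasses import dataclass

MAX_STRETCH_MM = 15  # acceptable fabric-deformation limit, one of many hand-tuned constants

@dataclass
class Sleeve:
    stretch_mm: float  # how far the fabric deformed during the last grip
    twisted: bool

@dataclass
class Shirt:
    has_collar: bool
    sleeves: list

def fold_shirt(shirt: Shirt) -> list[str]:
    """Return the gripper commands that the hand-written rules produce."""
    commands = []
    if not shirt.has_collar:
        # Rule: the collar anchors the fold; with no collar, there is no plan.
        raise ValueError("collar not detected; yet another rule is needed")
    for i, sleeve in enumerate(shirt.sleeves):
        if sleeve.twisted:
            commands.append(f"untwist sleeve {i}")         # rule for twists
        if sleeve.stretch_mm > MAX_STRETCH_MM:
            commands.append(f"re-grip sleeve {i} gently")  # rule for deformation limits
        commands.append(f"fold sleeve {i} inward")
    commands.append("fold body in thirds")
    return commands

print(fold_shirt(Shirt(has_collar=True, sleeves=[Sleeve(5, False), Sleeve(20, True)])))
```

Even this toy needs a branch per contingency; a real system must also handle wrinkles, buttons, slippery fabric, and inside-out garments, and the rule count explodes accordingly.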
The Paradigm Shift: Learning Through Simulation and Data
The turning point arrived around 2015 with the advent of machine learning and, more specifically, reinforcement learning. Instead of explicitly programming every action, researchers began building sophisticated digital simulations of robotic systems and the objects they were meant to manipulate. The core idea was to provide the AI program with a reward signal for successful task completion and a penalty for failure. Through millions of iterative trials and errors within these simulated environments, the AI learned to optimize its approach, much like how artificial intelligence systems mastered complex games.
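The loop itself is simple to sketch. Below is a minimal, self-contained stand-in: tabular Q-learning on a toy "reach the target" task, with a reward for success and a small penalty otherwise. The task, reward values, and learning rate are all illustrative; real labs train vastly larger neural policies the same basic way.

```python
import random

TARGET, N_POSITIONS, ACTIONS = 7, 10, (-1, +1)
q = {(s, a): 0.0 for s in range(N_POSITIONS) for a in ACTIONS}  # learned value of each (state, action)

for episode in range(5000):                  # real systems run millions of trials
    state = random.randrange(N_POSITIONS)
    for _ in range(20):
        if random.random() < 0.1:
            action = random.choice(ACTIONS)  # occasionally try something new
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # otherwise use what worked
        nxt = min(max(state + action, 0), N_POSITIONS - 1)
        reward = 1.0 if nxt == TARGET else -0.01  # reward signal for success, penalty otherwise
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += 0.5 * (reward + 0.9 * best_next - q[(state, action)])
        state = nxt
        if state == TARGET:
            break

print("learned move from position 2:", max(ACTIONS, key=lambda a: q[(2, a)]))  # typically +1: toward the target
```

Over thousands of episodes the table converges on "move toward position 7" from anywhere, with no rule ever stating that explicitly.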
This methodological evolution laid the groundwork for the current boom, which was significantly catalyzed by the public release of ChatGPT in 2022. Large language models (LLMs), trained on colossal datasets of text, demonstrated an unprecedented ability to predict the next word in a sequence, generating coherent and contextually relevant language. This principle was rapidly adapted to the realm of robotics. By processing vast amounts of sensory data – including images, sensor readings, and the precise positions of a robot’s joints – similar AI models began to predict and execute motor commands at an astonishing rate of dozens per second.
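A rough sketch of that control loop makes the data flow concrete: sensory readings are discretized into tokens, a model predicts the next motor command, and the cycle repeats dozens of times per second. The bin count, feature names, and seven-joint output below are assumptions for illustration, and a random stub stands in for the trained model.

```python
import random
import time

N_BINS = 256  # continuous sensor values are discretized into integer tokens

def tokenize(obs: dict) -> list[int]:
    # Pack normalized readings in [0, 1] into a single token sequence.
    values = obs["camera_features"] + obs["joint_angles"]
    return [min(int(v * N_BINS), N_BINS - 1) for v in values]

def predict_action(tokens: list[int]) -> list[float]:
    # Stub: a trained model would autoregressively decode action tokens here.
    rng = random.Random(sum(tokens))
    return [rng.uniform(-1.0, 1.0) for _ in range(7)]  # e.g., seven joint-velocity targets

obs = {"camera_features": [0.2, 0.8, 0.5], "joint_angles": [0.1, 0.4, 0.9]}
for step in range(3):                 # a real loop runs continuously
    command = predict_action(tokenize(obs))
    print(f"step {step}: joint commands {[round(c, 2) for c in command]}")
    time.sleep(1 / 30)                # roughly thirty commands per second
```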

This conceptual leap, from explicit programming to data-driven learning, has proven remarkably effective across a spectrum of robotic applications. Whether the goal is natural language interaction, navigation through complex environments, or the execution of intricate physical tasks, AI models that ingest and learn from extensive datasets are demonstrating a newfound proficiency. This approach has been further bolstered by innovative strategies such as deploying robots in real-world, albeit imperfect, conditions to enable continuous learning from their operational environments. Today, the ambition that once seemed relegated to science fiction is once again driving innovation in Silicon Valley.
Early Pioneers and the Quest for Social Robotics
Jibo: The Lamp-Like Companion
The quest for socially intelligent robots predates the LLM revolution. In 2014, Cynthia Breazeal, a robotics researcher at MIT, unveiled Jibo, a captivating, armless, legless, and faceless robot that bore a striking resemblance to a table lamp. Breazeal’s vision was to create a domestic robot that could serve as a companion for families, and the concept garnered significant public interest, raising $3.7 million through a crowdfunding campaign. Early pre-orders for Jibo were priced at $749.
Initially, Jibo could introduce itself and perform simple entertainment functions like dancing. However, the long-term ambition was for Jibo to evolve into an embodied assistant capable of managing schedules, handling emails, and even recounting stories. While Jibo attracted a dedicated user base, the company behind it ultimately ceased operations in 2019.
A critical retrospective analysis of Jibo’s challenges points to its limited language capabilities. In an era dominated by early iterations of voice assistants like Apple’s Siri and Amazon’s Alexa, Jibo competed with systems that relied heavily on pre-scripted responses. This approach involved translating speech to text, analyzing user intent, and retrieving pre-approved linguistic snippets. While these snippets could be engaging, they often led to repetitive and uninspired interactions, a significant drawback for a robot designed for social engagement. The inability to generate truly natural and dynamic conversations was a major hurdle, particularly for a device intended to be a family member.
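A toy version of that pipeline (with the speech-to-text step assumed already done) shows why interactions grew repetitive: user intent is matched against a fixed set, and each intent maps to one pre-approved snippet. The intents and responses here are invented for illustration.

```python
RESPONSES = {
    "greeting": "Hi there! Great to see you.",
    "story": "Once upon a time... (the same story, every time)",
    "fallback": "I'm sorry, I didn't catch that.",
}

def classify_intent(utterance: str) -> str:
    # Keyword matching stands in for the intent-analysis step.
    words = utterance.lower().split()
    if "hello" in words or "hi" in words:
        return "greeting"
    if "story" in words:
        return "story"
    return "fallback"

def respond(utterance: str) -> str:
    # Retrieve the pre-approved linguistic snippet for the detected intent.
    return RESPONSES[classify_intent(utterance)]

print(respond("Hello Jibo!"))
print(respond("Tell me a story"))
print(respond("Tell me a story"))  # identical output: the repetitiveness described above
```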
The subsequent advancements in AI-driven language generation have dramatically altered the landscape. Modern voice interfaces from leading AI providers now offer impressively engaging and interactive conversational experiences. Numerous hardware startups have attempted to capitalize on this development, though with varying degrees of success. However, this progress introduces a new set of risks. While scripted interactions are inherently contained, AI-generated conversations can veer into unpredictable territory. For instance, some AI-powered toys have been reported to engage children in inappropriate discussions about dangerous items.

Mastering the Physical World: From Simulation to Real-World Dexterity
OpenAI’s Dactyl: The Simulated Hand
By 2018, leading robotics labs were actively exploring alternatives to traditional rule-based programming, embracing trial-and-error learning through simulation. OpenAI embarked on an ambitious project with Dactyl, a robotic hand designed to manipulate small objects. The initial training took place entirely in a virtual environment, utilizing digital models of the robotic hand and palm-sized cubes marked with letters and numbers. A typical task might involve instructing Dactyl to "Rotate the cube so the red side with the letter O faces upward."
The inherent challenge with simulation-based training lies in the "sim-to-real gap." A robotic hand that performs exceptionally well in a simulated world might falter when transferred to a real-world counterpart. Subtle differences in color perception, the elasticity of robotic fingertips, or variations in friction can cause unexpected failures.
To overcome this, OpenAI pioneered a technique known as "domain randomization." This involved creating millions of slightly varied simulated environments, introducing random fluctuations in parameters such as friction, lighting conditions, and color saturation. By exposing the AI to this wide spectrum of variations, Dactyl became more robust and adaptable to real-world conditions. This approach proved successful, and a year later, OpenAI leveraged similar core techniques to tackle the more complex challenge of solving Rubik’s Cubes. While Dactyl achieved a 60% success rate on standard scrambles, its performance dropped to 20% for more difficult configurations.
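The idea is straightforward to express in code: draw each episode’s physics and rendering parameters from ranges instead of fixing them, so the learned policy cannot overfit to any single simulator configuration. The parameter names and ranges below are illustrative, not OpenAI’s actual values.

```python
import random

def randomized_sim_params() -> dict:
    # Each training episode gets its own slightly-off version of the world.
    return {
        "fingertip_friction": random.uniform(0.5, 1.5),   # contact physics
        "cube_size_cm": random.gauss(5.0, 0.2),           # object geometry
        "light_intensity": random.uniform(0.3, 2.0),      # rendering conditions
        "color_saturation": random.uniform(0.6, 1.4),
        "motor_latency_ms": random.uniform(0.0, 40.0),    # actuation delay
    }

for episode in range(3):   # real training spans millions of such variants
    print(f"episode {episode}: {randomized_sim_params()}")
    # run_simulated_episode(...) would train the policy under this variant (hypothetical call)
```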
Despite these advancements, the limitations of simulation eventually led OpenAI to scale back its robotics efforts in 2021. However, the company has since revived its robotics division, reportedly with a renewed focus on humanoid robots, signaling a continued belief in the potential of AI-driven physical agents.
Google DeepMind’s RT-2: Bridging Language and Action

Around 2022, Google’s robotics team was engaged in groundbreaking research built on large-scale data collection. Over 17 months, researchers handed human operators robot controllers and meticulously filmed the robots as operators guided them through a diverse range of tasks, from picking up bags of chips to opening jars. This effort resulted in a catalog of approximately 700 distinct tasks, forming the basis for one of the first large-scale "foundation models" for robotics.
Inspired by the success of LLMs, the goal was to process large volumes of input data, tokenize it into a format digestible by algorithms, and generate meaningful outputs. The initial iteration, RT-1 (Robotic Transformer 1), received input regarding the robot’s visual perception and the configuration of its articulated joints. Coupled with a natural language instruction, RT-1 translated these inputs into precise motor commands to direct the robot’s actions. The model demonstrated remarkable proficiency, successfully executing 97% of tasks it had encountered during training and achieving a 76% success rate on novel instructions.
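Sketched as an interface, the inputs and outputs described above look roughly like this. The stub below simply hashes its three input streams into discretized motor tokens; the real model runs a trained transformer over them, but the shape of the contract is the same.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    instruction: str       # e.g., "pick up the bag of chips"
    image_tokens: list     # tokenized camera frame
    joint_positions: list  # current configuration of the articulated joints

def policy_stub(obs: Observation, horizon: int = 3) -> list[int]:
    # Stub policy: derive `horizon` motor-command tokens in [0, 255].
    seed = hash((obs.instruction, tuple(obs.image_tokens), tuple(obs.joint_positions)))
    return [(seed >> (8 * i)) & 0xFF for i in range(horizon)]

obs = Observation("pick up the bag of chips", [12, 200, 45], [0.1, -0.4, 0.9])
print("motor-command tokens:", policy_stub(obs))
```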
The subsequent iteration, RT-2, released the following year, represented a significant leap forward. Instead of focusing solely on robotics-specific data, RT-2 was trained on a broader dataset encompassing general images from across the internet, mirroring the approach of vision-language models. This expanded training enabled the robot to interpret object locations and relationships within a scene with greater accuracy. Kanishka Rao, a roboticist at Google DeepMind who led the development of both RT-1 and RT-2, highlighted the expanded capabilities: "All these other things were unlocked. We could do things now like ‘Put the Coke can near the picture of Taylor Swift.’"
In 2025, Google DeepMind further integrated LLMs and robotics with the release of a Gemini Robotics model, enhancing the robot’s ability to comprehend and execute commands phrased in natural language. This fusion of language understanding and physical action represents a critical step towards more intuitive and versatile robotic systems.
Covariant: The AI Coworker in Warehouses
In 2017, a team of engineers departed from OpenAI to establish Covariant, with a pragmatic focus on building intelligent robotic arms for warehouse operations. Rather than pursuing the sci-fi ideal of humanoids, Covariant aimed to create highly capable manipulators for the demanding environment of logistics. Leveraging foundation models akin to those developed by Google, Covariant built a platform designed for data collection and continuous learning. They deployed this system in warehouses operated by major retailers like Crate & Barrel.

By 2024, Covariant had launched RFM-1, a robotics model that enabled interaction with robotic arms in a manner akin to collaborating with a human coworker. For instance, if an arm was presented with multiple sleeves of tennis balls, it could be instructed to sort them into designated areas. RFM-1 demonstrated an ability to anticipate challenges, such as potential grip failures, and proactively seek guidance on the optimal suction cups to employ.
While such sophisticated interaction had been demonstrated in experimental settings, Covariant’s achievement lay in its scalable deployment. The company established a network of cameras and data collection devices across its customer sites, feeding an ever-increasing stream of data to refine the AI model.
Despite its advancements, the system was not without its imperfections. In a demonstration in March 2024 involving a variety of kitchen items, the robot struggled when tasked with returning a banana to its original location. It initially picked up a sponge, then an apple, and a series of other objects before eventually completing the task correctly. Cofounder Peter Chen acknowledged this limitation, stating that the robot "doesn’t understand the new concept" of retracing steps. He further elaborated that such scenarios highlight areas where the model may not perform optimally in environments lacking comprehensive training data.
The expertise of Chen and fellow founder Pieter Abbeel was subsequently recognized with their hiring by Amazon, a company that is currently licensing Covariant’s robotics model. Amazon operates an extensive network of warehouses in the United States, underscoring the potential impact of this technology on large-scale logistics operations.
The Rise of Humanoid Robots
Agility Robotics’ Digit: Bridging the Gap to Humanoid Utility
The significant influx of investment into robotics startups is increasingly directed towards robots designed in a humanoid form. The rationale behind this focus is the potential for humanoids to seamlessly integrate into existing human workspaces and job roles, thereby eliminating the need for extensive and costly retooling of infrastructure to accommodate specialized robotic designs.

However, achieving this seamless integration remains a formidable challenge. In the limited instances where humanoids have appeared in real-world warehouse settings, they are often confined to controlled test zones or pilot programs.
Despite these hurdles, Agility Robotics’ humanoid, Digit, is demonstrating tangible progress in real-world applications. Its design prioritizes functionality over aesthetic fidelity to human appearance, featuring exposed joints and a non-humanoid head. Major corporations such as Amazon, Toyota, and GXO (a logistics provider serving clients like Apple and Nike) have deployed Digit robots. This marks a significant milestone, positioning Digit as one of the first humanoid robots perceived by companies as delivering tangible cost savings rather than simply representing a novel technological curiosity. These robots are actively engaged in the repetitive yet essential tasks of picking, moving, and stacking shipping totes.
The current iteration of Digit, while functional, still falls short of the fully anthropomorphic helper envisioned by Silicon Valley. Its lifting capacity is limited to 35 pounds. Furthermore, advancements that enhance Digit’s strength often increase its battery weight, necessitating more frequent recharging cycles. Industry standards organizations are also advocating for stricter safety regulations for humanoids than for traditional industrial robots, given that humanoids are designed to move freely in close proximity to human workers.
Nevertheless, Digit serves as a compelling example of the diverse approaches driving the current robotics revolution. Agility Robotics employs simulation techniques, similar to those used by OpenAI for its robotic hand, and collaborates with Google’s Gemini models to enhance its robots’ adaptability to novel environments. This multifaceted approach, informed by over a decade of experimentation, has propelled the industry from theoretical concepts to tangible, large-scale implementation. The era of building big has definitively arrived.