Measuring Progress Toward AGI: A Cognitive Framework

Google DeepMind has announced an initiative aimed at clarifying the path toward Artificial General Intelligence (AGI). The research organization today released a paper, "Measuring Progress Toward AGI: A Cognitive Taxonomy," alongside the launch of a Kaggle hackathon. This dual approach seeks to establish a standardized method for evaluating the cognitive capabilities of AI systems, a crucial step in understanding how close the field actually is to AGI.
The advent of AGI holds the potential to revolutionize human endeavors, promising accelerated scientific discovery and novel solutions to humanity’s most complex challenges. However, the intangible nature of "general intelligence" has made it exceedingly difficult to quantify progress. The lack of empirical tools for objective evaluation has left researchers and the public alike with a nebulous understanding of how far we have come and how far we have yet to go. Google DeepMind posits that cognitive science, with its rich history of studying the human mind, offers a vital framework for building these much-needed evaluation metrics.
A Cognitive Taxonomy for AI
The core of Google DeepMind’s contribution lies in its newly published paper, which meticulously outlines a cognitive taxonomy. This taxonomy is not an arbitrary construct but is deeply rooted in decades of interdisciplinary research spanning psychology, neuroscience, and cognitive science. It identifies ten fundamental cognitive abilities that are hypothesized to be critical for the development of truly general intelligence in artificial systems. These abilities are envisioned as the building blocks of a sophisticated, adaptable, and human-like intelligence.

While the paper details the theoretical underpinnings of this taxonomy, the immediate practical application is being driven by the Kaggle hackathon. This collaborative competition, open to the global research community, invites participants to design and develop concrete evaluations for these cognitive abilities. The focus is particularly sharp on five areas where the current evaluation gap is most pronounced: learning, metacognition, attention, executive functions, and social cognition. By crowdsourcing the development of these evaluations, Google DeepMind aims to accelerate the creation of robust and reliable benchmarks.
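To make the idea of a concrete evaluation more tangible, the sketch below scores a toy metacognition test using expected calibration error, a standard measure of whether a system's stated confidence tracks its actual accuracy. The response data and the choice of metric are illustrative assumptions, not part of the hackathon specification.

```python
# Illustrative sketch: a toy metacognition evaluation measuring how well a
# model's stated confidence matches its actual accuracy (calibration).
# The sample responses and scoring scheme are hypothetical, not taken from
# the hackathon rules.

def expected_calibration_error(responses, n_bins=5):
    """responses: list of (confidence in [0, 1], correct as bool)."""
    bins = [[] for _ in range(n_bins)]
    for confidence, correct in responses:
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))
    total = len(responses)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's confidence/accuracy gap by its share of responses.
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece

# Hypothetical answers: high confidence when right, low when wrong.
sample = [(0.9, True), (0.9, True), (0.8, True), (0.3, False), (0.2, False)]
print(round(expected_calibration_error(sample), 3))
```

A well-calibrated system scores near zero; overconfidence inflates the error. Real submissions would pair a metric like this with carefully designed task items and human baseline data.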
From Theory to Practice: The Kaggle Hackathon
The "Measuring progress toward AGI: Cognitive abilities" hackathon, hosted on Kaggle, represents a significant step in operationalizing the cognitive framework. It encourages participants to move beyond theoretical discussions and engage in the hands-on creation of evaluation tools. The hackathon leverages Kaggle’s newly introduced Community Benchmarks platform, a feature designed to facilitate the testing and refinement of AI evaluations against a spectrum of leading AI models. This ensures that the developed evaluations are not only theoretically sound but also practically applicable to the current landscape of advanced AI.
The initiative is backed by a prize pool of $200,000, underscoring the importance Google DeepMind places on this endeavor. The award structure is designed to incentivize both specialized expertise and overall excellence: $10,000 awards will go to the top two submissions in each of the five cognitive-ability tracks, and four grand prizes of $25,000 each will recognize the best overall submissions, those demonstrating exceptional ingenuity and impact. The submission window opened on March 17 and will close on April 16, with results slated for announcement on June 1. Interested participants are encouraged to visit the Kaggle website to begin developing their evaluation designs.
The Three-Stage Evaluation Protocol

To facilitate a comprehensive assessment of AI systems against the proposed cognitive abilities, Google DeepMind advocates for a three-stage evaluation protocol. This protocol is designed to benchmark AI performance not in isolation, but in direct comparison to human capabilities. By establishing a clear point of reference, the framework aims to provide a more nuanced and accurate understanding of an AI system’s general intelligence.
The first stage involves defining and understanding the core cognitive abilities. The second stage focuses on developing specific tasks and metrics that can effectively measure these abilities in AI systems. The third stage, and perhaps the most critical for tracking progress, involves comparing the AI’s performance on these tasks against established human baselines. This comparative approach allows for a more precise charting of advancements and identifies areas where AI is approaching, or even surpassing, human cognitive performance.
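The comparative third stage can be sketched in a few lines. Assuming per-ability scores for a model and published human baselines (all numbers below are hypothetical placeholders), the comparison reduces to a ratio per cognitive ability:

```python
# Illustrative sketch of stage three: comparing AI task scores against human
# baselines per cognitive ability. All scores are hypothetical placeholders.

human_baselines = {"attention": 0.92, "metacognition": 0.85, "learning": 0.88}
model_scores = {"attention": 0.74, "metacognition": 0.51, "learning": 0.90}

def compare_to_humans(model, humans):
    """Return each ability's model score as a fraction of the human baseline."""
    return {
        ability: round(model[ability] / humans[ability], 2)
        for ability in humans
    }

report = compare_to_humans(model_scores, human_baselines)
for ability, ratio in sorted(report.items(), key=lambda kv: kv[1]):
    status = "surpasses human baseline" if ratio > 1.0 else "below human baseline"
    print(f"{ability}: {ratio:.2f} of human performance ({status})")
```

A ratio above 1.0 flags an ability where the model has matched or exceeded the human reference point; ratios well below 1.0 highlight where the evaluation gap, and the research opportunity, is largest.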
Background and Context: The Quest for AGI
The pursuit of Artificial General Intelligence has been a long-standing ambition within the field of artificial intelligence. Unlike narrow AI, which is designed to perform specific tasks (e.g., image recognition, language translation), AGI refers to AI with the ability to understand, learn, and apply knowledge across a wide range of tasks at a level comparable to, or exceeding, human cognitive abilities.
The concept of AGI gained significant traction in the mid-20th century with pioneers like Alan Turing, who proposed the Turing Test as a measure of machine intelligence. Over the decades, significant advancements in computing power, algorithms, and data availability have propelled AI capabilities forward. However, the leap from sophisticated narrow AI to true AGI remains a formidable challenge. Researchers have grappled with fundamental questions about consciousness, reasoning, creativity, and common sense, all of which are considered hallmarks of general intelligence.

Various research institutions and companies worldwide are actively engaged in AGI research. However, a lack of standardized metrics has often led to fragmented progress and differing interpretations of what constitutes "general intelligence." This has fueled the need for a more unified and scientifically grounded approach to measurement. Google DeepMind’s cognitive taxonomy and hackathon are a direct response to this critical need.
Broader Impact and Implications
The implications of this initiative extend far beyond the immediate research community. A robust framework for measuring AGI progress could have profound effects on several fronts:
- Research Direction: By providing clear benchmarks, the framework can help guide AI research efforts, allowing scientists to focus on developing capabilities that are demonstrably lacking or lagging. This could lead to more efficient and targeted research, accelerating the overall pace of AGI development.
- Safety and Ethics: As AI systems become more capable, ensuring their safety and ethical alignment becomes paramount. A standardized evaluation framework can help identify potential risks and biases early in the development process, enabling the proactive implementation of safety measures. It also allows for a more informed public discourse about the capabilities and limitations of advanced AI.
- Economic and Societal Impact: The development of AGI is widely expected to have transformative economic and societal consequences, from automation and job displacement to entirely new industries and scientific breakthroughs. A clearer understanding of the timeline and nature of AGI development can help policymakers, businesses, and individuals prepare for these changes.
- Public Understanding and Trust: Demystifying AGI through clear, measurable progress can foster greater public understanding and trust in AI technologies. When progress is transparent and grounded in scientific evaluation, it can help alleviate public anxieties and promote informed dialogue about the future of AI.
The partnership with Kaggle is particularly noteworthy. Kaggle, a platform renowned for its data science competitions and community, provides an ideal environment for fostering collaboration and innovation. By opening this challenge to a global audience, Google DeepMind is not only seeking diverse perspectives and solutions but also democratizing the effort to understand and build AGI.
Looking Ahead

The launch of the "Measuring Progress Toward AGI: A Cognitive Taxonomy" paper and the accompanying Kaggle hackathon marks a significant milestone in the quest for Artificial General Intelligence. By grounding the evaluation of AI in the principles of cognitive science and fostering a collaborative, community-driven approach to developing practical benchmarks, Google DeepMind is paving the way for a more transparent, measurable, and ultimately more predictable journey toward this ambitious goal. The outcomes of this initiative will shape the direction of AI research and its impact on society for years to come.
