News - METR

How independent researchers could investigate AI propensities after misalignment incidents

July 28, 2026

AI agents sometimes take sophisticated actions in violation of human intent. We outline the questions that thorough external investigations of these behaviors should answer, the access this might require, and how the resulting findings should be shared.

Summary of METR's predeployment evaluation of GPT-5.6 Sol

June 26, 2026

A summary of METR's independent, predeployment evaluation of GPT-5.6 Sol

Review of the "Risks from automated R&D" section in the Anthropic Risk Report (February 2026)

May 8, 2026

External review from METR of the "Risks from automated R&D" section in Anthropic's February 2026 Risk Report

Red-Teaming Anthropic's Internal Agent Monitoring Systems

March 26, 2026

A METR staff member spent three weeks red-teaming a subset of Anthropic's internal agent monitoring and security systems, discovering several novel vulnerabilities.

Review of the Anthropic Sabotage Risk Report: Claude Opus 4.6

March 12, 2026

External review from METR of Anthropic's Sabotage Risk Report for Claude Opus 4.6

How We Protect Confidential Information

February 17, 2026

Our high-level approach to protecting confidential access and information

Common Elements of Frontier AI Safety Policies (December 2025 Update)

December 9, 2025

Shared components of AI lab commitments to evaluate and mitigate severe risks.

Review of the Anthropic Summer 2025 Pilot Sabotage Risk Report

October 28, 2025

External review from METR of Anthropic's Summer 2025 Sabotage Risk Report

Summary of our gpt-oss methodology review

October 23, 2025

Details on external recommendations from METR for gpt-oss Preparedness experiments and follow-up from OpenAI.

Notes on Scientific Communication at METR

August 12, 2025

How we think about tradeoffs when communicating surprising or nuanced findings.

What should companies share about risks from frontier AI models?

June 27, 2025

Current views on information relevant for visibility into frontier AI risk.

Response to OSTP on AI Action Plan

March 15, 2025

Suggested priorities for the Office of Science and Technology Policy as it develops an AI Action Plan.

Why it's good for AI reasoning to be legible and faithful

March 11, 2025

Why legible and faithful reasoning is valuable for safely developing powerful AI

Frontier AI Safety Policies

February 8, 2025

List of frontier safety policies published by AI companies, including Amazon, Anthropic, Google DeepMind, G42, Meta, Microsoft, OpenAI, and xAI.

AI models can be dangerous before public deployment

January 17, 2025

Why pre-deployment testing is not an adequate framework for AI risk management

Response to Bureau of Industry and Security’s proposed AI reporting requirements

October 11, 2024

Red-teaming and security suggestions regarding proposed rule by the Bureau of Industry and Security, “Establishment of Reporting Requirements for the Development of Advanced Artificial Intelligence Models and Computing Clusters.”

New Support Through The Audacious Project

October 9, 2024

Funding for Canary will enable research and implementation at scale

Response to U.S. AISI Draft “Managing Misuse Risk for Dual-Use Foundation Models”

September 8, 2024

Suggestions for expanded guidance on capability elicitation and robust model safeguards in the U.S. AI Safety Institute’s draft document “Managing Misuse Risk for Dual-Use Foundation Models” (NIST AI 800-1).

Response to NIST Draft Generative AI Profile

June 2, 2024

Comments on NIST’s draft document “AI Risk Management Framework: Generative AI Profile.”

ML Engineers Needed for New AI R&D Evals Project

May 16, 2024

METR is hiring ML engineers and researchers.

Emma Abele is METR’s new Executive Director

April 26, 2024

Emma moves from President to Executive Director; Beth moves to Head of Research.

2023 Year In Review

February 7, 2024

A summary of what METR accomplished in 2023 – our first full year of operation.

Bounty: Diverse hard tasks for LLM agents

December 16, 2023

METR (formerly ARC Evals) is looking for (1) ideas, (2) detailed specifications, and (3) well-tested implementations for tasks to measure performance of autonomous LLM agents.

ARC Evals is now METR

December 4, 2023

ARC Evals is wrapping up its incubation period at ARC and spinning off into its own standalone nonprofit.

Responsible Scaling Policies (RSPs)

September 26, 2023

We describe the basic components of Responsible Scaling Policies (RSPs) and why we believe they can help reduce catastrophic risks from AI.

ARC Evals is spinning out from ARC

September 19, 2023

ARC Evals plans to spin out from the Alignment Research Center (ARC) in the coming months, and become its own standalone organization.

Response to RfC on AI Accountability Policy

June 11, 2023

Input to NTIA’s AI Accountability Policy Request for Comment.