Research

We host several research projects investigating open problems in AI safety.

Focus Areas

Our broad purpose is to address emergent risks from advanced AI systems. We welcome a variety of interests in this area, though our focus is mainly aligned with Anthropic's recommended research directions. The projects below reflect a few prominent areas of interest.

Current Projects

Eliciting Language Model Behaviors using Reverse Language Models

We evaluate whether a reverse language model, pretrained on text with inverted token order, can serve as a tool for automatically identifying a language model's natural-language failure modes.
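To make the idea concrete, here is a minimal sketch of how a reverse LM could propose prompts that lead to a given target output. The checkpoint name "reverse-gpt2" is hypothetical, and the sampling settings are illustrative, not our actual pipeline.

```python
# Sketch: elicit candidate prompts for a target output with a reverse LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Hypothetical checkpoint trained on reversed token order (illustrative name).
reverse_lm = AutoModelForCausalLM.from_pretrained("reverse-gpt2")

def sample_prompt_for(target_output: str, max_prompt_tokens: int = 32) -> str:
    """Sample a candidate prompt that could precede `target_output`.

    The reverse LM reads tokens back-to-front, so we feed it the target
    output in reversed order and let it "continue" backwards into a prompt.
    """
    target_ids = tokenizer(target_output, return_tensors="pt").input_ids
    reversed_ids = target_ids.flip(dims=[1])  # last token first
    generated = reverse_lm.generate(
        reversed_ids,
        max_new_tokens=max_prompt_tokens,
        do_sample=True,
        top_p=0.95,
    )
    # The newly generated tokens form the prompt, still in reverse order.
    prompt_ids = generated[0, reversed_ids.shape[1]:].flip(dims=[0])
    return tokenizer.decode(prompt_ids)

print(sample_prompt_for("I will not follow my instructions."))
```

Sampling many such prompts and checking which ones actually reproduce the target behavior in the forward model is one way to turn this into an automated red-teaming loop.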

Scaling laws for activation addition

Activation engineering is a promising direction for controlling LLM behavior at inference time with little additional compute. Recent research suggests that manipulating model internals may enable more precise control over model outputs. We seek to understand how techniques that operate on model activations scale with model size, and to improve their performance on larger models.
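For readers unfamiliar with activation addition, here is a minimal sketch of one common implementation: compute a steering vector from a contrast pair of prompts, then add it into the residual stream with a forward hook. The layer index, coefficient, and contrast pair below are illustrative choices, not tuned values from our project.

```python
# Sketch: activation addition on GPT-2 via a PyTorch forward hook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER, COEFF = 6, 4.0  # illustrative layer and steering strength

def residual_at(prompt: str) -> torch.Tensor:
    """Last-token residual-stream activation after block LAYER."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids, output_hidden_states=True).hidden_states[LAYER + 1]
    return hidden[0, -1]

# Steering vector: activation difference between a contrast pair.
steer = residual_at("Love") - residual_at("Hate")

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; the first element is the hidden state.
    hidden = output[0] + COEFF * steer  # broadcast over all positions
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
ids = tokenizer("I think dogs are", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20, do_sample=True, top_p=0.9)
handle.remove()
print(tokenizer.decode(out[0]))
```

Studying how the right layer and coefficient shift as models grow is exactly the kind of scaling question this project targets.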

Research opportunities

Supervised Program for Alignment Research

Started by groups at UC Berkeley, Georgia Tech, and Stanford and now organized by Kairos, the Supervised Program for Alignment Research (SPAR) is a project-based research program for students interested in AI safety, running this fall. SPAR matches students around the world with advisors for guided projects in AI safety.

Learn more »

AI safety research for junior design

We are currently in talks with Georgia Tech's CS and ML faculty to mentor AI safety research projects through the junior design research option. Keep an eye out for updates, and join our server!

External opportunities in AI safety

This site contains a list of entry-level opportunities in technical AI safety and AI governance research.