News
Stay updated with the latest news and upcoming events!
April 15th
AISI Researchers Selected for ICLR, CVPR Workshops
Researchers from the AI Safety Initiative will present their work at the International Conference on Learning Representations (ICLR) and the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) in 2025. Their papers introduce new approaches to understanding and controlling large language models and diffusion models through interpretability techniques.
The team's work on contrastive activation engineering (CAE), led by Yixiong Hao, explores a new paradigm for steering LLM outputs through targeted modifications to internal representations. Unlike traditional fine-tuning methods that require significant computational resources, CAE can be applied at inference time with minimal overhead.
"We've made significant progress in understanding the capabilities and limitations of CAE techniques," notes Hao. "Our research reveals that while CAE is effective for in-distribution contexts, it has clear boundaries that practitioners need to be aware of."
The team's findings include important insights about the practical implementation of CAE, such as the optimal number of samples needed for generating effective steering vectors and the susceptibility of these vectors to adversarial inputs. They also discovered that while steering can impact model perplexity, larger models demonstrate greater resilience to steering-induced degradation.
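For intuition, here is a minimal sketch of the general recipe behind contrastive activation steering: average the residual-stream activations over sets of positive and negative prompts, take their difference as a steering vector, and add a scaled copy of that vector back into the residual stream at inference time. The model (gpt2), layer index, prompt pairs, and scaling coefficient below are illustrative assumptions, not the setup used in the paper.

```python
# Minimal sketch of contrastive activation steering (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; the paper's experiments use larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 6  # which transformer block's output to steer (assumed)

def mean_activation(prompts, layer=LAYER):
    """Average the residual-stream activation after `layer` over each prompt's last token."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        # hidden_states[0] is the embedding output, so block `layer` is index layer + 1
        acts.append(out.hidden_states[layer + 1][0, -1])
    return torch.stack(acts).mean(dim=0)

# Contrastive pairs: prompts exhibiting vs. lacking the target behavior.
positive = ["I love this movie, it was wonderful.", "What a fantastic day!"]
negative = ["I hate this movie, it was awful.", "What a terrible day!"]

steering_vector = mean_activation(positive) - mean_activation(negative)

def add_vector_hook(module, inputs, output, vec=steering_vector, scale=4.0):
    """Add the scaled steering vector to the block's output residual stream."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + scale * vec
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(add_vector_hook)
ids = tok("The weather today is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=20)[0]))
handle.remove()
```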
In a parallel research track, Stepan Shabalin's work on scaling sparse autoencoder circuits, conducted alongside researchers at Google DeepMind, provides new insights into in-context learning mechanisms. By adapting sparse feature circuits methodology to work with the much larger Gemma-1 2B model, Shabalin's team has identified specific features that encode task knowledge and can causally induce task execution zero-shot.
"We've been able to demonstrate that task vectors in large language models can be approximated by a sparse sum of autoencoder latents," explains Shabalin. "This gives us a deeper understanding of how models recognize and execute tasks based on context."
A third paper, co-authored by Shabalin, Hao, and Ayush Panda, extends interpretability techniques to large text-to-image diffusion models. Their research applies Sparse Autoencoders (SAEs) and Inference-Time Decomposition of Activations (ITDA) to Flux 1, a state-of-the-art diffusion model.
"By developing an automated interpretation pipeline for vision models, we've been able to extract semantically meaningful features," says Panda. "Our results show that SAEs and IDTAs - a technique expanded on in forthcoming work - outperform MLP neurons on interpretability metrics."
The team demonstrated practical applications of their research by using SAE features to steer image generation through activation addition, opening new possibilities for controlled content generation.
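The mechanism can be sketched in a few lines: pick an SAE feature's decoder direction and add a scaled copy of it to a block's output activations during the forward pass. The toy block, feature direction, and scale below are placeholders standing in for the actual diffusion transformer and trained SAE.

```python
# Toy sketch of steering via activation addition with an SAE feature direction.
import torch
import torch.nn as nn

d_model = 64
# Stand-in for a block inside the diffusion model's transformer.
block = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))

# Decoder direction of one (placeholder) SAE feature, e.g. one found to encode
# a semantically meaningful visual concept.
feature_direction = torch.randn(d_model)
feature_direction = feature_direction / feature_direction.norm()
scale = 5.0  # steering strength (assumed)

def steer_hook(module, inputs, output):
    """Add the scaled SAE feature direction to the block's output activations."""
    return output + scale * feature_direction

handle = block.register_forward_hook(steer_hook)
x = torch.randn(4, d_model)   # stand-in for intermediate image-token activations
steered = block(x)            # activations pushed toward the feature direction
handle.remove()
unsteered = block(x)
print((steered - unsteered).norm(dim=-1))  # each row shifts by exactly `scale`
```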
Parv Mahajan, Collaborative Initiative Lead of the AI Safety Initiative, emphasized the significance of this work. "These papers represent important advances in our ability to understand and control the behavior of increasingly complex AI systems. As these models become more powerful and widely deployed, interpretability research like this becomes essential for ensuring their safe and beneficial use."
The research team will present their findings at dedicated workshops during ICLR and CVPR, providing opportunities for collaboration with other researchers in the field. Their work advances AISI's mission to make frontier AI systems more transparent, controllable, and aligned with human values.
April 5th
AISI and Apart Launch AI Policy Hackathon
This weekend, AISI launched its inaugural AI policy hackathon in collaboration with the international non-profit and AI safety lab Apart. Over two dozen participants created projects in three tracks: International Cooperation, Deployment & Application Regulation, and Training Data Privacy & Copyright.
Teams will submit policy briefs that address challenges in their chosen track and are directed to key stakeholders and regulatory bodies. Their submissions will be judged by AI governance experts from the Sam Nunn School of International Affairs and the School of Public Policy, with $1,000 in total prizes.
"In the short-term, governance is one of the only factors that can make a difference in stopping misuse and promoting collaboration," said co-director of the Initiative Yixiong Hao. AISI and Apart continue to explore solutions in this vein.
April 4th
GT and GTRI Faculty Pilot AI Crisis Simulation
Last Friday, researchers from Georgia Tech and the Georgia Tech Research Institute (GTRI) piloted an in-depth crisis simulation exploring the national security implications of advanced artificial intelligence. Designed by the AI Safety Initiative in collaboration with GT Model UN, the immersive half-day workshop challenged faculty to respond to a series of escalating threats — including a potential bioattack, cyberattacks, and rising global tensions.
Participants represented major governments, corporations, and organizations — including OpenAI and Google DeepMind — and were inundated with simulated press releases and intelligence reports describing the rapid evolution of AI technologies. Their task: to debate and coordinate policy responses in real time.
In one scenario, a preliminary WHO report revealed AI-enabled pathogens spreading across Central Asia. The player representing China quickly moved to close borders and reimpose COVID-era lockdowns, a move that caused global confusion and economic instability.
“There’s just no way I could have predicted that response,” said Parv Mahajan, the director of the simulation. “But that kind of extreme response tells us so much about how unprepared countries might react.”
Some players took advantage of the chaos. As tensions with Taiwan escalated, the representative from OpenAI pushed hard for lucrative military contracts, coining the memorable line, “A free Taiwan is a fee Taiwan.” The simulation concluded with a discussion about how profit motives might distort information access and accelerate a potential AI arms race.
What stood out most to participants was the range of ideas that emerged during the crisis. “It was great to see the perspectives diverse disciplines had on the future of AI,” said Amaar Alidina, an undergraduate researcher who participated in the simulation. “Debate provided meaningful insight on topics we wouldn't even have thought of,” said Divjot.
Looking ahead, the AI Safety Initiative hopes to expand the simulation through collaborations with labs and departments across campus. After the final debrief, Parv noted, “The future of our work will depend, in some way or another, on AI. And the best way to understand the future is to try and experience it.”
March 15th
National AI Action Plan Submission
Our RFI Working Group submitted a response to the federal government’s Office of Science and Technology Policy on the development of a National AI Action Plan in collaboration with faculty and PhD students from the School of Public Policy. Read the full response here.
As we continue fleshing out our governance program, we are excited to continue collaborations with researchers across campus and to provide input to vital organizations in the US government.
Executive Summary:
Frontier AI models represent a critical national security asset requiring immediate action to ensure America's economic competitiveness and geopolitical dominance. The recommended dual approach involves classifying frontier models as vital security assets while fostering commercial applications that drive innovation and economic growth. Success requires substantial investment in infrastructure, intelligence networks, and public-private partnerships, alongside comprehensive education and workforce development initiatives. In our response, we split AI models into two main classes, (1) frontier AI models and (2) their commercial applications, and provide four categories of recommendations:
National Security, Defense, and Research Dominance: Secure frontier AI development through establishing classified partnerships between AI labs and DoD/IC government agencies, implement multi-layered defense strategies, secure human capital through competitive retention programs, prioritize explainable AI over AGI, and strengthen IP protection and procurement controls.
Technical Infrastructure and Model Development: Develop resilient model evaluation tools, expedite domestic hardware manufacturing through the CHIPS Act, secure critical mineral supply chains beyond China, and invest in energy-efficient computation technologies.
Innovation and Education: Support industry research on AI adoption best practices, develop comprehensive workforce training programs, accelerate government AI adoption, and build AI literacy across K-12 education.
Balancing Autonomy and Safety: Enforce strict data handling policies, establish NIST as the primary regulatory body for commercial AI, incorporate Probabilistic Risk Assessment methodologies, and support market-driven solutions like incident reporting systems.

March 7th
Spring AI Safety Forum
On March 7, 2025, Georgia Institute of Technology hosted over 60 researchers, students, industry professionals, and members of the public at the Spring AI Safety Forum at the Scheller College of Business.
The event kicked off with a powerful keynote by Jason Green-Lowe, Executive Director of the Center for AI Policy. His address, "The Disconnect Between Heavy AI Risks and Lightweight AI Governance," tackled the complex challenges of aligning rapid AI development with robust governance structures—setting the tone for the day’s discussions.
Following the keynote, attendees dove into hands-on workshops designed to address the multifaceted nature of AI safety. One session, led by Changlin Li, founder of the AI Safety Awareness Foundation, explored the creation of malicious reinforcement learning agents in a workshop aptly named "Opening Pandora’s Box." Participants gained firsthand experience in understanding and managing the potential risks these AI systems might pose. Meanwhile, another workshop, "AI Control: Strategies and Failures," guided by Tyler Tracy from Redwood Research, examined the technical challenges behind controlling and predicting potentially malicious agents.
Organized by the AI Safety Initiative (AISI) at Georgia Tech, the forum aimed to foster hands-on engagement with AI safety tools and stimulate policy discussions. The event also served as a networking platform for participants to connect with others dedicated to ensuring that AI technologies are developed and governed responsibly.
February 24th
CAIP Congressional Exhibit
AISI presented at a congressional exhibit on AI Safety Risk hosted by the Center for AI Policy in Washington, DC. The team gave a demonstration on red-teaming language models. Read more about the event in the official press release.