We study the mechanisms of mind — how minds reflect, remember, form beliefs, make sense of experience, and collaborate with others — and implement them as architectures that give AI the capacity to reason, align, and cooperate effectively.
We work from first principles, drawing on cognitive science and neuroscience, to study curiosity, metacognition, affective reasoning, theory of mind, generative memory, and self-directed learning, and we translate those principles into AI architectures. This work is essential for building AI that genuinely helps people and the world we live in.

Investigating how language models represent, store, and retrieve information over multiple timescales and contexts.

Studying self-awareness, confidence estimation, and reflective reasoning to enable models that understand their own thinking.

Exploring how models form, revise, and maintain beliefs in the face of uncertainty, evidence, and new information.

Building internal models of the world to support prediction, planning, and counterfactual reasoning.

Advancing methods for aligning models with human values, intent, and societal well-being.