Signal #120975POSITIVE

My research agenda and work

100

This is a summary of the work I've done and work I plan to do, and the theories of change and AI progress that motivate my work. I've been working full-time on alignment for three years and change, and thinking about brainlike AGI and its alignment increasingly often since 2004. Here's the research agenda in one breath: I'm trying to predict what the first transformative AI will be, in enough mechanistic detail that we can predict likely failure modes of its alignment. That's in service of finding interventions that address those failure modes efficiently, so that they can realistically be implemented even if timelines are short and work is rushed. I'm using my background in computational cognitive neuroscience to predict what might be called loosely brainlike AGI: LLMs with added human-like cognitive capacities. I'll give a summary in the rest of this section, then give a little more depth on each major thread of my work in the remaining sections. All of it is pretty brief.Approach an...

AI Alignment Forumabout 3 hours ago
Read Full Article

Explore with AI-Powered Tools

View All Signals

Explore more AI intelligence

Want to discover more AI signals like this?

Explore Steek
My research agenda and work | Steek AI Signal | Steek