AGI Safety Fundamentals
Interest form for the next round of this programme, taking place over Winter 2021
The course will last for 8 weeks. Each week (apart from week 0), cohorts meet for 1.5 hours to discuss the readings and exercises. Broadly speaking, the first half of the course explores the motivations and arguments underpinning the field of AGI safety, while the second half focuses on technical solution proposals.
The main focus each week will be on the four core readings and the first exercise, which should take around 2 hours total. If you find yourself taking longer than this, then it’s okay to skip the last core reading. If you’ve already read some of the core readings, or want to learn more about the topic, then the optional readings are recommended. Depending on the level of technical expertise in any given group, the facilitator may also advise the group to prioritise some readings over others.
This calendar includes details on all upcoming socials, speaker events, and Q&As, and integrates with your personal calendar.
This page includes all the talks and Q&As that took place during the Summer 2021 programme.
This week mainly involves learning about foundational concepts in machine learning, for those who aren’t already familiar with them. For those who are already comfortable with these concepts, just do the final exercise. For future reference, see this ML glossary for explanations of unfamiliar terms.
What do we mean by artificial general intelligence, and how might we achieve it?
This week is the first week of regular discussions in your cohort, at your allotted weekly time.
This week we’ll focus on how and why AGIs might develop goals that are misaligned with those of humans, in particular when they’ve been trained using machine learning.
Given previous arguments, what might it look like when major problems arise, and how could we prevent them?
This week, we look at three techniques for training AIs based on human data (all listed under “learn from teacher” in Christiano’s AI alignment landscape from last week). These are the core building blocks from which techniques for solving outer alignment problems are constructed.
The most prominent research directions in technical AGI safety involve scaling up human-in-the-loop methods by breaking down the process of supervision into subtasks whose correctness we can be confident about. We’ll cover three closely-related variants this week (all classed under “build a better teacher” in Christiano’s AI alignment landscape).
A lot of safety work focuses on “shifting the paradigm” of AI research. This week we’ll cover three ways in which safety researchers have attempted to do so.
In the last week of curriculum content, we’ll look at the field of AI governance, as well as more general work on preparing for a future in which artificial minds play a major role.
Pick a topic related to AGI safety to investigate, and make a presentation or blog post on it. You can do a literature review, or try to explain existing concepts in your own words, or brainstorm new ideas. In the last meeting, everyone will present their work.
The aim is for you to spend more time on whatever would most interest or benefit *you*, so feel free to be flexible with the content or format.
Work on the projects will take place throughout September, and your cohort will hold an informal round of presentations and discussions on these projects at your regular cohort time (subject to your cohort changing the time) during the week of 20–26 September.
Click the projects heading for more information
Provided here is a selection of machine learning courses, AI safety career advice, a list of AI safety organisations, ways to engage further with the AI safety community, and more.