Ben Mann Monthly Mar 2023

Safety blog post, Claude open access, Hackathon, sitting & eating

Ben Mann

Apr 09, 2023

Purpose

An index for my memory
A menu of topics for my next conversation with you
A faster way to share what I’m excited about without the barrier of writing a complete blog entry on it
A skimmable way to spread content I found valuable

Experiments and experiences

Hackathon
- On behalf of Anthropic, attended an LLM hackathon at a giant mansion in Hillsborough that calls itself AGI House. Provided participants with access to Claude. Alongside a few other speakers, including Elad Gil, Boris Power from OpenAI, and Hugging Face's CEO, I gave a talk emphasizing the importance of AI safety work. Missed the demos because I had to go home to be with Euda. This technology is changing so fast that everything built at the hackathon will probably be obsolete within 6-12 months. Talked with many participants about Anthropic's theory of change and where I think the industry is going.
Dad life month 5
- Euda continues to gain new capabilities! She's sitting up unsupported, eating solid food of all kinds, and has slept up to 8 hours at a stretch on her own! I'm starting to find interacting with her more rewarding since she's more responsive and I can see her figuring things out. And perhaps more importantly, her default now is not crying, but instead looking around, trying to roll, and grasping objects to shove in her mouth.

Life updates

🚀 Launched Claude in Slack
👨‍👩‍👧‍👦 Sister moved to SF
🥵 Stressful time at work, but getting better

Content

5-point Likert ratings for “I would recommend this content to a friend”, sorted from highest to lowest

Anthropic's core views on safety 5/5
- A cogent and humble representation of Anthropic's empirical AI safety agenda, with forecasts around timelines and how our different safety bets address different possible future scenarios. Super excited to share this since until now Anthropic's theory of change has been more opaque than I'd like. Summary from Claude follows:
1. Anthropic believes AI could have an impact comparable to major revolutions like the industrial revolution, possibly within the next decade. While this view is speculative, the evidence from trends in compute and algorithmic progress suggests it's plausible enough to warrant serious safety planning.
2. There are two major sources of risk from advanced AI:
  - The technical alignment problem: We don't know how to build AI systems that are robustly helpful, harmless, and honest. Powerful but misaligned AI could be an existential threat.
  - Societal disruption: The rapid development of advanced AI will disrupt economies, jobs, and geopolitics in ways that could cause harm even if we solve the technical alignment problem.
3. Anthropic is taking an empirically driven approach to AI safety focused on developing techniques to align AI systems and better understand their behaviors and capabilities. Their research areas include:
  - Mechanistic interpretability: Reverse engineering neural networks to gain insight into how they work.
  - Scalable oversight: Developing ways for AI systems to assist in their own oversight and training at scale. Examples include Constitutional AI and debate.
  - Process-oriented learning: Training AI systems to follow safe processes rather than just achieve outcomes. This could make them more transparent, controllable, and less prone to undesirable behaviors.
  - Understanding generalization: Gaining a better understanding of how large language models learn and apply knowledge in new contexts. This could help anticipate and address harmful emergent behaviors.
  - Testing for failure modes: Probing smaller models for dangerous behaviors like deception to better understand how they might arise in more advanced systems.
  - Evaluating societal impacts: Studying how AI systems are used and how they might impact society to help guide policy and research.
4. Anthropic's goal is to develop a "portfolio" of safety techniques applicable across a range of scenarios, from optimistic cases where safety is easy to achieve to pessimistic ones where it may be very difficult or impossible. Their approach will adapt based on what they discover about which scenario we're in.
Having kids 5/5
- One of the best summaries of the biases non-parents hold about becoming parents, what it feels like once you get past those biases, and how having kids changes your life. Claude summary follows:
  - The author was apprehensive about having children before becoming a parent. He saw parents as uncool and dull, and children as troublesome.
  - After having children, his views changed. He felt an instant protective instinct towards his kids and all children. While some of this shift was biological, not all of it was.
  - Some of his preconceptions about parenting were wrong. He had only noticed badly behaved children and stressed parents before, missing the joyful moments. His own childhood misbehavior also gave him the wrong impression. In reality, parenting also includes peaceful, fun moments.
  - Parenting is difficult at times, but also rewarding. His children became his friends. While having kids made him less productive and ambitious, he found ways to work around this and continue to achieve his goals.
  - Though having kids meant losing some freedom and spontaneity, he rarely made use of this freedom before becoming a parent. He now experiences more happiness and joyful moments as a parent than before.
  - His experience of parenting has been positive overall, though he acknowledges that experiences vary widely for different people. Many of the worries he had before kids are likely common, as is the happiness children can bring.
Sing 2 4/5
- I found it hard to believe the 98% audience score on Rotten Tomatoes, but it was really good! Despite the clichéd plot and characters, the execution was charming, and the world building was fun. Haven't seen the first one, but didn't feel like I missed much.
Ted Lasso S3E1-3 4/5
- I don't like football or sitcoms, but this show keeps getting better and better. The characters have real, hard struggles; the puns are terrible; and overall it's a masterclass in how to be a good person even if the people around you aren't.
- I particularly relate to Ted trying to take too much emotional labor on himself and struggling to be direct with the people who hurt him. When he breaks through, it inspires me to want to break through too!
The Last of Us S1E9 2/5
- Oof, what a dark ending. Joel sacrifices the chance at a cure for the sake of his relationship with a surrogate daughter! In his defense, maybe there's a nondestructive way to get the cure out, and/or maybe the destructive method wouldn't have worked.
- People sure love their dark apocalyptic dystopia content these days...