AI Explained Official Podcast

Author(s): Philip - Host of AI Explained YT
  • Summary

  • Covering the biggest news of the century - the arrival of smarter-than-human AI. From the author of Simple Bench, which reveals the remaining gap between LLM and human reasoning. Hype-free, and the British accent is a freebie bonus.

    © 2025 AI Explained Official Podcast
Episodes
  • Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research
    Feb 3 2025

    12 hours ago Deep Research was unveiled, and I’ve tested it thoroughly, including vs Deepseek R1 with search, Gemini Deep Research and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark figures, my initial impression of the o3 model within, and much more.

    Deep Research:
    https://openai.com/index/introducing-deep-research/

    https://www.youtube.com/watch?v=YkCDVn3_wiw


    GAIA Bench: https://openreview.net/forum?id=fibxvahvs3

    https://openreview.net/pdf?id=fibxvahvs3

    CodeELO: https://arxiv.org/pdf/2501.01257

    CamelCamel: https://uk.camelcamelcamel.com/

    Deepseek R1 with search: https://chat.deepseek.com/

    https://arxiv.org/pdf/2501.12948

    HaluBench: https://arxiv.org/pdf/2407.08488


    Chapters:

    00:00 - Introduction

    01:06 - Powered by o3, Humanity’s Last Exam, GAIA

    03:55 - Simple Tests

    06:00 - Good News vs Deepseek R1 and Gemini Deep Research

    09:32 - Bad News on Hallucinations

    14:14 - What Can’t it Browse?

    14:42 - For Shopping?

    16:40 - Final thoughts



    19 min
  • o3-mini and the “AI War”
    Jan 31 2025

    o3-mini is here, and yes, I’ve read the paper in full, 2 hours after release, and even the post-launch Reddit AMA. Some epic details, like a FrontierMath score that made me double-take, a likely new Cursor favorite, bio risk expertise and a cost comparison with Deepseek R1. But does it perform on basic reasoning? Let’s find out. Plus, arguably the bigger story: the increasingly frenetic rhetoric coming out of the West, from Dario Amodei and Alexandr Wang (CEOs of Anthropic and Scale AI respectively) in particular. The last thing we need is an “AI War”.


    https://wandb.me/simple-bench


    (Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing


    Chapters:

    00:00 - Introduction

    00:45 - o3 mini

    05:11 - First impressions vs Deepseek R1

    07:21 - 10x Scale, o3-mini System Card, Amodei Essay, bitcoin wallets…

    12:40 - Simple Competition Finale

    13:03 - Clips and Final Thoughts on the “AI War”



    O3-mini: https://openai.com/index/openai-o3-mini/

    Paper: https://cdn.openai.com/o3-mini-system-card.pdf

    Amodei Essay: https://darioamodei.com/on-deepseek-and-export-controls?s=09

    FrontierMath wild stat: https://arxiv.org/pdf/2411.04872

    Sam Altman Channels Napoleon: https://x.com/sama/status/1883185690508488934

    Altman ‘pulls up releases’: https://x.com/sama/status/1884066337103962416

    “AI War” by Wang: https://scale.com/blog/win-the-ai-war

    Anthropic Original Views on Capabilities: https://www.anthropic.com/news/core-views-on-ai-safety

    AI Insider Cost Comparison: https://x.com/arankomatsuzaki/status/1884676245922934788

    Deepseek R1 Paper: https://arxiv.org/pdf/2501.12948

    R1, o3-mini Price Comparison: https://techcrunch.com/2025/01/31/openai-launches-o3-mini-its-latest-reasoning-model/

    Semianalysis on $1.3M Deepseek salaries, and them falling behind as ‘the time gap to match US capabilities increases’: https://semianalysis.com/2025/01/31/deepseek-debates/

    OpenAI Valuation: https://www.bloomberg.com/news/articles/2025-01-30/openai-in-talks-to-raise-funding-at-340-billion-value-wsj-says?srnd=phx-ai

    Wang Clip: https://x.com/tsarnick/status/1867700453494206883

    Amodei Clip: https://x.com/ai_ctrl/status/1884951111771001188

    https://simple-bench.com/



    15 min
  • Nothing Much Happens in AI, Then Everything Does All At Once
    Jan 24 2025

    When it rains, it pours. OpenAI Operator tested and reviewed, with full paper analysis. Perplexity Assistant is useful. Then Stargate: is it all smoke and mirrors? Strong rumours of an o3+ model from Anthropic. Then a full breakdown of Deepseek R1, and what its training method says about the state of AI. It’s not open source BTW. Plus Humanity’s Last Exam, and Hassabis accelerates his AGI timeline.

    00:00 - Introduction

    00:54 - OpenAI Operator

    04:53 - Perplexity Assistant

    05:15 - Stargate

    07:51 - Better than o3?

    08:25 - DeepSeek R1 Analysis

    12:12 - Training Secrets

    15:19 - No More Process Rewarding?

    19:01 - Hassabis Timeline Accelerates

    21:22 - Humanity’s Last Exam


    https://app.grayswan.ai/arena/chat/harmful-ai-assistant

    https://app.grayswan.ai/arena

    https://openai.com/index/computer-using-agent/

    System Prompt: https://github.com/wunderwuzzi23/scratch/blob/master/system_prompts/operator_system_prompt-2025-01-23.txt


    OpenAI Operator: https://operator.chatgpt.com/

    System Card: https://cdn.openai.com/operator_system_card.pdf


    There is No Plan: https://x.com/jeffclune/status/1882120726339318007


    Perplexity Assistant: https://x.com/perplexity_ai/status/1882466239123255686


    Stargate: https://openai.com/index/announcing-the-stargate-project/

    Labour goes to 0: https://moores.samaltman.com/

    Larry Ellison AI Surveillance: https://x.com/TheChiefNerd/status/1882042989184430332

    Amodei 1984: https://www.bloomberg.com/news/articles/2025-01-22/anthropic-ceo-says-openai-s-stargate-venture-seems-chaotic

    Microsoft Hesitate: https://www.theinformation.com/articles/why-sam-altman-joined-forces-with-larry-ellison-and-took-a-step-back-from-microsoft?rc=sy0ihq


    Dylan Patel o3+ for Anthropic: https://www.youtube.com/watch?v=7EH0VjM3dTk


    Deepseek R1: https://arxiv.org/pdf/2501.12948

    https://arxiv.org/pdf/2412.19437

    Diagram: https://pbs.twimg.com/media/GhyQsM6WQAE7W52?format=jpg&name=large

    https://simple-bench.com/

    Process: https://x.com/sama/status/1664018190840614912

    https://x.com/karpathy/status/1835561952258723930

    https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness/?s=09

    Demis Interview: https://www.youtube.com/watch?v=yr0GiSgUvPU

    Humanity’s Last Exam:

    https://agi.safe.ai/

    https://x.com/DanHendrycks/status/1882481730671857815

    https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html?s=09



    23 min
