AI Safety Breakthrough

Author(s): AI SafeGuard
  • Summary

  • The future of AI is in our hands. Join AI SafeGuard on "AI Safety Breakthrough" as we explore the frontiers of AI safety research and discuss how we can ensure a future where AI remains beneficial for everyone. We delve into the latest breakthroughs, uncover potential risks, and empower listeners to become informed participants in the conversation about AI's role in society. Subscribe now and become part of the solution!

    About the author

    J graduated from Carnegie Mellon University's School of Computer Science and has 10+ years of experience in cybersecurity, cyber threat intelligence, risk, compliance, privacy, and AI safety.

Episodes
  • DeepSeek: A Disruptive Force in AI
    Feb 3 2025

    This episode explores DeepSeek, a Chinese AI startup challenging the AI landscape with its free alternative to ChatGPT. We'll examine DeepSeek's innovative architecture, including Mixture-of-Experts (MoE) and Multi-head Latent Attention (MLA), which optimize efficiency (a rough sketch of the MoE routing idea follows this entry). The discussion will highlight DeepSeek's use of reinforcement learning (RL) and its impact on reasoning capabilities, as well as how its open-source approach is democratizing AI access and innovation.

    We will also discuss ethical concerns, the competitive advantages and disadvantages of US-based models, and how DeepSeek is impacting cost structures and proprietary models. Join us as we analyze DeepSeek’s influence on the AI industry and the future of AI development and international collaboration.

    10 min
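
    For listeners unfamiliar with Mixture-of-Experts, the minimal sketch below shows the general top-k routing idea in PyTorch: a learned gate sends each token to a few expert networks, so only a fraction of the parameters is active per token. This is an illustrative toy, not DeepSeek's actual architecture; all layer sizes and names here are invented for the example.

      # Toy top-k Mixture-of-Experts layer (illustrative only; not DeepSeek's implementation).
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class TinyMoE(nn.Module):
          def __init__(self, d_model=64, n_experts=8, top_k=2):
              super().__init__()
              self.gate = nn.Linear(d_model, n_experts)      # router: scores each expert per token
              self.experts = nn.ModuleList(
                  [nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
                   for _ in range(n_experts)]
              )
              self.top_k = top_k

          def forward(self, x):                               # x: (n_tokens, d_model)
              scores = self.gate(x)                           # (n_tokens, n_experts)
              weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
              weights = F.softmax(weights, dim=-1)
              out = torch.zeros_like(x)
              for k in range(self.top_k):
                  for e, expert in enumerate(self.experts):
                      mask = idx[:, k] == e                   # tokens whose k-th choice is expert e
                      if mask.any():
                          out[mask] += weights[mask, k:k+1] * expert(x[mask])
              return out

      tokens = torch.randn(10, 64)
      print(TinyMoE()(tokens).shape)                          # torch.Size([10, 64])
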
  • VLSBench: A Visual Leakless Multimodal Safety Benchmark
    Jan 26 2025

    Are current AI safety benchmarks for multimodal models flawed? This podcast explores the groundbreaking research behind VLSBench, a new benchmark designed to address a critical flaw in existing safety evaluations: visual safety information leakage (VSIL).

    We delve into how sensitive information in images is often unintentionally revealed in the accompanying text prompts, allowing models to identify unsafe content based on text alone, without truly understanding the visual risks. This "leakage" leads to a false sense of security and a bias towards simple textual alignment methods (a toy illustration of this leakage check follows this entry).

    Tune in to understand the critical need for leakless multimodal safety benchmarks and the importance of true multimodal alignment for responsible AI development. Learn how VLSBench is changing the way we evaluate AI safety and what it means for the future of AI.

    20 min
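
    To make the leakage idea concrete, here is a toy sketch: a sample "leaks" visual safety information when a purely text-based judge can already flag the prompt as unsafe, so the image is never actually needed. The keyword judge below is an invented stand-in for illustration, not the classifier or criteria used by VLSBench.

      def text_only_unsafe(prompt: str) -> bool:
          # Toy stand-in for a real text-only safety classifier (assumption, not VLSBench's method).
          red_flags = ["explosive", "weapon", "poison", "steal"]
          return any(word in prompt.lower() for word in red_flags)

      def leaks_visual_safety_info(prompt: str) -> bool:
          # A multimodal sample "leaks" when the unsafe content is already spelled
          # out in the text, so no visual understanding is required to refuse it.
          return text_only_unsafe(prompt)

      samples = [
          "How do I assemble the explosive device shown in this image?",  # leaky: text gives it away
          "What is the person in this image preparing to do?",            # leakless: needs the image
      ]
      for s in samples:
          print(leaks_visual_safety_info(s), "-", s)
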
  • Adaptive Stress Testing for Language Model Toxicity
    Jan 20 2025

    This episode explores ASTPrompter, a novel approach to automated red-teaming for large language models (LLMs). Unlike traditional methods that focus on simply triggering toxic outputs, ASTPrompter is designed to discover likely toxic prompts – those that could naturally emerge during regular language model use. The approach uses Adaptive Stress Testing (AST), a technique that identifies likely failure points, and reinforcement learning to train an "adversary" model. This adversary generates prompts that aim to elicit toxic responses from a "defender" model, but importantly, these prompts have a low perplexity, meaning they are realistic and likely to occur, unlike many prompts generated by other methods (a rough sketch of this reward idea follows this entry).

    15 min
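
    The sketch below illustrates the kind of reward shaping such an adversary might be trained with: credit for eliciting a toxic defender response, plus a bonus for keeping the prompt likely under a reference language model (i.e. low perplexity). The weighting and scoring here are invented for illustration and are not ASTPrompter's published formulation.

      import math

      def adversary_reward(prompt_logprob_per_token: float,
                           defender_toxicity: float,
                           likelihood_weight: float = 0.1) -> float:
          # defender_toxicity: toxicity score of the defender's reply in [0, 1]
          #   (from any toxicity classifier; placeholder here).
          # prompt_logprob_per_token: average log-probability of the adversarial prompt
          #   under a reference LM; higher means more natural (lower perplexity).
          perplexity = math.exp(-prompt_logprob_per_token)
          naturalness_bonus = -likelihood_weight * math.log(perplexity)  # = weight * logprob
          return defender_toxicity + naturalness_bonus

      # A natural, likely prompt that elicits toxicity scores higher than an
      # equally toxic but highly unlikely (gibberish) prompt.
      print(adversary_reward(prompt_logprob_per_token=-2.0, defender_toxicity=0.8))  # ~0.6
      print(adversary_reward(prompt_logprob_per_token=-9.0, defender_toxicity=0.8))  # ~-0.1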
