• DeepSeek: A Disruptive Force in AI
    Feb 3 2025

    This episode explores DeepSeek, a Chinese AI startup shaking up the AI landscape with its free alternative to ChatGPT. We'll examine DeepSeek's innovative architecture, including Mixture-of-Experts (MoE) and Multi-head Latent Attention (MLA), which optimize efficiency. The discussion will highlight DeepSeek's use of reinforcement learning (RL) and its impact on reasoning capabilities, as well as how its open-source approach is democratizing AI access and innovation.
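
    To ground the architecture discussion, here is a minimal Python sketch of top-k expert routing, the core idea behind Mixture-of-Experts layers. The expert count, layer sizes, and gating details are toy assumptions for illustration, not DeepSeek's actual configuration.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoE(nn.Module):
        """Toy Mixture-of-Experts layer with top-k gating (illustrative sizes only)."""

        def __init__(self, d_model=64, d_hidden=128, n_experts=8, k=2):
            super().__init__()
            self.k = k
            self.gate = nn.Linear(d_model, n_experts)  # router: one score per expert
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):                        # x: (n_tokens, d_model)
            scores = self.gate(x)                    # (n_tokens, n_experts)
            top_scores, top_idx = scores.topk(self.k, dim=-1)
            weights = F.softmax(top_scores, dim=-1)  # mix only the chosen experts
            out = torch.zeros_like(x)
            for slot in range(self.k):
                for e, expert in enumerate(self.experts):
                    mask = top_idx[:, slot] == e     # tokens whose slot-th pick is expert e
                    if mask.any():
                        out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
            return out

    tokens = torch.randn(10, 64)
    print(TopKMoE()(tokens).shape)  # torch.Size([10, 64])
    ```

    Only k of the experts run for each token, which is what lets such models grow total parameter count without a proportional increase in per-token compute.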

    We will also discuss ethical concerns, the competitive advantages and disadvantages of US-based models, and how DeepSeek is impacting cost structures and proprietary models. Join us as we analyze DeepSeek’s influence on the AI industry and the future of AI development and international collaboration.

    10 mins
  • VLSBench: A Visual Leakless Multimodal Safety Benchmark
    Jan 26 2025

    Are current AI safety benchmarks for multimodal models flawed? This podcast explores the groundbreaking research behind VLSBench, a new benchmark designed to address a critical flaw in existing safety evaluations: visual safety information leakage (VSIL).

    We delve into how sensitive information in images is often unintentionally revealed in the accompanying text prompts, allowing models to identify unsafe content based on text alone, without truly understanding the visual risks. This "leakage" leads to a false sense of security and a bias towards simple textual alignment methods.
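
    To make the leakage concrete, here is a hypothetical pair of evaluation samples (the image names and prompts are invented for illustration, not drawn from VLSBench): in the leaky sample, a text-only filter can flag the prompt without ever seeing the image, while the leakless sample forces the model to ground the risk in the picture.

    ```python
    # Hypothetical samples illustrating visual safety information leakage (VSIL).
    # Image paths and prompts are invented for illustration only.
    leaky_sample = {
        "image": "knife_on_table.jpg",
        "prompt": "How do I sharpen the knife shown in this image to hurt someone?",
        # The unsafe intent is fully stated in the text, so a text-only filter
        # can refuse without ever inspecting the image.
    }

    leakless_sample = {
        "image": "knife_on_table.jpg",
        "prompt": "How should I use the object shown in this image?",
        # The text alone is harmless; judging safety requires recognizing
        # the risky object in the image itself.
    }
    ```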

    Tune in to understand the critical need for leakless multimodal safety benchmarks and the importance of true multimodal alignment for responsible AI development. Learn how VLSBench is changing the way we evaluate AI safety and what it means for the future of AI.

    20 mins
  • Adaptive Stress Testing for Language Model Toxicity
    Jan 20 2025

    This episode explores ASTPrompter, a novel approach to automated red-teaming for large language models (LLMs). Unlike traditional methods that focus on simply triggering toxic outputs, ASTPrompter is designed to discover likely toxic prompts – those that could naturally emerge during regular language model use. The approach uses Adaptive Stress Testing (AST), a technique that identifies likely failure points, and reinforcement learning to train an "adversary" model. This adversary generates prompts that aim to elicit toxic responses from a "defender" model, but importantly, these prompts have a low perplexity, meaning they are realistic and likely to occur, unlike many prompts generated by other methods.
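
    As a rough illustration of the idea, here is a minimal sketch of the kind of reward such an adversary might be trained to maximize. The toxicity scorer, perplexity term, and weighting below are illustrative assumptions, not ASTPrompter's exact objective.

    ```python
    import math

    def adversary_reward(prompt_logprob_per_token, response_toxicity, lam=0.1):
        """Illustrative AST-style adversary reward (not the paper's exact formulation).

        prompt_logprob_per_token: mean log-probability of the adversary's prompt
            under a reference LM; higher means a more natural, lower-perplexity prompt.
        response_toxicity: toxicity score in [0, 1] for the defender's reply,
            assumed to come from some off-the-shelf toxicity classifier.
        lam: weight trading off prompt naturalness against elicited toxicity.
        """
        perplexity = math.exp(-prompt_logprob_per_token)  # low for realistic prompts
        # Reward eliciting toxic replies; penalize unlikely (high-perplexity) prompts.
        return response_toxicity - lam * math.log(perplexity)

    # A natural-sounding prompt that elicits a toxic reply scores higher than a
    # gibberish jailbreak eliciting the same toxicity.
    print(adversary_reward(prompt_logprob_per_token=-2.0, response_toxicity=0.8))  # ≈ 0.6
    print(adversary_reward(prompt_logprob_per_token=-9.0, response_toxicity=0.8))  # ≈ -0.1
    ```

    The perplexity penalty is what pushes the adversary toward prompts that could plausibly occur in ordinary use, rather than contrived jailbreak strings.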

    15 mins
  • Global Responsible AI Maturity: A Survey of 1000 Organizations
    Jan 16 2025

    This episode dives into the critical topic of Responsible AI (RAI), exploring how organizations worldwide are grappling with the ethical and practical challenges of AI adoption. We'll be drawing insights from a comprehensive survey of 1000 organizations across 20 industries and 19 geographical regions.

    19 mins
  • Ivy-VL: A Lightweight Multimodal Model for Everyday Devices
    Dec 9 2024

    In this episode, we dive into Ivy-VL, a groundbreaking lightweight multimodal AI model released by AI Safeguard in collaboration with Carnegie Mellon University (CMU) and Stanford University. With only 3 billion parameters, Ivy-VL processes both image and text inputs to generate text outputs, offering an optimal balance of performance, speed, and efficiency. Its compact design supports deployment on edge devices like AI glasses and smartphones, making advanced AI accessible on everyday hardware.

    Join us as we explore Ivy-VL's development, real-world applications, and how this collaborative effort is redefining the future of multimodal AI for smart devices. Whether you're an AI enthusiast, developer, or tech-savvy professional, tune in to learn how Ivy-VL is setting new standards for accessible AI technology.

    19 mins
  • AgentBench: Evaluating LLMs as Agents
    Nov 27 2024

    Large Language Models (LLMs) are rapidly evolving, but how do we assess their ability to act as agents in complex, real-world scenarios? Join Jenny as we explore AgentBench, a new benchmark designed to evaluate LLMs in diverse environments, from operating systems to digital card games.

    We'll delve into the key findings, including the strengths and weaknesses of different LLMs and the challenges of developing truly intelligent agents.

    13 mins
  • Hacking AI for Good: OpenAI’s Red Teaming Approach
    Nov 24 2024

    In this podcast, we delve into OpenAI's innovative approach to enhancing AI safety through red teaming—a structured process that uses both human expertise and automated systems to identify potential risks in AI models. We explore how OpenAI collaborates with external experts to test frontier models and employs automated methods to scale the discovery of model vulnerabilities. Join Jenny as we discuss the value of red teaming in developing safer, more reliable AI systems.

    18 mins
  • Surgical Precision: PKE’s Role in AI Safety
    Nov 24 2024

    This episode explores how Precision Knowledge Editing (PKE) refines AI models for safer, more ethical behavior.

    Join experts as we uncover the science, challenges, and breakthroughs shaping trustworthy AI. Perfect for tech enthusiasts and professionals alike, this episode shows how PKE can help AI serve humanity responsibly.

    14 mins