
52 Weeks of Cloud

Author(s): Noah Gift

About this audio

A weekly podcast on technical topics related to cloud computing, including MLOps, LLMs, AWS, Azure, GCP, multi-cloud, and Kubernetes. 2021-2024 Pragmatic AI Labs
Épisodes
  • ELO Ratings Questions
    Sep 18 2025
    Key Argument
    • Thesis: Using ELO for AI agent evaluation = measuring noise
    • Problem: Wrong evaluators, wrong metrics, wrong assumptions
    • Solution: Quantitative assessment frameworks
    The Comparison (00:00-02:00)

    Chess ELO

    • FIDE arbiters: 120hr training
    • Binary outcome: win/loss
    • Test-retest: r=0.95
    • Cohen's κ=0.92

    AI Agent ELO

    • Random users: Google engineer? CS student? 10-year-old?
    • Undefined dimensions: accuracy? style? speed?
    • Test-retest: r=0.31 (coin flip)
    • Cohen's κ=0.42
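For reference, both systems above share the same rating mechanism: the standard Elo update, in which a rating moves in proportion to how surprising the result was. A minimal sketch (textbook formula, not code from the episode):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Standard Elo update for player A after one game.

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k controls how fast ratings move (32 is a common default).
    """
    # Expected score for A given the rating gap, on the logistic curve
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    return r_a + k * (score_a - expected_a)

# Two equally rated players: a win is worth k/2 = 16 points.
new_rating = elo_update(1500, 1500, 1.0)  # 1516.0
```

The episode's point is that this update is only as good as the outcomes fed into it: with noisy, ill-defined "wins" from untrained raters, the ratings track noise rather than skill.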
    Cognitive Bias Cascade (02:00-03:30)
    • Anchoring: 34% rating variance in first 3 seconds
    • Confirmation: 78% selective attention to preferred features
    • Dunning-Kruger: d=1.24 effect size
    • Result: Circular preferences (A>B>C>A)
    The Quantitative Alternative (03:30-05:00)

    Objective Metrics

    • McCabe complexity ≤20
    • Test coverage ≥80%
    • Big O notation comparison
    • Self-admitted technical debt
    • Reliability: r=0.91 vs r=0.42
    • Effect size: d=2.18
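The reliability figures quoted throughout (κ=0.92 vs κ=0.42) are Cohen's kappa, a chance-corrected agreement statistic between two raters. A minimal sketch of the computation, using made-up labels for illustration (not the episode's data):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # Fraction of items where the raters agree
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance from each rater's label frequencies
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[label] * c2[label] for label in c1) / n ** 2
    return (observed - expected) / (1 - expected)

# Two hypothetical raters scoring ten agent outputs
r1 = ["good", "good", "bad", "good", "bad", "good", "bad", "bad", "good", "good"]
r2 = ["good", "bad", "bad", "good", "bad", "good", "good", "bad", "good", "good"]
kappa = cohens_kappa(r1, r2)  # ~0.58: 80% raw agreement, but much of it is chance
```

Note how 80% raw agreement collapses to κ≈0.58 once chance is subtracted, which is why kappa, not raw agreement, is the right lens for the chess-vs-AI-agent comparison.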
    Dream Scenario vs Reality (05:00-06:00)

    Dream

    • World's best engineers
    • Annotated metrics
    • Standardized criteria

    Reality

    • Random internet users
    • No expertise verification
    • Subjective preferences
    Key Statistics

    Metric                   Chess      AI Agents
    Inter-rater reliability  κ=0.92     κ=0.42
    Test-retest              r=0.95     r=0.31
    Temporal drift           ±10 pts    ±150 pts
    Hurst exponent           0.89       0.31

    Takeaways
    1. Stop: Using preference votes as quality metrics
    2. Start: Automated complexity analysis
    3. ROI: 4.7 months to break even
    Citations Mentioned
    • Kapoor et al. (2025): "AI agents that matter" - κ=0.42 finding
    • Santos et al. (2022): Technical Debt Grading validation
    • Regan & Haworth (2011): Chess arbiter reliability κ=0.92
    • Chapman & Johnson (2002): 34% anchoring effect
    Quotable Moments

    "You can't rate chess with basketball fans"

    "0.31 reliability? That's a coin flip with extra steps"

    "Every preference vote is a data crime"

    "The psychometrics are screaming"

    Resources
    • Technical Debt Grading (TDG) Framework
    • PMAT (Pragmatic AI Labs MCP Agent Toolkit)
    • McCabe Complexity Calculator
    • Cohen's Kappa Calculator

    🔥 Hot Course Offers:
    • 🤖 Master GenAI Engineering - Build Production AI Systems
    • 🦀 Learn Professional Rust - Industry-Grade Development
    • 📊 AWS AI & Analytics - Scale Your ML in Cloud
    • ⚡ Production GenAI on AWS - Deploy at Enterprise Scale
    • 🛠️ Rust DevOps Mastery - Automate Everything
    🚀 Level Up Your Career:
    • 💼 Production ML Program - Complete MLOps & Cloud Mastery
    • 🎯 Start Learning Now - Fast-Track Your ML Career
    • 🏢 Trusted by Fortune 500 Teams

    Learn end-to-end ML engineering from industry veterans at PAIML.COM

    4 min
  • The 2X Ceiling: Why 100 AI Agents Can't Outcode Amdahl's Law
    Sep 17 2025
    AI coding agents face the same fundamental limitation as parallel computing: Amdahl's Law. Just as 10 cooks can't make soup 10x faster, 10 AI agents can't code 10x faster, because of inherent sequential bottlenecks.

    📚 Key Concepts

    The Soup Analogy
    • Multiple cooks can divide tasks (prep, boiling water, etc.)
    • But certain steps MUST be sequential (you can't stir before the ingredients are in)
    • Adding more cooks hits diminishing returns quickly
    • A perfect metaphor for the limits of parallel processing

    Amdahl's Law Explained
    • Mathematical principle: Speedup = 1 / (Sequential% + Parallel%/N)
    • Logarithmic relationship = rapid plateau
    • Sequential work becomes the hard ceiling
    • Even infinite workers can't overcome sequential bottlenecks

    💻 Traditional Computing Bottlenecks
    • I/O operations - disk reads/writes
    • Network calls - API requests, database queries
    • Database locks - transaction serialization
    • CPU waiting - can't parallelize waiting
    • Result: 16 cores ≠ 16x speedup in the real world

    🤖 Agentic Coding Reality: The New Bottlenecks

    1. Human Review (The New I/O)
    • Code must be understood by humans
    • Security validation required
    • Business logic verification
    • Can't parallelize human cognition

    2. Production Deployment
    • Sequential by nature
    • One deployment at a time
    • Rollback requirements
    • Compliance checks

    3. Trust Building
    • Can't parallelize reputation
    • Bad code = deleted customer data
    • Revenue impact risks
    • Trust accumulates sequentially

    4. Context Limits
    • Human cognitive bandwidth
    • Understanding 100k+ lines of code
    • Mental model limitations
    • Communication overhead

    📊 The Numbers (Theoretical Speedups)
    • 1 agent: 1.0x (baseline)
    • 2 agents: ~1.3x speedup
    • 10 agents: ~1.8x speedup
    • 100 agents: ~1.96x speedup
    • ∞ agents: ~2.0x speedup (theoretical maximum)

    🔑 Key Takeaways

    AI Won't Fully Automate Coding Jobs
    • More like enhanced assistants than replacements
    • Human oversight remains critical
    • Trust and context are irreplaceable

    Efficiency Gains Are Limited
    • Real-world ceiling around a 2x improvement
    • Not the exponential gains often promised
    • Similar to other parallelization efforts

    Success Factors for Agentic Coding
    • Well-organized human-in-the-loop processes
    • Clear review and approval workflows
    • Incremental trust building
    • Realistic expectations

    🔬 Research References
    • Princeton AI research on agent limitations
    • "AI Agents That Matter" paper findings
    • Empirical evidence of diminishing returns
    • Real-world case studies

    💡 Practical Implications

    For Developers:
    • Focus on optimizing the human review process
    • Build better UI/UX for code review
    • Implement incremental deployment strategies

    For Organizations:
    • Set realistic productivity expectations
    • Invest in human-agent collaboration tools
    • Don't expect 10x improvements from more agents

    For the Industry:
    • Paradigm shift from "replacement" to "augmentation"
    • Need for new metrics beyond raw speed
    • Focus on quality over quantity of agents

    🎬 Episode Structure
    1. Hook: the soup-cooking analogy
    2. Theory: Amdahl's Law explained
    3. Traditional: computing bottlenecks
    4. Modern: agentic coding bottlenecks
    5. Reality check: the 2x ceiling
    6. Future: optimizing within the constraints

    🗣️ Quotable Moments

    "10 agents don't code 10 times faster, just like 10 cooks don't make soup 10 times faster"

    "Humans are the new I/O bottleneck"

    "You can't parallelize trust"

    "The theoretical max is 2x faster - that's the reality check"

    🤔 Discussion Questions
    • Is the 2x ceiling permanent, or can we innovate around it?
    • What's more valuable: speed or code quality?
    • How do we optimize the human bottleneck?
    • Will future AI models change these limitations?

    📝 Episode Tagline
    "When infinite AI agents hit the wall of human review, Amdahl's Law reminds us that some things just can't be parallelized - including trust, context, and the courage to deploy to production."
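The speedup formula from the episode is easy to check numerically. A minimal sketch, assuming the sequential fraction is 50% (the value that roughly reproduces the episode's figures):

```python
def amdahl_speedup(sequential_fraction, n_workers):
    """Amdahl's Law: speedup = 1 / (S + (1 - S) / N),
    where S is the fraction of work that must run sequentially."""
    return 1 / (sequential_fraction + (1 - sequential_fraction) / n_workers)

# With S = 0.5 (an illustrative assumption), adding agents plateaus fast:
for n in (1, 2, 10, 100):
    print(n, round(amdahl_speedup(0.5, n), 2))
# prints: 1 1.0, 2 1.33, 10 1.82, 100 1.98
# and the limit as N -> infinity is 1 / S = 2.0: the "2X ceiling"
```

The plateau is the whole argument: once half the work is sequential (human review, deployment, trust), no number of agents buys more than 2x.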
    4 min
  • Plastic Shamans of AGI
    May 21 2025
    The plastic shamans of OpenAI
    11 min