Tag: llm

18 writings found

Latest Archives

Permission Hungry Agents and the Return to First Principles

ThoughtWorks Radar 34 reveals AI's paradox: we're racing forward while rediscovering software fundamentals, and our security models aren't ready.

The Permission Hungry Dilemma: When AI Agents Want Access to Everything

ThoughtWorks Radar 34 highlights a fundamental tension: the most useful AI agents need broad access, but our security guardrails haven't caught up yet.

Epic's AI NPCs in Fortnite: When Your Quest Giver Can Go Off-Script

Epic Games lets developers create AI-powered Fortnite characters with conversation capabilities. But there are some very specific rules about what they can't be.

When Benchmarks Break: A Laptop Model Drew Better Pelicans Than Claude Opus

A quantized 21GB model running locally outperformed Anthropic's flagship on SVG generation. What this tells us about AI benchmarks and model comparison.

When Benchmark Performance Stops Meaning What We Think It Means

A quantized local model outdraws Claude Opus 4.7 at pelicans on bicycles. What does that tell us about AI benchmarks? Probably nothing good.

Meta's Muse Spark: A Tooled-Up Return to Frontier Models

Meta launches Muse Spark with 16 built-in tools, visual grounding, and Code Interpreter. But where's the open source promise?

Meta's Muse Spark: A Developer's First Look at the Tool Arsenal

Meta returns to frontier models with Muse Spark. I got my hands dirty with its 16 tools, from visual grounding to Python sandboxes, and here's what matters.

Meta's Muse Spark: A Tool-Heavy Return to the Frontier Model Race

Meta drops Muse Spark with 16 tools, Code Interpreter, visual grounding, and Meta content search. But is a hosted-only model what we really wanted?

When Machines Write Code, Humans Must Learn to Judge

As LLMs generate more code, teams face cognitive surrender and debt proliferation. The future isn't about writing code; it's about verification.

Building macOS Apps Without Knowing Swift: What Vibe Coding Actually Teaches Us

I built two monitoring tools for my M5 MacBook using Claude and GPT without writing Swift myself. The results work, but should they?

LLMs Don't Actually Care About Your Tech Stack

Modern coding agents work surprisingly well with new tools and private codebases, challenging the assumption that they're biased toward mainstream tech.

Why Coding Agents Might Not Lock Us Into Boring Technology After All

Modern LLMs can learn new tools on the fly through documentation and examples. The feared training data bias might be less of an issue than we thought.

LLMs Don't Care About Your Tech Stack Anymore

Modern coding agents work surprisingly well with new and obscure tools. The fear that AI would lock us into boring, popular tech seems outdated.

The Map That Became the Territory: AI, Specifications, and What We Mean When We Say 'I Built This'

On AI agents, observability, bespoke software, and the uncomfortable question of who actually built what when LLMs generate our code.

Just-in-Time Tests: When AI Writes Your Tests Right Before Deployment

Meta's Catching JiTTests uses LLMs to auto-generate tests on demand, targeting regressions without maintenance overhead. A radical shift in testing philosophy.

View all writings →