Tag: artificial-intelligence

72 writings found

Claude's System Prompt Evolution: What Opus 4.7 Tells Us About AI Behavior Design

Anthropic's latest system prompt reveals a shift toward proactive AI behavior. I dig into what these changes mean for developers building with Claude.

PyCon US 2026: Why This Matters for Python and AI Engineering

PyCon returns to California with new AI and security tracks. What this shift means for the Python community and the future of technical conferences.

When Benchmarks Break: A Laptop Model Drew Better Pelicans Than Claude Opus

A quantized 21GB model running locally outperformed Anthropic's flagship on SVG generation. What this tells us about AI benchmarks and model comparison.

When Benchmark Performance Stops Meaning What We Think It Means

A quantized local model outdraws Claude Opus 4.7 at pelicans on bicycles. What does that tell us about AI benchmarks? Probably nothing good.

The Virtue of Laziness: Why AI Threatens What Makes Us Good Engineers

LLMs lack the programmer's essential virtue of laziness. Without constraints, they generate complexity instead of elegant abstractions.

Meta's Muse Spark: A Tooled-Up Return to Frontier Models

Meta launches Muse Spark with 16 built-in tools, visual grounding, and Code Interpreter. But where's the open source promise?

Meta's Muse Spark: They're Back in the Frontier Game (And the Tools Are Wild)

Meta drops Muse Spark with 16 powerful tools including visual grounding, Python sandbox, and Meta content search. Are they back in the race?

AI-Assisted Development: The Taste Problem

Why coding with AI agents works brilliantly for implementation but falls apart for API design. Lessons from building real systems with Claude.

When Machines Write Code, Humans Must Learn to Judge

As LLMs generate more code, teams face cognitive surrender and debt proliferation. The future isn't about writing code; it's about verification.

Meta's KernelEvolve: When AI Writes Its Own Performance Code

Meta's KernelEvolve system uses AI agents to automatically optimize low-level hardware kernels, achieving 60% performance gains in hours instead of weeks.

When Agents Write Code, We Judge It: The Verification Economy

As LLMs generate code at scale, our job shifts from writing to verifying. What does this mean for how we organize teams and think about programming?

Making Team Standards Executable: Infrastructure for AI-Assisted Development

AI coding tools produce wildly different results based on who's prompting. Treating team standards as versioned, executable instructions solves the consistency problem.

Why Your AI Benchmark Is Probably Wrong: The N,K Trade-off

Google Research reveals why using 3-5 human raters per item isn't enough for reproducible AI evaluation. The depth vs. breadth problem explained.

Meta's AI is Reshoring American Concrete, One Mix at a Time

How Bayesian optimization is helping U.S. concrete producers ditch imported cement and redesign mixes in days instead of months.

Code Review, Observability, and the Cognitive Cost of AI Amplification

Rethinking code review as product judgment, observability as our new IDE, and whether AI tools extend our capabilities or replace them entirely.
