Tag

artificial-intelligence

49 writings found

Latest Archives

When Machines Write Code, Humans Must Learn to Judge

As LLMs generate more code, teams face cognitive surrender and debt proliferation. The future isn't about writing code; it's about verification.

Meta's KernelEvolve: When AI Writes Its Own Performance Code

Meta's KernelEvolve system uses AI agents to automatically optimize low-level hardware kernels, achieving 60% performance gains in hours instead of weeks.

When Agents Write Code, We Judge It: The Verification Economy

As LLMs generate code at scale, our job shifts from writing to verifying. What does this mean for how we organize teams and think about programming?

Making Team Standards Executable: Infrastructure for AI-Assisted Development

AI coding tools produce wildly different results based on who's prompting. Treating team standards as versioned, executable instructions solves the consistency problem.

Why Your AI Benchmark Is Probably Wrong: The N,K Trade-off

Google Research reveals why using 3-5 human raters per item isn't enough for reproducible AI evaluation. The depth vs breadth problem explained.

Meta's AI is Reshoring American Concrete, One Mix at a Time

How Bayesian optimization is helping U.S. concrete producers ditch imported cement and redesign mixes in days instead of months.

Code Review, Observability, and the Cognitive Cost of AI Amplification

Rethinking code review as product judgment, observability as our new IDE, and whether AI tools extend our capabilities or replace them entirely.

The Uncomfortable Ease of Profiling Users Through Their Public Comments

Building a tool to profile Hacker News users with LLMs reveals how much we leak through casual comments, and raises questions about digital footprints.

Google's Healthcare AI Push: From Screening Rooms to Source Code

Google Research unveils healthcare AI spanning breast cancer detection, agentic systems, and open-weight models. What it means for developers building in this space.

Can AI Actually Understand Physics? Google's Superconductivity Test Reveals Surprising Answers

Google tested six LLMs on expert-level physics questions. The results show which AI systems can handle real scientific research and which ones hallucinate.

Google's Flash Flood AI: Training on News Reports to Predict Urban Disasters

Google Research uses Gemini to extract flood data from news articles, creating an AI model that predicts flash floods 24 hours early across the Global South.

LLMs Don't Actually Push You Toward Boring Technology

Coding agents work surprisingly well with new, undocumented tools. The 'training data bias' concern might be overstated in 2026.

View all writings →