Algorithm First — Why Your AI-First Code Is Burning Tokens On Solved Problems

I've been watching something disturbing happen in our industry over the past 18 months.
Teams that were shipping solid, deterministic software two years ago are now wrapping everything in LLM calls. Every feature. Every decision. Every edge case. The pattern is always the same: "We'll just ask the AI."
And the bills are coming due.
Not just in dollars. In reliability. In latency. In the quiet erosion of trust when your "smart" system randomly decides to format a date differently on Tuesday than it did on Monday.
I've been calling this pattern アルゴリズム優先 (Arugorizumu Yuusen) — Algorithm First. It's become Principle #9 in my philosophy, and it's the single highest-leverage architectural decision most AI-assisted teams are not making.
Here's the hard truth: everything that CAN be algorithmic MUST be algorithmic. The LLM is for judgment. The algorithm is for determinism. If you're burning tokens on problems we solved thirty years ago, you're not building — you're burning.
The Token Burn Nobody's Tracking
Simon Willison recently reframed AI-assisted security research as proof of work: throw more tokens at the problem, get better results. That framing is honest, and it's terrifying when you apply it to production code.
A developer analyzed 3,177 API calls across four AI tools solving the same bug. The variance in token consumption was staggering, and not because the tools differed: a huge share of the work they were doing overlapped with deterministic solutions that cost zero tokens.
Here's what that looks like in practice:
# The AI-First approach (I see this everywhere)
import json

import anthropic

claude = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

def process_order(order_data):
    response = claude.messages.create(
        model="claude-sonnet-4-2026",
        max_tokens=1024,  # the Messages API requires an output cap
        messages=[{
            "role": "user",
            "content": f"""Process this order:
- Validate all required fields are present
- Calculate total with 20% tax
- Format the shipping address
- Determine if order qualifies for free shipping (over $50)
Order: {json.dumps(order_data)}"""
        }]
    )
    return parse_llm_response(response)  # and now you get to guess the shape
This is not an exaggeration. I've seen this exact pattern in production. Every time this runs, you're spending ~2,000 tokens on addition and string formatting. The LLM doesn't care. Your wallet does.
# The Algorithm First approach
def process_order(order_data):
    # Deterministic validation — zero tokens
    required_fields = ['customer_name', 'items', 'shipping_address']
    for field in required_fields:
        if field not in order_data:
            raise ValueError(f"Missing required field: {field}")

    # Deterministic calculation — zero tokens
    subtotal = sum(item['price'] * item['quantity'] for item in order_data['items'])
    tax = subtotal * 0.20
    total = subtotal + tax
    free_shipping = subtotal > 50.00

    # LLM for judgment — minimal tokens, high value
    address = order_data['shipping_address']
    formatted = claude.messages.create(
        model="claude-sonnet-4-2026",
        max_tokens=256,  # small cap for a small job
        messages=[{
            "role": "user",
            "content": f"""Format this shipping address for a USPS label.
Return ONLY the formatted address, no explanation.
Raw address: {json.dumps(address)}"""
        }]
    ).content[0].text

    return {
        "total": total,
        "tax": tax,
        "formatted_address": formatted,
        "free_shipping": free_shipping
    }
The second version costs roughly 95% fewer tokens and produces identical results for everything deterministic. The only thing the LLM touches is the address formatting, a genuinely fuzzy problem where judgment matters.
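To put rough numbers on that, here's a back-of-the-envelope comparison. The per-call token counts, the order volume, and the price per million tokens are illustrative assumptions, not measurements:

# Back-of-the-envelope cost comparison. Every number here is an
# illustrative assumption, not a measurement.
PRICE_PER_MTOK = 3.00      # assumed input price, USD per million tokens
ORDERS_PER_DAY = 100_000   # assumed volume

AI_FIRST_TOKENS = 2_000    # whole order blob + instructions, per call
ALGO_FIRST_TOKENS = 100    # just the raw address, per call

def daily_cost(tokens_per_call):
    return tokens_per_call * ORDERS_PER_DAY * PRICE_PER_MTOK / 1_000_000

print(f"AI-First:        ${daily_cost(AI_FIRST_TOKENS):,.2f}/day")    # $600.00/day
print(f"Algorithm First: ${daily_cost(ALGO_FIRST_TOKENS):,.2f}/day")  # $30.00/day

Scale the volume however you like; it's the ratio that matters.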
Three Benefits You Can't Ignore
1. Reliability — Same Input, Same Output
This is the one nobody talks about until it bites them.
Deterministic algorithms produce the same output for the same input. Every time. Forever. An LLM might format your address correctly 95% of the time on the first try, but that remaining 5% is a ticking time bomb. It's not random; it's path-dependent on context window, temperature, and the phase of the moon in the latent space.
I've debugged production incidents where an LLM suddenly decided to translate "123 Main St" into Spanish. Not because the prompt changed. Not because the model updated. Because the stars aligned in a particular way that made "123 Calle Principal" seem reasonable at that moment.
Algorithmic code doesn't have moods.
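And because it doesn't have moods, you can prove the property in a unit test. A minimal sketch, assuming the calculation logic from the Algorithm First process_order is factored into a pure function (calculate_totals is a hypothetical name):

def calculate_totals(items):
    # Pure function: no I/O, no LLM, no state. Ring 0 material.
    subtotal = sum(item['price'] * item['quantity'] for item in items)
    tax = subtotal * 0.20
    return {"subtotal": subtotal, "tax": tax, "total": subtotal + tax}

def test_same_input_same_output():
    items = [{"price": 19.99, "quantity": 2}, {"price": 5.00, "quantity": 1}]
    first = calculate_totals(items)
    # Run it a thousand times; a pure function never changes its mind.
    assert all(calculate_totals(items) == first for _ in range(1000))

Try writing that test against an LLM call.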
2. Efficiency — Your Token Budget Is Not Infinite
The tokenmaxxing trend that Forbes just covered — encouraging developers to spend as many tokens as possible — is the most dangerous idea in AI-assisted development right now.
It treats tokens as a free resource. They're not.
Every token you burn on a solved problem is a token you can't spend on genuine ambiguity. Every API call to format a date string is compute that could have gone toward understanding a novel edge case in your business logic.
The teams winning with AI are the ones who treat tokens like a precious metal. They hoard them for the moments when algorithmic certainty fails.
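One way to enforce that discipline is a hard cap at the call site. A sketch of the idea; the TokenBudget class and its limit are hypothetical, but the usage figures it charges are what the Anthropic Messages API reports on every response:

class TokenBudget:
    # Hypothetical per-feature budget; the limit is an illustrative number.
    def __init__(self, daily_limit):
        self.daily_limit = daily_limit
        self.spent = 0

    def charge(self, input_tokens, output_tokens):
        self.spent += input_tokens + output_tokens
        if self.spent > self.daily_limit:
            raise RuntimeError(
                f"Token budget exhausted: {self.spent}/{self.daily_limit}. "
                "Is this a judgment problem, or a solved one?"
            )

budget = TokenBudget(daily_limit=500_000)  # judgment calls only
# After each Ring 1 call, charge what the response actually cost:
#   budget.charge(response.usage.input_tokens, response.usage.output_tokens)
budget.charge(1_200, 150)  # one address-formatting call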
3. Offline Capability — Your System Should Work Without a Network
This is the one that stings.
I've watched teams build agents that can't function without a live connection to Claude or GPT. Their entire logic stack collapses when the API is down, the rate limit hits, or the network drops.
An Algorithm First architecture runs offline. The deterministic core never needs a network call. It's testable in a CI pipeline without mocking an LLM. It's deployable to edge environments with no GPU. It's debuggable with a print statement.
Your Ring 0 hooks, the foundational logic, should never depend on a Ring 2 agent (the rings are defined in the next section).
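You can enforce that in CI. A sketch with pytest: break socket creation for the duration of the test, then exercise the deterministic core; validate_order is a hypothetical stand-in for your Ring 0 entry point:

import socket

def validate_order(order_data):
    # Hypothetical Ring 0 function: pure validation, no network, no LLM.
    for field in ('customer_name', 'items', 'shipping_address'):
        if field not in order_data:
            raise ValueError(f"Missing required field: {field}")

def test_ring0_needs_no_network(monkeypatch):
    # Any attempt to open a socket fails the test immediately.
    def no_network(*args, **kwargs):
        raise AssertionError("Ring 0 tried to touch the network")
    monkeypatch.setattr(socket, "socket", no_network)

    validate_order({
        "customer_name": "Ada",
        "items": [{"price": 10.0, "quantity": 1}],
        "shipping_address": {"line1": "123 Main St"},
    })  # passes: the deterministic core never opens a connection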
The Architecture That Actually Works
Here's the pattern I've been using across production systems:
Ring 0 — The Algorithmic Core
- All business logic that can be expressed as rules
- Validation, calculation, transformation
- Zero LLM calls
- 100% deterministic
- Testable with unit tests
Ring 1 — The Judgment Layer
- LLM calls for genuinely ambiguous problems
- Address formatting, sentiment analysis, intent classification
- Guarded by Ring 0 validation
- Token-budgeted per call
Ring 2 — The Exploration Layer
- Agents that search, iterate, and discover
- Security research, code generation, complex reasoning
- Token-budgeted per session
- Human-in-the-loop for critical decisions
The mistake most teams make is putting everything in Ring 2. They build agents that burn 10,000 tokens on math that could be done in 1ms, because they never stopped to ask: "Is this actually a judgment problem?"
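The cheapest way I know to keep a team asking that question is to make the routing explicit in code. A minimal sketch; the task names and the sets they live in are illustrative, not a framework:

# Which ring does a task belong to? All names here are illustrative.
DETERMINISTIC = {"validate", "totals", "tax", "shipping_threshold"}   # Ring 0
JUDGMENT = {"format_address", "classify_intent"}                      # Ring 1
EXPLORATION = {"find_vulnerabilities", "generate_migration"}          # Ring 2

def route(task):
    """The question to ask before every LLM call:
    is this actually a judgment problem?"""
    if task in JUDGMENT:
        return "ring1"   # one bounded LLM call, token-budgeted
    if task in EXPLORATION:
        return "ring2"   # agent loop, session budget, human in the loop
    return "ring0"       # everything else is plain code, zero tokens

assert route("tax") == "ring0"
assert route("format_address") == "ring1"

Note the default: an unknown task falls down to Ring 0, never up to Ring 2. New work has to earn its tokens.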
A Real Example: Security Research
The security research workflow Willison described is a perfect case study.
Running Claude Mythos on a binary isn't a Ring 0 problem. It's genuine exploration — finding vulnerabilities requires creativity and iteration. That's Ring 2 territory. The token burn is justified because the output is novel.
But here's what I see teams doing wrong: they run the same Mythos workflow on every build, burning millions of tokens on regression testing that could be handled by deterministic fuzzing and static analysis.
The Algorithm First approach:
- Ring 0: Static analysis, deterministic fuzzing, unit tests — zero tokens
- Ring 1: Classification of static analysis results — minimal tokens
- Ring 2: Deep exploration of high-risk areas — token budget allocated by risk score
A team that implements this split spends dramatically fewer tokens on security (in my experience, on the order of 80% less) and finds more novel vulnerabilities, because it isn't burning its budget on solved problems.
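The allocation step, budget by risk score, fits in a few lines. A sketch; the thresholds and token amounts are illustrative assumptions:

def ring2_budget(risk_score, base_budget=50_000):
    # Risk scores come from Ring 0 static analysis, normalized to 0.0-1.0.
    # Thresholds and amounts below are illustrative, not benchmarks.
    if risk_score < 0.3:
        return 0                  # Ring 0/1 coverage is enough here
    if risk_score < 0.7:
        return base_budget        # targeted exploration
    return base_budget * 4        # deep dive on the riskiest components

components = {"auth": 0.9, "parser": 0.5, "logging": 0.1}
plan = {name: ring2_budget(score) for name, score in components.items()}
print(plan)  # {'auth': 200000, 'parser': 50000, 'logging': 0}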
The Insight Nobody Wants to Hear
Here's what I've learned the hard way, after refactoring too many AI-first codebases that were burning money on nothing:
The LLM is not a replacement for thinking. It's a replacement for guessing.
When you understand your problem well enough to write an algorithm, you should write the algorithm. The LLM is for the edges — the cases where your understanding breaks down, where the rules are fuzzy, where judgment matters more than precision.
The teams that will win the next five years are not the ones who use AI the most. They're the ones who use AI only where it matters.
Algorithm First isn't anti-AI. It's pro-intelligence. It's about reserving your most expensive compute — human and machine — for the problems that genuinely need it.
The Ermite Shinkofa

Jay "The Ermite"
Holistic Coach & Consultant, Creator of Shinkofa
Coach and consultant specializing in neurodivergent support (gifted, highly sensitive, multipotential profiles). 21 years of entrepreneurship, 12 years of coaching. Based in Spain.