<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Tom Seidel – Articles</title>
  <subtitle>Freelance Java consultant with 20+ years of experience in cloud-native architectures, microservices, DevOps, and AI-powered agentic systems.</subtitle>
  <link href="https://remus-software.org/feed.xml" rel="self" type="application/atom+xml"/>
  <link href="https://remus-software.org/" rel="alternate" type="text/html"/>
  <id>https://remus-software.org/</id>
  <author>
    <name>Tom Seidel</name>
    <email>tom.seidel@remus-software.org</email>
  </author>
  <updated>2026-06-13T00:00:00.000Z</updated>
  <entry>
    <title>From Qwen3-32B to Qwen3.6-35B-A3B: Upgrading a Local Inference Stack on 2× RTX 5060 Ti</title>
    <link href="https://remus-software.org/articles/qwen3-to-qwen36-upgrade/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/qwen3-to-qwen36-upgrade/</id>
    <published>2026-06-13T00:00:00.000Z</published>
    <updated>2026-06-13T00:00:00.000Z</updated>
    <summary>A dense-to-sparse model swap that nearly doubled token throughput, improved output quality, and taught us a few things about vLLM with MoE and Mamba architectures.</summary>
    <content type="html">&amp;lt;h1 id=&amp;quot;from-qwen3-32b-to-qwen3.6-35b-a3b%3A-upgrading-a-local-inference-stack-on-2%C3%97-rtx-5060-ti&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#from-qwen3-32b-to-qwen3.6-35b-a3b%3A-upgrading-a-local-inference-stack-on-2%C3%97-rtx-5060-ti&amp;quot;&amp;gt;From Qwen3-32B to Qwen3.6-35B-A3B: Upgrading a Local Inference Stack on 2× RTX 5060 Ti&amp;lt;/a&amp;gt;&amp;lt;/h1&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;em&amp;gt;A dense-to-sparse model swap that nearly doubled our token throughput, made the outputs noticeably better, and taught us a few things about vLLM along the way.&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-setup&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-setup&amp;quot;&amp;gt;The Setup&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;We run a local inference server with two NVIDIA RTX 5060 Ti GPUs — 16 GB VRAM each, Blackwell architecture, connected over PCIe (no NVLink, unfortunately). The whole stack is managed through Ansible playbooks so nothing is done manually and everything is reproducible. vLLM handles inference, Ollama sits on standby for lighter stuff.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Before diving into the models, here’s the VRAM math that governs everything we do on this hardware:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;=== Model Weight VRAM ===

Formula:  params (billions) × bytes_per_param = weight VRAM

fp16 (no quantization):  2 bytes/param
  → 32B model:  32 × 2 = 64 GB  ❌ way too big
  → 35B model:  35 × 2 = 70 GB  ❌ way too big

INT4 quantization (AWQ/GPTQ):  0.5 bytes/param (4 bits = 0.5 bytes)
  → 32B model:  32 × 0.5 = 16 GB  ✓ fits
  → 35B model:  35 × 0.5 = 17.5 GB  ✓ fits

INT8 quantization:  1 byte/param
  → 32B model:  32 × 1 = 32 GB  ⚠️ tight (no room for KV cache)
  → 35B model:  35 × 1 = 35 GB  ❌ doesn&amp;#039;t fit

FP8 quantization:  1 byte/param
  → 32B model:  32 × 1 = 32 GB  ⚠️ same as INT8
  → 35B model:  35 × 1 = 35 GB  ❌ doesn&amp;#039;t fit


=== KV Cache VRAM (per active sequence) ===

Formula:  context_tokens × bytes_per_token

The KV cache stores attention keys + values for every token in the context.
Size depends on model architecture (num_layers, num_heads, head_dim) and
the KV dtype:

  fp16 KV cache:  ~0.25 MiB per token (model-dependent)
  fp8 KV cache:   ~0.125 MiB per token (half of fp16)

  For our 35B model at 65K context:
    fp16:  65,536 × 0.25 MiB ≈ 16 GB  ❌ eats half our VRAM
    fp8:   65,536 × 0.125 MiB ≈ 8 GB  ✓ manageable
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;With AWQ 4-bit weights (~18-20 GB) + fp8 KV cache (~8 GB for a full 65K sequence) + overhead (~2 GB), we land at roughly 28-30 GB — just under our 32 GB ceiling. That’s the budget we’re working with.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Our workhorse for months was &amp;lt;strong&amp;gt;Qwen3-32B-AWQ&amp;lt;/strong&amp;gt; — a dense 32-billion-parameter model quantized to 4-bit. It did code reviews, general chat, and agent tool-calling just fine inside a 65K-token context window. Reliable, predictable, no complaints.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Then we switched to &amp;lt;strong&amp;gt;Qwen3.6-35B-A3B-AWQ-4bit&amp;lt;/strong&amp;gt;. On paper the numbers look similar (35B vs 32B), but under the hood it’s a completely different beast. The upgrade turned out to be worth it — but it wasn’t exactly a drop-in replacement.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;what-changed-under-the-hood&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#what-changed-under-the-hood&amp;quot;&amp;gt;What Changed Under the Hood&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;h3 id=&amp;quot;dense-vs.-mixture-of-experts&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#dense-vs.-mixture-of-experts&amp;quot;&amp;gt;Dense vs. Mixture of Experts&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Here’s the big one. Qwen3-32B is a &amp;lt;strong&amp;gt;dense&amp;lt;/strong&amp;gt; model — all 32 billion parameters fire for every single token. Every forward pass uses the entire brain, so to speak.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Qwen3.6-35B-A3B is a &amp;lt;strong&amp;gt;Mixture of Experts&amp;lt;/strong&amp;gt; (MoE) model. It has 35 billion parameters in total, but only about &amp;lt;strong&amp;gt;3 billion are active&amp;lt;/strong&amp;gt; per token. The model has a bunch of “expert” sub-networks and a router that picks which experts handle each input. Picture a company with 35 specialists where only 3 get pulled into any given meeting.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The upshot: way less compute per token, even though the model weights take up roughly the same VRAM. That’s where the speed bump comes from.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;hybrid-mamba-transformer-architecture&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#hybrid-mamba-transformer-architecture&amp;quot;&amp;gt;Hybrid Mamba-Transformer Architecture&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Qwen3-32B is a pure Transformer — standard attention, quadratic cost, you know the drill.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Qwen3.6 mixes things up with a &amp;lt;strong&amp;gt;hybrid architecture&amp;lt;/strong&amp;gt; that alternates between Transformer layers and &amp;lt;strong&amp;gt;Mamba&amp;lt;/strong&amp;gt; (State Space Model) layers. Mamba processes sequences with linear complexity instead of quadratic attention, which is a big deal for long contexts. You get precise token-to-token attention from the Transformer layers where it matters, and efficient sequential processing from Mamba everywhere else.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;This hybrid design is also the reason we had to change a bunch of vLLM settings — more on that below.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-vllm-config-changes-(and-why-they-mattered)&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-vllm-config-changes-(and-why-they-mattered)&amp;quot;&amp;gt;The vLLM Config Changes (and Why They Mattered)&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The architecture shift meant several vLLM parameters needed adjusting. Some were obvious, some were not. Here’s the rundown.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;1.-tool-call-parser%3A-hermes-%E2%86%92-qwen3_coder&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#1.-tool-call-parser%3A-hermes-%E2%86%92-qwen3_coder&amp;quot;&amp;gt;1. Tool Call Parser: &amp;lt;code&amp;gt;hermes&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;qwen3_coder&amp;lt;/code&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;What this is:&amp;lt;/strong&amp;gt; vLLM needs a parser to translate the model’s raw text output into structured tool calls (function names + JSON arguments) that match the OpenAI API format. Different models format their tool calls differently, so the parser has to match.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;What tripped us up:&amp;lt;/strong&amp;gt; Qwen3.5 and 3.6 completely changed their tool-calling format. The &amp;lt;code&amp;gt;hermes&amp;lt;/code&amp;gt; parser that worked perfectly with Qwen3-32B just doesn’t understand the new format. The correct one is &amp;lt;code&amp;gt;qwen3_coder&amp;lt;/code&amp;gt; — yes, confusingly named, but it’s the right parser for &amp;lt;em&amp;gt;all&amp;lt;/em&amp;gt; Qwen3.5/3.6 models, not just the ones with “Coder” in the name.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;What happens if you get this wrong:&amp;lt;/strong&amp;gt; This is the nasty part. Tool calls don’t break outright — they just get parsed wrong. The model outputs structured calls in the new format, the &amp;lt;code&amp;gt;hermes&amp;lt;/code&amp;gt; parser mangles them, and tool results get injected back in a format the model doesn’t recognize. The result looks like the model suddenly got stupid. It didn’t — it’s a format mismatch. We spent a while scratching our heads before figuring this one out.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;2.-max_num_batched_tokens%3A-new-parameter%2C-set-to-4096&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#2.-max_num_batched_tokens%3A-new-parameter%2C-set-to-4096&amp;quot;&amp;gt;2. &amp;lt;code&amp;gt;max_num_batched_tokens&amp;lt;/code&amp;gt;: New Parameter, Set to 4096&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;What this is:&amp;lt;/strong&amp;gt; Controls the maximum number of tokens vLLM processes in a single prefill batch — basically how much work it tries to chew on at once when processing input tokens.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Why we needed it:&amp;lt;/strong&amp;gt; The Mamba layers in Qwen3.6 require a memory alignment block size of &amp;lt;strong&amp;gt;2096 tokens&amp;lt;/strong&amp;gt;. vLLM’s default is 2048. That’s less than 2096. You can probably guess what happens:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;AssertionError: In Mamba cache align mode, block_size (2096)
must be &amp;amp;lt;= max_num_batched_tokens (2048)
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;Server crashes on startup. Not a tuning issue — a hard compatibility floor. Setting it to 4096 gives comfortable headroom and solved it immediately. This is a must-have for any Mamba-based model.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;3.-rope-scaling%3A-just-removed-it&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#3.-rope-scaling%3A-just-removed-it&amp;quot;&amp;gt;3. RoPE Scaling: Just Removed It&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;What this is:&amp;lt;/strong&amp;gt; RoPE (Rotary Position Embedding) scaling tricks like YaRN let a model handle context windows longer than what it was trained for. Qwen3-32B only natively supports 32K tokens, so we needed YaRN to stretch it to 65K.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;What changed:&amp;lt;/strong&amp;gt; Qwen3.6 natively supports up to &amp;lt;strong&amp;gt;262,144 tokens&amp;lt;/strong&amp;gt;. Our 65K fits easily. So we just… removed the scaling. Done.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Bonus:&amp;lt;/strong&amp;gt; Without RoPE extrapolation, the model operates inside its trained context range. No more positional confusion in long documents, no artifacts from stretching embeddings beyond their design. This alone improved quality on long-context tasks.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;4.-max_num_seqs%3A-32-%E2%86%92-8&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#4.-max_num_seqs%3A-32-%E2%86%92-8&amp;quot;&amp;gt;4. &amp;lt;code&amp;gt;max_num_seqs&amp;lt;/code&amp;gt;: 32 → 8&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;What this is:&amp;lt;/strong&amp;gt; How many concurrent requests the engine keeps alive in GPU memory at once. Each active request needs its own KV cache — the memory structure that stores all the previous tokens’ attention data.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Why we had to lower it:&amp;lt;/strong&amp;gt; Here’s the VRAM budget math:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;Total VRAM (2 GPUs):                          32 GB
- Model weights (AWQ 4-bit, TP=2):           ~20 GB
- Overhead (activations, CUDA, etc.):         ~2 GB
----------------------------------------------
Available for KV cache:                       ~10 GB

KV cache per full-length sequence (65K tokens, fp8):
  65,536 tokens × ~0.125 MiB/token ≈ 8 GB

Max concurrent full-length sequences:
  10 GB available ÷ 8 GB per sequence ≈ 1.25
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;So with full 65K contexts, you can barely fit one sequence. But most requests are way shorter than 65K — a typical agent conversation might use 10-20K tokens, which needs only 1-2 GB of KV cache. That’s why &amp;lt;code&amp;gt;max_num_seqs: 8&amp;lt;/code&amp;gt; works: it assumes a realistic mix of context lengths, not everyone maxing out the window at once.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The old value of 32 would have risked OOM errors. For our agent workload (typically 1-3 concurrent requests), 8 is more than enough. If you’re running a high-traffic service though, this could be a real constraint.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;what-didn%E2%80%99t-change&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#what-didn%E2%80%99t-change&amp;quot;&amp;gt;What Didn’t Change&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;A few settings carried over and are still doing heavy lifting:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;fp8 KV cache&amp;lt;/strong&amp;gt;: Cuts KV memory in half compared to fp16. The single most important setting for squeezing long contexts into 32 GB.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;AWQ 4-bit with Marlin kernel&amp;lt;/strong&amp;gt;: Model weights at 4-bit precision, ~20 GB instead of ~70 GB. The Marlin kernel is optimized for our Blackwell GPUs.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Tensor Parallelism (TP=2)&amp;lt;/strong&amp;gt;: Each layer is split across both GPUs. Better than pipeline parallelism for our low-concurrency agent workload.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Triton attention backend&amp;lt;/strong&amp;gt;: Flash Attention is broken on sm_120 consumer GPUs. Triton works fine with the hybrid architecture.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Prefix caching&amp;lt;/strong&amp;gt;: Reuses KV cache for repeated prompt prefixes (system prompts, tool definitions). Free 50-80% improvement in time-to-first-token for agent workloads.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;so%2C-did-it-actually-get-better%3F&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#so%2C-did-it-actually-get-better%3F&amp;quot;&amp;gt;So, Did It Actually Get Better?&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Short answer: yes, noticeably.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;token-throughput-nearly-doubled&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#token-throughput-nearly-doubled&amp;quot;&amp;gt;Token Throughput Nearly Doubled&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The MoE architecture is doing the heavy lifting here. Only 3B parameters active per token means way less compute per generated token compared to the dense 32B. The Mamba layers add more efficiency on the prefill side since they dodge the quadratic cost of full attention.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;In practice: responses come back roughly twice as fast. For interactive use and multi-step agent loops, that’s a very welcome improvement.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;output-quality-stepped-up&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#output-quality-stepped-up&amp;quot;&amp;gt;Output Quality Stepped Up&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Code reviews&amp;lt;/strong&amp;gt; worked fine with both models — the old 32B was already good at spotting bugs, suggesting improvements, explaining logic. The new model keeps that bar.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Minor code refactorings&amp;lt;/strong&amp;gt; are where Qwen3.6 pulls ahead. Renaming variables for clarity, extracting helper functions, simplifying conditionals, restructuring imports — these need a nuanced feel for code intent and scope. Qwen3-32B would often over-engineer or miss the point. Qwen3.6 handles these cleanly and stays focused.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Tool-calling&amp;lt;/strong&amp;gt; is more accurate and consistent now that the right parser is in place. Fewer malformed JSON arguments, cleaner structured output.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;no-more-context-weirdness&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#no-more-context-weirdness&amp;quot;&amp;gt;No More Context Weirdness&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;With Qwen3-32B at 65K, the YaRN scaling would sometimes cause positional confusion — the model would lose track of where information appeared in long documents, especially past the 32K boundary. Qwen3.6, running within its native context range, doesn’t have this problem. References to earlier parts of long conversations are just more reliable.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;what-we-learned&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#what-we-learned&amp;quot;&amp;gt;What We Learned&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Architecture changes aren’t drop-in.&amp;lt;/strong&amp;gt; MoE and Mamba need different vLLM settings than dense Transformers. The &amp;lt;code&amp;gt;max_num_batched_tokens&amp;lt;/code&amp;gt; crash wasn’t well-documented — we had to dig into vLLM source to figure it out.&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Tool call parsers are model-specific.&amp;lt;/strong&amp;gt; Always check this when upgrading between Qwen generations. A mismatch doesn’t throw an error — it just makes the model look dumb, which sends you debugging in the wrong direction for a while.&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;MoE punches way above its weight.&amp;lt;/strong&amp;gt; 35B total params, 3B active, quality comparable to dense models many times its active size. For constrained hardware, MoE is the move.&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Native context beats stretched context.&amp;lt;/strong&amp;gt; Running within the model’s native range beats extrapolation every time. When picking models, check the native context window first.&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Ansible-managed infra saves your bacon during upgrades.&amp;lt;/strong&amp;gt; Every config change — parser swap, new parameter, removed scaling — went through playbooks. The whole migration is reproducible, reversible, and self-documenting. Doing this manually on the server would have made the Mamba crash way harder to debug and rollbacks way riskier.&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;quick-comparison&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#quick-comparison&amp;quot;&amp;gt;Quick Comparison&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Qwen3-32B-AWQ (Before)&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Qwen3.6-35B-A3B-AWQ-4bit (After)&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Architecture&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Dense Transformer&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Hybrid Mamba-Transformer (MoE)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Active Params&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;32B (all of them)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~3B (of 35B total)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Native Context&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;32K&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;262K&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;RoPE Scaling&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;YaRN (to reach 65K)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;None needed&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Tool Call Parser&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;hermes&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;qwen3_coder&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;max_num_batched_tokens&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;2048 (default)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;4096 (must be ≥ 2096)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;max_num_seqs&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;32&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;8&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;KV Cache&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;fp8&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;fp8&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Quantization&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;AWQ (awq_marlin)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;AWQ (awq_marlin)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Tensor Parallelism&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;2&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Attention Backend&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;TRITON_ATTN&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;TRITON_ATTN&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;em&amp;gt;Hardware: 2× NVIDIA RTX 5060 Ti 16 GB (Blackwell, PCIe, no NVLink). Inference: vLLM. Infrastructure: Ansible.&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="AI"/>
    <category term="LLM"/>
    <category term="vLLM"/>
    <category term="Self-Hosting"/>
    <category term="Infrastructure"/>
  </entry>
  <entry>
    <title>I Built My Own AI Server — Here&amp;#039;s What I Learned About Running Frontier Models at Home</title>
    <link href="https://remus-software.org/articles/i-built-my-own-ai-server/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/i-built-my-own-ai-server/</id>
    <published>2026-06-05T00:00:00.000Z</published>
    <updated>2026-06-05T00:00:00.000Z</updated>
    <summary>A hands-on journey of building a local AI server with dual RTX 5060 Ti GPUs to run 32-billion-parameter models at home — covering hardware choices, software stack (Ollama, vLLM, Open WebUI), Blackwell compatibility headaches, and how local open-source models compare to frontier APIs.</summary>
    <content type="html">&amp;lt;h1 id=&amp;quot;i-built-my-own-ai-server-%E2%80%94-here%E2%80%99s-what-i-learned-about-running-frontier-models-at-home&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#i-built-my-own-ai-server-%E2%80%94-here%E2%80%99s-what-i-learned-about-running-frontier-models-at-home&amp;quot;&amp;gt;I Built My Own AI Server — Here’s What I Learned About Running Frontier Models at Home&amp;lt;/a&amp;gt;&amp;lt;/h1&amp;gt;
&amp;lt;p&amp;gt;We’ve all been there: you send a prompt to Claude or GPT-5, wait a few seconds, and wonder — &amp;lt;em&amp;gt;could I do this myself?&amp;lt;/em&amp;gt; Not out of arrogance, but curiosity. What does it actually take to run a 32-billion-parameter language model on your own hardware, in your own home, without sending a single byte to the cloud?&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;I set out to find out. I bought the hardware, installed the software, fought the bugs, and benchmarked the results. What I found was a landscape that is simultaneously more accessible and more chaotic than I expected. The tools exist. The models are free. The hardware is affordable. But putting it all together? That’s still an adventure — one I want to share with you.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;why-bother-with-a-local-ai-server%3F&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#why-bother-with-a-local-ai-server%3F&amp;quot;&amp;gt;Why Bother With a Local AI Server?&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The case for running AI locally is compelling on paper:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Privacy&amp;lt;/strong&amp;gt;: Your data never leaves your network.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Cost&amp;lt;/strong&amp;gt;: No per-token API fees that scale unpredictably.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Control&amp;lt;/strong&amp;gt;: You choose the model, the configuration, and the rules.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Learning&amp;lt;/strong&amp;gt;: There’s no better way to understand these systems than running them yourself.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;But there’s also an honest reason: I wanted to see if a 32-billion-parameter open-source model could actually hold its own against the frontier models from Anthropic and OpenAI. The answer, as you’ll see, is nuanced.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-hardware%3A-keeping-it-realistic&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-hardware%3A-keeping-it-realistic&amp;quot;&amp;gt;The Hardware: Keeping It Realistic&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;I didn’t want to go all-in financially. The goal was to build something that could &amp;lt;em&amp;gt;seriously&amp;lt;/em&amp;gt; run AI agents — not a toy, but not a $20,000 GPU workstation either. Here’s what I landed on:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Component&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Choice&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Why&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;GPU&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;2× NVIDIA RTX 5060 Ti (16 GB each)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;32 GB total VRAM — enough for a 32B model&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;CPU&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;AMD Ryzen 9 9950X (16 cores)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Fast enough for data preprocessing and orchestration&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;RAM&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;64 GB DDR5&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Buffer for model weights and context overflow&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Storage&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;2 TB NVMe SSD&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Fast model loading and KV cache spill&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;img src=&amp;quot;hardware_setup.jpg&amp;quot; alt=&amp;quot;Hardware setup — the completed build&amp;quot;&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The RTX 5060 Ti is interesting. It’s NVIDIA’s newest &amp;lt;strong&amp;gt;Blackwell&amp;lt;/strong&amp;gt; architecture (compute capability SM 12.0) — powerful, but so new that much of the AI software ecosystem hadn’t caught up yet. That decision alone would cause me weeks of headaches, as we’ll see.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The total build came in at a fraction of what a cloud GPU instance would cost over a year. And as a bonus? I could install Windows on the side and play games when I wasn’t experimenting with AI. :)&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-software-stack%3A-three-engines%2C-one-server&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-software-stack%3A-three-engines%2C-one-server&amp;quot;&amp;gt;The Software Stack: Three Engines, One Server&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The open-source LLM ecosystem has consolidated around a few key tools. I chose a stack of three:&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart TD
    User[&amp;quot;You (browser, CLI, or agent)&amp;quot;]
    WebUI[&amp;quot;Open WebUI (:8080)&amp;lt;br/&amp;gt;Chat interface&amp;quot;]
    Ollama[&amp;quot;Ollama (:11434)&amp;lt;br/&amp;gt;Simple, friendly model serving&amp;quot;]
    vLLM[&amp;quot;vLLM (:8001)&amp;lt;br/&amp;gt;High-performance inference&amp;quot;]
    GPU[&amp;quot;2× RTX 5060 Ti&amp;lt;br/&amp;gt;32 GB VRAM&amp;quot;]

    User --&amp;gt; WebUI
    WebUI --&amp;gt; Ollama
    WebUI --&amp;gt; vLLM
    Ollama --&amp;gt; GPU
    vLLM --&amp;gt; GPU
&amp;lt;/div&amp;gt;&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;&amp;lt;a href=&amp;quot;https://ollama.com/&amp;quot;&amp;gt;Ollama&amp;lt;/a&amp;gt;&amp;lt;/strong&amp;gt;: The easy option. Download a model, run it, chat with it. Great for experimentation.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;&amp;lt;a href=&amp;quot;https://docs.vllm.ai/&amp;quot;&amp;gt;vLLM&amp;lt;/a&amp;gt;&amp;lt;/strong&amp;gt;: The serious option. Built for throughput — parallel requests, efficient memory management, and production-grade serving.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;&amp;lt;a href=&amp;quot;https://openwebui.com/&amp;quot;&amp;gt;Open WebUI&amp;lt;/a&amp;gt;&amp;lt;/strong&amp;gt;: A beautiful chat interface that connects to either engine.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;A critical design decision: &amp;lt;strong&amp;gt;only one inference engine runs at a time.&amp;lt;/strong&amp;gt; Both Ollama and vLLM need the full 32 GB of VRAM to serve a 32-billion-parameter model. Running them simultaneously would cause one to crash with an out-of-memory error. I automated the switching with a single configuration toggle:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-yaml&amp;quot;&amp;gt;# One variable controls everything
inference_engine: &amp;amp;quot;vllm&amp;amp;quot;   # or &amp;amp;quot;ollama&amp;amp;quot;
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;When you flip this value, Ansible (my automation tool of choice) stops one engine, starts the other, and reconfigures the web interface to point at the right one. Clean and simple.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-blackwall%3A-when-new-hardware-meets-immature-software&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-blackwall%3A-when-new-hardware-meets-immature-software&amp;quot;&amp;gt;The Blackwall: When New Hardware Meets Immature Software&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Here’s where the story gets interesting. The RTX 5060 Ti uses NVIDIA’s Blackwell architecture, internally known as &amp;lt;strong&amp;gt;SM 12.0&amp;lt;/strong&amp;gt;. It’s powerful hardware, but vLLM — the inference engine I needed for serious work — had barely been tested on it.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The first time I tried to start vLLM, it crashed immediately. The error messages were cryptic:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;undefined symbol: check_cuda_arch
RuntimeError: FlashInfer requires GPUs with sm75 or higher
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;This was baffling. SM 12.0 &amp;lt;em&amp;gt;is&amp;lt;/em&amp;gt; higher than SM 7.5. The problem? The FlashInfer library — vLLM’s default attention backend — had a JIT compiler that simply didn’t know how to handle Blackwell GPUs yet. It saw an architecture number it didn’t recognize and panicked.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;This wasn’t just one bug. It was a cascade of three independent problems, each hiding behind the other:&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart TD
    A[&amp;quot;vLLM won&amp;#039;t start&amp;quot;] --&amp;gt; B[&amp;quot;Bug 1: Broken systemd unit&amp;lt;br/&amp;gt;(typo in config file)&amp;quot;]
    A --&amp;gt; C[&amp;quot;Bug 2: FLASH_ATTN backend&amp;lt;br/&amp;gt;incompatible with SM 12.0&amp;quot;]
    A --&amp;gt; D[&amp;quot;Bug 3: FlashInfer sampler&amp;lt;br/&amp;gt;also incompatible with SM 12.0&amp;quot;]
    B --&amp;gt; E[&amp;quot;Fix: Remove stray characters&amp;lt;br/&amp;gt;from RestartSec value&amp;quot;]
    C --&amp;gt; F[&amp;quot;Fix: Switch to TRITON_ATTN&amp;lt;br/&amp;gt;(a different attention backend)&amp;quot;]
    D --&amp;gt; G[&amp;quot;Fix: Uninstall flashinfer-python&amp;lt;br/&amp;gt;entirely from the environment&amp;quot;]
    E --&amp;gt; H[&amp;quot;vLLM boots! ...but slowly&amp;quot;]
    F --&amp;gt; H
    G --&amp;gt; H
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;The fix for the attention backend was to switch from &amp;lt;code&amp;gt;FLASH_ATTN&amp;lt;/code&amp;gt; to &amp;lt;strong&amp;gt;&amp;lt;code&amp;gt;TRITON_ATTN&amp;lt;/code&amp;gt;&amp;lt;/strong&amp;gt; — a different implementation that happened to work on Blackwell. But FlashInfer kept resurfacing in unexpected places: not just for attention, but also for the &amp;lt;em&amp;gt;sampler&amp;lt;/em&amp;gt; (the component that picks which word comes next). Eventually, the only reliable solution was to completely remove the FlashInfer package from the Python environment, forcing vLLM to use its fallback paths everywhere.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;This took days of debugging, reading GitHub issues, and testing configurations. It’s a reminder that in fast-moving open-source ecosystems, being on the cutting edge of hardware means you &amp;lt;em&amp;gt;are&amp;lt;/em&amp;gt; the QA team.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-vram-puzzle%3A-fitting-a-32-billion-parameter-model-into-32-gb&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-vram-puzzle%3A-fitting-a-32-billion-parameter-model-into-32-gb&amp;quot;&amp;gt;The VRAM Puzzle: Fitting a 32-Billion-Parameter Model Into 32 GB&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;A 32-billion-parameter model like &amp;lt;strong&amp;gt;Qwen3-32B&amp;lt;/strong&amp;gt; is a large piece of software. In its compressed AWQ (4-bit quantized) form, the model weights take up about &amp;lt;strong&amp;gt;19 GB&amp;lt;/strong&amp;gt; of VRAM. That leaves roughly 13 GB for the &amp;lt;strong&amp;gt;KV cache&amp;lt;/strong&amp;gt; — the working memory the model uses to track the conversation context.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Here’s the math that kept me up at night:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Context Length&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;KV Cache (fp16)&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;KV Cache (fp8)&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Fits?&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;32K tokens&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~8 GB&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~4 GB&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;✅ (with fp8)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;64K tokens&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~16 GB&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~8 GB&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;✅ (with fp8, tight)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;128K tokens&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~32 GB&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~16 GB&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;❌&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;At standard precision (fp16), even 32K tokens of context wouldn’t fit alongside the model weights. The breakthrough was switching the KV cache to &amp;lt;strong&amp;gt;fp8&amp;lt;/strong&amp;gt; — a half-precision floating point format that Blackwell supports natively. This halved the memory requirement with virtually no quality loss, letting me serve conversations up to 64K tokens long.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;splitting-across-two-gpus&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#splitting-across-two-gpus&amp;quot;&amp;gt;Splitting Across Two GPUs&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Since the 19 GB model doesn’t fit on a single 16 GB card, it must span both GPUs. There are two ways to do this:&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart LR
    subgraph &amp;quot;Pipeline Parallelism (PP=2)&amp;quot;
        direction LR
        P1[&amp;quot;GPU 0: Layers 1-30&amp;lt;br/&amp;gt;compute → send → wait&amp;quot;]
        P2[&amp;quot;GPU 1: Layers 31-60&amp;lt;br/&amp;gt;wait → compute → send&amp;quot;]
    end

    subgraph &amp;quot;Tensor Parallelism (TP=2)&amp;quot;
        direction LR
        T1[&amp;quot;GPU 0: Half of every layer&amp;lt;br/&amp;gt;compute → sync → compute&amp;quot;]
        T2[&amp;quot;GPU 1: Other half of every layer&amp;lt;br/&amp;gt;compute → sync → compute&amp;quot;]
    end
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Pipeline Parallelism&amp;lt;/strong&amp;gt; splits the model in half — GPU 0 handles the first 30 layers, GPU 1 handles the last 30. The problem? While GPU 0 is working, GPU 1 sits idle, and vice versa. For a single conversation, this wastes about 50% of available compute.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Tensor Parallelism&amp;lt;/strong&amp;gt; splits each layer across both GPUs, so they’re always working together. The cost? They need to exchange data after every single layer — and on consumer GPUs without NVLink (NVIDIA’s high-speed GPU-to-GPU connection), that data has to travel through the motherboard’s PCIe bus, which is slow.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;I tested both. For my use case (a personal AI agent handling 1-4 requests at a time), the communication overhead of Tensor Parallelism was roughly equal to the idle-time waste of Pipeline Parallelism. I ended up choosing Pipeline Parallelism because it was more stable on my specific hardware — GeForce cards have known issues with P2P (peer-to-peer) GPU communication that can cause deadlocks.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;automation%3A-because-doing-it-twice-is-doing-it-wrong&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#automation%3A-because-doing-it-twice-is-doing-it-wrong&amp;quot;&amp;gt;Automation: Because Doing It Twice Is Doing It Wrong&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The setup process was complex enough that I automated it with &amp;lt;strong&amp;gt;Ansible&amp;lt;/strong&amp;gt; — a tool that lets you describe your entire server configuration as code. The result: a single command that installs NVIDIA drivers, Docker, Ollama, vLLM, Open WebUI, downloads models, and configures everything:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;# One command, 30-90 minutes later, you have a working AI server
ansible-playbook -i inventory.ini site.yml --ask-become-pass
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;The Ansible project grew to include:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Roles&amp;lt;/strong&amp;gt; for each component (NVIDIA drivers, Docker, Ollama, vLLM, Open WebUI)&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Mutual exclusion logic&amp;lt;/strong&amp;gt; that ensures only one engine runs at a time&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Model download tasks&amp;lt;/strong&amp;gt; that pull the right quantized models from HuggingFace&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;A test playbook&amp;lt;/strong&amp;gt; that verifies everything is working after installation&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;An update playbook&amp;lt;/strong&amp;gt; for keeping components current&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;This was essential. When you’re debugging driver incompatibilities and attention backend failures, you need to be able to tear everything down and rebuild from scratch in minutes, not hours.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;performance%3A-the-honest-numbers&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#performance%3A-the-honest-numbers&amp;quot;&amp;gt;Performance: The Honest Numbers&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;After all the configuration, optimization, and debugging, what did I actually get?&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;~20 tokens per second&amp;lt;/strong&amp;gt; for single-stream generation with Qwen3-32B-AWQ.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;To put that in context: that’s roughly 15 words per second — readable in real time, but noticeably slower than ChatGPT or Claude, which typically deliver 40-80+ tokens per second through their cloud infrastructure.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Where does the time go? The biggest bottleneck is the &amp;lt;strong&amp;gt;pipeline parallelism bubble&amp;lt;/strong&amp;gt; — that 50% idle time I mentioned earlier. Here are the optimizations I identified and tested:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Optimization&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Expected Improvement&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Status&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Switch to Tensor Parallelism&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;+15-30% at low concurrency&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Tested — marginal gain&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Enable prefix caching&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;+30-50% time-to-first-token&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Implemented&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Speculative decoding (small draft model)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;+50-100% decode speed&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Planned&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;NVFP4 quantization (Blackwell-native)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;+17% if available&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Waiting for model checkpoint&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Reduce context window to 32K&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;More VRAM for batching&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Available if 64K not needed&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Push GPU utilization to 95%&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Small but free&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Implemented&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;The most promising path forward is &amp;lt;strong&amp;gt;speculative decoding&amp;lt;/strong&amp;gt;: using a tiny model (like Qwen3-0.6B) to quickly generate candidate tokens, which the large model then verifies in parallel. This could potentially double the throughput to 40-55 tokens per second — competitive with cloud APIs.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;real-world-testing%3A-can-it-replace-claude%3F&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#real-world-testing%3A-can-it-replace-claude%3F&amp;quot;&amp;gt;Real-World Testing: Can It Replace Claude?&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;I put the local setup to the test with the &amp;lt;strong&amp;gt;Hermes agent&amp;lt;/strong&amp;gt; — an open-source AI agent framework — running real development tasks:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Writing project documentation for a codebase&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Generating test cases&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Drafting pull request descriptions&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Small and large code refactoring suggestions&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;The results were honest and humbling:&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Documentation&amp;lt;/strong&amp;gt; was useful but lacked depth. The model captured the high-level structure but missed important details that a human reviewer would need to fill in.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Test cases&amp;lt;/strong&amp;gt; were mostly correct but overlooked edge cases and contained minor errors — good as a starting point, not as a final product.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Pull request descriptions&amp;lt;/strong&amp;gt; were generally good but sometimes lacked clarity.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Refactoring suggestions&amp;lt;/strong&amp;gt; were the weakest area. The model sometimes made the code &amp;lt;em&amp;gt;worse&amp;lt;/em&amp;gt;, and the response times were long enough to break the flow of interactive development.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The verdict: a 32-billion-parameter local model is not yet a replacement for frontier cloud models when it comes to complex code understanding and generation. It’s a capable first-draft machine that still requires significant human review.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;a-more-promising-use-case&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#a-more-promising-use-case&amp;quot;&amp;gt;A More Promising Use Case&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;But there’s a second use case I’m more optimistic about: &amp;lt;strong&amp;gt;compliance documentation&amp;lt;/strong&amp;gt;. Generating ISO 27001 and ISO 62304 compliant documents — security policies, risk assessments, compliance reports — from unstructured customer input. This is fundamentally a text-generation-from-template task rather than deep code analysis, and early signs suggest the model handles it better. The structured, formulaic nature of compliance documents plays to the strengths of a quantized model.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;what-i%E2%80%99d-tell-someone-starting-today&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#what-i%E2%80%99d-tell-someone-starting-today&amp;quot;&amp;gt;What I’d Tell Someone Starting Today&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;If you’re thinking about building your own AI server, here’s what I learned:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;The hardware is ready.&amp;lt;/strong&amp;gt; Consumer GPUs with 16+ GB of VRAM can genuinely run useful 30B+ parameter models. The price-to-performance ratio is compelling.&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;The software is still catching up.&amp;lt;/strong&amp;gt; Especially on newer GPU architectures. Budget significant time for debugging if you’re on cutting-edge hardware. If stability matters more than bleeding-edge performance, consider a previous-generation GPU (RTX 4090, RTX 3090) where the software ecosystem is more mature.&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Automation is not optional.&amp;lt;/strong&amp;gt; The number of configuration variables, environment settings, and interdependencies is too large to manage manually. Tools like Ansible, Docker, and systemd are essential.&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Quantization is the key that unlocks everything.&amp;lt;/strong&amp;gt; Without AWQ 4-bit quantization and fp8 KV caches, a 32B model simply wouldn’t fit in 32 GB of VRAM. These techniques have matured enough that quality loss is minimal for many tasks.&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Set realistic expectations.&amp;lt;/strong&amp;gt; Local models at this scale are powerful assistants, not autonomous replacements. They excel at drafting, summarizing, and generating structured content — tasks where a human reviews the output. For complex reasoning and real-time interactive coding, cloud frontier models still have a clear edge.&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;looking-forward&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#looking-forward&amp;quot;&amp;gt;Looking Forward&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The pace of progress in this space is extraordinary. When I started this project, Blackwell support in vLLM was essentially nonexistent. Within weeks, community fixes were landing in pull requests. Quantization techniques that were research papers six months ago are now one-line configuration options.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The open-source model ecosystem is also advancing rapidly. Qwen3-32B is already a capable model, and newer releases from Qwen, Llama, and DeepSeek continue to close the gap with proprietary offerings. NVFP4 quantization — designed specifically for Blackwell’s hardware — promises another 17% speedup once model checkpoints become available.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Building your own AI server in 2026 is a bit like building your own PC in 1998: it requires patience, some technical knowledge, and a willingness to troubleshoot. But the reward is a deeper understanding of the technology and a system that is entirely yours — private, customizable, and free from API rate limits.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The future of AI isn’t just in the cloud. It’s increasingly in our homes, our offices, and our homelabs. And it’s getting better every month.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;further-reading&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#further-reading&amp;quot;&amp;gt;Further Reading&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;a href=&amp;quot;https://github.com/tmseidel/local-llm-setup&amp;quot;&amp;gt;Project Source on GitHub&amp;lt;/a&amp;gt; — The complete Ansible playbooks, Docker configurations, and systemd units to replicate this entire setup from scratch&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;a href=&amp;quot;https://docs.vllm.ai/&amp;quot;&amp;gt;vLLM Documentation&amp;lt;/a&amp;gt; — The inference engine powering this setup&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;a href=&amp;quot;https://ollama.com/&amp;quot;&amp;gt;Ollama&amp;lt;/a&amp;gt; — The simpler alternative for getting started&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;a href=&amp;quot;https://huggingface.co/Qwen&amp;quot;&amp;gt;Qwen3 Model Family&amp;lt;/a&amp;gt; — The open-weight models used in this project&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;a href=&amp;quot;https://www.reddit.com/r/LocalLLaMA/comments/1rwubcy/guide_awq_models_working_on_rtx_5060_ti_sm_120/?tl=de&amp;quot;&amp;gt;vLLM Field Report: AWQ on RTX 5060 Ti&amp;lt;/a&amp;gt; — Community testing on Blackwell hardware&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;a href=&amp;quot;https://openwebui.com/&amp;quot;&amp;gt;Open WebUI&amp;lt;/a&amp;gt; — The chat interface that ties it all together&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="AI"/>
    <category term="LLM"/>
    <category term="Self-Hosting"/>
    <category term="Hardware"/>
    <category term="DevOps"/>
  </entry>
  <entry>
    <title>Sovereign AI Is Loud — But The Real Issue Is Sovereign IT</title>
    <link href="https://remus-software.org/articles/regarding-sovereign-it/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/regarding-sovereign-it/</id>
    <published>2026-05-22T00:00:00.000Z</published>
    <updated>2026-05-22T00:00:00.000Z</updated>
    <summary>Why the debate about sovereign AI is really a broader discussion about sovereign IT — control over infrastructure, platforms, portability, and the long-term ability to make and reverse technical decisions.</summary>
    <content type="html">&amp;lt;h1 id=&amp;quot;sovereign-ai-is-loud-%E2%80%94-but-the-real-issue-is-sovereign-it&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#sovereign-ai-is-loud-%E2%80%94-but-the-real-issue-is-sovereign-it&amp;quot;&amp;gt;Sovereign AI Is Loud — But The Real Issue Is Sovereign IT&amp;lt;/a&amp;gt;&amp;lt;/h1&amp;gt;
&amp;lt;p&amp;gt;Most articles here are focused on very concrete, technical problems: setups, architectures, and reproducible solutions. This one takes a step back. Still, at its core, it remains technical — because questions of “sovereignty” ultimately materialize in infrastructure, operations, and the ability to make and reverse decisions.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-current-narrative%3A-sovereign-ai&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-current-narrative%3A-sovereign-ai&amp;quot;&amp;gt;The Current Narrative: Sovereign AI&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The debate around &amp;lt;strong&amp;gt;sovereign AI&amp;lt;/strong&amp;gt; has gained remarkable momentum. Across industry reports, policy discussions, and conference stages, there is a growing emphasis on regaining control over data, models, and infrastructure. Especially in Europe, the dependency on non-local providers has become a recurring concern, driven by both geopolitical tension and regulatory realities.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Conceptually, the idea is straightforward: build, run, and control AI systems within your own sphere of influence. In practice, however, this turns out to be significantly more demanding.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Operating a self-hosted LLM setup today requires assembling a stack from relatively young and still evolving components:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;inference engines such as &amp;lt;strong&amp;gt;vLLM&amp;lt;/strong&amp;gt; or &amp;lt;strong&amp;gt;Ollama&amp;lt;/strong&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;containerized deployments and orchestration, often via &amp;lt;strong&amp;gt;Kubernetes&amp;lt;/strong&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;surrounding services like vector databases, authentication, observability, and networking&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;and, perhaps most critically, access to and operation of suitable &amp;lt;strong&amp;gt;GPU infrastructure&amp;lt;/strong&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;There is no widely accepted, production-ready “standard stack” comparable to what exists for more traditional enterprise workloads. Instead, organizations are faced with a growing but fragmented ecosystem. While this provides flexibility, it also means that building such a platform still requires a considerable amount of in-house expertise and operational maturity.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;infrastructure-is-scaling-%E2%80%94-but-not-necessarily-becoming-sovereign&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#infrastructure-is-scaling-%E2%80%94-but-not-necessarily-becoming-sovereign&amp;quot;&amp;gt;Infrastructure Is Scaling — But Not Necessarily Becoming Sovereign&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;At the same time, the physical backbone of digital infrastructure is expanding rapidly. Driven largely by AI workloads, Europe is experiencing a strong increase in data center construction and capacity.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;key-data-points&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#key-data-points&amp;quot;&amp;gt;Key Data Points&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Topic&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Data&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Demand driver&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;AI is now the leading driver of data center demand in Europe &amp;lt;a href=&amp;quot;https://www.rlbinsights.com/reports/data-centre-trends-report-2025/ai-hits-europe&amp;quot;&amp;gt;1&amp;lt;/a&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Capacity growth&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Avg. deployments: 16 MW → 33 MW → 47 MW (2023–2025) &amp;lt;a href=&amp;quot;https://www.rlbinsights.com/reports/data-centre-trends-report-2025/ai-hits-europe&amp;quot;&amp;gt;1&amp;lt;/a&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Investment scale&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;amp;gt; €100B expected in European data centers by 2030 &amp;lt;a href=&amp;quot;https://www.eudca.org/state-of-european-data-centres-2025&amp;quot;&amp;gt;2&amp;lt;/a&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Pipeline size&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~14 GW planned capacity in EMEA by mid-2025 &amp;lt;a href=&amp;quot;https://www.allianz.com/content/dam/onemarketing/azcom/Allianz_com/economic-research/publications/specials/en/2025/october/2025-10-07-construction-AZ.pdf&amp;quot;&amp;gt;3&amp;lt;/a&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;This expansion is often interpreted as a positive signal for digital sovereignty: more infrastructure, closer to users, under European jurisdiction.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Yet the picture is more nuanced. Much of this newly built capacity is designed to support hyperscale cloud and AI platforms. Even when physically located in Europe, these systems often operate within technological, contractual, and operational frameworks defined elsewhere. In that sense, infrastructure alone does not automatically translate into control.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-overlooked-context%3A-sovereignty-decisions-were-made-earlier&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-overlooked-context%3A-sovereignty-decisions-were-made-earlier&amp;quot;&amp;gt;The Overlooked Context: Sovereignty Decisions Were Made Earlier&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Seen in this light, the current intensity of the AI sovereignty debate is somewhat surprising. Over the past decade, many organizations have already made fundamental decisions that reduced their control over IT systems — often quite deliberately.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The shift toward cloud-based and platform-centric architectures brought undeniable advantages: faster deployment, reduced operational overhead, and access to highly sophisticated tooling. At the same time, it gradually relocated critical capabilities:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;infrastructure to cloud providers&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;development workflows to hosted platforms&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;collaboration and communication to SaaS ecosystems&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;customer-facing systems to externally operated services&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;None of this was irrational. In many cases, it was the most pragmatic choice available. However, it also meant that certain aspects of control — over data flows, system behavior, or long-term portability — became harder to maintain.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;different-industries%2C-different-trade-offs&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#different-industries%2C-different-trade-offs&amp;quot;&amp;gt;Different Industries, Different Trade-offs&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;In my own experience, the way organizations approach these questions varies significantly by industry.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;In &amp;lt;strong&amp;gt;medical technology&amp;lt;/strong&amp;gt;, there remains a noticeable degree of caution when it comes to outsourcing critical IT systems. This is not simply a cultural preference but largely shaped by regulation. Requirements around data protection, auditability, and traceability leave relatively little room for ambiguity. As a result, questions about where data resides, who can access it, and how systems can be audited or replaced tend to be addressed early and in detail.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;This does not necessarily mean that cloud usage is avoided altogether. Rather, it is approached more deliberately, and often with additional safeguards or hybrid models. In that sense, sovereignty is less an abstract concept and more an operational constraint that needs to be managed.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;By contrast, in parts of the &amp;lt;strong&amp;gt;automotive industry&amp;lt;/strong&amp;gt;, the shift toward external platforms was often more assertive. Internal infrastructure teams were frequently seen as cost-intensive and less flexible, while cloud providers offered scalability, speed, and a rich ecosystem of services. Over time, this led to a situation where a significant portion of the IT landscape — from development pipelines to collaboration tools — relies on a relatively small number of external vendors.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;This approach brought clear benefits in terms of velocity and standardization. At the same time, it introduced new forms of dependency, some of which only become visible when trying to change direction. Migration paths, data portability, and operational independence tend to become more complex as systems are more deeply integrated into proprietary platforms.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;why-ai-changes-the-tone-of-the-debate&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#why-ai-changes-the-tone-of-the-debate&amp;quot;&amp;gt;Why AI Changes the Tone of the Debate&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Against this backdrop, it becomes clearer why AI has triggered a more pronounced reaction.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;AI systems tend to sit very close to core value creation. They process sensitive data, encode domain knowledge, and increasingly influence decision-making processes. As a result, questions of control feel more immediate. Where earlier layers of abstraction — infrastructure, collaboration tools, or workflow systems — could be externalized with relatively limited visibility, AI makes dependencies more tangible.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;At the same time, many of the underlying challenges are not new. Vendor lock-in, limited transparency, reliance on external roadmaps, or the gradual loss of internal expertise are all dynamics that have been present for years. AI does not introduce them so much as amplify their impact.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;a-more-nuanced-view-on-sovereignty&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#a-more-nuanced-view-on-sovereignty&amp;quot;&amp;gt;A More Nuanced View on Sovereignty&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;It is also worth noting that sovereignty is rarely absolute. In practice, it tends to manifest as a spectrum rather than a binary choice.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;levels-of-sovereignty&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#levels-of-sovereignty&amp;quot;&amp;gt;Levels of Sovereignty&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Level&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Example&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Low&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Full reliance on SaaS / hyperscalers&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Medium&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;EU hosting with contractual safeguards&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;High&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Open-source, portable architectures&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Maximum&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Fully self-hosted or air‑gapped systems&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;For many organizations, a balanced approach is likely the most practical: retaining control where it matters most while leveraging external services where appropriate. What becomes increasingly important, however, is the ability to make these choices consciously — and to revisit them if needed.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;early-signs-of-a-shift&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#early-signs-of-a-shift&amp;quot;&amp;gt;Early Signs of a Shift&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;There are indications that the broader conversation is evolving. In the public sector, for example, initiatives are emerging that explicitly frame digital sovereignty as a strategic objective.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Programs that move away from proprietary ecosystems toward open-source-based infrastructures — such as the adoption of LibreOffice, Linux, and open collaboration platforms — reflect a growing awareness of long-term dependencies and their implications. &amp;lt;a href=&amp;quot;https://www.schleswig-holstein.de/DE/landesregierung/ministerien-behoerden/I/Presse/PI/2024/CdS/241125_cds_open-source-strategie&amp;quot;&amp;gt;4&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;These efforts are not without friction. Migration, training, and organizational change can be challenging. Yet they also demonstrate that alternative approaches are possible, even at scale.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;conclusion&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#conclusion&amp;quot;&amp;gt;Conclusion&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The renewed focus on sovereign AI is both understandable and valuable. At the same time, it risks narrowing the perspective if it is treated in isolation.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Questions of control, dependency, and adaptability do not begin with AI — they extend across the entire IT landscape. In that sense, sovereign AI is less a starting point and more a visible manifestation of a broader theme.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;A more comprehensive view would therefore consider not only AI systems, but the surrounding architecture, data flows, and operational capabilities. Ultimately, sovereignty is less about complete independence and more about maintaining the flexibility to respond, adapt, and make informed choices.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Or, put differently:&amp;lt;br&amp;gt;
it is not necessarily about building everything yourself —&amp;lt;br&amp;gt;
but about retaining the ability not to be entirely dependent on others.&amp;lt;/p&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="AI"/>
    <category term="Sovereignty"/>
    <category term="Cloud"/>
    <category term="Architecture"/>
  </entry>
  <entry>
    <title>Meet AI-Git-Bot 1.7 — the teammate that reviews, tests and ships your PRs</title>
    <link href="https://remus-software.org/articles/meet-ai-git-bot-the-teammate-that-review-ships-test-prs/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/meet-ai-git-bot-the-teammate-that-review-ships-test-prs/</id>
    <published>2026-05-21T00:00:00.000Z</published>
    <updated>2026-05-21T00:00:00.000Z</updated>
    <summary>AI-Git-Bot 1.7 is here, transforming from &amp;#039;the PR review bot&amp;#039; to the AI teammate your repo has been waiting for — reviewing your code, writing your tests, deploying your previews, cleaning up after itself. The chores that always get cut under deadline pressure? Wire one bot. They get done. Every PR. Forever.</summary>
    <content type="html">&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;Imagine opening a pull request and, two minutes later, a bot has reviewed your diff, deployed a fresh preview, written a Playwright test for the feature you just added, run it against that preview, and pasted the report back into the PR — all inside the Git tool you already use.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;That’s not a roadmap. That’s &amp;lt;strong&amp;gt;AI-Git-Bot 1.7&amp;lt;/strong&amp;gt;, shipping today. 🚀&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;img src=&amp;quot;dashboard_ai_git_bot.PNG&amp;quot; alt=&amp;quot;AI-Git-Bot dashboard&amp;quot;&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;first-time-here%3F-30-second-intro-%F0%9F%91%8B&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#first-time-here%3F-30-second-intro-%F0%9F%91%8B&amp;quot;&amp;gt;First time here? 30-second intro 👋&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;AI-Git-Bot is the open-source AI teammate that lives inside your Git tool&amp;lt;/strong&amp;gt; — Gitea, GitHub, GitHub Enterprise, GitLab, Bitbucket Cloud. No new dashboard to log into, no Chrome extension, no Slack bot to babysit. You assign it work the same way you’d assign it to a colleague: request it as a PR reviewer, assign it an issue, or &amp;lt;code&amp;gt;@mention&amp;lt;/code&amp;gt; it in a comment.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;It takes over the &amp;lt;strong&amp;gt;necessary-but-uncomfortable&amp;lt;/strong&amp;gt; chores that quietly rot every codebase:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;📝 Writing a &amp;lt;em&amp;gt;proper&amp;lt;/em&amp;gt; issue with acceptance criteria — before any code is written&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;🔍 Reviewing PRs &amp;lt;strong&amp;gt;consistently&amp;lt;/strong&amp;gt;, even when the human reviewer is drowning&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;🧪 Adding the regression test for the bug you just fixed (the one we always say “we’ll add later”)&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;🛠️ Implementing the boring follow-up tickets (renames, bumps, small refactors)&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;🧹 Tearing down the preview environment nobody remembers spinning up&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;You pick which chores hurt most this quarter, wire one bot, done. The rest stays exactly as it was.&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;👉 Want the long-form pitch? Read &amp;lt;strong&amp;gt;&amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot/blob/main/doc/pitch/PITCH.md&amp;quot;&amp;gt;&amp;lt;code&amp;gt;doc/pitch/PITCH.md&amp;lt;/code&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/strong&amp;gt; — it’s the fastest way to decide whether AI-Git-Bot is for your team.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;what%E2%80%99s-new-in-1.7-%E2%80%94-the-highlights&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#what%E2%80%99s-new-in-1.7-%E2%80%94-the-highlights&amp;quot;&amp;gt;What’s new in 1.7 — the highlights&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;h3 id=&amp;quot;%F0%9F%8E%AC-1.-prs-that-test-themselves&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#%F0%9F%8E%AC-1.-prs-that-test-themselves&amp;quot;&amp;gt;🎬 1. PRs that test themselves&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;This is the headline. Tag your bot with the new &amp;lt;strong&amp;gt;Full-stack QA&amp;lt;/strong&amp;gt; workflow + tell it where to deploy your PRs, and every pull request now gets:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;A short &amp;lt;strong&amp;gt;plan&amp;lt;/strong&amp;gt; of which user journeys to cover&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A fresh &amp;lt;strong&amp;gt;Playwright test suite&amp;lt;/strong&amp;gt;, generated for &amp;lt;em&amp;gt;this&amp;lt;/em&amp;gt; PR&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A &amp;lt;strong&amp;gt;deploy&amp;lt;/strong&amp;gt; of your PR to a real preview&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The suite &amp;lt;strong&amp;gt;run live against that preview&amp;lt;/strong&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A &amp;lt;strong&amp;gt;report comment&amp;lt;/strong&amp;gt; on the PR — pass/fail, screenshots, suite source included&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Automatic &amp;lt;strong&amp;gt;teardown&amp;lt;/strong&amp;gt; when the PR closes&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;p&amp;gt;Don’t like the suite? Drop a comment: &amp;lt;code&amp;gt;@bot regenerate-tests focus on the checkout flow&amp;lt;/code&amp;gt;. The bot replans with your feedback. Just need a quick rerun? &amp;lt;code&amp;gt;@bot rerun-tests&amp;lt;/code&amp;gt;. That’s it.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;img src=&amp;quot;gitea-pr-with-e2e-test-run.png&amp;quot; alt=&amp;quot;PR with a Playwright report posted by the bot&amp;quot;&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Why you’ll love it:&amp;lt;/strong&amp;gt; the test debt that always slips to “next sprint” finally gets paid down — automatically, per PR.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;%F0%9F%94%8C-2.-it-plays-nice-with-your-pipeline-(whatever-it-is)&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#%F0%9F%94%8C-2.-it-plays-nice-with-your-pipeline-(whatever-it-is)&amp;quot;&amp;gt;🔌 2. It plays nice with your pipeline (whatever it is)&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;We added four ways for the bot to deploy your PR — pick the one you already have:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;You’re using…&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;The bot uses strategy…&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Jenkins / TeamCity / a bash script behind a webhook&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;WEBHOOK&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Vercel, Netlify, Render, Cloudflare Pages&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;STATIC&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;GitHub Actions, Gitea Actions, GitLab CI, Bitbucket Pipelines&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;CI_ACTION&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;An internal platform team exposing deploys via MCP&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;MCP&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;No need to migrate anything.&amp;lt;/strong&amp;gt; The bot adapts to your stack, not the other way around.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;%F0%9F%A7%A9-3.-workflows-you-can-mix%2C-match-and-extend&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#%F0%9F%A7%A9-3.-workflows-you-can-mix%2C-match-and-extend&amp;quot;&amp;gt;🧩 3. Workflows you can mix, match and extend&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Reviewing PRs and running E2E tests are now just two examples of a bigger idea: &amp;lt;strong&amp;gt;PR Workflows&amp;lt;/strong&amp;gt;. Each one is a named, configurable bundle (“Default review”, “Full-stack QA”, or your own) that you can assign to any bot from the admin UI.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;img src=&amp;quot;ai-bot-workflow-settings.png&amp;quot; alt=&amp;quot;Workflow configurations admin UI&amp;quot;&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Want a custom one — say a license-header check, an SBOM-diff comment, or a “ping the on-call channel on every hotfix PR” workflow? It’s a clean extension point now. Your platform team can ship it without forking the project.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;%F0%9F%97%A3%EF%B8%8F-4.-new-slash-commands%2C-less-back-and-forth&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#%F0%9F%97%A3%EF%B8%8F-4.-new-slash-commands%2C-less-back-and-forth&amp;quot;&amp;gt;🗣️ 4. New slash commands, less back-and-forth&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Two more commands in the bot’s vocabulary:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;@bot rerun-tests&amp;lt;/code&amp;gt; — re-run the existing suite, no replanning&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;@bot regenerate-tests &amp;amp;lt;your feedback&amp;amp;gt;&amp;lt;/code&amp;gt; — replan the suite, your feedback goes straight into the planner&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;Combined with the existing &amp;lt;code&amp;gt;@bot fix …&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;@bot write …&amp;lt;/code&amp;gt;, your PR comments become the remote control.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;%F0%9F%9B%9F-5.-test-suites-that-can-outlive-the-pr-%E2%80%94-if-you-want-them-to&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#%F0%9F%9B%9F-5.-test-suites-that-can-outlive-the-pr-%E2%80%94-if-you-want-them-to&amp;quot;&amp;gt;🛟 5. Test suites that can outlive the PR — if you want them to&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;By default, generated suites are &amp;lt;strong&amp;gt;ephemeral&amp;lt;/strong&amp;gt;: they live with the PR, vanish on close, nothing leaks. Safe and boring — exactly what most teams want.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;But if you’d love to &amp;lt;strong&amp;gt;keep&amp;lt;/strong&amp;gt; the suites the bot writes, flip one setting and pick:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;em&amp;gt;commit-to-pr&amp;lt;/em&amp;gt; — committed straight to the PR branch&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;em&amp;gt;offer-as-pr&amp;lt;/em&amp;gt; — the bot opens a follow-up PR with the suite&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;em&amp;gt;promote-on-merge&amp;lt;/em&amp;gt; — auto-promoted when the original PR merges&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;A nightly cleanup job keeps the test folder from turning into a swamp.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;%F0%9F%94%92-6.-tighter%2C-safer%2C-friendlier&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#%F0%9F%94%92-6.-tighter%2C-safer%2C-friendlier&amp;quot;&amp;gt;🔒 6. Tighter, safer, friendlier&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;A handful of quality-of-life upgrades that you’ll feel without noticing:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Per-bot tool whitelisting&amp;lt;/strong&amp;gt; — give your writer-bot read-only access, your coding-bot the full toolbox&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Async callbacks&amp;lt;/strong&amp;gt; with single-use, HMAC-signed secrets — no shared tokens in your CI runners&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Force-push safe&amp;lt;/strong&amp;gt; — re-pushing three times in a minute no longer confuses the bot&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Better LLM compatibility&amp;lt;/strong&amp;gt; — Gemini 3.x, sanitised tool names across providers, more robust agent loops&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;what-to-try-first-%E2%80%94-pick-one-%F0%9F%91%87&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#what-to-try-first-%E2%80%94-pick-one-%F0%9F%91%87&amp;quot;&amp;gt;What to try first — pick one 👇&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;You don’t need to adopt everything. Start with the one that hooks you:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;🚀 The 15-minute upgrade test.&amp;lt;/strong&amp;gt; Pull &amp;lt;code&amp;gt;tmseidel/ai-git-bot:1.7.0&amp;lt;/code&amp;gt;, point at your existing database, watch it migrate cleanly. Your current bots behave exactly like in 1.6. Zero risk, great way to validate.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;🎬 The “wow” demo.&amp;lt;/strong&amp;gt; Spin up the bundled sample stack — &amp;lt;code&amp;gt;docker compose -f systemtest/docker-compose-e2e-sample.yml up&amp;lt;/code&amp;gt; — open a PR, sit back, watch the bot plan, deploy, test, report. &amp;lt;strong&amp;gt;This is the one to show your team on Monday morning.&amp;lt;/strong&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;🔌 Wire it to your real CI.&amp;lt;/strong&amp;gt; If you live in GitHub/GitLab/Gitea/Bitbucket pipelines, &amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot/blob/main/doc/PR_WORKFLOWS_CI_ACTIONS.md&amp;quot;&amp;gt;&amp;lt;code&amp;gt;PR_WORKFLOWS_CI_ACTIONS.md&amp;lt;/code&amp;gt;&amp;lt;/a&amp;gt; is the shortest path to “the bot just tested my PR against a real preview environment.”&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;🧩 Build your own workflow.&amp;lt;/strong&amp;gt; Got a chore you keep nagging humans about on every PR? Ship it as a workflow — your team will thank you forever.&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;brand-new%3F-start-here-%F0%9F%A7%AD&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#brand-new%3F-start-here-%F0%9F%A7%AD&amp;quot;&amp;gt;Brand new? Start here 🧭&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;If you are…&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Start with…&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;👀 &amp;lt;strong&amp;gt;Curious&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot/blob/main/doc/pitch/PITCH.md&amp;quot;&amp;gt;The pitch&amp;lt;/a&amp;gt; — why this exists, what it actually does, how it compares to Copilot Workspace / GitLab Duo / Qodo / Aider&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;🧑‍💻 &amp;lt;strong&amp;gt;A developer&amp;lt;/strong&amp;gt; who wants to play&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot/blob/main/doc/LOCAL_DEVELOPMENT.md&amp;quot;&amp;gt;Local development guide&amp;lt;/a&amp;gt; — up and running in ~10 min&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;🏗️ &amp;lt;strong&amp;gt;An architect&amp;lt;/strong&amp;gt; evaluating it&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot/blob/main/doc/ARCHITECTURE.md&amp;quot;&amp;gt;Architecture overview&amp;lt;/a&amp;gt; and &amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot/tree/main/doc/agentic-workflows&amp;quot;&amp;gt;agentic workflows internals&amp;lt;/a&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;🛠️ &amp;lt;strong&amp;gt;DevOps / Platform&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot/blob/main/doc/DEPLOYMENT.md&amp;quot;&amp;gt;Deployment guide&amp;lt;/a&amp;gt; + &amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot/blob/main/doc/PR_WORKFLOWS_CI_ACTIONS.md&amp;quot;&amp;gt;CI action recipes&amp;lt;/a&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;🆙 &amp;lt;strong&amp;gt;Already running 1.6&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot/blob/main/doc/MIGRATION_1.6_TO_1.7.md&amp;quot;&amp;gt;Migration guide 1.6 → 1.7&amp;lt;/a&amp;gt; — short version: drop in, done&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;img src=&amp;quot;ai-bot-bot-configuration.png&amp;quot; alt=&amp;quot;Bot configuration form&amp;quot;&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;get-it-now&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#get-it-now&amp;quot;&amp;gt;Get it now&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;docker pull tmseidel/ai-git-bot:1.7.0
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;⭐ &amp;lt;strong&amp;gt;Star us on GitHub:&amp;lt;/strong&amp;gt; &amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot&amp;quot;&amp;gt;https://github.com/tmseidel/ai-git-bot&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;🐛 &amp;lt;strong&amp;gt;Found something?&amp;lt;/strong&amp;gt; &amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot/issues&amp;quot;&amp;gt;Open an issue&amp;lt;/a&amp;gt; — we read every one.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;💬 &amp;lt;strong&amp;gt;Just want to say hi?&amp;lt;/strong&amp;gt; Drop a discussion on the repo, we love hearing how teams are wiring this up.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;The bottom line:&amp;lt;/strong&amp;gt; in 1.7, AI-Git-Bot stops being “the PR review bot” and starts being &amp;lt;strong&amp;gt;the AI teammate your repo has been waiting for&amp;lt;/strong&amp;gt; — reviewing your code, writing your tests, deploying your previews, cleaning up after itself.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The chores that always get cut under deadline pressure? Wire one bot. They get done. Every PR. Forever.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Happy shipping. 🚀&amp;lt;/p&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="software-development"/>
    <category term="AI"/>
    <category term="DevOps"/>
    <category term="Git"/>
    <category term="CI/CD"/>
  </entry>
  <entry>
    <title>Standardizing Agentic AI at Scale: Why Remote MCP Servers Are the Missing Layer</title>
    <link href="https://remus-software.org/articles/remote-mcp-standardization/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/remote-mcp-standardization/</id>
    <published>2026-05-15T00:00:00.000Z</published>
    <updated>2026-05-15T00:00:00.000Z</updated>
    <summary>Why organizations should standardize agentic AI with remote MCP server pools to improve consistency, governance, security, and adoption across teams.</summary>
    <content type="html">&amp;lt;p&amp;gt;&amp;lt;em&amp;gt;How organizations can eliminate inconsistent AI output, unify governance, and accelerate adoption — by moving from local, per-developer tool chaos to a shared pool of remote MCP resources.&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-problem-no-one-talks-about%3A-ai-output-variance&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-problem-no-one-talks-about%3A-ai-output-variance&amp;quot;&amp;gt;The Problem No One Talks About: AI Output Variance&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Enterprise AI adoption is accelerating, but a quiet problem is eroding its ROI: &amp;lt;strong&amp;gt;the same prompt, run by two different employees, produces dramatically different results&amp;lt;/strong&amp;gt;. This is not a model problem — it is an infrastructure problem.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The root causes are surprisingly structural:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Factor&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Description&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Client-side setup&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Each employee configures their local AI environment differently — different system prompts, context window sizes, temperature settings, or client versions&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Prompt engineering skill gap&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Employees vary widely in their ability to write effective prompts, leading to wildly different output quality for identical tasks&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Installed local tools &amp;amp;amp; MCP servers&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;One developer has a CRM MCP plugin; another does not. One has an up-to-date internal knowledge tool; another is running a stale fork&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Model version drift&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Without pinned configurations, different clients may call different model versions&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Context availability&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Access to company-specific context (style guides, internal APIs, compliance rules) depends on what each user has manually configured&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Security posture&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;API keys stored in local &amp;lt;code&amp;gt;.env&amp;lt;/code&amp;gt; files, personal tokens in config files, credentials passed as environment variables — each a potential exposure point&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Tool versioning&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Locally installed MCP servers may lag behind in functionality or contain unfixed bugs&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Permissions divergence&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Without centralized access control, some employees inadvertently have access to sensitive tools they should not, and others are blocked from tools they need&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;This variance has three compounding consequences that directly impact the business case for AI:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Quality rework&amp;lt;/strong&amp;gt; — Outputs need manual post-processing because they don’t meet the expected standard, eliminating the productivity gain&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Adoption stagnation&amp;lt;/strong&amp;gt; — When AI “works well for some and not others,” skepticism spreads and uptake plateaus&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Compliance failure&amp;lt;/strong&amp;gt; — Uncontrolled tool access and inconsistent outputs make it nearly impossible to enforce data governance, GDPR obligations, or industry-specific regulations&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;where-the-mcp-ecosystem-stands-today%3A-a-protocol-snapshot&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#where-the-mcp-ecosystem-stands-today%3A-a-protocol-snapshot&amp;quot;&amp;gt;Where the MCP Ecosystem Stands Today: A Protocol Snapshot&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Before making the case for remote MCP, it is worth grounding the discussion in where the ecosystem actually is — because the data tells a revealing story about the gap between where things are and where they need to go.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;As of early 2026, the distribution of publicly registered MCP servers by transport protocol breaks down as follows — based on aggregated data from Anthropic, OpenAI, and analyst surveys published Q4 2025 through Q1 2026, as compiled by &amp;lt;a href=&amp;quot;https://www.digitalapplied.com/blog/mcp-adoption-statistics-2026-model-context-protocol&amp;quot;&amp;gt;Digital Applied (April 2026)&amp;lt;/a&amp;gt;:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Transport&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Share&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Status&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;stdio&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~67 %&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Local subprocess — runs on the client machine&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Streamable HTTP&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~28 %&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Remote, OAuth 2.1-secured — the current standard&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;SSE&amp;lt;/strong&amp;gt; (legacy)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~5 %&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Remote, deprecated — being phased out&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;pie title MCP Server Distribution by Transport Protocol (2026)
    &amp;quot;stdio (local)&amp;quot; : 67
    &amp;quot;Streamable HTTP (remote)&amp;quot; : 28
    &amp;quot;SSE legacy (remote)&amp;quot; : 5
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;Two-thirds of all MCP servers currently run locally via stdio. This is a direct artifact of how MCP adoption began: developers building quick local integrations for their own AI clients, spinning up tools as subprocesses, storing credentials in environment variables. It was the fastest path to a working prototype — and it shows.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The more important number, however, is the &amp;lt;strong&amp;gt;trend&amp;lt;/strong&amp;gt;. Remote MCP deployments have grown nearly fourfold since May 2025. Among the 20 most widely used MCP servers, 80 % now offer a remote deployment option — a figure independently confirmed by MCP Manager via Ahrefs search-volume data (&amp;lt;a href=&amp;quot;https://mcpmanager.ai/blog/mcp-adoption-statistics/&amp;quot;&amp;gt;MCP Manager, October 2025&amp;lt;/a&amp;gt;). And of those remote servers, 81 % authenticate via OAuth 2.1 — meaning the governance infrastructure is already being built at the ecosystem level.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The market is moving toward remote. The question for organizations is whether they lead that shift deliberately — with centralized governance, versioning, and access control — or whether they arrive there reactively, having accumulated the same local-config debt that once plagued their microservices era.&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;The 67 % stdio figure is not a benchmark to maintain — it is a starting point to move away from.&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-architectural-shift%3A-from-local-chaos-to-a-shared-mcp-resource-pool&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-architectural-shift%3A-from-local-chaos-to-a-shared-mcp-resource-pool&amp;quot;&amp;gt;The Architectural Shift: From Local Chaos to a Shared MCP Resource Pool&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The solution mirrors a pattern mature engineering organizations have known for decades: &amp;lt;strong&amp;gt;centralize what should be consistent, and expose it through a well-governed interface&amp;lt;/strong&amp;gt;.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;In the context of AI agents, this means replacing a sprawl of locally configured MCP servers with a unified, remote MCP server pool that every client in the organization connects to.&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;graph TD
    subgraph Before [&amp;quot;❌ Before: Per-Client Local Setup&amp;quot;]
        E1[&amp;quot;Employee A&#92;n(has CRM MCP v1.2)&amp;quot;]
        E2[&amp;quot;Employee B&#92;n(has CRM MCP v1.0, misconfigured)&amp;quot;]
        E3[&amp;quot;Employee C&#92;n(no CRM MCP)&amp;quot;]
    end

    subgraph After [&amp;quot;✅ After: Shared Remote MCP Pool&amp;quot;]
        C1[&amp;quot;Client A&amp;quot;]
        C2[&amp;quot;Client B&amp;quot;]
        C3[&amp;quot;Client C&amp;quot;]

        subgraph Pool[&amp;quot;Remote MCP Server Pool (Org-managed)&amp;quot;]
            direction LR
            MCP1[&amp;quot;CRM MCP&#92;n(latest, versioned)&amp;quot;]
            MCP2[&amp;quot;Knowledge Base MCP&#92;n(RAG-enabled)&amp;quot;]
            MCP3[&amp;quot;Compliance MCP&#92;n(audit-logged)&amp;quot;]
            MCP4[&amp;quot;Code Review MCP&#92;n(shared tooling)&amp;quot;]
        end

        C1 --&amp;gt; Pool
        C2 --&amp;gt; Pool
        C3 --&amp;gt; Pool
    end
&amp;lt;/div&amp;gt;&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;benefits-of-a-remote%2C-centralized-mcp-resource-pool&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#benefits-of-a-remote%2C-centralized-mcp-resource-pool&amp;quot;&amp;gt;Benefits of a Remote, Centralized MCP Resource Pool&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Moving to shared remote MCP servers delivers benefits across multiple dimensions:&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;%F0%9F%94%92-security-%26-governance&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#%F0%9F%94%92-security-%26-governance&amp;quot;&amp;gt;🔒 Security &amp;amp;amp; Governance&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Centralized secret management&amp;lt;/strong&amp;gt; — API keys, tokens, and credentials live on the server, never on employee machines&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Unified access control&amp;lt;/strong&amp;gt; — Role-based permissions enforced at the MCP layer: the marketing team never accidentally calls a financial data tool&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Audit trails&amp;lt;/strong&amp;gt; — Every tool call is logged centrally, making compliance reporting tractable&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Reduced attack surface&amp;lt;/strong&amp;gt; — No credentials scattered across dozens of laptops&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;%F0%9F%93%90-consistency-%26-quality&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#%F0%9F%93%90-consistency-%26-quality&amp;quot;&amp;gt;📐 Consistency &amp;amp;amp; Quality&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Single source of truth&amp;lt;/strong&amp;gt; — All agents access the same tool versions, the same data, the same business logic&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Versioned, tested releases&amp;lt;/strong&amp;gt; — Tool updates are deployed once and take effect for all users simultaneously&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Embedded prompt hygiene&amp;lt;/strong&amp;gt; — System prompts, context, and guardrails can be baked into server-side tool definitions, not left to individual configuration&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;%E2%9A%99%EF%B8%8F-operational-efficiency&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#%E2%9A%99%EF%B8%8F-operational-efficiency&amp;quot;&amp;gt;⚙️ Operational Efficiency&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Zero per-client installation&amp;lt;/strong&amp;gt; — Onboarding a new employee means one auth flow, not an afternoon of CLI setup&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Centralized monitoring&amp;lt;/strong&amp;gt; — Error rates, latency, and usage patterns visible in one place&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Faster iteration&amp;lt;/strong&amp;gt; — Improve a tool once, benefit everyone immediately&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;%F0%9F%93%88-adoption-%26-culture&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#%F0%9F%93%88-adoption-%26-culture&amp;quot;&amp;gt;📈 Adoption &amp;amp;amp; Culture&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Consistent experience&amp;lt;/strong&amp;gt; — Every user gets the same capable toolset, reducing “it works for them but not me” friction&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Lower barrier to entry&amp;lt;/strong&amp;gt; — Non-technical employees can use powerful agentic tools without understanding local config&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Trust through predictability&amp;lt;/strong&amp;gt; — Repeatable, governed outputs build organizational trust in AI&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-microservices-analogy%3A-secured-services-calling-secured-services&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-microservices-analogy%3A-secured-services-calling-secured-services&amp;quot;&amp;gt;The Microservices Analogy: Secured Services Calling Secured Services&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;This architecture is not new — it is precisely the pattern that modern software engineering adopted when monoliths became unmanageable.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Consider the parallel to microservices:&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;graph LR
    subgraph Microservices[&amp;quot;Microservices Architecture (known pattern)&amp;quot;]
        direction LR
        MS_Client[&amp;quot;Frontend / API Gateway&amp;quot;]
        MS_Auth[&amp;quot;Auth Service&#92;n(OAuth 2.1 / OIDC)&amp;quot;]
        MS_CRM[&amp;quot;CRM Service&#92;n(secured endpoint)&amp;quot;]
        MS_BI[&amp;quot;BI Service&#92;n(secured endpoint)&amp;quot;]

        MS_Client -- &amp;quot;Bearer Token&amp;quot; --&amp;gt; MS_Auth
        MS_Auth -- &amp;quot;validates, issues scoped token&amp;quot; --&amp;gt; MS_Client
        MS_Client -- &amp;quot;scoped token&amp;quot; --&amp;gt; MS_CRM
        MS_Client -- &amp;quot;scoped token&amp;quot; --&amp;gt; MS_BI
    end

    subgraph MCP_Arch[&amp;quot;MCP Architecture (same principle)&amp;quot;]
        direction LR
        AI_Client[&amp;quot;AI Agent / Claude Code&amp;quot;]
        MCP_Auth[&amp;quot;OAuth 2.1 Provider&#92;n(org identity)&amp;quot;]
        MCP_CRM[&amp;quot;CRM MCP Server&#92;n(remote, secured)&amp;quot;]
        MCP_KB[&amp;quot;Knowledge MCP Server&#92;n(remote, secured)&amp;quot;]

        AI_Client -- &amp;quot;auth request&amp;quot; --&amp;gt; MCP_Auth
        MCP_Auth -- &amp;quot;scoped token (RFC 8707)&amp;quot; --&amp;gt; AI_Client
        AI_Client -- &amp;quot;tool call + token&amp;quot; --&amp;gt; MCP_CRM
        AI_Client -- &amp;quot;tool call + token&amp;quot; --&amp;gt; MCP_KB
    end
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;Just as a well-designed microservices landscape enforces that &amp;lt;strong&amp;gt;only authenticated, authorized clients can call sensitive services&amp;lt;/strong&amp;gt; — regardless of who wrote the calling code or where it runs — a remote MCP pool enforces that &amp;lt;strong&amp;gt;only authenticated agents, with the right scopes, can call the right tools&amp;lt;/strong&amp;gt;.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The MCP spec adopted &amp;lt;strong&amp;gt;OAuth 2.1 with PKCE and Resource Indicators (RFC 8707)&amp;lt;/strong&amp;gt; precisely for this reason: so that a rogue or misconfigured client cannot escalate privileges or leak tokens across service boundaries — a gap that was explicitly closed in the June 2025 spec update.&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;The bottom line:&amp;lt;/strong&amp;gt; Just as you would never let application developers hardcode database passwords into their laptops and query production directly, you should not let them hardcode MCP tool configurations either. Centralize the services; secure the boundary; govern the access.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;quick-start%3A-connecting-claude-code-cli-to-remote-mcp-servers&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#quick-start%3A-connecting-claude-code-cli-to-remote-mcp-servers&amp;quot;&amp;gt;Quick Start: Connecting Claude Code CLI to Remote MCP Servers&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The good news is that connecting to a remote MCP server requires only a single command. Here is a practical walkthrough using Claude Code (the CLI), which supports Streamable HTTP transport natively.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;step-1%3A-add-a-remote-mcp-server&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#step-1%3A-add-a-remote-mcp-server&amp;quot;&amp;gt;Step 1: Add a remote MCP server&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;# Basic remote server (Streamable HTTP)
claude mcp add --transport http &amp;amp;lt;server-name&amp;amp;gt; https://mcp.your-org.com/crm

# With a scope — &amp;amp;quot;project&amp;amp;quot; commits to .mcp.json (shareable via git)
claude mcp add --transport http --scope project crm-mcp https://mcp.your-org.com/crm

# With a static auth header (for servers not using OAuth)
claude mcp add --transport http --scope user analytics-mcp https://mcp.your-org.com/analytics &#92;
  --header &amp;amp;quot;Authorization: Bearer ${ORG_MCP_TOKEN}&amp;amp;quot;
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;h3 id=&amp;quot;step-2%3A-authenticate-via-oauth-(if-the-server-requires-it)&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#step-2%3A-authenticate-via-oauth-(if-the-server-requires-it)&amp;quot;&amp;gt;Step 2: Authenticate via OAuth (if the server requires it)&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;# Inside a Claude Code session, trigger the OAuth flow
/mcp
# Select the server name, press Enter
# Claude Code opens a browser tab for org SSO / OAuth consent
# After consent, the token is stored automatically
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;h3 id=&amp;quot;step-3%3A-add-via-json-(ideal-for-scripting-or-ci%2Fcd-provisioning)&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#step-3%3A-add-via-json-(ideal-for-scripting-or-ci%2Fcd-provisioning)&amp;quot;&amp;gt;Step 3: Add via JSON (ideal for scripting or CI/CD provisioning)&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;# For automated onboarding scripts
claude mcp add-json crm-mcp &amp;#039;{
  &amp;amp;quot;type&amp;amp;quot;: &amp;amp;quot;http&amp;amp;quot;,
  &amp;amp;quot;url&amp;amp;quot;: &amp;amp;quot;https://mcp.your-org.com/crm&amp;amp;quot;
}&amp;#039;
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;h3 id=&amp;quot;step-4%3A-verify-the-connection&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#step-4%3A-verify-the-connection&amp;quot;&amp;gt;Step 4: Verify the connection&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;claude mcp list
# → crm-mcp    https://mcp.your-org.com/crm    ✓ connected
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;h3 id=&amp;quot;committing-the-org-wide-configuration&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#committing-the-org-wide-configuration&amp;quot;&amp;gt;Committing the org-wide configuration&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;When &amp;lt;code&amp;gt;--scope project&amp;lt;/code&amp;gt; is used, Claude Code writes the server configuration to &amp;lt;code&amp;gt;.mcp.json&amp;lt;/code&amp;gt; at the project root. &amp;lt;strong&amp;gt;Committing this file to your repository means every developer who clones the repo gets the same MCP pool automatically&amp;lt;/strong&amp;gt; — no per-person setup required.&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-jsonc&amp;quot;&amp;gt;// .mcp.json (committed to version control)
{
  &amp;amp;quot;mcpServers&amp;amp;quot;: {
    &amp;amp;quot;crm-mcp&amp;amp;quot;: {
      &amp;amp;quot;type&amp;amp;quot;: &amp;amp;quot;http&amp;amp;quot;,
      &amp;amp;quot;url&amp;amp;quot;: &amp;amp;quot;https://mcp.your-org.com/crm&amp;amp;quot;
    },
    &amp;amp;quot;knowledge-base-mcp&amp;amp;quot;: {
      &amp;amp;quot;type&amp;amp;quot;: &amp;amp;quot;http&amp;amp;quot;,
      &amp;amp;quot;url&amp;amp;quot;: &amp;amp;quot;https://mcp.your-org.com/kb&amp;amp;quot;
    },
    &amp;amp;quot;compliance-mcp&amp;amp;quot;: {
      &amp;amp;quot;type&amp;amp;quot;: &amp;amp;quot;http&amp;amp;quot;,
      &amp;amp;quot;url&amp;amp;quot;: &amp;amp;quot;https://mcp.your-org.com/compliance&amp;amp;quot;
    }
  }
}
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;💡 &amp;lt;strong&amp;gt;This approach is AI-implementation agnostic.&amp;lt;/strong&amp;gt; The MCP specification is an open standard, now governed by the Agentic AI Foundation (AAIF) under the Linux Foundation, with support from Anthropic, OpenAI, Google, Microsoft, and Block. The same remote MCP servers you configure for Claude Code can be consumed by Cursor, Windsurf, the OpenAI Agents SDK, Gemini’s Vertex AI Agent Builder, and any other MCP-compliant client. &amp;lt;strong&amp;gt;Invest once in your MCP server infrastructure; benefit across your entire AI toolchain.&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;a-note-on-the-argument%3A-where-it-holds-%E2%80%94-and-where-to-be-careful&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#a-note-on-the-argument%3A-where-it-holds-%E2%80%94-and-where-to-be-careful&amp;quot;&amp;gt;A Note on the Argument: Where It Holds — and Where to Be Careful&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;This proposal is structurally sound, but a few nuances are worth acknowledging:&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Where the argument is strong:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;For large organizations with many AI users, the consistency and governance case is compelling and mirrors established patterns from API gateway and microservices adoption&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;OAuth 2.1 at the MCP layer is not theoretical — it is spec-mandated since mid-2025 and implemented in production by major SaaS vendors&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The adoption-stagnation problem is real: inconsistent AI experiences are a documented cause of tool abandonment&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Where to be careful:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Latency&amp;lt;/strong&amp;gt; — Remote MCP servers introduce network round-trips vs. local stdio. For latency-sensitive agentic loops (many sequential tool calls), this can matter. Mitigate with geographic co-location or edge deployment&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Dependency risk&amp;lt;/strong&amp;gt; — A central MCP pool is a shared dependency. Plan for high availability from day one; a pool outage affects all agents simultaneously&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Migration cost&amp;lt;/strong&amp;gt; — Teams with mature local setups face a real migration effort. A phased approach (critical/shared tools remote first, niche local tools later) reduces friction&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Not a silver bullet for prompt quality&amp;lt;/strong&amp;gt; — Centralizing tools improves tool consistency, but employees with weak prompt engineering skills will still produce weaker outputs. Pair this with prompt libraries, agent templates, and training&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;conclusion&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#conclusion&amp;quot;&amp;gt;Conclusion&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The variance problem in enterprise AI is not going to solve itself. As long as every employee runs their own local configuration, prompt-engineers with their own skills, and connects to their own patchwork of local tools, the delta between the best and worst AI-assisted work in your organization will remain large.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Remote MCP servers are the architectural lever that addresses the &amp;lt;em&amp;gt;tooling&amp;lt;/em&amp;gt; layer of this problem — and it is the layer most amenable to centralized governance. Centralizing your MCP resource pool brings your AI agents closer to the reliability model you expect from your SaaS APIs: versioned, authenticated, monitored, and consistent for every caller.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The pattern is proven. The spec is stable. The tooling is ready.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;outlook%3A-further-levers-for-ai-standardization-in-your-organization&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#outlook%3A-further-levers-for-ai-standardization-in-your-organization&amp;quot;&amp;gt;Outlook: Further Levers for AI Standardization in Your Organization&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The remote MCP pool is one layer of a broader AI standardization stack. Organizations that go further will benefit from:&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;graph TD
    L1[&amp;quot;🏗️ Layer 1: Remote MCP Server Pool&#92;n(consistent tools, governed access)&amp;quot;]
    L2[&amp;quot;📝 Layer 2: Shared Prompt Libraries &amp;amp; Agent Templates&#92;n(consistent task framing, embedded best practices)&amp;quot;]
    L3[&amp;quot;🔍 Layer 3: Observability &amp;amp; Evaluation&#92;n(log tool calls, score outputs, detect drift)&amp;quot;]
    L4[&amp;quot;🏢 Layer 4: AI Gateway / Proxy&#92;n(rate limits, cost allocation, model routing, PII redaction)&amp;quot;]
    L5[&amp;quot;🎓 Layer 5: Continuous AI Literacy Programs&#92;n(prompt engineering, agent design patterns)&amp;quot;]

    L1 --&amp;gt; L2 --&amp;gt; L3 --&amp;gt; L4 --&amp;gt; L5
&amp;lt;/div&amp;gt;&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Shared prompt libraries&amp;lt;/strong&amp;gt;: Versioned, peer-reviewed prompt templates for common tasks, stored in a central repository and surfaced to all users — reducing the skill-gap effect directly&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Observability pipelines&amp;lt;/strong&amp;gt;: Logging agent tool calls and scoring outputs over time exposes which tools are underperforming, which prompts produce the best results, and where policy violations occur&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;AI gateways&amp;lt;/strong&amp;gt;: A proxy layer (such as LiteLLM, Portkey, or a custom gateway) sitting in front of multiple model providers enables model routing, cost visibility, PII stripping, and fallback — independent of which AI client employees use&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Agent-as-a-service&amp;lt;/strong&amp;gt;: For high-value, repeatable workflows (contract review, data extraction, report generation), encapsulate the entire agent — model, tools, system prompt, output format — as an internal service with an API, removing individual variation entirely&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;The organizations that will lead in AI-augmented productivity are not the ones who gave every employee access to ChatGPT or Claude. They are the ones who built the infrastructure layer that makes the output of any employee, on any client, consistently excellent.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;em&amp;gt;References and further reading:&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;a href=&amp;quot;https://modelcontextprotocol.io/specification/2025-11-25/basic/transports&amp;quot;&amp;gt;MCP Official Specification — Transports&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;a href=&amp;quot;https://code.claude.com/docs/en/mcp&amp;quot;&amp;gt;Claude Code: Connect to tools via MCP&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;a href=&amp;quot;https://docs.anthropic.com/en/build-with-claude/mcp/directory&amp;quot;&amp;gt;Anthropic MCP Connector Directory&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="AI"/>
    <category term="MCP"/>
    <category term="Architecture"/>
    <category term="Security"/>
  </entry>
  <entry>
    <title>Improving Software Quality with Reviewer Personas in AI-Git-Bot</title>
    <link href="https://remus-software.org/articles/improve-software-quality-with-reviewer-personas/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/improve-software-quality-with-reviewer-personas/</id>
    <published>2026-05-08T00:00:00.000Z</published>
    <updated>2026-05-08T00:00:00.000Z</updated>
    <summary>How to use AI-Git-Bot as a review gateway with multiple specialized reviewer personas to catch security, reliability, performance, testability, and maintainability issues in Git pull requests.</summary>
    <content type="html">&amp;lt;h1 id=&amp;quot;improving-software-quality-with-reviewer-personas-in-ai-git-bot&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#improving-software-quality-with-reviewer-personas-in-ai-git-bot&amp;quot;&amp;gt;Improving Software Quality with Reviewer Personas in AI-Git-Bot&amp;lt;/a&amp;gt;&amp;lt;/h1&amp;gt;
&amp;lt;p&amp;gt;AI-Git-Bot can be used as a self-hosted review gateway that connects Git platforms such as Gitea, GitHub, GitLab, and Bitbucket Cloud with configurable AI providers. One of its strongest quality-improvement patterns is not to configure a single generic AI reviewer, but to create a small system of focused reviewer personas. Each persona uses its own system prompt, its own bot identity, and its own review criteria. The result is a repeatable multi-pass review process that brings security, reliability, performance, testability, and maintainability perspectives into the Git workflow before code is merged.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;This article uses an exemplary project, &amp;lt;strong&amp;gt;ShopFlow&amp;lt;/strong&amp;gt;, to explain the approach. ShopFlow is a fictional e-commerce application with a Java/Spring Boot backend, a PostgreSQL database, and a web frontend. The team frequently changes checkout logic, payment integrations, discount rules, user-account flows, and order-processing jobs. Those are exactly the kinds of changes where a single human reviewer or a single generic AI review can miss important risks.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;1.-problem-description&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#1.-problem-description&amp;quot;&amp;gt;1. Problem Description&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Modern software teams already know that code review is essential, but the review process is under pressure:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Pull requests often mix business logic, infrastructure changes, tests, and documentation.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Human reviewers have limited time and different areas of expertise.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Security problems, concurrency bugs, missing edge cases, and weak test coverage are easy to overlook.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Teams want fast feedback, but not at the cost of quality.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Generic AI reviews can help, but a single broad prompt often produces broad feedback instead of deep specialized analysis.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;In ShopFlow, consider a pull request titled &amp;lt;strong&amp;gt;“Add express checkout with saved cards and promotional discount stacking”&amp;lt;/strong&amp;gt;. The diff touches:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;authentication checks for saved payment methods,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;discount calculation rules,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;database migrations,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;order-confirmation emails,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;retry behavior for payment-provider timeouts,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;unit and integration tests.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;No single reviewer perspective is enough. A security-minded reviewer will ask whether users can access another user’s saved card. A reliability reviewer will look for retry storms and idempotency problems. A maintainability reviewer will ask whether the discount rules are testable and understandable.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;AI-Git-Bot addresses this by letting a team configure several bots, each connected to a Git integration, an AI integration, and a dedicated system prompt entry. These bots can be requested as reviewers in the Git platform. When assigned or re-requested as reviewers, they fetch the pull-request diff, send it to the configured AI provider, and post their review back to the pull request.&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart LR
    PR[&amp;quot;ShopFlow pull request&#92;nExpress checkout&amp;quot;] --&amp;gt; SecBot[&amp;quot;Security Reviewer Bot&amp;quot;]
    PR --&amp;gt; RelBot[&amp;quot;Reliability &amp;amp; Performance Bot&amp;quot;]
    PR --&amp;gt; MaintBot[&amp;quot;Testability &amp;amp; Maintainability Bot&amp;quot;]

    SecBot --&amp;gt; SecReview[&amp;quot;Finds auth, permission, secret,&#92;nand input-validation risks&amp;quot;]
    RelBot --&amp;gt; RelReview[&amp;quot;Finds latency, retry, concurrency,&#92;nand resource-usage risks&amp;quot;]
    MaintBot --&amp;gt; MaintReview[&amp;quot;Finds weak tests, unclear design,&#92;nand maintainability issues&amp;quot;]

    SecReview --&amp;gt; Git[&amp;quot;Git pull-request discussion&amp;quot;]
    RelReview --&amp;gt; Git
    MaintReview --&amp;gt; Git
    Git --&amp;gt; Human[&amp;quot;Human reviewer makes&#92;nfinal merge decision&amp;quot;]
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;The goal is not to replace human review. The goal is to make human review more effective by adding consistent, specialized, early feedback directly where development already happens: in the Git pull request.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;2.-current-status-of-good-practices&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#2.-current-status-of-good-practices&amp;quot;&amp;gt;2. Current Status of Good Practices&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Good engineering teams already combine several quality practices:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Practice&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Strength&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Limitation&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Human code review&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Context, judgment, ownership, mentoring&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Slow under load; expertise varies by reviewer&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;CI builds and tests&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Deterministic validation&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Only catches what is encoded in tests&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Static analysis&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Repeatable style, bug, and complexity checks&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Often shallow on business context&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Dependency scanning&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Known vulnerability detection&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Does not reason about application-specific misuse&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Security review&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Deep risk analysis&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Expensive and often reserved for larger changes&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Architecture review&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Long-term design quality&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Usually asynchronous and not applied to every PR&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;These practices are still necessary. AI-Git-Bot fits as an additional review layer between automated checks and final human approval. It is especially useful because its reviews are:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;triggered intentionally&amp;lt;/strong&amp;gt; by assigning or re-requesting a bot as reviewer,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;configurable&amp;lt;/strong&amp;gt; through reusable system prompt entries,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;contextual&amp;lt;/strong&amp;gt; because the bot reviews the pull-request diff and can answer follow-up questions,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;centralized&amp;lt;/strong&amp;gt; through a gateway that manages Git integrations, AI integrations, prompts, credentials, and sessions,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;portable&amp;lt;/strong&amp;gt; across supported Git platforms and AI providers.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;For ShopFlow, the team keeps its existing CI pipeline:&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart TD
    Dev[&amp;quot;Developer opens PR&amp;quot;] --&amp;gt; CI[&amp;quot;CI: compile, tests, lint, dependency scan&amp;quot;]
    Dev --&amp;gt; Bots[&amp;quot;AI-Git-Bot reviewer personas&amp;quot;]
    CI --&amp;gt; Status[&amp;quot;Git status checks&amp;quot;]
    Bots --&amp;gt; Reviews[&amp;quot;Persona-specific PR reviews&amp;quot;]
    Status --&amp;gt; HumanReview[&amp;quot;Human review&amp;quot;]
    Reviews --&amp;gt; HumanReview
    HumanReview --&amp;gt; Merge{&amp;quot;Ready to merge?&amp;quot;}
    Merge -- &amp;quot;No&amp;quot; --&amp;gt; Fix[&amp;quot;Developer fixes findings&amp;quot;]
    Fix --&amp;gt; CI
    Merge -- &amp;quot;Yes&amp;quot; --&amp;gt; Main[&amp;quot;Merge to main&amp;quot;]
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;The important shift is that code quality is no longer treated as one generic review question. Instead, it becomes a structured set of perspectives.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;3.-analysis-of-possible-solutions&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#3.-analysis-of-possible-solutions&amp;quot;&amp;gt;3. Analysis of Possible Solutions&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;h3 id=&amp;quot;option-a%3A-one-generic-ai-reviewer&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#option-a%3A-one-generic-ai-reviewer&amp;quot;&amp;gt;Option A: One Generic AI Reviewer&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;A team can create one bot with the default review prompt. This is easy and already valuable. The default prompt asks the AI to look for correctness bugs, security vulnerabilities, performance problems, concurrency issues, API or database concerns, missing tests, and maintainability problems.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;For small teams or low-risk repositories, this may be sufficient. The downside is that the reviewer has to cover everything at once. In practice, broad reviews often produce broad findings. They may mention many categories without going deeply into the most important ones.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;option-b%3A-specialized-reviewer-personas&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#option-b%3A-specialized-reviewer-personas&amp;quot;&amp;gt;Option B: Specialized Reviewer Personas&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Instead of one broad reviewer, the team creates multiple bots with different system prompt entries. Each bot focuses on a narrow review mission.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;For ShopFlow, the team configures three personas:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Persona&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Bot name in Git&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Primary mission&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Typical findings&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Security Reviewer&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;shopflow-security-ai&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Protect data, identity, permissions, secrets, and inputs&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;IDOR, missing authorization, unsafe token handling, injection risk&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Reliability &amp;amp;amp; Performance Reviewer&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;shopflow-reliability-ai&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Protect production stability and scalability&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;non-idempotent retries, N+1 queries, race conditions, slow queries&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Testability &amp;amp;amp; Maintainability Reviewer&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;shopflow-maintainability-ai&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Protect long-term changeability&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;missing tests, unclear abstractions, excessive coupling, unreadable business rules&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;This approach gives the team three independent review passes. Each bot reads the same diff, but its system prompt changes the evaluation lens.&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;graph TD
    PromptSec[&amp;quot;System Prompt Entry:&#92;nSecurity Review&amp;quot;] --&amp;gt; BotSec[&amp;quot;Bot:&#92;nshopflow-security-ai&amp;quot;]
    PromptRel[&amp;quot;System Prompt Entry:&#92;nReliability &amp;amp; Performance Review&amp;quot;] --&amp;gt; BotRel[&amp;quot;Bot:&#92;nshopflow-reliability-ai&amp;quot;]
    PromptMaint[&amp;quot;System Prompt Entry:&#92;nTestability &amp;amp; Maintainability Review&amp;quot;] --&amp;gt; BotMaint[&amp;quot;Bot:&#92;nshopflow-maintainability-ai&amp;quot;]

    AI[&amp;quot;AI Integration&#92;nCloud or local model&amp;quot;] --&amp;gt; BotSec
    AI --&amp;gt; BotRel
    AI --&amp;gt; BotMaint

    GitInt[&amp;quot;Git Integration&#92;nGitea / GitHub / GitLab / Bitbucket&amp;quot;] --&amp;gt; BotSec
    GitInt --&amp;gt; BotRel
    GitInt --&amp;gt; BotMaint

    BotSec --&amp;gt; PR[&amp;quot;Pull request reviews&amp;quot;]
    BotRel --&amp;gt; PR
    BotMaint --&amp;gt; PR
&amp;lt;/div&amp;gt;&amp;lt;h3 id=&amp;quot;option-c%3A-combine-writer-agent%2C-coding-agent%2C-and-review-personas&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#option-c%3A-combine-writer-agent%2C-coding-agent%2C-and-review-personas&amp;quot;&amp;gt;Option C: Combine Writer Agent, Coding Agent, and Review Personas&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;AI-Git-Bot also supports issue-based agent workflows. A writer bot can improve vague issues into structured, testable follow-up issues. A coding bot can implement assigned issues and open pull requests. The reviewer personas then inspect the resulting PR.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;For ShopFlow, this creates a complete quality loop:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;A product owner writes a vague issue: “Make checkout faster and support express payment.”&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A writer bot turns it into an implementation-ready issue with acceptance criteria, non-goals, and test cases.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A coding agent or developer implements the change.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The three reviewer personas review the pull request.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A human reviewer decides what must be fixed before merge.&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;sequenceDiagram
    participant PO as Product Owner
    participant Git as Git Platform
    participant Writer as Writer Bot
    participant Dev as Developer or Coding Agent
    participant Reviewers as Reviewer Personas
    participant Human as Human Reviewer

    PO-&amp;gt;&amp;gt;Git: Create vague ShopFlow checkout issue
    PO-&amp;gt;&amp;gt;Git: Assign writer bot
    Git-&amp;gt;&amp;gt;Writer: Issue assignment webhook
    Writer-&amp;gt;&amp;gt;Git: Ask clarifying questions or create AI Created Issue
    Dev-&amp;gt;&amp;gt;Git: Implement issue and open PR
    Git-&amp;gt;&amp;gt;Reviewers: PR review requested for each persona
    Reviewers-&amp;gt;&amp;gt;Git: Post specialized review feedback
    Human-&amp;gt;&amp;gt;Git: Review findings and approve or request changes
&amp;lt;/div&amp;gt;&amp;lt;h3 id=&amp;quot;recommended-solution&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#recommended-solution&amp;quot;&amp;gt;Recommended Solution&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;For a production project like ShopFlow, the best balance is &amp;lt;strong&amp;gt;Option B plus selected parts of Option C&amp;lt;/strong&amp;gt;:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Use three reviewer personas for every medium- or high-risk pull request.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Use a writer bot for unclear issues before implementation starts.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Keep CI, dependency scanning, and human review as mandatory gates.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Treat bot findings as structured expert input, not as automatic merge decisions.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;4.-implementing-the-solution&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#4.-implementing-the-solution&amp;quot;&amp;gt;4. Implementing the Solution&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;h3 id=&amp;quot;4.1-configure-ai-git-bot-as-the-gateway&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#4.1-configure-ai-git-bot-as-the-gateway&amp;quot;&amp;gt;4.1 Configure AI-Git-Bot as the Gateway&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;In AI-Git-Bot, each bot connects:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;one &amp;lt;strong&amp;gt;Git integration&amp;lt;/strong&amp;gt;,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;one &amp;lt;strong&amp;gt;AI integration&amp;lt;/strong&amp;gt;,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;one &amp;lt;strong&amp;gt;system prompt entry&amp;lt;/strong&amp;gt;,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;one bot identity and webhook URL.&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;p&amp;gt;For ShopFlow, the team creates:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;a Git integration for its Git platform,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;an AI integration for the chosen model provider,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;three system prompt entries,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;three coding bots with review-focused prompts,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;optionally one writer bot for issue refinement.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;The gateway pattern is useful because secrets, prompts, sessions, model choices, and Git-provider details are managed centrally rather than scattered across repositories.&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart LR
    subgraph Git[&amp;quot;Git System&amp;quot;]
        Repo[&amp;quot;ShopFlow repository&amp;quot;]
        PR[&amp;quot;Pull requests&amp;quot;]
        Issues[&amp;quot;Issues&amp;quot;]
    end

    subgraph Gateway[&amp;quot;AI-Git-Bot Gateway&amp;quot;]
        Bots[&amp;quot;Bots&amp;quot;]
        Prompts[&amp;quot;System prompt entries&amp;quot;]
        Sessions[&amp;quot;Review sessions&amp;quot;]
        Secrets[&amp;quot;Encrypted tokens and API keys&amp;quot;]
    end

    subgraph Providers[&amp;quot;AI Providers&amp;quot;]
        Cloud[&amp;quot;Cloud AI&amp;quot;]
        Local[&amp;quot;Ollama / llama.cpp&amp;quot;]
    end

    Repo &amp;lt;--&amp;gt; Gateway
    PR &amp;lt;--&amp;gt; Gateway
    Issues &amp;lt;--&amp;gt; Gateway
    Bots --&amp;gt; Prompts
    Bots --&amp;gt; Sessions
    Bots --&amp;gt; Secrets
    Gateway &amp;lt;--&amp;gt; Cloud
    Gateway &amp;lt;--&amp;gt; Local
&amp;lt;/div&amp;gt;&amp;lt;h3 id=&amp;quot;4.2-create-the-three-reviewer-personas&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#4.2-create-the-three-reviewer-personas&amp;quot;&amp;gt;4.2 Create the Three Reviewer Personas&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The system prompts below are intentionally short enough to maintain predictable output, but strict enough to shape each review. In AI-Git-Bot, these would be stored under &amp;lt;strong&amp;gt;System settings → System prompts&amp;lt;/strong&amp;gt; as separate prompt entries. The &amp;lt;strong&amp;gt;Review System-Prompt&amp;lt;/strong&amp;gt; field is the important field for pull-request reviews and PR conversations.&amp;lt;/p&amp;gt;
&amp;lt;h4 id=&amp;quot;persona-1%3A-security-reviewer&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#persona-1%3A-security-reviewer&amp;quot;&amp;gt;Persona 1: Security Reviewer&amp;lt;/a&amp;gt;&amp;lt;/h4&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Purpose:&amp;lt;/strong&amp;gt; identify security defects before they become production vulnerabilities.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Example Review System-Prompt:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-markdown&amp;quot;&amp;gt;You are a senior application security reviewer for the ShopFlow e-commerce platform.

Review the pull request diff only from a security and abuse-resistance perspective.
Focus on authentication, authorization, tenant/user isolation, payment data handling,
input validation, injection risks, secrets, logging of sensitive data, dependency exposure,
and prompt-injection or instruction-handling risks.

Prioritize findings that could expose customer data, payment data, admin actions,
or internal credentials. Do not invent vulnerabilities that are not supported by the diff.

Format the review as:
1. Blocking security issues
2. Non-blocking hardening suggestions
3. Security tests to add
4. Overall security assessment
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;How this results in the Git system:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;The team creates a bot named &amp;lt;code&amp;gt;shopflow-security-ai&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The bot uses the &amp;lt;code&amp;gt;Security Review&amp;lt;/code&amp;gt; system prompt entry.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The Git provider is configured with the bot’s webhook URL.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;When &amp;lt;code&amp;gt;shopflow-security-ai&amp;lt;/code&amp;gt; is assigned as reviewer, AI-Git-Bot posts a security-focused review.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Developers can ask follow-up questions in the PR with &amp;lt;code&amp;gt;@shopflow-security-ai&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;In the express-checkout PR, this bot might comment:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;“Blocking: the saved-card lookup uses &amp;lt;code&amp;gt;cardId&amp;lt;/code&amp;gt; without verifying ownership by the current user.”&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;“Add a regression test proving user A cannot use user B’s saved payment method.”&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;“Avoid logging payment-provider request payloads because they may contain sensitive identifiers.”&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h4 id=&amp;quot;persona-2%3A-reliability-%26-performance-reviewer&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#persona-2%3A-reliability-%26-performance-reviewer&amp;quot;&amp;gt;Persona 2: Reliability &amp;amp;amp; Performance Reviewer&amp;lt;/a&amp;gt;&amp;lt;/h4&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Purpose:&amp;lt;/strong&amp;gt; prevent production incidents, scalability bottlenecks, and operationally unsafe behavior.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Example Review System-Prompt:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-markdown&amp;quot;&amp;gt;You are a reliability and performance reviewer for the ShopFlow e-commerce platform.

Review the pull request diff for production stability, scalability, latency, resource usage,
database efficiency, concurrency, transaction boundaries, retry behavior, idempotency,
timeouts, fallback behavior, and observability.

Prioritize issues that can cause outages, duplicate orders, retry storms, slow checkout,
deadlocks, memory pressure, or hard-to-debug production failures. Be concrete and explain
what load or failure mode would trigger the issue.

Format the review as:
1. Blocking reliability/performance issues
2. Non-blocking operational improvements
3. Load, concurrency, or failure-mode tests to add
4. Overall production-readiness assessment
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;How this results in the Git system:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;The team creates a bot named &amp;lt;code&amp;gt;shopflow-reliability-ai&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The bot uses the &amp;lt;code&amp;gt;Reliability &amp;amp;amp; Performance Review&amp;lt;/code&amp;gt; prompt entry.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The bot is requested on PRs touching checkout, database access, async jobs, or integrations.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;AI-Git-Bot fetches the diff and posts findings as a PR review or review comments.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Follow-up discussions stay inside the Git pull request.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;In the express-checkout PR, this bot might comment:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;“Blocking: payment retries are not idempotent; a timeout after provider success could create duplicate orders.”&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;“The new discount query is executed once per cart item and may create an N+1 query pattern.”&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;“Add a test for concurrent checkout submissions with the same cart.”&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h4 id=&amp;quot;persona-3%3A-testability-%26-maintainability-reviewer&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#persona-3%3A-testability-%26-maintainability-reviewer&amp;quot;&amp;gt;Persona 3: Testability &amp;amp;amp; Maintainability Reviewer&amp;lt;/a&amp;gt;&amp;lt;/h4&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Purpose:&amp;lt;/strong&amp;gt; protect long-term code health, reduce regression risk, and improve developer velocity.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Example Review System-Prompt:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-markdown&amp;quot;&amp;gt;You are a testability and maintainability reviewer for the ShopFlow e-commerce platform.

Review the pull request diff for clarity, cohesive design, readable business rules,
low coupling, meaningful naming, consistency with surrounding code, regression risk,
and sufficient automated tests. Focus on whether future developers can safely understand,
modify, and test the change.

Avoid minor style nitpicks unless they materially affect readability or consistency.
Prefer actionable suggestions with concrete refactoring or test recommendations.

Format the review as:
1. Blocking maintainability or testability issues
2. Non-blocking design/readability suggestions
3. Missing or weak tests
4. Overall maintainability assessment
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;How this results in the Git system:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;The team creates a bot named &amp;lt;code&amp;gt;shopflow-maintainability-ai&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The bot uses the &amp;lt;code&amp;gt;Testability &amp;amp;amp; Maintainability Review&amp;lt;/code&amp;gt; prompt entry.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The bot is assigned to feature PRs and refactoring PRs.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Its findings appear in the same Git review discussion as human comments, CI results, and other bot reviews.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Developers can ask targeted questions such as &amp;lt;code&amp;gt;@shopflow-maintainability-ai how would you split this service?&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;In the express-checkout PR, this bot might comment:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;“The discount-stacking rules are embedded in &amp;lt;code&amp;gt;CheckoutService&amp;lt;/code&amp;gt;; extract a &amp;lt;code&amp;gt;DiscountPolicy&amp;lt;/code&amp;gt; so each rule can be tested independently.”&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;“The tests cover the happy path but not conflicting promotions or expired discounts.”&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;“The migration changes order state semantics; add an integration test for existing pending orders.”&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;4.3-define-review-routing-rules&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#4.3-define-review-routing-rules&amp;quot;&amp;gt;4.3 Define Review Routing Rules&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The team should not request every persona on every tiny change. A practical routing matrix keeps review signal high.&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Change type&amp;lt;/th&amp;gt;
&amp;lt;th style=&amp;quot;text-align:right&amp;quot;&amp;gt;Security&amp;lt;/th&amp;gt;
&amp;lt;th style=&amp;quot;text-align:right&amp;quot;&amp;gt;Reliability &amp;amp;amp; Performance&amp;lt;/th&amp;gt;
&amp;lt;th style=&amp;quot;text-align:right&amp;quot;&amp;gt;Testability &amp;amp;amp; Maintainability&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Authentication or authorization&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Required&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Optional&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Required&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Payment or checkout flow&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Required&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Required&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Required&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Database migration&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Optional&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Required&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Required&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;UI-only text change&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Optional&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Optional&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Optional&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Business-rule refactoring&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Optional&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Optional&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Required&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Background jobs or queues&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Optional&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Required&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Required&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Dependency upgrade&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Required for security-sensitive libs&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Optional&amp;lt;/td&amp;gt;
&amp;lt;td style=&amp;quot;text-align:right&amp;quot;&amp;gt;Optional&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;This matrix can be implemented as a team convention. For example, pull-request templates can contain a checklist asking authors which AI reviewer personas they requested.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;4.4-keep-review-output-actionable&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#4.4-keep-review-output-actionable&amp;quot;&amp;gt;4.4 Keep Review Output Actionable&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The most important prompt-design rule is to force useful structure. Each persona should separate:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;blocking issues,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;non-blocking suggestions,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;recommended tests,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;an overall assessment.&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;p&amp;gt;This maps well to how teams already work in Git:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;blocking issues become required fixes before merge,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;non-blocking suggestions can become follow-up issues,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;test recommendations become new commits in the PR,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;overall assessments help the human reviewer judge residual risk.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart TD
    Finding[&amp;quot;Bot finding&amp;quot;] --&amp;gt; Classify{&amp;quot;Finding type&amp;quot;}
    Classify -- &amp;quot;Blocking&amp;quot; --&amp;gt; FixNow[&amp;quot;Fix in current PR&amp;quot;]
    Classify -- &amp;quot;Non-blocking&amp;quot; --&amp;gt; FollowUp[&amp;quot;Create follow-up issue if valuable&amp;quot;]
    Classify -- &amp;quot;Test gap&amp;quot; --&amp;gt; AddTests[&amp;quot;Add or improve tests&amp;quot;]
    Classify -- &amp;quot;Uncertain&amp;quot; --&amp;gt; Ask[&amp;quot;Ask @bot or human owner for clarification&amp;quot;]
    FixNow --&amp;gt; NewCommit[&amp;quot;Push new commit&amp;quot;]
    AddTests --&amp;gt; NewCommit
    NewCommit --&amp;gt; Rerequest[&amp;quot;Re-request relevant bot review&amp;quot;]
&amp;lt;/div&amp;gt;&amp;lt;h3 id=&amp;quot;4.5-evaluate-and-improve-the-prompts&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#4.5-evaluate-and-improve-the-prompts&amp;quot;&amp;gt;4.5 Evaluate and Improve the Prompts&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Prompts are part of the quality system and should be reviewed like code. Before rolling out new reviewer personas broadly, the team should:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;collect representative historical PR diffs,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;define a scoring rubric for correctness, security awareness, actionability, concision, and false positives,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;compare the default prompt with the specialized prompt,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;test malicious or confusing PR content to ensure the bot ignores instructions inside diffs or comments,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;start with one repository or one team before expanding.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;AI-Git-Bot’s reusable system prompt entries make this manageable. A team can clone a prompt entry, adjust the wording, assign it to one bot, observe review quality, and then promote it more broadly.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;4.6-example-end-to-end-flow-in-shopflow&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#4.6-example-end-to-end-flow-in-shopflow&amp;quot;&amp;gt;4.6 Example End-to-End Flow in ShopFlow&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The following scenario shows the system in practice:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;A developer opens a PR for express checkout.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;CI starts automatically.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The developer requests reviews from:
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;shopflow-security-ai&amp;lt;/code&amp;gt;,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;shopflow-reliability-ai&amp;lt;/code&amp;gt;,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;shopflow-maintainability-ai&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;AI-Git-Bot receives review-request webhooks.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Each bot fetches the PR diff and uses its configured system prompt.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Each bot posts a focused review.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The developer fixes blocking issues and asks follow-up questions with bot mentions.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The developer re-requests only the relevant bot reviews.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A human reviewer uses the bot reviews plus CI status to make the final decision.&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;sequenceDiagram
    participant Dev as Developer
    participant Git as Git Platform
    participant Gateway as AI-Git-Bot
    participant Sec as Security AI
    participant Rel as Reliability AI
    participant Maint as Maintainability AI
    participant Human as Human Reviewer

    Dev-&amp;gt;&amp;gt;Git: Open express-checkout PR
    Dev-&amp;gt;&amp;gt;Git: Request three AI reviewers
    Git-&amp;gt;&amp;gt;Gateway: Webhook for review requests
    Gateway-&amp;gt;&amp;gt;Git: Fetch PR diff and context
    Gateway-&amp;gt;&amp;gt;Sec: Send diff with security prompt
    Gateway-&amp;gt;&amp;gt;Rel: Send diff with reliability prompt
    Gateway-&amp;gt;&amp;gt;Maint: Send diff with maintainability prompt
    Sec--&amp;gt;&amp;gt;Gateway: Security review
    Rel--&amp;gt;&amp;gt;Gateway: Reliability review
    Maint--&amp;gt;&amp;gt;Gateway: Maintainability review
    Gateway-&amp;gt;&amp;gt;Git: Post reviews/comments
    Dev-&amp;gt;&amp;gt;Git: Push fixes and ask @bot follow-ups
    Human-&amp;gt;&amp;gt;Git: Final approval or request changes
&amp;lt;/div&amp;gt;&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;conclusion&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#conclusion&amp;quot;&amp;gt;Conclusion&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;AI-Git-Bot improves software quality when it is treated as a configurable review gateway rather than a single generic chatbot. By creating multiple reviewer personas, teams can make quality concerns explicit and repeatable:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;the &amp;lt;strong&amp;gt;Security Reviewer&amp;lt;/strong&amp;gt; protects users, secrets, permissions, and sensitive data,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;the &amp;lt;strong&amp;gt;Reliability &amp;amp;amp; Performance Reviewer&amp;lt;/strong&amp;gt; protects production stability and scalability,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;the &amp;lt;strong&amp;gt;Testability &amp;amp;amp; Maintainability Reviewer&amp;lt;/strong&amp;gt; protects long-term code health.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;In the ShopFlow example, the same pull request receives three independent perspectives before a human reviewer makes the merge decision. This creates a stronger review process without replacing existing best practices such as CI, tests, static analysis, dependency scanning, and human ownership.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The practical result in the Git system is simple: different bot identities appear as different reviewers, each guided by its own system prompt, each posting focused feedback directly into the pull request. Over time, this turns quality standards from informal reviewer preferences into a visible, repeatable, and continuously improvable review system.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;short-note%3A-getting-and-installing-the-gateway&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#short-note%3A-getting-and-installing-the-gateway&amp;quot;&amp;gt;Short Note: Getting and Installing the Gateway&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;AI-Git-Bot is available from the project repository at &amp;lt;a href=&amp;quot;https://github.com/tmseidel/ai-git-bot&amp;quot;&amp;gt;https://github.com/tmseidel/ai-git-bot&amp;lt;/a&amp;gt;. The documented Docker image is &amp;lt;code&amp;gt;tmseidel/ai-git-bot:latest&amp;lt;/code&amp;gt;, published on Docker Hub as &amp;lt;code&amp;gt;tmseidel/ai-git-bot&amp;lt;/code&amp;gt;.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;For a quick installation, clone the repository and start the provided Docker Compose setup:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;git clone https://github.com/tmseidel/ai-git-bot.git
cd ai-git-bot
docker compose up -d
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;The gateway then runs as a web application, typically on port &amp;lt;code&amp;gt;8080&amp;lt;/code&amp;gt;, where administrators can create AI integrations, Git integrations, system prompt entries, and reviewer bots.&amp;lt;/p&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="AI"/>
    <category term="Code Review"/>
    <category term="Best Practices"/>
  </entry>
  <entry>
    <title>Running Docker Engine in WSL2 alongside Docker Desktop for Windows</title>
    <link href="https://remus-software.org/articles/kb_two_docker/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/kb_two_docker/</id>
    <published>2026-04-11T00:00:00.000Z</published>
    <updated>2026-04-11T00:00:00.000Z</updated>
    <summary>A practical guide to running both Docker Engine natively in WSL2 and Docker Desktop for Windows in complete isolation — diagnosing conflicts, fixing networking issues, and configuring coexistence.</summary>
    <content type="html">&amp;lt;p&amp;gt;This guide explains why you might need both Docker Engine (installed natively in WSL2) and Docker Desktop for Windows, what conflicts arise when both are present, and how to configure them to run in complete isolation.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;why-you-need-both&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#why-you-need-both&amp;quot;&amp;gt;Why You Need Both&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;h3 id=&amp;quot;docker-engine-in-wsl2-%E2%80%94-for-local-development&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#docker-engine-in-wsl2-%E2%80%94-for-local-development&amp;quot;&amp;gt;Docker Engine in WSL2 — For Local Development&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;For local development and testing, you run containers directly in WSL2 using the native Docker Engine (installed via &amp;lt;code&amp;gt;apt&amp;lt;/code&amp;gt; from Docker’s official repository).&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Key advantages:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Containers share the WSL2 network stack — &amp;lt;code&amp;gt;host.docker.internal:host-gateway&amp;lt;/code&amp;gt; routes to the real host IP (&amp;lt;code&amp;gt;172.x.x.x&amp;lt;/code&amp;gt;), so containers can reach services running on the host (e.g., a web server on port 8080)&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Fast, lightweight, and fully Linux-native&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;No dependency on a Windows GUI application&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;docker-desktop-%E2%80%94-why-it-might-be-installed&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#docker-desktop-%E2%80%94-why-it-might-be-installed&amp;quot;&amp;gt;Docker Desktop — Why It Might Be Installed&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;There are several reasons why you might also have Docker Desktop for Windows installed:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Remote SSH deployments from Windows:&amp;lt;/strong&amp;gt; Docker Compose supports deploying to remote servers via SSH contexts (&amp;lt;code&amp;gt;docker context create my-server --docker &amp;amp;quot;host=ssh://user@server&amp;amp;quot;&amp;lt;/code&amp;gt;). When running this from a &amp;lt;strong&amp;gt;Windows terminal&amp;lt;/strong&amp;gt; (PowerShell, CMD), Docker Desktop provides the Docker daemon. While SSH-based Docker contexts work perfectly fine from native Linux (including WSL2), some workflows require running &amp;lt;code&amp;gt;docker compose&amp;lt;/code&amp;gt; from the Windows side — e.g., CI/CD scripts running on Windows, or using Windows-native tooling.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Windows container support:&amp;lt;/strong&amp;gt; Docker Desktop can run Windows containers (for .NET Framework apps, IIS, etc.), which Docker Engine in WSL2 cannot.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;GUI-based management:&amp;lt;/strong&amp;gt; Docker Desktop provides a graphical dashboard for container management, image browsing, and resource configuration.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Corporate/team requirement:&amp;lt;/strong&amp;gt; Your organization may standardize on Docker Desktop for license compliance or support reasons.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Kubernetes integration:&amp;lt;/strong&amp;gt; Docker Desktop includes a single-node Kubernetes cluster for local testing.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;The key point is: &amp;lt;strong&amp;gt;you don’t need to uninstall Docker Desktop&amp;lt;/strong&amp;gt; — you just need to prevent it from interfering with the Docker Engine in WSL2.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;summary&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#summary&amp;quot;&amp;gt;Summary&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Use Case&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Tool&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Where to Run&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Local development containers&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Docker Engine in WSL2&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;WSL2 terminal&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Host service + container integration testing&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Docker Engine in WSL2&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;WSL2 terminal&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;SSH remote deployment from WSL2&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Docker Engine in WSL2&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;WSL2 terminal&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Windows containers, GUI management, K8s&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Docker Desktop&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Windows terminal / PowerShell&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Corporate/team-mandated Docker usage on Windows&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Docker Desktop&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Windows terminal / PowerShell&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;h2 id=&amp;quot;the-conflict&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-conflict&amp;quot;&amp;gt;The Conflict&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;When Docker Desktop is installed, it enables &amp;lt;strong&amp;gt;WSL2 Integration&amp;lt;/strong&amp;gt; by default (Settings → Resources → WSL Integration). This injects Docker Desktop’s own binaries and configuration into your WSL2 distro, overriding the native Docker Engine. The two cannot coexist when this integration is active.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;what-goes-wrong&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#what-goes-wrong&amp;quot;&amp;gt;What Goes Wrong&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Symptom&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Cause&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;host.docker.internal&amp;lt;/code&amp;gt; resolves to &amp;lt;code&amp;gt;192.168.65.254&amp;lt;/code&amp;gt; instead of &amp;lt;code&amp;gt;172.17.0.1&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Docker Desktop routes through its internal VM gateway, not the WSL2 bridge&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;curl http://host.docker.internal:&amp;amp;lt;port&amp;amp;gt;&amp;lt;/code&amp;gt; fails from containers&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;The VM gateway (&amp;lt;code&amp;gt;192.168.65.254&amp;lt;/code&amp;gt;) does not forward to WSL2 host ports&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;/run/docker.sock&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;/var/run/docker.sock&amp;lt;/code&amp;gt; does not exist&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Docker Desktop replaces the socket with its own, and the native &amp;lt;code&amp;gt;docker.socket&amp;lt;/code&amp;gt; systemd unit gets confused&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;docker context ls&amp;lt;/code&amp;gt; shows &amp;lt;code&amp;gt;desktop-linux&amp;lt;/code&amp;gt; context&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Docker Desktop injected its context into WSL2&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;docker ps&amp;lt;/code&amp;gt; shows different containers than expected&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;CLI is talking to Docker Desktop’s daemon instead of the local one&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Webhooks or callbacks from containers never reach a host service&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Container → &amp;lt;code&amp;gt;host.docker.internal&amp;lt;/code&amp;gt; → Docker Desktop VM → &amp;lt;strong&amp;gt;dead end&amp;lt;/strong&amp;gt; (not forwarded to WSL2)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;--network host&amp;lt;/code&amp;gt; still doesn’t reach host services&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;On Docker Desktop, &amp;lt;code&amp;gt;host&amp;lt;/code&amp;gt; mode connects to the VM’s network, not WSL2’s&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;h3 id=&amp;quot;understanding-the-routing-problem&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#understanding-the-routing-problem&amp;quot;&amp;gt;Understanding the Routing Problem&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;When a container needs to reach a service on your WSL2 host (e.g., a web application running on port 8080), it uses &amp;lt;code&amp;gt;host.docker.internal&amp;lt;/code&amp;gt;. How this hostname resolves determines whether the connection works.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;With Docker Desktop WSL2 Integration (broken):&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;Container
  → host.docker.internal (192.168.65.254)
    → Docker Desktop VM
      → Windows host
        ✘ WSL2 host (not forwarded)
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;The traffic goes through Docker Desktop’s internal VM, reaches the Windows host, but is never forwarded to the WSL2 instance where your service is actually running.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;With native Docker Engine in WSL2 (working):&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;Container
  → host.docker.internal (172.17.0.1)
    → WSL2 host ✔ (direct bridge route)
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;The traffic goes directly over the Docker bridge network to the WSL2 host — no intermediate VM, no forwarding issues.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;diagnosis&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#diagnosis&amp;quot;&amp;gt;Diagnosis&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;If you suspect Docker Desktop is interfering with your WSL2 Docker Engine, run these commands inside your WSL2 terminal:&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;1.-check-which-docker-binary-is-active&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#1.-check-which-docker-binary-is-active&amp;quot;&amp;gt;1. Check which Docker binary is active&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;which docker
ls -la $(which docker)
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;✅ Native: &amp;lt;code&amp;gt;/usr/bin/docker&amp;lt;/code&amp;gt; owned by &amp;lt;code&amp;gt;root&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;❌ Desktop override: symlink to &amp;lt;code&amp;gt;/mnt/wsl/docker-desktop/...&amp;lt;/code&amp;gt; or similar&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;2.-check-docker-contexts&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#2.-check-docker-contexts&amp;quot;&amp;gt;2. Check Docker contexts&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;docker context ls
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;✅ Only &amp;lt;code&amp;gt;default&amp;lt;/code&amp;gt; pointing to &amp;lt;code&amp;gt;unix:///var/run/docker.sock&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;❌ A &amp;lt;code&amp;gt;desktop-linux&amp;lt;/code&amp;gt; context exists (Docker Desktop injected it)&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;3.-check-the-docker-socket&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#3.-check-the-docker-socket&amp;quot;&amp;gt;3. Check the Docker socket&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;ls -la /run/docker.sock
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;✅ Socket file exists: &amp;lt;code&amp;gt;srw-rw---- root docker ... /run/docker.sock&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;❌ File does not exist, or is a symlink to a Docker Desktop path&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;4.-check-host.docker.internal-from-inside-a-container&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#4.-check-host.docker.internal-from-inside-a-container&amp;quot;&amp;gt;4. Check &amp;lt;code&amp;gt;host.docker.internal&amp;lt;/code&amp;gt; from inside a container&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;docker run --rm --add-host=host.docker.internal:host-gateway &#92;
  busybox cat /etc/hosts | grep host.docker
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;✅ &amp;lt;code&amp;gt;172.17.0.1  host.docker.internal&amp;lt;/code&amp;gt; (WSL2 Docker bridge gateway)&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;❌ &amp;lt;code&amp;gt;192.168.65.254  host.docker.internal&amp;lt;/code&amp;gt; (Docker Desktop VM gateway)&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;5.-test-container-to-host-connectivity&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#5.-test-container-to-host-connectivity&amp;quot;&amp;gt;5. Test container-to-host connectivity&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;If you have a service running on the host (e.g., on port 8080):&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;docker run --rm --add-host=host.docker.internal:host-gateway &#92;
  curlimages/curl curl -s -o /dev/null -w &amp;amp;quot;HTTP %{http_code}&amp;amp;quot; &#92;
  --connect-timeout 3 http://host.docker.internal:8080/
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;✅ &amp;lt;code&amp;gt;HTTP 200&amp;lt;/code&amp;gt; (or any non-zero HTTP status)&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;❌ &amp;lt;code&amp;gt;HTTP 000&amp;lt;/code&amp;gt; (connection failed)&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h2 id=&amp;quot;resolution&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#resolution&amp;quot;&amp;gt;Resolution&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;h3 id=&amp;quot;step-1%3A-disable-docker-desktop-wsl2-integration&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#step-1%3A-disable-docker-desktop-wsl2-integration&amp;quot;&amp;gt;Step 1: Disable Docker Desktop WSL2 Integration&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;This is the critical step. In Docker Desktop:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;Open &amp;lt;strong&amp;gt;Docker Desktop&amp;lt;/strong&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Go to &amp;lt;strong&amp;gt;Settings → Resources → WSL Integration&amp;lt;/strong&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Disable&amp;lt;/strong&amp;gt; “Enable integration with my default WSL distro”&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Disable&amp;lt;/strong&amp;gt; the toggle for your Ubuntu (or other) WSL2 distro&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Click &amp;lt;strong&amp;gt;Apply &amp;amp;amp; Restart&amp;lt;/strong&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;p&amp;gt;Docker Desktop will continue working from Windows terminals (PowerShell, CMD), but it will stop injecting itself into WSL2.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;step-2%3A-remove-the-docker-desktop-context-from-wsl2&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#step-2%3A-remove-the-docker-desktop-context-from-wsl2&amp;quot;&amp;gt;Step 2: Remove the Docker Desktop context from WSL2&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;docker context rm desktop-linux
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;Verify only the native context remains:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;docker context ls
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;Expected output:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;NAME        DESCRIPTION                               DOCKER ENDPOINT
default *   Current DOCKER_HOST based configuration   unix:///var/run/docker.sock
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;h3 id=&amp;quot;step-3%3A-restart-the-native-docker-engine&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#step-3%3A-restart-the-native-docker-engine&amp;quot;&amp;gt;Step 3: Restart the native Docker Engine&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;If the Docker socket is missing after disabling Desktop integration:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;sudo systemctl stop docker docker.socket
sudo systemctl start docker.socket
sudo systemctl start docker
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;Verify the socket exists:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;ls -la /run/docker.sock
# Expected: srw-rw---- 1 root docker 0 ... /run/docker.sock
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;h3 id=&amp;quot;step-4%3A-verify-container-to-host-connectivity&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#step-4%3A-verify-container-to-host-connectivity&amp;quot;&amp;gt;Step 4: Verify container-to-host connectivity&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Start any service on the host (or use a simple test server), then:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;# Quick test: start a temporary HTTP server on the host
python3 -m http.server 9999 &amp;amp;amp;
TEST_PID=$!

# Test from a container
docker run --rm --add-host=host.docker.internal:host-gateway &#92;
  curlimages/curl curl -s -o /dev/null -w &amp;amp;quot;HTTP %{http_code}&amp;amp;quot; &#92;
  --connect-timeout 3 http://host.docker.internal:9999/

# Clean up
kill $TEST_PID
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;Expected result: &amp;lt;code&amp;gt;HTTP 200&amp;lt;/code&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;step-5%3A-recreate-your-running-containers&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#step-5%3A-recreate-your-running-containers&amp;quot;&amp;gt;Step 5: Recreate your running containers&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Existing containers were created under the old networking configuration. You need to recreate them for the fix to take effect:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;docker compose -f &amp;amp;lt;your-compose-file.yml&amp;amp;gt; down
docker compose -f &amp;amp;lt;your-compose-file.yml&amp;amp;gt; up -d
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Note:&amp;lt;/strong&amp;gt; If your compose file uses named volumes and you’re also changing container image versions, you may need &amp;lt;code&amp;gt;down -v&amp;lt;/code&amp;gt; to remove old volumes. Only use &amp;lt;code&amp;gt;-v&amp;lt;/code&amp;gt; if you don’t mind losing persisted data.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;h2 id=&amp;quot;running-in-isolation&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#running-in-isolation&amp;quot;&amp;gt;Running in Isolation&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;After completing the resolution steps, the two Docker installations run independently:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Docker Engine (WSL2)&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Docker Desktop (Windows)&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Access from&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;WSL2 terminal (&amp;lt;code&amp;gt;bash&amp;lt;/code&amp;gt;)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Windows terminal (PowerShell, CMD)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Daemon&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;dockerd&amp;lt;/code&amp;gt; via systemd in WSL2&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Docker Desktop VM&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Socket&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;/run/docker.sock&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;npipe:////./pipe/docker_engine&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Containers&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Separate set&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Separate set&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Networks&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;WSL2 bridge (&amp;lt;code&amp;gt;172.17.0.0/16&amp;lt;/code&amp;gt;)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Desktop VM bridge (&amp;lt;code&amp;gt;192.168.65.0/24&amp;lt;/code&amp;gt;)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;&amp;lt;code&amp;gt;host.docker.internal&amp;lt;/code&amp;gt;&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;172.17.0.1&amp;lt;/code&amp;gt; (WSL2 host)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;192.168.65.254&amp;lt;/code&amp;gt; (Windows host)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Use for&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Local dev, testing, SSH deployments&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Windows containers, GUI, K8s&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;h3 id=&amp;quot;using-docker-desktop-from-windows&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#using-docker-desktop-from-windows&amp;quot;&amp;gt;Using Docker Desktop from Windows&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;When you need Docker Desktop features (Windows containers, GUI, Kubernetes), open &amp;lt;strong&amp;gt;PowerShell&amp;lt;/strong&amp;gt; (not WSL2):&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-powershell&amp;quot;&amp;gt;# In PowerShell / Windows Terminal
docker ps
docker compose -f docker-compose.yml up -d
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;h3 id=&amp;quot;using-docker-engine-from-wsl2&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#using-docker-engine-from-wsl2&amp;quot;&amp;gt;Using Docker Engine from WSL2&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;For local development, use your WSL2 terminal as usual:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;# In WSL2 bash
docker compose -f docker-compose.yml up -d
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;h2 id=&amp;quot;ensuring-docker-starts-on-wsl2-boot&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#ensuring-docker-starts-on-wsl2-boot&amp;quot;&amp;gt;Ensuring Docker Starts on WSL2 Boot&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;WSL2 doesn’t always start systemd services automatically. To ensure Docker Engine is available when you open a WSL2 terminal:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;# Enable Docker to start with systemd
sudo systemctl enable docker.service
sudo systemctl enable docker.socket
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;If your WSL2 distro doesn’t have systemd enabled, add this to &amp;lt;code&amp;gt;/etc/wsl.conf&amp;lt;/code&amp;gt;:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-ini&amp;quot;&amp;gt;[boot]
systemd=true
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;Then restart WSL2 from PowerShell:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-powershell&amp;quot;&amp;gt;wsl --shutdown
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;h2 id=&amp;quot;troubleshooting&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#troubleshooting&amp;quot;&amp;gt;Troubleshooting&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;h3 id=&amp;quot;%E2%80%9Cinteractive-authentication-required%E2%80%9D-when-starting-docker&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#%E2%80%9Cinteractive-authentication-required%E2%80%9D-when-starting-docker&amp;quot;&amp;gt;“Interactive authentication required” when starting Docker&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;Failed to start docker.service: Interactive authentication required.
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;You ran &amp;lt;code&amp;gt;systemctl start docker&amp;lt;/code&amp;gt; without &amp;lt;code&amp;gt;sudo&amp;lt;/code&amp;gt;. Use:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;sudo systemctl start docker
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;To avoid needing &amp;lt;code&amp;gt;sudo&amp;lt;/code&amp;gt; for Docker commands (not for systemctl), add your user to the &amp;lt;code&amp;gt;docker&amp;lt;/code&amp;gt; group:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;sudo usermod -aG docker $USER
newgrp docker
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;h3 id=&amp;quot;docker-socket-disappears-after-reboot&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#docker-socket-disappears-after-reboot&amp;quot;&amp;gt;Docker socket disappears after reboot&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Docker Desktop’s WSL2 integration was re-enabled (e.g., after a Docker Desktop update), or the systemd socket unit didn’t start. Check and fix:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;# Check if Desktop integration is back
docker context ls  # Should NOT show desktop-linux

# Restart the socket
sudo systemctl restart docker.socket docker
ls -la /run/docker.sock
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;If &amp;lt;code&amp;gt;desktop-linux&amp;lt;/code&amp;gt; reappeared, Docker Desktop re-enabled WSL2 integration — repeat &amp;lt;a href=&amp;quot;#step-1-disable-docker-desktop-wsl2-integration&amp;quot;&amp;gt;Step 1&amp;lt;/a&amp;gt; of the resolution.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;container-can-ping-host.docker.internal-but-can%E2%80%99t-connect-to-a-port&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#container-can-ping-host.docker.internal-but-can%E2%80%99t-connect-to-a-port&amp;quot;&amp;gt;Container can ping &amp;lt;code&amp;gt;host.docker.internal&amp;lt;/code&amp;gt; but can’t connect to a port&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The hostname resolves but the port is unreachable. Verify:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;The service is actually running on the host:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;curl http://localhost:&amp;amp;lt;port&amp;amp;gt;/
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Check what IP the container sees:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;docker exec &amp;amp;lt;container-name&amp;amp;gt; cat /etc/hosts | grep host.docker
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;If &amp;lt;code&amp;gt;192.168.65.254&amp;lt;/code&amp;gt; → Docker Desktop is still interfering (see &amp;lt;a href=&amp;quot;#resolution&amp;quot;&amp;gt;Resolution&amp;lt;/a&amp;gt;)&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;If &amp;lt;code&amp;gt;172.17.0.1&amp;lt;/code&amp;gt; → the service might not be running, might be bound to &amp;lt;code&amp;gt;127.0.0.1&amp;lt;/code&amp;gt; only, or a firewall is blocking&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Verify the service listens on all interfaces:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;ss -tlnp | grep &amp;amp;lt;port&amp;amp;gt;
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;✅ &amp;lt;code&amp;gt;*:&amp;amp;lt;port&amp;amp;gt;&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;0.0.0.0:&amp;amp;lt;port&amp;amp;gt;&amp;lt;/code&amp;gt; — listening on all interfaces&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;❌ &amp;lt;code&amp;gt;127.0.0.1:&amp;amp;lt;port&amp;amp;gt;&amp;lt;/code&amp;gt; — only localhost; reconfigure the service to bind to &amp;lt;code&amp;gt;0.0.0.0&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;h3 id=&amp;quot;docker-compose-shows-containers-from-the-wrong-docker&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#docker-compose-shows-containers-from-the-wrong-docker&amp;quot;&amp;gt;&amp;lt;code&amp;gt;docker compose&amp;lt;/code&amp;gt; shows containers from the wrong Docker&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;If &amp;lt;code&amp;gt;docker ps&amp;lt;/code&amp;gt; in WSL2 shows containers you created in Docker Desktop (or vice versa), the wrong context is active:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;docker context ls       # Check which context is active (marked with *)
docker context use default   # Switch to native WSL2 Docker
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;h3 id=&amp;quot;docker-desktop-updates-re-enable-wsl2-integration&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#docker-desktop-updates-re-enable-wsl2-integration&amp;quot;&amp;gt;Docker Desktop updates re-enable WSL2 integration&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Docker Desktop may re-enable WSL2 integration after updates. If things suddenly break again after a Docker Desktop update, re-check Settings → Resources → WSL Integration and disable it again. Consider disabling Docker Desktop auto-updates if this becomes a recurring issue.&amp;lt;/p&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="Docker"/>
    <category term="WSL2"/>
    <category term="DevOps"/>
    <category term="Windows"/>
  </entry>
  <entry>
    <title>Building an AI Agent for Code Generation: Lessons from 13 Iterations</title>
    <link href="https://remus-software.org/articles/how-to-develop-an-ai-agent/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/how-to-develop-an-ai-agent/</id>
    <published>2026-04-07T00:00:00.000Z</published>
    <updated>2026-04-07T00:00:00.000Z</updated>
    <summary>Practical lessons learned from developing an agentic AI system that generates source code — from naive prompting to a robust, tool-using agent with a generic loop, schema-validated plans, provider-native function calls, and an optional self-critique step.</summary>
    <content type="html">&amp;lt;p&amp;gt;Building software with an AI agent at its core is fundamentally different from traditional application development. Over the course of thirteen iterations — nine on the agent’s behaviour, four more on the surrounding &amp;lt;em&amp;gt;architecture&amp;lt;/em&amp;gt; of the host — I developed an AI agent that reads GitHub/Gitea issues, generates implementation code, validates it, and commits the result, all autonomously. This article distils the key lessons learned along the way.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;the-paradigm-shift%3A-inversion-of-control&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-paradigm-shift%3A-inversion-of-control&amp;quot;&amp;gt;The Paradigm Shift: Inversion of Control&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The single biggest mental shift when building agentic systems is the &amp;lt;strong&amp;gt;inversion of control flow&amp;lt;/strong&amp;gt;. In traditional software, the application owns the business logic and orchestrates every step. In an agentic system, a significant portion of that decision-making shifts to the AI — the surrounding program increasingly acts as a communication partner, executing commands on the agent’s behalf.&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;graph LR
    subgraph Traditional
        App[Application] --&amp;gt;|calls| Lib[Library / API]
        App --&amp;gt;|orchestrates| DB[(Database)]
    end
&amp;lt;/div&amp;gt;&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;graph LR
    subgraph Agentic
        Agent[AI Agent] --&amp;gt;|requests action| Host[Host Program]
        Host --&amp;gt;|returns result| Agent
        Agent --&amp;gt;|requests tool| Host
        Host --&amp;gt;|executes &amp;amp; returns| Agent
    end
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;Your application becomes, to a large degree, an &amp;lt;strong&amp;gt;execution environment&amp;lt;/strong&amp;gt; for the agent. It provides tools, fetches context, applies file changes, and reports results — while the agent takes on much of the reasoning about &amp;lt;em&amp;gt;what&amp;lt;/em&amp;gt; to do. In practice, the split is not black and white — which is exactly what the next section explores.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;designing-the-agent-flow%3A-what-to-keep%2C-what-to-delegate&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#designing-the-agent-flow%3A-what-to-keep%2C-what-to-delegate&amp;quot;&amp;gt;Designing the Agent Flow: What to Keep, What to Delegate&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Once you accept the inversion of control, the next critical question is: &amp;lt;strong&amp;gt;which actions belong in your application, and which do you delegate to the LLM?&amp;lt;/strong&amp;gt; This is not an all-or-nothing decision — it’s a spectrum, and every point on it has trade-offs.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;keeping-logic-in-the-application&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#keeping-logic-in-the-application&amp;quot;&amp;gt;Keeping Logic in the Application&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Deterministic steps — file I/O, git operations, API calls, JSON parsing, diff application — are natural candidates for application-side logic.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Advantages:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Predictable and fast.&amp;lt;/strong&amp;gt; Same input, same output, every time. No API latency, no token cost.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Testable.&amp;lt;/strong&amp;gt; You can write unit tests with clear assertions.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Debuggable.&amp;lt;/strong&amp;gt; Stack traces, breakpoints, and logging work exactly as expected.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Disadvantages:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Rigid.&amp;lt;/strong&amp;gt; You must anticipate every edge case upfront. An unexpected file format or an unusual error message breaks the flow.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;More code to maintain.&amp;lt;/strong&amp;gt; Every special case becomes an &amp;lt;code&amp;gt;if&amp;lt;/code&amp;gt; branch or a new parser.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;delegating-logic-to-the-llm&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#delegating-logic-to-the-llm&amp;quot;&amp;gt;Delegating Logic to the LLM&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Reasoning tasks — deciding &amp;lt;em&amp;gt;what&amp;lt;/em&amp;gt; to change, interpreting error messages, choosing a fix strategy, analysing code structure — are where the LLM excels.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Advantages:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Flexible and adaptive.&amp;lt;/strong&amp;gt; The model handles novel situations without explicit programming.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Reduces code complexity.&amp;lt;/strong&amp;gt; A single prompt can replace hundreds of lines of hand-coded decision trees.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Understands intent.&amp;lt;/strong&amp;gt; The AI can reason about &amp;lt;em&amp;gt;why&amp;lt;/em&amp;gt; something should change, not just &amp;lt;em&amp;gt;what&amp;lt;/em&amp;gt; the syntax rules say.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Disadvantages:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Non-deterministic.&amp;lt;/strong&amp;gt; The same input can produce different outputs on each run.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Slower.&amp;lt;/strong&amp;gt; Each LLM call adds 5–30 seconds of latency depending on the model and input size.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Token cost.&amp;lt;/strong&amp;gt; Every delegation costs money and consumes context window space.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Hallucination risk.&amp;lt;/strong&amp;gt; The model may confidently produce incorrect results.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Harder to test.&amp;lt;/strong&amp;gt; Assertions on LLM output are inherently fuzzy.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;finding-the-right-split&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#finding-the-right-split&amp;quot;&amp;gt;Finding the Right Split&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;In practice, the division that worked best for our code generation agent was:&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart LR
    subgraph &amp;quot;Application (deterministic)&amp;quot;
        A[File I/O]
        B[Git operations]
        C[Diff application]
        D[JSON parsing]
        E[Tool execution]
    end
    subgraph &amp;quot;LLM (reasoning)&amp;quot;
        F[Code generation]
        G[Error analysis]
        H[Fix strategy]
        I[Choosing validation tools]
        J[Context requests]
    end
    E --&amp;gt;|results| G
    G --&amp;gt;|decisions| A
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;The guiding principle: &amp;lt;strong&amp;gt;if a step requires understanding intent or adapting to ambiguity, delegate it. If it requires reliability and speed, keep it in the application.&amp;lt;/strong&amp;gt; As you will see throughout the iterations below, we gradually moved &amp;lt;em&amp;gt;more&amp;lt;/em&amp;gt; logic to the AI side — not because we wanted to, but because the deterministic alternatives kept failing on edge cases.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;the-tightrope-walk%3A-context-window-vs.-output-quality&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-tightrope-walk%3A-context-window-vs.-output-quality&amp;quot;&amp;gt;The Tightrope Walk: Context Window vs. Output Quality&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;One of the most persistent challenges is balancing the context window. Too little context, and the AI hallucinates imports, invents method signatures, or misses existing patterns. Too much context, and the model loses focus, produces lower-quality output, or exceeds token limits.&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;The sweet spot shifts with every model generation, but the tension never goes away.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;This interplay shaped almost every iteration of the agent.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;the-iteration-dilemma%3A-how-many-loops%3F&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-iteration-dilemma%3A-how-many-loops%3F&amp;quot;&amp;gt;The Iteration Dilemma: How Many Loops?&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Closely related to the context problem is the question of &amp;lt;strong&amp;gt;how many iterations to allow&amp;lt;/strong&amp;gt; between your application and the LLM. LLMs have a remarkable ability to improve their output when given feedback — a compilation error sent back to the model often results in a correct fix on the second attempt. This makes iterative correction loops very attractive.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;But iteration comes at a price:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Time.&amp;lt;/strong&amp;gt; Each round-trip adds 10–30 seconds of latency. Three retry loops turn a 15-second task into a two-minute task.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Context bloat.&amp;lt;/strong&amp;gt; Every iteration appends messages to the conversation — the error report, the AI’s response, the next error report. The context window fills up fast, which degrades output quality (see above).&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Hallucination drift.&amp;lt;/strong&amp;gt; Counter-intuitively, too many iterations can make things &amp;lt;em&amp;gt;worse&amp;lt;/em&amp;gt;. After three or four failed attempts, the AI tends to “drift” — introducing new errors while fixing old ones, inventing methods that don’t exist, or producing solutions that look plausible but are semantically wrong. The model starts optimising for &amp;lt;em&amp;gt;passing the immediate check&amp;lt;/em&amp;gt; rather than &amp;lt;em&amp;gt;being correct&amp;lt;/em&amp;gt;.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Cost.&amp;lt;/strong&amp;gt; Each iteration consumes tokens. With large context windows, a single retry can cost as much as the original request.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;practical-guards&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#practical-guards&amp;quot;&amp;gt;Practical Guards&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Through experimentation, we arrived at the following guidelines:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Hard caps on every loop.&amp;lt;/strong&amp;gt; No open-ended retries. We used 3 rounds for file requests, 5 rounds for code validation loops, and 3 rounds for diff recovery attempts.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Progress monitoring.&amp;lt;/strong&amp;gt; If the error count isn’t decreasing between iterations, abort early. An AI that produces &amp;lt;em&amp;gt;more&amp;lt;/em&amp;gt; errors after a fix attempt is unlikely to recover.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Compaction between rounds.&amp;lt;/strong&amp;gt; Summarise earlier turns aggressively to keep the context lean (see Iteration 3).&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Escalation, not repetition.&amp;lt;/strong&amp;gt; If simple “fix this error” prompts fail twice, escalate to a richer strategy — provide more context, rephrase the problem, or fall back to a full file regeneration instead of incremental fixes.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Rule of thumb:&amp;lt;/strong&amp;gt; If your agent hasn’t solved the problem in three iterations, throwing more iterations at it is unlikely to help — you need a different strategy, not more attempts.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;h2 id=&amp;quot;core-principles&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#core-principles&amp;quot;&amp;gt;Core Principles&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Before diving into the iterations, here are the overarching lessons:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Always give the AI the ability to request more information.&amp;lt;/strong&amp;gt; Never assume your initial context is sufficient. Let the agent ask for files, type definitions, or documentation on demand.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Define a protocol in the system prompt for structured output&amp;lt;/strong&amp;gt; — but build your application to be resilient against protocol violations. The AI will &amp;lt;em&amp;gt;sometimes&amp;lt;/em&amp;gt; deviate from the agreed JSON schema, return partial responses, or mix formats. Your parser must handle this gracefully.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;The agent drives the reasoning&amp;lt;/strong&amp;gt; — your code provides the infrastructure and guardrails.&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-1-%E2%80%94-naive-generate-and-validate-loop&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-1-%E2%80%94-naive-generate-and-validate-loop&amp;quot;&amp;gt;Iteration 1 — Naive Generate-and-Validate Loop&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The first version was straightforward: send the issue description to the AI, receive generated code, then validate it.&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart TD
    A[Issue Description] --&amp;gt; B[AI generates code]
    B --&amp;gt; C[CodeValidationService validates all files]
    C --&amp;gt; D{Errors found?}
    D --&amp;gt;|Yes| E[Build error report]
    E --&amp;gt; F[&amp;quot;Send to AI: &amp;#039;Fix these syntax errors&amp;#039;&amp;quot;]
    F --&amp;gt; B
    D --&amp;gt;|No| G[Commit code]
    D --&amp;gt;|Max retries reached| H[Post warning in issue &amp;amp; commit anyway]
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;What worked:&amp;lt;/strong&amp;gt; The basic loop caught obvious syntax errors and gave the AI a chance to self-correct.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;What didn’t work:&amp;lt;/strong&amp;gt; The AI frequently generated code that referenced classes, methods, or interfaces it had never seen. Without sufficient context about the existing codebase, the output was often structurally correct but semantically wrong.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-2-%E2%80%94-smarter-context-gathering&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-2-%E2%80%94-smarter-context-gathering&amp;quot;&amp;gt;Iteration 2 — Smarter Context Gathering&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The root cause from Iteration 1 was clear: the AI didn’t know enough about the existing codebase. The &amp;lt;code&amp;gt;fetchRelevantFileContents()&amp;lt;/code&amp;gt; method was significantly improved:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Package awareness:&amp;lt;/strong&amp;gt; When a file is mentioned, all files in the same package are loaded.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Partial name matching:&amp;lt;/strong&amp;gt; The word “Task” in an issue matches &amp;lt;code&amp;gt;Task.java&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;TaskService.java&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;TaskRepository.java&amp;lt;/code&amp;gt;, etc.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Domain structure recognition:&amp;lt;/strong&amp;gt; Files in &amp;lt;code&amp;gt;/domain/&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;/model/&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;/entity/&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;/config/&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;/dto/&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;/repository/&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;/service/&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;/controller/&amp;lt;/code&amp;gt; directories are included when they relate to the issue.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Increased file limit:&amp;lt;/strong&amp;gt; From 15 to 30 files.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;This gave the AI visibility into existing method signatures, interface definitions, inheritance hierarchies, and repository methods — drastically reducing hallucinated references.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Lesson:&amp;lt;/strong&amp;gt; Context quality matters more than prompt engineering. A perfectly worded prompt with missing context will always lose to a mediocre prompt with complete context.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-3-%E2%80%94-conversation-compaction&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-3-%E2%80%94-conversation-compaction&amp;quot;&amp;gt;Iteration 3 — Conversation Compaction&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;With richer context and multi-turn code review conversations, the context window filled up fast. After several rounds of back-and-forth, the conversation could easily exceed 100 KB of tokens.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The solution: &amp;lt;strong&amp;gt;automatic compaction&amp;lt;/strong&amp;gt; after every code review interaction. The system retains only the last four messages plus a short summary of the earlier conversation.&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart LR
    A[&amp;quot;100 KB+ conversation&amp;quot;] --&amp;gt; B[Compaction]
    B --&amp;gt; C[&amp;quot;Summary + last 4 messages&amp;quot;]
    C --&amp;gt; D[&amp;quot;~10 KB context&amp;quot;]
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Lesson:&amp;lt;/strong&amp;gt; LLMs work best with focused context. Aggressively summarise historical turns — the AI doesn’t need the full transcript, just the current state and a brief recap.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-4-%E2%80%94-prompt-deduplication&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-4-%E2%80%94-prompt-deduplication&amp;quot;&amp;gt;Iteration 4 — Prompt Deduplication&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;A careful audit of all prompts revealed massive redundancy:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Instructions already present in the system prompt were repeated in every user prompt.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The full repository tree was sent with every continuation, even though the AI already had it from the previous turn.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Verbose formatting instructions (Markdown headers, extra blank lines) were duplicated.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Changes made:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;&amp;amp;quot;Output your response as a JSON object with the structure described in the system prompt&amp;amp;quot;&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;&amp;amp;quot;Output JSON per system prompt format&amp;amp;quot;&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;treeContext&amp;lt;/code&amp;gt; removed from &amp;lt;code&amp;gt;buildContinuationPrompt&amp;lt;/code&amp;gt; — the AI retains it from the conversation history.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Repeated formatting directives eliminated.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Lesson:&amp;lt;/strong&amp;gt; Treat your prompts like production code. Audit them for duplication, dead instructions, and unnecessary verbosity. Every wasted token is context the AI could have used for actual reasoning.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-5-%E2%80%94-diff-based-updates-and-dynamic-file-requests&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-5-%E2%80%94-diff-based-updates-and-dynamic-file-requests&amp;quot;&amp;gt;Iteration 5 — Diff-Based Updates and Dynamic File Requests&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;This was the most impactful single iteration. Two major features were introduced:&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;diff-based-changes&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#diff-based-changes&amp;quot;&amp;gt;Diff-Based Changes&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Instead of returning entire files for every small change, the AI now returns &amp;lt;strong&amp;gt;SEARCH/REPLACE diffs&amp;lt;/strong&amp;gt;:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-json&amp;quot;&amp;gt;{
  &amp;amp;quot;fileChanges&amp;amp;quot;: [
    {
      &amp;amp;quot;path&amp;amp;quot;: &amp;amp;quot;src/main/java/com/example/Task.java&amp;amp;quot;,
      &amp;amp;quot;operation&amp;amp;quot;: &amp;amp;quot;UPDATE&amp;amp;quot;,
      &amp;amp;quot;diff&amp;amp;quot;: &amp;amp;quot;&amp;amp;lt;&amp;amp;lt;&amp;amp;lt;&amp;amp;lt;&amp;amp;lt;&amp;amp;lt;&amp;amp;lt; SEARCH&#92;nprivate String name;&#92;n=======&#92;nprivate String name;&#92;nprivate String description;&#92;n&amp;amp;gt;&amp;amp;gt;&amp;amp;gt;&amp;amp;gt;&amp;amp;gt;&amp;amp;gt;&amp;amp;gt; REPLACE&amp;amp;quot;
    }
  ]
}
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;A new &amp;lt;code&amp;gt;DiffApplyService&amp;lt;/code&amp;gt; applies these blocks to the actual file content.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;dynamic-file-requests&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#dynamic-file-requests&amp;quot;&amp;gt;Dynamic File Requests&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The AI can now respond with a &amp;lt;strong&amp;gt;file request&amp;lt;/strong&amp;gt; instead of code changes:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-json&amp;quot;&amp;gt;{
  &amp;amp;quot;summary&amp;amp;quot;: &amp;amp;quot;Need more context about the repository interface&amp;amp;quot;,
  &amp;amp;quot;requestFiles&amp;amp;quot;: [&amp;amp;quot;src/main/java/com/example/TaskRepository.java&amp;amp;quot;, &amp;amp;quot;pom.xml&amp;amp;quot;]
}
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;The host program fetches the requested files and continues the conversation.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Token savings were dramatic:&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Scenario&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Before&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;After&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Saving&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Small change in a 500-line file&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~500 lines&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~10 lines (diff)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~98%&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Follow-up without new files&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Tree + file list&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Only comment&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~90%&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Iterative requests&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;All files again&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Only requested files&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;~70%&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Lesson:&amp;lt;/strong&amp;gt; Give the AI the tools to be efficient. Diff-based output and on-demand file requests transformed a chatty, wasteful interaction into a focused, surgical one.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;In hindsight, though, this was an important &amp;lt;em&amp;gt;transitional&amp;lt;/em&amp;gt; design, not the final one. Diff-based updates reduced token usage, but they also introduced a fragile mini-language that the host application had to interpret and repair.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-6-%E2%80%94-robust-diff-application&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-6-%E2%80%94-robust-diff-application&amp;quot;&amp;gt;Iteration 6 — Robust Diff Application&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Real-world diffs from the AI are messy. The &amp;lt;code&amp;gt;DiffApplyService&amp;lt;/code&amp;gt; had to handle numerous edge cases:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Empty SEARCH blocks&amp;lt;/strong&amp;gt; — content is appended to the file.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Placeholder comments&amp;lt;/strong&amp;gt; like &amp;lt;code&amp;gt;/* Add existing... */&amp;lt;/code&amp;gt; — treated as append operations.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Append patterns&amp;lt;/strong&amp;gt; — when the REPLACE block starts with the SEARCH content and adds more, only the new part is appended.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Trailing whitespace differences&amp;lt;/strong&amp;gt; — a fuzzy match is attempted before failing.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;Additionally, the &amp;lt;code&amp;gt;IssueImplementationService&amp;lt;/code&amp;gt; was ignoring AI responses that contained &amp;lt;code&amp;gt;requestFiles&amp;lt;/code&amp;gt; but no &amp;lt;code&amp;gt;fileChanges&amp;lt;/code&amp;gt;, returning &amp;lt;code&amp;gt;null&amp;lt;/code&amp;gt; instead. The fix:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Detect &amp;lt;code&amp;gt;requestFiles&amp;lt;/code&amp;gt; even when &amp;lt;code&amp;gt;fileChanges&amp;lt;/code&amp;gt; is empty.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Fetch the requested files and continue the conversation.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Allow a maximum of &amp;lt;strong&amp;gt;three rounds&amp;lt;/strong&amp;gt; of file requests to prevent infinite loops.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Lesson:&amp;lt;/strong&amp;gt; The interface between AI output and your application is inherently fuzzy. Build robust parsers, add fallback strategies, and always cap iteration counts.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;That said, there is a deeper lesson here: every extra recovery rule in the host is a signal that the protocol itself may be too brittle. If you keep adding fuzzy matching, placeholder handling, and recovery paths, you may be compensating for the wrong abstraction.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-7-%E2%80%94-ai-driven-validation-with-tools&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-7-%E2%80%94-ai-driven-validation-with-tools&amp;quot;&amp;gt;Iteration 7 — AI-Driven Validation with Tools&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;A fundamental architectural change: &amp;lt;strong&amp;gt;remove built-in validators entirely&amp;lt;/strong&amp;gt; and let the AI decide how to validate its own output.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The agent prompt was updated to make tool usage mandatory:&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;em&amp;gt;“IMPORTANT: You MUST include &amp;lt;code&amp;gt;runTool&amp;lt;/code&amp;gt; in every response that contains &amp;lt;code&amp;gt;fileChanges&amp;lt;/code&amp;gt;. The bot does not have built-in validators — only you can determine how to validate the code by executing external tools.”&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;The AI now specifies a validation command (e.g., &amp;lt;code&amp;gt;mvn compile&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;npm run build&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;gradle check&amp;lt;/code&amp;gt;) alongside its code changes. If it forgets, the host program sends it back with a reminder.&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart TD
    A[AI returns fileChanges + runTool] --&amp;gt; B[Host applies file changes]
    B --&amp;gt; C[Host executes specified tool]
    C --&amp;gt; D{Tool output}
    D --&amp;gt;|Success| E[Commit]
    D --&amp;gt;|Failure| F[Send tool output back to AI]
    F --&amp;gt; A
    G[AI returns fileChanges WITHOUT runTool] --&amp;gt; H[&amp;quot;Host: &amp;#039;Please specify a validation tool&amp;#039;&amp;quot;]
    H --&amp;gt; A
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Lesson:&amp;lt;/strong&amp;gt; The AI often knows better than a hardcoded validator what constitutes “correct” in a given context. A Java project needs &amp;lt;code&amp;gt;mvn compile&amp;lt;/code&amp;gt;; a Node project needs &amp;lt;code&amp;gt;npm run build&amp;lt;/code&amp;gt;; a Python project might need &amp;lt;code&amp;gt;pytest&amp;lt;/code&amp;gt;. Let the agent choose.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;the-tool-selection-problem&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-tool-selection-problem&amp;quot;&amp;gt;The Tool Selection Problem&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Letting the AI choose &amp;lt;em&amp;gt;which&amp;lt;/em&amp;gt; tool to run immediately raises a thorny question: &amp;lt;strong&amp;gt;which tools do you allow it to execute at all?&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;This is one of the hardest design decisions in an agentic system, and the answer should be conservative: &amp;lt;strong&amp;gt;allow the absolute minimum set of tools needed to get the job done.&amp;lt;/strong&amp;gt; Every tool you expose is:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;A security risk.&amp;lt;/strong&amp;gt; A build command like &amp;lt;code&amp;gt;mvn compile&amp;lt;/code&amp;gt; is safe. An arbitrary shell command is not. The distance between “run my tests” and &amp;lt;code&amp;gt;rm -rf /&amp;lt;/code&amp;gt; is one hallucinated token.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;A source of complexity.&amp;lt;/strong&amp;gt; More tool types mean more parsing, more error handling, more edge cases in your host program.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;A surface for misuse.&amp;lt;/strong&amp;gt; The AI might call tools in unexpected ways, with unexpected arguments, or in unexpected order. The more tools available, the larger the space of possible (and possibly harmful) interactions.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;For our code generation agent, the minimal toolset was:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Tool&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Purpose&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Read file&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Fetch source code from the repository&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;File tools (&amp;lt;code&amp;gt;write-file&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;patch-file&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;mkdir&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;delete-file&amp;lt;/code&amp;gt;)&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Modify source code&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Execute build command&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Validate changes (&amp;lt;code&amp;gt;mvn compile&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;npm run build&amp;lt;/code&amp;gt;, etc.)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;Request additional files&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Ask for more context&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;We deliberately did &amp;lt;em&amp;gt;not&amp;lt;/em&amp;gt; expose: arbitrary shell access, database queries, network requests, or deployment commands. Each tool that didn’t make this list was considered and rejected because it either wasn’t strictly necessary or introduced unacceptable risk.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Sandboxing is essential.&amp;lt;/strong&amp;gt; Even with a minimal toolset, run tool executions in an isolated environment. Restrict file system access to the project directory. Set timeouts on build commands. Log every tool invocation for audit. The AI is not malicious, but it is unpredictable — and unpredictable + powerful = dangerous.&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Principle:&amp;lt;/strong&amp;gt; Start with zero tools and add them only when the agent demonstrably cannot complete its task without them. Resist the temptation to expose “just one more” convenience tool.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-8-%E2%80%94-resilient-json-parsing&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-8-%E2%80%94-resilient-json-parsing&amp;quot;&amp;gt;Iteration 8 — Resilient JSON Parsing&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;With complex multi-turn conversations, the AI occasionally produced truncated or malformed JSON — especially near token limits. The &amp;lt;code&amp;gt;repairTruncatedJson&amp;lt;/code&amp;gt; method was overhauled:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Check completeness first:&amp;lt;/strong&amp;gt; Verify whether brackets are balanced before attempting any repair.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Only truncate genuinely incomplete JSON&amp;lt;/strong&amp;gt; — previously, valid JSON was sometimes mangled by premature repair attempts.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Add &amp;lt;code&amp;gt;@NoArgsConstructor&amp;lt;/code&amp;gt;&amp;lt;/strong&amp;gt; to all DTO classes to ensure Jackson can deserialize partial objects.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Parse &amp;lt;code&amp;gt;runTool&amp;lt;/code&amp;gt;&amp;lt;/strong&amp;gt; as a proper typed object (&amp;lt;code&amp;gt;AiToolRequest&amp;lt;/code&amp;gt;) instead of a raw map.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Lesson:&amp;lt;/strong&amp;gt; When you define a structured protocol in the system prompt, the AI will follow it &amp;lt;em&amp;gt;most of the time&amp;lt;/em&amp;gt; — perhaps 95%. Your application must handle the other 5% gracefully. Invest in resilient parsing, not stricter prompts.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-9-%E2%80%94-ai-assisted-diff-recovery&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-9-%E2%80%94-ai-assisted-diff-recovery&amp;quot;&amp;gt;Iteration 9 — AI-Assisted Diff Recovery&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Even with all the fuzzy matching from Iteration 6, diffs still sometimes failed to apply — typically because the file had been modified by a previous step in the same conversation, and the SEARCH block no longer matched.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The elegant solution: &amp;lt;strong&amp;gt;ask the AI to resolve it&amp;lt;/strong&amp;gt;.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;When a &amp;lt;code&amp;gt;DiffApplyException&amp;lt;/code&amp;gt; occurs:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;Fetch the current file content from the repository.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Send both the current content and the failed diff to the AI.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Ask it to produce the complete new file content.&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart TD
    A[Apply diff] --&amp;gt; B{DiffApplyException?}
    B --&amp;gt;|No| C[Success]
    B --&amp;gt;|Yes| D[Fetch current file from repo]
    D --&amp;gt; E[&amp;quot;Send to AI:&#92;n• Current file content&#92;n• Failed diff&#92;n• &amp;#039;Produce complete new file&amp;#039;&amp;quot;]
    E --&amp;gt; F[AI returns full file content]
    F --&amp;gt; G[Use directly — no diff needed]
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;This is far more robust than implementing ever-more-complex matching strategies in the &amp;lt;code&amp;gt;DiffApplyService&amp;lt;/code&amp;gt;. The AI sees the actual current state of the file and can produce the intended result directly.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Lesson:&amp;lt;/strong&amp;gt; When your deterministic code fails, don’t add more deterministic complexity — delegate back to the AI. It can reason about intent in ways that string matching never will.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;postscript-%E2%80%94-tool-requests-beat-fragile-diff-protocols&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#postscript-%E2%80%94-tool-requests-beat-fragile-diff-protocols&amp;quot;&amp;gt;Postscript — Tool Requests Beat Fragile Diff Protocols&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;After publishing the original version of this architecture, I revisited a later iteration of the agent and checked how the design had evolved over time. The pattern was clear: there were several rounds of hardening around diff handling, but eventually the &amp;lt;code&amp;gt;fileChanges&amp;lt;/code&amp;gt; mechanism was removed entirely and replaced with tool-based file operations.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;That refactoring switched the protocol from &amp;lt;em&amp;gt;describing&amp;lt;/em&amp;gt; edits as a custom JSON diff format to &amp;lt;em&amp;gt;requesting concrete actions&amp;lt;/em&amp;gt;:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;write-file&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;patch-file&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;mkdir&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;delete-file&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;All of these are executed through &amp;lt;code&amp;gt;runTools&amp;lt;/code&amp;gt;, alongside validation commands. In other words, file mutation stopped being a special side channel and became just another tool request.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;This turned out to be much more robust for three reasons:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;The contract is simpler. The host executes explicit operations instead of parsing and heuristically applying a synthetic diff language.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Failures are clearer. &amp;lt;code&amp;gt;patch-file&amp;lt;/code&amp;gt; either finds the exact text once or it fails with a precise error; there is less “maybe this is close enough” behavior.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The conversation model is cleaner. The agent can use &amp;lt;code&amp;gt;requestTools&amp;lt;/code&amp;gt; to inspect files first, then issue &amp;lt;code&amp;gt;runTools&amp;lt;/code&amp;gt; for exact changes and validation in the next round.&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;p&amp;gt;The most important updated lesson is this: &amp;lt;strong&amp;gt;when the bridge between agent and host becomes fragile, prefer executable tool requests over clever output formats.&amp;lt;/strong&amp;gt; A protocol that asks the model to say &amp;lt;em&amp;gt;what to do&amp;lt;/em&amp;gt; is usually sturdier than one that asks it to emit a compact patch language the host must interpret.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Diffs were still a valuable intermediate step. They reduced token usage and forced a more structured contract. But in practice, explicit tool requests turned out to be the more durable endpoint.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-10-%E2%80%94-from-a-hand-wired-loop-to-a-generic-agent-loop-with-strategies&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-10-%E2%80%94-from-a-hand-wired-loop-to-a-generic-agent-loop-with-strategies&amp;quot;&amp;gt;Iteration 10 — From a Hand-Wired Loop to a Generic Agent Loop with Strategies&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;By the time we had a second agent in the system — one that turns vague user reports into well-formed issues, alongside the original code-generating one — the original implementation loop had been copy-pasted, renamed, and quietly diverged. Both loops did almost the same thing:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;bump a round counter and check budgets,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;mirror every message into the persisted session &amp;lt;em&amp;gt;and&amp;lt;/em&amp;gt; into an in-memory list,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;call the AI with &amp;lt;code&amp;gt;(history, message, systemPrompt, modelOverride, maxTokens)&amp;lt;/code&amp;gt;,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;parse the response into a plan,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;branch on “context request” vs. “tool run” vs. “final”,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;and call back into the agent-specific finisher.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;The divergences were the dangerous part. One loop carried an &amp;lt;code&amp;gt;attempt--&amp;lt;/code&amp;gt; hack that decremented the validation counter whenever the AI asked for more context, “because context lookups shouldn’t burn an implementation attempt.” The other had a different off-by-one on the same idea. Neither could be reasoned about without reading the surrounding 200 lines.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;the-refactor&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-refactor&amp;quot;&amp;gt;The refactor&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Instead of designing the new loop top-down, we started bottom-up by &amp;lt;em&amp;gt;characterising&amp;lt;/em&amp;gt; the existing one. Several new tests pinned three branch combinations that nobody dared touch: multi-round validation retry, the “ignore non-blocking tool failures after a successful build” policy, and the “file-only success without any validation tool” path. Only with those tests green did we extract:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;a generic loop class that owns rounds, history mirroring, and the AI call,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;a sealed &amp;lt;code&amp;gt;StepDecision&amp;lt;/code&amp;gt; type with exactly two cases (&amp;lt;code&amp;gt;Continue(nextMessage)&amp;lt;/code&amp;gt; / &amp;lt;code&amp;gt;Finish(outcome)&amp;lt;/code&amp;gt;) that the strategy returns each round,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;a small &amp;lt;code&amp;gt;Strategy&amp;lt;/code&amp;gt; interface with &amp;lt;code&amp;gt;systemPrompt()&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;step(ctx, aiResponse, round)&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;onBudgetExhausted(ctx)&amp;lt;/code&amp;gt;,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;an immutable &amp;lt;code&amp;gt;Budget(maxRounds, maxContextRounds, maxValidationRetries, maxTokensPerCall)&amp;lt;/code&amp;gt; value object,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;and a &amp;lt;code&amp;gt;RunContext&amp;lt;/code&amp;gt; carrying per-run mutable state (the workspace, the current base branch, the issue identifier — whatever the agent needs to know).&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;The &amp;lt;code&amp;gt;attempt--&amp;lt;/code&amp;gt; hack was deleted. Context-fetch rounds and implementation attempts now use &amp;lt;em&amp;gt;separate&amp;lt;/em&amp;gt; counters in the strategy. The session/history double-bookkeeping disappeared into the loop, where it belongs.&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;flowchart TD
    A[&amp;quot;loop.run()&amp;quot;] --&amp;gt; B[&amp;quot;ai.chat() / ai.chatWithTools()&amp;quot;]
    B --&amp;gt; C[&amp;quot;strategy.step(ctx, response, round)&amp;quot;]
    C --&amp;gt;|Continue| A
    C --&amp;gt;|Finish| D[strategy-specific finisher]
    A -. budget exhausted .-&amp;gt; E[strategy.onBudgetExhausted]
&amp;lt;/div&amp;gt;&amp;lt;h3 id=&amp;quot;lesson&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#lesson&amp;quot;&amp;gt;Lesson&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;Don’t refactor what you cannot reproduce. Characterisation tests are the cheapest possible insurance policy when ripping apart a loop full of off-by-one tricks.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;The second, subtler lesson surfaced months later: a sealed &amp;lt;code&amp;gt;StepDecision&amp;lt;/code&amp;gt; interface forces every new control-flow case to opt in. We later considered adding an &amp;lt;code&amp;gt;Abort(reason)&amp;lt;/code&amp;gt; variant for Iteration 13’s critic step — and the compiler immediately pointed at every &amp;lt;code&amp;gt;switch&amp;lt;/code&amp;gt; that would need updating. The exhaustiveness check turned the loop from a behavioural surprise generator into something &amp;lt;em&amp;gt;typed&amp;lt;/em&amp;gt;. That alone justified the refactor.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-11-%E2%80%94-schema-validation%3A-trust%2C-but-verify-(quietly)&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-11-%E2%80%94-schema-validation%3A-trust%2C-but-verify-(quietly)&amp;quot;&amp;gt;Iteration 11 — Schema Validation: Trust, but Verify (Quietly)&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The output protocol from Iteration 5 onwards was JSON-in-prompt. We documented its shape in the system prompt, deserialised it with a standard JSON library, and patched up violations in a thicket of helpers — extract-the-JSON-from-the-fenced-block (four strategies), truncate-to-first-balanced-object, repair-truncated-JSON, sanitise-invalid-escapes, find-the-last-complete-tool-call. The pile grew with every new model we tested.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The real cost was not the helpers themselves but the &amp;lt;em&amp;gt;invisibility&amp;lt;/em&amp;gt; of their work. We had no idea how often the AI actually violated the contract — and which fields it got wrong most often. The agent might be quietly recovering from 30 % schema violations and we wouldn’t know.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;what-we-built&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#what-we-built&amp;quot;&amp;gt;What we built&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;JSON-Schema documents&amp;lt;/strong&amp;gt; (Draft 2020-12) for every plan shape the AI is allowed to return. They accept the de-facto aliases the models had already invented (&amp;lt;code&amp;gt;requestedFiles&amp;lt;/code&amp;gt; next to &amp;lt;code&amp;gt;requestFiles&amp;lt;/code&amp;gt;, singular &amp;lt;code&amp;gt;runTool&amp;lt;/code&amp;gt; next to plural &amp;lt;code&amp;gt;runTools[]&amp;lt;/code&amp;gt;) via &amp;lt;code&amp;gt;oneOf&amp;lt;/code&amp;gt;, because legacy responses must remain valid.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A &amp;lt;strong&amp;gt;validator component&amp;lt;/strong&amp;gt; that runs &amp;lt;em&amp;gt;after&amp;lt;/em&amp;gt; the existing extract/repair pipeline and &amp;lt;em&amp;gt;before&amp;lt;/em&amp;gt; deserialisation.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A &amp;lt;strong&amp;gt;counter&amp;lt;/strong&amp;gt; like &amp;lt;code&amp;gt;agent.plan.schema_violations_total{agent=...}&amp;lt;/code&amp;gt; exposed on the metrics endpoint.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A &amp;lt;strong&amp;gt;feature flag&amp;lt;/strong&amp;gt; &amp;lt;code&amp;gt;agent.schema.enforce&amp;lt;/code&amp;gt; (default &amp;lt;code&amp;gt;false&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Snapshot tests&amp;lt;/strong&amp;gt; against real captured AI responses, plus a couple of deliberately broken negatives.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;The key design decision: in the default mode, &amp;lt;strong&amp;gt;the validator does not change behaviour at all.&amp;lt;/strong&amp;gt; It logs the violation, increments the counter, and lets the existing repair heuristics do their job. The whole layer is observational.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;lesson-1&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#lesson-1&amp;quot;&amp;gt;Lesson&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;Adding a strict validator next to a forgiving parser is &amp;lt;em&amp;gt;not&amp;lt;/em&amp;gt; a contradiction. Run them in parallel for a release or two, measure how often the strict layer would have rejected, and only then decide whether to flip enforcement on.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;This is the boring-but-correct path. We did not “delete the messy parser and replace it with the proper one” — that would have broken production the same day. We added a measurement and waited.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;A second lesson, hidden in the implementation: the schemas turned out to be reusable. In Iteration 12 we needed a JSON-Schema for each tool the agent can call. We already had two well-tested plan schemas to learn from.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-12-%E2%80%94-provider-native-function-calling-(without-burning-the-bridges)&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-12-%E2%80%94-provider-native-function-calling-(without-burning-the-bridges)&amp;quot;&amp;gt;Iteration 12 — Provider-Native Function Calling (Without Burning the Bridges)&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;All four major LLM providers we support now expose first-class tool calling. The provider parses the model’s tool-call intent server-side and hands you a structured &amp;lt;code&amp;gt;{ name, arguments }&amp;lt;/code&amp;gt; object. No JSON-in-prompt, no fenced code blocks, no fuzzy extraction.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;We had wanted to use this from the start. We could not, because:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;Not every model supports it (older fine-tunes, certain self-hosted setups, raw-completion endpoints).&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Switching providers mid-flight would have invalidated months of accumulated prompt engineering.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Operators of an existing deployment may have a strong reason — model choice, cost, debuggability — to stay on the legacy path even on providers that &amp;lt;em&amp;gt;do&amp;lt;/em&amp;gt; support tools.&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;p&amp;gt;So the design constraint was firm: native function calling had to be &amp;lt;strong&amp;gt;opt-out per provider configuration&amp;lt;/strong&amp;gt;, with the JSON-in-prompt path remaining fully functional and fully tested.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;the-contract-changes&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-contract-changes&amp;quot;&amp;gt;The contract changes&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-java&amp;quot;&amp;gt;record ChatTurn(String assistantText, List&amp;amp;lt;ToolCall&amp;amp;gt; toolCalls, StopReason stop) {}
record ToolCall(String id, String name, JsonNode args) {}
record ToolDescriptor(String name, String description, JsonNode jsonSchema) {}

interface AiClient {
    String chat(List&amp;amp;lt;Message&amp;amp;gt; history, String msg, String sys, String model, int maxTokens);

    // default delegates to chat(), so non-native clients work unchanged
    default ChatTurn chatWithTools(List&amp;amp;lt;Message&amp;amp;gt; history, String msg,
                                   List&amp;amp;lt;ToolDescriptor&amp;amp;gt; tools,
                                   String sys, String model, int maxTokens) {
        return ChatTurn.text(chat(history, msg, sys, model, maxTokens));
    }

    default boolean supportsNativeTools() { return false; }
}
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;Three things make this work:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;The default method shields old clients.&amp;lt;/strong&amp;gt; Anything that hasn’t been migrated keeps its plain-text &amp;lt;code&amp;gt;chat()&amp;lt;/code&amp;gt; semantics, even when the loop asks for tools. No big-bang client rewrite.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;A shared helper&amp;lt;/strong&amp;gt; wraps a &amp;lt;code&amp;gt;chat()&amp;lt;/code&amp;gt; call into a &amp;lt;code&amp;gt;ChatTurn&amp;lt;/code&amp;gt;. Every native client uses it as its fallback when the operator-level kill switch is set.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;The loop triple-gates the native path:&amp;lt;/strong&amp;gt; the strategy must prefer &amp;lt;code&amp;gt;NATIVE&amp;lt;/code&amp;gt;, the client must report &amp;lt;code&amp;gt;supportsNativeTools() == true&amp;lt;/code&amp;gt;, &amp;lt;em&amp;gt;and&amp;lt;/em&amp;gt; the strategy must actually expose at least one &amp;lt;code&amp;gt;ToolDescriptor&amp;lt;/code&amp;gt;. Any missing condition falls back to the legacy path, with the reason logged.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;a-per-configuration-kill-switch&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#a-per-configuration-kill-switch&amp;quot;&amp;gt;A per-configuration kill switch&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;A &amp;lt;code&amp;gt;useLegacyToolCalling&amp;lt;/code&amp;gt; flag was added to the AI-provider configuration record. The admin UI got a switch with a popover explaining both modes:&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;em&amp;gt;“Native tool calls give the model a structured tools API and are the recommended default. Disable this if you observe parse failures with a specific model or want to keep using the prompt-engineered protocol.”&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;When the operator flips the switch, the client is reconstructed with native mode disabled, so &amp;lt;code&amp;gt;chatWithTools(...)&amp;lt;/code&amp;gt; quietly returns a &amp;lt;code&amp;gt;ChatTurn.text(...)&amp;lt;/code&amp;gt; and the legacy text path runs end-to-end — without any change to the agent code above it.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;telemetry-that-finally-tells-the-truth&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#telemetry-that-finally-tells-the-truth&amp;quot;&amp;gt;Telemetry that finally tells the truth&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Three metrics were added:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Metric&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Tags&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;What it measures&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;agent.tool_call.mode_total&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;mode={native,legacy}&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;provider&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;One increment per AI round. Lets us see migration progress per provider.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;agent.tool_call.parse_failures_total&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;provider&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Where the model defied the protocol.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;agent.tool_call.latency_seconds&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;mode&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;provider&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Wall-clock per AI round.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;The first metric matters more than it looks. The single hardest question in this kind of migration is “are we &amp;lt;em&amp;gt;actually&amp;lt;/em&amp;gt; on the new path, or are we silently falling back?” A simple count per mode answers that without instrumenting the loop further.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;lesson-2&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#lesson-2&amp;quot;&amp;gt;Lesson&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;When you change a contract that several providers must implement, give the interface a default method that preserves the old behaviour. Then add the new behaviour as an opt-in capability, with telemetry from day one. Migrating providers becomes a measurable, reversible operation instead of a big bang.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;A related, second-order lesson: the schemas from Iteration 11 were exactly the right artefact to hand to the providers’ tool-call APIs. The investment in a stricter protocol paid off precisely when we no longer needed the loose one.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;iteration-13-%E2%80%94-persisted-state%2C-consolidated-budgets%2C-and-an-optional-critic&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#iteration-13-%E2%80%94-persisted-state%2C-consolidated-budgets%2C-and-an-optional-critic&amp;quot;&amp;gt;Iteration 13 — Persisted State, Consolidated Budgets, and an Optional Critic&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The final iteration is really three small refactors that share a theme: the agent had grown subtle bugs from &amp;lt;em&amp;gt;implicit&amp;lt;/em&amp;gt; state. Each fix replaces an inference-from-history with an explicit, persisted value.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;13a.-persisting-the-last-plan-instead-of-replaying-history&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#13a.-persisting-the-last-plan-instead-of-replaying-history&amp;quot;&amp;gt;13a. Persisting the last plan instead of replaying history&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The old “get the last plan” helper walked the conversation backwards, calling the JSON parser on every &amp;lt;code&amp;gt;assistant&amp;lt;/code&amp;gt; message until one returned non-null. It was used in three places — the PR body, the follow-up comment, and the critic step we were about to add. It had three problems:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Performance.&amp;lt;/strong&amp;gt; Long sessions re-parsed dozens of messages per call.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Inconsistency.&amp;lt;/strong&amp;gt; If the plan format evolved between runs, an older message would parse differently than a fresh one would.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Brittleness in tests.&amp;lt;/strong&amp;gt; “The last plan” depended on the exact mock of the history accessor, leading to several hours of debugging “why does this test see an empty plan?”.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;The fix is unglamorous: three new columns on the session table (&amp;lt;code&amp;gt;last_plan_summary&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;last_plan_json&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;last_plan_at&amp;lt;/code&amp;gt;), a &amp;lt;code&amp;gt;recordPlan(...)&amp;lt;/code&amp;gt; method on the session service, and a single call in the strategy right after the response is parsed successfully. Downstream readers now hit the row in O(1). The history walk survives as a fallback for sessions that pre-date the migration.&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Lesson:&amp;lt;/strong&amp;gt; If you find yourself re-deriving the same fact from history three times in three places, persist the fact. The history is for &amp;lt;em&amp;gt;audit&amp;lt;/em&amp;gt;, not for &amp;lt;em&amp;gt;lookup&amp;lt;/em&amp;gt;.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;h3 id=&amp;quot;13b.-one-budget-config-to-rule-them-all&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#13b.-one-budget-config-to-rule-them-all&amp;quot;&amp;gt;13b. One budget config to rule them all&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Before this iteration, the agent had at least seven overlapping limits scattered across three places: separate validation retries, max tool executions, a hard-coded &amp;lt;code&amp;gt;MAX_CONTEXT_TOOL_REQUESTS = 5&amp;lt;/code&amp;gt; constant, separate writer-specific knobs, a hard-coded &amp;lt;code&amp;gt;fileRequestRounds &amp;amp;lt; 3&amp;lt;/code&amp;gt; literal in the loop, and a legacy &amp;lt;code&amp;gt;maxTokens&amp;lt;/code&amp;gt; setting. Each one was set somewhere different and meant something subtly different.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The new &amp;lt;code&amp;gt;BudgetConfig&amp;lt;/code&amp;gt; collapses these into five named knobs with sensible defaults: &amp;lt;code&amp;gt;maxRounds = 10&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;maxContextRounds = 3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;maxValidationRetries = 3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;maxContextToolRequestsPerRound = 5&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;maxTokensPerCall = 16384&amp;lt;/code&amp;gt;. The deprecated fields stay on the config object for backwards compatibility, but a constructor hook copies them into the new struct whenever its values are still at the defaults — so an operator who customised &amp;lt;code&amp;gt;agent.max-tokens=8192&amp;lt;/code&amp;gt; in their YAML keeps that value, without ever touching the new property.&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Lesson:&amp;lt;/strong&amp;gt; When the operator-visible configuration drifts from the model the code actually uses, add a migration &amp;lt;em&amp;gt;inside&amp;lt;/em&amp;gt; the config class. A post-construct hook (in your framework’s equivalent) is the cheapest possible adapter and keeps the documentation honest.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;h3 id=&amp;quot;13c.-an-optional-critic-%2F-reflection-step&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#13c.-an-optional-critic-%2F-reflection-step&amp;quot;&amp;gt;13c. An optional Critic / Reflection step&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The last addition is intentionally small, and intentionally off by default. It is inspired by the “Self-Refine” / “Reflexion” family of prompting techniques: after the implementation has passed validation, ask a &amp;lt;em&amp;gt;second&amp;lt;/em&amp;gt; LLM call to read the diff and answer one question — &amp;lt;em&amp;gt;does this change actually address the issue?&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The contract is a three-valued return:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-json&amp;quot;&amp;gt;{&amp;amp;quot;outcome&amp;amp;quot;: &amp;amp;quot;APPROVE&amp;amp;quot; | &amp;amp;quot;ITERATE&amp;amp;quot; | &amp;amp;quot;ABORT&amp;amp;quot;, &amp;amp;quot;feedback&amp;amp;quot;: &amp;amp;quot;short, actionable text&amp;amp;quot;}
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;APPROVE&amp;lt;/code&amp;gt; proceeds to commit and PR creation, unchanged.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;ITERATE&amp;lt;/code&amp;gt; posts the critic’s feedback back to the issue and resets the session to “waiting for the user to ping the bot”. (We deliberately did &amp;lt;em&amp;gt;not&amp;lt;/em&amp;gt; feed the feedback back into the same loop. That door is open, but it has subtle failure modes — a critic that keeps demanding “more tests” can lock the loop in an unproductive cycle.)&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;ABORT&amp;lt;/code&amp;gt; aborts the PR creation entirely, posts the reason as a comment, and marks the session as failed.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;There is also a hidden fourth outcome, &amp;lt;code&amp;gt;SKIPPED&amp;lt;/code&amp;gt;, emitted whenever the feature is disabled (the default). It exists purely so the metric &amp;lt;code&amp;gt;agent.critic.outcome_total{outcome=…}&amp;lt;/code&amp;gt; makes the off-state visible — &amp;lt;em&amp;gt;zero&amp;lt;/em&amp;gt; additional LLM calls per implementation, but a non-zero count of &amp;lt;code&amp;gt;SKIPPED&amp;lt;/code&amp;gt; increments.&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-yaml&amp;quot;&amp;gt;agent:
  critic:
    enabled: false                       # opt-in
    max-iterations: 1
    require-approval-for: [LARGE_DIFF]   # informational, future trigger
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;The most important property of this design is what it &amp;lt;em&amp;gt;does not do&amp;lt;/em&amp;gt; in the default configuration: it makes no extra AI call, allocates no extra tokens, and adds no latency. A unit test asserts exactly that — “when disabled, the AI client is never called”.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;lesson-3&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#lesson-3&amp;quot;&amp;gt;Lesson&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;A useful self-critique step is one that costs nothing when disabled, that always fails open in the face of unparseable AI output, and that publishes its outcome as a first-class metric. Reflection is powerful, but a critic that blocks PRs when the network blips is worse than no critic at all.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;The pragmatic policy we settled on: parse failure → APPROVE. AI exception → APPROVE. Empty response → APPROVE. The only paths that can stop a PR are an &amp;lt;em&amp;gt;explicit&amp;lt;/em&amp;gt; &amp;lt;code&amp;gt;ITERATE&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;ABORT&amp;lt;/code&amp;gt; with a well-formed JSON body. The metric tags (&amp;lt;code&amp;gt;approve&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;iterate&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;abort&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;skipped&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;error&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;parse_error&amp;lt;/code&amp;gt;) make every fail-open visible to operators.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;reflecting-on-the-architectural-iterations&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#reflecting-on-the-architectural-iterations&amp;quot;&amp;gt;Reflecting on the Architectural Iterations&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Iterations 10–13 are different in flavour from 1–9. The early iterations were about &amp;lt;em&amp;gt;what the agent does&amp;lt;/em&amp;gt;; the later ones are about &amp;lt;em&amp;gt;the shape of the host that contains it&amp;lt;/em&amp;gt;. Three patterns kept recurring:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Sealed types beat strings.&amp;lt;/strong&amp;gt; Replacing string-based control flow (&amp;lt;code&amp;gt;if (response.contains(&amp;amp;quot;requestFiles&amp;amp;quot;))&amp;lt;/code&amp;gt;) with a sealed &amp;lt;code&amp;gt;StepDecision&amp;lt;/code&amp;gt; and a handful of records gave the compiler enough visibility to catch every new branch. Native function calling’s &amp;lt;code&amp;gt;StopReason&amp;lt;/code&amp;gt; enum did the same job for provider responses.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Default methods are the migration tool nobody talks about.&amp;lt;/strong&amp;gt; Both Iteration 12’s &amp;lt;code&amp;gt;chatWithTools&amp;lt;/code&amp;gt; rollout and Iteration 13b’s budget migration leaned on the same idiom: an interface with a default implementation, or a config object with a post-construct hook that bridges old fields to new. Both were strictly additive. Neither broke a single existing call site.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;A metric is the entry ticket for a feature flag.&amp;lt;/strong&amp;gt; Every flag we added — schema enforcement, native tool calling, the critic — landed together with a counter that distinguishes “the flag is on” from “the flag is doing work”. Without that, “we have schema validation now” is a press release. With it, it’s an operations capability.&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;h2 id=&amp;quot;summary-of-key-takeaways&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#summary-of-key-takeaways&amp;quot;&amp;gt;Summary of Key Takeaways&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;After thirteen iterations — nine on the agent’s behaviour, four on the architecture around it — these are the principles I’d carry into any agentic system:&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Principle&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Detail&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Inversion of control&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;The agent drives the logic; your app is the execution environment.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Choose the split wisely&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Keep deterministic steps in the app; delegate reasoning to the LLM. Each side has clear trade-offs.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Context is everything&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Invest heavily in smart, dynamic context gathering.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Cap your iteration loops&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;LLMs improve with feedback, but after 3 failed attempts, change your strategy — don’t just retry.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Let the agent ask for more&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Never assume you’ve provided enough information upfront.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Define a protocol, but be resilient&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Structured output formats are essential, but the AI will violate them. Build robust parsers.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Minimise token waste&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Use summaries, targeted context, and compact tool requests to keep the context window focused.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Delegate validation to the agent&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;The AI knows the build system better than a hardcoded checker.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Minimise the toolset&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Expose only the tools strictly needed. Every additional tool is a security risk and a source of complexity.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Prefer tool requests over patch protocols&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;If diff parsing keeps getting more complex, simplify the contract and let the host execute explicit file operations.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Use the AI to fix AI failures&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;When diff application or parsing fails, ask the AI to resolve it.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Don’t refactor what you can’t reproduce&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Characterisation tests before invasive loop changes. Sealed &amp;lt;code&amp;gt;StepDecision&amp;lt;/code&amp;gt; types beat string-based control flow.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Add the strict layer next to the lenient one&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Run schema validation in parallel with the repair heuristics first. Flip enforcement only after measuring violation rates in production.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Migrate via default methods, not big bangs&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Interface defaults (&amp;lt;code&amp;gt;chatWithTools&amp;lt;/code&amp;gt; delegating to &amp;lt;code&amp;gt;chat&amp;lt;/code&amp;gt;) and &amp;lt;code&amp;gt;@PostConstruct&amp;lt;/code&amp;gt; config bridges let you ship the new contract without breaking the old one.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;A flag without a metric is a press release&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Every feature toggle ships with a Prometheus counter that distinguishes “on” from “doing work”.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Fail open on optional critique&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;A self-critic that blocks PRs on a parse error or network blip is worse than no critic. APPROVE on uncertainty; only block on explicit, well-formed &amp;lt;code&amp;gt;ITERATE&amp;lt;/code&amp;gt; / &amp;lt;code&amp;gt;ABORT&amp;lt;/code&amp;gt;.&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;Building agentic systems is an exercise in designing for uncertainty. The AI is powerful but imprecise. Your surrounding infrastructure must be resilient, adaptive, and willing to hand control back to the agent when deterministic approaches fail. The result is a system that is more capable than either part alone.&amp;lt;/p&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="AI"/>
    <category term="LLM"/>
    <category term="Agents"/>
    <category term="Java"/>
    <category term="Software Architecture"/>
  </entry>
  <entry>
    <title>Restic Explorer 1.0 — A Lightweight Monitoring Dashboard for Restic Backups</title>
    <link href="https://remus-software.org/articles/rest-explorer-1-0-released/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/rest-explorer-1-0-released/</id>
    <published>2026-04-04T00:00:00.000Z</published>
    <updated>2026-04-04T00:00:00.000Z</updated>
    <summary>Restic Explorer 1.0 is out — a lightweight, self-hosted web dashboard that monitors all restic backup repositories across S3, Azure, SFTP, REST, and Rclone from a single UI with automated scans, integrity checks, and retention policy tracking.</summary>
    <content type="html">&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Backups are only as good as the confidence that they actually work.&amp;lt;/strong&amp;gt; Restic Explorer 1.0 is now available — a focused, self-hosted web dashboard that provides exactly that confidence for all &amp;lt;a href=&amp;quot;https://restic.net/&amp;quot;&amp;gt;restic&amp;lt;/a&amp;gt; repositories in one place.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;img src=&amp;quot;https://raw.githubusercontent.com/tmseidel/restic-explorer/main/docs/screenshot_dashboard.png&amp;quot; alt=&amp;quot;Restic Explorer Dashboard&amp;quot;&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;the-problem&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-problem&amp;quot;&amp;gt;The Problem&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Restic is an outstanding backup tool. Fast, encrypted, deduplicated — it has become the go-to choice for backing up servers, NAS devices, and cloud workloads. But restic is a CLI tool by design. When running multiple repositories across different backends — S3 buckets, Azure Blob, SFTP servers — keeping track of &amp;lt;em&amp;gt;“is everything still running?”&amp;lt;/em&amp;gt; becomes a chore. It often means writing shell scripts, parsing JSON output, wiring up cron jobs, and hoping someone notices when something breaks.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Existing monitoring solutions are excellent pieces of software, but they tend to come with far more complexity than many use cases require: agent-based architectures, extensive plugin systems, or dashboards designed for hundreds of repositories across large teams. For operators who simply need a single pane of glass that answers &amp;lt;strong&amp;gt;are the backups running, are they healthy, and do they meet retention requirements?&amp;lt;/strong&amp;gt; — a lighter approach is needed.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;the-solution&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-solution&amp;quot;&amp;gt;The Solution&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Restic Explorer is that single pane of glass. It connects directly to restic repositories — wherever they live — and provides:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Multi-Repository Dashboard&amp;lt;/strong&amp;gt; — status of all repos at a glance with color-coded badges (green/red/amber)&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Automated Scanning&amp;lt;/strong&amp;gt; — scheduled &amp;lt;code&amp;gt;restic snapshots&amp;lt;/code&amp;gt; calls cache metadata for fast browsing without CLI round-trips&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Integrity Checks&amp;lt;/strong&amp;gt; — scheduled &amp;lt;code&amp;gt;restic check --read-data&amp;lt;/code&amp;gt; runs with configurable intervals per repository&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Retention Policy Monitoring&amp;lt;/strong&amp;gt; — daily/weekly/monthly/yearly rules with soft warnings when snapshots fall short&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Health Endpoint&amp;lt;/strong&amp;gt; — &amp;lt;code&amp;gt;/actuator/health&amp;lt;/code&amp;gt; JSON endpoint reporting per-repo status, ready for Uptime Kuma, Prometheus, or any HTTP health checker&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Snapshot Browser&amp;lt;/strong&amp;gt; — paginated, sortable snapshot list with a dedicated detail page showing paths, tags, hostname, and size&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Lock Detection&amp;lt;/strong&amp;gt; — automatic stale lock detection with one-click unlock&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Encrypted Credentials&amp;lt;/strong&amp;gt; — AES-256-GCM encryption at rest for repository passwords and backend keys&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;five-backends%2C-one-ui&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#five-backends%2C-one-ui&amp;quot;&amp;gt;Five Backends, One UI&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Backend&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;What it covers&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;S3 / S3-Compatible&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;AWS S3, MinIO, Wasabi, Backblaze B2 (S3 API)&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Azure Blob Storage&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Native Azure integration&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;SFTP&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Any SSH-accessible server, key-based auth&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;REST Server&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Restic’s own REST backend with optional HTTP auth&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Rclone&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Google Drive, Dropbox, OneDrive, B2, and 40+ more via rclone&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;h2 id=&amp;quot;getting-started-in-60-seconds&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#getting-started-in-60-seconds&amp;quot;&amp;gt;Getting Started in 60 Seconds&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The fastest way to get running is Docker Compose:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-yaml&amp;quot;&amp;gt;services:
  app:
    image: tmseidel/restic-explorer:latest
    ports:
      - &amp;amp;quot;8080:8080&amp;amp;quot;
    environment:
      SPRING_PROFILES_ACTIVE: docker
      DB_HOST: db
      DB_PORT: 5432
      DB_NAME: resticexplorer
      DB_USER: resticexplorer
      DB_PASSWORD: resticexplorer
    depends_on:
      db:
        condition: service_healthy
    restart: unless-stopped

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: resticexplorer
      POSTGRES_USER: resticexplorer
      POSTGRES_PASSWORD: resticexplorer
    volumes:
      - db-data:/var/lib/postgresql/data
    healthcheck:
      test: [&amp;amp;quot;CMD-SHELL&amp;amp;quot;, &amp;amp;quot;pg_isready -U resticexplorer&amp;amp;quot;]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

volumes:
  db-data:
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;docker compose up -d
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;Open &amp;lt;code&amp;gt;http://localhost:8080&amp;lt;/code&amp;gt;, create the admin account, and start adding repositories. That’s it.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;The image ships with restic, rclone, and openssh-client pre-installed — no additional setup required for any backend type.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;why-restic-is-a-great-fit-for-cloud-%26-infrastructure-as-code&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#why-restic-is-a-great-fit-for-cloud-%26-infrastructure-as-code&amp;quot;&amp;gt;Why Restic is a Great Fit for Cloud &amp;amp;amp; Infrastructure-as-Code&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;For teams managing cloud infrastructure through Terraform, Ansible, Pulumi, or similar tools, restic fits naturally into the workflow:&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;stateless-by-design&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#stateless-by-design&amp;quot;&amp;gt;Stateless by Design&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Restic repositories are self-contained. There is no central server, no daemon, no database to maintain. A repository is just a structured set of encrypted blobs in any storage backend. This makes restic trivially reproducible — IaC can provision the storage bucket and the backup job in the same run.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;backend-agnostic&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#backend-agnostic&amp;quot;&amp;gt;Backend Agnostic&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Moving from AWS to Azure? Migrating from on-prem to cloud? Restic’s backend abstraction means the backup strategy isn’t tied to a vendor. A Terraform module provisions an S3 bucket today; tomorrow it provisions Azure Blob Storage. The restic commands stay the same.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;encryption-without-infrastructure&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#encryption-without-infrastructure&amp;quot;&amp;gt;Encryption Without Infrastructure&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Restic encrypts everything client-side. There is no need for a KMS, a Vault instance, or an HSM for backup encryption. One password, stored in the secrets manager of choice, and data is encrypted at rest regardless of the storage backend’s capabilities.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;deduplication-saves-cloud-storage-costs&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#deduplication-saves-cloud-storage-costs&amp;quot;&amp;gt;Deduplication Saves Cloud Storage Costs&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Restic’s content-defined chunking and deduplication means incremental backups are genuinely incremental — even across different source machines backing up to the same repository. In cloud environments where storage is metered, this translates directly to lower costs.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;scriptable-and-composable&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#scriptable-and-composable&amp;quot;&amp;gt;Scriptable and Composable&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Restic is a CLI tool that outputs JSON. It composes perfectly with cron, systemd timers, CI/CD pipelines, and container sidecars. No agents to install, no ports to open, no protocols to configure — just a binary and a repository URL.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Restic Explorer adds the monitoring layer on top: existing restic workflows remain untouched, and Restic Explorer watches the repositories and surfaces issues when they need attention.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;what%E2%80%99s-in-1.0&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#what%E2%80%99s-in-1.0&amp;quot;&amp;gt;What’s in 1.0&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;This release marks the point where the feature set is stable, tested, and production-ready:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Five backend types&amp;lt;/strong&amp;gt; — S3, Azure, SFTP, REST, Rclone&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Repository groups&amp;lt;/strong&amp;gt; — organize repos by team, environment, or purpose&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Configurable scan and check intervals&amp;lt;/strong&amp;gt; per repository&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Retention policy monitoring&amp;lt;/strong&amp;gt; with violation warnings&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Error log&amp;lt;/strong&amp;gt; with date filtering and auto-cleanup&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Dark mode&amp;lt;/strong&amp;gt; with automatic theme detection&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Health &amp;amp;amp; info endpoints&amp;lt;/strong&amp;gt; for external monitoring integration&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Admin-only download&amp;lt;/strong&amp;gt; of snapshots as &amp;lt;code&amp;gt;.tar&amp;lt;/code&amp;gt; archives&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Encrypted credential storage&amp;lt;/strong&amp;gt; (AES-256-GCM)&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Docker image&amp;lt;/strong&amp;gt; running as non-root user with built-in healthcheck&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;Snapshots&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Snapshot Detail&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;img src=&amp;quot;https://raw.githubusercontent.com/tmseidel/restic-explorer/main/docs/screenshot_snapshots.png&amp;quot; alt=&amp;quot;Snapshots&amp;quot;&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;img src=&amp;quot;https://raw.githubusercontent.com/tmseidel/restic-explorer/main/docs/screenshot_snapshot.png&amp;quot; alt=&amp;quot;Detail&amp;quot;&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;h2 id=&amp;quot;get-it&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#get-it&amp;quot;&amp;gt;Get It&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Docker Hub&amp;lt;/strong&amp;gt;: &amp;lt;a href=&amp;quot;https://hub.docker.com/r/tmseidel/restic-explorer&amp;quot;&amp;gt;&amp;lt;code&amp;gt;tmseidel/restic-explorer:latest&amp;lt;/code&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;GitHub&amp;lt;/strong&amp;gt;: &amp;lt;a href=&amp;quot;https://github.com/tmseidel/restic-explorer&amp;quot;&amp;gt;tmseidel/restic-explorer&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Documentation&amp;lt;/strong&amp;gt;: &amp;lt;a href=&amp;quot;https://github.com/tmseidel/restic-explorer/blob/main/docs/USER_GUIDE.md&amp;quot;&amp;gt;User Guide&amp;lt;/a&amp;gt; · &amp;lt;a href=&amp;quot;https://github.com/tmseidel/restic-explorer/blob/main/docs/CONFIGURATION.md&amp;quot;&amp;gt;Configuration&amp;lt;/a&amp;gt; · &amp;lt;a href=&amp;quot;https://github.com/tmseidel/restic-explorer/blob/main/docs/ARCHITECTURE.md&amp;quot;&amp;gt;Architecture&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;Licensed under MIT. Contributions, issues, and feedback welcome.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;em&amp;gt;Restic Explorer is built with Spring Boot 4, Thymeleaf, and Bootstrap 5. It runs as a single container alongside PostgreSQL and requires no additional infrastructure beyond what is already in place.&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="backup"/>
    <category term="Self-Hosting"/>
    <category term="news"/>
    <category term="Restic"/>
  </entry>
  <entry>
    <title>From Legacy to Lean: Rethinking Your Backup Strategy</title>
    <link href="https://remus-software.org/articles/replacing-veeam-with-restic/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/replacing-veeam-with-restic/</id>
    <published>2026-03-29T00:00:00.000Z</published>
    <updated>2026-03-29T00:00:00.000Z</updated>
    <summary>How we replaced a costly, complex backup system with a simple shell script and S3 storage — and the key questions to ask before you do the same.</summary>
    <content type="html">&amp;lt;h1 id=&amp;quot;&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/h1&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;We ditched our expensive, bloated backup platform for a shell script and S3. Here’s how — and what to think about before you do the same.&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-problem-nobody-wants-to-touch&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-problem-nobody-wants-to-touch&amp;quot;&amp;gt;The Problem Nobody Wants to Touch&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Let’s be honest: most backup systems are set up once and then nobody looks at them again. They just… run. Hopefully.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;We were in that exact spot. A centralized commercial backup server on Windows, proprietary agents on every machine, enterprise licenses, the whole deal. It worked — until it didn’t:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;The config kept breaking.&amp;lt;/strong&amp;gt; More than once, the backup server’s internal state got corrupted. Trying to add a new backup job? Error dialog. Can’t configure anything until someone fixes it manually.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Way too much overhead.&amp;lt;/strong&amp;gt; Each server needed a proprietary agent, a service user, SSH access, firewall rules — all for what’s basically “copy some files somewhere safe.”&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;We used 5% of the features.&amp;lt;/strong&amp;gt; Bare-metal recovery? Granular restore? Application-aware snapshots? We never used any of that. Our servers are provisioned with automation — we can rebuild them from scratch. We just needed the &amp;lt;em&amp;gt;data&amp;lt;/em&amp;gt;.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;It cost real money.&amp;lt;/strong&amp;gt; A Windows Server with commercial licenses, just to store backups. For a team that runs Linux everywhere else, that’s an expensive oddball.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;before-you-migrate%3A-ask-yourself-these-questions&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#before-you-migrate%3A-ask-yourself-these-questions&amp;quot;&amp;gt;Before You Migrate: Ask Yourself These Questions&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Don’t jump to a new tool just because the old one annoys you. Think it through first:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;What are you actually backing up?&amp;lt;/strong&amp;gt; If your servers can be rebuilt from code, you probably just need data-level backups (database dumps, config files), not full disk images.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Have you ever restored from backup?&amp;lt;/strong&amp;gt; If the answer is “uh, I think so?” — that’s your real problem, regardless of the tool.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;What’s the total cost?&amp;lt;/strong&amp;gt; Licenses + the server it runs on + agent maintenance + engineer time spent debugging weird issues.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Do you get alerts when a backup fails?&amp;lt;/strong&amp;gt; A backup that silently breaks is worse than no backup at all.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Is backup part of your provisioning?&amp;lt;/strong&amp;gt; If setting up backup for a new server is a separate manual process, it &amp;lt;em&amp;gt;will&amp;lt;/em&amp;gt; get skipped eventually.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;what-we-switched-to&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#what-we-switched-to&amp;quot;&amp;gt;What We Switched To&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;We landed on &amp;lt;a href=&amp;quot;https://restic.net/&amp;quot;&amp;gt;Restic&amp;lt;/a&amp;gt; — open-source, encrypts everything, deduplicates, compresses, and stores to any S3-compatible backend. It’s in the default Debian repos. Install is literally &amp;lt;code&amp;gt;apt install restic&amp;lt;/code&amp;gt;.&amp;lt;/p&amp;gt;
&amp;lt;table&amp;gt;
&amp;lt;thead&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Old System&amp;lt;/th&amp;gt;
&amp;lt;th&amp;gt;Restic&amp;lt;/th&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/thead&amp;gt;
&amp;lt;tbody&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Install&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Proprietary repo + agent + service user + firewall rules&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;apt install restic&amp;lt;/code&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Storage&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Dedicated Windows backup server&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Any S3-compatible object storage&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Config&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;GUI on backup server&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Environment variables + shell script&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Licensing&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Per-server commercial license&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Free&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;tr&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;strong&amp;gt;Restore&amp;lt;/strong&amp;gt;&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;Through backup server UI&amp;lt;/td&amp;gt;
&amp;lt;td&amp;gt;&amp;lt;code&amp;gt;restic restore&amp;lt;/code&amp;gt; from anywhere&amp;lt;/td&amp;gt;
&amp;lt;/tr&amp;gt;
&amp;lt;/tbody&amp;gt;
&amp;lt;/table&amp;gt;
&amp;lt;p&amp;gt;When picking any replacement tool, look for: simple deployment, storage flexibility (don’t get locked in), full CLI scriptability, client-side encryption, active community, and built-in retention management.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-architecture&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-architecture&amp;quot;&amp;gt;The Architecture&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Here’s what we ended up with — three layers:&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;graph TB
    subgraph servers[&amp;quot;Servers&amp;quot;]
        native[&amp;quot;&amp;lt;b&amp;gt;Native App&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;pg_dumpall → gzip&amp;lt;br/&amp;gt;→ restic backup&amp;quot;]
        docker[&amp;quot;&amp;lt;b&amp;gt;Docker App&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;docker exec → pg_dump&amp;lt;br/&amp;gt;→ gzip → restic backup&amp;quot;]
        legacy[&amp;quot;&amp;lt;b&amp;gt;Legacy App&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;mysqldump&amp;lt;br/&amp;gt;→ legacy agent&amp;quot;]
    end

    subgraph storage[&amp;quot;Storage Layer&amp;quot;]
        s3[&amp;quot;&amp;lt;b&amp;gt;S3-Compatible Object Store&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;One bucket per project&amp;quot;]
        legacysrv[&amp;quot;&amp;lt;b&amp;gt;Legacy Backup Server&amp;lt;/b&amp;gt;&amp;quot;]
    end

    subgraph monitoring[&amp;quot;Monitoring Layer&amp;quot;]
        explorer[&amp;quot;&amp;lt;b&amp;gt;Backup Explorer&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Browse repos,&amp;lt;br/&amp;gt;check health&amp;quot;]
        heartbeat[&amp;quot;&amp;lt;b&amp;gt;Heartbeat Monitor&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;Push-based alerts on&amp;lt;br/&amp;gt;success / failure&amp;quot;]
    end

    native -- &amp;quot;Restic + S3&amp;quot; --&amp;gt; s3
    docker -- &amp;quot;Restic + S3&amp;quot; --&amp;gt; s3
    legacy -- &amp;quot;Legacy Agent&amp;quot; --&amp;gt; legacysrv
    s3 --&amp;gt; explorer
    s3 --&amp;gt; heartbeat
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;A few rules we learned the hard way:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;One bucket per project.&amp;lt;/strong&amp;gt; Never mix backups from different apps in the same bucket. Isolation, access control, cost tracking — all easier this way.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Every backup is individual.&amp;lt;/strong&amp;gt; A Postgres DB needs &amp;lt;code&amp;gt;pg_dumpall&amp;lt;/code&amp;gt;. A Docker service needs &amp;lt;code&amp;gt;docker compose exec&amp;lt;/code&amp;gt;. A VPN server needs its config files. There’s no universal “back up everything” script. Write one per app.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Credentials go in a team vault.&amp;lt;/strong&amp;gt; If the person who set up the backup leaves, you don’t want the passwords leaving with them.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;the-script-pattern&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-script-pattern&amp;quot;&amp;gt;The Script Pattern&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;After iterating across a bunch of projects, we settled on a template every backup job follows:&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;#!/usr/bin/env bash
set -euo pipefail

source /opt/app/.restic-env

# Error trap — always notify on failure
trap &amp;#039;notify_monitor &amp;amp;quot;down&amp;amp;quot; &amp;amp;quot;Backup failed&amp;amp;quot;; rm -f &amp;amp;quot;${DUMP_FILE}&amp;amp;quot;; exit 1&amp;#039; ERR

# Init repo if first run
restic snapshots &amp;amp;gt; /dev/null 2&amp;amp;gt;&amp;amp;amp;1 || restic init

# Create the dump (customize this per app)
pg_dumpall | gzip &amp;amp;gt; &amp;amp;quot;${DUMP_FILE}&amp;amp;quot;

# Don&amp;#039;t upload empty dumps
[[ -s &amp;amp;quot;${DUMP_FILE}&amp;amp;quot; ]] || { notify_monitor &amp;amp;quot;down&amp;amp;quot; &amp;amp;quot;Empty dump&amp;amp;quot;; exit 1; }

# Upload, clean up, prune old snapshots
restic backup &amp;amp;quot;${DUMP_FILE}&amp;amp;quot; --tag app-name
rm -f &amp;amp;quot;${DUMP_FILE}&amp;amp;quot;
restic forget --keep-daily 30 --keep-weekly 8 --keep-monthly 12 --prune

# All good
notify_monitor &amp;amp;quot;up&amp;amp;quot; &amp;amp;quot;OK&amp;amp;quot;
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;The important bits: the &amp;lt;strong&amp;gt;error trap&amp;lt;/strong&amp;gt; makes sure you hear about failures. The &amp;lt;strong&amp;gt;empty-dump check&amp;lt;/strong&amp;gt; catches silent breakage (like a database dump that exits 0 but produces nothing). &amp;lt;strong&amp;gt;Retention runs on every backup&amp;lt;/strong&amp;gt;, not as a separate task. And &amp;lt;strong&amp;gt;tags&amp;lt;/strong&amp;gt; let you filter snapshots later.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;With default retention (30 daily, 8 weekly, 12 monthly) you end up with about 44 snapshots at any given time — good granularity without blowing up storage.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;monitoring%3A-don%E2%80%99t-skip-this&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#monitoring%3A-don%E2%80%99t-skip-this&amp;quot;&amp;gt;Monitoring: Don’t Skip This&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Two layers — you need both:&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Heartbeat monitoring:&amp;lt;/strong&amp;gt; Every backup script pings a monitor on success or failure (we use &amp;lt;a href=&amp;quot;https://github.com/louislam/uptime-kuma&amp;quot;&amp;gt;Uptime Kuma&amp;lt;/a&amp;gt;, but anything push-based works). If no ping arrives within 26 hours → alert. This catches script failures, cron being broken, and servers being down.&amp;lt;/p&amp;gt;
&amp;lt;pre&amp;gt;&amp;lt;code class=&amp;quot;language-bash&amp;quot;&amp;gt;curl -sf &amp;amp;quot;${MONITOR_URL}?status=up&amp;amp;amp;msg=OK&amp;amp;quot;       # on success
curl -sf &amp;amp;quot;${MONITOR_URL}?status=down&amp;amp;amp;msg=Failed&amp;amp;quot;  # in error trap
&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Repository browser:&amp;lt;/strong&amp;gt; A heartbeat tells you &amp;lt;em&amp;gt;if&amp;lt;/em&amp;gt; the backup ran. A browser tells you &amp;lt;em&amp;gt;what’s in it&amp;lt;/em&amp;gt; — snapshot counts, sizes, retention compliance, integrity checks. This catches things like backups that “succeed” but are suspiciously small.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;how-to-actually-migrate&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#how-to-actually-migrate&amp;quot;&amp;gt;How to Actually Migrate&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Don’t flip the switch overnight. We did it in phases:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;New servers get the new tool from day one.&amp;lt;/strong&amp;gt; Zero risk, no migration needed.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Old servers run both systems in parallel.&amp;lt;/strong&amp;gt; Set up the new backup alongside the legacy one.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Test restores from the new backup.&amp;lt;/strong&amp;gt; Actually restore on a test environment. Verify the data.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Remove the legacy agent per server&amp;lt;/strong&amp;gt; after the new backup has been solid for a couple of months.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Kill the legacy server last&amp;lt;/strong&amp;gt; — only after every server is migrated and validated.&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;p&amp;gt;Don’t rush step 4. Storage is cheap. Lost data is not.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;h2 id=&amp;quot;tl%3Bdr&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#tl%3Bdr&amp;quot;&amp;gt;TL;DR&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;If your servers are provisioned from code, you don’t need image-level backups. Just back up the data.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Write a backup script per application — there is no one-size-fits-all.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Monitor everything. Heartbeats for “did it run?”, a browser for “what’s in it?”&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Bake backup into your provisioning. If it’s manual, it’ll get skipped.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Test your restores. A backup you’ve never restored from is a hope, not a strategy.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Migrate gradually. Parallel-run, validate, then decommission.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;A shell script, a cron job, encrypted uploads to S3, and a heartbeat ping. That’s the whole system. No servers, no GUI, no licenses.&amp;lt;/p&amp;gt;
&amp;lt;hr&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;em&amp;gt;The best backup system is the one your team actually understands, maintains, and tests.&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="backup"/>
    <category term="Self-Hosting"/>
    <category term="DevOps"/>
    <category term="Restic"/>
  </entry>
  <entry>
    <title>Evaluating Self-Hosted AI Services: A Translation Service Case Study</title>
    <link href="https://remus-software.org/articles/self-hosted-ai-translation-service/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/self-hosted-ai-translation-service/</id>
    <published>2026-02-02T00:00:00.000Z</published>
    <updated>2026-02-02T00:00:00.000Z</updated>
    <summary>A practical evaluation of replacing DeepL with a self-hosted translation service using open-source LLMs — comparing quality, performance, and cost.</summary>
    <content type="html">&amp;lt;p&amp;gt;With freely available large language models now widely accessible, it has become straightforward to self-host software that was previously only available through commercial providers. The key question always comes down to the resulting costs and the effort involved.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;In this case study, I examined whether the translation service DeepL can be replaced by a self-hosted solution. The goal was to provide a DeepL-compatible REST API that:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;achieves comparable translation quality,&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;offers similar performance, and&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;implements the same REST API specification&amp;lt;sup class=&amp;quot;footnote-ref&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#fn1&amp;quot; id=&amp;quot;fnref1&amp;quot;&amp;gt;[1]&amp;lt;/a&amp;gt;&amp;lt;/sup&amp;gt;,&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;in order to then compare the one-time and ongoing costs. Using the DeepL API requires a paid subscription; while the pay-as-you-go model is transparent, it can become very expensive with heavy usage. Additionally, data leaves the corporate network, and the API’s behaviour under heavy load is not fully transparent.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;choosing-a-suitable-local-model&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#choosing-a-suitable-local-model&amp;quot;&amp;gt;Choosing a Suitable Local Model&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The first question is which freely available models are suitable for translation tasks. Hugging Face offers a large selection of models that can be easily integrated into custom software&amp;lt;sup class=&amp;quot;footnote-ref&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#fn2&amp;quot; id=&amp;quot;fnref2&amp;quot;&amp;gt;[2]&amp;lt;/a&amp;gt;&amp;lt;/sup&amp;gt;. For this evaluation, Meta’s &amp;lt;strong&amp;gt;nllb-200-distilled&amp;lt;/strong&amp;gt; model was chosen, as it is widely used, easy to deploy, and available in three sizes (600M, 1.3B, and 3.3B parameters).&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;implementing-the-deepl-compatible-rest-api&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#implementing-the-deepl-compatible-rest-api&amp;quot;&amp;gt;Implementing the DeepL-Compatible REST API&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;A pragmatic approach was taken for the implementation: a Spring Boot application serves as the API frontend and delegates the actual translation request to a Python Flask component that controls the LLM.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;For easy deployment, the system can be run either:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;in Docker containers, or&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;natively on a Debian/Ubuntu server.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;The goal was a straightforward deployment on various cloud hardware platforms to test quality and performance there. The complete implementation is available on GitHub&amp;lt;sup class=&amp;quot;footnote-ref&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#fn3&amp;quot; id=&amp;quot;fnref3&amp;quot;&amp;gt;[3]&amp;lt;/a&amp;gt;&amp;lt;/sup&amp;gt;. Ansible was used for automated native deployment.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;test-%E2%80%94-translation-quality&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#test-%E2%80%94-translation-quality&amp;quot;&amp;gt;Test — Translation Quality&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;The following German reference sentence was used to evaluate translation quality:&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;em&amp;gt;“Sobald der Glasfaser-Ausbau abgeschlossen ist, erhalten Sie eine Mitteilung zum Schaltungstermin und eine Schnell-Start-Anleitung für die Einrichtung des Glasfaser-Anschlusses.”&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;DeepL produces the following translation:&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;“Once the fiber optic expansion is complete, you will receive a notification of the activation date and a quick start guide for setting up your fiber optic connection.”&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;This translation serves as the reference.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;test-with-nllb-200-distilled-600m&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#test-with-nllb-200-distilled-600m&amp;quot;&amp;gt;Test with nllb-200-distilled-600M&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The smallest model was first run on a development machine via Docker. Performance was not a concern at this stage. The generated translation was:&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;“Once the glass-faser-Ausbau is closed, you receive a Mitteilung zum Schaltungstermin und eine Schnell-Start-Anleitung für die Einrichtung der Glasfaser-Anschlusses.”&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;img src=&amp;quot;nllb-200-distilled-600M.png&amp;quot; alt=&amp;quot;Response of the small model&amp;quot;&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;test-with-nllb-200-distilled-1.3b&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#test-with-nllb-200-distilled-1.3b&amp;quot;&amp;gt;Test with nllb-200-distilled-1.3B&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The medium model produced the following output:&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;“The Commission shall inform the Member States of the date of the entry into force of this Regulation.”&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;h3 id=&amp;quot;test-with-nllb-200-distilled-3.3b&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#test-with-nllb-200-distilled-3.3b&amp;quot;&amp;gt;Test with nllb-200-distilled-3.3B&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The largest model generated the following translation:&amp;lt;/p&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;“Once the glass fibre installation is completed, you will receive a notice on the date of installation and a quick start guide for the installation of the glass fibre connections.”&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;img src=&amp;quot;nllb-200-distilled-3.3B.png&amp;quot; alt=&amp;quot;Response of the large model&amp;quot;&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;translation-quality-conclusion&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#translation-quality-conclusion&amp;quot;&amp;gt;Translation Quality Conclusion&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;A comprehensive assessment is difficult after just a few tests. Nevertheless, it became clear that only the largest model is viable for production use. It was also notable that the models performed significantly more reliably when the source language was English. If translation is exclusively from English, the medium model might therefore be sufficient.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;test-%E2%80%94-performance&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#test-%E2%80%94-performance&amp;quot;&amp;gt;Test — Performance&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Once the suitable model was identified, the next step was to determine under which hardware conditions productive operation is feasible. As a benchmark, it was assumed that translating the reference sentence should take no longer than two seconds. Additionally, the difference between a traditional CPU-based server and a GPU-based system was to be determined.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;test%3A-traditional-server&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#test%3A-traditional-server&amp;quot;&amp;gt;Test: Traditional Server&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;A Hetzner CX53 with 16 vCPUs and 32 GB RAM was used as the CPU server (cost: €17 per month).&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Response time: 12.93 seconds&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;test%3A-gpu-server&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#test%3A-gpu-server&amp;quot;&amp;gt;Test: GPU Server&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;An Amazon g4dn.large with 16 GB GPU RAM (Nvidia) was used as the GPU server. The cost is €0.67 per hour, roughly €500 per month.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Response time: 1.31 seconds&amp;lt;/strong&amp;gt; — GPU memory usage: approx. 13 GB&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;performance-conclusion&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#performance-conclusion&amp;quot;&amp;gt;Performance Conclusion&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The difference between the two systems was significantly larger than expected. Even without deep knowledge of the internal workings of LLMs, it is clear that productive operation is practically only feasible with GPU-based hardware. Costs on AWS are currently high, but cheaper alternatives exist — for example at Hetzner&amp;lt;sup class=&amp;quot;footnote-ref&amp;quot;&amp;gt;&amp;lt;a href=&amp;quot;#fn4&amp;quot; id=&amp;quot;fnref4&amp;quot;&amp;gt;[4]&amp;lt;/a&amp;gt;&amp;lt;/sup&amp;gt;. The achieved response time is fundamentally suitable for production use. Parallel requests had no significant impact on latency in the tests.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;img src=&amp;quot;nvidia-smi.png&amp;quot; alt=&amp;quot;nvidia-smi output showing GPU memory usage&amp;quot;&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;overall-conclusion&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#overall-conclusion&amp;quot;&amp;gt;Overall Conclusion&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;This evaluation clearly demonstrates that it is possible to self-host AI-based services like machine translation using freely available models and modern hardware — with reasonable effort and competitive quality. While the ongoing costs for GPU-based systems are still relatively high, falling prices and increasing efficiency can be expected as adoption grows and technology advances. Moreover, more affordable hosting alternatives beyond the major cloud providers already exist today.&amp;lt;/p&amp;gt;
&amp;lt;p&amp;gt;Especially in heavily regulated industries — such as finance, healthcare, or the public sector — a self-hosted AI service can offer significant advantages:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Data sovereignty&amp;lt;/strong&amp;gt; is fully preserved, as no sensitive information leaves external systems.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Compliance requirements&amp;lt;/strong&amp;gt; are easier to meet, since infrastructure and data flows are fully controllable.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Performance and scalability&amp;lt;/strong&amp;gt; can be precisely tailored to your own needs.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Competitive advantages&amp;lt;/strong&amp;gt; emerge when you can offer services that are not only cheaper but also more secure and flexible than commercial alternatives.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h2 id=&amp;quot;references&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#references&amp;quot;&amp;gt;References&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;hr class=&amp;quot;footnotes-sep&amp;quot;&amp;gt;
&amp;lt;section class=&amp;quot;footnotes&amp;quot;&amp;gt;
&amp;lt;ol class=&amp;quot;footnotes-list&amp;quot;&amp;gt;
&amp;lt;li id=&amp;quot;fn1&amp;quot; class=&amp;quot;footnote-item&amp;quot;&amp;gt;&amp;lt;p&amp;gt;&amp;lt;a href=&amp;quot;https://developers.deepl.com/docs/getting-started/intro&amp;quot;&amp;gt;DeepL API Documentation&amp;lt;/a&amp;gt; &amp;lt;a href=&amp;quot;#fnref1&amp;quot; class=&amp;quot;footnote-backref&amp;quot;&amp;gt;↩︎&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li id=&amp;quot;fn2&amp;quot; class=&amp;quot;footnote-item&amp;quot;&amp;gt;&amp;lt;p&amp;gt;&amp;lt;a href=&amp;quot;https://huggingface.co/models?pipeline_tag=translation&amp;quot;&amp;gt;Hugging Face Translation Models&amp;lt;/a&amp;gt; &amp;lt;a href=&amp;quot;#fnref2&amp;quot; class=&amp;quot;footnote-backref&amp;quot;&amp;gt;↩︎&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li id=&amp;quot;fn3&amp;quot; class=&amp;quot;footnote-item&amp;quot;&amp;gt;&amp;lt;p&amp;gt;&amp;lt;a href=&amp;quot;https://github.com/tmseidel/simple_ai_translation_service&amp;quot;&amp;gt;simple_ai_translation_service on GitHub&amp;lt;/a&amp;gt; &amp;lt;a href=&amp;quot;#fnref3&amp;quot; class=&amp;quot;footnote-backref&amp;quot;&amp;gt;↩︎&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;li id=&amp;quot;fn4&amp;quot; class=&amp;quot;footnote-item&amp;quot;&amp;gt;&amp;lt;p&amp;gt;&amp;lt;a href=&amp;quot;https://www.hetzner.com/dedicated-rootserver/matrix-gpu/&amp;quot;&amp;gt;Hetzner GPU Dedicated Servers&amp;lt;/a&amp;gt; &amp;lt;a href=&amp;quot;#fnref4&amp;quot; class=&amp;quot;footnote-backref&amp;quot;&amp;gt;↩︎&amp;lt;/a&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;/section&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="AI"/>
    <category term="LLM"/>
    <category term="Self-Hosting"/>
    <category term="DevOps"/>
    <category term="Spring Boot"/>
    <category term="Python"/>
  </entry>
  <entry>
    <title>Migrating a Monolith to Microservices: A Practical Guide</title>
    <link href="https://remus-software.org/articles/monolith-to-microservices/" rel="alternate" type="text/html"/>
    <id>https://remus-software.org/articles/monolith-to-microservices/</id>
    <published>2024-03-15T00:00:00.000Z</published>
    <updated>2024-03-15T00:00:00.000Z</updated>
    <summary>A hands-on walkthrough of the architectural decisions and patterns I use when migrating Java monoliths to cloud-native microservices.</summary>
    <content type="html">&amp;lt;p&amp;gt;Migrating a monolithic Java application to microservices is one of the most impactful — and challenging — transformations you can undertake. This article shares the practical approach I’ve refined over multiple engagements.&amp;lt;/p&amp;gt;
&amp;lt;h2 id=&amp;quot;why-migrate%3F&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#why-migrate%3F&amp;quot;&amp;gt;Why Migrate?&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Before touching a single line of code, ask: &amp;lt;em&amp;gt;why are we doing this?&amp;lt;/em&amp;gt; The most common drivers I encounter are:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Deployment bottlenecks&amp;lt;/strong&amp;gt;: A single deployable artifact blocks independent team delivery.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Scalability constraints&amp;lt;/strong&amp;gt;: You need to scale a specific module, not the entire application.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Technology modernisation&amp;lt;/strong&amp;gt;: Teams want to adopt newer frameworks or languages for specific domains.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Organisational growth&amp;lt;/strong&amp;gt;: Conway’s Law — architecture tends to mirror team structure.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;“Never migrate for migration’s sake. Identify the concrete pain point and validate that microservices solve it.”&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
&amp;lt;h2 id=&amp;quot;the-strangler-fig-pattern&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#the-strangler-fig-pattern&amp;quot;&amp;gt;The Strangler Fig Pattern&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;My go-to approach is the &amp;lt;strong&amp;gt;Strangler Fig Pattern&amp;lt;/strong&amp;gt;: incrementally replace monolith functionality behind a facade, leaving the monolith running until it’s fully strangled.&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;graph LR
    Client --&amp;gt;|All traffic| Facade[API Gateway / Facade]
    Facade --&amp;gt;|Legacy routes| Monolith[(Monolith)]
    Facade --&amp;gt;|New routes| SvcA[User Service]
    Facade --&amp;gt;|New routes| SvcB[Order Service]
    Monolith -.-&amp;gt;|Shared DB - phase 1| DB[(Database)]
    SvcA --&amp;gt;|Own DB - phase 2| DBA[(Users DB)]
    SvcB --&amp;gt;|Own DB - phase 2| DBB[(Orders DB)]
&amp;lt;/div&amp;gt;&amp;lt;p&amp;gt;This lets you:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;Ship value incrementally&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Reduce risk by keeping the fallback running&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Validate each new service before extracting the next&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;h2 id=&amp;quot;identifying-service-boundaries&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#identifying-service-boundaries&amp;quot;&amp;gt;Identifying Service Boundaries&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;p&amp;gt;Domain-Driven Design (DDD) gives us the best tools for finding service boundaries. I use &amp;lt;strong&amp;gt;Event Storming&amp;lt;/strong&amp;gt; workshops to:&amp;lt;/p&amp;gt;
&amp;lt;ol&amp;gt;
&amp;lt;li&amp;gt;Map all domain events with the business team&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Identify &amp;lt;strong&amp;gt;bounded contexts&amp;lt;/strong&amp;gt; — areas with consistent language and ownership&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Use bounded contexts as candidate service boundaries&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;graph TD
    subgraph &amp;quot;Order Context&amp;quot;
        OE1[OrderPlaced]
        OE2[OrderConfirmed]
        OE3[OrderShipped]
    end
    subgraph &amp;quot;Inventory Context&amp;quot;
        IE1[StockReserved]
        IE2[StockReleased]
    end
    subgraph &amp;quot;Notification Context&amp;quot;
        NE1[EmailSent]
        NE2[SMSSent]
    end
    OE2 --&amp;gt; IE1
    OE3 --&amp;gt; NE1
    IE2 --&amp;gt; NE2
&amp;lt;/div&amp;gt;&amp;lt;h2 id=&amp;quot;practical-steps&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#practical-steps&amp;quot;&amp;gt;Practical Steps&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;h3 id=&amp;quot;1.-start-with-the-api-layer&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#1.-start-with-the-api-layer&amp;quot;&amp;gt;1. Start with the API Layer&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Deploy an &amp;lt;strong&amp;gt;API Gateway&amp;lt;/strong&amp;gt; (AWS API Gateway, Kong, or a simple Spring Cloud Gateway) in front of the monolith. This gives you:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;A single entry point for traffic&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;The ability to route selectively to new services&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;A foundation for cross-cutting concerns (auth, rate limiting, logging)&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h3 id=&amp;quot;2.-extract-stateless-services-first&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#2.-extract-stateless-services-first&amp;quot;&amp;gt;2. Extract Stateless Services First&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;Pick a bounded context that:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Has clear, stable APIs&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Is relatively self-contained&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Has low coupling to the rest of the monolith&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;Notification services, reporting modules, and authentication are often good first targets.&amp;lt;/p&amp;gt;
&amp;lt;h3 id=&amp;quot;3.-database-decomposition&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#3.-database-decomposition&amp;quot;&amp;gt;3. Database Decomposition&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;The hardest part. Never share a database between the monolith and a new service in the long run. The interim approach:&amp;lt;/p&amp;gt;
&amp;lt;div class=&amp;quot;mermaid&amp;quot;&amp;gt;sequenceDiagram
    participant New Service
    participant Monolith
    participant Shared DB
    participant New DB

    Note over New Service, Shared DB: Phase 1 – Dual Write
    New Service-&amp;gt;&amp;gt;Shared DB: Write (compatibility)
    New Service-&amp;gt;&amp;gt;New DB: Write (new schema)
    Monolith-&amp;gt;&amp;gt;Shared DB: Read/Write

    Note over New Service, New DB: Phase 2 – Cutover
    New Service-&amp;gt;&amp;gt;New DB: Write only
    Monolith-&amp;gt;&amp;gt;Shared DB: Read/Write (deprecated path)
&amp;lt;/div&amp;gt;&amp;lt;h3 id=&amp;quot;4.-embrace-eventual-consistency&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#4.-embrace-eventual-consistency&amp;quot;&amp;gt;4. Embrace Eventual Consistency&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&amp;lt;p&amp;gt;With separate services comes eventual consistency. Use &amp;lt;strong&amp;gt;domain events&amp;lt;/strong&amp;gt; over synchronous REST calls wherever possible:&amp;lt;/p&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;Publish events to a message broker (Kafka, RabbitMQ)&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Services subscribe to relevant events&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Saga pattern for distributed transactions&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;h2 id=&amp;quot;key-takeaways&amp;quot; tabindex=&amp;quot;-1&amp;quot;&amp;gt;&amp;lt;a class=&amp;quot;header-anchor&amp;quot; href=&amp;quot;#key-takeaways&amp;quot;&amp;gt;Key Takeaways&amp;lt;/a&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Migrate iteratively&amp;lt;/strong&amp;gt; — the Strangler Fig pattern is your friend.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Define clear boundaries&amp;lt;/strong&amp;gt; using DDD bounded contexts.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Decouple the database&amp;lt;/strong&amp;gt; as a separate, explicit step.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Invest in observability&amp;lt;/strong&amp;gt; early — distributed tracing (Jaeger, Zipkin) and centralised logging (ELK stack) become essential.&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;strong&amp;gt;Automate everything&amp;lt;/strong&amp;gt; — CI/CD per service, infrastructure as code, automated testing.&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;p&amp;gt;The migration journey is long, but each extracted service pays dividends in team autonomy and deployment velocity. Start small, validate, and build momentum.&amp;lt;/p&amp;gt;
</content>
    <author>
      <name>Tom Seidel</name>
    </author>
    <category term="Java"/>
    <category term="Microservices"/>
    <category term="Cloud"/>
    <category term="Architecture"/>
  </entry>
</feed>

