<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://edihasaj.com/atom.xml" rel="self" type="application/atom+xml" /><link href="https://edihasaj.com/" rel="alternate" type="text/html" /><updated>2026-06-07T09:32:47+00:00</updated><id>https://edihasaj.com/atom.xml</id><title type="html">Edi Hasaj</title><subtitle>Software, AI and Thinking.</subtitle><author><name>Edi Hasaj</name></author><entry><title type="html">Universal Memory Protocol: Simple Agent Memory That Moves</title><link href="https://edihasaj.com/posts/universal-memory-protocol-simple-agent-memory" rel="alternate" type="text/html" title="Universal Memory Protocol: Simple Agent Memory That Moves" /><published>2026-06-07T00:00:00+00:00</published><updated>2026-06-07T00:00:00+00:00</updated><id>https://edihasaj.com/posts/universal-memory-protocol-simple-agent-memory</id><content type="html" xml:base="https://edihasaj.com/posts/universal-memory-protocol-simple-agent-memory"><![CDATA[<p>I built <a href="https://github.com/edihasaj/recall">Recall</a> because I was tired of correcting coding agents about the same repo rules every week. That solved a local problem. The agent would remember “use pnpm here”, “run this gate before handoff”, “do not touch this folder”, and the next session would start with that context already loaded.</p>

<p>Then I hit the next problem.</p>

<p>The memory was useful, but it was trapped in one shape. Claude Code had one way to hold project rules. Codex had another. ChatGPT had another. Local agents had files. Some frameworks had vector stores. Some had graph stores. Some had nothing. Every tool was learning the same things about me and my projects, but none of them could share those memories in a clean way.</p>

<p>That is why I started <a href="https://github.com/edihasaj/universal-memory-protocol">Universal Memory Protocol</a>, or UMP.</p>

<p>The simple version:</p>

<blockquote>
  <p>MCP lets agents use tools. A2A lets agents talk to agents. UMP lets agents carry memory.</p>
</blockquote>

<p>It is not meant to be a new database. It is not meant to replace Recall, Mem0, Letta, Zep, Obsidian, Postgres, SQLite, Redis, or a vector index. It is the small contract between them.</p>

<figure class="post-figure">
<svg viewBox="0 0 760 350" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Universal Memory Protocol structure" style="width:100%;height:auto;color:currentColor">
  <style>
    .ttl { font: 700 16px ui-sans-serif, system-ui, sans-serif; fill: currentColor }
    .sub { font: 12px ui-sans-serif, system-ui, sans-serif; fill: currentColor; opacity: 0.74 }
    .lbl { font: 700 13px ui-sans-serif, system-ui, sans-serif; fill: currentColor }
    .txt { font: 12px ui-sans-serif, system-ui, sans-serif; fill: currentColor; opacity: 0.78 }
    .box { fill: none; stroke: currentColor; stroke-width: 1.2; opacity: 0.58 }
    .a { fill: #14b8a6; opacity: 0.16 }
    .b { fill: #f59e0b; opacity: 0.18 }
    .c { fill: #6366f1; opacity: 0.15 }
    .d { fill: #ef4444; opacity: 0.12 }
    .line { stroke: currentColor; stroke-width: 1.4; opacity: 0.42; fill: none }
  </style>

  <text x="28" y="34" class="ttl">UMP is three boring pieces</text>
  <text x="28" y="54" class="sub">one record shape, one tiny operation set, three ways to connect</text>

  <rect x="40" y="82" width="200" height="210" rx="8" class="a" />
  <rect x="40" y="82" width="200" height="210" rx="8" class="box" />
  <text x="66" y="116" class="lbl">1. Portable record</text>
  <text x="66" y="146" class="txt">kind</text>
  <text x="66" y="166" class="txt">body</text>
  <text x="66" y="186" class="txt">scope</text>
  <text x="66" y="206" class="txt">time</text>
  <text x="66" y="226" class="txt">provenance</text>
  <text x="66" y="246" class="txt">consent</text>
  <text x="66" y="266" class="txt">integrity</text>

  <rect x="280" y="82" width="200" height="210" rx="8" class="b" />
  <rect x="280" y="82" width="200" height="210" rx="8" class="box" />
  <text x="306" y="116" class="lbl">2. Six operations</text>
  <text x="306" y="146" class="txt">capabilities</text>
  <text x="306" y="166" class="txt">recall</text>
  <text x="306" y="186" class="txt">remember</text>
  <text x="306" y="206" class="txt">get</text>
  <text x="306" y="226" class="txt">revise</text>
  <text x="306" y="246" class="txt">forget</text>
  <text x="306" y="266" class="txt">feedback at L3</text>

  <rect x="520" y="82" width="200" height="210" rx="8" class="c" />
  <rect x="520" y="82" width="200" height="210" rx="8" class="box" />
  <text x="546" y="116" class="lbl">3. Bindings</text>
  <text x="546" y="146" class="txt">MCP tools</text>
  <text x="546" y="166" class="txt">HTTP endpoints</text>
  <text x="546" y="186" class="txt">JSON files</text>
  <text x="546" y="206" class="txt">Markdown files</text>
  <text x="546" y="226" class="txt">.well-known discovery</text>

  <path d="M240 187 H280" class="line" />
  <path d="M480 187 H520" class="line" />

  <rect x="120" y="312" width="520" height="30" rx="6" class="d" />
  <text x="380" y="332" text-anchor="middle" class="txt">The storage engine still competes on retrieval, ranking, decay, and compression.</text>
</svg>
</figure>

<h2 id="the-shape">The Shape</h2>

<p>UMP is deliberately small:</p>

<ol>
  <li>A portable memory record.</li>
  <li>Six core operations.</li>
  <li>MCP, HTTP, and file bindings.</li>
  <li>Conformance levels from simple export to full signed runtime.</li>
</ol>

<p>The record is the main thing. A memory needs a type, a body, a scope, time fields, lifecycle hints, relations, provenance, consent, and integrity.</p>

<p>In normal words:</p>

<blockquote>
  <p>what is this memory, who owns it, where is it valid, when was it true, where did it come from, who may see it, and can I verify it?</p>
</blockquote>

<p>That sounds obvious, but most agent memory today does not carry all of that. It might have text and an embedding. It might have a user id. It might have a timestamp. But the moment you try to move it from one agent or store to another, the missing fields matter.</p>

<p>If a memory says “use pnpm”, is that global, repo-specific, team-specific, or only true for one branch? If it says “the deployment target is staging”, was that true yesterday or is it true now? If an agent wrote it, did the user approve it? If it contains a secret or personal detail, should it be exported at all?</p>

<p>UMP puts those questions into the record instead of leaving them as product-specific behavior.</p>

<h2 id="how-i-came-to-it">How I Came To It</h2>

<p>The first version of the idea came from pain, not architecture.</p>

<p>I use multiple agents. I switch between tools. I also work across enough repos that the same mistake keeps coming back in different clothes. One agent learns a rule, another agent does not know it, and I become the bridge.</p>

<p>Recall proved the useful part: corrections can become repo memory, memory can be ranked, stale memories can be retired, and the agent gets better without a giant instruction file.</p>

<p>But Recall also made the protocol gap obvious. The useful abstraction was not “Recall”. It was:</p>

<blockquote>
  <p>an agent needs to ask for relevant memory, write new memory, revise old memory, forget unsafe memory, and prove where memory came from.</p>
</blockquote>

<p>That maps cleanly to operations:</p>

<table>
  <thead>
    <tr>
      <th>Operation</th>
      <th>Meaning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">capabilities</code></td>
      <td>What does this memory server support?</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">recall</code></td>
      <td>Find relevant memories for this scope and query.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">remember</code></td>
      <td>Store a new memory.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">get</code></td>
      <td>Fetch a memory by id.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">revise</code></td>
      <td>Replace a memory without destroying its history.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">forget</code></td>
      <td>Tombstone or remove a memory with a reason.</td>
    </tr>
  </tbody>
</table>

<p>That is enough for a small client. It is also enough for adapters. A fancy graph memory system and a simple JSON file can speak the same verbs, while still behaving differently underneath.</p>

<h2 id="what-it-fixes">What It Fixes</h2>

<p>UMP fixes four practical problems.</p>

<p>First, memory lock-in. If Claude learns a project rule and Codex cannot read it, that is not memory. That is product state. UMP makes the memory record portable so it can move between hosts and stores.</p>

<p>Second, stale memory. A lot of memory systems overwrite facts. That is dangerous because old context disappears and the agent cannot reason about when something was true. UMP uses bi-temporal records and supersession. A memory can be replaced without pretending the old one never existed.</p>

<p>Third, trust. Recalled memory is not sacred. It can be wrong, stale, malicious, or out of scope. UMP treats memory as untrusted input and requires safe rehydration before it enters the model context. The context block says, in effect, “these are references, not instructions.”</p>

<p>Fourth, ownership. A useful memory record should carry provenance, consent, and eventually signatures. UMP uses existing standards where they fit, like <a href="https://www.w3.org/TR/prov-o/">W3C PROV</a> for provenance, <a href="https://www.w3.org/TR/did-core/">DID</a> for owner identity, and <a href="https://www.rfc-editor.org/rfc/rfc8785.html">RFC 8785 JSON canonicalization</a> for deterministic signing.</p>

<p>The important point is what UMP does not try to fix. It does not decide which embedding model is best. It does not mandate graph search. It does not freeze a decay curve. It does not tell Recall, Zep, Letta, Mem0, SQLite, or Postgres how to rank memory.</p>

<p>It standardizes the parts that must match for memory to travel. The intelligence stays inside the engine.</p>

<h2 id="how-it-runs">How It Runs</h2>

<p>The fastest way to use it is through <a href="https://modelcontextprotocol.io/docs/getting-started/intro">MCP</a>, because most agent hosts already know how to talk to MCP servers.</p>

<div class="language-jsonc highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"mcpServers"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"ump"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">"npx"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"args"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"-y"</span><span class="p">,</span><span class="w"> </span><span class="s2">"@universalmemoryprotocol/core"</span><span class="p">,</span><span class="w"> </span><span class="s2">"ump-memory"</span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>That gives the host <code class="language-plaintext highlighter-rouge">ump.recall</code>, <code class="language-plaintext highlighter-rouge">ump.remember</code>, <code class="language-plaintext highlighter-rouge">ump.get</code>, <code class="language-plaintext highlighter-rouge">ump.revise</code>, <code class="language-plaintext highlighter-rouge">ump.forget</code>, and <code class="language-plaintext highlighter-rouge">ump.capabilities</code>.</p>

<p>By default, the reference server writes a portable file at:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~/.ump/memory.ump.json
</code></pre></div></div>

<p>Point another MCP host at the same store and it can use the same memories. That is the money shot: write in one agent, recall in another.</p>

<p>There is also an HTTP binding for apps that do not speak MCP, and a file binding for plain exports. That matters because adoption should not require the full runtime. A project can start at L0 with a <code class="language-plaintext highlighter-rouge">*.ump.json</code> or <code class="language-plaintext highlighter-rouge">*.ump.md</code> file, then move to an L1 or L2 server later.</p>

<figure class="post-figure">
<svg viewBox="0 0 760 300" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="UMP memory flow from one agent to another" style="width:100%;height:auto;color:currentColor">
  <style>
    .ttl { font: 700 15px ui-sans-serif, system-ui, sans-serif; fill: currentColor }
    .lbl { font: 700 13px ui-sans-serif, system-ui, sans-serif; fill: currentColor }
    .txt { font: 12px ui-sans-serif, system-ui, sans-serif; fill: currentColor; opacity: 0.78 }
    .box { fill: none; stroke: currentColor; stroke-width: 1.2; opacity: 0.58 }
    .agent { fill: #14b8a6; opacity: 0.16 }
    .store { fill: #f59e0b; opacity: 0.18 }
    .agent2 { fill: #6366f1; opacity: 0.15 }
    .line { stroke: currentColor; stroke-width: 1.5; opacity: 0.44; fill: none; marker-end: url(#arrow) }
  </style>
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
      <path d="M0,0 L10,5 L0,10 z" fill="currentColor" opacity="0.55" />
    </marker>
  </defs>

  <text x="28" y="34" class="ttl">One memory, two agents</text>

  <rect x="46" y="86" width="180" height="92" rx="8" class="agent" />
  <rect x="46" y="86" width="180" height="92" rx="8" class="box" />
  <text x="76" y="122" class="lbl">Agent A</text>
  <text x="76" y="146" class="txt">calls ump.remember</text>

  <rect x="300" y="66" width="160" height="132" rx="8" class="store" />
  <rect x="300" y="66" width="160" height="132" rx="8" class="box" />
  <text x="338" y="112" class="lbl">UMP store</text>
  <text x="330" y="138" class="txt">signed record</text>
  <text x="330" y="158" class="txt">scope checked</text>
  <text x="330" y="178" class="txt">history kept</text>

  <rect x="534" y="86" width="180" height="92" rx="8" class="agent2" />
  <rect x="534" y="86" width="180" height="92" rx="8" class="box" />
  <text x="564" y="122" class="lbl">Agent B</text>
  <text x="564" y="146" class="txt">calls ump.recall</text>

  <path d="M226 132 H300" class="line" />
  <path d="M460 132 H534" class="line" />

  <text x="380" y="242" text-anchor="middle" class="txt">The host changes. The memory record does not.</text>
</svg>
</figure>

<h2 id="how-fast-it-is">How Fast It Is</h2>

<p>For the default local file store, the answer is boring and good: around 5ms recall at 3,000 records in the current benchmarks. That is the right default for most project memory. It is local, portable, and fast enough that the agent does not feel like it is waiting on a separate brain.</p>

<p>The richer Recall-backed path costs more because it does actual semantic retrieval. In my current benchmark notes, it is roughly 100ms to write and roughly 200ms to recall after warming the local embedding model. The tradeoff is quality: paraphrase top-1 recall improves from 1 out of 8 with lexical matching to 5 out of 8 with Recall’s vector plus BM25 search.</p>

<p>So the rule is simple:</p>

<table>
  <thead>
    <tr>
      <th>Store</th>
      <th>Best for</th>
      <th>Rough speed</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>JSON file</td>
      <td>portable local memory, repo rules, simple preferences</td>
      <td>about 5ms recall at 3k records</td>
    </tr>
    <tr>
      <td>Markdown files</td>
      <td>human-editable memory</td>
      <td>depends on folder size</td>
    </tr>
    <tr>
      <td>Recall adapter</td>
      <td>semantic search and stronger retrieval quality</td>
      <td>about 100ms write, about 200ms recall</td>
    </tr>
    <tr>
      <td>Vector store</td>
      <td>scale and semantic recall</td>
      <td>depends on embeddings and backend</td>
    </tr>
  </tbody>
</table>

<p>Fast memory is useful because agents ask for memory often. But retrieval quality matters too. UMP keeps that choice open. You can start with the file store, then swap the backend when the shape of the work needs it.</p>

<h2 id="the-part-i-care-about">The Part I Care About</h2>

<p>The part I care about is not that UMP exists as a spec. Specs are cheap.</p>

<p>The part I care about is the round trip:</p>

<blockquote>
  <p>an agent writes a memory, another agent recalls it, the record is still owned by the user, and the receiving agent treats it as scoped reference data instead of hidden instruction text.</p>
</blockquote>

<p>That is the missing piece for serious agent work. Agents are getting better at tools through MCP. They are getting better at coordination through <a href="https://a2a-protocol.org/latest/specification/">A2A</a>. But they still forget, fork, and trap memory inside product-specific state.</p>

<p>I do not want every new agent to start from zero. I also do not want all my working memory locked inside one vendor.</p>

<p>UMP is my attempt to make the layer small enough to adopt and structured enough to trust.</p>

<p>One record. Six operations. Memory that moves.</p>]]></content><author><name>Edi Hasaj</name></author><category term="obsidian" /><category term="posts" /><category term="AI" /><category term="agents" /><category term="memory" /><category term="protocols" /><category term="MCP" /><category term="open-source" /><summary type="html"><![CDATA[Agent memory is fragmented across tools, files, and products. Universal Memory Protocol is my attempt to make memory portable: one record, six operations, and bindings that work with MCP, HTTP, and files.]]></summary></entry><entry><title type="html">Everything Will Be Agent Use</title><link href="https://edihasaj.com/posts/everything-will-be-agent-use" rel="alternate" type="text/html" title="Everything Will Be Agent Use" /><published>2026-05-14T00:00:00+00:00</published><updated>2026-05-14T00:00:00+00:00</updated><id>https://edihasaj.com/posts/everything-will-be-agent-use</id><content type="html" xml:base="https://edihasaj.com/posts/everything-will-be-agent-use"><![CDATA[<p>I have been watching the new wave of computer-use agents and honestly, most of what they do is incredible. The mouse moves like a real user. Forms get filled. Sites get navigated. Tabs get juggled. Tasks that needed a person sitting in front of a screen are now running on their own, end to end, with surprisingly little hand-holding. Five years ago this was science fiction. Today it is a demo you can run on your own laptop, and the future of how we use software clearly runs through here.</p>

<p>Then you watch long enough and the same shape keeps showing up. The agent looks great for the first half of the task, sometimes the first ninety percent, and then the workflow hits a login wall, a captcha, a 3-D Secure popup, a card form that wants a human phone tap, and everything stalls. The agent was good enough to make the stall feel absurd. That is new. A year or two ago the impressive part was that it could move through the task at all.</p>

<p>That is the actual story of computer use right now. Agents got useful faster than the systems around them got ready to trust, verify, and bill them.</p>

<h2 id="the-cursor-is-the-new-interface">the cursor is the new interface</h2>

<p>For a few years we assumed AI would live in a chat box. Type, answer, repeat. The interface was words.</p>

<p>That assumption is dead. The interface is your operating system now. The agent gets the mouse, the keyboard, the browser tabs, sometimes your card. It does not describe how to do the thing. It does the thing.</p>

<p>Every serious lab is on this. Anthropic shipped computer use in late 2024, and Claude has been driving desktops since. OpenAI followed with Operator. Google has Gemini computer use plus Project Mariner inside Chrome. Perplexity built Comet, an entire browser shaped around an agent. Manus came out of China with a general-purpose agent that books trips, files paperwork, and scrapes whatever you point it at. Microsoft and Apple are wiring this into the OS itself, quieter, slower, much closer to where the real leverage is.</p>

<p>Pick any of them. The message is the same. The agent is the new user.</p>

<h2 id="how-far-we-actually-got">how far we actually got</h2>

<p>Two years ago, asking a model to complete a real multi-step task on a real desktop was a coin flip at best. OSWorld, the most honest early benchmark for desktop agents, was sitting at around 14% when it launched in 2024. The progress since has been fast enough that any single number starts going stale almost immediately.</p>

<figure class="post-figure">
<svg viewBox="0 0 720 320" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="OSWorld benchmark progress chart" style="width:100%;height:auto;color:currentColor">
  <style>
    .axis { stroke: currentColor; stroke-width: 1; opacity: 0.4 }
    .grid { stroke: currentColor; stroke-width: 1; opacity: 0.1; stroke-dasharray: 2 4 }
    .lbl  { font: 12px ui-sans-serif, system-ui, -apple-system, sans-serif; fill: currentColor; opacity: 0.7 }
    .ttl  { font: 600 13px ui-sans-serif, system-ui, sans-serif; fill: currentColor }
    .val  { font: 600 12px ui-sans-serif, system-ui, sans-serif; fill: currentColor }
    .ln   { fill: none; stroke: #14b8a6; stroke-width: 2.5; stroke-linejoin: round }
    .hum  { stroke: #f59e0b; stroke-width: 1.5; stroke-dasharray: 4 4; fill: none }
    .pt   { fill: #14b8a6 }
  </style>
  <text x="20" y="22" class="ttl">OSWorld benchmark progress, desktop agents</text>
  <text x="20" y="40" class="lbl">success rate on desktop-task benchmark suites</text>

  <!-- grid + y axis -->
  <line class="axis" x1="60" y1="60" x2="60" y2="270" />
  <line class="axis" x1="60" y1="270" x2="700" y2="270" />
  <g>
    <line class="grid" x1="60" y1="90" x2="700" y2="90" /><text x="50" y="94" text-anchor="end" class="lbl">80%</text>
    <line class="grid" x1="60" y1="135" x2="700" y2="135" /><text x="50" y="139" text-anchor="end" class="lbl">60%</text>
    <line class="grid" x1="60" y1="180" x2="700" y2="180" /><text x="50" y="184" text-anchor="end" class="lbl">40%</text>
    <line class="grid" x1="60" y1="225" x2="700" y2="225" /><text x="50" y="229" text-anchor="end" class="lbl">20%</text>
    <text x="50" y="274" text-anchor="end" class="lbl">0%</text>
  </g>

  <!-- human baseline ~72% => y = 270 - (72/100)*180 = 140.4 -->
  <line class="hum" x1="60" y1="140" x2="700" y2="140" />
  <text x="694" y="134" text-anchor="end" class="lbl">human baseline ~72%</text>

  <!-- points: x positions for Apr24, Oct24, Apr25, Oct25, Apr26
       values: 14, 22, 38, 53, 75 -->
  <polyline class="ln" points="
    100,238
    250,220
    400,184
    550,151
    680,135
  " />
  <g>
    <circle class="pt" cx="100" cy="238" r="5" /><text x="100" y="258" text-anchor="middle" class="val">14%</text>
    <circle class="pt" cx="250" cy="220" r="5" /><text x="250" y="240" text-anchor="middle" class="val">22%</text>
    <circle class="pt" cx="400" cy="184" r="5" /><text x="400" y="204" text-anchor="middle" class="val">38%</text>
    <circle class="pt" cx="550" cy="151" r="5" /><text x="550" y="171" text-anchor="middle" class="val">53%</text>
    <circle class="pt" cx="680" cy="135" r="5" /><text x="680" y="125" text-anchor="middle" class="val">75%</text>
  </g>

  <g class="lbl">
    <text x="100" y="290" text-anchor="middle">Apr 2024</text>
    <text x="250" y="290" text-anchor="middle">Oct 2024</text>
    <text x="400" y="290" text-anchor="middle">Apr 2025</text>
    <text x="550" y="290" text-anchor="middle">Oct 2025</text>
    <text x="680" y="290" text-anchor="middle">Apr 2026</text>
  </g>
  <text x="380" y="314" text-anchor="middle" class="lbl">Approximate top reported scores. Human baseline ~72% from Xie et al., OSWorld 2024.</text>
</svg>
</figure>

<p>On “can the agent see and click the right thing”, the field is getting very good. Pixel-grounding benchmarks like ScreenSpot are above 90% for the top models. The eyes work. The hands mostly work. The old joke that agents cannot use computers is aging badly.</p>

<p>On “can the agent finish a real workflow”, the answer is messier. The best systems are now brushing against human baselines on OSWorld-style tasks, which is a ridiculous improvement from 2024. But benchmark parity is not the same thing as operational trust. Newer cross-application evals still show ugly failure rates once the task requires several apps, conditional judgment, cleanup, and persistence. Even a 75% task success rate means one in four things still falls over somewhere.</p>

<p>And that “somewhere” is almost never “I could not read the button”. It is the page reloaded, the login screen came back, a captcha appeared, a popup blocked the click, the cart wants your card, or the agent did the right local action inside the wrong global state.</p>

<p>The cursor is not the bottleneck. The world around the cursor is.</p>

<h2 id="the-five-shapes-people-are-shipping">the five shapes people are shipping</h2>

<p>There is a tendency to talk about “computer use” like it is one product. It is at least five, and the shapes have very different trade-offs.</p>

<p>The desktop agent runs on your machine and can touch anything you can touch. Anthropic’s computer use sits here. Max power, max blast radius. If it gets confused, it can delete things you cared about. Trust bar is high. Best for power users and developers who can sandbox it.</p>

<p>The browser agent lives inside a tab. Perplexity Comet, Arc’s agent, the various Chromium-based experiments. Safer becuase the world it can break is small, but it hits a wall every time a workflow leaves the browser. Try to download a PDF, sign it, attach it back somewhere, and the seams show up immediately.</p>

<p>The managed agent runs in the cloud. OpenAI Operator is the cleanest example. Convenient because you do not give it your laptop, but the moment the site you care about decides cloud IP’s look like bots, you are stuck in captcha purgatory.</p>

<p>The OS-level agent is the long game. Apple, Microsoft, and Google’s mobile push are all trying to wire agents into the system itself: real accessibility trees, real intents, real content types. When the OS exposes structure, the agent stops guessing. This is also the lane with the most patience, which is why Apple in particular looks slow untill they suddenly are not.</p>

<p>The generalist agent is the demo champion. Manus, Mariner-style long-horizon planners. They will plan your trip, file your form, scrape your competitor. They will also confidently do the wrong thing for ninety steps before failing in a way you cannot easily unwind.</p>

<p>Each shape is real. None of them is finished.</p>

<h2 id="where-the-workflow-actually-breaks">where the workflow actually breaks</h2>

<p>If you sit and watch a serious computer-use demo end to end, the failure mode is almost always the same. The agent looks brilliant for the first half of the task and then loses to something boring.</p>

<figure class="post-figure">
<svg viewBox="0 0 720 340" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Agent workflow drop-off funnel" style="width:100%;height:auto;color:currentColor">
  <style>
    .lbl { font: 12px ui-sans-serif, system-ui, sans-serif; fill: currentColor; opacity: 0.8 }
    .ttl { font: 600 13px ui-sans-serif, system-ui, sans-serif; fill: currentColor }
    .val { font: 600 13px ui-sans-serif, system-ui, sans-serif; fill: currentColor }
    .step { fill: currentColor; opacity: 0.85 }
    .bar1 { fill: #14b8a6 }
    .bar2 { fill: #2dd4bf }
    .bar3 { fill: #fbbf24 }
    .bar4 { fill: #f97316 }
    .bar5 { fill: #ef4444 }
  </style>
  <text x="20" y="22" class="ttl">Where a real agent workflow drops off</text>
  <text x="20" y="40" class="lbl">success rate at each stage of "buy a thing online", roughly</text>

  <!-- bars: x=240 start, max width = 420 -->
  <g>
    <rect class="bar1" x="240" y="70" width="420" height="32" rx="4" />
    <text x="232" y="90" text-anchor="end" class="step">find the right product</text>
    <text x="668" y="90" class="val">95%</text>

    <rect class="bar2" x="240" y="116" width="394" height="32" rx="4" />
    <text x="232" y="136" text-anchor="end" class="step">add to cart</text>
    <text x="640" y="136" class="val">88%</text>

    <rect class="bar3" x="240" y="162" width="294" height="32" rx="4" />
    <text x="232" y="182" text-anchor="end" class="step">log in / handle popups</text>
    <text x="540" y="182" class="val">66%</text>

    <rect class="bar4" x="240" y="208" width="231" height="32" rx="4" />
    <text x="232" y="228" text-anchor="end" class="step">pass captcha / fraud check</text>
    <text x="477" y="228" class="val">52%</text>

    <rect class="bar5" x="240" y="254" width="180" height="32" rx="4" />
    <text x="232" y="274" text-anchor="end" class="step">complete payment + 3-D Secure</text>
    <text x="426" y="274" class="val">~40%</text>
  </g>

  <text x="380" y="320" text-anchor="middle" class="lbl">Illustrative composite from public agent evals, mid 2026. Indicative shape, not a measured benchmark.</text>
</svg>
</figure>

<p>It is not glamorous. Finding the product is easy. Adding it to the cart is easy. Then auth shows up, fraud signals trip, a phone-based 3-D Secure challenge fires off, and the agent stalls because the verification was designed to assume there is a human holding a phone, not a long-running automation.</p>

<p>Most demos quietly stop right before this cliff. The honest ones hand control back to the user at checkout, which kind of defeats the point of an agent.</p>

<p><img src="/images/posts/everything-will-be-agent-use-checkout-2026-05-14.jpg" alt="An AI cursor frozen in front of a glowing 3-D Secure popup over a blurred checkout form" /></p>

<h2 id="the-protocols-are-the-actual-story">the protocols are the actual story</h2>

<p>The reason buying is so painful is structural. The stack assumes a human in a browser, with cookies, a card on file, a phone for 3-D Secure, and a fraud score built from device fingerprints and behavior. An agent breaks almost every one of those assumptions, and the merchant has no clean way to know “this purchase was authorized by Edi, with these limits, through this agent, for this purpose”. So they either block it, or they let it through and hope.</p>

<p>This is what protocols are quietly fixing. Three layers matter.</p>

<figure class="post-figure">
<svg viewBox="0 0 720 290" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Agent protocol stack diagram" style="width:100%;height:auto;color:currentColor">
  <style>
    .ttl { font: 600 13px ui-sans-serif, system-ui, sans-serif; fill: currentColor }
    .lbl { font: 12px ui-sans-serif, system-ui, sans-serif; fill: currentColor; opacity: 0.8 }
    .tag { font: 600 12px ui-sans-serif, system-ui, sans-serif; fill: currentColor }
    .box { stroke: currentColor; stroke-width: 1.2; fill: none; rx: 6 }
    .l1  { fill: #14b8a6; opacity: 0.18 }
    .l2  { fill: #6366f1; opacity: 0.18 }
    .l3  { fill: #f59e0b; opacity: 0.20 }
    .arr { stroke: currentColor; stroke-width: 1.2; opacity: 0.5; fill: none; marker-end: url(#a) }
  </style>
  <defs>
    <marker id="a" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
      <path d="M0,0 L10,5 L0,10 z" fill="currentColor" opacity="0.5" />
    </marker>
  </defs>

  <text x="20" y="22" class="ttl">The agent protocol stack, mid 2026</text>
  <text x="20" y="40" class="lbl">three layers, three different adoption curves</text>

  <!-- AP2 -->
  <rect x="40" y="60" width="640" height="60" rx="8" class="l3" />
  <rect x="40" y="60" width="640" height="60" rx="8" class="box" />
  <text x="60" y="86" class="tag">AP2 — Agent Payments Protocol</text>
  <text x="60" y="106" class="lbl">verifiable mandates: this agent may spend up to X, for user Y, at merchant Z, signed and revocable</text>
  <text x="660" y="86" text-anchor="end" class="lbl">agent ↔ merchant</text>

  <!-- A2A -->
  <rect x="40" y="135" width="640" height="60" rx="8" class="l2" />
  <rect x="40" y="135" width="640" height="60" rx="8" class="box" />
  <text x="60" y="161" class="tag">A2A — Agent2Agent</text>
  <text x="60" y="181" class="lbl">two agents from different vendors negotiate, hand off tasks, exchange context, return results</text>
  <text x="660" y="161" text-anchor="end" class="lbl">agent ↔ agent</text>

  <!-- MCP -->
  <rect x="40" y="210" width="640" height="60" rx="8" class="l1" />
  <rect x="40" y="210" width="640" height="60" rx="8" class="box" />
  <text x="60" y="236" class="tag">MCP — Model Context Protocol</text>
  <text x="60" y="256" class="lbl">the plumbing: agent talks to tools, files, APIs, databases in a single standard way</text>
  <text x="660" y="236" text-anchor="end" class="lbl">agent ↔ tool</text>
</svg>
</figure>

<p>MCP is the floor. It is the boring, beautiful plumbing that lets an agent talk to a tool, a database, a file system, an API in one standard way. It is already widely adopted across Anthropic, OpenAI, Cursor, and most serious IDEs. If you are building anything agentic, MCP is the layer you build on, not around.</p>

<p>A2A is the layer above. Google and a coalition of partners pushed it out in 2025 as an open protocol for agents from different vendors to talk to each other. My booking agent calls your inventory agent, they negotiate, work happens, result comes back. If A2A actually catches on, the web stops being a pile of HTML that an agent has to squint at, and starts being a network of cooperating services with structured handshakes.</p>

<p>AP2 is the one I find most concrete. The Agent Payments Protocol, announced by Google with Mastercard, American Express, Coinbase, PayPal and a long list of issuers in late 2025, is an attempt to do for agent commerce what 3-D Secure tried to do for online cards. The agent presents a verifiable mandate signed by the human. The merchant verifies it. The issuer verifies it. Payment flows, with cryptographic evidence at every step. “This agent bought this item for this user with this budget” finally has a real signature behind it instead of a session cookie and a prayer.</p>

<p>If you are a merchant, this is the upgrade path you cannot ignore for long. Either you wire up agent-friendly payments with verifiable mandates, or you start quietly blocking a non-trivial slice of your traffic, because that slice is no longer human.</p>

<h2 id="the-messy-middle-is-where-we-live">the messy middle is where we live</h2>

<p>The bad news is that none of these protocols are live across the long tail. MCP is the closest. A2A is real but mostly between cooperative parties. AP2 has the right names on the press release and nowhere near enough merchant support to run your week through it yet.</p>

<p>So agents in 2026 are stuck doing two jobs at once.</p>

<p>That is why the moment feels so strange. Agents are good enough to use every day, and not good enough to forget about while they hold your card, your calendar, your inbox, or your company account.</p>

<p>One job is helping the long tail of legacy sites and apps that will never speak A2A by pretending to be a human. This is a perception and resilience problem. Better screen understanding, better recovery when a popup appears, better memory of “I already tried this and it failed”. The flashy demo lives here, and so does most of the brittleness. Every UI redesign quietly breaks an agent somewhere.</p>

<p>The other job is helping the leading edge of services adopt MCP, A2A, AP2, so the next round can be cleaner. This is mostly a trust and identity problem. Who signed the mandate, who is liable, how do we revoke, what audit trail do we keep. Less glamorous. Much more durable.</p>

<p>The winners over the next few years will not be the labs with the flashiest demo. They will be the ones that solve the perception problem and the trust problem at the same time, so the agent is both capable of running a task on a legacy site and credible enough to do it on a modern one. Most of the current crop is good at one and pretending about the other.</p>

<h2 id="what-to-actually-do">what to actually do</h2>

<p>If you build software, four things are worth doing this year. Design your UI so an agent can also use it: real accessibility trees, structured labels, predictable flows, not just whatever the design system spat out. Ship an MCP server if you have any tool surface worth exposing. Start thinking about an A2A integration even before you build it, because the shape of your data and identity model matters more than the wire format. If you take payments, track AP2 and similar mandates, even if you do not implement yet, because the fraud and identity decisions you make in 2026 will determine whether you can adopt cleanly in 2027.</p>

<p>If you do not build software, the practical version is shorter. The next few apps you install, ask whether an agent could run it for you. If yes, thats your new productivity ceiling. If no, that is a future migration waiting to happen.</p>

<p>The cursor is moving. Whether it is moved by you or by something acting on your behalf is going to become a question you ask several times a day. I am not nostalgic about clicking through forms. I am, however, nervous about the gap between what these agents can already do and what the systems on the other side are ready to verify, trust and bill. That gap is the work of the next two years, and the companies that close it first are going to look very different from the ones currently winning the demo cycle.</p>

<p>The demo is the easy part. The plumbing under it is the rest of the decade.</p>]]></content><author><name>Edi Hasaj</name></author><category term="obsidian" /><category term="posts" /><category term="AI" /><category term="agents" /><category term="computer-use" /><category term="protocols" /><category term="A2A" /><category term="MCP" /><summary type="html"><![CDATA[Computer use is the new browser war. Google, OpenAI, Anthropic, Perplexity and Manus are all racing to put an agent in front of the screen. The hard part is not moving the mouse, it is that the world on the other side of the cursor was never built for them.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://edihasaj.com/images/posts/everything-will-be-agent-use-2026-05-14.jpg" /><media:content medium="image" url="https://edihasaj.com/images/posts/everything-will-be-agent-use-2026-05-14.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Security in the age of AI coders</title><link href="https://edihasaj.com/posts/security-in-the-age-of-ai-coders" rel="alternate" type="text/html" title="Security in the age of AI coders" /><published>2026-05-11T00:00:00+00:00</published><updated>2026-05-11T00:00:00+00:00</updated><id>https://edihasaj.com/posts/security-in-the-age-of-ai-coders</id><content type="html" xml:base="https://edihasaj.com/posts/security-in-the-age-of-ai-coders"><![CDATA[<p>Security has been quietly broken for a long time. AI did not break it. AI just made the gap between attackers and defenders much more visible, and much more uncomfortable.</p>

<p>This is not only a developer problem. If you run a business, use a banking app, store files in the cloud, or rely on any software at all, you are sitting on top of the same gap. The people building the software are stretched, the libraries underneath are someone else’s code, and the patches arrive long after the holes are already known.</p>

<p>I keep coming back to this when I look at the code I ship, the dependencies I pull in, the agents I let touch my repos, and the speed at which everything around it is moving. The honest answer is that most of us, including me, are not running the kind of security checks the current world deserves. We rely on a mix of “it builds”, “the tests pass”, “I trust the maintainer”, and “I will patch later”. Later usually never comes.</p>

<p>That worked, sort of, when finding a real exploit took a person, weeks of reading, and some luck. It does not work the same way now.</p>

<h2 id="exploits-got-cheaper">Exploits got cheaper</h2>

<p>A few years ago, finding a vulnerability in a serious codebase was a craft. You needed to read the source, understand the protocol, build a mental model of memory, sessions, parsing, auth, and then find the one place nobody thought about. That is still real work, but the floor has dropped.</p>

<p>Now you can hand a model a codebase, a compiled app, a recent change, or a CVE description (CVE is just the public ID for a known security bug), and it will tell you where similar patterns probably live. It will write the proof of concept. It will write the fuzzing harness, the small program that throws garbage at the code until something breaks. It will draft the exploit. Not always correct, not always exploitable, but very often “good enough to be dangerous”.</p>

<blockquote>
  <p>read code at scale<br />
spot pattern reuse across files<br />
generate payloads in seconds<br />
rewrite the same exploit for ten variants<br />
turn a CVE description into working code<br />
chain small bugs into something useful</p>
</blockquote>

<p>That is the new baseline. It does not require a nation state, it requires an API key and patience.</p>

<p>The defensive side has the same tools, in theory. In practice, defenders are slower, because defense is harder than offense. You only need to find one bug. The defender needs to find all of them, in code they often did not write, against an attacker who keeps trying.</p>

<h2 id="developers-dont-get-told-or-dont-patch">Developers don’t get told, or don’t patch</h2>

<p>The other half of the problem is human.</p>

<p>Most developers do not read security advisories. They do not run advisory dashboards. They do not check release notes of the libraries they use. They do not know which third-party piece of code, three layers down inside another library, just shipped a critical fix. They install it once, lock it, and move on.</p>

<p>If you are not a developer, picture it like this. Every app you use is a stack of building blocks. Most of those blocks were made by other people. When one block at the bottom turns out to have a hole, every app on top of it has the same hole until someone swaps the block. Most teams are slow to swap.</p>

<p>Some teams have automation. Dependabot opens PRs. GitHub flags advisories. Sometimes a security team forwards a Slack message. But the merge rate is the real metric. Lots of those PRs sit there. Lots of those advisories get closed without action. Lots of teams wait until something breaks in prod to look at it.</p>

<p>Then add the new layer:</p>

<blockquote>
  <p>agents committing code<br />
agents running migrations<br />
agents installing packages<br />
agents reading secrets<br />
agents calling external APIs<br />
agents creating PRs nobody fully reviews</p>
</blockquote>

<p>Every one of those is also a security surface. Not because the agent is malicious, but because the agent moves faster than your review pipeline. If a new dependency is added by an agent and merged by another agent, no human ever looked at the transitive tree. That is fine until it isn’t.</p>

<h2 id="the-patch-gap-is-the-real-bug">The patch gap is the real bug</h2>

<p>When a serious bug becomes public, the world splits in two. There is a small group that patches within hours, and there is everyone else.</p>

<p>The everyone else group is not lazy. They are busy. They have a roadmap. They have features. They have customers. They have an oncall. They have a backlog. The patch goes on the list and the list is long.</p>

<p>Attackers do not wait for the list. The window between disclosure and exploitation in the wild keeps getting shorter. Public PoCs land within days. Mass scanners pick up signatures within hours. By the time a team gets around to upgrading, the weaponized version is already touring the internet.</p>

<p>This is the gap that AI on the defensive side has to close, because no human team is going to close it manually for every project they own.</p>

<h2 id="what-better-ai-security-should-actually-do">What better AI security should actually do</h2>

<p>Most current “AI security” pitches are still the old security stack with a chatbot bolted on. That is not enough. The shape of the work is different now.</p>

<p>What I actually want is something like a constant listener.</p>

<blockquote>
  <p>watch the repos I own<br />
watch the dependencies they pull in<br />
watch the advisories the moment they publish<br />
watch the agents that touch the code<br />
watch the diffs the agents create<br />
watch the secrets that show up in logs<br />
watch the configs that change in production</p>
</blockquote>

<p>And when something changes, tell me. Not “here is a 40 page report once a month”. More like “this thing you shipped last Tuesday now has a public exploit, here is the upgrade, here is the diff, and by the way the staging deploy already pulls the patched version, want me to roll prod”.</p>

<p>That is a very different product than a scanner that runs once a day and emails a PDF.</p>

<p>It needs to know:</p>

<blockquote>
  <p>which projects are mine<br />
which versions are deployed where<br />
what the blast radius of an issue is<br />
what fix is safe and what fix needs review<br />
which agent or human last touched the affected code<br />
what to do when nobody answers</p>
</blockquote>

<p>That last one matters. A lot of the security gap right now is that there is nobody on the other side of the alert. The alert fires into an empty channel. A good agent should keep nudging until something happens, and escalate if it doesn’t.</p>

<h2 id="the-new-threat-model-is-also-ai">The new threat model is also AI</h2>

<p>There is another piece nobody likes to talk about much. The threat model now includes other agents.</p>

<p>If your code or product is being read, indexed, or poked at by automated systems, those systems are going to find weak spots a human attacker would not have bothered with. Prompt injection is one of the new ones: hiding instructions inside a piece of text (a customer message, a webpage, a document) that the AI then reads and obeys, as if you typed them yourself. Other categories include leaking secrets through chat, tricking an agent into calling internal services, or abusing the tools you gave it. The reward is too good. Many systems are now reachable through “make the agent do X” instead of “make the server do X”. That is a different category and most of our existing defenses do not cover it.</p>

<p>So security AI also has to think about agentic abuse:</p>

<blockquote>
  <p>what tools is my agent exposing<br />
what data goes into prompts<br />
what comes back from the model<br />
what side effects can a tool call cause<br />
what would a hostile prompt try to do with my agent<br />
what is the smallest reasonable scope for each tool</p>
</blockquote>

<p>If you do not design for this, the first time your agent gets prompt injected through a customer ticket or a scraped page, you will learn quickly.</p>

<h2 id="what-im-doing-about-it">what i’m doing about it</h2>

<p>I do not have a clean answer yet, and I am suspicious of anyone who says they do. But I am very sure of the direction.</p>

<p>The next layer of security tooling has to be:</p>

<blockquote>
  <p>always on, not weekly<br />
repo and runtime aware, not just static<br />
fast at turning advisories into patches<br />
aware that agents are part of the surface now<br />
opinionated enough to act, not only report<br />
open enough that you can trust what it does</p>
</blockquote>

<p>Some of this can be glued together from existing tools. Some of it does not exist yet. Some of it is going to look like a small agent that lives next to your repos and never sleeps.</p>

<p>I think 2026 is when this stops being a “nice to have” for serious teams. The cost of ignoring it is going up every week, because the attackers already upgraded their tooling. The defenders should too.</p>

<p>The boring summary, for builders and users alike: AI did not invent insecurity, but it removed every excuse we used to have for not paying attention. The patch is still our job. The work is just faster on both sides now, and the side that automates better is going to win the next few years.</p>]]></content><author><name>Edi Hasaj</name></author><category term="obsidian" /><category term="posts" /><category term="AI" /><category term="security" /><category term="agents" /><category term="software" /><summary type="html"><![CDATA[Exploits are cheaper than ever, patching is still manual, and most developers will never read the CVE about the package they shipped last week. The next layer of security has to be an agent that constantly listens, checks, and tells you before someone else finds it.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://edihasaj.com/images/posts/security-in-the-age-of-ai-coders-2026-05-11.jpg" /><media:content medium="image" url="https://edihasaj.com/images/posts/security-in-the-age-of-ai-coders-2026-05-11.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Recall: The Memory My Agents Were Missing</title><link href="https://edihasaj.com/posts/recall-the-memory-my-agents-were-missing" rel="alternate" type="text/html" title="Recall: The Memory My Agents Were Missing" /><published>2026-05-06T00:00:00+00:00</published><updated>2026-05-06T00:00:00+00:00</updated><id>https://edihasaj.com/posts/recall-the-memory-my-agents-were-missing</id><content type="html" xml:base="https://edihasaj.com/posts/recall-the-memory-my-agents-were-missing"><![CDATA[<h1 id="recall-the-memory-my-agents-were-missing">Recall: The Memory My Agents Were Missing</h1>

<p>I open sourced <a href="https://github.com/edihasaj/recall">Recall</a> a while back, and I want to write down why, becuase it is the kind of tool I wish someone else had built so I would not have to.</p>

<p>The short version is, every project has unwritten rules. The long version is, those rules keep biting me when an agent ignores them for the fifth time in the same week.</p>

<h2 id="the-rules-that-never-make-it-into-a-md">The Rules That Never Make It Into a .md</h2>

<p>You can write a <code class="language-plaintext highlighter-rouge">CLAUDE.md</code>, <code class="language-plaintext highlighter-rouge">AGENTS.md</code>, or <code class="language-plaintext highlighter-rouge">README</code> and put the obvious things in there. Tech stack, commands, conventions, deployment notes. Fine. That covers maybe 20% of how a project actually works.</p>

<p>The other 80% lives in your head:</p>

<blockquote>
  <p>never run migrations without a backup first<br />
in this repo we always update the changelog before merging<br />
in that other repo we never touch the auth middleware without pinging someone<br />
when you commit, also run the docs gate, every single time<br />
after a refactor, regenerate the types, do not skip it<br />
this codebase uses pnpm, the other one uses npm, do not mix them up</p>
</blockquote>

<p>You cannot put all of that into a markdown file. Or you can, but nobody, including the agent, will read 4000 lines of “and also remember this.” It becomes noise. Most of it is conditional, situational, or only matters after you have made the mistake once.</p>

<p>So you end up repeating yourself. “Don’t do X.” “Yes, run the tests first.” “I told you to use trash, not rm.” Every conversation, every new session, same corrections. The agent is smart but it has no memory of the last time you yelled at it.</p>

<p>That is the gap Recall fills.</p>

<h2 id="what-it-actually-does">What It Actually Does</h2>

<p>Recall is a local memory layer for coding agents. Repo-scoped, file-based, owned by you. Nothing leaves your machine unless you wire a provider.</p>

<p>It learns from corrections, from review feedback, from session outcomes, from explicit “remember this” calls. It compiles those into a small set of trusted instructions per repo. Then it injects them into the agent at the right moment, so the agent shows up to the next session already knowing the rules.</p>

<p><img src="/images/posts/recall-2026-05-06.png" alt="Recall injecting repo rules into the agent at session start" /></p>

<p>The flow is roughly:</p>

<blockquote>
  <p>you correct the agent once<br />
Recall captures that as a memory<br />
next session in the same repo, the rule is already in context<br />
if you contradict it later, Recall updates or retires it</p>
</blockquote>

<p>It also runs a small daily maintenance pass that merges duplicates, retires stale entries, and tightens fuzzy ones. Memories that are wrong now do not stay forever.</p>

<h2 id="connects-without-an-extra-bill">Connects Without an Extra Bill</h2>

<p>This part matters to me. I did not want yet another subscription with its own context window, its own tokens, its own dashboard.</p>

<p>Recall connects to the agents I already use, two ways:</p>

<p><strong>Hooks.</strong> On <code class="language-plaintext highlighter-rouge">SessionStart</code> it injects a minimal block of relevant memories into the agent’s own context. On <code class="language-plaintext highlighter-rouge">UserPromptSubmit</code> it does a per-prompt relevence check and adds anything that fits. The agent uses its own tokens to read it. There is no second model running, no proxy, no extra cost.</p>

<p><strong>MCP.</strong> When the hook block missed something, the agent can call <code class="language-plaintext highlighter-rouge">recall.query</code>, <code class="language-plaintext highlighter-rouge">recall.list</code>, <code class="language-plaintext highlighter-rouge">recall.report_correction</code>, and friends as regular tools. Same model, same tokens, just a few more tool calls.</p>

<p>So the cost is whatever you were already paying your agent provider. Recall itself is local, free, and does not phone home.</p>

<h2 id="why-opensource">Why Opensource</h2>

<p>Because this is exactly the kind of thing that should not be closed.</p>

<p>Memory for agents is going to be a default feature, not a moat. Everyone building serious agentic workflows is going to need something like this, and a closed SaaS version of it is a bad idea: your corrections, your repo conventions, your unwritten rules are some of the most sensitive things about how you work. You do not want them on someone else’s server, indexed for “improvement.”</p>

<p>Local first, file based, inspectable, forkable. If you do not like how it ranks memories, change it. If you want a different injection style, override it with an env var. If you want to plug it into a different agent runtime, the hooks and MCP are documented.</p>

<p>This also fits what I was <a href="/2026/05/04/2026-is-going-to-be-the-year-of-open-source.html">writing about last week</a>. The code alone is not the moat. The moat is the context, the workflow, the trust. Open sourcing the memory layer is not giving anything important away. It is making the layer better for everyone, including me.</p>

<h2 id="what-i-want-from-it">What I Want From It</h2>

<p>Honestly, I want to stop repeating myself.</p>

<p>I want every repo I touch to remember the unwritten rules. I want new agents I try to inherit those rules without me having to brief them again. I want the corrections I make today to be in context tomorrow. I want to stop writing “do not run X without Y” for the hundreth time.</p>

<p>Recall is not done. It is the kind of project I will keep rethinking because the problem keeps changing. Agents get better, hooks get better, MCP gets better, and the way memories should be retrieved and ranked will keep moving with that.</p>

<p>But the core idea I am sure about. Every project has rules that do not fit in a markdown file. Agents need a place to keep them. That place should be local, opensource, and ride on top of the tools you already pay for.</p>

<p>That is Recall.</p>]]></content><author><name>Edi Hasaj</name></author><category term="obsidian" /><category term="posts" /><category term="AI" /><category term="agents" /><category term="open-source" /><category term="memory" /><category term="recall" /><summary type="html"><![CDATA[Every project has unwritten rules that don't fit in a .md file, and agents keep forgetting them. So I built Recall, opensourced it, and wired it into the agents I already use without paying for a separate brain.]]></summary></entry><entry><title type="html">Building Neni: legal search that actually has to know the law</title><link href="https://edihasaj.com/posts/building-neni-statute-atlas" rel="alternate" type="text/html" title="Building Neni: legal search that actually has to know the law" /><published>2026-05-04T22:01:00+00:00</published><updated>2026-05-04T22:01:00+00:00</updated><id>https://edihasaj.com/posts/building-neni-statute-atlas</id><content type="html" xml:base="https://edihasaj.com/posts/building-neni-statute-atlas"><![CDATA[<h1 id="building-neni-legal-search-that-actually-has-to-know-the-law">Building Neni: legal search that actually has to know the law</h1>

<p>I have been working on <a href="https://neni.me">neni.me</a>, a legal research app built on top of Statute Atlas.</p>

<p>The idea sounds simple if you say it too fast: put Kosovo laws into a database, add embeddings, let people ask questions, return answers with citations.</p>

<p>But that is the demo version of the problem. The real version is much more annoying and much more interesting.</p>

<p>Legal text is not a normal document collection. A constitution, a statute, a regulation, a court decision, and a small amendment are all text, but they should not be treated the same. They have structure. They have dates. They have article numbers. They have official sources. Sometimes they change. Sometimes they reference each other. And if the answer is wrong, “the model sounded confident” is not a good excuse.</p>

<p>So the main work was not building a chat UI. The main work was turning a messy legal corpus into something that can be searched, cited, and trusted.</p>

<h2 id="a-lot-of-the-work-is-ingestion">A Lot Of The Work Is Ingestion</h2>

<p>Before the RAG part can be good, the documents have to be clean enough.</p>

<p>Neni has to fetch legal sources, parse the pages or PDFs, extract the actual law text, split it into articles and sections, and keep the metadata around. Things like country, source, document type, publication date, effective date, title, article number, and source URL matter a lot.</p>

<p>If you loose that structure, retrieval becomes mush.</p>

<p>This is why I care about chunking by article or section instead of just cutting every document into random token windows. If someone asks about a legal rule, the answer usually lives inside a specific article, not in some arbitrary slice of 800 tokens. The chunk should know where it came from.</p>

<p>The constitution is a good example. You do not want it only as one big PDF blob. You want each article to exist as a searchable unit, while still keeping its connection to the whole document. Same for statutes. Same for future versions.</p>

<h2 id="vector-space-is-useful-but-not-magic">Vector Space Is Useful, But Not Magic</h2>

<p>The way most people explain this is: embed every chunk, put it in vector space, then find the closest chunks to the user question.</p>

<p>That is basically true, but it hides the hard part.</p>

<p>Legal search needs both semantic similarity and boring filters. If someone asks about a Kosovo company registration rule, the system should not return something that only matched because the wording was kind of similar. It needs country filters, document-kind filters, source trust, article metadata, and sometimes date filters.</p>

<p>So the retrieval layer is more like:</p>

<blockquote>
  <p>understand the query<br />
normalize it into legal language<br />
search textually<br />
search semantically<br />
merge the results<br />
rerank the best candidates<br />
only then ask the model to answer from those citations</p>
</blockquote>

<p>That last part matters. The model should not answer from memory. It should answer from evidence.</p>

<h2 id="language-makes-it-harder">Language Makes It Harder</h2>

<p>Another problem is how people actually ask questions.</p>

<p>Nobody writes like an official gazette. People ask in Albanian, English, mixed language, Gheg, with missing diacritics, with abbreviations, sometimes with half a sentence. The indexed law text is formal. The query is usually not.</p>

<p>So the system needs a small query-understanding step before retrieval. Not to answer the user, but to translate the question into better search hints: legal domain, possible law titles, and a few formal Albanian rewrites.</p>

<p>This helps a lot because the user might say “qka duhet per shpk”, while the law might talk about “shoqëri me përgjegjësi të kufizuar” or “Ligji për Shoqëritë Tregtare”.</p>

<p>That is the kind of detail that makes the difference between a nice demo and a tool people can actually use.</p>

<h2 id="performance-is-its-own-product-feature">Performance Is Its Own Product Feature</h2>

<p>RAG systems can get slow very fast, idk this part is where the nice demos usually start to break.</p>

<p>Every question can trigger query routing, embeddings, lexical search, vector search, reranking, and final answer generation. If you do that naively, the product feels broken even when the answer is correct.</p>

<p>So a lot of the engineering becomes unglamorous things:</p>

<blockquote>
  <p>cache query embeddings<br />
cache query rewrites<br />
keep retrieval candidate pools small<br />
cap how many chunks one document can dominate<br />
avoid sending too much text to the model<br />
use metadata filters before expensive ranking<br />
make repeated common questions cheap</p>
</blockquote>

<p>This is maybe the least flashy part of AI apps, but probably one of the most important. A legal assistant that takes forever is not good software. It is just a slow demo.</p>

<h2 id="citations-change-the-whole-design">Citations Change The Whole Design</h2>

<p>The answer is not the product by itself. The citation is part of the product.</p>

<p>When Neni answers, it needs to show where the answer came from. Which law. Which article. Which source. Ideally with the exact passage that supports the answer.</p>

<p>That requirement changes the architecture. You cannot just throw documents into a vector DB and hope. You need stable document records, version records, section records, chunk records, and enough metadata to trace every answer back to source.</p>

<p>This is also why I like building this as a legal platform, not only one Kosovo chatbot. The pattern should work for more countries later:</p>

<blockquote>
  <p>official sources first<br />
version the law<br />
index article-level chunks<br />
retrieve with filters<br />
answer only from cited evidence<br />
say “I do not have enough source text” when retrieval is weak</p>
</blockquote>

<p>Simple principle, but a lot of engineering underneath.</p>

<h2 id="what-i-learned">What I Learned</h2>

<p>The funny thing about building AI products now is that the model is not always the hard part.</p>

<p>The hard part is often everything around it: ingestion, data shape, retrieval quality, latency, trust, citations, evals, and all the boring infrastructure that makes the model useful.</p>

<p>Neni is still early, but this project made that very clear for me.</p>

<p>You can build a legal RAG demo in a day.</p>

<p>Building one that respects the structure of law, keeps sources traceable, handles messy language, stays fast, and knows when not to answer is a very different thing.</p>

<p>That is the part I find interesting.</p>]]></content><author><name>Edi Hasaj</name></author><category term="obsidian" /><category term="posts" /><category term="AI" /><category term="legal-tech" /><category term="RAG" /><category term="neni" /><summary type="html"><![CDATA[Some notes from building Neni, a legal research app for Kosovo law, and why the hard part was not the chat box but indexing, retrieval, citations, and performance.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://edihasaj.com/images/posts/neni-statute-atlas-2026-05-05.png" /><media:content medium="image" url="https://edihasaj.com/images/posts/neni-statute-atlas-2026-05-05.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">2026 is going to be the year of open source</title><link href="https://edihasaj.com/posts/2026-is-going-to-be-the-year-of-open-source" rel="alternate" type="text/html" title="2026 is going to be the year of open source" /><published>2026-05-04T00:00:00+00:00</published><updated>2026-05-04T00:00:00+00:00</updated><id>https://edihasaj.com/posts/2026-is-going-to-be-the-year-of-open-source</id><content type="html" xml:base="https://edihasaj.com/posts/2026-is-going-to-be-the-year-of-open-source"><![CDATA[<h1 id="2026-is-going-to-be-the-year-of-open-source">2026 is going to be the year of open source</h1>

<p>2026 is going to be the year of open source, at least for software builders. Not because everyone suddenly became more kind, but because the market is changing fast and a lot of the easy software is not worth hiding anymore.</p>

<p>Coding is becoming a commodity. I don’t mean good engineering is a commodity, that is still hard and you can see it every time some generated app looks fine but breaks as soon as you try to use it for something real. But a lot of software that people were selling before is now much easier to create. A small app, an internal tool, an admin dashboard, a simple RAG system, a browser extension, a wrapper around an API, some automation between two systems, a small agent that does one thing good enough.</p>

<p>Most of those things can be built with one good prompt if it is simple enough, or maybe with one week of agentic engineering if it has some moving pieces. That changes everything.</p>

<h2 id="easy-software-is-getting-hard-to-sell">easy software is getting hard to sell</h2>

<p>for the last years a lot of software was sold just because it was annoying to build. You had to setup auth, billing, database, email, deployment, some UI, docs, maybe permissions, maybe a few integrations, and suddenly even a simple app became a project. Now you can describe most of that to an agent and it will get you very far. Not perfect, but far enough.</p>

<p>And that is the important part. The market does not need perfect for every category. It needs useful, cheap, fast, and good enough to solve the problem. So if your whole product is basically “LLM + prompt + UI”, good luck keeping that closed and pretending it is some big moat. Someone will recreate it. Maybe worse, maybe better, but close enough for many users.</p>

<p>This is why small apps can be done within a day or two nowadays, and bigger things that used to take months can be pushed in weeks if you know what you are doing. It is still very easy to create slop. Actually it is easier than ever. But that also means the value is moving away from just writing code.</p>

<h2 id="the-value-is-not-only-the-code-anymore">the value is not only the code anymore</h2>

<p>The code is becoming cheaper. The context is not.</p>

<p>Knowing what to build is not cheap. Knowing the domain is not cheap. Knowing why a workflow is broken is not cheap. Knowing how users actually talk, what they expect, what data matters, what should be automated and what should stay manual, that is still the real work.</p>

<p>I see this all the time. When I built a small RAG system over Kosovo’s laws, the important part was not only embeddings and citations. Everyone can make a demo with that now. The important part was that people ask in everyday language, in Albanian, Gheg, English, no diacritics, slang, abbreviations, and they still expect the right article back. That is the product.</p>

<p>Same with AI over business data. “Chat with your database” is not enough anymore, everyone can do that badly. The hard part is schema-aware prompts, glossary builder, business vocabulary, permissions, joins, aggregations, and making the AI understand the way the company actually thinks about its own data. That is where the value is. The code helps, but the code alone is not the moat.</p>

<h2 id="open-source-makes-more-sense-now">open source makes more sense now</h2>

<p>If the code alone is not the moat, why keep everything closed? This is where I am shifting more and more. I want to open source more things, especially the tools, agents, libraries, scripts, workflows, and small platforms where more people can benefit from seeing how it works.</p>

<p>Not everything should be open source. Customer data should not. Some business logic should not. Security sensitive things should not. Some enterprise features or hosted operations can stay closed. There are still businesses that should sell closed software, no doubt about that.</p>

<p>But a lot of things we build are not in that category. They are useful because they show a pattern. They help someone move faster. They give someone a starting point. They connect two things that were annoying to connect. They make agents work better with real systems.</p>

<p>For those things, open source is probably the better default. People can inspect it, run it, fork it, fix it, and learn from it. It also forces the work to be more honest, because if the code is bad people will see it. In a world full of generated demos, that matters a lot.</p>

<h2 id="open-models-are-putting-pressure-everywhere">open models are putting pressure everywhere</h2>

<p>Open source models for coding are already making it harder for Anthropic, OpenAI and everyone else to keep coding as a huge profit center. Closed models are still very good. I still use them. Infrastructure matters. Reliability matters. Tool calls matter. Speed matters. Trust matters.</p>

<p>This is where people get too excited with benchmarks sometimes. Yes, a model can be cheap and score well, but have you actually used it for everyday complex work? If the API is unstable, slow, looping, bad with tools, or hosted somewhere you don’t trust with sensitive data, then the benchmark does not help much.</p>

<p>But the direction is clear. Open-weight models are getting better. Local and hosted options are getting better. Agent scaffolds are getting better. The gap is not what it was before. This means more builders will try open models first for the everyday work, and only use the expensive closed frontier models where they actually need them. That is healthy. It pushes everyone to get better.</p>

<h2 id="agents-change-the-shape-of-apps">agents change the shape of apps</h2>

<p>I also think people still underestimate how much agents will change apps. You don’t need a new OS anymore, you need a good agent.</p>

<p>The whole idea is to create apps and platforms that require no or minimal UI to do the things they do. Before you bloat 100 other pages, think one more time if that action should just be available to an agent. Your app is going to need AI integration eventually. Not a chatbot pasted on top.</p>

<p>Real access. Check billing, update users, pull reports, create invoices, search documents, route expenses, talk to your ERP, ask legal questions, whatever the system actually does. That is why I think the apps that survive are the ones that bridge this agentic way of communicating.</p>

<p>And a lot of that bridge should be open, because we all need better patterns for how agents talk to systems. MCPs are token hungry and not perfect, but they are a solid foundation for the future of communication between agents and systems. Same thing with open agents, memory systems, tool protocols, testing loops, browser control, CLI workflows. These things should be shared more.</p>

<h2 id="what-you-can-still-sell">what you can still sell</h2>

<p>Some people hear open source and think it means no business. That is not true.</p>

<p>You can still sell hosting. You can sell support. You can sell setup. You can sell enterprise controls. You can sell integrations. You can sell better reliability. You can sell managed infrastructure. You can sell the boring operational stuff that nobody wants to run themselves. You can sell the part where it actually works.</p>

<p>That is very different from selling a small hidden codebase and hoping nobody can rebuild it. In 2026, if something can be recreated by a one shot prompt or a short agentic coding session, maybe the code should not be the thing you protect. Maybe the code is the thing you give away, and the value is everything around it.</p>

<h2 id="this-is-the-shift">this is the shift</h2>

<p>everything is different now. Small apps are cheaper to build. Agents are improving fast. Open models are getting good enough for more real work. Users do not want 100 apps and 100 subscriptions for every small thing. Companies will need bridges between their systems and agents. Developers will want to inspect and trust what they run.</p>

<p>So for me, 2026 is the year to shift more towards open source. Not as charity, and not as some ideology. Just because it makes sense.</p>

<p>If the project is useful as a pattern, open it. If people can benefit from it, open it. If the real value is context, reliability, hosting, data, or domain knowledge, then open source will probably make the project stronger, not weaker.</p>

<p>The code is no longer always the thing to protect. Sometimes it is the thing that gets people to trust you.</p>]]></content><author><name>Edi Hasaj</name></author><category term="obsidian" /><category term="posts" /><category term="AI" /><category term="open-source" /><category term="software" /><category term="agents" /><summary type="html"><![CDATA[Software is getting cheaper to build, agents are getting better, and the things that are easy to create will be harder and harder to sell as closed source.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://edihasaj.com/images/pages/edi-og.png" /><media:content medium="image" url="https://edihasaj.com/images/pages/edi-og.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Blogging’s AI Evolution: Why Human Stories Matter More Than Ever</title><link href="https://edihasaj.com/posts/blogging-s-ai-evolution" rel="alternate" type="text/html" title="Blogging’s AI Evolution: Why Human Stories Matter More Than Ever" /><published>2025-07-18T00:00:00+00:00</published><updated>2025-07-18T00:00:00+00:00</updated><id>https://edihasaj.com/posts/blogging-s-ai-evolution</id><content type="html" xml:base="https://edihasaj.com/posts/blogging-s-ai-evolution"><![CDATA[<h1 id="bloggings-ai-evolution-why-human-stories-matter-more-than-ever">Blogging’s AI Evolution: Why Human Stories Matter More Than Ever</h1>

<p>Blogging isn’t dying in the age of artificial intelligence it’s transforming into something more powerful and personal than ever before. While <strong>80% of bloggers now use AI tools</strong> in 2024 (up from nearly zero in 2022), the most successful content creators are discovering that AI’s greatest value lies not in replacing human creativity, but in amplifying it. The future belongs to bloggers who can harness AI’s efficiency while doubling down on the uniquely human elements that no algorithm can replicate: authentic experience, emotional depth, and genuine connection.</p>

<p>This shift represents the most significant evolution in content creation since the internet’s birth. With over 600 million active blogs competing for attention from 4.4 billion readers worldwide, and search engines processing content through increasingly sophisticated AI systems, bloggers face both unprecedented opportunities and existential questions about their craft’s future.</p>

<h2 id="the-current-blogging-landscape-reveals-surprising-resilience">The current blogging landscape reveals surprising resilience</h2>

<p>Despite predictions of AI-driven obsolescence, blogging remains remarkably robust. <strong>83% of internet users read blogs regularly</strong>, generating 7.5 million new posts daily and 3 billion annually. The average blog post now spans 1,400 words and takes 3.8 hours to create a 77% increase in length from a decade ago, suggesting readers still crave substantial, thoughtful content.</p>

<p>However, the landscape is undeniably challenging. Only <strong>20% of bloggers report strong results</strong> today, down from 30% five years ago. The culprit isn’t AI competition alone it’s the collision of multiple forces: algorithm changes, content saturation, and evolving reader expectations. Google’s March 2024 core update alone reduced low-quality content by 45%, while 53% of bloggers report increased difficulty attracting search traffic.</p>

<p>Yet successful bloggers are thriving. Those earning $50,000+ annually increasingly focus on <strong>selling their own products and services</strong> rather than relying solely on advertising, with food bloggers averaging $9,169 monthly and established bloggers (10+ years) earning $5,625 monthly on average. The winners aren’t just surviving the AI revolution they’re leveraging it.</p>

<h2 id="ai-transforms-workflow-efficiency-while-exposing-creativity-gaps">AI transforms workflow efficiency while exposing creativity gaps</h2>

<p>The numbers tell a compelling story of rapid AI adoption. From virtually zero usage in 2022, AI tools now assist 80% of bloggers with everything from brainstorming to final editing. <strong>ChatGPT leads adoption</strong> among content creators, followed by Claude for technical writing and Jasper for marketing copy. The productivity gains are substantial: <strong>30% reduction in content creation time</strong> and <strong>50% increase in output volume</strong>.</p>

<p>But AI’s limitations quickly become apparent to serious bloggers. Current tools excel at pattern recognition and template-based content but struggle with <strong>emotional intelligence, cultural nuance, and genuine creativity</strong>. They’re prone to hallucinations, repetitive phrasing, and the kind of generic insights that make readers’ eyes glaze over. As one successful blogger noted, “I delete 20-60% of AI-generated content as fluff, then rewrite the introduction entirely to add real expertise.”</p>

<p>The most effective approach emerging is <strong>strategic collaboration</strong>: AI handles research, initial drafts, and routine optimization, while humans provide the insights, personality, and expertise that create genuine value. This hybrid model allows bloggers to maintain their authentic voice while dramatically improving efficiency.</p>

<h2 id="the-authenticity-question-becomes-more-urgent-not-less">The authenticity question becomes more urgent, not less</h2>

<p>Contrary to expectations, the rise of AI content has made human authenticity more valuable, not less. Industry experts consistently emphasize that <strong>emotional connection remains crucial</strong> as the market floods with “lifeless, AI creations.” AJ Wilcox, founder of B2Linked.com, argues that “making an emotional connection is going to be more important than ever.”</p>

<p>This creates a paradox: while AI makes content creation more efficient, it simultaneously raises the bar for human creativity. Readers can increasingly distinguish between genuine insight and algorithmic output. The most successful bloggers are responding by <strong>leaning into their humanity</strong> sharing personal experiences, offering contrarian viewpoints, and providing the kind of cultural context and emotional depth that AI cannot replicate.</p>

<p>Trust metrics reflect this shift. Only <strong>4% of B2B marketers have high trust in AI outputs</strong>, while 67% maintain medium trust levels. This suggests that while AI serves as a valuable tool, human oversight and creativity remain essential for building reader confidence and engagement.</p>

<h2 id="search-engines-reward-quality-over-creation-method">Search engines reward quality over creation method</h2>

<p>Google’s official stance provides clarity amidst the confusion: <strong>AI content isn’t inherently penalized</strong> if it demonstrates quality, expertise, and user value. The search giant’s E-E-A-T guidelines (Experience, Expertise, Authoritativeness, Trustworthiness) apply regardless of creation method, focusing on content quality rather than its origins.</p>

<p>However, the reality is more nuanced. Research analyzing 487 search results found that <strong>83% of top Google results are not AI-generated</strong>. Google’s updated Quality Rater Guidelines specifically instruct reviewers to give “Lowest” ratings to content that is “all or almost all” AI-generated with little human effort or added value.</p>

<p>The March 2024 algorithm update targeted “scaled content abuse” essentially, mass-produced AI content created primarily for search manipulation. Meanwhile, <strong>AI Overviews now appear in 12.47% of searches</strong>, potentially reducing click-through rates by 34.5% for traditional results. This suggests that while quality AI content can succeed, the bar for success is rising rapidly.</p>

<h2 id="successful-bloggers-are-pioneering-hybrid-strategies">Successful bloggers are pioneering hybrid strategies</h2>

<p>Real-world success stories illuminate the path forward. One blogger launched a new site in January 2023 and reached <strong>$15,000+ monthly revenue</strong> by October using 90%+ AI-generated content but with extensive human editing, completely rewritten introductions, and significant content reduction to eliminate “fluff.”</p>

<p>The most successful approaches share common elements: <strong>AI for efficiency, humans for authenticity</strong>. Successful bloggers use AI for research, initial drafts, and optimization, then apply human expertise for voice, tone, fact-checking, and strategic direction. They’re building what experts call “collaborative AI workspaces” that maintain editorial standards while scaling production.</p>

<p>Platform policies support this hybrid approach. Rather than banning AI content, major platforms require <strong>transparency and quality control</strong>. YouTube mandates labeling for realistic AI-generated content, while WordPress offers extensive AI plugin ecosystems. The message is clear: AI assistance is acceptable, but human oversight and disclosure are essential.</p>

<h2 id="the-future-points-toward-collaboration-not-replacement">The future points toward collaboration, not replacement</h2>

<p>Looking ahead, industry experts predict that <strong>AI will become a baseline expectation</strong> rather than a competitive advantage by 2025. Andy Crestodina of Orbit Media anticipates that “content marketers will start optimizing their content to appear in AI responses,” similar to how brands adapted to search engine optimization in the late 1990s.</p>

<p>This shift suggests a fundamental change in how content discovery works. With Google’s search market share declining and AI-native search engines like Perplexity and SearchGPT gaining traction, bloggers must prepare for a <strong>multi-platform content strategy</strong> that optimizes for both traditional search and conversational AI interfaces.</p>

<p>The brands winning this transition will be those who, as one expert put it, “never forget that tech is the tool, but people are the point.” They’ll use AI to enhance human creativity rather than replace it, focusing on building genuine community connections and providing unique value that algorithms cannot replicate.</p>

<h2 id="practical-strategies-for-the-ai-enhanced-blogging-future">Practical strategies for the AI-enhanced blogging future</h2>

<p>For current and aspiring bloggers, the path forward requires strategic adaptation rather than wholesale change. <strong>Start with quality over quantity</strong> use AI to improve efficiency, but maintain rigorous editorial standards and human oversight. Focus on developing your unique voice and expertise in specific niches where personal experience and insights create genuine value.</p>

<p><strong>Embrace the hybrid workflow</strong>: Use AI for research, initial drafts, and routine optimization, but ensure every published piece reflects your authentic perspective and expertise. Invest time in fact-checking AI outputs and adding the personal touches that create emotional connection with readers.</p>

<p><strong>Diversify your platform strategy</strong> beyond traditional search. Build email lists, engage on social media, and consider platforms like Substack or Medium that prioritize direct reader relationships. As search continues evolving, direct audience connections become increasingly valuable.</p>

<p><strong>Stay informed about AI detection and quality standards</strong>. Search engines are becoming more sophisticated at identifying and potentially penalizing low-quality automated content. Focus on creating content that would be valuable to readers regardless of how it was created.</p>

<h2 id="the-human-element-remains-irreplaceable">The human element remains irreplaceable</h2>

<p>The future of blogging in the LLM era isn’t about choosing between human creativity and AI efficiency it’s about combining them strategically. While AI can handle routine tasks and provide research assistance, the elements that make content truly compelling authentic experience, emotional resonance, unique insights, and genuine expertise remain distinctly human.</p>

<p>The bloggers who thrive will be those who view AI as a powerful assistant rather than a replacement, using it to amplify their human capabilities while maintaining the authentic voice and genuine expertise that readers seek. In an increasingly automated world, the human touch doesn’t become less valuable it becomes precious.</p>

<p>The blogging industry’s resilience through this transformation suggests that while the tools and techniques may evolve, the fundamental human need for authentic connection, genuine expertise, and compelling storytelling remains constant. The future belongs to those who can harness AI’s efficiency while preserving the irreplaceable elements that make content truly human.</p>]]></content><author><name>Edi Hasaj</name></author><category term="obsidian" /><category term="posts" /><category term="Blogging" /><category term="AI" /><category term="Content" /><category term="Writing" /><category term="SEO" /><summary type="html"><![CDATA[Blogging isn't dying in the AI era it's evolving into something more powerful. While 80% of bloggers now use AI tools, the most successful creators are discovering that artificial intelligence works best as a creative amplifier, not a replacement. As search engines flood with algorithmic content, human authenticity, genuine expertise, and emotional connection become more valuable than ever. This comprehensive guide explores how smart bloggers are building hybrid workflows that combine AI efficiency with irreplaceable human creativity, revealing why the future belongs to those who can harness technology while doubling down on what makes them uniquely human.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://edihasaj.com/images/pages/blogging-ai.jpg" /><media:content medium="image" url="https://edihasaj.com/images/pages/blogging-ai.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Engineer’s Guide to ERP Migration: How We Left Dynamics NAV Behind</title><link href="https://edihasaj.com/posts/the-engineers-guide-to-erp-migraition-how-we-left-navision-behind" rel="alternate" type="text/html" title="The Engineer’s Guide to ERP Migration: How We Left Dynamics NAV Behind" /><published>2025-04-05T13:21:00+00:00</published><updated>2025-04-05T13:21:00+00:00</updated><id>https://edihasaj.com/posts/the-engineers-guide-to-erp-migraition-how-we-left-navision-behind</id><content type="html" xml:base="https://edihasaj.com/posts/the-engineers-guide-to-erp-migraition-how-we-left-navision-behind"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>Dynamics NAV, also known as Navision, is a robust ERP (Enterprise Resource Planning) software backed by Microsoft. <strong>Ur&amp;Penn</strong> had been using Dynamics NAV for 14 years, where it became deeply ingrained in their daily operations, from Product and WMS to customer interactions via different channels like POS, E-Commerce, Self-Checkout, and more. <strong>Ur&amp;Penn</strong> is one of the largest retailers in the Nordics, particularly in Sweden, exemplifying innovation and success in applying technology to solve problems and improve employee efficiency.</p>

<p>Building your own ERP is rarely discussed (initially, people called our plans “ambitious” 🤓) due to the complexity and risks involved. However, <strong>Ur&amp;Penn</strong> boldly ventured into this territory because of the limitations inherent in large-scale commercial software. Specifically, NAV is a generic system that doesn’t easily adapt to the unique needs of each business. While it’s certainly impressive (there’s a reason it’s the most widely used ERP system globally, though its database architecture leaves much to be desired 😬), it makes workflows unnecessarily complex and difficult.</p>

<p>Implementing changes or adding functionality typically takes weeks or months, requiring navigation through multiple channels of resellers or authorized NAV providers. The costs are substantial, which would be acceptable if the software offered comprehensive functionality, but it doesn’t. The POS system must be built or purchased separately, as must human resources modules, finance modules for EU companies, and numerous external integrations that are cumbersome to implement. Due to the high costs, developmental challenges, and endless maintenance requirements for the IT department, the decision to build a custom ERP was set in motion.</p>

<p>As the lead software engineer for this migration project, I was responsible for designing the architecture of our ERP solution, development, and orchestrating the complex data migration from Dynamics NAV. My role involved not only writing code but also mapping business processes, collaborating with department heads to understand their specific needs, and ensuring that historical data remained intact and accessible in the new system while implementing new features. With over 10 years of experience in enterprise software development, I approached this project by first understanding the limitations of our existing NAV implementation, then creating a migration roadmap that minimized business disruption while maximizing the benefits of our tailored solution.</p>

<p>Due to the transactional nature of ERPs, I couldn’t adopt a pure microservice-based architecture. Instead, I implemented a monolithic structure with microservice elements, keeping the core ERP functionality clean and robust for all other modules through what I call “Pipelines”. For example, a sales pipeline that applies consistent logic regardless of the sales channel, based on various parameters. We maintained separate APIs for POS (already developed by Ur&amp;Penn’s excellent team), Self-checkout, E-commerce, HR, Finance, Brand Management, Supplier Relations, Customer Service, and many other integrations.</p>

<p>In this blog post, I’ll share my experience and the challenges we faced while successfully migrating such a complex system, which might prove useful for other technical professionals considering a similar path.</p>

<hr />

<h2 id="technical-architecture--design-decisions">Technical Architecture &amp; Design Decisions</h2>

<p>I selected a monolithic architecture because ERP systems are highly transactional and must maintain atomicity. With our available resources (both for development and time constraints), I couldn’t justify experimenting with newer architectural patterns. The monolithic approach allowed us to build core ERP functionality for Inventory (including Purchasing), Sales, WMS, Finance, Ledgers, and other essential functions within a single core engine. Meanwhile, Ur&amp;Penn specific requirements were developed in external APIs, keeping the core more generic and giving us flexibility for necessary integrations without compromising the system’s integrity. This approach enabled us to build an MVP that we successfully launched, with plans for post-implementation improvements, a decision that proved advantageous.</p>

<p>My “Pipelines” concept became one of the most intelligent aspects of the ERP design. These pipelines incorporate one-time validations and logical decision points that handle complex chains of processes affecting multiple components simultaneously, determining whether steps should involve postings, orders, or other actions based on various parameters. This approach saved considerable time when integrating systems like POS with different logical scenarios but identical posting requirements, as we could reuse logic without additional development. The pipelines are sophisticated enough to selectively use transactions based on parameters, making them versatile for chained operations from other pipelines or standalone use (directly from endpoints/mutations).</p>

<p>I prioritized open-source software to avoid licensing complications, selecting PostgreSQL as our primary database, Redis for NoSQL requirements, and Meilisearch as our search engine (sorry, Elasticsearch). For the backend, we implemented JavaScript with NestJS as our core framework. The main API uses GraphQL for 99% of operations, significantly enhancing performance and reducing maintenance needs. We also utilize Python extensively, building supplementary APIs in Python (including ML inference APIs), PHP for POS and existing integrations, and ReactJS for the frontend.</p>

<p>The entire system was designed for serverless deployment from the ground up, allowing easy migration between cloud providers. I initially planned for Kubernetes deployment, but given Ur&amp;Penn’s deep integration with Google Cloud services, we leveraged Google’s serverless offerings. Everything connects to version control with automatic deployment to different environments based on branch via CI/CD. Pre-publication checks ensure zero downtime, complemented by load balancing and automatic scaling.</p>

<p>While I acknowledge NAV’s strengths in reliability and stability once configured, its flaws became most apparent during data migration. The unusual database table structure, lacking defined constraints between table key connections, offers flexibility for adding records but creates challenges for maintaining state and relationships during data insertion. The absence of foreign key constraints caused numerous issues as we encountered deleted data referenced in other tables. After 14 years of data accumulation, Ur&amp;Penn had adapted to this approach, but we couldn’t import records where foreign key connections were missing. This represents one of NAV’s most significant limitations, though it may seem inconsequential to non-technical users once initially configured (despite occasionally affecting business operations).</p>

<hr />

<h2 id="migration-strategy--implementation">Migration Strategy &amp; Implementation</h2>

<p>Data migration emerged as our primary challenge, delaying our go-live date multiple times due to various complexities. We needed to ensure minimal downtime since Ur&amp;Penn couldn’t close stores during the migration process, requiring creative solutions to transition without operational interruption, which we successfully achieved.</p>

<p>NAV’s lack of key constraints between tables necessitated building a specialized API for data migration. This API mapped all necessary fields from NAV to our ERP, verifying each field’s existence before posting. This process was exceptionally time-consuming due to missing data and fundamental differences in how our ERP handles organizational structures compared to NAV, which doesn’t support multi-company profiles effectively. The approach was necessary given our time constraints and NAV’s SQL implementation without auto-generated keys. Since Ur&amp;Penn operates multiple companies across the Nordic region, we developed logic to consolidate everything into a unified system with hierarchical data structures for master data, organizational entities, locations, warehouses, stores, and other dimensional data.</p>

<p>Testing and validation were conducted manually due to the associated risks and lack of direct comparison methods between systems. Team members, including Purchasers and Inventory Managers, inspected segments of the new system to verify data accuracy (a process that significantly improved our migration procedures, though it required multiple iterations). We performed comprehensive checks across inventory, ledgers, purchases, sales, transfers, items, and other critical areas.</p>

<p>Since we couldn’t simply deactivate one system while activating the other, particularly with e-commerce orders flowing continuously worldwide, we developed a superior approach, albeit one requiring additional effort. We maintained both systems concurrently, establishing a cutoff date after which all integrations would transmit data to both platforms. This strategy enabled us to keep NAV operational as a contingency while validating data and identifying discrepancies in the new system through comparative analysis. This insightful decision from Ur&amp;Penn’s IT department head proved invaluable. We migrated all data preceding the cutoff date and ensured proper connections between pre- and post-cutoff information. For ongoing open orders, we conducted manual data migration cleanup over a sleepless 48-hour weekend. This manual intervention was necessary because NAV structures Sales, Transfers, and Purchases differently across multiple tables, and we determined that developing a comprehensive automated solution for relatively few open orders (with missing foreign key constraints) wasn’t worth the additional development effort.</p>

<hr />

<h2 id="challenges--lessons-learned">Challenges &amp; Lessons Learned</h2>

<p>Throughout the migration, we encountered numerous challenges. While logical issues were quickly resolved, data problems repeatedly delayed our launch. I learned several valuable lessons through difficult experiences, particularly regarding the importance of thorough initial data analysis. In retrospect, implementing a direct one-time connection migration from NAV to our ERP database, rather than the continuous line-by-line posting approach I chose, would have saved considerable time and reduced complications.</p>

<p>Integration with external systems presented another significant challenge. We maintain numerous integrations with platforms including gift card systems, our custom-built price and promotion engine, HR systems, robotic WMS lifts, AI-powered order forecasting and creation, self-checkout solutions, POS, and many others. I initially estimated completing these integrations within 1-2 months, but they required substantially more time due to dependencies on external validation, coordination meetings, and procedural constraints. This area offers opportunities for improvement by developing more efficient migration methodologies.</p>

<p>Resource limitations also affected our progress. Despite initially having numerous developers available, we encountered difficulties because ERPs represent complex logical systems, and developers struggled to implement logic according to requirements. This created a dilemma: either invest 3-6 months teaching developers ERP process flows or review and revise their work afterward (which proved time-consuming and stressful). While acknowledging potential bias, I believe distributing work across more developers often generates additional meetings, reviews, and coordination overhead, potentially slowing integration, particularly for comprehensive system migrations rather than maintenance or feature development.</p>

<p>Today, Ur&amp;Penn operates with unprecedented time and cost efficiency, and this represents just the beginning since the ERP forms the core and intelligence driving the business forward. Our platforms are intuitive and straightforward, even in MVP form, with plans to implement additional features and functionality that enable employees to focus on objectives rather than processes.</p>

<p>The platform is sophisticated yet architected to facilitate easy feature additions and extensions, both from development and user perspectives. This design reflects 14 years of Ur&amp;Penn’s operational experience and my decade of translating business requirements into functional code.</p>

<h2 id="conclusion">Conclusion</h2>

<p>The journey of migrating from Dynamics NAV to our custom ERP solution has been challenging but immensely rewarding. What started as an “ambitious” plan has transformed into a tailored system that truly fits Ur&amp;Penn’s unique retail operations across the Nordics. By embracing the challenge of building our own solution, we’ve eliminated the limitations, high costs, and endless maintenance cycles that came with our 14-year NAV implementation.</p>

<p>The custom ERP we’ve built provides Ur&amp;Penn with complete control over their technology stack, faster implementation of new features, and significantly reduced operational costs. Our hybrid architecture approach with a monolithic core and specialized external APIs has proven to be the right choice for balancing transactional integrity with business flexibility.</p>

<p>For those considering a similar path, I’d emphasize that building a custom ERP isn’t for everyone. It requires deep technical expertise, business domain knowledge, and leadership support. But for organizations with unique processes and a commitment to long-term technological independence, it can be transformative.</p>

<p>Looking ahead, we’re continuing to refine our system, adding new capabilities and optimizations that simply wouldn’t have been possible or would have been prohibitively expensive with our previous setup. What we’ve created isn’t just a replacement for NAV. It’s a foundation for Ur&amp;Penn’s future innovation and growth.</p>

<p>If you’re considering a similar migration or have questions about our approach, I’d be happy to connect and share more detailed insights from our experience.</p>]]></content><author><name>Edi Hasaj</name></author><category term="obsidian" /><category term="posts" /><category term="programming" /><category term="erp" /><category term="migration" /><category term="navision" /><category term="engineering" /><summary type="html"><![CDATA[When Ur&Penn decided to leave Microsoft Dynamics NAV after 14 years, most called us "ambitious" for building our own ERP from scratch. As the lead engineer responsible for this migration, I designed a hybrid architecture that eliminated the limitations of off-the-shelf solutions while preserving years of business-critical data. This is the untold technical story of how we successfully migrated a complex retail operation serving the Nordic region to a custom ERP system without disrupting daily operations. I'll share the architecture decisions, migration strategies, and unexpected challenges that shaped our journey away from NAV toward a solution perfectly tailored to our unique retail needs.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://edihasaj.com/images/pages/city-enterprise.jpg" /><media:content medium="image" url="https://edihasaj.com/images/pages/city-enterprise.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>