Anthill
A collaborative research-paper editor where AI agents edit the document directly over a shared Yjs CRDT, with citations grounded against the actual cited PDF.
MOTIVATION
Why put AI inside the document?
Writing a research paper is mostly context-switching, and the tools we use treat AI like a sidebar.
A normal revision loop looks like this. Draft a paragraph. Tab away to find prior work. Tab away again to format the citation. Tab away a third time to read the reviewer's email and figure out which paragraph it's actually about. By the time you get back to the document, you've forgotten what you were trying to say. Most published research on "AI for writing" measures the wrong thing: the quality of the suggestion, not the cost of leaving the page to ask for one.
Today's AI writing tools all share the same shape. A chat panel sits next to the document and talks about it without ever touching it. The chat suggests, the user copies, the user pastes, the user formats. The chrome is different but the cost is the same.
Anthill is a research-paper editor that puts the AI on the same surface as the human. It is built on Plate over a Yjs CRDT, and it exposes a small HTTP bridge that lets agents read the live document and write back into it as Yjs transactions. The browser sees agent edits the same way it sees a collaborator's edits: they appear in real time, with an avatar in the corner attributing the change. There is no copy, no paste, and no sidebar. Citations, reviewer feedback, and literature search all happen on the same page as the prose.
It was built over a weekend at the Anthropic × Nia × AgentMail hackathon, and this post walks the system end-to-end: the embedding store, the agent bridge, the auto-cite plugin, Nia-grounded verification, and the review-response agent.
OVERVIEW
The four pieces
A Plate editor in the browser, a Bun process holding the CRDT, a FastAPI backend behind a dev tunnel, and one shared Postgres.
Editor (web)
A Next.js + Plate editor with a citation-suggest plugin, a five-state citation badge, an agents panel, and a "connect agent" dialog. Live edits flow over a WebSocket; agent runs stream over Server-Sent Events. The browser never talks to the FastAPI backend or to Nia directly.
Collab (collab/)
One Bun process exposing two ports. Hocuspocus on the WebSocket handles Yjs sync and presence. The agent bridge on :8889 accepts HTTP edit ops and writes them into the same Y.Doc via openDirectConnection. That colocation is what makes "agents and humans share one CRDT" cheap.
Backend (backend/)
FastAPI on a laptop, exposed via dev tunnel. Hosts the Harrier-embedded local arXiv corpus, the /search and /embed endpoints, and an agent runner that spawns one of eight named agents per call and streams events back through a per-run SSE channel.
CORPUS
A thousand papers, embedded locally
The retrieval substrate is ~1,000 cs.AI papers, embedded with Harrier-OSS-v1 270M through llama.cpp.
At hackathon scale you embed once and query a thousand times, and the GPU sits right under the desk. So the embedding step lives in the same FastAPI process that serves search, and the index is a single Chroma collection on disk. The chunker walks each PDF with PyMuPDF, slides a 512-word window with 64-word overlap, and writes (arxiv_id, chunk_index, char_start, char_end) alongside every vector. Storing char offsets per chunk costs nothing during ingest and pays for itself the first time someone wants a citation badge that deep-links to the source PDF.
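The windowing logic is simple enough to sketch. This is an illustrative reconstruction, not the shipped chunker (which extracts text through PyMuPDF first); the function and field names are assumptions, but the offsets-per-chunk idea is the one described above:

```python
def chunk_words(text: str, window: int = 512, overlap: int = 64) -> list[dict]:
    """Slide a fixed word window over a document, recording char offsets.

    Word positions are recovered by scanning forward, so (char_start,
    char_end) point back into the original text -- which is what lets a
    citation badge deep-link into the source PDF later.
    """
    # Record each word together with its char offsets in the original text.
    words = []
    pos = 0
    for w in text.split():
        start = text.index(w, pos)
        words.append((w, start, start + len(w)))
        pos = start + len(w)

    chunks = []
    step = window - overlap
    for i in range(0, max(len(words) - overlap, 1), step):
        batch = words[i:i + window]
        if not batch:
            break
        chunks.append({
            "chunk_index": len(chunks),
            "char_start": batch[0][1],
            "char_end": batch[-1][2],
            "text": text[batch[0][1]:batch[-1][2]],
        })
    return chunks
```

Each chunk's text is a verbatim slice of the source, so the offsets stay valid no matter how the text is later displayed.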
Two implementation gotchas were worth more time than the rest of the pipeline combined. Harrier expects an instruction prefix on every query ("Instruct: Given a scientific query, retrieve relevant paper passages\nQuery: "); without it the cosine scores collapse because the query vectors land in a different region of the space than the indexed chunks. And llama.cpp's batched embedding path errors on this model, so the script falls back to one create_embedding(text) call per chunk. Slower, reliable, and worth the warm-up time on the first /search request.
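The asymmetry matters: the prefix goes on queries only, never on indexed chunks. A sketch, with `embed` standing in for whatever wraps llama.cpp's per-chunk `create_embedding` call (the helper names are illustrative):

```python
# The instruction prefix Harrier expects on queries (quoted above).
QUERY_PREFIX = (
    "Instruct: Given a scientific query, retrieve relevant paper passages"
    "\nQuery: "
)

def embed_query(text: str, embed) -> list[float]:
    # Queries get the instruction prefix so their vectors land in the
    # same region of embedding space as the indexed chunks.
    return embed(QUERY_PREFIX + text)

def embed_chunk(text: str, embed) -> list[float]:
    # Chunks are embedded bare, one call per chunk, since the batched
    # embedding path errors on this model.
    return embed(text)
```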
AUTO-CITE
Type, pause, Tab
The smallest moving part in the system, and the one that makes the editor feel alive.
The Plate plugin in citation-suggest-kit.tsx tracks the active block in module-scope refs (no React re-renders on every keystroke). After 1.2 seconds of inactivity on a paragraph with at least 30 characters, the block text is sent to the FastAPI /search endpoint via a Next.js proxy that holds the shared secret. The top hit becomes a ghost-text pill anchored at the caret. Tab inserts it as a Plate citation inline element carrying the full search trace. Esc dismisses, and the plugin remembers the dismissed text in a Map<blockId, lastQueriedText> so it doesn't re-fire on the next keystroke that lands the cursor back on the block.
Both thresholds are configurable through DEFAULT_OPTIONS in the plugin source.
Three small behaviors took most of the time. The plugin re-checks refs.blockId and refs.blockText when the response comes back, so if the user kept typing while the request was in flight, the result is dropped and no ghost pill appears. The same memoization map prevents re-querying an unchanged paragraph after every cursor move. And when the top-k contains several closely-tied scores (top.score - score ≤ 0.08), Tab inserts up to maxInsert = 3 badges instead of one, because most paragraphs cite one paper but the ones that need a cluster really need a cluster.
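The tie rule is small enough to state as code. A sketch using the thresholds quoted above; the function name and hit shape are assumptions:

```python
def citations_to_insert(hits: list[dict],
                        tie_threshold: float = 0.08,
                        max_insert: int = 3) -> list[dict]:
    """Given score-sorted search hits, return the badge(s) Tab inserts.

    Usually just the top hit, but when several papers score within
    tie_threshold of the best, up to max_insert go in as a cluster.
    """
    if not hits:
        return []
    top = hits[0]["score"]
    cluster = [h for h in hits if top - h["score"] <= tie_threshold]
    return cluster[:max_insert]
```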
The inserted citation node carries the entire search trace (query, latency, top-k candidates), so clicking the badge opens a popover that shows why the agent picked this paper. That trace is the affordance that makes auto-cite trustworthy enough to leave on while you write.
VERIFICATION
Five badge states
Embeddings are good for "probably relevant" and bad for "actually supports the claim". That's where Nia comes in.
The moment a citation is accepted, the editor fires a ground_citation agent run with the inserted node's identity and the surrounding paragraph as a claim. Nia's document/agent endpoint reads the actual cited PDF and returns a structured verdict against a JSON schema we hand it. The browser is listening on the agent's SSE stream; when a finding event arrives with kind: 'grounded_citation', the verification driver finds the matching node by (arxivId, chunkIndex, searchedAt) (stable across Yjs reorderings) and merges the verdict into its verification field. The badge re-renders in place.
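The matching step might look like this sketch (the node and finding shapes are assumptions; the identifying triple is the one described above):

```python
def apply_verdict(nodes: list[dict], finding: dict):
    """Merge a grounded_citation verdict into the matching citation node.

    The triple (arxivId, chunkIndex, searchedAt) identifies the node
    stably even after Yjs reorderings, unlike a positional path.
    """
    key = (finding["arxivId"], finding["chunkIndex"], finding["searchedAt"])
    for node in nodes:
        if (node["arxivId"], node["chunkIndex"], node["searchedAt"]) == key:
            node.setdefault("verification", {}).update(finding["verdict"])
            return node
    return None  # node was deleted before the verdict arrived
```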
A claim the verifier might be handed: “We find that retrieval-augmented generation reduces hallucination by 39% on the long-tail subset of TriviaQA-Web compared to the no-retrieval baseline.”
One of the five badge states, not_ready, exists because Nia silently hallucinates if you query a still-indexing source.
The schema we send is intentionally strict, so the agent has to answer the verification question rather than editorialize. Page number and section path are nullable, confidence is bounded to [0, 1], and an exact quote is required, with the contract that it must be verbatim, not paraphrased.
{
  "type": "object",
  "properties": {
    "supports_claim": {
      "type": "boolean",
      "description": "True if the cited paper directly supports..."
    },
    "exact_quote": {
      "type": "string",
      "description": "Verbatim sentence — no paraphrase. Empty if none."
    },
    "page_number": { "type": ["integer", "null"] },
    "section_path": { "type": ["string", "null"] },
    "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
    "rationale": { "type": "string" }
  },
  "required": ["supports_claim", "exact_quote", "confidence", "rationale"]
}

A typical verdict comes back like this:

{
  "supports_claim": true,
  "exact_quote": "We find that retrieval-augmented generation reduces hallucination by 39% on the long-tail subset of TriviaQA-Web compared to the no-retrieval baseline.",
  "page_number": 7,
  "section_path": "Methods > Architecture",
  "confidence": 0.91,
  "rationale": "Quote directly compares the two baselines on the cited benchmark."
}

The Nia client lives in nia_client.py.
BRIDGE
How agents share the CRDT
The naive way to put an agent in a doc is to give it a tool that calls setValue(plate_value). That nukes anyone else's in-flight edits. The Anthill bridge does something more careful.
The bridge opens a Hocuspocus direct connection to the live document and mutates the Yjs fragment inside a Y.transact block, with the agent's identity stamped on the origin so undo and presence stay sane. It is a small Bun HTTP server with a deliberately small surface: discovery, snapshot, full state, edit, presence, and a dev-only repair endpoint. Edits are validated, applied as a single transaction, and broadcast through the same WebSocket every browser is already on.
{
"ops": [{
"type": "appendInline",
"ref": "b3",
"element": {
"type": "citation",
"arxivId": "2305.06983",
"chunkIndex": 4,
"title": "CodeT5+",
"score": 0.79,
"children": [{ "text": "" }]
}
}]
}

The op shapes here match collab/src/types.ts exactly.
Stable block refs
Every snapshot exposes blocks as b1, b2, ... by ordinal position. Agents reason about "the third paragraph", not about Slate paths that shift on every keystroke. Snapshots also carry inline children (citations, mentions) so the agent knows what would be lost on a destructive edit.
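A sketch of how a snapshot might assign those refs (block and child shapes are illustrative):

```python
def snapshot_blocks(blocks: list[dict]) -> list[dict]:
    """Expose blocks as b1, b2, ... by ordinal position, surfacing
    inline children so an agent can see what a destructive edit
    would drop."""
    out = []
    for i, block in enumerate(blocks, start=1):
        children = block.get("children", [])
        inlines = [c for c in children
                   if c.get("type") in ("citation", "mention")]
        out.append({
            "ref": f"b{i}",
            "text": "".join(c.get("text", "") for c in children),
            "inlines": inlines,
        })
    return out
```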
Optimistic locking
Every snapshot returns a baseRevision: a content-hashed token derived from the Yjs state. Pass it on POST /edit and the bridge refuses stale writes with 409 STALE_REVISION. That's how we avoid an agent stomping on a human typing in the same paragraph.
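A minimal sketch of the idea, assuming the token is a short content hash of the serialized state (the real bridge derives it from the Yjs state, not from JSON):

```python
import hashlib
import json

def base_revision(doc_state: dict) -> str:
    # Content-hash the document state; any concurrent edit changes it.
    blob = json.dumps(doc_state, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]

def check_revision(doc_state: dict, claimed: str) -> tuple[int, str]:
    if base_revision(doc_state) != claimed:
        # The bridge answers 409 STALE_REVISION; the agent must
        # re-snapshot before retrying.
        return (409, "STALE_REVISION")
    return (200, "OK")
```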
Idempotency keys
Every edit requires an Idempotency-Key. Same key with the same body returns the cached response, so a flaky network never double-applies. Same key with a different body returns 409 IDEMPOTENCY_KEY_REUSED_DIFFERENT_BODY, so a buggy agent can't quietly mutate history.
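The contract is easy to sketch (an illustrative in-memory store, not the bridge's implementation):

```python
import hashlib
import json

class IdempotencyStore:
    """Replay-safe edit cache: same key + same body -> cached response;
    same key + different body -> conflict."""

    def __init__(self):
        self._seen = {}  # key -> (body_hash, response)

    def apply(self, key: str, body: dict, handler):
        body_hash = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if key in self._seen:
            prev_hash, response = self._seen[key]
            if prev_hash != body_hash:
                return (409, "IDEMPOTENCY_KEY_REUSED_DIFFERENT_BODY")
            return (200, response)  # replay: return the cached response
        response = handler(body)   # apply the edit exactly once
        self._seen[key] = (body_hash, response)
        return (200, response)
```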
Carry-inlines on destructive edits
replaceBlock and setBlockText auto-reattach existing inline citations to the new text. deleteBlock refuses with 409 INLINE_ELEMENTS_WOULD_BE_LOST when the block carries inlines, unless the agent passes dropInlineElements: true. Agents reading text-only snapshots rarely know inlines exist; silently losing them would destroy the bibliography.
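A sketch of both behaviors (shapes are simplified; the real ops mutate the Yjs fragment inside a transaction):

```python
def replace_block(block: dict, new_text: str) -> dict:
    """replaceBlock semantics: swap the text, but re-attach existing
    inline citations rather than silently dropping them."""
    inlines = [c for c in block["children"] if c.get("type") == "citation"]
    block["children"] = [{"text": new_text}] + inlines
    return block

def delete_block(block: dict, drop_inline_elements: bool = False):
    """deleteBlock refuses when inlines would be lost, unless the
    agent explicitly opts in with dropInlineElements."""
    inlines = [c for c in block["children"] if c.get("type") == "citation"]
    if inlines and not drop_inline_elements:
        return (409, "INLINE_ELEMENTS_WOULD_BE_LOST")
    return (200, None)
```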
Because the bridge is just HTTP and a documented op vocabulary, any agent can drive the document. The "Connect agent" dialog in the editor hands you a copy-paste prompt prefilled with this document's bridge URL, ID, headers, and op reference, ready to paste into Claude Code or ChatGPT. Within a minute of opening the dialog, you can have an external LLM rewriting your introduction in real time.
Bridge URL: https://collab.anthill.app:8889
Document ID: 003cb3da-9f17-4c44-9d3e-2a8e0f1b7e1c
Protocol: anthill-agent-bridge/1

PRESENCE
One CRDT, many cursors
Agents publish to the same Yjs awareness map every human is on. The editor doesn't care what kind of client you are.
Hocuspocus is the only thing that owns presence. Browsers publish their cursors through the standard @platejs/yjs awareness binding. Agents publish through the bridge's POST /presence endpoint, which writes into the same awareness map from the bridge's openDirectConnection handle. The avatar stack in the top-right of the editor renders both the same way (a colored circle with initials, agents marked with a soft pulse ring).
SEARCH
Plan, discover, rank
Auto-cite handles paragraphs you've written. The literature-search agent handles the inverse: "I want to write about X, what's out there?"
The agent in literature_search.py is a planner-then-searcher with optional discovery. The planner uses gpt-4o-mini (cheap, fast, perfectly fine at this; the rest of the system uses Claude Sonnet 4 but the planner doesn't need it) to fan the topic out into 4 short sub-queries. Each sub-query gets its own Chroma top-k pass, and the results are merged best-per-paper so you don't see the same arXiv ID four times.
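The best-per-paper merge might look like this (a sketch; field names are assumptions):

```python
def merge_best_per_paper(result_lists: list[list[dict]]) -> list[dict]:
    """Merge several top-k passes, keeping only the best-scoring chunk
    per arXiv ID so the same paper never appears four times."""
    best = {}
    for results in result_lists:
        for hit in results:
            cur = best.get(hit["arxiv_id"])
            if cur is None or hit["score"] > cur["score"]:
                best[hit["arxiv_id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```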
The planner's sub-queries surface in the agents panel as a plan_done event.
The interesting bit is discovery. If the user opts in, the agent also queries arXiv directly for fresh papers that aren't in the local Chroma yet. Each candidate is downloaded, chunked, embedded with Harrier, and inserted into the same collection during the run. By the time the search step runs, the new papers are queryable like everything else. Every step of the way (plan, discover, discover_ingest, search, rank, each finding) is published as an SSE event, so the agents panel renders the run live: planner queries appear, new papers download, the rankings update.
REVIEW
Reviewer email to tracked changes
A peer reviewer sends an email. Four tracked-change cards appear in the doc, anchored to the right paragraphs.
This is the AgentMail integration, and it is the feature that most concretely shows the value of the bridge: an external event becomes a document event with no human in the loop. The shipped UX is a paste-text form in the agents panel; the agent itself accepts both modes (paste-text or AgentMail by inbox_id+message_id), and the bridge end of the pipeline is identical either way.
The agent in review_response.py snapshots the doc, asks Claude Sonnet 4 to map the email into a structured action list ({kind, anchor_ref, replacement, rationale}), validates each anchor_ref against the snapshot (Claude doesn't get to invent block IDs), and posts each surviving action through the bridge as an addNote op with idempotency key review:{run_id}:{anchor_ref}:{kind}. Same key on retry, same response, no double-applied edits.
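The validation step might look like this sketch (action and snapshot shapes are assumptions; the key format is the one quoted above):

```python
def validate_actions(actions: list[dict], snapshot_refs: list[str]) -> list[dict]:
    """Drop any action whose anchor_ref is not a real block in the
    snapshot (the model doesn't get to invent block IDs), and stamp
    each survivor with a deterministic idempotency key."""
    valid = set(snapshot_refs)
    out = []
    for a in actions:
        if a["anchor_ref"] not in valid:
            continue  # hallucinated anchor: silently discarded
        key = f"review:{a['run_id']}:{a['anchor_ref']}:{a['kind']}"
        out.append({**a, "idempotency_key": key})
    return out
```

Because the key is derived from the run, the anchor, and the kind, a retried run regenerates the same keys and the bridge's idempotency cache absorbs the duplicates.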
The crucial UX decision is that the agent never edits prose destructively. Every textual change goes in as an addNote with kind: 'suggestion', rendered in the editor as an accept/reject card anchored under the original block. It is the peer-review experience everyone already knows from Word, with an LLM as the proposer instead of a human.
REFLECTION