Priyanshu Mahey.

Fleur

An AI-native code editor that showcases how AI should be integrated into development workflows.

Fleur AI Code Editor
Timeline: January 2026
Stack: React · Rust · Python · gRPC
Focus: AI · Code Intelligence · Embeddings
GitHub Repository: priyanshumahey/fleur
TypeScript 48.7% · Python 25% · Rust 21.9% · Jupyter Notebook 2% · CSS 1.1% · JavaScript 0.7% · Other 0.6%

MOTIVATION

Why build an AI code editor from scratch?

AI-assisted coding tools have become ubiquitous — GitHub Copilot, Cursor, Windsurf — but most developers interact with them as black boxes. You type, a suggestion appears, and you either accept it or don't. The underlying mechanics of how these tools understand your code, what makes a prediction good, and why certain suggestions appear at certain moments remain opaque.

I wanted to peel back that abstraction. Not just to use AI coding tools, but to build one — understanding every layer from file watching and codebase indexing, through embedding generation and semantic retrieval, to next-edit prediction and agentic code generation.

Fleur is the result: a fully functional AI-native code editor built with Tauri, React, Rust, and Python. It implements the three core capabilities found in modern AI editors — codebase indexing, next-edit prediction, and an AI copilot — each built from scratch, with local models wherever latency and privacy allow, so every piece of the pipeline is visible and hackable.


OVERVIEW

Core Capabilities

Three pillars define how AI integrates into the editing experience. Each one required a different approach to model selection, data flow, and UX.

Codebase Indexing

A file watcher streams changes through a Rust → Python pipeline. Tree-sitter parses ASTs into semantic chunks, Jina Code embeds them into 768-dimensional vectors, and ChromaDB stores them for instant semantic search.

Next Edit Prediction

After a 2-second typing pause, the editor sends your original file, current edits, and recent diffs to Sweep's Next-Edit 1.5B model. The predicted file is diffed line-by-line and surfaced as native Monaco ghost text.

AI Copilot

An agentic assistant powered by Azure-hosted models with full codebase context via RAG retrieval. It can reason over multiple files, generate code, and provide inline suggestions — all backed by semantic search over the indexed codebase.


ARCHITECTURE

Editor Anatomy

Fleur is a desktop application built on Tauri, with a React + Monaco frontend and a Rust backend that communicates with Python AI services over gRPC. This split lets the UI stay fast and responsive while heavy inference runs in separate processes.

[Interactive exploded view: Editor Anatomy, 7 layers, including the window chrome, file explorer, Monaco editor tabs, terminal, copilot panel, status bar, and the backend beneath them.]
Interactive exploded view of the editor — hover to explore each layer of the architecture.
INDEXING

Code Indexing

The indexing pipeline is the foundation everything else is built on. Without an understanding of the codebase, neither the copilot nor next-edit prediction can provide meaningful suggestions. The pipeline has two modes: an initial bulk scan that syncs the full workspace, and a real-time watcher that streams incremental updates as you edit.

[Diagram: Client → Processing → Storage. The File Watcher (Tauri events) and the Editor UI (search interface) talk over gRPC to the Chunker (tree-sitter AST) and the Embedder (llama.cpp, Jina Code v2); ChromaDB serves as the vector store and Redis holds metadata, covering both the index and query paths.]
Semantic code search pipeline
Animated pipeline showing data flow from file watcher through to vector storage.

File changes are detected by the notify crate (FSEvents on macOS, inotify on Linux) with a 500ms debounce window. Events pass through a filtering layer — gitignore rules, common exclusions like node_modules, and SHA-256 content hashing to skip unchanged files. Valid events flow into a tokio mpsc channel (capacity 1,000) that feeds the Python backend, where tree-sitter extracts semantic code chunks and Jina Code generates embeddings.
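
The skip-unchanged check is easy to sketch. Here is a minimal Python version, where a plain dict stands in for the metadata store, node_modules is the only exclusion shown, and the paths and helper name are illustrative rather than Fleur's actual Rust implementation:

# Sketch: hash file contents and only re-index files whose SHA-256
# digest differs from the one recorded at the last index pass.
import hashlib
import pathlib

seen: dict[str, str] = {}  # path -> last indexed content hash

def needs_reindex(path: pathlib.Path) -> bool:
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if seen.get(str(path)) == digest:
        return False  # content unchanged; drop the event
    seen[str(path)] = digest
    return True

for f in pathlib.Path(".").rglob("*.py"):
    if "node_modules" not in f.parts and needs_reindex(f):
        print("queue for indexing:", f)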

[Interactive demo: the file watcher]
Simulated file events flowing through the watcher — filtered files are dropped, valid changes are queued for indexing.
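
Downstream of the watcher, chunking and storage look roughly like this. A minimal sketch using the tree_sitter Python bindings and ChromaDB; the file name, collection name, and the fallback to ChromaDB's default embedder are illustrative assumptions, not Fleur's actual code:

# Sketch: AST-based chunking with tree-sitter, then indexing into ChromaDB.
# Assumes tree-sitter, tree-sitter-python, and chromadb are installed.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser
import chromadb

parser = Parser(Language(tspython.language()))

def chunk_file(path: str) -> list[dict]:
    """Extract top-level functions and classes as semantic chunks."""
    source = open(path, "rb").read()
    tree = parser.parse(source)
    chunks = []
    for node in tree.root_node.children:
        if node.type in ("function_definition", "class_definition"):
            chunks.append({
                "text": source[node.start_byte:node.end_byte].decode("utf-8"),
                "start_line": node.start_point[0] + 1,
            })
    return chunks

client = chromadb.PersistentClient(path=".fleur-index")
collection = client.get_or_create_collection("code_chunks")
for chunk in chunk_file("editor.py"):
    # In Fleur the 768-dim vector comes from Jina Code via llama.cpp;
    # ChromaDB's built-in default embedder stands in here.
    collection.add(
        ids=[f"editor.py:{chunk['start_line']}"],
        documents=[chunk["text"]],
        metadatas=[{"path": "editor.py", "start_line": chunk["start_line"]}],
    )
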
PREDICTION

Next Edit Prediction

Next-edit prediction is one of the more compelling AI features in modern editors. Rather than waiting for you to ask for help, the editor observes your editing patterns and proactively suggests what you'll type next. Fleur uses the Sweep Next-Edit 1.5B model (quantized GGUF, running locally via llama.cpp with Metal GPU acceleration) to generate predictions.

The flow works like this: after a 2-second typing pause, the editor collects the original saved file, current editor state, up to 3 context files from open tabs, and your recent diffs. These are assembled into a structured prompt and sent to the model. The predicted file is diffed against the current content line-by-line, and the first changed region is surfaced as Monaco ghost text — the same mechanism VS Code uses for Copilot. Press Tab to accept, Escape to dismiss.
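
The line diff itself is the least magical part of the loop. A sketch of that step using Python's difflib, where first_changed_region is an illustrative helper rather than Fleur's actual API:

# Sketch: locate the first changed region between the current buffer and
# the model's predicted file, where a ghost-text suggestion would anchor.
import difflib

def first_changed_region(current: str, predicted: str):
    """Return (start_line, old_lines, new_lines) for the first differing hunk."""
    cur_lines = current.splitlines()
    pred_lines = predicted.splitlines()
    matcher = difflib.SequenceMatcher(None, cur_lines, pred_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            return i1 + 1, cur_lines[i1:i2], pred_lines[j1:j2]
    return None  # prediction matches the buffer; nothing to suggest

current = "def total(items):\n    pass\n"
predicted = "def total(items):\n    return sum(i.price for i in items)\n"
print(first_changed_region(current, predicted))
# (2, ['    pass'], ['    return sum(i.price for i in items)'])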

[Interactive demo: typing in editor.py triggers a Next Edit ghost-text suggestion.]
The editor detects a typing pause, runs inference, and surfaces ghost text inline — Tab to accept.
[Diagram: Frontend → Tauri Bridge → Python Server. The Monaco editor (inline completions) and a diff tracker (recent changes) invoke the grpc_predict Tauri command; a Rust tonic gRPC client calls NextEditService on port 50051, where a prompt builder (Sweep format) feeds the Next-Edit 1.5B model on llama.cpp with Metal, and the response flows back to the editor.]
Next-edit prediction pipeline
Animated architecture diagram tracing the prediction request from editor to model and back.
AI COPILOT

Copilot

The copilot ties everything together. It calls Azure-hosted models and uses RAG (retrieval-augmented generation) to pull relevant code chunks from the indexed codebase, providing the language model with precise context rather than dumping entire files into the prompt. This means the copilot can answer questions about code it hasn't directly seen — as long as it's semantically related to the query.
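
The retrieval step itself is compact. A minimal sketch of the query path (embed the question, pull the nearest chunks from ChromaDB, assemble a prompt), where the collection name, the number of results, and the prompt shape are assumptions rather than Fleur's exact code:

# Sketch: RAG context assembly for a copilot query.
import chromadb

client = chromadb.PersistentClient(path=".fleur-index")
collection = client.get_or_create_collection("code_chunks")

def build_prompt(question: str, k: int = 5) -> str:
    # In Fleur the query is embedded with the same Jina Code model as the
    # chunks; query_texts falls back to ChromaDB's default embedder here.
    results = collection.query(query_texts=[question], n_results=k)
    context = "\n\n".join(
        f"// {meta['path']}\n{doc}"
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    )
    return (
        "You are a coding assistant. Use the retrieved code context to answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("Where is the authentication middleware?"))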

[Diagram: Frontend → Tauri Bridge → Python Backend, an agentic tool-calling loop capped at 20 iterations. The React chat panel invokes the grpc_chat Tauri command, which opens a streaming gRPC RPC to CopilotService; the agentic loop drives an LLM client (GPT-5.2 / Grok 3) and a tool executor (6 tools), persists conversations in SQLite, and streams events back to an event listener in the UI, where a diff viewer handles file approval.]
AI copilot — agentic tool-calling loop
The copilot pipeline — query embedding, semantic retrieval from ChromaDB, and prompt assembly for the Azure-hosted model.

Every copilot interaction is logged in real time — requests, retrieved context chunks, model responses, and latency. This observability layer was critical during development for debugging retrieval quality and prompt construction.
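
As a sketch, timing one interaction might look like this; the field names, helper, and logging sink are hypothetical, not Fleur's schema:

# Sketch: timing and structuring one copilot interaction for the logging view.
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("fleur.copilot")

def logged_chat(query: str, retrieve, generate) -> str:
    t0 = time.perf_counter()
    chunks = retrieve(query)          # semantic retrieval
    t1 = time.perf_counter()
    answer = generate(query, chunks)  # model call
    t2 = time.perf_counter()
    log.info(json.dumps({
        "query": query,
        "retrieved_chunks": len(chunks),
        "retrieval_ms": round((t1 - t0) * 1000, 1),
        "generation_ms": round((t2 - t1) * 1000, 1),
        "answer_preview": answer[:80],
    }))
    return answer

# Stub callables to make the sketch runnable end-to-end.
logged_chat(
    "what does the chunker do?",
    retrieve=lambda q: ["chunk A", "chunk B"],
    generate=lambda q, c: "The chunker extracts AST-level semantic units.",
)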

Fleur copilot logging dashboard
Live logging view tracking copilot requests, retrieved code context, and model responses.

The copilot generates proposed changes that the user reviews inline before accepting or declining — no code is applied automatically. This review step keeps the developer in control while still benefiting from multi-file reasoning.

Fleur AI Copilot in action
The copilot suggests code changes — the user reviews each proposal and accepts or declines before anything is applied.
KEY DECISIONS

Design Decisions & Trade-offs

Local models where possible, cloud where it matters

Embeddings (Jina Code) and next-edit prediction (Sweep 1.5B) run locally for low latency and privacy. The copilot uses Azure-hosted models for the reasoning capability that agentic workflows demand. This hybrid approach keeps the fast feedback loops local while leveraging cloud inference for the tasks that benefit most from larger models.

Streaming over batching

The indexing pipeline uses streaming gRPC rather than batch processing. File events flow through tokio channels in FIFO order with no priority system. This keeps the architecture simple and latency low for real-time edits, at the cost of not being able to prioritize currently-open files.
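
For illustration, here is the same pattern mirrored from tokio into Python's asyncio: a bounded FIFO queue in which a full buffer applies backpressure to the producer. The event strings and handlers are stand-ins:

# Sketch: the bounded FIFO channel pattern (tokio mpsc, capacity 1,000)
# mirrored in asyncio. No priority system: events are processed strictly
# in arrival order.
import asyncio

async def watcher(queue: asyncio.Queue) -> None:
    for i in range(5):
        await queue.put(f"changed: file_{i}.py")  # blocks when the queue is full
    await queue.put(None)  # sentinel: no more events

async def indexer(queue: asyncio.Queue) -> None:
    while (event := await queue.get()) is not None:
        print("indexing", event)  # chunk, embed, and store in real code

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=1000)
    await asyncio.gather(watcher(queue), indexer(queue))

asyncio.run(main())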

Tree-sitter for chunking

Rather than naive line-based splitting, Fleur uses tree-sitter AST parsing to extract semantic units — functions, classes, methods. This produces more meaningful embeddings since each chunk represents a coherent code concept rather than an arbitrary line range.

Rust + Python split

File watching, event processing, and the Tauri shell are in Rust for performance. AI inference and vector storage are in Python for ecosystem access (llama.cpp bindings, ChromaDB, tree-sitter). gRPC bridges the two, allowing each layer to use the best tool for the job.


REFLECTION

What I Learned

Building Fleur end-to-end — from OS-level file watchers to GPU-accelerated model inference — taught me a few things that wouldn't have been obvious from just using these tools.

The hard part isn't the model — it's the context.

Model inference is relatively straightforward. The real engineering challenge is figuring out what context to feed the model and when. For next-edit prediction, sending too much context overwhelms the model; too little makes predictions useless. Getting the prompt format, diff history window, and debounce timing right took far more iteration than model integration.

Embeddings are surprisingly powerful for code.

Semantic search over code embeddings works remarkably well — even across different languages. A query about "authentication middleware" correctly surfaces relevant handlers in both TypeScript and Python. The quality of chunks matters far more than the embedding model; tree-sitter chunking made a dramatic difference compared to naive splitting.

Cross-language systems are humbling.

Coordinating TypeScript, Rust, and Python across gRPC boundaries meant debugging in three different paradigms simultaneously. A mismatched protobuf field in Rust would surface as a cryptic empty response in the React UI. The experience gave me much more empathy for the complexity behind tools like VS Code and Cursor.

