Fleur
An AI-native code editor that showcases how AI should be integrated into development workflows.

MOTIVATION
Why build an AI code editor from scratch?
AI-assisted coding tools have become ubiquitous — GitHub Copilot, Cursor, Windsurf — but most developers interact with them as black boxes. You type, a suggestion appears, and you either accept it or don't. The underlying mechanics of how these tools understand your code, what makes a prediction good, and why certain suggestions appear at certain moments remain opaque.
I wanted to peel back that abstraction. Not just to use AI coding tools, but to build one — understanding every layer from file watching and codebase indexing, through embedding generation and semantic retrieval, to next-edit prediction and agentic code generation.
Fleur is the result: a fully functional AI-native code editor built with Tauri, React, Rust, and Python. It implements the three core capabilities found in modern AI editors — codebase indexing, next-edit prediction, and an AI copilot — each built from scratch with local models, so every piece of the pipeline is visible and hackable.
OVERVIEW
Core Capabilities
Three pillars define how AI integrates into the editing experience. Each one required a different approach to model selection, data flow, and UX.
Codebase Indexing
A file watcher streams changes through a Rust → Python pipeline. Tree-sitter parses ASTs into semantic chunks, Jina Code embeds them into 768-dimensional vectors, and ChromaDB stores them for instant semantic search.
Next Edit Prediction
After a 2-second typing pause, the editor sends your original file, current edits, and recent diffs to Sweep's Next-Edit 1.5B model. The predicted file is diffed line-by-line and surfaced as native Monaco ghost text.
AI Copilot
An agentic assistant powered by Azure-hosted models with full codebase context via RAG retrieval. It can reason over multiple files, generate code, and provide inline suggestions — all backed by semantic search over the indexed codebase.
ARCHITECTURE
Editor Anatomy
Fleur is a desktop application built on Tauri, with a React + Monaco frontend and a Rust backend that communicates with Python AI services over gRPC. This split lets the UI stay fast and responsive while heavy inference runs in separate processes.
INDEXING
Code Indexing
The indexing pipeline is the foundation everything else is built on. Without an understanding of the codebase, neither the copilot nor next-edit prediction can provide meaningful suggestions. The pipeline has two modes: an initial bulk scan that syncs the full workspace, and a real-time watcher that streams incremental updates as you edit.
File changes are detected by the notify crate (FSEvents on macOS, inotify on Linux) with a 500ms debounce window. Events pass through a filtering layer — gitignore rules, common exclusions like node_modules, and SHA-256 content hashing to skip unchanged files. Valid events flow into a tokio mpsc channel (capacity 1,000) that feeds the Python backend, where tree-sitter extracts semantic code chunks and Jina Code generates embeddings.
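The Python end of that handoff is conceptually small. Below is a minimal sketch, assuming the sentence-transformers loader for the Jina Code model and a local ChromaDB collection; the storage path, collection name, and ID scheme are placeholders rather than Fleur's actual configuration.

import chromadb
from sentence_transformers import SentenceTransformer

# Embedding model and vector store; names and paths here are illustrative.
embedder = SentenceTransformer("jinaai/jina-embeddings-v2-base-code", trust_remote_code=True)
collection = chromadb.PersistentClient(path=".fleur/index").get_or_create_collection("code_chunks")

def index_chunks(path: str, chunks: list[str]) -> None:
    # Each chunk (a function, class, or method) becomes one 768-dimensional vector.
    embeddings = embedder.encode(chunks).tolist()
    collection.upsert(
        ids=[f"{path}:{i}" for i in range(len(chunks))],
        embeddings=embeddings,
        documents=chunks,
        metadatas=[{"path": path} for _ in chunks],
    )

def search(query: str, k: int = 5) -> list[str]:
    # Semantic search: embed the query and return the k nearest chunks.
    hits = collection.query(query_embeddings=[embedder.encode(query).tolist()], n_results=k)
    return hits["documents"][0]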
PREDICTION
Next Edit Prediction
Next-edit prediction is one of the more compelling AI features in modern editors. Rather than waiting for you to ask for help, the editor observes your editing patterns and proactively suggests what you'll type next. Fleur uses the Sweep Next-Edit 1.5B model (quantized GGUF, running locally via llama.cpp with Metal GPU acceleration) to generate predictions.
The flow works like this: after a 2-second typing pause, the editor collects the original saved file, current editor state, up to 3 context files from open tabs, and your recent diffs. These are assembled into a structured prompt and sent to the model. The predicted file is diffed against the current content line-by-line, and the first changed region is surfaced as Monaco ghost text — the same mechanism VS Code uses for Copilot. Press Tab to accept, Escape to dismiss.
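A minimal sketch of that round-trip, using the llama-cpp-python bindings and Python's difflib; the GGUF filename, prompt template, and generation settings are illustrative assumptions rather than the exact format the Sweep model expects.

import difflib
from llama_cpp import Llama

# Model path and settings are placeholders; n_gpu_layers=-1 offloads all layers
# to the GPU (Metal on macOS).
llm = Llama(model_path="models/sweep-next-edit-1.5b.gguf", n_ctx=8192, n_gpu_layers=-1)

def predict_next_edit(original: str, current: str, recent_diffs: str) -> list[str]:
    # Assemble a structured prompt from the saved file, the live buffer,
    # and recent edit history (the tags used here are illustrative).
    prompt = (
        "<original_file>\n" + original + "\n</original_file>\n"
        "<current_file>\n" + current + "\n</current_file>\n"
        "<recent_edits>\n" + recent_diffs + "\n</recent_edits>\n"
        "<predicted_file>\n"
    )
    predicted = llm(prompt, max_tokens=2048, temperature=0.0)["choices"][0]["text"]
    # Line-by-line diff against the live buffer; the first changed hunk
    # becomes the ghost-text suggestion.
    return list(difflib.unified_diff(current.splitlines(), predicted.splitlines(), lineterm=""))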
AI COPILOT
Copilot
The copilot ties everything together. It calls Azure-hosted models and uses RAG (retrieval-augmented generation) to pull relevant code chunks from the indexed codebase, providing the language model with precise context rather than dumping entire files into the prompt. This means the copilot can answer questions about code it hasn't directly seen — as long as it's semantically related to the query.
Every copilot interaction is logged in real time — requests, retrieved context chunks, model responses, and latency. This observability layer was critical during development for debugging retrieval quality and prompt construction.

The copilot generates proposed changes that the user reviews inline before accepting or declining — no code is applied automatically. This review step keeps the developer in control while still benefiting from multi-file reasoning.
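The retrieval step itself is a thin layer over the index. The sketch below reuses the search() helper from the indexing sketch earlier and calls an Azure-hosted chat model through the openai client; the endpoint, API version, and deployment name are placeholders.

from openai import AzureOpenAI

# Credentials and deployment name are placeholders, not Fleur's configuration.
client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",
    api_key="<key>",
    api_version="2024-06-01",
)

def ask_copilot(question: str) -> str:
    # RAG: retrieve the top-k semantically relevant chunks (search() from the
    # indexing sketch), then hand only those chunks to the model as context.
    context = "\n\n".join(search(question, k=5))
    response = client.chat.completions.create(
        model="<deployment-name>",
        messages=[
            {"role": "system", "content": "Answer using the provided code context."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content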

KEY DECISIONS
Design Decisions & Trade-offs
Local models where possible, cloud where it matters
Embeddings (Jina Code) and next-edit prediction (Sweep 1.5B) run locally for low latency and privacy. The copilot uses Azure-hosted models for the reasoning capability that agentic workflows demand. This hybrid approach keeps the fast feedback loops local while leveraging cloud inference for the tasks that benefit most from larger models.
Streaming over batching
The indexing pipeline uses streaming gRPC rather than batch processing. File events flow through tokio channels in FIFO order with no priority system. This keeps the architecture simple and latency low for real-time edits, at the cost of not being able to prioritize currently-open files.
Tree-sitter for chunking
Rather than naive line-based splitting, Fleur uses tree-sitter AST parsing to extract semantic units — functions, classes, methods. This produces more meaningful embeddings since each chunk represents a coherent code concept rather than an arbitrary line range.
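A minimal sketch of that extraction, assuming the tree_sitter_languages helper package for prebuilt grammars; a production chunker would cover more node types, nested definitions, and languages.

from tree_sitter_languages import get_parser

def chunk_source(source: str, language: str = "python") -> list[str]:
    # Parse into an AST and keep each top-level function or class as one chunk,
    # so every embedding corresponds to a coherent semantic unit.
    parser = get_parser(language)
    src = source.encode()
    tree = parser.parse(src)
    chunks = []
    for node in tree.root_node.children:
        if node.type in ("function_definition", "class_definition"):
            chunks.append(src[node.start_byte:node.end_byte].decode())
    return chunks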
Rust + Python split
File watching, event processing, and the Tauri shell are in Rust for performance. AI inference and vector storage are in Python for ecosystem access (llama.cpp bindings, ChromaDB, tree-sitter). gRPC bridges the two, allowing each layer to use the best tool for the job.
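To make the bridge concrete, the sketch below shows the Python end as a client-streaming RPC that consumes file events from the Rust watcher and feeds them through the chunking and indexing helpers sketched above. The generated stubs (indexer_pb2, indexer_pb2_grpc), service name, and message fields are hypothetical, since the actual proto definitions are not shown here.

from concurrent import futures
import grpc
import indexer_pb2        # hypothetical generated stubs
import indexer_pb2_grpc   # hypothetical generated stubs

class IndexerService(indexer_pb2_grpc.IndexerServicer):
    def StreamFileEvents(self, request_iterator, context):
        # Events arrive in the order the Rust watcher sent them (FIFO);
        # each one is chunked, embedded, and upserted as it arrives.
        for event in request_iterator:
            index_chunks(event.path, chunk_source(event.content))
        return indexer_pb2.IndexAck(ok=True)

server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
indexer_pb2_grpc.add_IndexerServicer_to_server(IndexerService(), server)
server.add_insecure_port("127.0.0.1:50051")
server.start()
server.wait_for_termination()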
REFLECTION
What I Learned
Building Fleur end-to-end — from OS-level file watchers to GPU-accelerated model inference — taught me a few things that wouldn't have been obvious from just using these tools.
The hard part isn't the model — it's the context.
Model inference is relatively straightforward. The real engineering challenge is figuring out what context to feed the model and when. For next-edit prediction, sending too much context overwhelms the model; too little makes predictions useless. Getting the prompt format, diff history window, and debounce timing right took far more iteration than model integration.
Embeddings are surprisingly powerful for code.
Semantic search over code embeddings works remarkably well — even across different languages. A query about "authentication middleware" correctly surfaces relevant handlers in both TypeScript and Python. The quality of chunks matters far more than the embedding model; tree-sitter chunking made a dramatic difference compared to naive splitting.
Cross-language systems are humbling.
Coordinating TypeScript, Rust, and Python across gRPC boundaries meant debugging in three different paradigms simultaneously. A mismatched protobuf field in Rust would surface as a cryptic empty response in the React UI. The experience gave me much more empathy for the complexity behind tools like VS Code and Cursor.