Fleur
An AI-native code editor that showcases how AI should be integrated into development workflows.

MOTIVATION
Why build an AI code editor from scratch?
AI-assisted coding tools have become ubiquitous — GitHub Copilot, Cursor, Windsurf — but most developers interact with them as black boxes. You type, a suggestion appears, and you either accept it or don't. The underlying mechanics of how these tools understand your code, what makes a prediction good, and why certain suggestions appear at certain moments remain opaque.
I wanted to peel back that abstraction. Not just to use AI coding tools, but to build one — understanding every layer from file watching and codebase indexing, through embedding generation and semantic retrieval, to next-edit prediction and agentic code generation.
Fleur is the result: a fully functional AI-native code editor built with Tauri, React, Rust, and Python. It implements the three core capabilities found in modern AI editors — codebase indexing, next-edit prediction, and an AI copilot — each built from scratch with local models, so every piece of the pipeline is visible and hackable.
OVERVIEW
Core Capabilities
Three pillars define how AI integrates into the editing experience. Each one required a different approach to model selection, data flow, and UX.
Codebase Indexing
A file watcher streams changes through a Rust → Python pipeline. Tree-sitter parses ASTs into semantic chunks, Jina Code embeds them into 768-dimensional vectors, and ChromaDB stores them for instant semantic search.
Next Edit Prediction
After a 2-second typing pause, the editor sends your original file, current edits, and recent diffs to Sweep's Next-Edit 1.5B model. The predicted file is diffed line-by-line and surfaced as native Monaco ghost text.
AI Copilot
An agentic assistant powered by Azure-hosted models with full codebase context via RAG retrieval. It can reason over multiple files, generate code, and provide inline suggestions — all backed by semantic search over the indexed codebase.
ARCHITECTURE
Editor Anatomy
Fleur is a desktop application built on Tauri, with a React + Monaco frontend and a Rust backend that communicates with Python AI services over gRPC. This split lets the UI stay fast and responsive while heavy inference runs in separate processes.
INDEXING
Code Indexing
The indexing pipeline is the foundation everything else is built on. Without an understanding of the codebase, neither the copilot nor next-edit prediction can provide meaningful suggestions. The pipeline has two modes: an initial bulk scan that syncs the full workspace, and a real-time watcher that streams incremental updates as you edit.
File changes are detected by the notify crate (FSEvents on macOS, inotify on Linux) with a 500ms debounce window. Events pass through a filtering layer — gitignore rules, common exclusions like node_modules, and SHA-256 content hashing to skip unchanged files. Valid events flow into a tokio mpsc channel (capacity 1,000) that feeds the Python backend, where tree-sitter extracts semantic code chunks and Jina Code generates embeddings.
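The Python end of that handoff is conceptually small. Below is a minimal sketch, assuming the sentence-transformers loader for the Jina Code model and a local ChromaDB collection; the storage path, collection name, and ID scheme are placeholders rather than Fleur's actual configuration.

import chromadb
from sentence_transformers import SentenceTransformer

# Embedding model and vector store; names and paths here are illustrative.
embedder = SentenceTransformer("jinaai/jina-embeddings-v2-base-code", trust_remote_code=True)
collection = chromadb.PersistentClient(path=".fleur/index").get_or_create_collection("code_chunks")

def index_chunks(path: str, chunks: list[str]) -> None:
    # Each chunk (a function, class, or method) becomes one 768-dimensional vector.
    embeddings = embedder.encode(chunks).tolist()
    collection.upsert(
        ids=[f"{path}:{i}" for i in range(len(chunks))],
        embeddings=embeddings,
        documents=chunks,
        metadatas=[{"path": path} for _ in chunks],
    )

def search(query: str, k: int = 5) -> list[str]:
    # Semantic search: embed the query and return the k nearest chunks.
    hits = collection.query(query_embeddings=[embedder.encode(query).tolist()], n_results=k)
    return hits["documents"][0]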
PREDICTION
Next Edit Prediction
Next-edit prediction is one of the more compelling AI features in modern editors. Rather than waiting for you to ask for help, the editor observes your editing patterns and proactively suggests what you'll type next. Fleur uses the Sweep Next-Edit 1.5B model (quantized GGUF, running locally via llama.cpp with Metal GPU acceleration) to generate predictions.
The flow works like this: after a 2-second typing pause, the editor collects the original saved file, current editor state, up to 3 context files from open tabs, and your recent diffs. These are assembled into a structured prompt and sent to the model. The predicted file is diffed against the current content line-by-line, and the first changed region is surfaced as Monaco ghost text — the same mechanism VS Code uses for Copilot. Press Tab to accept, Escape to dismiss.
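A minimal sketch of that round-trip, using the llama-cpp-python bindings and Python's difflib; the GGUF filename, prompt template, and generation settings are illustrative assumptions rather than the exact format the Sweep model expects.

import difflib
from llama_cpp import Llama

# Model path and settings are placeholders; n_gpu_layers=-1 offloads all layers
# to the GPU (Metal on macOS).
llm = Llama(model_path="models/sweep-next-edit-1.5b.gguf", n_ctx=8192, n_gpu_layers=-1)

def predict_next_edit(original: str, current: str, recent_diffs: str) -> list[str]:
    # Assemble a structured prompt from the saved file, the live buffer,
    # and recent edit history (the tags used here are illustrative).
    prompt = (
        "<original_file>\n" + original + "\n</original_file>\n"
        "<current_file>\n" + current + "\n</current_file>\n"
        "<recent_edits>\n" + recent_diffs + "\n</recent_edits>\n"
        "<predicted_file>\n"
    )
    predicted = llm(prompt, max_tokens=2048, temperature=0.0)["choices"][0]["text"]
    # Line-by-line diff against the live buffer; the first changed hunk
    # becomes the ghost-text suggestion.
    return list(difflib.unified_diff(current.splitlines(), predicted.splitlines(), lineterm=""))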
AI COPILOT
Copilot
The copilot ties everything together. It calls Azure-hosted models and uses RAG (retrieval-augmented generation) to pull relevant code chunks from the indexed codebase, providing the language model with precise context rather than dumping entire files into the prompt. This means the copilot can answer questions about code it hasn't directly seen — as long as it's semantically related to the query.
Every copilot interaction is logged in real time — requests, retrieved context chunks, model responses, and latency. This observability layer was critical during development for debugging retrieval quality and prompt construction.

The copilot generates proposed changes that the user reviews inline before accepting or declining — no code is applied automatically. This review step keeps the developer in control while still benefiting from multi-file reasoning.
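The retrieval step itself is a thin layer over the index. The sketch below reuses the search() helper from the indexing sketch earlier and calls an Azure-hosted chat model through the openai client; the endpoint, API version, and deployment name are placeholders.

from openai import AzureOpenAI

# Credentials and deployment name are placeholders, not Fleur's configuration.
client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",
    api_key="<key>",
    api_version="2024-06-01",
)

def ask_copilot(question: str) -> str:
    # RAG: retrieve the top-k semantically relevant chunks (search() from the
    # indexing sketch), then hand only those chunks to the model as context.
    context = "\n\n".join(search(question, k=5))
    response = client.chat.completions.create(
        model="<deployment-name>",
        messages=[
            {"role": "system", "content": "Answer using the provided code context."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content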

KEY DECISIONS
Design Decisions & Trade-offs
Local models where possible, cloud where it matters
Embeddings (Jina Code) and next-edit prediction (Sweep 1.5B) run locally for low latency and privacy. The copilot uses Azure-hosted models for the reasoning capability that agentic workflows demand. This hybrid approach keeps the fast feedback loops local while leveraging cloud inference for the tasks that benefit most from larger models.
Streaming over batching
The indexing pipeline uses streaming gRPC rather than batch processing. File events flow through tokio channels in FIFO order with no priority system. This keeps the architecture simple and latency low for real-time edits, at the cost of not being able to prioritize currently-open files.
Tree-sitter for chunking
Rather than naive line-based splitting, Fleur uses tree-sitter AST parsing to extract semantic units — functions, classes, methods. This produces more meaningful embeddings since each chunk represents a coherent code concept rather than an arbitrary line range.
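A minimal sketch of that extraction, assuming the tree_sitter_languages helper package for prebuilt grammars; a production chunker would cover more node types, nested definitions, and languages.

from tree_sitter_languages import get_parser

def chunk_source(source: str, language: str = "python") -> list[str]:
    # Parse into an AST and keep each top-level function or class as one chunk,
    # so every embedding corresponds to a coherent semantic unit.
    parser = get_parser(language)
    src = source.encode()
    tree = parser.parse(src)
    chunks = []
    for node in tree.root_node.children:
        if node.type in ("function_definition", "class_definition"):
            chunks.append(src[node.start_byte:node.end_byte].decode())
    return chunks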
Rust + Python split
File watching, event processing, and the Tauri shell are in Rust for performance. AI inference and vector storage are in Python for ecosystem access (llama.cpp bindings, ChromaDB, tree-sitter). gRPC bridges the two, allowing each layer to use the best tool for the job.
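To make the bridge concrete, the sketch below shows the Python end as a client-streaming RPC that consumes file events from the Rust watcher and feeds them through the chunking and indexing helpers sketched above. The generated stubs (indexer_pb2, indexer_pb2_grpc), service name, and message fields are hypothetical, since the actual proto definitions are not shown here.

from concurrent import futures
import grpc
import indexer_pb2        # hypothetical generated stubs
import indexer_pb2_grpc   # hypothetical generated stubs

class IndexerService(indexer_pb2_grpc.IndexerServicer):
    def StreamFileEvents(self, request_iterator, context):
        # Events arrive in the order the Rust watcher sent them (FIFO);
        # each one is chunked, embedded, and upserted as it arrives.
        for event in request_iterator:
            index_chunks(event.path, chunk_source(event.content))
        return indexer_pb2.IndexAck(ok=True)

server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
indexer_pb2_grpc.add_IndexerServicer_to_server(IndexerService(), server)
server.add_insecure_port("127.0.0.1:50051")
server.start()
server.wait_for_termination()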
REFLECTION
What I Learned
Building Fleur end-to-end — from OS-level file watchers to GPU-accelerated model inference — taught me a few things that wouldn't have been obvious from just using these tools.
The hard part isn't the model — it's the context.
Model inference is relatively straightforward. The real engineering challenge is figuring out what context to feed the model and when. For next-edit prediction, sending too much context overwhelms the model; too little makes predictions useless. Getting the prompt format, diff history window, and debounce timing right took far more iteration than model integration.
Embeddings are surprisingly powerful for code.
Semantic search over code embeddings works remarkably well — even across different languages. A query about "authentication middleware" correctly surfaces relevant handlers in both TypeScript and Python. The quality of chunks matters far more than the embedding model; tree-sitter chunking made a dramatic difference compared to naive splitting.
Cross-language systems are humbling.
Coordinating TypeScript, Rust, and Python across gRPC boundaries meant debugging in three different paradigms simultaneously. A mismatched protobuf field in Rust would surface as a cryptic empty response in the React UI. The experience gave me much more empathy for the complexity behind tools like VS Code and Cursor.