Priyanshu Mahey Portfolio

IV - Voice to Text

Voice is an important medium in modern apps, not only for accessibility but also for improving the overall experience. In full honesty, I rarely use voice modes myself. My goal for this project was not only to build a voice-to-text system but also to understand why people opt for voice-based systems. The project taught me a lot about how voice-based systems work, introduced me to on-device audio models, and helped me understand why tools like this exist.

IV is a simple voice-to-text application. As you speak into it, you see a live waveform visualization, and the transcribed text is pasted wherever your cursor is currently active.


How It Works

Press and hold to start recording
The waveform shows your audio input in real-time
Release to begin transcription
Text is automatically copied to your clipboard
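
The four steps above can be read as a small state machine. The following is a minimal sketch of that flow, not IV's actual implementation: the names RecorderState, RecorderEvent, and transition are hypothetical, and the "transcribed" event stands in for whatever the transcription backend reports when it finishes.

```typescript
// Hypothetical names: nothing here is taken from the IV codebase.
type RecorderState = "idle" | "recording" | "processing";

type RecorderEvent =
  | { type: "press" }                      // user presses and holds
  | { type: "release" }                    // user lets go
  | { type: "transcribed"; text: string }; // backend finished (assumed event)

interface Transition {
  state: RecorderState;
  // Set only when a finished transcript should go to the clipboard.
  clipboardText?: string;
}

// Pure transition function: events that make no sense in the current
// state are ignored and leave the state unchanged.
function transition(state: RecorderState, event: RecorderEvent): Transition {
  switch (state) {
    case "idle":
      return event.type === "press" ? { state: "recording" } : { state };
    case "recording":
      return event.type === "release" ? { state: "processing" } : { state };
    case "processing":
      return event.type === "transcribed"
        ? { state: "idle", clipboardText: event.text }
        : { state };
  }
}

// Usage: replay one full press / release / transcribe cycle.
function runCycle(events: RecorderEvent[]): Transition {
  let current: Transition = { state: "idle" };
  for (const event of events) {
    current = transition(current.state, event);
  }
  return current;
}
```

Keeping the transition logic pure makes it easy to drive the distinct visual states (idle, recording, processing) from a single value, and to test the flow without a microphone.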

Build Overview

IV is built with Rust and TypeScript. The primary frameworks are Tauri (for building the desktop application) and Vite/React (for the frontend). The application lets users choose which voice model they want. Currently, I've added support for Parakeet v3 and OpenAI's Whisper API. Parakeet runs locally and requires only a good CPU. For a deeper dive, see the GitHub repository below:

priyanshumahey/iv (GitHub Repository)

Rust 51.4%

TypeScript 42%

CSS 5.9%

HTML 0.7%
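
To illustrate the model choice described in the build overview, here is a minimal sketch of how a frontend might represent the two supported options. The ModelChoice type, its field names, and validateModelChoice are assumptions made for illustration and are not taken from the repository; the only external fact relied on is that OpenAI's API needs an API key.

```typescript
// Hypothetical settings shape: field names are illustrative only.
type ModelChoice =
  | { kind: "parakeet-v3"; runsLocally: true }
  | { kind: "whisper-api"; runsLocally: false; apiKey: string };

// Returns an error message, or null when the choice is usable.
function validateModelChoice(choice: ModelChoice): string | null {
  switch (choice.kind) {
    case "parakeet-v3":
      // Local model: nothing to configure in this sketch.
      return null;
    case "whisper-api":
      return choice.apiKey.trim() === ""
        ? "The Whisper API option needs an API key"
        : null;
  }
}

// A remote backend implies audio leaves the device.
function needsNetwork(choice: ModelChoice): boolean {
  return !choice.runsLocally;
}
```

A discriminated union like this lets the compiler enforce that an API key is present only for the remote option, so the local path cannot accidentally depend on it.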

Features

Real-time Waveform Visualization: See your voice as you speak with a responsive audio waveform
Fast Transcription: Powered by Parakeet v3 running locally or by OpenAI's Whisper API
Desktop Native: Built with Tauri for optimal performance and minimal resource usage
Visual Feedback: Distinct visual states for recording, processing, and idle modes
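
As a rough idea of what sits behind a waveform display, the sketch below reduces a buffer of audio samples to a fixed number of bar heights using the root mean square (RMS) of each slice. It is an assumption-laden illustration, not IV's rendering code: computeWaveformBars is a hypothetical helper, and the samples are assumed to be floats in the range -1 to 1 (the format the Web Audio API's AnalyserNode.getFloatTimeDomainData produces, though IV may source its audio differently).

```typescript
// Hypothetical helper: reduces raw samples to `barCount` heights in [0, 1].
function computeWaveformBars(samples: Float32Array, barCount: number): number[] {
  if (barCount <= 0) return [];
  const bars: number[] = new Array<number>(barCount).fill(0);
  if (samples.length === 0) return bars;

  // Each bar covers an equal slice of the buffer.
  const bucketSize = samples.length / barCount;
  for (let i = 0; i < barCount; i++) {
    const start = Math.floor(i * bucketSize);
    // Guarantee at least one sample per bar when the buffer is short.
    const end = Math.max(start + 1, Math.floor((i + 1) * bucketSize));
    let sumSquares = 0;
    let count = 0;
    for (let j = start; j < end && j < samples.length; j++) {
      const s = samples[j] ?? 0;
      sumSquares += s * s;
      count++;
    }
    // RMS gives a steadier bar height than the raw peak value.
    bars[i] = count > 0 ? Math.min(1, Math.sqrt(sumSquares / count)) : 0;
  }
  return bars;
}

// Usage: a loud first half followed by silence yields one tall and one flat bar.
const exampleBars = computeWaveformBars(new Float32Array([1, 1, 0, 0]), 2);
```

Calling a function like this once per animation frame and mapping each value to a bar height is one common way to get a responsive display; RMS is chosen here over peak amplitude because it is less jumpy on noisy input.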

Technical Deep Dive

Start with the system design.

Phase 1 Phase 2 Phase 3 Phase 4
