Priyanshu Mahey.

IV

A speech-to-text tool that transforms voice into typing.

IV - Voice to Text

Voice is an important medium in modern apps, not only for accessibility but also for improving the overall experience. To be honest, I rarely use voice modes myself. My goal for this project was not only to build a voice-to-text system but also to understand why people opt for voice-based systems. It taught me a lot about how voice-based systems work, introduced me to on-device audio models, and helped me understand why tools like this exist.

IV is a simple voice-to-text application. As you speak, it shows a live waveform visualization, and the transcribed text is then pasted wherever your cursor is currently active.

How It Works

  1. Press and hold to start recording
  2. The waveform shows your audio input in real time
  3. Release to begin transcription
  4. Text is automatically copied to your clipboard
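
The steps above map naturally onto a small state machine. Below is a minimal sketch in Rust, assuming three states (idle, recording, processing) and three events; the names `State`, `Event`, and `next_state` are hypothetical and are not taken from the IV codebase.

```rust
/// App states matching the idle, recording, and processing modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Recording,
    Processing,
}

/// Hypothetical events for the press-and-hold flow (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Pressed,         // step 1: the user presses and holds
    Released,        // step 3: the user releases, transcription begins
    TranscriptReady, // step 4: text is available to copy to the clipboard
}

/// Returns the next state; events that do not apply to the current
/// state leave it unchanged.
pub fn next_state(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::Pressed) => State::Recording,
        (State::Recording, Event::Released) => State::Processing,
        (State::Processing, Event::TranscriptReady) => State::Idle,
        (unchanged, _) => unchanged,
    }
}
```

Keeping the transitions in one pure function means a stray event, such as a release while idle, cannot push the UI into an inconsistent state.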

Build Overview

IV is built with Rust and TypeScript. The main frameworks are Tauri (for building the desktop application) and Vite/React (for the frontend). Users can choose which voice model they want to use; so far I've added support for Parakeet v3 and OpenAI's Whisper API. Parakeet runs locally and only needs a good CPU. For a deeper dive, see the GitHub repo below:

GitHub Repository: priyanshumahey/iv
Languages: Rust 51.4%, TypeScript 42%, CSS 5.9%, HTML 0.7%
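
One way to support a user-selectable model, as described in the build overview, is to hide each backend behind a common trait. The sketch below is an assumption about how that could look, not the repository's actual design: `Backend`, `Transcriber`, `StubTranscriber`, and `make_transcriber` are hypothetical names, and the stub returns a placeholder string instead of calling Parakeet or the Whisper API.

```rust
/// The two backends mentioned in this write-up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    ParakeetLocal,
    WhisperApi,
}

impl Backend {
    /// Only the hosted API needs a network connection; Parakeet runs on-device.
    pub fn needs_network(self) -> bool {
        matches!(self, Backend::WhisperApi)
    }
}

/// Hypothetical interface every backend would implement.
pub trait Transcriber {
    fn transcribe(&self, samples: &[f32]) -> Result<String, String>;
}

/// Stand-in implementation: no real model or HTTP call is made here.
pub struct StubTranscriber {
    backend: Backend,
}

impl Transcriber for StubTranscriber {
    fn transcribe(&self, samples: &[f32]) -> Result<String, String> {
        if samples.is_empty() {
            return Err("no audio captured".to_string());
        }
        // Placeholder output so the control flow can be exercised.
        Ok(format!("[{:?}] {} samples", self.backend, samples.len()))
    }
}

/// Picks an implementation from the user's setting.
pub fn make_transcriber(backend: Backend) -> Box<dyn Transcriber> {
    Box::new(StubTranscriber { backend })
}
```

With this shape, the rest of the app only depends on `Transcriber`, so adding another model would mean adding one enum variant and one implementation.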

Features

  • Real-time Waveform Visualization: See your voice as you speak with a responsive audio waveform
  • Fast Transcription: Powered by Parakeet v3 running locally or by OpenAI's Whisper API
  • Desktop Native: Built with Tauri for optimal performance and minimal resource usage
  • Visual Feedback: Distinct visual states for recording, processing, and idle modes
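
To make the waveform feature concrete, here is a rough sketch of how raw audio samples could be reduced to a handful of bar heights. It assumes mono `f32` samples in the range -1.0 to 1.0 and uses the root mean square (RMS) of each chunk; the function name `waveform_bars` is hypothetical, and IV may compute its waveform differently (for example, in the frontend).

```rust
/// Collapses raw samples into at most `bars` RMS values, one per
/// waveform bar, each clamped to a maximum of 1.0.
pub fn waveform_bars(samples: &[f32], bars: usize) -> Vec<f32> {
    if samples.is_empty() || bars == 0 {
        return Vec::new();
    }
    // Ceiling division so every sample lands in some chunk.
    let chunk = (samples.len() + bars - 1) / bars;
    samples
        .chunks(chunk)
        .map(|c| {
            // RMS: square each sample, average, then take the square root.
            let mean_sq = c.iter().map(|s| s * s).sum::<f32>() / c.len() as f32;
            mean_sq.sqrt().min(1.0)
        })
        .collect()
}
```

RMS tracks perceived loudness better than a plain average, which hovers near zero for audio because positive and negative samples cancel out.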

Technical Deep Dive

Let's start with the system design.