How Wispr Flow Won the Most Crowded Corner of AI and a $2 Billion Valuation

What Wispr Flow got right sits one layer above transcription, in whether you trust the text enough to send it without looking.

NervNow · Voice AI · The Profile · Wispr Flow

Voice dictation is a commodity. Apple and Google give it away, OpenAI’s Whisper is free, and a dozen startups undercut each other to sell it. Wispr Flow built the one thing none of them have, a habit, and rode it from a scrapped neural wristband to a product used inside 270 of the Fortune 500 and now in talks to raise at close to a $2 billion valuation. Here is what it got right.

2021: Founded in San Francisco
~$2B: Valuation in talks, May 2026 (not closed)
72%: Characters written by voice after six months
270: Fortune 500 companies using Flow

In November 2025, Wispr was valued at about $700 million. By May 2026, according to Bloomberg, the company was in talks to raise roughly $260 million at a valuation close to $2 billion, with Menlo Ventures leading again. The round has not closed, so treat the figure as a marker rather than a fact. The direction is the story. That is a near-tripling of the company’s price in about six months, on a product whose core function takes one sentence to describe. You hold a key, you speak, and finished text appears in whatever app you already had open.

Voice dictation is one of the most crowded corners of consumer AI. Superwhisper, Aqua Voice, Willow, Voibe, Typeless, MacWhisper and a long tail of smaller apps all convert speech to text. Apple and Google build the feature into every phone for free. OpenAI’s Whisper model is open and costs nothing to run. When the basic capability is a commodity, the obvious question is why anyone would pay for Wispr Flow when a competent version ships with the operating system. The answer is in one number the company reports. After six months, the average Flow user writes 72% of their characters by voice, across nearly 70 apps and sites. People do not build that kind of habit around a feature they tolerate.

Part I

The product was the part they almost threw away

Wispr did not set out to make a dictation app. Tanay Kothari and Sahaj Garg, roommates at Stanford, founded the company in 2021 to build a non-invasive wearable that could read silent speech. The device used sensors to pick up the faint muscle and nerve signals a person produces when mouthing words without making a sound, then turned those signals into text. Patents filed by the company describe electromyography sensors monitoring the electrical activity of activated muscles. The pitch was a wristband you could think at, more or less, with your phone responding as if you had typed.

The technology worked. The market did not exist. Body-signal interfaces in 2023 needed a level of user trust, hardware miniaturization and regulatory clearance that no consumer product could reach at the time. Kothari has since described it as a product for 2030 being sold in 2024. The team raised seed money, spent more than two years in research, and cycled through sensor arrays and form factors without finding the version that would sell.

To test the hardware, they had built a small piece of software, a dictation layer that took transcribed speech, stripped the filler words, fixed the grammar and formatted the result. It was meant to be the companion app for the wristband. Somewhere in the testing, the founders noticed that the companion app had a pull the hardware never did. People wanted the software on its own. The pivot was severe. The team shrank from roughly 40 people to a handful, and Kothari gave the survivors six weeks to ship. On October 1, 2024, they launched the Mac app on Product Hunt and, by his account, finished first for both the day and the week.

From a neural wristband to a $2 billion round

When | Milestone | Why it matters
2021 | Wispr founded in SF | Set out to build a silent-speech wearable, not an app.
2021–24 | Two-plus years on hardware | An EMG wristband reads subvocalized speech; the market is not ready.
Mid-2024 | Pivot to software | The companion dictation app becomes the product; the team is cut to a handful.
Oct 2024 | Wispr Flow for Mac | Launches on Product Hunt; finishes No. 1 for the day and the week.
Mar 2025 | Windows | Second platform, only after Mac proves retention.
Jun 2025 | iOS; $30M Series A | Menlo Ventures leads; total funding reaches about $56M.
Nov 2025 | $25M extension | Notable Capital leads; about $700M valuation; roughly $81M raised in all.
Feb 2026 | Android | Launches with the first voice model built to handle Hinglish.
May 2026 | $2B talks | In talks to raise about $260M at close to $2B, per Bloomberg (not closed).

Part II

The repeat founder and the engineer behind the models

Kothari is an unusual operator to bet a category on, mostly because he has spent his whole life shipping things. He grew up in Delhi and, by his own account, taught himself to code around age 10, sleeping every other night for a couple of years so he could build through the off nights. As a teenager at Delhi Public School, R.K. Puram, he won a bronze medal at the 2015 International Olympiad in Informatics in Kazakhstan. He studied computer science and AI at Stanford, ran the university’s venture capital club for three years, worked as a teaching assistant on the deep learning course taught by Andrew Ng, and published research at the Stanford AI Lab. Before Wispr he built a music discovery platform, Convert, that reached 2.5 million monthly users with no marketing spend, and co-founded an e-commerce personalization startup, FeatherX, that was acquired by Cerebra Technologies. Forbes named him to its 30 Under 30 list, in the consumer technology category, in 2023. His stated childhood ambition, which he still repeats, was to build a real-life version of Tony Stark’s JARVIS.

Garg, Kothari’s college roommate and the company’s chief technology officer, is the reason the technology holds up. He received Stanford’s Henry Ford II Scholar Award, the highest academic honor in its School of Engineering, and was a research assistant at the Stanford AI Lab, where he published in generative modeling, with separate research at Google and in computational neuroscience. Before Wispr he was the fifth employee and AI lead at Luminous Computing, a deep-tech startup building photonic hardware to relieve the bottlenecks in large-scale AI systems, the kind of problem that teaches a person to wring latency out of a system. That background maps onto Wispr’s hardest problem directly. Owning the speech models, adapting them to each user and returning clean text in half a second is a modeling and infrastructure job, and it is his.

Part III

The crowded field, and where Flow sits in it

The category Wispr chose is brutal. Speech-to-text is a solved problem at the level of raw accuracy, which means dozens of products compete on the same ground, and the biggest companies in the world give a version away for nothing. The field sorts roughly into four groups, listed below.

On-device

Tool | Approach | Note
superwhisper | Mac, on-device | Privacy-first positioning; processing runs locally.
MacWhisper | Mac, on-device | Built on OpenAI’s open Whisper model.
VoiceInk | On-device | Open-source local dictation for desktop.

Cloud

Tool | Approach | Note
Aqua Voice | Cloud | Dictation built on its own models.
Willow Voice | Cloud | Y Combinator-backed dictation app.
Typeless | Cloud | Reached Android shortly before Flow did.
Voibe | Cloud | Cross-platform dictation tool.

Built in or open model

Provider | Approach | Note
Apple Dictation / Siri | Built in | Ships free on every Mac and iPhone.
Google (Gboard) | Built in | Free voice typing across Android.
OpenAI Whisper | Open model | Free to run and powers many of the rivals above.

Flow’s edge

Edge | What it means
Own models | Built in-house and tuned to the individual user, rather than wrapping an off-the-shelf model.
Zero-edit output | Filler removed, grammar fixed, formatting matched to the app you are in.
Every platform | Mac, Windows, iOS and Android, working inside any app rather than one operating system.
Enterprise-ready | Certifications that clear a corporate security review where most rivals cannot.

Part IV

The thing they sell is the absence of editing

Here is the insight that separates Wispr from the field. Most dictation tools sell transcription and measure themselves on word error rate. Wispr sells the moment after transcription, the moment you decide whether to fix the text or send it. The company has a name for its target, the zero-edit standard, the point at which a user trusts the output enough to hit enter without reading it. Notable Capital, which led the November round, points to a single behavioral stat as the reason it invested. Flow users press enter about half a second after the text appears. That is not someone proofreading. That is someone who has stopped checking.

Most rivals sell transcription. Wispr sells the half-second where the user stops proofreading and presses send.

Getting there is a full-stack problem, and Wispr treats it as one. Most rivals route audio through an off-the-shelf model such as Whisper. Garg’s team built Wispr’s own speech recognition and tunes it per user. The company reports a 10% error rate against 27% for OpenAI’s Whisper and 47% for Apple’s built-in dictation. Those figures come from Wispr’s own testing and have not been independently verified, so they are best read as the company’s claim. The logic behind them holds up better than any single number. When you own the whole voice stack, you can adapt it to how a specific person speaks, learn their vocabulary and phrasing, and format the output in ways a general model will not.
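For readers unfamiliar with the metric, word error rate is conventionally the number of word-level substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below illustrates that textbook definition only. It is not Wispr’s test methodology, which the company has not published in the material cited here, and the function name and sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: a reference word is missing
                curr[j - 1] + 1,     # insertion: an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word in a ten-word sentence.
reference = "please send the report to the whole team by friday"
hypothesis = "please send the report to the hole team by friday"
print(f"{word_error_rate(reference, hypothesis):.0%}")  # prints "10%"
```

On this definition, a 10% error rate means roughly one word in ten needs fixing, which is why a gap between 10% and 27% matters more to a user deciding whether to proofread than the raw numbers suggest.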

The cleanup is where Flow earns the habit. It removes the ums and false starts, punctuates, and matches the formatting to the app you are in, so a Slack message looks like a Slack message and a code comment looks like a code comment. A command mode lets you select text and revise it by voice, telling it to make a paragraph more formal or turn a list into prose. A personal dictionary handles names and jargon. There is a whispering mode for dictating without disturbing an open-plan office, and the whole thing runs off a single held hotkey. The friction is low enough that people forget they are using a tool, which is the condition under which a tool becomes a reflex.

Two approaches to dictation

Dimension | Wispr Flow | The typical rival
Model | Own speech models, tuned per user | Off-the-shelf, often Whisper
Output | Filler removed, formatted to the app | Raw transcript you clean up
Reach | Any app, four platforms | One OS or a browser plug-in
Buyer | Consumer and enterprise | Mostly consumer
Ambition | A voice interface for computing | A dictation feature

Part V

The distribution channel was the customers

Wispr’s growth engine is one most startups cannot copy, because it depends on the product being good enough that powerful people use it unprompted. The early adopters were venture capitalists. Kothari has said that effectively every tier-one fund in the Valley started using Flow for emails and memos, got hooked, and generated the inbound that led to the funding. Reid Hoffman, the LinkedIn co-founder, has publicly called himself voicepilled, and Wispr features him among its named users, none of them paid. When a venture partner uses your product daily, they introduce it to their portfolio, they mention it to other partners, and eventually they wire you money. Wispr got all three. Menlo’s Matt Kraning was an angel investor and a daily user before he led the Series A and took a board seat.

The company then turned an investor into a megaphone. As part of the November round, Steven Bartlett’s Flight Fund came in alongside a year-long partnership with The Diary of a CEO, one of the largest podcasts in the world. That is direct access to an audience of ambitious professionals, close to an exact match for the people Flow wants dictating their email. The expansion itself has been deliberate to the point of looking slow. Mac came first, then Windows, then iOS, and Android did not arrive until February 2026. The company shipped each platform only after the previous one proved retention, and never launched two at once. For engineers, Wispr built extensions for Cursor, VS Code and Warp that let developers tag files and run commands by voice, and a share of its early enterprise pull came from technical teams at companies like Nvidia and Amazon.

Part VI

The enterprise moat, and the way people actually talk

The reason Flow is inside 270 of the Fortune 500, signing a reported 125 new corporate customers a week, has less to do with the demo and more to do with paperwork. The company carries SOC 2 Type II and ISO 27001 certification, offers HIPAA terms across plans, supports single sign-on, and ships a privacy mode that retains no dictation data. For a tool that needs deep system permissions to work across every app, those credentials are what get it past a security review at a bank or a hospital. Most of the cheaper rivals do not have them, and that gap is harder to close than a few points of accuracy.

The global picture surprises people. Flow supports more than 100 languages, and about 60% of dictation happens in something other than English, with Spanish, French, German, Dutch, Hindi and Mandarin leading. The clearest sign of where the company is aiming came with Android, which launched with the first voice model that handles Hinglish, the code-switched mix of Hindi and English that hundreds of millions of Indians speak, written in Roman script instead of Devanagari. Kothari, whose own family chats slide between the two languages, said it was one of those times he had to build something for himself. On Android the product takes a different shape, a floating overlay that sits above every app rather than a keyboard, a decision made to survive Android’s fragmentation. The waitlist reached 375,000 people before a line of shipping code existed.

Part VII

The race is against Apple and Google

The threat that matters is Big Tech distribution, which Kothari himself has cited as his reason for taking money he did not need. Apple could ship a far better Siri dictation in a single update. Google could fold Flow-grade intelligence into Gboard for free. Wispr’s whole strategy is a race to build enough habit and enough enterprise lock-in to be irreplaceable before that happens.

The giants

Apple and Google can put competent voice typing in front of billions of people overnight, at no cost. Distribution is their weapon, and it is the one Wispr cannot match.

A target, not a fact

The roughly $2 billion figure reflects a round still in talks, priced on growth and retention rather than profit. It is a bet that the curve of the last year holds.

Everything rests on the habit

The moat is that people stop typing. If retention slips and users drift back to the keyboard, the case for the company slips with it.

Wispr’s framing for what comes next is that dictation was only the wedge. Kothari’s line is that voice never reached its potential because the industry kept treating it as a feature inside other products instead of as an interface in its own right. The roadmap points at an ambient voice layer that does more than transcribe, an assistant that handles tasks, drafts replies and clears the small administrative work that fills a knowledge worker’s day, plus a closed API for hardware partners that brings the original wristband dream back through a different door.

For now the clearest evidence is the usage curve. More than 100 million words a week are spoken through Flow, and its users, after half a year, type less than a third of their text by hand. The keyboard is roughly 150 years old. For the people who have fully switched, it is already most of the way to a museum piece.

Sources & method

Researched and written by NervNow Editorial, reflecting information available as of June 2026. Funding history, the in-talks $2 billion round and investor details are drawn from Bloomberg, TechCrunch and company announcements; the round had not closed as of mid-May 2026 reporting. Founder biographies and several product details are drawn from Wispr’s own media kit, which lists total funding at an earlier $56 million. Usage and performance figures, including the 72% voice-usage rate, the roughly 80% six-month retention, the 100x year-over-year growth and the 10% versus 27% versus 47% accuracy comparison, are reported by Wispr and have not been independently verified. The Product Hunt ranking and the team-size figures reflect Kothari’s own public accounts. While every effort has been made to ensure accuracy, figures may vary across sources or change after publication. To flag a correction, write to editorial@nervnow.com.

NervNow Editorial
