Build a real voice-controlled web app using only JavaScript: no libraries, no external APIs, no backend.

Introduction
We all use voice assistants like Siri, Alexa, and Google Assistant every day, but what if your browser could do the same?
It turns out, modern browsers already have a built-in way to understand speech: the SpeechRecognition interface of the Web Speech API.
With just a few lines of JavaScript, you can make your web app listen, transcribe, and respond, all without API keys or a backend of your own. (One caveat: some browsers, such as Chrome, delegate the actual recognition to a server-side service under the hood.)
In this post, I'll show you how to turn your browser into a simple, fully functional voice assistant.
You'll learn:
- How the SpeechRecognition API works
- How to build a listening interface
- How to trigger voice commands
- How to make the app talk back using SpeechSynthesis
Let's give your browser a voice.
1. What Is the SpeechRecognition API?
The SpeechRecognition API lets you capture audio from a user's microphone and transcribe it into text.
It's part of the Web Speech API and supported by most Chromium-based browsers (Chrome, Edge, Brave, Opera) and partially by Safari.
Think of it as your browser's built-in speech-to-text engine: event-driven, fast, and surprisingly accurate.
You can combine it with other web APIs to create interactive, voice-driven experiences.
2. Basic Setup
First, check that the browser exposes it (standard or webkit-prefixed) and create an instance:
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
if (!SpeechRecognition) {
  throw new Error('Speech recognition is not supported in this browser.');
}
const recognition = new SpeechRecognition();
Then, configure how it listens:
recognition.lang = 'en-US'; // Language
recognition.continuous = false; // Stop after one phrase
recognition.interimResults = false; // Only final results
To start listening:
recognition.start();
That's all it takes to begin capturing voice input.
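The instance also fires lifecycle events you can hook into for status updates. A minimal sketch using the recognition instance above (onstart, onspeechend, and onend are all standard events):
recognition.onstart = () => console.log('Microphone is live, listening...');
recognition.onspeechend = () => console.log('Speech no longer detected.');
recognition.onend = () => console.log('Recognition session ended.');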
3. Listening and Displaying Speech
Let's build a minimal interface that shows what you say:
<button id="start">🎙️ Start Listening</button>
<p id="output">Say something...</p>
<script>
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.interimResults = false;
const output = document.getElementById('output');
const start = document.getElementById('start');
start.addEventListener('click', () => {
  recognition.start();
  output.textContent = 'Listening...';
});
recognition.onresult = (e) => {
  const text = e.results[0][0].transcript;
  output.textContent = `You said: ${text}`;
  console.log('Recognized:', text);
};
</script>
Click the button, speak into your mic, and watch your words appear on-screen. No libraries, no backend, and barely any setup.
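If you want live feedback while the user is still talking, flip interimResults on and stream the partial transcripts as they arrive. A minimal sketch, reusing the recognition and output variables from above:
recognition.interimResults = true;
recognition.onresult = (e) => {
  // Concatenate everything recognized so far; interim entries update live.
  let transcript = '';
  for (let i = 0; i < e.results.length; i++) {
    transcript += e.results[i][0].transcript;
  }
  const isFinal = e.results[e.results.length - 1].isFinal;
  output.textContent = (isFinal ? 'You said: ' : '...') + transcript;
};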
4. Add Voice Commands
Now that we can listen, let's make the browser do something when it hears certain words.
Hereโs a simple command system:
recognition.onresult = (event) => {
  const text = event.results[0][0].transcript.toLowerCase();
  if (text.includes('dark mode')) {
    document.body.style.background = '#111';
    document.body.style.color = '#fff';
    speak('Dark mode activated.');
  } else if (text.includes('light mode')) {
    document.body.style.background = '#fff';
    document.body.style.color = '#000';
    speak('Light mode on.');
  } else if (text.includes('scroll down')) {
    window.scrollBy(0, 400);
    speak('Scrolling down.');
  } else if (text.includes('scroll up')) {
    window.scrollBy(0, -400);
    speak('Scrolling up.');
  } else {
    speak(`You said ${text}`);
  }
};
Now your browser doesn't just listen; it responds to real commands.
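If the if/else chain starts to grow, one tidy pattern (just a sketch, and it relies on the speak() helper defined in the next section) is a command table that maps keywords to handler functions:
// Command registry: keyword -> action.
const commands = {
  'dark mode': () => { document.body.style.background = '#111'; speak('Dark mode activated.'); },
  'scroll down': () => { window.scrollBy(0, 400); speak('Scrolling down.'); },
};
recognition.onresult = (event) => {
  const text = event.results[0][0].transcript.toLowerCase();
  // Run the first registered command whose keyword appears in the transcript.
  const match = Object.keys(commands).find((key) => text.includes(key));
  if (match) commands[match]();
  else speak(`You said ${text}`);
};
Adding a new command is now a one-line change instead of another else-if branch.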
5. Making It Talk Back
The SpeechSynthesis API lets the browser speak text aloud.
It's built in, just like SpeechRecognition.
function speak(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  speechSynthesis.speak(utterance);
}
Every time a command runs, you can have your browser reply naturally.
Try saying "dark mode": your page changes, and the browser responds, "Dark mode activated."
Feels futuristic, doesn't it?
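The utterance object also exposes knobs for voice, rate, and pitch. A quick sketch of a customized speak() (which voices are available varies by OS and browser, and getVoices() can return an empty list until the browser has loaded them):
function speak(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  // Prefer an English voice if one is available on this platform.
  const voices = speechSynthesis.getVoices();
  utterance.voice = voices.find((v) => v.lang.startsWith('en')) || null;
  utterance.rate = 1.0;  // speaking speed, 0.1 to 10
  utterance.pitch = 1.0; // voice pitch, 0 to 2
  speechSynthesis.speak(utterance);
}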
6. Continuous Listening
By default, SpeechRecognition stops after one phrase.
If you want an always-on assistant, set it to continuous:
recognition.continuous = true;
recognition.onend = () => recognition.start();
This makes it restart automatically after each phrase, which is great for dashboard assistants or ongoing voice sessions.
Just remember to give users a Stop button, as in the sketch below.
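Here is one way to wire that up, assuming a <button id="stop">Stop</button> element exists on the page; a flag decides whether onend should restart the session:
let keepListening = true;
recognition.continuous = true;
recognition.onend = () => {
  if (keepListening) recognition.start(); // auto-restart between phrases
};
document.getElementById('stop').addEventListener('click', () => {
  keepListening = false; // prevent the onend handler from restarting
  recognition.stop();
});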
7. Handle Errors Gracefully
Always add error handlers to improve reliability:
recognition.onerror = (e) => {
  console.error('Speech error:', e.error);
  speak("Sorry, I didn't catch that.");
};
And for extra UX polish, show real-time feedback ("Listening...", "Processing...", etc.) so users know what's happening.
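The event's error property is a short code, so you can react differently to the common failure modes. A sketch handling a few of them ('no-speech', 'not-allowed', and 'audio-capture' are standard codes):
recognition.onerror = (e) => {
  switch (e.error) {
    case 'no-speech':
      speak("I didn't hear anything.");
      break;
    case 'not-allowed':
      output.textContent = 'Microphone permission was denied.';
      break;
    case 'audio-capture':
      output.textContent = 'No microphone was found.';
      break;
    default:
      console.error('Speech error:', e.error);
  }
};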
8. Browser Support
In short: SpeechRecognition works in Chromium-based browsers (Chrome, Edge, Brave, Opera) via the webkit-prefixed constructor, with partial support in Safari; Firefox does not ship it enabled by default. SpeechSynthesis is supported far more broadly, including in Firefox.
You'll also need to serve your app over HTTPS; microphone access requires a secure context.
9. Expanding Your Assistant
Once the basics work, you can extend them easily:
- Navigation Commands: "Go to home", "Open settings"
- Form Input: Dictate messages or comments by voice
- Smart Actions: Integrate APIs ("What's the weather today?")
- Accessibility: Allow voice-based site navigation for users who can't type
You can even connect it with the Fetch API to answer real-world questions from an external service.
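As a sketch, here is how a recognized question could be forwarded to an external service with fetch. The endpoint URL and response shape below are placeholders, not a real API:
recognition.onresult = async (e) => {
  const question = e.results[0][0].transcript.toLowerCase();
  if (question.includes('weather')) {
    // Hypothetical endpoint; substitute a real weather API of your choice.
    const res = await fetch('https://example.com/api/weather?q=today');
    const data = await res.json();
    speak(data.summary); // assumes the service returns { summary: '...' }
  }
};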
10. Example: Tiny Voice Assistant
Hereโs everything tied together:
<button id="start">🎙️ Activate Assistant</button>
<p id="output">Say: "Dark mode" or "Scroll down"</p>
<script>
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.interimResults = false;
function speak(text) {
  const utter = new SpeechSynthesisUtterance(text);
  speechSynthesis.speak(utter);
}
recognition.onresult = (e) => {
  const command = e.results[0][0].transcript.toLowerCase();
  if (command.includes('dark')) {
    document.body.style.background = '#111';
    document.body.style.color = '#fff';
    speak('Dark mode activated.');
  } else if (command.includes('light')) {
    document.body.style.background = '#fff';
    document.body.style.color = '#000';
    speak('Light mode on.');
  } else {
    speak(`You said ${command}`);
  }
};
recognition.onerror = (e) => speak('Error: ' + e.error);
document.getElementById('start').addEventListener('click', () => {
  recognition.start();
  speak('Voice assistant ready.');
});
</script>
Now you have your own browser-based voice assistant that listens, understands, and talks back.
Conclusion
You don't need TensorFlow, OpenAI, or cloud APIs to make your web app voice-driven.
All you need is the SpeechRecognition and SpeechSynthesis APIs already sitting inside your browser.
In under 60 lines of JavaScript, your site can recognize commands, control the page, and even respond like a real assistant.
Pro Tip: Combine SpeechRecognition with BroadcastChannel and localStorage to sync commands across tabs for a fully integrated, multi-tab voice experience.
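As a rough sketch of that idea, the listening tab can rebroadcast each transcript on a named channel, and the other tabs react to whatever arrives (the channel name here is arbitrary):
// In the tab doing the listening: rebroadcast each transcript.
const channel = new BroadcastChannel('voice-commands');
recognition.onresult = (e) => {
  channel.postMessage(e.results[0][0].transcript.toLowerCase());
};
// In every other tab: subscribe to the same channel and react.
// (BroadcastChannel does not deliver a message back to the sender.)
const listener = new BroadcastChannel('voice-commands');
listener.onmessage = (event) => {
  if (event.data.includes('dark mode')) {
    document.body.style.background = '#111';
    document.body.style.color = '#fff';
  }
};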
Call to Action
Have you ever tried building a voice-controlled app?
Share your experiments in the comments and bookmark this post for your next weekend project.