Turn Your Browser Into a Voice Assistant with SpeechRecognition

Build a real voice-controlled web app using only JavaScript: no libraries, no external APIs, no backend.

Introduction

We all use voice assistants like Siri, Alexa, and Google Assistant every day, but what if your browser could do the same?

It turns out, modern browsers already have a built-in way to understand speech: the SpeechRecognition API, part of the Web Speech API.

With just a few lines of JavaScript, you can make your web app listen, transcribe, and respond, with no API keys or backend of your own required. (Under the hood, some browsers such as Chrome hand the audio to their own speech service, but that plumbing is invisible to your code.)

In this post, I’ll show you how to turn your browser into a simple, fully functional voice assistant.

You’ll learn:

  • How the SpeechRecognition API works
  • How to build a listening interface
  • How to trigger voice commands
  • How to make the app talk back using SpeechSynthesis

Let’s give your browser a voice.


1. What Is the SpeechRecognition API?

The SpeechRecognition API lets you capture audio from a user’s microphone and transcribe it into text.

It’s part of the Web Speech API and supported by most Chromium-based browsers (Chrome, Edge, Brave, Opera) and partially by Safari.

Think of it as your browser’s built-in speech-to-text engine: event-driven, fast, and surprisingly accurate.

You can combine it with other web APIs to create interactive, voice-driven experiences.


2. Basic Setup

First, check if your browser supports it and create an instance:

const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;

const recognition = new SpeechRecognition();

Then, configure how it listens:

recognition.lang = 'en-US';          // Recognition language
recognition.continuous = false;      // Stop after one phrase
recognition.interimResults = false;  // Only final results

To start listening:

recognition.start();

That’s all it takes to begin capturing voice input.


3. Listening and Displaying Speech

Let’s build a minimal interface that shows what you say:

<button id="start">🎙️ Start Listening</button>
<p id="output">Say something...</p>

<script>
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

recognition.lang = 'en-US';
recognition.interimResults = false;

const output = document.getElementById('output');
const start = document.getElementById('start');

start.addEventListener('click', () => {
  recognition.start();
  output.textContent = 'Listening...';
});

recognition.onresult = (e) => {
  const text = e.results[0][0].transcript;
  output.textContent = `You said: ${text}`;
  console.log('Recognized:', text);
};
</script>

Click the button, speak into your mic, and watch your words appear on-screen. No external APIs, no backend, no build step.
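If you want live, word-by-word feedback instead of waiting for the final phrase, you can turn interimResults on and rebuild the transcript on every result event. A minimal sketch, swapping in a new onresult handler and reusing the output element from above:

recognition.interimResults = true;

recognition.onresult = (e) => {
  let transcript = '';
  // Results accumulate over the session; stitch together everything so far.
  for (let i = 0; i < e.results.length; i++) {
    transcript += e.results[i][0].transcript;
  }
  output.textContent = transcript; // updates as you speak
};

Interim chunks are provisional, so the text may correct itself mid-sentence as the recognizer settles on a final result.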


4. Add Voice Commands

Now that we can listen, let’s make the browser do something when it hears certain words.

Here’s a simple command system:

recognition.onresult = (event) => {
  const text = event.results[0][0].transcript.toLowerCase();

  if (text.includes('dark mode')) {
    document.body.style.background = '#111';
    document.body.style.color = '#fff';
    speak('Dark mode activated.'); // speak() is defined in the next section
  } else if (text.includes('light mode')) {
    document.body.style.background = '#fff';
    document.body.style.color = '#000';
    speak('Light mode on.');
  } else if (text.includes('scroll down')) {
    window.scrollBy(0, 400);
    speak('Scrolling down.');
  } else if (text.includes('scroll up')) {
    window.scrollBy(0, -400);
    speak('Scrolling up.');
  } else {
    speak(`You said ${text}`);
  }
};

Now your browser doesn’t just listen; it responds to real commands.


5. Making It Talk Back

The SpeechSynthesis API lets the browser speak text aloud.

It’s built-in, just like SpeechRecognition.

function speak(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  speechSynthesis.speak(utterance);
}

Every time a command runs, you can have your browser reply naturally.
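You can also shape how the reply sounds: a SpeechSynthesisUtterance exposes rate, pitch, and voice, and speechSynthesis.getVoices() lists the voices installed on the system. A small sketch of a tuned speak(), where the rate and pitch values are just a suggestion:

function speak(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 1.1;   // a touch faster than the default of 1
  utterance.pitch = 0.9;  // a touch lower than the default of 1

  // Voices may load asynchronously; fall back to the default if none are ready.
  const preferred = speechSynthesis.getVoices().find((v) => v.lang === 'en-US');
  if (preferred) utterance.voice = preferred;

  speechSynthesis.speak(utterance);
}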

Try saying “dark mode”: the page changes, and the browser responds, “Dark mode activated.”

Feels futuristic, doesn’t it?


6. Continuous Listening

By default, SpeechRecognition stops after one phrase. If you want an always-on assistant, set it to continuous mode:

recognition.continuous = true;
recognition.onend = () => recognition.start(); // restart whenever a session ends

This makes it restart automatically after each phrase, which is great for dashboard assistants or ongoing voice sessions.

Just remember to include a Stop button for user control.
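Here’s one way to wire that up: a shouldListen flag so the onend restart doesn’t fight the Stop button. The flag and the stop button ID are my own naming, not part of the API:

let shouldListen = false;

document.getElementById('start').addEventListener('click', () => {
  shouldListen = true;
  recognition.start();
});

document.getElementById('stop').addEventListener('click', () => {
  shouldListen = false;
  recognition.stop(); // fires onend, but the flag blocks the restart
});

recognition.onend = () => {
  if (shouldListen) recognition.start(); // only restart while switched on
};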


7. Handle Errors Gracefully

Always add error handlers to improve reliability:

recognition.onerror = (e) => {
  console.error('Speech error:', e.error);
  speak("Sorry, I didn't catch that.");
};
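You can go a step further and branch on the specific error code; the spec defines values like 'no-speech', 'not-allowed', and 'network'. A sketch of a friendlier handler:

recognition.onerror = (e) => {
  switch (e.error) {
    case 'no-speech':
      speak("I didn't hear anything. Try again.");
      break;
    case 'not-allowed':
      speak('Microphone access was denied. Check your browser permissions.');
      break;
    case 'network':
      speak('A network error interrupted recognition.');
      break;
    default:
      console.error('Speech error:', e.error);
      speak('Sorry, something went wrong.');
  }
};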

And for extra UX polish, show real-time feedback (“Listening…”, “Processing…”, etc.) so users know what’s happening.
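SpeechRecognition fires lifecycle events that map neatly onto those states. A minimal sketch, reusing the output element from section 3:

recognition.onstart = () => {
  output.textContent = 'Listening...';   // the engine has started capturing
};

recognition.onspeechend = () => {
  output.textContent = 'Processing...';  // you stopped talking; awaiting the result
};

The onresult handler shown earlier then replaces the status text with the transcript.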


8. Browser Support

Here’s where it works today:

  • Chrome, Edge, Brave, Opera (Chromium-based): supported, exposed as webkitSpeechRecognition
  • Safari: partial support, also behind the webkit prefix
  • Firefox: not supported by default

You’ll also need to serve your app over HTTPS; microphone access requires a secure context.
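Because of the prefix differences and the secure-context rule, it’s worth guarding your setup code before calling the API. A defensive sketch:

const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;

if (!window.isSecureContext) {
  console.warn('Microphone access needs HTTPS (localhost also counts as secure).');
} else if (!SpeechRecognition) {
  console.warn('SpeechRecognition is not supported in this browser.');
} else {
  const recognition = new SpeechRecognition();
  // ...configure and start as shown above
}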


9. Expanding Your Assistant

Once the basics work, you can extend them easily:

  • Navigation Commands: “Go to home”, “Open settings”
  • Form Input: Dictate messages or comments by voice
  • Smart Actions: Integrate APIs (“What’s the weather today?”)
  • Accessibility: Allow voice-based site navigation for users who can’t type

You can even connect it with the Fetch API to answer real-world questions from an external service.
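For instance, a recognized question can be forwarded to a web service and the answer spoken back. In this sketch, the https://api.example.com/answer endpoint and its JSON shape are hypothetical placeholders for whatever service you actually use:

recognition.onresult = async (e) => {
  const question = e.results[0][0].transcript;

  if (question.toLowerCase().includes('weather')) {
    // Hypothetical endpoint; substitute a real API and its response format.
    const res = await fetch(
      `https://api.example.com/answer?q=${encodeURIComponent(question)}`
    );
    const data = await res.json();
    speak(data.answer);
  }
};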


10. Example: Tiny Voice Assistant

Here’s everything tied together:

<button id="start">🎙️ Activate Assistant</button>
<p id="output">Say: "Dark mode" or "Scroll down"</p>

<script>
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.interimResults = false;

function speak(text) {
  const utter = new SpeechSynthesisUtterance(text);
  speechSynthesis.speak(utter);
}

recognition.onresult = (e) => {
  const command = e.results[0][0].transcript.toLowerCase();

  if (command.includes('dark')) {
    document.body.style.background = '#111';
    document.body.style.color = '#fff';
    speak('Dark mode activated.');
  } else if (command.includes('light')) {
    document.body.style.background = '#fff';
    document.body.style.color = '#000';
    speak('Light mode on.');
  } else {
    speak(`You said ${command}`);
  }
};

recognition.onerror = (e) => speak('Error: ' + e.error);

document.getElementById('start').addEventListener('click', () => {
  recognition.start();
  speak('Voice assistant ready.');
});
</script>

Now you have your own browser-based voice assistant that listens, understands, and talks back.


Conclusion

You don’t need TensorFlow, OpenAI, or cloud APIs to make your web app voice-driven.
All you need is the SpeechRecognition and SpeechSynthesis APIs already sitting inside your browser.

In under 60 lines of JavaScript, your site can recognize commands, control the page, and even respond like a real assistant.

Pro Tip: Combine SpeechRecognition with BroadcastChannel and localStorage to sync commands across tabs for a fully integrated, multi-tab voice experience.
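A rough sketch of that idea, where the channel name and message shape are my own: the listening tab applies a command locally and broadcasts it, and every other tab on the same origin applies it too. (localStorage could additionally persist the last command for tabs opened later.)

const channel = new BroadcastChannel('voice-commands');

function applyCommand(command) {
  if (command.includes('dark mode')) {
    document.body.style.background = '#111';
    document.body.style.color = '#fff';
  }
}

// The listening tab applies the command itself, then broadcasts it;
// BroadcastChannel does not echo messages back to the sender.
recognition.onresult = (e) => {
  const command = e.results[0][0].transcript.toLowerCase();
  applyCommand(command);
  channel.postMessage({ command });
};

// Every other tab on the same origin picks the command up here.
channel.onmessage = ({ data }) => applyCommand(data.command);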


Call to Action

Have you ever tried building a voice-controlled app?
Share your experiments in the comments and bookmark this post for your next weekend project.
