Build Voice Commands in JavaScript with the SpeechRecognition API

Add real voice interaction to your web app with no libraries and no backend of your own, just native browser power.

Introduction

Ever wanted your web app to actually listen to you?

From “search this” to “turn on dark mode,” voice commands are no longer futuristic; they’re already built into modern browsers through the SpeechRecognition API.

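The best part? You don’t need any external libraries, API keys, or a backend of your own to make it work. Just a few lines of JavaScript can turn your site into an interactive, voice-controlled experience. Keep in mind that some browsers, Chrome included, send the audio to a speech service behind the scenes, so recognition may need a network connection.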

In this guide, we’ll walk through:

How SpeechRecognition works
How to set it up in plain JavaScript
Real examples (like search and theme toggles)
Common gotchas and performance tips

Let’s make your app talk and listen.

1. What Is the SpeechRecognition API?

The SpeechRecognition API is one half of the Web Speech API (the other half is SpeechSynthesis, which handles text-to-speech). It allows web apps to capture voice input from the user’s microphone, convert it to text, and act on that text.

It’s built into Chrome and Edge, and available under a prefixed name in recent versions of Safari.

Think of it as an event-driven voice-to-text engine that the browser exposes to your page, with no API keys or cloud setup required on your side.

2. Setting It Up

The API is available under two names, depending on the browser:

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

Then create a new instance:

const recognition = new SpeechRecognition();
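
In a browser that exposes neither name, the constructor is undefined and calling new on it throws. A minimal sketch of a guard follows; createRecognition is a hypothetical helper name, not part of the API, and the scope parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: returns a recognition instance, or null when the
// environment exposes neither the standard nor the prefixed constructor.
function createRecognition(scope = globalThis) {
  const Ctor = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  return Ctor ? new Ctor() : null;
}

// Usage in a page:
//   const recognition = createRecognition();
//   if (!recognition) { /* hide the mic button, show a fallback message */ }
```

Returning null instead of throwing lets the page decide how to degrade, for example by hiding the microphone button.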

Configuration Options

recognition.lang = 'en-US';       // language
recognition.continuous = false;   // stop after one sentence
recognition.interimResults = true; // show partial results

Finally, start listening:

recognition.start();
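
With interimResults set to true, each result event carries a list of results, and every entry is flagged with isFinal once the engine has settled on it. The sketch below separates confirmed text from text that may still change; splitTranscript is a hypothetical helper name, and it only assumes the list shape the API provides (length, index access, isFinal, and a transcript on the first alternative).

```javascript
// Splits a SpeechRecognitionResultList-like value into confirmed text
// and text that may still change.
function splitTranscript(results) {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < results.length; i++) {
    const text = results[i][0].transcript; // first (best) alternative
    if (results[i].isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}

// Usage inside a handler:
//   recognition.onresult = (event) => {
//     const { finalText, interimText } = splitTranscript(event.results);
//   };
```

A common pattern is to render the interim part in a lighter color so users can see that it is not yet confirmed.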

3. Basic Example: Log What the User Says

Let’s start with the simplest working demo.

<button id="start">🎙️ Start Listening</button>
<p id="output">Say something...</p>

<script>
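  const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const output = document.getElementById('output');
  const startBtn = document.getElementById('start');

  if (!SpeechRecognition) {
    // Unsupported browser: explain instead of throwing on `new`
    output.textContent = 'Speech recognition is not supported in this browser.';
    startBtn.disabled = true;
  } else {
    const recognition = new SpeechRecognition();
    recognition.lang = 'en-US';
    recognition.interimResults = false; // deliver only the final transcript

    recognition.onresult = (event) => {
      // First result, best alternative
      const transcript = event.results[0][0].transcript;
      output.textContent = `You said: ${transcript}`;
      console.log('Voice Input:', transcript);
    };

    recognition.onerror = (e) => {
      // e.error is a short code such as 'no-speech' or 'not-allowed'
      output.textContent = 'Error: ' + e.error;
    };

    startBtn.addEventListener('click', () => {
      recognition.start();
    });
  }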
</script>

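Click the button, speak into your mic, and your words appear as soon as you finish the phrase. Set interimResults to true if you want the text to update while you are still speaking.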

That’s voice input in pure JavaScript, no dependencies.

4. Turn Speech into Commands

Now let’s make it actually do something.

We’ll listen for specific keywords and trigger actions, for example, “change background,” “go dark mode,” or “scroll down.”

recognition.onresult = (event) => {
  const transcript = event.results[0][0].transcript.toLowerCase();

  if (transcript.includes('dark')) {
    document.body.style.background = '#111';
    document.body.style.color = '#fff';
  } else if (transcript.includes('light')) {
    document.body.style.background = '#fff';
    document.body.style.color = '#000';
  } else if (transcript.includes('scroll down')) {
    window.scrollBy(0, 500);
  } else if (transcript.includes('scroll up')) {
    window.scrollBy(0, -500);
  }

  console.log('Command recognized:', transcript);
};

Now your site listens to your voice and runs the matching command.
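
As the list of commands grows, a chain of includes() checks gets hard to maintain, and plain substring matching can misfire (for example, “light” also appears inside “highlight”). One possible alternative is a small command table matched on word boundaries. The names COMMANDS and matchCommand, and the action names, are hypothetical choices for this sketch.

```javascript
// Hypothetical command table: each entry pairs a pattern with an action name.
// Word boundaries (\b) keep "light" from firing on "highlight".
const COMMANDS = [
  { pattern: /\bdark\b/, name: 'dark-mode' },
  { pattern: /\blight\b/, name: 'light-mode' },
  { pattern: /\bscroll down\b/, name: 'scroll-down' },
  { pattern: /\bscroll up\b/, name: 'scroll-up' },
];

// Returns the name of the first matching command, or null if none match.
function matchCommand(transcript, commands = COMMANDS) {
  const text = transcript.toLowerCase().trim();
  const hit = commands.find((command) => command.pattern.test(text));
  return hit ? hit.name : null;
}
```

The result handler can then switch on the returned name, and a null result is a natural place to show a “Sorry, I didn’t catch that” message.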

5. Continuous Listening (for Live Commands)

If you want the app to keep listening without restarting after each phrase, use continuous mode:

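let keepListening = true; // set to false before calling recognition.stop()

recognition.continuous = true;

recognition.onend = () => {
  // A session can end on its own (silence, errors); restart only while wanted
  if (keepListening) recognition.start();
};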

This makes your app feel more like a virtual assistant that’s always “on.”

Just be cautious: it will keep using the microphone until stopped, so always provide a clear Stop button.
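
Two details matter in continuous mode. Results accumulate over the session, so reading event.results[0] keeps returning the first phrase; event.resultIndex points at the first entry that changed. And a Stop button only works if the automatic restart knows the user asked to stop. The sketch below packages both ideas; createContinuousListener and onPhrase are hypothetical names, not part of the API.

```javascript
// Hypothetical wrapper for continuous mode.
// `onPhrase` is called once for every finalized phrase.
function createContinuousListener(recognition, onPhrase) {
  let wanted = false; // true while the user wants the mic on

  recognition.continuous = true;

  recognition.onresult = (event) => {
    // Start at resultIndex so earlier phrases are not handled twice.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      if (event.results[i].isFinal) {
        onPhrase(event.results[i][0].transcript.trim());
      }
    }
  };

  recognition.onend = () => {
    if (wanted) recognition.start(); // restart unless the user pressed Stop
  };

  return {
    start() {
      if (wanted) return; // already listening; a second start() would throw
      wanted = true;
      recognition.start();
    },
    stop() {
      wanted = false;
      recognition.stop();
    },
  };
}
```

Wire the returned start() and stop() to your Start and Stop buttons, and pass a function such as a command matcher as onPhrase.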

6. Practical Example: Voice-Powered Search

Here’s a small snippet for a search input that fills automatically from speech.

<input id="searchBox" placeholder="Say your search..." />
<button id="voiceBtn">🎤 Speak</button>

<script>
  const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
  const searchBox = document.getElementById('searchBox');

  recognition.onresult = (e) => {
    const transcript = e.results[0][0].transcript;
    searchBox.value = transcript;
    // Optionally auto-submit
    // document.querySelector('form').submit();
  };

  document.getElementById('voiceBtn').addEventListener('click', () => {
    recognition.start();
  });
</script>

You can integrate this directly into an e-commerce search bar, documentation site, or chat input for accessibility.
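
If you do enable auto-submit, it may be worth gating it on how sure the engine is. Each alternative carries a confidence value between 0 and 1 next to its transcript. The sketch below is one way to use it; shouldAutoSubmit is a hypothetical helper, and the 0.8 default is an arbitrary assumption to tune for your app, not a value taken from the API.

```javascript
// Decides whether a recognized phrase is trustworthy enough to submit
// without the user confirming it first.
// The 0.8 default threshold is an assumption, not an API constant.
function shouldAutoSubmit(alternative, threshold = 0.8) {
  const text = alternative.transcript.trim();
  return text.length > 0 && alternative.confidence >= threshold;
}

// Usage inside the handler above:
//   const best = e.results[0][0];
//   searchBox.value = best.transcript;
//   if (shouldAutoSubmit(best)) { /* submit the form */ }
```

Confidence reporting can differ between browsers, so treat the threshold as a starting point and leave the filled-in text editable either way.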

7. Common Errors and Gotchas

1. HTTPS Required
The API needs a secure context for microphone access: HTTPS in production, or localhost during development.

2. Browser Support
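Support varies, so test in the browsers you care about:
✅ Chrome, Edge
⚠️ Safari (prefixed, partial support)
⚠️ Brave, Opera, and other Chromium-based browsers (the constructor may exist, but recognition can fail when the browser ships without a backing speech service)
❌ Firefox (no native support yet)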

3. Permission Handling
Always call recognition.start() from a user gesture like a click. Browsers ask for microphone permission on first use, and some block auto-start for privacy reasons.

4. Background Noise
Accuracy depends on mic quality and environment. Add a visual cue like “Listening…” and, where you can, encourage users to reduce background noise for better UX.
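
Most of these gotchas surface through the error event, whose error property holds a short code such as no-speech or not-allowed. Showing the raw code is not very helpful to users, so one option is to map the common codes to friendlier text. The codes below come from the Web Speech API; the message wording and the describeSpeechError name are assumptions made for this sketch.

```javascript
// Friendly messages for common SpeechRecognition error codes.
// The wording is illustrative; adapt it to your app.
const SPEECH_ERROR_MESSAGES = new Map([
  ['no-speech', 'No speech was detected. Please try again.'],
  ['audio-capture', 'No microphone was found.'],
  ['not-allowed', 'Microphone access was denied.'],
  ['network', 'The speech service could not be reached.'],
  ['language-not-supported', 'This language is not supported.'],
]);

// Falls back to a generic message for codes not listed above.
function describeSpeechError(code) {
  return SPEECH_ERROR_MESSAGES.get(code) ?? `Speech recognition error: ${code}`;
}

// Usage:
//   recognition.onerror = (e) => {
//     output.textContent = describeSpeechError(e.error);
//   };
```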

8. Advanced: Handle Multiple Languages

You can switch recognition language dynamically based on user preference.

recognition.lang = 'fr-FR'; // French

Or expose a dropdown:

<select id="lang">
  <option value="en-US">English</option>
  <option value="fr-FR">French</option>
  <option value="es-ES">Spanish</option>
</select>

const langSelect = document.getElementById('lang');

langSelect.addEventListener('change', (e) => {
  recognition.lang = e.target.value; // used by the next recognition.start()
});

Instant multilingual support, no extra APIs required.
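
To preselect a sensible default, you could start from the browser’s own language setting (navigator.language) and fall back to the closest language you offer. A minimal sketch, assuming the three languages from the dropdown above; SUPPORTED_LANGS and pickRecognitionLang are hypothetical names.

```javascript
// Languages offered in the dropdown.
const SUPPORTED_LANGS = ['en-US', 'fr-FR', 'es-ES'];

// Picks a supported tag for a preferred BCP 47 tag such as navigator.language:
// exact match first, then same base language, then the fallback.
function pickRecognitionLang(preferred, supported = SUPPORTED_LANGS, fallback = 'en-US') {
  if (!preferred) return fallback;
  const wanted = preferred.toLowerCase();
  const exact = supported.find((tag) => tag.toLowerCase() === wanted);
  if (exact) return exact;
  const base = wanted.split('-')[0];
  const sameBase = supported.find((tag) => tag.toLowerCase().split('-')[0] === base);
  return sameBase || fallback;
}

// Usage in a page:
//   recognition.lang = pickRecognitionLang(navigator.language);
```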

9. UX Tips for Voice Commands

Show a visual indicator (e.g., blinking mic icon) while listening.
Provide feedback like “Got it!” after a successful command.
Gracefully handle misheard phrases or silence.
Always give users manual control (start/stop).

Remember, your voice features should enhance, not replace, your app’s usability.
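
The listening indicator from the first tip can be driven by the recognizer’s own start and end events, so the UI never claims to be listening when it is not. A rough sketch: bindListeningIndicator and render are hypothetical names, and render stands in for whatever updates your mic icon or status label.

```javascript
// Keeps a status indicator in sync with the recognizer.
// `render` receives 'listening' or 'idle' and updates the UI.
function bindListeningIndicator(recognition, render) {
  // addEventListener leaves any existing onstart/onend handlers untouched.
  recognition.addEventListener('start', () => render('listening'));
  recognition.addEventListener('end', () => render('idle'));
  render('idle'); // initial state before the first start()
}

// Usage in a page:
//   bindListeningIndicator(recognition, (state) => {
//     micIcon.classList.toggle('blinking', state === 'listening');
//   });
```

Because the end event also fires after errors and silence timeouts, the indicator resets on its own in those cases.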

10. Real-World Use Cases

Accessibility: Let users navigate or fill forms without typing.
Smart Search: Enable “search by voice” in dashboards or e-commerce.
Productivity Tools: Create voice shortcuts for repetitive UI actions.
Smart Home UIs: Control lighting, sound, or layouts via browser apps.
Learning Apps: Let users practice pronunciation or language exercises.

Conclusion

The SpeechRecognition API is one of the most underrated web APIs: powerful, gated behind an explicit microphone permission, and surprisingly easy to use.

In under 30 lines of code, you can turn your web app into a voice-controlled interface with no servers of your own and no API keys, just JavaScript and the browser.

Pro Tip: Combine SpeechRecognition with Text-to-Speech (SpeechSynthesis) to create a fully interactive voice assistant that can both listen and respond.
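
For the responding half, a spoken reply takes only a few lines with speechSynthesis and SpeechSynthesisUtterance. The sketch below is a minimal example, not a full assistant; speak is a hypothetical helper name, and the scope parameter exists only so the function can be exercised outside a browser.

```javascript
// Speaks `text` aloud if speech synthesis is available.
// Returns true when the utterance was queued, false otherwise.
function speak(text, lang = 'en-US', scope = globalThis) {
  const { speechSynthesis: synth, SpeechSynthesisUtterance: Utterance } = scope;
  if (!synth || !Utterance) return false;
  const utterance = new Utterance(text);
  utterance.lang = lang;
  synth.speak(utterance);
  return true;
}

// Usage after handling a command:
//   speak('Dark mode enabled');
```

If recognition is running in continuous mode, you may want to stop it while the reply plays so the app does not pick up its own voice.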

Call to Action

Have you built something with SpeechRecognition before?
Share your favorite voice command ideas in the comments and bookmark this post for your next project.

Skill Stuff