Stop Clicking and Start Talking: Build Voice Commands in JavaScript

Learn how to make your web apps respond to your voice using the SpeechRecognition API: no frameworks, no API keys, just plain JavaScript.

Introduction

Wouldn’t it be great if you could just tell your web app what to do?

No more clicking tiny buttons, scrolling endlessly, or typing commands. Just say “Dark mode on” or “Search for JavaScript tutorials,” and your site reacts instantly.

That’s not sci-fi. It’s the SpeechRecognition API, a native web feature that lets your browser listen, transcribe, and act on voice input, all without external libraries.

In this post, we’ll build a simple yet powerful voice-controlled app using nothing but plain JavaScript.
You’ll learn how to:

  • Capture live speech
  • Convert it to text in real time
  • Trigger actions based on keywords
  • Make your app talk back

Let’s give your browser a voice.


1. Meet the SpeechRecognition API

The SpeechRecognition API, part of the Web Speech API, is built into modern browsers. It allows developers to access the user’s microphone, process audio, and return a text transcript.

No server of your own, no API key to manage: your code runs entirely in the browser. (One caveat: some browsers, Chrome included, route the audio through the vendor’s own speech service under the hood.)

Here’s how you create an instance safely across browsers:

const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;

const recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.interimResults = false;

Once started, it listens to the user’s voice and fires an onresult event whenever it recognizes speech.


2. The Smallest Possible Demo

Let’s build a simple interface that listens and displays what you say.

<button id="start">🎤 Start Listening</button>
<p id="output">Say something...</p>

<script>
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.interimResults = false;

const output = document.getElementById('output');
const startBtn = document.getElementById('start');

startBtn.addEventListener('click', () => {
  recognition.start();
  output.textContent = 'Listening...';
});

recognition.onresult = (event) => {
  const text = event.results[0][0].transcript;
  output.textContent = `You said: ${text}`;
};
</script>

Click the button, talk, and watch your words appear on screen. That’s real-time speech recognition with zero dependencies.


3. Turn Speech into Actions

Now that we can capture text, let’s make the app respond to commands.

recognition.onresult = (event) => {
  const command = event.results[0][0].transcript.toLowerCase();

  if (command.includes('dark mode')) {
    document.body.style.background = '#111';
    document.body.style.color = '#fff';
  } else if (command.includes('light mode')) {
    document.body.style.background = '#fff';
    document.body.style.color = '#000';
  } else if (command.includes('scroll down')) {
    window.scrollBy(0, 500);
  } else if (command.includes('scroll up')) {
    window.scrollBy(0, -500);
  }

  console.log('Command received:', command);
};

Now say “Dark mode” or “Scroll down,” and your app instantly reacts. You’ve just built a basic voice-controlled interface.
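The if/else chain above works, but a lookup table scales better as you add commands. Here’s a sketch of that pattern: matchCommand is a hypothetical helper, and the handlers mirror the style changes used above.

```javascript
// Map each keyword to a handler function instead of chaining if/else.
const commands = {
  'dark mode': () => {
    document.body.style.background = '#111';
    document.body.style.color = '#fff';
  },
  'light mode': () => {
    document.body.style.background = '#fff';
    document.body.style.color = '#000';
  },
  'scroll down': () => window.scrollBy(0, 500),
  'scroll up': () => window.scrollBy(0, -500),
};

// Return the first command key found in the transcript, or null.
function matchCommand(transcript, commandMap) {
  const lower = transcript.toLowerCase();
  return Object.keys(commandMap).find((key) => lower.includes(key)) || null;
}

// Inside onresult:
//   const key = matchCommand(event.results[0][0].transcript, commands);
//   if (key) commands[key]();
```

Adding a new voice command is now a one-line change to the `commands` object.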


4. Add a Voice That Talks Back

To make your assistant respond verbally, use the SpeechSynthesis API, the browser’s built-in text-to-speech feature.

function speak(text) {
  const utter = new SpeechSynthesisUtterance(text);
  speechSynthesis.speak(utter);
}

Integrate it with your commands:

if (command.includes('dark mode')) {
  document.body.style.background = '#111';
  document.body.style.color = '#fff';
  speak('Dark mode activated');
}

Now your app not only listens but talks, turning it into a real mini assistant.
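The default voice varies by browser and OS. If you want a specific one, you can pick from `speechSynthesis.getVoices()`. A sketch, assuming a hypothetical helper `pickVoice`; the name 'Google US English' is just an example, so check what your browser actually offers:

```javascript
// Pick a named voice, falling back to the first available one.
function pickVoice(voices, preferredName) {
  return voices.find((v) => v.name === preferredName) || voices[0] || null;
}

function speakWith(text, voiceName) {
  const utter = new SpeechSynthesisUtterance(text);
  const voice = pickVoice(speechSynthesis.getVoices(), voiceName);
  if (voice) utter.voice = voice;
  utter.rate = 1;  // 0.1 to 10; 1 is normal speed
  utter.pitch = 1; // 0 to 2
  speechSynthesis.speak(utter);
}

// speakWith('Dark mode activated', 'Google US English');
```

One quirk worth knowing: in some browsers `getVoices()` returns an empty array until the `voiceschanged` event has fired, so call it after that event if the list comes back empty.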


5. Voice Search Example

Let’s build something more useful: a voice-powered search bar.

<input id="searchBox" placeholder="Say your search..." />
<button id="mic">🎙️</button>

<script>
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();

document.getElementById('mic').addEventListener('click', () => recognition.start());

recognition.onresult = (e) => {
  const text = e.results[0][0].transcript;
  document.getElementById('searchBox').value = text;
  console.log('Searching for:', text);
};
</script>

When users click the mic and speak, their words appear in the search field, perfect for e-commerce or dashboards.
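To actually run the search, you can forward the transcript to your results page. A sketch: `buildSearchUrl`, `wireVoiceSearch`, and the `/search` endpoint are all assumptions, so point them at whatever your app uses.

```javascript
// Build a query URL from the spoken text.
function buildSearchUrl(query) {
  return '/search?q=' + encodeURIComponent(query.trim());
}

// Replace the onresult handler above with one that also navigates.
function wireVoiceSearch(recognition, input) {
  recognition.onresult = (e) => {
    const text = e.results[0][0].transcript;
    input.value = text;
    window.location.href = buildSearchUrl(text); // go to the results page
  };
}
```

In a single-page app you would call your router or fetch results instead of assigning `window.location.href`.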


6. Continuous Listening Mode

By default, SpeechRecognition stops after one phrase. You can make it keep listening by setting:

recognition.continuous = true;

recognition.onend = () => {
  recognition.start(); // restart automatically
};

This transforms your site into a true “always listening” assistant, great for hands-free apps or kiosks. Just remember to provide a Stop button so users stay in control.
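Without a guard, the auto-restart above makes the recognizer impossible to turn off. Here’s one way to sketch that Stop button, using a flag; `shouldRestart` and `wireControls` are hypothetical helpers:

```javascript
// Flag controlling whether onend should restart the recognizer.
let keepListening = true;

function shouldRestart() {
  return keepListening;
}

function wireControls(recognition, startBtn, stopBtn) {
  startBtn.addEventListener('click', () => {
    keepListening = true;
    recognition.start();
  });
  stopBtn.addEventListener('click', () => {
    keepListening = false;
    recognition.stop(); // onend still fires, but shouldRestart() is now false
  });
  recognition.onend = () => {
    if (shouldRestart()) recognition.start();
  };
}
```

Calling `recognition.stop()` alone is not enough, because the `onend` handler would immediately start it again; the flag breaks that loop.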


7. Handling Errors Gracefully

Add an error listener to handle cases where the user denies mic access or no speech is detected:

recognition.onerror = (e) => {
  console.error('Speech error:', e.error); // e.g. 'not-allowed', 'no-speech'
  speak('Sorry, I did not catch that.');
};

Also, display visual feedback (like “Listening…” or a pulsing mic icon) for a smoother user experience.
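That visual feedback can be driven by the recognizer’s own lifecycle events. A sketch, where `wireIndicator` and the `mic-active` CSS class are assumptions to adapt to your markup:

```javascript
// Toggle a listening indicator from the recognizer's start/end events.
function wireIndicator(recognition, micIcon, statusEl) {
  recognition.onstart = () => {
    micIcon.classList.add('mic-active'); // e.g. a pulsing animation in CSS
    statusEl.textContent = 'Listening...';
  };
  recognition.onend = () => {
    micIcon.classList.remove('mic-active');
    statusEl.textContent = 'Click the mic to talk';
  };
}

// wireIndicator(recognition,
//   document.getElementById('mic'),
//   document.getElementById('output'));
```

Because `onend` fires whether recognition stopped normally or after an error, the indicator always resets.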


8. Browser Support

Support is uneven at the time of writing. Chrome and Edge expose the API under the webkitSpeechRecognition prefix, recent Safari versions support the prefixed form as well, and Firefox does not support SpeechRecognition at all (though it does support SpeechSynthesis). In every case, the page must be served over HTTPS (or localhost) to access the microphone, and the browser will ask the user for permission first.


9. Real-World Use Cases

Once you know the basics, you can build some surprisingly powerful features:

  • Voice Navigation: “Go to dashboard,” “Open settings.”
  • Accessibility Controls: Let users navigate without a keyboard.
  • Smart Forms: Dictate comments or input fields hands-free.
  • Voice-Activated Games or Demos: Add natural interaction.
  • Kiosk Interfaces: Replace buttons with spoken commands.

With a bit of creativity, you can make the web genuinely conversational.
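The voice-navigation idea from the list above can be sketched as a phrase-to-route map; `routeFor` is a hypothetical helper and the paths are assumptions:

```javascript
// Map spoken keywords to routes in your app.
const routes = {
  dashboard: '/dashboard',
  settings: '/settings',
  home: '/',
};

// Return the route whose keyword appears in the transcript, or null.
function routeFor(transcript) {
  const lower = transcript.toLowerCase();
  const key = Object.keys(routes).find((k) => lower.includes(k));
  return key ? routes[key] : null;
}

// Inside onresult:
//   const path = routeFor(event.results[0][0].transcript);
//   if (path) window.location.href = path;
```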


10. Full Example: Browser Voice Assistant

Here’s everything combined in one snippet:

<button id="start">🎙️ Activate Assistant</button>
<p id="output">Say: "Dark mode", "Scroll down", or "Hello"</p>

<script>
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.interimResults = false;

function speak(text) {
  const utter = new SpeechSynthesisUtterance(text);
  speechSynthesis.speak(utter);
}

recognition.onresult = (e) => {
  const cmd = e.results[0][0].transcript.toLowerCase();
  document.getElementById('output').textContent = `You said: ${cmd}`;

  if (cmd.includes('dark')) {
    document.body.style.background = '#111';
    document.body.style.color = '#fff';
    speak('Dark mode activated');
  } else if (cmd.includes('light')) {
    document.body.style.background = '#fff';
    document.body.style.color = '#000';
    speak('Light mode on');
  } else if (cmd.includes('hello')) {
    speak('Hello there, how can I help you?');
  } else {
    speak(`You said ${cmd}`);
  }
};

document.getElementById('start').addEventListener('click', () => {
  recognition.start();
  speak('Voice assistant ready.');
});
</script>

Open this in Chrome, click the button, and start talking. Your browser just became your assistant.


Conclusion

Voice control isn’t just for phones anymore. The SpeechRecognition API makes it easy to give your web apps a voice and a personality using only native JavaScript.

In under a hundred lines of code, you can build interfaces that listen, respond, and even talk back, with no frameworks or external services needed.

Pro Tip: Combine SpeechRecognition with SpeechSynthesis and localStorage to create a persistent, voice-aware web experience.
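A sketch of that pro tip: persist the last voice-selected theme so it survives reloads. `themeStyles` and `applyTheme` are hypothetical helpers, and 'voice-theme' is an assumed localStorage key.

```javascript
// Pure helper: the style values for each theme.
function themeStyles(theme) {
  return theme === 'dark'
    ? { background: '#111', color: '#fff' }
    : { background: '#fff', color: '#000' };
}

// Apply a theme and remember it for the next visit.
function applyTheme(theme) {
  Object.assign(document.body.style, themeStyles(theme));
  localStorage.setItem('voice-theme', theme);
}

// On page load, restore whatever the user last asked for:
//   applyTheme(localStorage.getItem('voice-theme') || 'light');
// In the command handler, replace the inline styles with applyTheme('dark').
```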


Call to Action

Would you ever replace buttons with voice commands in your projects?
Share your thoughts in the comments and bookmark this post for your next experiment in web interactivity.
