Jetson Nano, NVIDIA's latest machine learning development platform, is emerging as a popular choice with exciting potential for edge AI applications. The latest such application: on-device speech recognition and natural language understanding.
Picovoice recently added support for Jetson Nano. This means you can create and add a Voice User Interface (VUI) to your projects in a matter of hours. Unlike cloud-based alternatives, Picovoice processes voice data offline and runs entirely on the Jetson Nano; processing on the edge eliminates connectivity, latency, and privacy issues. Picovoice engines are efficient enough to run on microcontrollers, so the Jetson Nano has ample resources for voice recognition.
Using the Picovoice SDK, you can infer a user's intent from a naturally spoken utterance such as:
Picovoice, set the lights in the living room to blue.
Picovoice detects the occurrence of the custom wake word (“Picovoice”) and then extracts the intent from the follow-on spoken command:
{
  "intent": "changeLightColor",
  "slots": {
    "location": "living room",
    "color": "blue"
  }
}

I saw this Self-Driving Toy Car article recently and thought it would be fun to make it even more intelligent by controlling it with voice. Picovoice can be used through multiple SDKs on Jetson Nano. Let's go with Python and get our voice interface ready in four steps:
0. Set up your NVIDIA Jetson Nano
If you haven't done so already, refer to the “Getting Started” page.
1. Configure your microphone
There are several off-the-shelf microphone expansion boards for Jetson Nano, but I am going with a simple USB microphone.
Connect the microphone and get the list of available input audio devices by running the following command inside your terminal:
arecord -L | grep plughw

The output should look similar to this:
plughw:CARD=PCH,DEV=0

Copy this line and create a .asoundrc file in your home folder with these options:
pcm.!default {
  type asym
  capture.pcm "mic"
}

pcm.mic {
  type plug
  slave {
    pcm "${INPUT_AUDIO_DEVICE}"
  }
}

Replace ${INPUT_AUDIO_DEVICE} with the value you copied earlier. You may need to reboot the system for these settings to take effect. It is also a good idea to go to Settings → Sound → Input and check that Linux identifies your microphone correctly.
2. Install Picovoice Package
Install the Picovoice python package:
sudo pip3 install picovoice picovoicedemo

The picovoicedemo package lets us test our models rapidly on Jetson Nano. The package requires a valid AccessKey. Sign up or log in to Picovoice Console to get your AccessKey.
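Once installed, you can run a quick end-to-end test from the terminal with the microphone demo that picovoicedemo provides. Here ${KEYWORD_PATH} and ${CONTEXT_PATH} are placeholders for the model files we create in the next step:

picovoice_demo_mic \
  --access_key ${ACCESS_KEY} \
  --keyword_path ${KEYWORD_PATH} \
  --context_path ${CONTEXT_PATH}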
3. Create a context model
Go to Picovoice Console, an online, interactive tool for designing models for both the wake-word (Porcupine) and speech-to-intent (Rhino) engines inside the Picovoice SDK.
For this mini-project, I created a simple context that understands changing speed (“Rate”) and turning the engine on/off (“State”). You can copy the YAML below and import it into Picovoice Console using the Import YAML button:
context:
  expressions:
    changeSpeed:
      - Slow $Rate:rate
      - Speed $Rate:rate
      - $Rate:rate (the) speed
      - Set (the) speed to $pv.SingleDigitInteger:speed
    changeState:
      - "[switch, turn] $State:state (the) engine"
  slots:
    State:
      - On
      - Off
    Rate:
      - down
      - up
      - increase
      - decrease

After training the model, download it and extract it into your home folder. I also picked “computer” as the wake word from the free wake words available on Picovoice GitHub. So go ahead and tell the tiny car to turn on the “engine”:
Computer, turn on the engine
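With the context above, Rhino should resolve this utterance to an inference along the following lines (the exact casing of slot values depends on how the context was trained, so treat this as illustrative):

{
  "intent": "changeState",
  "slots": {
    "state": "on"
  }
}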
4. Add it to your self-driving toy car project
Thus far, we have created and tested our voice interface. Next, we will add it to the self-driving toy car project. Use the following skeleton with the aforementioned Picovoice SDK for Python; just modify the wake_word_callback and inference_callback functions based on your context model's intents.
from picovoice import Picovoice

access_key = "${ACCESS_KEY}"  # AccessKey obtained from Picovoice Console

keyword_path = ...

def wake_word_callback():
    pass

context_path = ...

def inference_callback(inference):
    # `inference` exposes three immutable fields:
    # (1) `is_understood`
    # (2) `intent`
    # (3) `slots`
    pass

handle = Picovoice(
    access_key=access_key,
    keyword_path=keyword_path,
    wake_word_callback=wake_word_callback,
    context_path=context_path,
    inference_callback=inference_callback)

while True:
    handle.process(get_next_audio_frame())
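To make this concrete, here is a minimal sketch of what the filled-in version could look like. It assumes the pvrecorder package (sudo pip3 install pvrecorder) for capturing audio frames; the Car class and the model file names are hypothetical stand-ins for your toy car's actual control interface and your downloaded model files:

from picovoice import Picovoice
from pvrecorder import PvRecorder


class Car:
    # Hypothetical stand-in for the toy car's real control interface.
    def set_engine(self, on):
        print("engine:", "on" if on else "off")

    def set_speed(self, speed):
        print("speed:", speed)

    def adjust_speed(self, direction):
        print("speed adjustment:", direction)


car = Car()

def wake_word_callback():
    # Fired when "computer" is detected; handy for audible or visual feedback.
    print("[wake word detected]")

def inference_callback(inference):
    if not inference.is_understood:
        print("[command not understood]")
        return
    if inference.intent == "changeState":
        # The `state` slot carries "on" or "off" per the context above.
        car.set_engine(inference.slots["state"].lower() == "on")
    elif inference.intent == "changeSpeed":
        if "speed" in inference.slots:
            # "Set the speed to five" -> absolute speed via $pv.SingleDigitInteger
            car.set_speed(int(inference.slots["speed"]))
        else:
            # "Speed up", "Slow down", etc. -> relative change via the Rate slot
            car.adjust_speed(inference.slots["rate"])

handle = Picovoice(
    access_key="${ACCESS_KEY}",           # AccessKey from Picovoice Console
    keyword_path="computer_jetson.ppn",   # hypothetical path to the wake-word model
    wake_word_callback=wake_word_callback,
    context_path="toy_car_jetson.rhn",    # hypothetical path to the context model
    inference_callback=inference_callback)

# PvRecorder delivers 16-bit PCM frames sized exactly as the engine expects.
recorder = PvRecorder(frame_length=handle.frame_length, device_index=-1)
recorder.start()
try:
    while True:
        handle.process(recorder.read())
finally:
    recorder.stop()
    recorder.delete()
    handle.delete()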
For more detailed information, please refer to the Python API documentation.