In recent years, the field of autonomous vehicles has made significant strides, leveraging advances in artificial intelligence (AI), sensor technologies, and deep learning to redefine transportation. As part of UCSD’s ECE/MAE148 (Introduction to Autonomous Vehicles) course, our team worked on the development of SonicCar, a state-of-the-art autonomous vehicle capable of responding to voice commands and navigating its environment intelligently and safely.
The SonicCar system integrates several innovative technologies to achieve its functionality. It uses speech-to-text recognition to convert spoken commands into actionable text, which is then processed by a large language model (LLM) to interpret and execute the instructions. This unique approach allows SonicCar to understand complex and natural language commands, providing a seamless user experience.
To ensure safety and compliance with road rules, SonicCar incorporates both Li-DAR-based obstacle detection and a camera-based stop sign recognition system. The obstacle detection module enables the vehicle to detect and brake for objects in its path, while the stop sign recognition module, trained with a Roboflow deep learning model, ensures adherence to traffic regulations. Together, these systems enhance the vehicle's reliability and situational awareness in dynamic environments.
The project represents a synthesis of voice control, AI-based natural language processing, and autonomous vehicle navigation, offering an accessible and interactive user interface while prioritizing safety. SonicCar showcases the potential for integrating human-machine interaction technologies with autonomous systems, paving the way for innovative applications in modern transportation.
Team Members
- Nick Ji - Electrical Engineering - Class of 2026
- Johnny Li - Mechanical Engineering - Class of 2025
- Daniel Glatter - Mechanical Engineering - Class of 2025
- Shivharsh Kand - Mechanical Engineering - Class of 2025
Project Goals: The project aimed to design and implement a sophisticated autonomous vehicle system with the following objectives:
- Voice Command Integration: Developing a speech-to-text system to process natural spoken language commands, activated by the wake word "Sonic". Additionally, we employed a large language model (LLM) to translate conversational speech into structured commands. This approach allows the vehicle to understand natural speech patterns, ensuring human-like interaction and enhancing usability.
- Human-Machine Interaction: Fostering smooth communication between the user and vehicle by enabling the system to interpret natural speech patterns, thereby improving accessibility, enhancing the user experience, and allowing intuitive vehicle control.
- LIDAR-based Collision Avoidance: Integrating a robust LIDAR-based obstacle detection system to ensure the vehicle detects and avoids hazards in real time, prioritizing safety and navigation reliability.
Project Scope: The project extended its functionality beyond the initial goals with the following additions:
- Stop Sign Detection: Implementing a deep-learning-powered stop sign recognition system, utilizing the OAK-D camera to detect stop signs with high confidence, thereby improving situational awareness.
- Graphical User Interface (GUI): Developing an intuitive GUI that allows users to monitor critical vehicle parameters in real time, such as throttle, steering angle, direction, and timeout values. The GUI enhances user understanding of current vehicle operations and provides clarity during system interaction.
The expanded scope reflects the adaptability and robustness of the system, showcasing its ability to integrate additional features seamlessly into its existing architecture. The project's approach integrates AI-driven conversational capabilities, sensor technologies, and interactive design to create a versatile, safe, and intuitive autonomous vehicle system. These features demonstrate the potential for real-world deployment in modern transportation systems.
Timeline:
The mechanical design of SonicCar focused on creating custom components to support the functionality of the vehicle. These components were designed, prototyped, and fabricated to ensure flexibility, durability, and compatibility with the onboard systems. Key mechanical elements include:
- Platform:
- Designed and laser-cut a custom platform mounted on top of the car’s base to hold critical components such as the Jetson Nano, camera, Li-DAR, and GPS.
- The platform uses multiple parallel slots to allow flexibility in the positioning and installation of components.
- It also has four pins that connect it to the car body. They are simple, sturdy, and easy to adapt (for example, when we needed to raise the platform, we simply adjusted the length of the feet and reprinted them).
- Camera Mount:
- Developed a 3D-printed adjustable camera mount to enable easy modification of the camera’s viewing angle.
- Integrated a ledge for sunlight protection to minimize interference from direct sunlight, ensuring a reliable camera feed.
- Anti-Spark Switch Mount:
- Mount to hold the Anti-Spark Switch in place.
- Fabrication Process:
- Used 3D printing to prototype and produce robust, lightweight parts.
- Applied laser cutting techniques to achieve precise and efficient fabrication of the platform with intricate design features.
The SonicCar project utilizes a comprehensive set of electronic components to ensure reliable processing, communication, navigation, and power management. The key electronic parts include:
- Processing and Communication:
- Jetson Nano
- USB Hub (for multiple connections)
- GPS Board and GPS Receiver
- Logitech Receiver (for console remote control)
- WiFi Dongle (to connect to local WiFi)
- Camera (Oak-D Lite)
- Sensing and Navigation:
- Li-DAR Board and Li-DAR Sensor
- Actuation:
- VESC (Electronic Speed Controller)
- DC Motor (XeRun 3660 G2)
- Servo PDB (Power Distribution Board)
- Servo Motor
- Power Management:
- DC/DC Converter
- Anti-Spark Switch
- Emergency Stop Button
- Battery (3 Cell LiPo)
- Battery Voltage Checker
These components are interconnected via a detailed wiring system, ensuring efficient power distribution, signal communication, and system reliability. The wiring diagram below provides an overview of these connections, serving as a guide for assembly and troubleshooting.
The chart below shows how the software is structured. Fundamentally, we are using ROS2, primarily to provide communication between components (the blue boxes in the chart represent ROS2 nodes). A high-level overview of the components:
- Publisher: Runs on a laptop. It uses the laptop's microphone to turn voice commands into text (speech-to-text), then uses a Large Language Model to understand the intent of the text and translate it into commands for the car: steering angle, throttle, and timeout (how long a command should be executed until it stops). It then publishes these commands to the `steering_commands` topic. This is also where the graphical user interface runs. (A minimal publisher sketch follows this list.)
- Subscriber: Runs on the Jetson Nano. It listens for incoming commands on the `steering_commands` topic and publishes them to the VESC node via the `/cmd_vel` topic. It also monitors the Li-DAR to prevent collisions with objects that appear in the car's path and uses the OAK-D camera to detect stop signs.
- VESC node: Prewritten node from DonkeyCar that translates commands on the `/cmd_vel` topic into electronic signals for the servo motor (steering) and the DC motor (throttle).
- Li-DAR node: Prewritten node from DonkeyCar that publishes Li-DAR measurements to the `/scan` topic.
- FastDDS discovery server: Runs on the Jetson and enables communication between devices (laptop and Jetson). All ROS2 nodes register with the discovery server so they are discoverable by all other nodes.
- OAK-D Camera: Directly connected to the Jetson via USB; runs the stop sign detection AI model.
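To make this message flow more concrete, here is a minimal sketch of a node publishing the three command values to the `steering_commands` topic. The `Float32MultiArray` message type and the names used here are assumptions for illustration only; the actual `listening_bot` package has its own message handling and keeps the node running alongside the GUI.

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import Float32MultiArray  # assumed message type, for illustration


class CommandPublisherSketch(Node):
    """Publishes [steering_angle, throttle, timeout] to the steering_commands topic."""

    def __init__(self):
        super().__init__("command_publisher_sketch")
        self.pub = self.create_publisher(Float32MultiArray, "steering_commands", 10)

    def send(self, angle: float, throttle: float, timeout: float):
        msg = Float32MultiArray()
        msg.data = [angle, throttle, timeout]
        self.pub.publish(msg)


def main():
    rclpy.init()
    node = CommandPublisherSketch()
    node.send(0.0, 0.3, 5.0)  # example: drive straight at 30% throttle for 5 seconds
    rclpy.spin_once(node, timeout_sec=1.0)  # give the middleware a moment to deliver
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```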
For understanding voice commands, we leverage the laptop's microphone so the user does not have to follow the Jetson Nano around. We use the Python package `SpeechRecognition` and, concretely, the underlying Google Speech Recognition API to transcribe the spoken command to text. We typically saw latencies of 400-600 ms, depending on the network connection. Note that `SpeechRecognition` uses a hardcoded API key for this API, and there is a limit on the number of requests you can make per day. The code listens for 4 seconds (by default) and then sends anything recorded to the API.
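A minimal sketch of this step with the `SpeechRecognition` package (the function name and the ambient-noise calibration are our illustration, not necessarily the exact code in the repository):

```python
import speech_recognition as sr

recognizer = sr.Recognizer()


def transcribe_command(phrase_time_limit: float = 4.0) -> str:
    """Record from the laptop microphone and return the recognized text."""
    with sr.Microphone() as source:
        # Briefly sample ambient noise so the energy threshold adapts to the room
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source, phrase_time_limit=phrase_time_limit)
    try:
        # Uses the free Google Speech Recognition API (shared, rate-limited key)
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # nothing intelligible was recorded
```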
After handling speech-to-text, the next task is to extract three variables that direct how the car should maneuver: direction (steering angle), speed (ranging from 0 to 1), and the amount of time for which to perform the action. To gain maximum flexibility in how commands can be expressed (and not be limited to certain pre-coded keywords), we leverage a Large Language Model (LLM), in this case Google's Gemini-1.5-Flash. The LLM is instructed with a system prompt containing three sections:
- General instructions on which three values to extract and which values are allowed
- A range of example sentences and answers
- Definition of how to format the output (in our case, we want a JSON format which we can easily parse afterwards).
The API key for the Gemini API has to be set as an environment variable (see step-by-step instructions below). You can request your own key here. The Gemini API was chosen because it offers a very easy interface and a generous free plan. With the LLM in place, our system can interpret even more complex driving instructions like "make a u-turn" or "go a little more to the right while being way faster". The LLM parses each command and converts it into structured data that we use to steer the car. After we receive the output of the LLM, we do some processing and case handling to ensure that the right parameters are published to the `steering_commands` topic.
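A rough sketch of this step, assuming the `google-generativeai` client; the prompt shown here is a shortened illustration, not our exact system prompt, and the helper name is hypothetical:

```python
import json
import os

import google.generativeai as genai

# Shortened illustration of the three prompt sections: instructions,
# example sentences, and the required JSON output format.
SYSTEM_PROMPT = (
    "Extract driving commands from the user's sentence. "
    "Return plain JSON with the keys 'angle' (-1 to 1), 'throttle' (0 to 1) "
    "and 'timeout' (seconds). "
    "Example: 'go slowly to the left' -> "
    '{"angle": -0.5, "throttle": 0.2, "timeout": 3}'
)

genai.configure(api_key=os.environ["LLM_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash", system_instruction=SYSTEM_PROMPT)


def interpret_command(text: str) -> dict:
    """Send the transcribed sentence to Gemini and parse the JSON answer."""
    response = model.generate_content(text)
    # The system prompt asks for plain JSON, so the reply can be parsed directly;
    # the real code adds further processing and case handling before publishing.
    return json.loads(response.text)
```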
The GUI consists of a Start Recording button, a status text box, a timeout countdown text box, and an emergency stop button. To interact with it, the user clicks the Start Recording button. After the user has spoken, the audio is transcribed and the intent is understood (as described above). The understood intent is then shown in the status text box of the user interface. If any errors occur, they are also shown in the GUI. After the command has been sent to the Jetson, a countdown starts, indicating how much longer the command will be executed on the Jetson. The emergency stop button works at all times and immediately publishes a stop signal (zero velocity) to the `steering_commands` topic.
Behind the scenes, the GUI leverages the Python `tkinter` package. It is launched from the ROS2 publisher node and keeps running continuously. It lives in the `graphical_user_interface.py` script and is implemented in the `VoiceRecorderUI` class.
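The widget layout can be pictured roughly like the sketch below; widget names and callbacks are placeholders, not the actual `VoiceRecorderUI` implementation:

```python
import tkinter as tk


class VoiceRecorderSketch:
    """Simplified stand-in for the VoiceRecorderUI described above."""

    def __init__(self, on_record, on_emergency_stop):
        self.root = tk.Tk()
        self.root.title("SonicCar Voice Control")

        tk.Button(self.root, text="Start Recording", command=on_record).pack()
        self.status = tk.Label(self.root, text="Waiting for command...")
        self.status.pack()
        self.countdown = tk.Label(self.root, text="Timeout: -")
        self.countdown.pack()
        tk.Button(self.root, text="EMERGENCY STOP", fg="red",
                  command=on_emergency_stop).pack()

    def set_status(self, text: str):
        """Show the understood intent (or an error message) in the status box."""
        self.status.config(text=text)

    def run(self):
        self.root.mainloop()


if __name__ == "__main__":
    ui = VoiceRecorderSketch(on_record=lambda: ui.set_status("Recording..."),
                             on_emergency_stop=lambda: ui.set_status("Stop sent"))
    ui.run()
```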
Because our ROS2 nodes had to communicate across devices (laptop and Jetson), we had some unique challenges getting the nodes to talk to each other. In theory, ROS2 should automatically discover all nodes running on the same WiFi network. Unfortunately, this did not work for us. We found a workaround by setting up our own FastDDS discovery server. This server runs at port `11888` on the Jetson in a separate terminal window. Then, on the laptop and in all terminals where we run ROS2 nodes, we set an environment variable with the discovery server's IP and port: `export ROS_DISCOVERY_SERVER="[YOUR-IP]:[YOUR-PORT]"` (on the Windows command line, this is `set ROS_DISCOVERY_SERVER=[YOUR-IP]:[YOUR-PORT]`). If this variable is set, ROS2 (more specifically FastDDS) automatically uses the server to discover nodes.
One interesting issue we had: instead of the IP address (which changes regularly), we wanted to use the Jetson's fixed hostname, `ucsdrobocar-148-12`. However, this resulted in errors, apparently because the hostname contains more than one dash. To save time, we did not investigate this further, though there is likely a workaround. Instead, we regularly run `hostname -I` on the Jetson to check the current IP address.
This is running in the subscriber node, which is executed on the Jetson. The commands received on the `steering_commands` topic consist of three values: steering angle (-1 to 1), throttle (0 to 1), and timeout (any number of seconds). They are processed by the `command_callback()` function, which essentially just writes the values to the `/cmd_vel` topic, which the VESC node listens to. A timed method (`keep_moving()`) ensures the car continues moving until a timeout occurs or a new command is received.
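A condensed sketch of this logic; the incoming message type (`Float32MultiArray`) and the mapping onto the `Twist` fields are assumptions for illustration, and the real node contains additional case handling:

```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist
from std_msgs.msg import Float32MultiArray  # assumed incoming message type


class SubscriberSketch(Node):
    def __init__(self):
        super().__init__("subscriber_sketch")
        self.cmd_pub = self.create_publisher(Twist, "/cmd_vel", 10)
        self.create_subscription(Float32MultiArray, "steering_commands",
                                 self.command_callback, 10)
        self.current = None   # latest (steering angle, throttle)
        self.deadline = 0.0   # time (seconds) at which to stop
        self.create_timer(0.1, self.keep_moving)

    def command_callback(self, msg):
        angle, throttle, timeout = msg.data
        self.current = (angle, throttle)
        self.deadline = self.get_clock().now().nanoseconds / 1e9 + timeout

    def keep_moving(self):
        """Republish the last command until the timeout expires, then stop."""
        twist = Twist()  # all zeros by default, i.e. stop
        now = self.get_clock().now().nanoseconds / 1e9
        if self.current is not None and now < self.deadline:
            twist.angular.z = float(self.current[0])  # steering angle
            twist.linear.x = float(self.current[1])   # throttle
        self.cmd_pub.publish(twist)


def main():
    rclpy.init()
    rclpy.spin(SubscriberSketch())
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```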
The Li-DAR-based collision avoidance system uses the Li-DAR scanner and ROS2 to detect obstacles and stop the car if they get too close. Below is a breakdown of the components and their roles:
- Li-DAR Node: Included in the UCSD DonkeyCar packages, this node publishes Li-DAR measurements to the `/scan` topic.
- Subscriber Node: This node subscribes to the `steering_commands` topic to receive motion commands and to the `/scan` topic for Li-DAR data. The Li-DAR data is filtered to only the front third of the vehicle's field of view, and the minimum distance to nearby objects is calculated. If the minimum detected distance is less than the configured threshold (`0.4` meters by default), the car is stopped immediately to avoid a collision. This logic is encapsulated in the `lidar_callback()` method (see the sketch after this list).
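A simplified, standalone sketch of the distance check inside `lidar_callback()`; the scan orientation (index 0 pointing straight ahead, with wrap-around) is an assumption, and in the real callback a detection triggers publishing a zero-velocity command:

```python
import math

from sensor_msgs.msg import LaserScan

STOP_DISTANCE = 0.4  # meters, default threshold


def obstacle_too_close(scan: LaserScan, stop_distance: float = STOP_DISTANCE) -> bool:
    """Return True if anything in the front third of the scan is closer than the threshold.

    Assumes a 360-degree scan with index 0 pointing straight ahead, so the front
    third corresponds to the first and last sixth of the ranges array.
    """
    n = len(scan.ranges)
    front = list(scan.ranges[: n // 6]) + list(scan.ranges[-(n // 6):])
    # Drop invalid readings (zeros, infinities, NaNs) before taking the minimum
    valid = [r for r in front if r > 0.0 and not math.isinf(r) and not math.isnan(r)]
    return bool(valid) and min(valid) < stop_distance
```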
Roboflow streamlines the process of building a robust stop sign detection model by facilitating data collection, labeling, training, and deployment. The dataset used for this project was sourced from Roboflow and contained 453 labeled stop sign images. These diverse images simulate varied real-world conditions, improving the model's ability to generalize across different environments. The original plan was to integrate the Roboflow detections as a secondary emergency stop feature. However, we ran into issues because the way we retrieve results from the OAK-D gives faulty distance values. This was not a major setback: we had originally planned for SonicCar to stop one meter away from the stop sign, but the closest distance at which it could stop was three meters anyway.
Training results were evaluated using key metrics such as mean Average Precision (mAP), which measures precision across all classes, ensuring balanced performance. Precision reflects how often the model's predictions are correct, while recall indicates the percentage of relevant labels successfully identified by the model. These metrics highlight the model's effectiveness and help fine-tune its performance.
The trained model was deployed to the OAK-D camera using the Python package `roboflowoak`. The deployment runs in the subscriber node on the Jetson, to which the camera is connected via USB. This integration enables real-time stop sign detection directly on the OAK-D. A confidence threshold of 90% ensures reliable detections, minimizing false positives while maintaining responsiveness; the threshold was raised from 80% to 90% because the model kept classifying people as stop signs. The combination of Roboflow's robust tools and the OAK-D's hardware efficiency delivers an optimized solution for stop sign detection.
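Deployment with `roboflowoak` typically follows a loop like the sketch below; the model ID and version are placeholders, and in our project the detection runs inside the subscriber node rather than a standalone script:

```python
import os

from roboflowoak import RoboflowOak

# Model ID and version are placeholders; use the values from your Roboflow project.
rf = RoboflowOak(
    model="stop-sign-model-id",
    version="1",
    api_key=os.environ["ROBOFLOW_API_KEY"],
    confidence=0.9,   # 90% threshold to suppress false positives
    overlap=0.5,
    rgb=True,
    depth=True,
    device=None,
    blocking=True,
)

while True:
    # result holds the predictions; frame/raw_frame/depth are the camera outputs
    result, frame, raw_frame, depth = rf.detect()
    if result["predictions"]:
        print("Stop sign detected - braking")
        break
```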
Wherever a command below contains a variable in square brackets, replace it with your own value and remove the brackets as well.
0. Make sure the Jetson is turned on. Wait until the fan starts spinning (a pretty good indication that it has connected to the Wi-Fi). Make sure your laptop is also connected to the `UCSDRoboCar` network. Then, SSH into the Jetson: `ssh jetson@ucsdrobocar-148-[TEAM-NR].local`
1. Get the UCSD DonkeyCar docker image: `docker pull djnighti/ucsd_robocar:devel`
2. Start the Docker container: `docker start [CONTAINER-NAME]`
3. Execute bash in the container: `docker exec -it [CONTAINER-NAME] bash`
4. Source ROS2: `source_ros2`
5. The first time, navigate to the `src` folder and clone the repository: `git clone https://github.com/dglttr/MAE148FinalProject.git`. Afterwards, remember to `git pull` and rebuild the package before using it to have the latest changes (`colcon build --merge-install --packages-select listening_bot`).
6. Source the ROS2 package: `. /install/setup.bash`
7. Check the current IP address of the Jetson (needed for the discovery server later): `hostname -I`
8. Launch the FastDDS discovery server: `fastdds discovery --server-id 0 --port 11888`
9. Now, open another terminal and repeat steps 0, 3, 4 and 6 - this terminal will be used to launch the VESC and Li-DAR nodes.
10. Set the environment variable for the FastDDS server: `export ROS_DISCOVERY_SERVER="[IP-ADDRESS]:11888"`
11. Stop the ROS2 daemon to make sure the discovery server will be used: `ros2 daemon stop`
12. Launch the Li-DAR and VESC nodes with the launchfile: `ros2 launch listening_bot listening_bot.launch.py`
13. Now, open another terminal and repeat steps 0, 3, 4, 6 and 10 - this terminal will be used to launch the subscriber node (we do this in a separate terminal so it is not flooded with messages from the VESC and Li-DAR nodes).
14. Set the Roboflow environment variable to get the stop sign detection model: `export ROBOFLOW_API_KEY="[YOUR-ROBOFLOW-API-KEY]"`
15. Run the subscriber: `ros2 run listening_bot subscriber`
- First, you will need to install ROS2. We used ROS2 Foxy. Newer ROS2 versions may work, but we did not test that. Here are the install instructions for Windows.
- The first time, clone the listening_bot repository into a folder of your choosing: `git clone https://github.com/dglttr/MAE148FinalProject.git`. Afterwards, navigate to that folder (`cd [PATH-TO-FOLDER]`) and run `git pull` to make sure you are on the newest version.
- Build the package: `colcon build --merge-install --packages-select listening_bot`
- Source the package. On Windows, it works like this: `call install/setup.bat`
- Set the environment variable for the FastDDS server: `set ROS_DISCOVERY_SERVER=[JETSON-IP-ADDRESS]:11888`
- Stop the ROS2 daemon to make sure the discovery server will be used: `ros2 daemon stop`
- Set the LLM_API_KEY: `set LLM_API_KEY=[API-KEY]`. You can get the API key for the Google Gemini API here.
- Run the publisher: `ros2 run listening_bot publisher`
Nick, Daniel, Johnny, Shiv
Huge thanks to the team, the TAs Alexander and Winston for carrying the class, and Professor Jack Silberman for the amazing opportunity to work on such a cool project! Special thanks to Alexander for the README template!
- Nick | [email protected]
- Daniel | [email protected]
- Johnny | [email protected]
- Shiv | [email protected]