Automatic Speech Recognition Service
1. Introduction
The Automatic Speech Recognition (ASR) Service component is built on the ADK audio development framework and provides speech recognition functionality.
Supports the Wanson ASR engine.
Supports audio resampling, converting the sample rate to match the requirements of different ASR engines.
Supports multiple input modes: the onboard microphone, a UAC (USB Audio Class) device, or audio data obtained from other voice services.
Note
A unified API is used to configure and control the ASR service, so users do not need to be concerned with the implementation of the specific ASR engine.
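To illustrate how a unified interface can hide the engine, the C sketch below puts an engine behind a table of function pointers, and application code only calls through that table. All names here (`asr_engine_ops_t`, `asr_select_engine`, `asr_feed`, and the dummy engine) are hypothetical and are not the ASR service API; consult the API reference for the real functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical engine interface: each ASR engine fills in one of these tables. */
typedef struct {
    const char *name;
    uint32_t sample_rate;                               /* input rate the engine expects, in Hz */
    int (*init)(void);                                  /* returns 0 on success */
    int (*process)(const int16_t *pcm, size_t samples); /* returns >= 0 on success */
    void (*deinit)(void);
} asr_engine_ops_t;

/* The currently selected engine; application code never names a concrete engine. */
static const asr_engine_ops_t *g_engine = NULL;

/* Switch to a new engine, shutting down the previous one if there was one. */
int asr_select_engine(const asr_engine_ops_t *ops)
{
    assert(ops != NULL && ops->init != NULL && ops->process != NULL);
    if (g_engine != NULL && g_engine->deinit != NULL)
        g_engine->deinit();
    g_engine = ops;
    return g_engine->init();
}

/* Feed one block of 16-bit PCM to whichever engine is selected. */
int asr_feed(const int16_t *pcm, size_t samples)
{
    if (g_engine == NULL)
        return -1; /* no engine selected yet */
    return g_engine->process(pcm, samples);
}

/* Hypothetical dummy engine that only counts samples, standing in for a real engine. */
static size_t dummy_total = 0;
static int dummy_init(void) { dummy_total = 0; return 0; }
static int dummy_process(const int16_t *pcm, size_t samples)
{
    (void)pcm;
    dummy_total += samples;
    return 0;
}
static void dummy_deinit(void) {}

const asr_engine_ops_t dummy_engine = {
    .name = "dummy",
    .sample_rate = 16000,
    .init = dummy_init,
    .process = dummy_process,
    .deinit = dummy_deinit,
};
```

With this layout, supporting another engine means filling in another table, while `asr_feed` and the rest of the application stay unchanged.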
For the API reference of the ASR service, please refer to:
For the example project of the ASR service, please refer to:
2. Software Framework
The main functions of the ASR service include:
Audio data collection: Supports collecting audio directly from the microphone (as shown in Figure 1) or obtaining audio data from voice services (as shown in Figure 2).
Audio data processing: Provides resampling functionality to ensure audio data meets the input requirements of the ASR engine.
Speech recognition: Sends processed audio data to the ASR engine for speech recognition.
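The resampling step can be illustrated with a simple linear-interpolation resampler for 16-bit mono PCM. This is a generic technique chosen for clarity, not the resampler used by ADK, and the function names are hypothetical. Production resamplers usually also low-pass filter the signal to avoid aliasing when downsampling, which this sketch does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of output samples when converting in_len samples from in_rate to
 * out_rate, rounded down. */
size_t resample_out_len(size_t in_len, uint32_t in_rate, uint32_t out_rate)
{
    assert(in_rate > 0 && out_rate > 0);
    return (size_t)(((uint64_t)in_len * out_rate) / in_rate);
}

/* Linear-interpolation resampler for 16-bit mono PCM (no anti-alias filter).
 * out must hold resample_out_len(in_len, in_rate, out_rate) samples.
 * Returns the number of samples written. */
size_t resample_linear_s16(const int16_t *in, size_t in_len, uint32_t in_rate,
                           int16_t *out, uint32_t out_rate)
{
    size_t out_len = resample_out_len(in_len, in_rate, out_rate);
    for (size_t i = 0; i < out_len; i++) {
        /* Output sample i sits at input position idx + frac / out_rate. */
        uint64_t pos = (uint64_t)i * in_rate;
        size_t idx = (size_t)(pos / out_rate);
        uint64_t frac = pos % out_rate;
        int32_t a = in[idx];
        /* Past the last input sample, hold its value. */
        int32_t b = (idx + 1 < in_len) ? in[idx + 1] : a;
        out[i] = (int16_t)(a + (int32_t)(((int64_t)(b - a) * (int64_t)frac) / out_rate));
    }
    return out_len;
}
```

For example, converting four samples `{0, 100, 200, 300}` from 8000 Hz to 16000 Hz yields eight samples, with the new samples placed midway between the original ones.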
The software framework is shown in the following figures:
Figure 1 ASR Service Architecture (microphone input)
Figure 2 ASR Service Architecture (audio data from a voice service)
Note
The current version of the ASR service demonstrates two working modes: recording and recognizing directly from the microphone, or recognizing audio data obtained through a voice service. Other working modes can be configured to suit the actual application scenario.
Different ASR engines may have specific requirements for audio data format and sample rate; these can be met through configuration. For example, the Wanson ASR engine requires 16-bit PCM audio at a sample rate of 16000 Hz.
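The format also determines buffer sizes. At 16000 Hz with 16-bit (2-byte) samples, one mono channel produces 16000 × 2 = 32000 bytes per second, so a 20 ms frame holds 320 samples, or 640 bytes. The hypothetical helper below does this arithmetic; the mono channel count and the 20 ms frame length are assumptions for illustration, not values required by the Wanson engine.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for frame_ms milliseconds of interleaved PCM audio. */
uint32_t pcm_frame_bytes(uint32_t sample_rate, uint32_t bits_per_sample,
                         uint32_t channels, uint32_t frame_ms)
{
    assert(bits_per_sample % 8 == 0); /* whole bytes per sample only */
    uint32_t samples = sample_rate * frame_ms / 1000; /* per channel */
    return samples * channels * (bits_per_sample / 8);
}
```

For instance, `pcm_frame_bytes(16000, 16, 1, 20)` gives the 640 bytes computed above.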
3. Macro Configuration
Function macro configuration:
Kconfig               CPU   Format   Value
CONFIG_ASR_SERVICE    AP    bool     y
Dependent function macro configuration:
Kconfig                                     CPU   Format   Value
CONFIG_ADK                                  AP    bool     y
CONFIG_ADK_RAW_STREAM                       AP    bool     y
CONFIG_ADK_ONBOARD_MIC_STREAM               AP    bool     y
CONFIG_WANSON_ARMINO_ASR                    AP    bool     y
CONFIG_WANSON_ASR_GROUP_VERSION_WORDS_V1    AP    bool     y
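A startup check can catch a missing dependency early. The sketch below assumes that enabled Kconfig bool options are visible to C code as preprocessor macros (Kconfig-based build systems commonly do this through a generated configuration header, which must be included before this code); the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Print one missing option and count it. */
#define ASR_REPORT_MISSING(opt) do { puts("missing: " opt); missing++; } while (0)

/* Counts the required ASR options that are not defined at compile time.
 * Assumption: each enabled option is defined as a preprocessor macro. */
int asr_missing_config_count(void)
{
    int missing = 0;
#ifndef CONFIG_ASR_SERVICE
    ASR_REPORT_MISSING("CONFIG_ASR_SERVICE");
#endif
#ifndef CONFIG_ADK
    ASR_REPORT_MISSING("CONFIG_ADK");
#endif
#ifndef CONFIG_ADK_RAW_STREAM
    ASR_REPORT_MISSING("CONFIG_ADK_RAW_STREAM");
#endif
#ifndef CONFIG_ADK_ONBOARD_MIC_STREAM
    ASR_REPORT_MISSING("CONFIG_ADK_ONBOARD_MIC_STREAM");
#endif
#ifndef CONFIG_WANSON_ARMINO_ASR
    ASR_REPORT_MISSING("CONFIG_WANSON_ARMINO_ASR");
#endif
#ifndef CONFIG_WANSON_ASR_GROUP_VERSION_WORDS_V1
    ASR_REPORT_MISSING("CONFIG_WANSON_ASR_GROUP_VERSION_WORDS_V1");
#endif
    return missing;
}

/* Call early at start-up; aborts in builds without NDEBUG if anything is missing. */
void asr_config_assert(void)
{
    assert(asr_missing_config_count() == 0 && "enable the missing Kconfig options");
}
```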
Note
To use AEC (acoustic echo cancellation) or resampling, refer to the component module descriptions in the audio development framework and enable the corresponding function macros and component function macros.
4. Example Description
For a complete example, refer to the asr_service_example project.
5. Common Questions
Q: How do I select the sample rate?
A: Select the sample rate required by the ASR engine in use, typically 8 kHz or 16 kHz. The Wanson ASR engine, for example, requires 16 kHz.
Q: How do I process ASR recognition results?
A: Register a callback function to receive recognition results. For details, refer to the callback registration functions in the API documentation.
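The callback pattern can be sketched as follows: the application registers a handler plus a context pointer, and the service calls it whenever the engine reports a result. The callback type, `asr_register_result_callback`, `asr_dispatch_result`, and the string-based result format are hypothetical stand-ins; the real signatures are in the API documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback signature: recognized text plus user context. */
typedef void (*asr_result_cb_t)(const char *text, void *user_data);

static asr_result_cb_t s_result_cb = NULL;
static void *s_result_ctx = NULL;

/* Application side: register a handler (pass NULL to clear it). */
void asr_register_result_callback(asr_result_cb_t cb, void *user_data)
{
    s_result_cb = cb;
    s_result_ctx = user_data;
}

/* Service side: forward a result to the handler. Returns 1 if a handler ran. */
int asr_dispatch_result(const char *text)
{
    if (s_result_cb == NULL || text == NULL)
        return 0;
    s_result_cb(text, s_result_ctx);
    return 1;
}

/* Example handler state: the last result and how many results arrived. */
typedef struct {
    char last[64];
    int count;
} result_log_t;

/* Example handler: copies the result into the caller's log, truncating if needed. */
void log_result(const char *text, void *user_data)
{
    result_log_t *rlog = (result_log_t *)user_data;
    assert(rlog != NULL);
    strncpy(rlog->last, text, sizeof(rlog->last) - 1);
    rlog->last[sizeof(rlog->last) - 1] = '\0';
    rlog->count++;
}
```

If the real service invokes callbacks from its own task, as is common in RTOS-based audio frameworks, keeping the handler short and passing heavier work to an application task is a sensible precaution.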