Automatic Speech Recognition Service

1. Introduction

The Automatic Speech Recognition (ASR) Service component is built on the ADK audio development framework and provides speech recognition functionality.

It supports the Wanson ASR engine.

It supports audio data resampling, which converts the sample rate to match the requirements of different ASR engines.

It supports multiple input modes: the onboard microphone, a UAC device, or audio data obtained through other voice services.

Note

  1. Users can configure and control the ASR service through a unified API, without concern for the specific ASR engine implementation.

2. Software Framework

The main functions of the ASR service include:

  • Audio data collection: Supports direct collection from the microphone (as shown in Figure1) or obtaining audio data from voice services (as shown in Figure2).

  • Audio data processing: Provides resampling functionality to ensure audio data meets the input requirements of the ASR engine.

  • Speech recognition: Sends processed audio data to the ASR engine for speech recognition.

The software framework is shown in the following figures:

Figure1 ASR Service Architecture

Figure2 ASR Service Architecture

Note

  1. The current software version of the ASR service demonstrates two working modes: directly using the microphone for recording and recognition, or obtaining audio data through voice services for recognition. Users can configure other working modes for ASR according to their actual application scenarios.

  2. Different ASR engines may require a specific audio data format and sample rate, and the service can be configured to match. For example, the Wanson ASR engine requires audio data in 16-bit PCM format with a sample rate of 16000 Hz.
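
To illustrate what sample rate conversion to the engine's input format involves, the following is a minimal sketch of a linear-interpolation resampler for mono 16-bit PCM. It is not the resampling component of the audio development framework. The function names resample_out_len and resample_linear_s16 are hypothetical and used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of output samples produced for in_len input samples. */
size_t resample_out_len(size_t in_len, uint32_t in_rate, uint32_t out_rate)
{
    assert(in_rate > 0 && out_rate > 0);
    return (size_t)(((uint64_t)in_len * out_rate) / in_rate);
}

/*
 * Hypothetical helper: linear-interpolation resampler for mono 16-bit PCM.
 * Writes at most out_cap samples to out and returns the number written.
 * No anti-aliasing filter is applied, so this is only a sketch of the idea.
 */
size_t resample_linear_s16(const int16_t *in, size_t in_len, uint32_t in_rate,
                           int16_t *out, size_t out_cap, uint32_t out_rate)
{
    assert(in != NULL && out != NULL);
    size_t out_len = resample_out_len(in_len, in_rate, out_rate);
    if (out_len > out_cap) {
        out_len = out_cap;
    }

    for (size_t i = 0; i < out_len; i++) {
        /* Position of output sample i on the input axis: idx + frac / out_rate. */
        uint64_t pos = (uint64_t)i * in_rate;
        size_t idx = (size_t)(pos / out_rate);
        uint32_t frac = (uint32_t)(pos % out_rate);

        int32_t a = in[idx];
        /* Past the last input sample, hold the final value. */
        int32_t b = (idx + 1 < in_len) ? in[idx + 1] : in[idx];

        out[i] = (int16_t)(a + ((int64_t)(b - a) * frac) / out_rate);
    }
    return out_len;
}
```

For example, converting four samples {0, 100, 200, 300} from 8000 Hz to 16000 Hz yields eight samples, with each inserted sample halfway between its neighbours. When converting from a higher rate down to 16000 Hz, a production resampler would normally low-pass filter the signal first to avoid aliasing.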

3. Macro Configuration

Function macro configuration:

Kconfig               CPU    Format    Value
CONFIG_ASR_SERVICE    AP     bool      y

Dependent function macro configuration:

Kconfig                                     CPU    Format    Value
CONFIG_ADK                                  AP     bool      y
CONFIG_ADK_RAW_STREAM                       AP     bool      y
CONFIG_ADK_ONBOARD_MIC_STREAM               AP     bool      y
CONFIG_WANSON_ARMINO_ASR                    AP     bool      y
CONFIG_WANSON_ASR_GROUP_VERSION_WORDS_V1    AP     bool      y

Note

  1. If you need to support AEC and resampling functions, please refer to the component module description in the audio development framework and enable the corresponding function macros and component function macros.
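
Taken together, the macros listed in this section could appear in the AP configuration as the fragment below. This is a sketch only: it assumes the usual Kconfig `NAME=y` syntax and that the last macro carries the standard `CONFIG_` prefix. The file that holds these settings depends on the project and is not specified here.

```
# ASR service
CONFIG_ASR_SERVICE=y

# Dependencies
CONFIG_ADK=y
CONFIG_ADK_RAW_STREAM=y
CONFIG_ADK_ONBOARD_MIC_STREAM=y
CONFIG_WANSON_ARMINO_ASR=y
CONFIG_WANSON_ASR_GROUP_VERSION_WORDS_V1=y
```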

4. Example Description

For specific examples, please refer to the asr_service_example project.

6. Common Questions

Q: How do I select the sample rate?

A: Select the sample rate required by the ASR engine in use, typically 8 kHz or 16 kHz.

Q: How do I process ASR recognition results?

A: Register a callback function to receive ASR recognition results. For details, please refer to the callback function setting method in the API documentation.
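
The general callback pattern can be sketched as follows. This is not the ASR service API: every name prefixed with demo_ is hypothetical, and demo_asr_emit_result merely stands in for the engine reporting a result. Refer to the API documentation for the actual types and registration function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives the recognized text and a user context. */
typedef void (*demo_asr_result_cb_t)(const char *text, void *user_ctx);

/* Hypothetical service handle that stores the registered callback. */
typedef struct {
    demo_asr_result_cb_t on_result;
    void *user_ctx;
} demo_asr_service_t;

/* Hypothetical registration function: remembers the callback and its context. */
void demo_asr_register_result_cb(demo_asr_service_t *svc,
                                 demo_asr_result_cb_t cb, void *user_ctx)
{
    assert(svc != NULL);
    svc->on_result = cb;
    svc->user_ctx = user_ctx;
}

/* Stand-in for the engine: delivers a result to the callback, if one is set. */
void demo_asr_emit_result(const demo_asr_service_t *svc, const char *text)
{
    assert(svc != NULL && text != NULL);
    if (svc->on_result != NULL) {
        svc->on_result(text, svc->user_ctx);
    }
}

/* Example application state updated from the callback. */
typedef struct {
    char last_text[64];
    int result_count;
} demo_app_state_t;

/* Example callback: copies the recognized text and counts results. */
void demo_on_asr_result(const char *text, void *user_ctx)
{
    demo_app_state_t *app = (demo_app_state_t *)user_ctx;
    strncpy(app->last_text, text, sizeof(app->last_text) - 1);
    app->last_text[sizeof(app->last_text) - 1] = '\0';
    app->result_count++;
}
```

In this sketch the application registers demo_on_asr_result once, and each reported result updates demo_app_state_t. If a callback runs in the context of the service, as is common for this pattern, it is usually best to keep it short and hand longer work to another task.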