TensorFlow Lite Micro Developer Guide


1. Overview

TensorFlow Lite Micro (TFLM) is a lightweight machine learning inference framework designed specifically for microcontrollers and other resource-constrained devices. It is a streamlined version of TensorFlow Lite, optimized for embedded systems.

This SDK integrates the TensorFlow Lite Micro framework and provides complete example projects to help developers quickly get started with edge AI application development.

1.1 Key Features

  • Lightweight Design: Optimized for embedded devices with minimal memory footprint

  • Hardware Acceleration: Supports CMSIS-NN acceleration library for improved inference performance

  • Easy Integration: Complete CMake build system and example projects

  • Rich Operator Support: Supports commonly used deep learning operators

  • Dual-Core Architecture Support: Fully utilizes BK7258 multi-core processing capabilities

1.2 System Requirements

  • Chip Platform: BK7258 (ARM Cortex-M33 dual-core)

  • Memory: PSRAM for storing models and tensor arena

1.3 System Architecture

CPU0 runs at a maximum frequency of 240MHz, while CPU1 can run at up to 480MHz. TensorFlow Lite Micro model inference therefore runs on CPU1.

2. Component Architecture

2.1 Directory Structure

TensorFlow Lite Micro related code consists of two main parts:

Component Directory (components/bk_tflite_micro/):

bk_tflite_micro/
├── CMakeLists.txt              # Component build file
├── Kconfig                     # Component configuration options
└── tflite-micro/               # TensorFlow Lite Micro source code
    └── tensorflow/
        └── lite/
            └── micro/          # Core implementation

Example Project Directory (projects/tflite_micro/):

tflite_micro/
├── micro_speech/               # Speech recognition example
│   ├── main/                   # Main program code
│   │   ├── app_main_cpu1.cc   # CPU1 main function
│   │   ├── tflite/            # TensorFlow Lite related code
│   │   │   ├── main_functions.cc     # Inference main logic
│   │   │   ├── micro_speech_quantized_model_data.cc  # Model data
│   │   │   └── ...
│   │   └── CMakeLists.txt     # Main program build file
│   ├── config/                 # Chip configuration files
│   ├── CMakeLists.txt         # Project build file
│   └── Makefile               # Top-level Makefile
└── gesture_detection/          # Gesture detection example
    ├── main/                   # Main program code
    │   ├── app_main_cpu1.cc   # CPU1 main function
    │   ├── tflite/            # TensorFlow Lite related code
    │   │   ├── main_functions.cc         # Inference main logic
    │   │   ├── gesture_detection_model_data.cc  # Model data
    │   │   ├── image_provider.cc         # Image input interface
    │   │   └── detection_responder.cc    # Result processing interface
    │   └── CMakeLists.txt     # Main program build file
    └── config/                 # Chip configuration files

2.2 Component Configuration

The TensorFlow Lite Micro component is configured through Kconfig and must be enabled in the project configuration:

menu "TFLite Micro"
    config TFLITE_MICRO
        bool "Enable TFLite Micro"
        default n
endmenu

Once enabled, the component will automatically compile the TensorFlow Lite Micro static library and link it to the application.

3. Example Project Details

3.1 Micro Speech (Speech Recognition)

Micro Speech is an audio feature-based keyword recognition example that can recognize simple voice commands like “yes” and “no”.

3.1.1 Core Components

  • Audio Preprocessing Model: Converts raw audio data to feature vectors

  • Speech Recognition Model: Classifies feature vectors to recognize keywords

  • Test Audio Data: Contains test samples like yes, no, silence, noise
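The two models form a pipeline: the preprocessing model slices audio into overlapping windows and emits one feature vector per window, and the recognition model classifies the stacked vectors. A rough NumPy sketch of the framing (the window, stride, and feature-size values are assumptions modeled on the upstream micro_speech example; the authoritative values live in the example's feature-provider code):

```python
import numpy as np

# Assumed micro_speech framing parameters: 30 ms window, 20 ms stride
# at 16 kHz, 40 filterbank features per frame.
SAMPLE_RATE = 16000
WINDOW_SAMPLES = 480     # 30 ms
STRIDE_SAMPLES = 320     # 20 ms
FEATURES_PER_FRAME = 40

audio = np.zeros(SAMPLE_RATE, dtype=np.int16)  # 1 s of silence as a stand-in

# Number of full windows that fit in the 1 s audio buffer
num_frames = 1 + (len(audio) - WINDOW_SAMPLES) // STRIDE_SAMPLES

# Feature tensor fed to the recognition model (int8-quantized)
features = np.zeros((num_frames, FEATURES_PER_FRAME), dtype=np.int8)

print(features.shape)  # -> (49, 40)
```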

3.1.2 Main Flow

  1. Initialization Phase:

    // Run AI inference task on CPU1
    void app_main_cpu1(void *arg) {
        // Set CPU frequency to 480MHz for better performance
        bk_pm_module_vote_cpu_freq(PM_DEV_ID_LIN, PM_CPU_FRQ_480M);
    
        // Create TensorFlow Lite inference task
        xTaskCreate(tflite_task, "test", 1024*16, NULL, 3, NULL);
    }
    
  2. Model Loading:

    void tflite_task(void *arg) {
        // Register debug log callback
        RegisterDebugLogCallback(debugLogCallback);
    
        // Allocate model data buffer from PSRAM
        data_ptr = (unsigned char*)psram_malloc(
            g_micro_speech_quantized_model_data_len);
    
        // Copy model data to PSRAM
        os_memcpy(data_ptr, g_micro_speech_quantized_model_data,
            g_micro_speech_quantized_model_data_len);
    
        // Loop inference execution
        while (1) {
            loop();
            vTaskDelay(pdMS_TO_TICKS(5000));
        }
    }
    
  3. Feature Extraction:

    // Generate features using audio preprocessing model
    TfLiteStatus GenerateFeatures(const int16_t* audio_data,
                                  const size_t audio_data_size,
                                  Features* features_output) {
        // Load preprocessing model
        const tflite::Model* model =
            tflite::GetModel(g_audio_preprocessor_int8_model_data);
    
        // Create operator resolver
        AudioPreprocessorOpResolver op_resolver;
        RegisterOps(op_resolver);
    
        // Create interpreter
        tflite::MicroInterpreter interpreter(model, op_resolver,
                                            g_arena, kArenaSize);
    
        // Allocate tensor memory
        interpreter.AllocateTensors();
    
        // Process audio data to generate features
        // ...
    }
    
  4. Inference Execution:

    // Load speech recognition model and perform inference
    TfLiteStatus LoadMicroSpeechModelAndPerformInference(
        const Features& features, const char* expected_label) {
        // Load model
        const tflite::Model* model =
            tflite::GetModel(g_micro_speech_quantized_model_data);
    
        // Create operator resolver
        MicroSpeechOpResolver op_resolver;
        op_resolver.AddReshape();
        op_resolver.AddFullyConnected();
        op_resolver.AddDepthwiseConv2D();
        op_resolver.AddSoftmax();
    
        // Create interpreter and allocate memory
        tflite::MicroInterpreter interpreter(model, op_resolver,
                                            g_arena, kArenaSize);
        interpreter.AllocateTensors();
    
        // Fill input data
        TfLiteTensor* input = interpreter.input(0);
        std::copy_n(&features[0][0], kFeatureElementCount,
                   tflite::GetTensorData<int8_t>(input));
    
        // Execute inference
        interpreter.Invoke();
    
        // Get output results
        TfLiteTensor* output = interpreter.output(0);
        // Dequantize and parse results
        // ...
    }
    

3.1.3 Memory Configuration

  • Arena Size: 28584 bytes (for storing tensor data)

  • Task Stack Size: 1024*16 bytes

  • Model Storage: Uses PSRAM to store model data

3.2 Gesture Detection

Gesture Detection is a vision-based gesture recognition example that can detect gestures like rock, paper, scissors.

3.2.1 Core Components

  • Image Input Interface: image_provider.cc - Responsible for capturing camera images

  • Gesture Detection Model: gesture_detection_model_data.cc - Contains trained model

  • Result Processing Interface: detection_responder.cc - Processes detection results and outputs

  • Model Configuration: model_settings.cc/h - Defines model input/output parameters

3.2.2 Main Flow

  1. Initialization Phase:

    void setup() {
        tflite::InitializeTarget();
    
        // Load model
        model = tflite::GetModel(data_ptr);
    
        // Create operator resolver (13 operators)
        static tflite::MicroMutableOpResolver<13> micro_op_resolver;
        micro_op_resolver.AddConv2D(tflite::Register_CONV_2D_INT8());
        micro_op_resolver.AddPad();
        micro_op_resolver.AddMaxPool2D();
        // ... Add more operators
    
        // Allocate tensor arena from PSRAM (360KB)
        uint8_t *tensor_arena = (uint8_t *)psram_malloc(kTensorArenaSize);
    
        // Create interpreter
        static tflite::MicroInterpreter static_interpreter(
            model, micro_op_resolver, tensor_arena, kTensorArenaSize);
        interpreter = &static_interpreter;
    
        // Allocate tensors
        interpreter->AllocateTensors();
    
        // Get input tensor
        input = interpreter->input(0);
    }
    
  2. Image Capture and Inference:

    void loop() {
        // Get image data
        if (kTfLiteOk != GetImage(kNumCols, kNumRows, kNumChannels,
                                  input->data.int8, 0)) {
            MicroPrintf("Image capture failed.\r\n");
        }
    
        // Execute inference
        if (kTfLiteOk != interpreter->Invoke()) {
            MicroPrintf("Invoke failed.\r\n");
        }
    
        // Get output and post-process
        TfLiteTensor* output = interpreter->output(0);
        g_scale = output->params.scale;
        g_zero_point = output->params.zero_point;
    
        post_process(output->data.int8);
    }
    
  3. Result Post-processing:

    uint8_t post_process(int8_t *out_data) {
        // Iterate through all detection boxes (2268 anchors)
        for(int i = 0; i < 2268; i++) {
            // Dequantize confidence score
            float score = (out_data[i*8 + 4] - g_zero_point) * g_scale;
    
            if(score > 62) {  // Confidence threshold
                // Parse bounding box coordinates
                int x = (out_data[i*8 + 0] - g_zero_point) * g_scale;
                int y = (out_data[i*8 + 1] - g_zero_point) * g_scale;
                int w = (out_data[i*8 + 2] - g_zero_point) * g_scale;
                int h = (out_data[i*8 + 3] - g_zero_point) * g_scale;
    
                // Parse gesture class
                float paper = (out_data[i*8 + 5] - g_zero_point) * g_scale;
                float rock = (out_data[i*8 + 6] - g_zero_point) * g_scale;
                float scissors = (out_data[i*8 + 7] - g_zero_point) * g_scale;
    
                // Determine gesture type and output
                if (paper > 90) {
                    MicroPrintf("Paper is detected\r\n");
                } else if (rock > 90) {
                    MicroPrintf("Rock is detected\r\n");
                } else if (scissors > 90) {
                    MicroPrintf("Scissors is detected\r\n");
                }
            }
        }
        return 0;
    }
    

3.2.3 Memory Configuration

  • Tensor Arena Size: 360KB (allocated from PSRAM)

  • Model Input: 192x192x3 INT8 image

  • Model Output: 2268x8 INT8 detection results (coordinates, confidence, classes)
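The flat output buffer can be viewed as rows of 8 values per anchor. A host-side NumPy sketch of this layout (the column order follows the post-processing code in section 3.2.2; the zero-filled buffer is a stand-in for real model output):

```python
import numpy as np

# Assumed layout: 2268 anchors x 8 int8 values per anchor:
# [x, y, w, h, score, class0, class1, class2]
raw = np.zeros(2268 * 8, dtype=np.int8)   # stand-in for output->data.int8
boxes = raw.reshape(2268, 8)

coords = boxes[:, 0:4]   # x, y, w, h
scores = boxes[:, 4]     # confidence score per anchor
classes = boxes[:, 5:8]  # paper, rock, scissors

print(coords.shape, scores.shape, classes.shape)
```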

4. Developer Guide

4.1 Environment Setup

4.1.1 Install Development Tools

  1. Install ARM GCC toolchain

  2. Install CMake (version ≥3.5)

  3. Configure ARMINO_PATH environment variable

4.1.2 Enable TensorFlow Lite Micro Component

Enable TFLITE_MICRO in the project configuration file:

# Use menuconfig to configure
make menuconfig
# Navigate to TFLite Micro -> Enable TFLite Micro, select [Y]

Or add in sdkconfig file:

CONFIG_TFLITE_MICRO=y

4.2 Create Custom AI Application

4.2.1 Prepare Model Files

  1. Train Model: Use TensorFlow/Keras to train model

  2. Quantize Model: Convert to INT8 quantized TFLite model

  3. Convert to C Array: Use xxd tool to convert to header file

# Convert .tflite file to C array
xxd -i model.tflite > model_data.cc
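If xxd is not available, the same conversion can be scripted. A minimal sketch (the tflite_to_c_array helper and the g_model_data name are illustrative, not part of the SDK):

```python
# Hypothetical helper: convert .tflite bytes into a C array definition,
# mirroring the naming convention of `xxd -i` plus the 8-byte alignment
# used by the example projects.
def tflite_to_c_array(data: bytes, var_name: str) -> str:
    lines = []
    for i in range(0, len(data), 12):
        chunk = data[i:i + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (f"alignas(8) const unsigned char {var_name}[] = {{\n"
            f"{body}\n}};\n"
            f"const int {var_name}_len = {len(data)};\n")

# Usage:
# with open("model.tflite", "rb") as f:
#     print(tflite_to_c_array(f.read(), "g_model_data"))
```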

4.2.2 Integrate Model into Project

  1. Create model data files:

    // model_data.h
    #ifndef MODEL_DATA_H_
    #define MODEL_DATA_H_
    
    extern const unsigned char g_model_data[];
    extern const int g_model_data_len;
    
    #endif  // MODEL_DATA_H_
    
    // model_data.cc
    #include "model_data.h"
    
    alignas(8) const unsigned char g_model_data[] = {
        // Model data...
    };
    const int g_model_data_len = sizeof(g_model_data);
    
  2. Create main inference code:

    #include "tensorflow/lite/micro/micro_interpreter.h"
    #include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
    #include "tensorflow/lite/schema/schema_generated.h"
    #include "model_data.h"
    
    // Define tensor arena size and shared state
    constexpr int kTensorArenaSize = 100 * 1024;
    uint8_t *tensor_arena = nullptr;
    tflite::MicroInterpreter* interpreter = nullptr;
    TfLiteTensor* input = nullptr;
    TfLiteTensor* output = nullptr;
    
    void setup() {
        // Load model
        const tflite::Model* model = tflite::GetModel(g_model_data);
    
        // Add required operators
        static tflite::MicroMutableOpResolver<5> micro_op_resolver;
        micro_op_resolver.AddConv2D();
        micro_op_resolver.AddFullyConnected();
        micro_op_resolver.AddSoftmax();
        // Add other required operators...
    
        // Allocate memory
        tensor_arena = (uint8_t *)psram_malloc(kTensorArenaSize);
    
        // Create interpreter
        static tflite::MicroInterpreter static_interpreter(
            model, micro_op_resolver, tensor_arena, kTensorArenaSize);
        interpreter = &static_interpreter;
    
        interpreter->AllocateTensors();
    
        // Get input/output tensors
        input = interpreter->input(0);
        output = interpreter->output(0);
    }
    
    void loop() {
        // Fill input data
        // ...
    
        // Execute inference
        interpreter->Invoke();
    
        // Process output results
        // ...
    }
    

4.2.3 Configure CMakeLists.txt

Add the TensorFlow Lite related source files in the project's main/CMakeLists.txt:

if (CONFIG_SYS_CPU1)
    file(GLOB_RECURSE TF_SOURCES tflite/*.c tflite/*.cc)

    list(APPEND srcs
        app_cpu1_main.c
        app_main_cpu1.cc
        ${TF_SOURCES}
    )

    list(APPEND incs
        tflite
    )
endif()

armino_component_register(
    SRCS "${srcs}"
    INCLUDE_DIRS "${incs}"
)

4.2.4 Add Required Compile Options

Add the following C++ compile options in the project's top-level CMakeLists.txt:

# Disable certain warnings
armino_build_set_property(COMPILE_OPTIONS "-Wno-unused-variable" APPEND)
armino_build_set_property(COMPILE_OPTIONS "-Wno-sign-compare" APPEND)
armino_build_set_property(CXX_COMPILE_OPTIONS "-fpermissive" APPEND)

4.3 Dual-Core Application Development

BK7258 supports dual-core architecture, allowing AI inference tasks to run on CPU1 to fully utilize hardware resources.

4.3.1 CPU Configuration

Enable dual-core mode in configuration:

CONFIG_SYS_CPU0=y    # CPU0 configuration
CONFIG_SYS_CPU1=y    # CPU1 configuration

4.3.2 Task Assignment

  • CPU0: Runs system management, communication tasks

  • CPU1: Runs AI inference tasks

Example code:

// CPU1 entry function
extern "C" void app_main_cpu1(void *arg) {
    // Increase CPU frequency for better performance
    bk_pm_module_vote_cpu_freq(PM_DEV_ID_LIN, PM_CPU_FRQ_480M);

    // Create AI inference task
    xTaskCreate(tflite_task, "tflite", 1024*16, NULL, 3, NULL);
}

4.4 Performance Optimization Tips

4.4.1 Memory Optimization

  1. Use PSRAM: Store model data and tensor arena in PSRAM

    // Allocate model buffer from PSRAM
    data_ptr = (unsigned char*)psram_malloc(model_data_len);
    
    // Allocate tensor arena from PSRAM
    tensor_arena = (uint8_t *)psram_malloc(kTensorArenaSize);
    
  2. Optimize Arena Size: Use interpreter->arena_used_bytes() to get actual usage and adjust arena size

    MicroPrintf("Arena used: %d bytes\n",
                interpreter->arena_used_bytes());
    
  3. Static Memory Allocation: Define TF_LITE_STATIC_MEMORY to avoid dynamic allocation
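If the macro is not already defined by the component, one way to set it is alongside the other compile options from section 4.2.4 (a sketch using the same armino_build_set_property call; check the component's defaults for your SDK version first):

```cmake
# Assumed placement: project top-level CMakeLists.txt
armino_build_set_property(COMPILE_OPTIONS "-DTF_LITE_STATIC_MEMORY" APPEND)
```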

4.4.2 Inference Optimization

  1. INT8 Quantization: Use INT8 quantized models to reduce memory footprint and computation

  2. CMSIS-NN Acceleration: Component enables CMSIS-NN optimized kernels by default

  3. CPU Frequency Adjustment: Adjust CPU frequency based on performance requirements

    // Set to maximum frequency 480MHz
    bk_pm_module_vote_cpu_freq(PM_DEV_ID_LIN, PM_CPU_FRQ_480M);
    
  4. Task Priority: Set appropriate priority for AI inference tasks

    xTaskCreate(tflite_task, "tflite", stack_size, NULL,
                priority, NULL);
    

4.4.3 Model Optimization

  1. Simplify Network Structure: Reduce number of layers and parameters

  2. Pruning and Compression: Use model pruning techniques to reduce model size

  3. Operator Selection: Prioritize CMSIS-NN supported operators (Conv2D, DepthwiseConv2D, FullyConnected, etc.)

5. Model Adaptation and Integration

5.1 Model Conversion Flow

5.1.1 BK7258 Chip Description

Important

BK7258 chip uses ARM Cortex-M33 CPU architecture and does NOT include an NPU (Neural Processing Unit) hardware accelerator.

Therefore:

  • No NPU model conversion required

  • All inference runs on CPU

  • Uses CMSIS-NN software library for operator optimization

  • Models only need to be converted to TensorFlow Lite format

5.1.2 Model Conversion Steps

Convert the trained TensorFlow/Keras model to a TFLite INT8 quantized model:

import tensorflow as tf
import numpy as np

# 1. Load trained model
model = tf.keras.models.load_model('your_model.h5')

# 2. Create TFLite converter
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# 3. Enable INT8 quantization optimization
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# 4. Provide representative dataset for quantization calibration
def representative_dataset():
    # Use subset of training or validation data
    for i in range(100):
        # Ensure data shape matches model input
        data = np.random.rand(1, input_height, input_width, channels)
        yield [data.astype(np.float32)]

converter.representative_dataset = representative_dataset

# 5. Set input/output types to INT8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# 6. Convert and save
tflite_model = converter.convert()
with open('model_int8.tflite', 'wb') as f:
    f.write(tflite_model)

print("Model conversion completed!")

5.1.3 Validate Model

After conversion, it is recommended to validate the model's accuracy on a PC:

import tensorflow as tf
import numpy as np

# Load TFLite model
interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()

# Get input/output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

print("Input info:", input_details)
print("Output info:", output_details)

# Test inference
test_data = np.random.rand(*input_details[0]['shape']).astype(np.int8)
interpreter.set_tensor(input_details[0]['index'], test_data)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])
print("Output result:", output)

5.2 Add Model to Project

5.2.1 Convert to C Array

Use the xxd tool to convert the .tflite file to a C/C++ array:

# Method 1: Generate .cc file using xxd
xxd -i model_int8.tflite > model_data.cc

# Method 2: Generate .h header file using xxd
xxd -i model_int8.tflite model_data.h

The generated file looks like this:

unsigned char model_int8_tflite[] = {
  0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, ...
};
unsigned int model_int8_tflite_len = 123456;
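Bytes 4 to 7 of a valid .tflite flatbuffer hold the ASCII file identifier "TFL3" (the 0x54 0x46 0x4c 0x33 visible in the array above), which makes a cheap sanity check before embedding the data. A small illustrative helper (looks_like_tflite is not an SDK function):

```python
# Check the TFLite flatbuffer file identifier: a valid model stores the
# ASCII string "TFL3" at byte offsets 4..7.
def looks_like_tflite(data: bytes) -> bool:
    return len(data) >= 8 and data[4:8] == b"TFL3"

# Usage:
# with open("model_int8.tflite", "rb") as f:
#     assert looks_like_tflite(f.read())
```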

5.2.2 Create Model Data Files

Manually create standardized model files:

// my_model_data.h
#ifndef MY_MODEL_DATA_H_
#define MY_MODEL_DATA_H_

#include <stdint.h>

// Model data declaration
extern const unsigned char g_my_model_data[];
extern const int g_my_model_data_len;

#endif  // MY_MODEL_DATA_H_

// my_model_data.cc
#include "my_model_data.h"

// 8-byte alignment for optimized access performance
alignas(8) const unsigned char g_my_model_data[] = {
    // Paste xxd generated array data here
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33,
    // ... other data
};

const int g_my_model_data_len = sizeof(g_my_model_data);

5.2.3 Add to CMakeLists.txt

Add the model files in the project's main/CMakeLists.txt:

if (CONFIG_SYS_CPU1)
    list(APPEND srcs
        app_cpu1_main.c
        app_main_cpu1.cc
        tflite/my_model_data.cc      # Add model data file
        tflite/main_functions.cc      # Inference main logic
    )

    list(APPEND incs
        tflite
    )
endif()

5.3 Write Inference Code

5.3.1 Basic Inference Framework

Create inference.cc file to implement complete inference flow:

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "my_model_data.h"

extern "C" {
#include "os/os.h"
#include "os/mem.h"
}

// Global variables
namespace {
    const tflite::Model* model = nullptr;
    tflite::MicroInterpreter* interpreter = nullptr;
    TfLiteTensor* input = nullptr;
    TfLiteTensor* output = nullptr;

    // Adjust arena size based on model requirements
    constexpr int kTensorArenaSize = 200 * 1024;
    uint8_t* tensor_arena = nullptr;
}

// Initialize model
bool InitModel() {
    // 1. Load model
    model = tflite::GetModel(g_my_model_data);
    if (model->version() != TFLITE_SCHEMA_VERSION) {
        os_printf("Model version mismatch!\n");
        return false;
    }

    // 2. Register required operators
    static tflite::MicroMutableOpResolver<6> micro_op_resolver;
    micro_op_resolver.AddConv2D();
    micro_op_resolver.AddDepthwiseConv2D();
    micro_op_resolver.AddFullyConnected();
    micro_op_resolver.AddSoftmax();
    micro_op_resolver.AddReshape();
    micro_op_resolver.AddMaxPool2D();
    // Add other operators based on actual model

    // 3. Allocate tensor arena from PSRAM
    tensor_arena = (uint8_t*)psram_malloc(kTensorArenaSize);
    if (tensor_arena == NULL) {
        os_printf("PSRAM allocation failed!\n");
        return false;
    }

    // 4. Create interpreter
    static tflite::MicroInterpreter static_interpreter(
        model, micro_op_resolver, tensor_arena, kTensorArenaSize);
    interpreter = &static_interpreter;

    // 5. Allocate tensors
    TfLiteStatus allocate_status = interpreter->AllocateTensors();
    if (allocate_status != kTfLiteOk) {
        os_printf("AllocateTensors failed!\n");
        return false;
    }

    // 6. Get input/output tensor pointers
    input = interpreter->input(0);
    output = interpreter->output(0);

    os_printf("Model initialized successfully! Arena used: %d bytes\n",
              interpreter->arena_used_bytes());

    return true;
}

5.3.2 Execute Inference

// Execute inference
bool RunInference(const void* input_data, int input_size) {
    // 1. Check input size
    if (input_size != input->bytes) {
        os_printf("Input size mismatch! Expected: %d, Actual: %d\n",
                  input->bytes, input_size);
        return false;
    }

    // 2. Fill input data
    memcpy(input->data.int8, input_data, input_size);

    // 3. Execute inference
    TfLiteStatus invoke_status = interpreter->Invoke();
    if (invoke_status != kTfLiteOk) {
        os_printf("Inference execution failed!\n");
        return false;
    }

    return true;
}

5.4 Get Inference Results

5.4.1 Classification Task Result Processing

For classification tasks (e.g., image classification, speech recognition):

// Get classification result
int GetClassificationResult(float* confidence) {
    // Get output tensor
    TfLiteTensor* output = interpreter->output(0);

    // Get quantization parameters
    float scale = output->params.scale;
    int32_t zero_point = output->params.zero_point;

    // Find class with maximum probability
    int max_index = 0;
    float max_score = -1000.0f;

    for (int i = 0; i < output->bytes; i++) {
        // Dequantize
        float score = (output->data.int8[i] - zero_point) * scale;

        if (score > max_score) {
            max_score = score;
            max_index = i;
        }
    }

    if (confidence != nullptr) {
        *confidence = max_score;
    }

    os_printf("Recognition result: Class %d, Confidence: %.2f\n", max_index, max_score);

    return max_index;
}

// Usage example
void ClassificationExample() {
    // Prepare input data
    int8_t input_data[INPUT_SIZE];
    PrepareInputData(input_data);

    // Execute inference
    if (RunInference(input_data, INPUT_SIZE)) {
        // Get result
        float confidence = 0.0f;
        int class_id = GetClassificationResult(&confidence);

        // Process result based on class ID
        if (confidence > 0.8f) {  // Confidence threshold
            os_printf("High confidence recognition: Class %d\n", class_id);
            // Execute corresponding action
        }
    }
}

5.4.2 Object Detection Result Processing

For object detection tasks (e.g., gesture detection, object detection):

// Detection result structure
typedef struct {
    float x, y, w, h;      // Bounding box coordinates and size
    int class_id;          // Class ID
    float confidence;      // Confidence
} DetectionResult;

// Parse detection results
int GetDetectionResults(DetectionResult* results, int max_results) {
    TfLiteTensor* output = interpreter->output(0);

    float scale = output->params.scale;
    int32_t zero_point = output->params.zero_point;

    int result_count = 0;

    // Assume output format: [num_detections, 8] (x,y,w,h,score,class1,class2,class3)
    int num_boxes = output->dims->data[0];  // e.g., 2268
    int box_size = output->dims->data[1];   // e.g., 8

    for (int i = 0; i < num_boxes && result_count < max_results; i++) {
        int8_t* box_data = &output->data.int8[i * box_size];

        // Dequantize confidence
        float score = (box_data[4] - zero_point) * scale;

        // Filter by confidence threshold
        if (score > 60.0f) {
            // Dequantize coordinates
            results[result_count].x = (box_data[0] - zero_point) * scale;
            results[result_count].y = (box_data[1] - zero_point) * scale;
            results[result_count].w = (box_data[2] - zero_point) * scale;
            results[result_count].h = (box_data[3] - zero_point) * scale;
            results[result_count].confidence = score;

            // Find class with maximum probability
            float max_class_score = -1000.0f;
            int max_class_id = 0;

            for (int c = 0; c < 3; c++) {  // Assume 3 classes
                float class_score = (box_data[5 + c] - zero_point) * scale;
                if (class_score > max_class_score) {
                    max_class_score = class_score;
                    max_class_id = c;
                }
            }

            results[result_count].class_id = max_class_id;
            result_count++;
        }
    }

    return result_count;
}

// Usage example
void DetectionExample() {
    // Prepare input image
    int8_t image_data[IMAGE_SIZE];
    CaptureImage(image_data);

    // Execute inference
    if (RunInference(image_data, IMAGE_SIZE)) {
        // Get detection results
        DetectionResult results[10];
        int num_detections = GetDetectionResults(results, 10);

        os_printf("Detected %d objects\n", num_detections);

        for (int i = 0; i < num_detections; i++) {
            os_printf("Object %d: Class=%d, Confidence=%.2f, "
                     "Position=(%.1f,%.1f), Size=(%.1f,%.1f)\n",
                     i, results[i].class_id, results[i].confidence,
                     results[i].x, results[i].y,
                     results[i].w, results[i].h);
        }
    }
}

5.4.3 Regression Task Result Processing

For regression tasks (e.g., keypoint detection, pose estimation):

// Get regression results
bool GetRegressionOutput(float* output_values, int output_size) {
    TfLiteTensor* output = interpreter->output(0);

    if (output->bytes != output_size) {
        os_printf("Output size mismatch!\n");
        return false;
    }

    float scale = output->params.scale;
    int32_t zero_point = output->params.zero_point;

    // Dequantize all output values
    for (int i = 0; i < output_size; i++) {
        output_values[i] = (output->data.int8[i] - zero_point) * scale;
    }

    return true;
}
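All three result handlers rely on the same affine quantization scheme: an INT8 tensor stores q = round(real / scale) + zero_point, and dequantization inverts it. A quick NumPy round-trip check with toy parameters (the scale and zero_point values here are made up, not taken from a real model):

```python
import numpy as np

# Toy quantization parameters (assumed, not from a real model)
scale, zero_point = 0.05, -10

real = np.array([0.0, 0.5, -0.5, 1.0])

# Quantize: q = round(real / scale) + zero_point, clipped to int8 range
q = np.clip(np.round(real / scale) + zero_point, -128, 127).astype(np.int8)

# Dequantize, exactly as the result handlers do on-device
recovered = (q.astype(np.float32) - zero_point) * scale

print(q)          # int8 representation
print(recovered)  # matches `real` up to quantization error (<= scale/2)
```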

5.5 Complete Application Example

5.5.1 Image Classification Application

// Image classification task entry
void ImageClassificationTask(void *arg) {
    // 1. Initialize model
    if (!InitModel()) {
        os_printf("Model initialization failed!\n");
        vTaskDelete(NULL);
        return;
    }

    // 2. Inference loop
    // Note: a 192x192x3 image (~110 KB) does not fit on the 16 KB task
    // stack, so allocate the buffer once from PSRAM instead.
    const int kImageSize = 192 * 192 * 3;
    int8_t *image_buffer = (int8_t *)psram_malloc(kImageSize);

    while (1) {
        // Get camera image
        if (CaptureImage(image_buffer) == 0) {

            // Execute inference
            uint32_t start_time = rtos_get_time();

            if (RunInference(image_buffer, kImageSize)) {
                // Get classification result
                float confidence = 0.0f;
                int class_id = GetClassificationResult(&confidence);

                uint32_t end_time = rtos_get_time();

                // Output result
                os_printf("Classification result: %d, Confidence: %.2f%%, Time: %ums\n",
                         class_id, confidence * 100.0f,
                         end_time - start_time);
            }
        }

        // Delay
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

5.5.2 CPU1 Main Function Integration

// app_main_cpu1.cc
extern "C" {
#include <os/os.h>
#include "modules/pm.h"
#include "FreeRTOS.h"
#include "task.h"
}

// Declare inference task
extern void ImageClassificationTask(void *arg);

extern "C" void app_main_cpu1(void *arg) {
    os_printf("CPU1 started, AI inference task initializing...\n");

    // Set CPU frequency to 480MHz for optimal performance
    bk_pm_module_vote_cpu_freq(PM_DEV_ID_LIN, PM_CPU_FRQ_480M);

    // Create AI inference task
    // Stack size: 16KB, Priority: 3
    xTaskCreate(ImageClassificationTask,
               "ai_inference",
               16*1024,
               NULL,
               3,
               NULL);

    // CPU1 main loop
    while (1) {
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

6. API Reference

6.1 Core API

6.1.1 Model Loading

// Load model
const tflite::Model* model = tflite::GetModel(model_data);

// Verify model version
if (model->version() != TFLITE_SCHEMA_VERSION) {
    // Handle version mismatch
}

6.1.2 Operator Resolver

// Create mutable operator resolver (specify operator count)
tflite::MicroMutableOpResolver<N> op_resolver;

// Add operators
op_resolver.AddConv2D();                          // Convolution
op_resolver.AddDepthwiseConv2D();                 // Depthwise separable convolution
op_resolver.AddFullyConnected();                  // Fully connected
op_resolver.AddSoftmax();                         // Softmax
op_resolver.AddReshape();                         // Reshape
op_resolver.AddPad();                             // Padding
op_resolver.AddMaxPool2D();                       // Max pooling
op_resolver.AddAdd();                             // Addition
op_resolver.AddMul();                             // Multiplication

// Use optimized INT8 operator
op_resolver.AddConv2D(tflite::Register_CONV_2D_INT8());

6.1.3 Interpreter

// Create interpreter
tflite::MicroInterpreter interpreter(
    model,              // Model pointer
    op_resolver,        // Operator resolver
    tensor_arena,       // Tensor arena buffer
    tensor_arena_size   // Arena size
);

// Allocate tensor memory
TfLiteStatus status = interpreter.AllocateTensors();
if (status != kTfLiteOk) {
    // Handle allocation failure
}

// Get arena usage
size_t used_bytes = interpreter.arena_used_bytes();

6.1.4 Input/Output Processing

// Get input tensor
TfLiteTensor* input = interpreter.input(0);

// Fill input data
for (int i = 0; i < input->bytes; ++i) {
    input->data.int8[i] = input_data[i];
}

// Execute inference
TfLiteStatus invoke_status = interpreter.Invoke();
if (invoke_status != kTfLiteOk) {
    // Handle inference failure
}

// Get output tensor
TfLiteTensor* output = interpreter.output(0);

// Process quantized output (dequantization)
float scale = output->params.scale;
int32_t zero_point = output->params.zero_point;
for (int i = 0; i < output->bytes; ++i) {
    float value = (output->data.int8[i] - zero_point) * scale;
    // Use dequantized value
}

7. References

7.1 Official Documentation

7.2 Example Code

  • Micro Speech Example: projects/tflite_micro/micro_speech/

  • Gesture Detection Example: projects/tflite_micro/gesture_detection/

  • Rock Paper Scissors Complete Application: See Rock Paper Scissors