[2024-02] Behavior Control Based on LLM Coding

Note: To protect commercial confidentiality, only limited information can be shared publicly.


Project Introduction

This project was initiated to develop a robotic control toolkit for future robotics applications. To improve the accuracy of human-language-guided robot behavior control, we explored multiple approaches, including programmatic behavior control and direct joint-output VLA (Vision-Language-Action) models. Inspired by work from Fei-Fei Li’s team and enabled by the robot’s native SDK, we adopted the former approach.

To enhance control precision for complex behaviors (e.g., combinations of more than 3 APIs, long-horizon instructions with more than 10 steps) and to reduce latency, we fine-tuned LLMs (≤7B parameters) to generate structured responses that call encapsulated action APIs. This involved model selection and custom dataset curation.
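
To give a sense of the approach, below is a minimal sketch of the "structured response → encapsulated action API" pipeline. The SDK wrapper, the API names (move_to, grasp, release), and the JSON schema are hypothetical stand-ins for the confidential project details; the real interfaces differ.

```python
# Minimal sketch: a fine-tuned LLM emits a structured plan, which is validated
# and dispatched against a registry of encapsulated action APIs.
# All names here (RobotSDK, move_to, grasp, release) are illustrative assumptions.
import json
from typing import Any, Callable, Dict, List


class RobotSDK:
    """Placeholder for the robot's native SDK wrappers (hypothetical)."""

    def move_to(self, x: float, y: float, z: float) -> None:
        print(f"moving to ({x}, {y}, {z})")

    def grasp(self, force: float = 0.5) -> None:
        print(f"grasping with force {force}")

    def release(self) -> None:
        print("releasing gripper")


def build_registry(robot: RobotSDK) -> Dict[str, Callable[..., None]]:
    """Registry of actions the fine-tuned LLM is allowed to call."""
    return {"move_to": robot.move_to, "grasp": robot.grasp, "release": robot.release}


def execute_plan(llm_output: str, registry: Dict[str, Callable[..., None]]) -> None:
    """Parse the LLM's structured response and dispatch each action in order."""
    plan: List[Dict[str, Any]] = json.loads(llm_output)
    for step in plan:
        action = step["action"]
        if action not in registry:  # reject hallucinated API names
            raise ValueError(f"unknown action: {action}")
        registry[action](**step.get("args", {}))


if __name__ == "__main__":
    # Example structured response the model might emit for an instruction like
    # "pick up the cup and place it on the shelf" (values are made up).
    llm_output = json.dumps([
        {"action": "move_to", "args": {"x": 0.3, "y": 0.1, "z": 0.2}},
        {"action": "grasp", "args": {"force": 0.6}},
        {"action": "move_to", "args": {"x": 0.5, "y": 0.4, "z": 0.6}},
        {"action": "release"},
    ])
    execute_plan(llm_output, build_registry(RobotSDK()))
```

Constraining the model to a fixed action registry like this is what makes longer API combinations checkable: any call outside the whitelist can be rejected before it reaches the robot.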


In my spare time, I investigated the mathematical reasoning capabilities of small models, aiming to identify the core factors behind LLMs’ reasoning abilities. My candidate factors were model architecture, parameter count, and data quality and content. Unfortunately, due to limited time and resources, I was unable to complete this exploration.

The document below shows my initial experiments with LLaMA-3.1-8B-Instruct; performance remained almost unchanged. I was eager to build a model and dataset with explicit CoT (chain-of-thought) tokens, but I only scratched the surface.
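
For context, the kind of CoT-annotated training record I had in mind looked roughly like the sketch below. The field names, the <cot> token convention, and the GSM8K-style sample problem are my own illustration, not data from the project.

```python
# Rough sketch of a CoT-annotated training record for small-model math reasoning.
# Schema and sample problem are illustrative only (GSM8K-style), not project data.
import json

record = {
    "question": "A farmer has 12 apples and gives away 5. How many remain?",
    "chain_of_thought": [
        "Start with 12 apples.",
        "Giving away 5 leaves 12 - 5 = 7.",
    ],
    "answer": "7",
}

# Serialize to a prompt/completion pair for supervised fine-tuning,
# wrapping the reasoning in explicit <cot> ... </cot> tokens (an assumed convention).
prompt = record["question"]
completion = "<cot>" + " ".join(record["chain_of_thought"]) + "</cot> " + record["answer"]
print(json.dumps({"prompt": prompt, "completion": completion}, indent=2))
```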