AI Speech to Text Feature


Summary

Starting with TRBOnet 6.5, you can convert voice transmissions to text using AI language models, either through the commercial OpenAI service or a self-hosted local deployment. Transcription works in real time or retrospectively in reports; its quality depends on the selected language model.

Do not install the local model on the computer that runs TRBOnet. Local models place a heavy load on the system; use a dedicated computer sized for the model.

Speech to Text Processing

1. The server receives or sends a voice transmission.
2. The server removes silence segments from the transmission.
3. The server sends the prepared audio file to the language model through the /v1/audio/transcriptions endpoint.
4. The language model receives and processes the file.
5. The language model returns the transcribed text to TRBOnet.
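TRBOnet does not document how it detects silence in step 2. As an illustration only, the following Python sketch shows one common approach, energy-based trimming: the audio is split into short frames and frames whose RMS level falls below a threshold are dropped. It assumes 16-bit mono PCM samples; the 8 kHz sample rate, 20 ms frame length, and threshold of 500 are hypothetical values, not TRBOnet settings.

```python
import array
import math


def remove_silence(samples, sample_rate=8000, frame_ms=20, threshold_rms=500.0):
    """Keep only the frames of 16-bit mono PCM audio whose RMS level reaches a threshold.

    Assumptions (not TRBOnet's documented behavior): energy-based detection,
    fixed-length frames, and a fixed RMS threshold.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)  # samples per frame
    kept = array.array("h")  # signed 16-bit output buffer
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square level of this frame.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold_rms:  # treat loud frames as speech
            kept.extend(frame)
    return kept


# Example: 20 ms of silence, 20 ms of "speech", 20 ms of silence at 8 kHz.
audio = array.array("h", [0] * 160 + [1000] * 160 + [0] * 160)
trimmed = remove_silence(audio)
print(len(audio), "->", len(trimmed))  # 480 -> 160
```

Real voice-activity detection usually adds smoothing or hangover so that short pauses inside words are not cut; a fixed threshold is the simplest possible variant.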

Model Types

Method           Model                                     Endpoint
OpenAI API key   Whisper-1                                 OpenAI servers
Local            Any v1 API-compatible STT or LLM model    Self-hosted

OpenAI Model

Model: Whisper-1
Configuration: Enter the purchased API key in Administration > System Settings > AI Speech to Text. Requests are sent to OpenAI servers.
The TRBOnet Demo license includes a preconfigured OpenAI Whisper test connection. This test connection is not available with Commercial or Trial licenses.

Local Model

Use any model compatible with the v1 API. Deploy it with a publicly available solution, such as LocalAI.

Tested models: Whisper-1, Gemma-4, Qwen-3.5

Setup

1. Deploy a local model server (for example, LocalAI: https://github.com/mudler/LocalAI).
2. Add the language model to the local deployment.
3. In TRBOnet, go to Administration > System Settings > AI Speech to Text.
4. In the Model field, enter the model name (for example, Gemma-4).
5. In the API endpoint field, enter the local model server address, for example: https://XXX.XXX.XXX.XXX/v1/audio/transcriptions
6. In the API key field, enter the API key configured on the local model server. If no API key is configured there, enter any text in the field.
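To check the endpoint, model name, and key outside TRBOnet, you can send a test file to the server yourself. The Python sketch below posts a WAV file the way the OpenAI-compatible /v1/audio/transcriptions API expects: a multipart/form-data body with model and file fields and a Bearer token in the Authorization header. It assumes that the Model setting maps to the model field and that the server returns the default JSON response, {"text": "..."}. This is a diagnostic sketch, not the request code TRBOnet itself uses; the function names are hypothetical.

```python
import json
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes, content_type="audio/wav"):
    """Encode form fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(endpoint, api_key, model, wav_path, timeout=60):
    """POST a WAV file to an OpenAI-compatible transcription endpoint and return the text."""
    with open(wav_path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart({"model": model}, "file", "audio.wav", audio)
    request = urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            # Use any text here if the local server has no API key configured.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumes the default JSON response format: {"text": "..."}
        return json.loads(response.read().decode("utf-8"))["text"]
```

For example, transcribe("https://XXX.XXX.XXX.XXX/v1/audio/transcriptions", "your-key", "Gemma-4", "sample.wav") should return the transcribed text when the settings are correct (sample.wav is a placeholder file). If the server uses a self-signed certificate, urllib rejects it by default unless you pass a suitable SSL context.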

Recommendations

Choose a language model suited to the speakers' language; some models transcribe specific languages better than others.
Load models into GPU video memory (VRAM) for faster performance, or use dedicated hardware designed for AI workloads.
Test the local model under high load to confirm that its performance is acceptable.
Neocom Software does not provide hardware specifications. Requirements depend on the model and load level.
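One way to run the high-load test recommended above is to send many recordings in parallel and look at the latency distribution. The Python sketch below is a minimal harness under that assumption: transcribe_fn is a hypothetical callable you supply that transcribes one file (for example, by posting it to the configured /v1/audio/transcriptions endpoint), and the concurrency level stands in for the number of simultaneous voice transmissions you expect.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(transcribe_fn, audio_paths, concurrency=4):
    """Send every file in audio_paths through transcribe_fn using parallel workers.

    transcribe_fn is a hypothetical callable that transcribes one file.
    Returns request and error counts plus latency statistics in seconds.
    """

    def timed(path):
        start = time.perf_counter()
        try:
            transcribe_fn(path)
            ok = True
        except Exception:  # count any failure (timeout, HTTP error) as an error
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, audio_paths))

    latencies = sorted(elapsed for ok, elapsed in results if ok)
    summary = {"requests": len(results), "errors": sum(1 for ok, _ in results if not ok)}
    if latencies:
        summary["mean_s"] = statistics.mean(latencies)
        summary["max_s"] = latencies[-1]
        # Nearest-rank 95th percentile of successful requests.
        summary["p95_s"] = latencies[math.ceil(0.95 * len(latencies)) - 1]
    return summary
```

Use recordings that are representative of your real traffic, and raise concurrency step by step until latency or the error count exceeds what your operators can accept; that point gives a rough capacity figure for the chosen model and hardware.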

