[whisper] Add OpenAI API compatibility #17921

Open: wants to merge 2 commits into `main` (changes shown from 1 commit).

`bundles/org.openhab.voice.whisperstt/README.md` (33 additions, 8 deletions)

[Whisper.cpp](https://github.com/ggerganov/whisper.cpp) is a highly optimized, lightweight C++ implementation of [whisper](https://github.com/openai/whisper) that can easily be integrated into different platforms and applications.

Alternatively, if you do not want to perform speech-to-text on the computer hosting openHAB, this add-on can use an OpenAI/Whisper-compatible transcription API.

Whisper enables speech recognition for multiple languages and dialects:

english, chinese, german, spanish, russian, korean, french, japanese, portuguese, turkish, polish, catalan, dutch, arabic, swedish,
…
uzbek, faroese, haitian, pashto, turkmen, nynorsk, maltese, sanskrit, luxembourgish, myanmar, tibetan, tagalog, malagasy, assamese, tatar, lingala,
hausa, bashkir, javanese and sundanese.

## Local mode (offline)

### Supported platforms

When performing offline recognition, this add-on relies on native binaries.
It uses this [whisper.cpp Java wrapper](https://github.com/GiviMAD/whisper-jni) and this [libfvad Java wrapper](https://github.com/GiviMAD/libfvad-jni).

The following platforms are supported:
…

The native binaries for these platforms are included in the add-on shipped with the openHAB distribution.

### CPU compatibility

A device at least as powerful as a Raspberry Pi 5, with a modern CPU, is recommended for this add-on.
On a Raspberry Pi 4, execution times are about twice as long, so only the tiny model can run in under 5 seconds.
…

You can check those flags on Windows using a program like `CPU-Z`.
If you run the add-on on an `arm64` host, the CPU should support the `fphp` flag.
On Linux, you can check the supported flags from a terminal with `lscpu`.
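
The same flags that `lscpu` reports are also listed in `/proc/cpuinfo` (on a `flags` line on x86 and a `Features` line on arm64). As a minimal sketch, assuming that file layout, the following hypothetical Java helper (not part of the add-on) checks whether a given flag such as `fphp` is present:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class CpuFlagCheck {

    /**
     * Returns true if the flag appears on a "flags" (x86) or "Features" (arm64)
     * line of /proc/cpuinfo-formatted text.
     */
    static boolean hasFlag(List<String> cpuinfoLines, String flag) {
        for (String line : cpuinfoLines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim();
            if (!key.equals("flags") && !key.equals("Features")) {
                continue; // ignore "model name", "processor", etc.
            }
            // Flags are whitespace-separated after the colon.
            String[] flags = line.substring(colon + 1).trim().split("\\s+");
            if (Arrays.asList(flags).contains(flag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        String flag = args.length > 0 ? args[0] : "fphp";
        List<String> lines = Files.readAllLines(Path.of("/proc/cpuinfo"));
        System.out.println(flag + (hasFlag(lines, flag) ? " supported" : " not found"));
    }
}
```

Run it with the flag name as the first argument (it defaults to `fphp`); `lscpu` remains the simpler check from a terminal.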

### Transcription time

On a Raspberry Pi 5, the approximate transcription times are:

| Model      | Execution time |
| ---------- | -------------: |
| tiny.bin   |           1.5s |
| base.bin   |             3s |
| small.bin  |           8.5s |
| medium.bin |            17s |

### Configuring the model

Before you can use this service, you need to configure your model.

…

Make sure you have enough RAM to load the model; the estimated RAM consumption can be checked on the Hugging Face link.

### Using alternative whisper.cpp library

It's possible to use your own build of the whisper.cpp shared library with this add-on.

…

Note: You need to restart openHAB to reload the library.

### Grammar

The whisper.cpp library lets you define a grammar that alters the transcription results without fine-tuning the model.

…

You can provide the grammar and enable it in the add-on configuration.

## API mode

You can also use this add-on with a remote API that is compatible with OpenAI's transcription API. Online services exposing such an API (paid services such as OpenAI) may require an API key.

You can host your own compatible service elsewhere on your network with third-party software such as faster-whisper-server.

Note that API mode also uses libfvad for voice activity detection, and that grammar parameters are not available.
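
For orientation, OpenAI's transcription endpoint accepts a `multipart/form-data` POST with the audio in a `file` field, the model name in a `model` field and an optional `language` field, with the API key sent as a bearer token. The Java sketch below builds such a request with `java.net.http`; it is an illustration under those assumptions, not the add-on's actual implementation, and the class and method names, the boundary scheme and the `audio.wav` filename are hypothetical choices.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class TranscriptionRequestSketch {

    /** Builds a multipart/form-data body with the text fields and the audio file. */
    static byte[] multipartBody(String boundary, byte[] wav, String model, String language) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeText(out, boundary, "model", model);
        if (!language.isEmpty()) {
            // Optional: lets the server skip language auto-detection.
            writeText(out, boundary, "language", language);
        }
        write(out, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\r\n"
                + "Content-Type: audio/wav\r\n\r\n");
        out.writeBytes(wav);
        write(out, "\r\n--" + boundary + "--\r\n"); // closing boundary
        return out.toByteArray();
    }

    private static void writeText(ByteArrayOutputStream out, String boundary, String name, String value) {
        write(out, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + name + "\"\r\n\r\n"
                + value + "\r\n");
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds the POST request; the Authorization header is only added when a key is set. */
    static HttpRequest buildRequest(String apiUrl, String apiKey, String model, String language, byte[] wav) {
        String boundary = "----whisper" + UUID.randomUUID();
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(apiUrl))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(multipartBody(boundary, wav, model, language)));
        if (!apiKey.isEmpty()) {
            builder.header("Authorization", "Bearer " + apiKey);
        }
        return builder.build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; with OpenAI's default JSON response format, the transcription is returned in a `text` field.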

## Configuration

Use your favorite configuration UI to edit the Whisper settings:
…

General options.

- **Mode : LOCAL or API** - Choose between local computation and a remote API.
- **Model Name** - Model name. The 'ggml-' prefix and '.bin' extension are optional here but required in the filename (e.g. tiny.en -> ggml-tiny.en.bin).
- **Preload Model** - Keep whisper model loaded.
- **Single Utterance Mode** - When enabled recognition stops listening after a single utterance.
…
- **Initial Prompt** - Initial prompt for whisper.
- **OpenVINO Device** - Initialize OpenVINO encoder. (The built-in binaries do not support OpenVINO, so this has no effect.)
- **Use GPU** - Enables GPU usage. (The built-in binaries do not support GPU usage, so this has no effect.)
- **Language** - If specified, speeds up recognition by skipping language auto-detection. Defaults to the system locale.
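
The **Model Name** rule above can be expressed as a small Java sketch: add the `ggml-` prefix and `.bin` extension when they are missing, then resolve the file in the `whisper` folder of the openHAB userdata directory, where the README says downloaded models go. This is a hypothetical helper illustrating the documented naming rule, not code taken from the add-on; the userdata path in `main` is an assumed example.

```java
import java.nio.file.Path;

public class ModelFileName {

    /** Normalizes a configured model name, e.g. "tiny.en" -> "ggml-tiny.en.bin". */
    static String toFileName(String modelName) {
        String name = modelName.trim();
        if (!name.startsWith("ggml-")) {
            name = "ggml-" + name;
        }
        if (!name.endsWith(".bin")) {
            name = name + ".bin";
        }
        return name;
    }

    /** Resolves the model file inside the "whisper" folder of the userdata directory. */
    static Path modelPath(Path userdata, String modelName) {
        return userdata.resolve("whisper").resolve(toFileName(modelName));
    }

    public static void main(String[] args) {
        // Assumed userdata location; adjust to your installation.
        Path userdata = Path.of("/var/lib/openhab");
        System.out.println(modelPath(userdata, "tiny.en"));
    }
}
```

Because both parts are optional in the setting, `tiny.en`, `ggml-tiny.en` and `ggml-tiny.en.bin` all resolve to the same file.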

### API Configuration

- **API key** - Optional API key, for online services that require one.
- **API url** - URL of the transcription API, if you want to use your own service. Defaults to the OpenAI transcription API.
- **API model name** - Model name to request; your hosted service may offer other models. Defaults to 'whisper-1', the only OpenAI model.

### Grammar Configuration

…
Its contents should look similar to:

```ini
org.openhab.voice.whisperstt:mode=LOCAL
org.openhab.voice.whisperstt:modelName=tiny
org.openhab.voice.whisperstt:language=en
org.openhab.voice.whisperstt:initSilenceSeconds=0.3
org.openhab.voice.whisperstt:removeSilence=true
org.openhab.voice.whisperstt:stepSeconds=0.3
# …
org.openhab.voice.whisperstt:useGPU=false
org.openhab.voice.whisperstt:useGrammar=false
org.openhab.voice.whisperstt:grammarPenalty=80.0
org.openhab.voice.whisperstt:grammarLines=
org.openhab.voice.whisperstt:apiKey=mykeyaaaa
org.openhab.voice.whisperstt:apiUrl=https://api.openai.com/v1/audio/transcriptions
org.openhab.voice.whisperstt:apiModelName=whisper-1
```

### Default Speech-to-Text Configuration
…

`WhisperSTTConfiguration.java` (excerpt):

```java
public class WhisperSTTConfiguration {
    // … (earlier fields not shown in the diff)

    /**
     * Print whisper.cpp library logs as binding debug logs.
     */
    public boolean enableWhisperLog;
    /**
     * LOCAL to use the embedded whisper.cpp, or API to use an external OpenAI-compatible API.
     */
    public Mode mode = Mode.LOCAL;
    /**
     * If mode is set to API, use this URL.
     */
    public String apiUrl = "https://api.openai.com/v1/audio/transcriptions";
    /**
     * If mode is set to API, use this API key to access apiUrl.
     */
    public String apiKey = "";
    /**
     * If specified, speeds up recognition by avoiding language auto-detection.
     */
    public String language = "";
    /**
     * Model name (API mode only).
     */
    public String apiModelName = "whisper-1";

    public enum Mode {
        LOCAL,
        API
    }
}
```