diff --git a/bundles/org.openhab.voice.whisperstt/README.md b/bundles/org.openhab.voice.whisperstt/README.md index b5a88a390bcdc..e75f0dc592b33 100644 --- a/bundles/org.openhab.voice.whisperstt/README.md +++ b/bundles/org.openhab.voice.whisperstt/README.md @@ -5,6 +5,8 @@ It also uses [libfvad](https://github.com/dpirch/libfvad) for voice activity det [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) is a high-optimized lightweight c++ implementation of [whisper](https://github.com/openai/whisper) that allows to easily integrate it in different platforms and applications. +Alternatively, if you do not want to perform speech-to-text on the computer hosting openHAB, this add-on can consume an OpenAI/Whisper compatible transcription API. + Whisper enables speech recognition for multiple languages and dialects: english, chinese, german, spanish, russian, korean, french, japanese, portuguese, turkish, polish, catalan, dutch, arabic, swedish, @@ -15,9 +17,11 @@ marathi, punjabi, sinhala, khmer, shona, yoruba, somali, afrikaans, occitan, geo uzbek, faroese, haitian, pashto, turkmen, nynorsk, maltese, sanskrit, luxembourgish, myanmar, tibetan, tagalog, malagasy, assamese, tatar, lingala, hausa, bashkir, javanese and sundanese. -## Supported platforms +## Local mode (offline) + +### Supported platforms -This add-on uses some native binaries to work. +This add-on uses some native binaries to work when performing offline recognition. You can find here the used [whisper.cpp Java wrapper](https://github.com/GiviMAD/whisper-jni) and [libfvad Java wrapper](https://github.com/GiviMAD/libfvad-jni). The following platforms are supported: @@ -28,7 +32,7 @@ The following platforms are supported: The native binaries for those platforms are included in this add-on provided with the openHAB distribution. -## CPU compatibility +### CPU compatibility To use this binding it's recommended to use a device at least as powerful as the RaspberryPI 5 with a modern CPU. The execution times on Raspberry PI 4 are x2, so just the tiny model can be run on under 5 seconds. @@ -40,18 +44,18 @@ You can check those flags on Windows using a program like `CPU-Z`. If you are going to use the binding in a `arm64` host the CPU should support the flags: `fphp`. You can check those flags on linux using the terminal with `lscpu`. -## Transcription time +### Transcription time On a Raspberry PI 5, the approximate transcription times are: | model | exec time | -| ---------- | --------: | +|------------|----------:| | tiny.bin | 1.5s | | base.bin | 3s | | small.bin | 8.5s | | medium.bin | 17s | -## Configuring the model +### Configuring the model Before you can use this service you should configure your model. @@ -64,7 +68,7 @@ You should place the downloaded .bin model in '\/whisper/' so Remember to check that you have enough RAM to load the model, estimated RAM consumption can be checked on the huggingface link. -## Using alternative whisper.cpp library +### Using alternative whisper.cpp library It's possible to use your own build of the whisper.cpp shared library with this add-on. @@ -76,7 +80,7 @@ In the [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) README you can fi Note: You need to restart openHAB to reload the library. -## Grammar +### Grammar The whisper.cpp library allows to define a grammar to alter the transcription results without fine-tuning the model. @@ -99,6 +103,14 @@ tv_channel ::= ("set ")? "tv channel to " [0-9]+ You can provide the grammar and enable its usage using the binding configuration. 
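For reference, enabling the grammar from the textual configuration could look like the sketch below. This is only an illustration: the keys are the `org.openhab.voice.whisperstt` keys shown in the Configuration section further down, but the inline single-rule grammar value is an assumption, and longer multi-line grammars are easier to manage through the configuration UI.

```ini
org.openhab.voice.whisperstt:useGrammar=true
org.openhab.voice.whisperstt:grammarPenalty=80.0
org.openhab.voice.whisperstt:grammarLines=root ::= " set tv channel to " [0-9]+ "."
```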
+## API mode + +You can also use this add-on with a remote API that is compatible with the 'transcription' API from OpenAI. Online services exposing such an API may require an API key (paid services, such as OpenAI). + +You can host your own compatible service elsewhere on your network, with third-party software such as faster-whisper-server. + +Please note that API mode also uses libfvad for voice activity detection, and that grammar parameters are not available. + ## Configuration Use your favorite configuration UI to edit the Whisper settings: @@ -107,6 +119,7 @@ Use your favorite configuration UI to edit the Whisper settings: General options. +- **Mode: LOCAL or API** - Choose either local computation or remote API use. - **Model Name** - Model name. The 'ggml-' prefix and '.bin' extension are optional here but required on the filename. (ex: tiny.en -> ggml-tiny.en.bin) - **Preload Model** - Keep whisper model loaded. - **Single Utterance Mode** - When enabled recognition stops listening after a single utterance. @@ -139,6 +152,13 @@ Configure whisper options. - **Initial Prompt** - Initial prompt for whisper. - **OpenVINO Device** - Initialize OpenVINO encoder. (built-in binaries do not support OpenVINO, this has no effect) - **Use GPU** - Enables GPU usage. (built-in binaries do not support GPU usage, this has no effect) +- **Language** - If specified, speeds up recognition by avoiding language auto-detection. Defaults to the system locale. + +### API Configuration + +- **API key** - Optional API key, for online services that require one. +- **API URL** - You may use your own service and define its URL here. Defaults to the OpenAI transcription API. +- **API model name** - Your hosted service may offer other models. Defaults to 'whisper-1', the only model available from OpenAI. ### Grammar Configuration @@ -199,7 +219,9 @@ In case you would like to set up the service via a text file, create a new file Its contents should look similar to: ```ini +org.openhab.voice.whisperstt:mode=LOCAL org.openhab.voice.whisperstt:modelName=tiny +org.openhab.voice.whisperstt:language=en org.openhab.voice.whisperstt:initSilenceSeconds=0.3 org.openhab.voice.whisperstt:removeSilence=true org.openhab.voice.whisperstt:stepSeconds=0.3 @@ -229,6 +251,9 @@ org.openhab.voice.whisperstt:useGPU=false org.openhab.voice.whisperstt:useGrammar=false org.openhab.voice.whisperstt:grammarPenalty=80.0 org.openhab.voice.whisperstt:grammarLines= +org.openhab.voice.whisperstt:apiKey=mykeyaaaa +org.openhab.voice.whisperstt:apiUrl=https://api.openai.com/v1/audio/transcriptions +org.openhab.voice.whisperstt:apiModelName=whisper-1 ``` ### Default Speech-to-Text Configuration diff --git a/bundles/org.openhab.voice.whisperstt/src/main/java/org/openhab/voice/whisperstt/internal/WhisperSTTConfiguration.java b/bundles/org.openhab.voice.whisperstt/src/main/java/org/openhab/voice/whisperstt/internal/WhisperSTTConfiguration.java index 0eed735113b4b..57d75afa63e7f 100644 --- a/bundles/org.openhab.voice.whisperstt/src/main/java/org/openhab/voice/whisperstt/internal/WhisperSTTConfiguration.java +++ b/bundles/org.openhab.voice.whisperstt/src/main/java/org/openhab/voice/whisperstt/internal/WhisperSTTConfiguration.java @@ -146,4 +146,29 @@ public class WhisperSTTConfiguration { * Print whisper.cpp library logs as binding debug logs.
*/ public boolean enableWhisperLog; + /** + * LOCAL to use the embedded whisper.cpp library, API to use an external transcription API + */ + public Mode mode = Mode.LOCAL; + /** + * If mode is set to API, use this URL + */ + public String apiUrl = "https://api.openai.com/v1/audio/transcriptions"; + /** + * If mode is set to API, use this API key to access apiUrl + */ + public String apiKey = ""; + /** + * If specified, speeds up recognition by avoiding language auto-detection + */ + public String language = ""; + /** + * Model name (API only) + */ + public String apiModelName = "whisper-1"; + + public static enum Mode { + LOCAL, + API; + } } diff --git a/bundles/org.openhab.voice.whisperstt/src/main/java/org/openhab/voice/whisperstt/internal/WhisperSTTService.java b/bundles/org.openhab.voice.whisperstt/src/main/java/org/openhab/voice/whisperstt/internal/WhisperSTTService.java index 00d55590d9f50..38d3ea06a03ce 100644 --- a/bundles/org.openhab.voice.whisperstt/src/main/java/org/openhab/voice/whisperstt/internal/WhisperSTTService.java +++ b/bundles/org.openhab.voice.whisperstt/src/main/java/org/openhab/voice/whisperstt/internal/WhisperSTTService.java @@ -12,12 +12,10 @@ */ package org.openhab.voice.whisperstt.internal; -import static org.openhab.voice.whisperstt.internal.WhisperSTTConstants.SERVICE_CATEGORY; -import static org.openhab.voice.whisperstt.internal.WhisperSTTConstants.SERVICE_ID; -import static org.openhab.voice.whisperstt.internal.WhisperSTTConstants.SERVICE_NAME; -import static org.openhab.voice.whisperstt.internal.WhisperSTTConstants.SERVICE_PID; +import static org.openhab.voice.whisperstt.internal.WhisperSTTConstants.*; import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; import java.io.FileOutputStream; import java.io.IOException; import java.nio.ByteBuffer; @@ -32,7 +30,9 @@ import java.util.Locale; import java.util.Map; import java.util.Set; +import java.util.concurrent.ExecutionException; import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeoutException; import java.util.concurrent.atomic.AtomicBoolean; import javax.sound.sampled.AudioFileFormat; @@ -41,6 +41,13 @@ import org.eclipse.jdt.annotation.NonNullByDefault; import org.eclipse.jdt.annotation.Nullable; +import org.eclipse.jetty.client.HttpClient; +import org.eclipse.jetty.client.api.ContentResponse; +import org.eclipse.jetty.client.api.Request; +import org.eclipse.jetty.client.util.InputStreamContentProvider; +import org.eclipse.jetty.client.util.MultiPartContentProvider; +import org.eclipse.jetty.client.util.StringContentProvider; +import org.eclipse.jetty.http.HttpMethod; import org.openhab.core.OpenHAB; import org.openhab.core.audio.AudioFormat; import org.openhab.core.audio.AudioStream; @@ -48,6 +55,7 @@ import org.openhab.core.common.ThreadPoolManager; import org.openhab.core.config.core.ConfigurableService; import org.openhab.core.config.core.Configuration; +import org.openhab.core.io.net.http.HttpClientFactory; import org.openhab.core.io.rest.LocaleService; import org.openhab.core.voice.RecognitionStartEvent; import org.openhab.core.voice.RecognitionStopEvent; @@ -57,6 +65,7 @@ import org.openhab.core.voice.STTServiceHandle; import org.openhab.core.voice.SpeechRecognitionErrorEvent; import org.openhab.core.voice.SpeechRecognitionEvent; +import org.openhab.voice.whisperstt.internal.WhisperSTTConfiguration.Mode; import org.openhab.voice.whisperstt.internal.utils.VAD; import org.osgi.framework.Constants; import org.osgi.service.component.annotations.Activate; @@ -96,10
+105,13 @@ public class WhisperSTTService implements STTService { private @Nullable WhisperContext context; private @Nullable WhisperGrammar grammar; private @Nullable WhisperJNI whisper; + private boolean isWhisperLibAlreadyLoaded = false; + private final HttpClientFactory httpClientFactory; @Activate - public WhisperSTTService(@Reference LocaleService localeService) { + public WhisperSTTService(@Reference LocaleService localeService, @Reference HttpClientFactory httpClientFactory) { this.localeService = localeService; + this.httpClientFactory = httpClientFactory; } @Activate @@ -108,7 +120,8 @@ protected void activate(Map config) { if (!Files.exists(WHISPER_FOLDER)) { Files.createDirectory(WHISPER_FOLDER); } - WhisperJNI.loadLibrary(getLoadOptions()); + this.config = new Configuration(config).as(WhisperSTTConfiguration.class); + loadWhisperLibraryIfNeeded(); VoiceActivityDetector.loadLibrary(); whisper = new WhisperJNI(); } catch (IOException | RuntimeException e) { @@ -117,6 +130,13 @@ protected void activate(Map config) { configChange(config); } + private void loadWhisperLibraryIfNeeded() throws IOException { + if (config.mode == Mode.LOCAL && !isWhisperLibAlreadyLoaded) { + WhisperJNI.loadLibrary(getLoadOptions()); + isWhisperLibAlreadyLoaded = true; + } + } + private WhisperJNI.LoadOptions getLoadOptions() { Path libFolder = Paths.get("/usr/local/lib"); Path libFolderWin = Paths.get("/Windows/System32"); @@ -167,14 +187,27 @@ protected void deactivate(Map config) { private void configChange(Map config) { this.config = new Configuration(config).as(WhisperSTTConfiguration.class); - WhisperJNI.setLibraryLogger(this.config.enableWhisperLog ? this::onWhisperLog : null); WhisperGrammar grammar = this.grammar; if (grammar != null) { grammar.close(); this.grammar = null; } + + // API mode + if (this.config.mode == Mode.API) { + try { + unloadContext(); + } catch (IOException e) { + logger.warn("IOException unloading model: {}", e.getMessage()); + } + return; + } + + // Local mode WhisperJNI whisper; try { + loadWhisperLibraryIfNeeded(); + WhisperJNI.setLibraryLogger(this.config.enableWhisperLog ? 
this::onWhisperLog : null); whisper = getWhisper(); } catch (IOException ignored) { logger.warn("library not loaded, the add-on will not work"); @@ -228,9 +261,17 @@ public String getLabel(@Nullable Locale locale) { @Override public Set getSupportedLocales() { - // as it is not possible to determine the language of the model that was downloaded and setup by the user, it is - // assumed the language of the model is matching the locale of the openHAB server - return Set.of(localeService.getLocale(null)); + // Attempt to create a locale from the configured language + String language = config.language; + Locale modelLocale = localeService.getLocale(null); + if (!language.isBlank()) { + try { + modelLocale = Locale.forLanguageTag(language); + } catch (IllegalArgumentException e) { + logger.warn("Invalid language '{}', defaulting to server locale", language); + } + } + return Set.of(modelLocale); } @Override @@ -246,33 +287,18 @@ public Set getSupportedFormats() { public STTServiceHandle recognize(STTListener sttListener, AudioStream audioStream, Locale locale, Set set) throws STTException { AtomicBoolean aborted = new AtomicBoolean(false); - WhisperContext ctx = null; - WhisperState state = null; try { - var whisper = getWhisper(); - ctx = getContext(); - logger.debug("Creating whisper state..."); - state = whisper.initState(ctx); - logger.debug("Whisper state created"); logger.debug("Creating VAD instance..."); - final int nSamplesStep = (int) (config.stepSeconds * (float) WHISPER_SAMPLE_RATE); + final int nSamplesStep = (int) (config.stepSeconds * WHISPER_SAMPLE_RATE); VAD vad = new VAD(VoiceActivityDetector.Mode.valueOf(config.vadMode), WHISPER_SAMPLE_RATE, nSamplesStep, config.vadStep, config.vadSensitivity); logger.debug("VAD instance created"); sttListener.sttEventReceived(new RecognitionStartEvent()); - backgroundRecognize(whisper, ctx, state, nSamplesStep, locale, sttListener, audioStream, vad, aborted); + backgroundRecognize(nSamplesStep, locale, sttListener, audioStream, vad, aborted); } catch (IOException e) { - if (ctx != null && !config.preloadModel) { - ctx.close(); - } - if (state != null) { - state.close(); - } throw new STTException("Exception during initialization", e); } - return () -> { - aborted.set(true); - }; + return () -> aborted.set(true); } private WhisperJNI getWhisper() throws IOException { @@ -339,9 +365,8 @@ private void unloadContext() throws IOException { } } - private void backgroundRecognize(WhisperJNI whisper, WhisperContext ctx, WhisperState state, final int nSamplesStep, - Locale locale, STTListener sttListener, AudioStream audioStream, VAD vad, AtomicBoolean aborted) { - var releaseContext = !config.preloadModel; + private void backgroundRecognize(final int nSamplesStep, Locale locale, STTListener sttListener, + AudioStream audioStream, VAD vad, AtomicBoolean aborted) { final int nSamplesMax = config.maxSeconds * WHISPER_SAMPLE_RATE; final int nSamplesMin = (int) (config.minSeconds * (float) WHISPER_SAMPLE_RATE); final int nInitSilenceSamples = (int) (config.initSilenceSeconds * (float) WHISPER_SAMPLE_RATE); @@ -353,21 +378,17 @@ private void backgroundRecognize(WhisperJNI whisper, WhisperContext ctx, Whisper logger.debug("Max silence samples {}", nMaxSilenceSamples); // used to store the step samples in libfvad wanted format 16-bit int final short[] stepAudioSamples = new short[nSamplesStep]; - // used to store the full samples in whisper wanted format 32-bit float - final float[] audioSamples = new float[nSamplesMax]; + // used to store the full retained 
samples for whisper + final short[] audioSamples = new short[nSamplesMax]; executor.submit(() -> { int audioSamplesOffset = 0; int silenceSamplesCounter = 0; int nProcessedSamples = 0; - int numBytesRead; boolean voiceDetected = false; String transcription = ""; - String tempTranscription = ""; - VAD.@Nullable VADResult lastVADResult; VAD.@Nullable VADResult firstConsecutiveSilenceVADResult = null; try { - try (state; // - audioStream; // + try (audioStream; // vad) { if (AudioFormat.CONTAINER_WAVE.equals(audioStream.getFormat().getContainer())) { AudioWaveUtils.removeFMT(audioStream); @@ -376,10 +397,9 @@ private void backgroundRecognize(WhisperJNI whisper, WhisperContext ctx, Whisper .order(ByteOrder.LITTLE_ENDIAN); // init remaining to full capacity int remaining = captureBuffer.capacity(); - WhisperFullParams params = getWhisperFullParams(ctx, locale); while (!aborted.get()) { // read until no remaining so we get the complete step samples - numBytesRead = audioStream.read(captureBuffer.array(), captureBuffer.capacity() - remaining, + int numBytesRead = audioStream.read(captureBuffer.array(), captureBuffer.capacity() - remaining, remaining); if (aborted.get() || numBytesRead == -1) { break; @@ -395,17 +415,15 @@ private void backgroundRecognize(WhisperJNI whisper, WhisperContext ctx, Whisper while (shortBuffer.hasRemaining()) { var position = shortBuffer.position(); short i16BitSample = shortBuffer.get(); - float f32BitSample = Float.min(1f, - Float.max((float) i16BitSample / ((float) Short.MAX_VALUE), -1f)); stepAudioSamples[position] = i16BitSample; - audioSamples[audioSamplesOffset++] = f32BitSample; + audioSamples[audioSamplesOffset++] = i16BitSample; nProcessedSamples++; } // run vad if (nProcessedSamples + nSamplesStep > nSamplesMax - nSamplesStep) { logger.debug("VAD: Skipping, max length reached"); } else { - lastVADResult = vad.analyze(stepAudioSamples); + VAD.@Nullable VADResult lastVADResult = vad.analyze(stepAudioSamples); if (lastVADResult.isVoice()) { voiceDetected = true; logger.debug("VAD: voice detected"); @@ -484,43 +502,26 @@ private void backgroundRecognize(WhisperJNI whisper, WhisperContext ctx, Whisper } } } - // run whisper - logger.debug("running whisper with {} seconds of audio...", - Math.round((((float) audioSamplesOffset) / (float) WHISPER_SAMPLE_RATE) * 100f) / 100f); - long execStartTime = System.currentTimeMillis(); - var result = whisper.fullWithState(ctx, state, params, audioSamples, audioSamplesOffset); - logger.debug("whisper ended in {}ms with result code {}", - System.currentTimeMillis() - execStartTime, result); - // process result - if (result != 0) { - emitSpeechRecognitionError(sttListener); - break; - } - int nSegments = whisper.fullNSegmentsFromState(state); - logger.debug("Available transcription segments {}", nSegments); - if (nSegments == 1) { - tempTranscription = whisper.fullGetSegmentTextFromState(state, 0); + // run whisper, either locally or by remote API + String tempTranscription = (switch (config.mode) { + case LOCAL -> recognizeLocal(audioSamplesOffset, audioSamples, locale.getLanguage()); + case API -> recognizeAPI(audioSamplesOffset, audioSamples, locale.getLanguage()); + }); + + if (tempTranscription != null && !tempTranscription.isBlank()) { if (config.createWAVRecord) { createAudioFile(audioSamples, audioSamplesOffset, tempTranscription, locale.getLanguage()); } + transcription += tempTranscription; if (config.singleUtteranceMode) { logger.debug("single utterance mode, ending transcription"); - transcription = 
tempTranscription; break; - } else { - // start a new transcription segment - transcription += tempTranscription; - tempTranscription = ""; } - } else if (nSegments == 0 && config.singleUtteranceMode) { - logger.debug("Single utterance mode and no results, ending transcription"); - break; - } else if (nSegments > 1) { - // non reachable - logger.warn("Whisper should be configured in single segment mode {}", nSegments); + } else { break; } + // reset state to start with next segment voiceDetected = false; silenceSamplesCounter = 0; @@ -528,10 +529,6 @@ private void backgroundRecognize(WhisperJNI whisper, WhisperContext ctx, Whisper logger.debug("Partial transcription: {}", tempTranscription); logger.debug("Transcription: {}", transcription); } - } finally { - if (releaseContext) { - ctx.close(); - } } // emit result if (!aborted.get()) { @@ -543,7 +540,7 @@ private void backgroundRecognize(WhisperJNI whisper, WhisperContext ctx, Whisper emitSpeechRecognitionNoResultsError(sttListener); } } - } catch (IOException e) { + } catch (STTException | IOException e) { logger.warn("Error running speech to text: {}", e.getMessage()); emitSpeechRecognitionError(sttListener); } catch (UnsatisfiedLinkError e) { @@ -553,7 +550,119 @@ private void backgroundRecognize(WhisperJNI whisper, WhisperContext ctx, Whisper }); } - private WhisperFullParams getWhisperFullParams(WhisperContext context, Locale locale) throws IOException { + @Nullable + private String recognizeLocal(int audioSamplesOffset, short[] audioSamples, String language) throws STTException { + logger.debug("running whisper with {} seconds of audio...", + Math.round((((float) audioSamplesOffset) / (float) WHISPER_SAMPLE_RATE) * 100f) / 100f); + var releaseContext = !config.preloadModel; + + WhisperJNI whisper = null; + WhisperContext ctx = null; + WhisperState state = null; + try { + whisper = getWhisper(); + ctx = getContext(); + logger.debug("Creating whisper state..."); + state = whisper.initState(ctx); + logger.debug("Whisper state created"); + WhisperFullParams params = getWhisperFullParams(ctx, language); + + // convert to local whisper format (float) + float[] floatArray = new float[audioSamples.length]; + for (int i = 0; i < audioSamples.length; i++) { + floatArray[i] = Float.min(1f, Float.max((float) audioSamples[i] / ((float) Short.MAX_VALUE), -1f)); + } + + long execStartTime = System.currentTimeMillis(); + var result = whisper.fullWithState(ctx, state, params, floatArray, audioSamplesOffset); + logger.debug("whisper ended in {}ms with result code {}", System.currentTimeMillis() - execStartTime, + result); + // process result + if (result != 0) { + throw new STTException("Cannot use whisper locally, result code: " + result); + } + int nSegments = whisper.fullNSegmentsFromState(state); + logger.debug("Available transcription segments {}", nSegments); + if (nSegments == 1) { + return whisper.fullGetSegmentTextFromState(state, 0); + } else if (nSegments == 0 && config.singleUtteranceMode) { + logger.debug("Single utterance mode and no results, ending transcription"); + return null; + } else { + // non reachable + logger.warn("Whisper should be configured in single segment mode {}", nSegments); + return null; + } + } catch (IOException e) { + if (state != null) { + state.close(); + } + throw new STTException("Cannot use whisper locally", e); + } finally { + if (releaseContext && ctx != null) { + ctx.close(); + } + } + } + + private String recognizeAPI(int audioSamplesOffset, short[] audioStream, String language) throws STTException { + // 
convert to byte array, Each short has 2 bytes + int size = audioSamplesOffset * 2; + ByteBuffer byteArrayBuffer = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN); + for (int i = 0; i < audioSamplesOffset; i++) { + byteArrayBuffer.putShort(audioStream[i]); + } + javax.sound.sampled.AudioFormat jAudioFormat = new javax.sound.sampled.AudioFormat( + javax.sound.sampled.AudioFormat.Encoding.PCM_SIGNED, WHISPER_SAMPLE_RATE, 16, 1, 2, WHISPER_SAMPLE_RATE, + false); + byte[] byteArray = byteArrayBuffer.array(); + + try { + AudioInputStream audioInputStream = new AudioInputStream(new ByteArrayInputStream(byteArray), jAudioFormat, + audioSamplesOffset); + + // write stream as a WAV file, in a byte array stream : + ByteArrayInputStream byteArrayInputStream = null; + try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) { + AudioSystem.write(audioInputStream, AudioFileFormat.Type.WAVE, baos); + byteArrayInputStream = new ByteArrayInputStream(baos.toByteArray()); + } + + // prepare HTTP request + HttpClient commonHttpClient = httpClientFactory.getCommonHttpClient(); + MultiPartContentProvider multiPartContentProvider = new MultiPartContentProvider(); + multiPartContentProvider.addFilePart("file", "audio.wav", + new InputStreamContentProvider(byteArrayInputStream), null); + multiPartContentProvider.addFieldPart("model", new StringContentProvider(this.config.apiModelName), null); + multiPartContentProvider.addFieldPart("response_format", new StringContentProvider("text"), null); + multiPartContentProvider.addFieldPart("temperature", + new StringContentProvider(Float.toString(this.config.temperature)), null); + if (!language.isBlank()) { + multiPartContentProvider.addFieldPart("language", new StringContentProvider(language), null); + } + Request request = commonHttpClient.newRequest(config.apiUrl).method(HttpMethod.POST) + .content(multiPartContentProvider); + if (!config.apiKey.isBlank()) { + request = request.header("Authorization", "Bearer " + config.apiKey); + } + // execute the request + ContentResponse response = request.send(); + + // check the HTTP status code from the response + int statusCode = response.getStatus(); + if (statusCode < 200 || statusCode >= 300) { + logger.debug("HTTP error: Received status code {}, full error is {}", statusCode, + response.getContentAsString()); + throw new STTException("Failed to retrieve transcription: HTTP status code " + statusCode); + } + return response.getContentAsString(); + + } catch (InterruptedException | TimeoutException | ExecutionException | IOException e) { + throw new STTException("Exception during attempt to get speech recognition result from api", e); + } + } + + private WhisperFullParams getWhisperFullParams(WhisperContext context, String language) throws IOException { WhisperSamplingStrategy strategy = WhisperSamplingStrategy.valueOf(config.samplingStrategy); var params = new WhisperFullParams(strategy); params.temperature = config.temperature; @@ -570,7 +679,7 @@ private WhisperFullParams getWhisperFullParams(WhisperContext context, Locale lo params.grammarPenalty = config.grammarPenalty; } // there is no single language models other than the english ones - params.language = getWhisper().isMultilingual(context) ? locale.getLanguage() : "en"; + params.language = getWhisper().isMultilingual(context) ? 
language : "en"; // implementation assumes this options params.translate = false; params.detectLanguage = false; @@ -605,7 +714,7 @@ private void createSamplesDir() { } } - private void createAudioFile(float[] samples, int size, String transcription, String language) { + private void createAudioFile(short[] samples, int size, String transcription, String language) { createSamplesDir(); javax.sound.sampled.AudioFormat jAudioFormat; ByteBuffer byteBuffer; @@ -615,7 +724,7 @@ private void createAudioFile(float[] samples, int size, String transcription, St WHISPER_SAMPLE_RATE, 16, 1, 2, WHISPER_SAMPLE_RATE, false); byteBuffer = ByteBuffer.allocate(size * 2).order(ByteOrder.LITTLE_ENDIAN); for (int i = 0; i < size; i++) { - byteBuffer.putShort((short) (samples[i] * (float) Short.MAX_VALUE)); + byteBuffer.putShort(samples[i]); } } else { logger.debug("Saving audio file with sample format f32"); @@ -623,7 +732,7 @@ private void createAudioFile(float[] samples, int size, String transcription, St WHISPER_SAMPLE_RATE, 32, 1, 4, WHISPER_SAMPLE_RATE, false); byteBuffer = ByteBuffer.allocate(size * 4).order(ByteOrder.LITTLE_ENDIAN); for (int i = 0; i < size; i++) { - byteBuffer.putFloat(samples[i]); + byteBuffer.putFloat(Float.min(1f, Float.max((float) samples[i] / ((float) Short.MAX_VALUE), -1f))); } } AudioInputStream audioInputStreamTemp = new AudioInputStream(new ByteArrayInputStream(byteBuffer.array()), diff --git a/bundles/org.openhab.voice.whisperstt/src/main/resources/OH-INF/config/config.xml b/bundles/org.openhab.voice.whisperstt/src/main/resources/OH-INF/config/config.xml index c1f08cfb15c22..e4deb556fd032 100644 --- a/bundles/org.openhab.voice.whisperstt/src/main/resources/OH-INF/config/config.xml +++ b/bundles/org.openhab.voice.whisperstt/src/main/resources/OH-INF/config/config.xml @@ -11,7 +11,7 @@ - Configure the VAD mechanisim used to isolate single phrases to feed whisper with. + Configure the VAD mechanism used to isolate single phrases to feed whisper with. @@ -19,7 +19,7 @@ - Define a grammar to improve transcrptions. + Define a grammar to improve transcriptions. @@ -30,9 +30,27 @@ Options added for developers. true + + + Configure the OpenAI compatible API, if you don't want to use the local model. + + + + Use the local model or the OpenAI compatible API. + LOCAL + + + + + - - Model name without extension. + + Model name without extension. Local mode only. + + + + If specified, speeds up recognition by avoiding language auto-detection. Defaults to the system locale. + @@ -225,5 +243,20 @@ false true + + + Key to access the API + + + + + OpenAI compatible API URL. Defaults to the OpenAI transcription service. + https://api.openai.com/v1/audio/transcriptions + + + + Model name to use (API only). Defaults to whisper-1, the only model available from OpenAI. + whisper-1 + diff --git a/bundles/org.openhab.voice.whisperstt/src/main/resources/OH-INF/i18n/whisperstt.properties b/bundles/org.openhab.voice.whisperstt/src/main/resources/OH-INF/i18n/whisperstt.properties index 0780316715b5c..9051bda8e4b99 100644 --- a/bundles/org.openhab.voice.whisperstt/src/main/resources/OH-INF/i18n/whisperstt.properties +++ b/bundles/org.openhab.voice.whisperstt/src/main/resources/OH-INF/i18n/whisperstt.properties @@ -3,6 +3,12 @@ addon.whisperstt.name = Whisper Speech-to-Text addon.whisperstt.description = Whisper STT Service uses the whisper.cpp library to transcript audio data to text.
+voice.config.whisperstt.apiKey.label = API Key +voice.config.whisperstt.apiKey.description = Key to access the API +voice.config.whisperstt.apiModelName.label = API Model +voice.config.whisperstt.apiModelName.description = Model name to use (API only). Defaults to whisper-1, the only model available from OpenAI. +voice.config.whisperstt.apiUrl.label = API URL +voice.config.whisperstt.apiUrl.description = OpenAI compatible API URL. Defaults to the OpenAI transcription service. voice.config.whisperstt.audioContext.label = Audio Context voice.config.whisperstt.audioContext.description = Overwrite the audio context size. (0 to use whisper default context size) voice.config.whisperstt.beamSize.label = Beam Size @@ -24,27 +30,35 @@ voice.config.whisperstt.greedyBestOf.description = Best Of configuration for sam voice.config.whisperstt.group.developer.label = Developer voice.config.whisperstt.group.developer.description = Options added for developers. voice.config.whisperstt.group.grammar.label = Grammar -voice.config.whisperstt.group.grammar.description = Define a grammar to improve transcrptions. +voice.config.whisperstt.group.grammar.description = Define a grammar to improve transcriptions. voice.config.whisperstt.group.messages.label = Info Messages voice.config.whisperstt.group.messages.description = Configure service information messages. +voice.config.whisperstt.group.openaiapi.label = API Configuration Options +voice.config.whisperstt.group.openaiapi.description = Configure the OpenAI compatible API, if you don't want to use the local model. voice.config.whisperstt.group.stt.label = STT Configuration voice.config.whisperstt.group.stt.description = Configure Speech to Text. voice.config.whisperstt.group.vad.label = Voice Activity Detection -voice.config.whisperstt.group.vad.description = Configure the VAD mechanisim used to isolate single phrases to feed whisper with. +voice.config.whisperstt.group.vad.description = Configure the VAD mechanism used to isolate single phrases to feed whisper with. voice.config.whisperstt.group.whisper.label = Whisper Options voice.config.whisperstt.group.whisper.description = Configure the whisper.cpp transcription options. voice.config.whisperstt.initSilenceSeconds.label = Initial Silence Seconds voice.config.whisperstt.initSilenceSeconds.description = Max initial seconds of silence to discard transcription. voice.config.whisperstt.initialPrompt.label = Initial Prompt voice.config.whisperstt.initialPrompt.description = Initial prompt to feed whisper with. +voice.config.whisperstt.language.label = Language +voice.config.whisperstt.language.description = If specified, speeds up recognition by avoiding language auto-detection. Defaults to the system locale. voice.config.whisperstt.maxSeconds.label = Max Transcription Seconds voice.config.whisperstt.maxSeconds.description = Seconds to force transcription before silence detection. voice.config.whisperstt.maxSilenceSeconds.label = Max Silence Seconds voice.config.whisperstt.maxSilenceSeconds.description = Seconds of silence to trigger transcription. voice.config.whisperstt.minSeconds.label = Min Transcription Seconds voice.config.whisperstt.minSeconds.description = Min transcription seconds passed to whisper. -voice.config.whisperstt.modelName.label = Model Name -voice.config.whisperstt.modelName.description = Model name without extension. +voice.config.whisperstt.mode.label = Local Mode or API +voice.config.whisperstt.mode.description = Use the local model or the OpenAI compatible API.
+voice.config.whisperstt.mode.option.LOCAL = Local +voice.config.whisperstt.mode.option.API = OpenAI API +voice.config.whisperstt.modelName.label = Local Model Name +voice.config.whisperstt.modelName.description = Model name without extension. Local mode only. voice.config.whisperstt.openvinoDevice.label = OpenVINO Device voice.config.whisperstt.openvinoDevice.description = Initialize OpenVINO encoder. (built-in binaries do not support OpenVINO, this has no effect) voice.config.whisperstt.preloadModel.label = Preload Model
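As a closing reference for the API mode described in the README changes above: the add-on simply POSTs a multipart form (file, model, response_format, temperature and, when available, language) to the configured endpoint, with an optional Bearer token. The standalone sketch below reproduces that request with the same Jetty client API the add-on uses; the class name, the local faster-whisper-server URL/port and the `audio.wav` path are illustrative assumptions, not part of the add-on (inside openHAB the client comes from `HttpClientFactory` rather than being created manually).

```java
import java.nio.file.Path;

import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.ContentResponse;
import org.eclipse.jetty.client.api.Request;
import org.eclipse.jetty.client.util.MultiPartContentProvider;
import org.eclipse.jetty.client.util.PathContentProvider;
import org.eclipse.jetty.client.util.StringContentProvider;
import org.eclipse.jetty.http.HttpMethod;

public class TranscriptionApiCheck {

    public static void main(String[] args) throws Exception {
        // Placeholders: point this at your own OpenAI compatible server and audio sample.
        String apiUrl = "http://localhost:8000/v1/audio/transcriptions";
        String apiKey = ""; // only needed for services that require authentication

        HttpClient client = new HttpClient();
        client.start();
        try {
            // Same multipart fields the add-on sends in API mode.
            MultiPartContentProvider multiPart = new MultiPartContentProvider();
            multiPart.addFilePart("file", "audio.wav", new PathContentProvider(Path.of("audio.wav")), null);
            multiPart.addFieldPart("model", new StringContentProvider("whisper-1"), null);
            multiPart.addFieldPart("response_format", new StringContentProvider("text"), null);
            multiPart.addFieldPart("temperature", new StringContentProvider("0"), null);
            multiPart.addFieldPart("language", new StringContentProvider("en"), null);
            multiPart.close();

            Request request = client.newRequest(apiUrl).method(HttpMethod.POST).content(multiPart);
            if (!apiKey.isBlank()) {
                request = request.header("Authorization", "Bearer " + apiKey);
            }
            ContentResponse response = request.send();
            System.out.println(response.getStatus() + ": " + response.getContentAsString());
        } finally {
            client.stop();
        }
    }
}
```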