Demo FastAPI WebSocket Audio

Web audio --WebSocket--> FastAPI Server.
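
As a rough sketch of the client half of that pipeline, the helper below sends one chunk of samples as a binary WebSocket message. The function name `sendAudioChunk` and the `/ws` endpoint path in the usage comment are assumptions for illustration; this README does not state the server's actual route.

```javascript
// Value of WebSocket.OPEN, hard-coded so the helper also works with test doubles.
const WS_OPEN = 1;

// Sends one chunk of audio samples (a typed array) as a binary message.
// `socket` is any object with the WebSocket interface (readyState, send).
// Returns false when the socket is not open, so the caller can drop or queue.
function sendAudioChunk(socket, samples) {
    if (socket.readyState !== WS_OPEN) {
        return false;
    }
    // Copy exactly the bytes of this view, in case it is a subarray
    // of a larger underlying buffer.
    const bytes = samples.buffer.slice(
        samples.byteOffset,
        samples.byteOffset + samples.byteLength
    );
    socket.send(bytes);
    return true;
}

// Browser usage (hypothetical endpoint path):
// const socket = new WebSocket(`wss://${location.host}/ws`);
// socket.binaryType = 'arraybuffer';
// processor.onaudioprocess = (e) =>
//     sendAudioChunk(socket, e.inputBuffer.getChannelData(0));
```

Copying the bytes before sending keeps the message independent of the audio buffer, which the browser may reuse for the next callback.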

Run

Use https

getUserMedia requires a secure context, so serve the app over HTTPS when the page is opened from a host other than localhost.

uvicorn src.main:app  --host=0.0.0.0 --reload --ssl-keyfile=./key.pem --ssl-certfile=./cert.pem
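
A small check like the following can explain to the user why the microphone is unavailable before getUserMedia is called. It is a sketch only: the function name `microphoneAvailability` and its return strings are made up for illustration, while `isSecureContext` and `navigator.mediaDevices` are standard browser properties.

```javascript
// Reports whether getUserMedia can be used in the given window-like object.
// Pass `window` in the browser; a plain object works for testing.
function microphoneAvailability(env) {
    // Browsers only expose navigator.mediaDevices in secure contexts
    // (HTTPS, or http://localhost).
    if (!env.isSecureContext) {
        return 'insecure-context';
    }
    const devices = env.navigator && env.navigator.mediaDevices;
    if (!devices || typeof devices.getUserMedia !== 'function') {
        return 'unsupported';
    }
    return 'ok';
}

// Browser usage:
// if (microphoneAvailability(window) !== 'ok') { /* show a hint to use https */ }
```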

Use http

Deprecated. Over plain HTTP, getUserMedia is only available when the page is opened from localhost.

uvicorn src.main:app --reload

Web Audio Concepts and usage

The Media Capture and Streams API is based on the manipulation of a MediaStream object, which represents a stream of audio- or video-related data.

A MediaStream consists of zero or more MediaStreamTrack objects, representing various audio or video tracks. Each MediaStreamTrack may have one or more channels. The channel represents the smallest unit of a media stream, such as an audio signal associated with a given speaker, like left or right in a stereo audio track.

MediaStream objects have a single input and a single output. A MediaStream object generated by getUserMedia() is called local, and has as its source input one of the user's cameras or microphones. A non-local MediaStream may represent a media element, like <video> or <audio>, a stream originating over the network and obtained via the WebRTC RTCPeerConnection API, or a stream created using the Web Audio API MediaStreamAudioDestinationNode.

The output of the MediaStream object is linked to a consumer. It can be a media element, like <audio> or <video>, the WebRTC RTCPeerConnection API, or a Web Audio API MediaStreamAudioSourceNode.

https://developer.mozilla.org/en-US/docs/Web/API/Media_Streams_API

Access the raw data from the microphone

  • navigator.mediaDevices.getUserMedia: reads the microphone stream.
  • context.createScriptProcessor: processes the audio buffers, though it is deprecated.
const handleSuccess = function (stream) {
    const context = new AudioContext();
    const source = context.createMediaStreamSource(stream);
    const processor = context.createScriptProcessor(1024, 1, 1);

    source.connect(processor);
    processor.connect(context.destination);

    processor.onaudioprocess = function (e) {
        // Do something with the data, e.g. convert it to WAV
        console.log(e.inputBuffer);
    };
};

navigator.mediaDevices.getUserMedia({ audio: true, video: false })
    .then(handleSuccess);

https://developers.google.com/web/fundamentals/media/recording-audio#access_the_raw_data_from_the_microphone
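
The "do something with the data" step often starts by converting the Float32 samples (range -1 to 1) into 16-bit PCM, which is what WAV files and many speech back ends expect. The helper below is a minimal sketch of that conversion; the name `floatTo16BitPCM` is chosen for illustration and is not part of this project.

```javascript
// Converts Float32 samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range input is clamped rather than wrapped.
function floatTo16BitPCM(samples) {
    const pcm = new Int16Array(samples.length);
    for (let i = 0; i < samples.length; i++) {
        const s = Math.max(-1, Math.min(1, samples[i]));
        // Negative values scale to -32768, positive values to 32767.
        pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    }
    return pcm;
}

// Usage inside the callback above:
// const pcm = floatTo16BitPCM(e.inputBuffer.getChannelData(0));
```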

Two audio resampling methods

Because setting a sampleRate constraint in getUserMedia has no effect here (explained below), resample with one of the following methods:

Use OfflineAudioContext(native code)

// `sourceAudioBuffer` is an AudioBuffer instance of the source audio
// at the original sample rate.
const DESIRED_SAMPLE_RATE = 16000;
// The length argument is a whole number of sample-frames, so round up.
const offlineCtx = new OfflineAudioContext(sourceAudioBuffer.numberOfChannels, Math.ceil(sourceAudioBuffer.duration * DESIRED_SAMPLE_RATE), DESIRED_SAMPLE_RATE);
const cloneBuffer = offlineCtx.createBuffer(sourceAudioBuffer.numberOfChannels, sourceAudioBuffer.length, sourceAudioBuffer.sampleRate);
// Copy the source data into the offline AudioBuffer
for (let channel = 0; channel < sourceAudioBuffer.numberOfChannels; channel++) {
    cloneBuffer.copyToChannel(sourceAudioBuffer.getChannelData(channel), channel);
}
// Schedule playback from the beginning, then render offline.
const source = offlineCtx.createBufferSource();
source.buffer = cloneBuffer;
source.connect(offlineCtx.destination);
source.start(0);
offlineCtx.oncomplete = function (e) {
    // `resampledAudioBuffer` is an AudioBuffer resampled to 16000 Hz.
    // Use resampledAudioBuffer.getChannelData(x) to get a Float32Array for channel x.
    const resampledAudioBuffer = e.renderedBuffer;
    console.log(resampledAudioBuffer);
};
offlineCtx.startRendering();

https://stackoverflow.com/a/55427982/974526
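
The length passed to OfflineAudioContext is the number of output frames, which is the source length scaled by the ratio of the two sample rates. The helper below spells out that arithmetic; `resampledLength` is a name chosen for this sketch.

```javascript
// Number of frames needed to hold `sourceLength` frames recorded at
// `sourceRate` after resampling to `targetRate`, rounded up.
function resampledLength(sourceLength, sourceRate, targetRate) {
    if (!(sourceRate > 0) || !(targetRate > 0)) {
        throw new RangeError('sample rates must be positive');
    }
    return Math.ceil((sourceLength * targetRate) / sourceRate);
}

// One second at 48000 Hz becomes 16000 frames at 16000 Hz;
// a 4096-frame chunk becomes 1366 frames (1365.33 rounded up).
```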

Use a JavaScript Resampler

navigator.mediaDevices.getUserMedia({audio: true})
    .then((stream) => {
        let context = new AudioContext(),
            bufSize = 4096,
            microphone = context.createMediaStreamSource(stream),
            processor = context.createScriptProcessor(bufSize, 1, 1),
            res = new Resampler(context.sampleRate, 16000, 1, bufSize),
            bufferArray = [];

        processor.onaudioprocess = (event) => {
            console.log('onaudioprocess');
            // const right = event.inputBuffer.getChannelData(1);
            // Resample the mono input chunk to 16000 Hz and collect it.
            const outBuf = res.resample(event.inputBuffer.getChannelData(0));
            bufferArray.push.apply(bufferArray, outBuf);
        };

        // onaudioprocess only fires once the node is connected.
        microphone.connect(processor);
        processor.connect(context.destination);
    });

https://github.com/felix307253927/resampler

Why constraints do not work

Even when navigator.mediaDevices.getUserMedia is called with the MediaTrackConstraints below (mediaStreamConstraints), the stream is still delivered at a sample rate of 48000 Hz, because the Chrome browser I use only supports a sampleRate of 48000.

const mediaStreamConstraints = {
   audio: {
     channelCount: 1,
     sampleRate: 16000,
     sampleSize: 16
   }
}
// set constraints at the beginning
navigator.mediaDevices.getUserMedia(mediaStreamConstraints)
 .then( stream => {
     const track = stream.getAudioTracks()[0];
     // audio recorded as Blob
     // and the binary data are sent via socketio to a nodejs server
     // that store blob as a file (e.g. audio/inp/audiofile.webm)

     // the audio track constraints can also be updated here
     return track.applyConstraints(mediaStreamConstraints['audio'])
       .then(() => {
         console.log(track.getCapabilities());
       });
  } )
 .catch( err => console.error(`ERROR mediaDevices.getUserMedia: ${err}`) )

To see which values the device actually supports, check the track's capabilities:

let stream = await navigator.mediaDevices.getUserMedia({audio: true});
let track = stream.getAudioTracks()[0];
console.log(track.getCapabilities());

output:

{autoGainControl: Array(2), channelCount: {…}, deviceId: "default", echoCancellation: Array(2), groupId: "1e76386ad54f9ad3548f6f6c14c08e7eff6753f9362d93d8620cc48f546604f5", …}
autoGainControl: (2) [true, false]
channelCount: {max: 2, min: 1}
deviceId: "default"
echoCancellation: (2) [true, false]
groupId: "1e76386ad54f9ad3548f6f6c14c08e7eff6753f9362d93d8620cc48f546604f5"
latency: {max: 0.01, min: 0.01}
noiseSuppression: (2) [true, false]
sampleRate: {max: 48000, min: 48000}
sampleSize: {max: 16, min: 16}
__proto__: Object

https://developer.mozilla.org/en-US/docs/Web/API/Media_Streams_API/Constraints
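
The capabilities object can also be checked in code before deciding whether resampling is needed. The helper below is a sketch under the assumption that each capability is reported as a `{min, max}` range, an array of allowed values, or a single value, as in the output above; `isWithinCapability` is a hypothetical name.

```javascript
// Returns true/false when `value` is allowed for capability `name`,
// or undefined when the browser does not report that capability.
function isWithinCapability(capabilities, name, value) {
    const cap = capabilities[name];
    if (cap === undefined) {
        return undefined;
    }
    if (Array.isArray(cap)) {
        return cap.includes(value);
    }
    if (cap !== null && typeof cap === 'object') {
        const aboveMin = cap.min === undefined || value >= cap.min;
        const belowMax = cap.max === undefined || value <= cap.max;
        return aboveMin && belowMax;
    }
    return cap === value;
}

// Usage with the track from the snippet above:
// const caps = track.getCapabilities();
// if (!isWithinCapability(caps, 'sampleRate', 16000)) { /* resample instead */ }
```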

AudioWorklet

The legacy ScriptProcessorNode was asynchronous and required thread hops (between the UI thread and the user thread), which could produce an unstable audio output. The AudioWorklet object provides a new synchronous JavaScript execution context which allows developers to programmatically control audio without additional latency and with higher stability in the output audio. You can see example code in action along with other examples at Google Chrome Labs.

https://blog.chromium.org/2018/03/chrome-66-beta-css-typed-object-model.html

Safari did not support AudioWorklet when this was written; support was added in Safari 14.1.

https://developer.mozilla.org/en-US/docs/Web/API/AudioWorklet
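
An AudioWorkletProcessor receives audio in small blocks (typically 128 frames per `process()` call), so a capture processor usually accumulates them into larger chunks before posting them to the main thread. The sketch below shows one way to do that; `ChunkAccumulator`, `CaptureProcessor`, the processor name `capture-processor` and the 4096-frame chunk size are all assumptions for illustration, not code from this project.

```javascript
// Collects small blocks of frames into fixed-size chunks.
class ChunkAccumulator {
    constructor(chunkSize) {
        this.chunk = new Float32Array(chunkSize);
        this.filled = 0;
    }

    // Appends `frames`; calls `onChunk` with a copy each time a chunk fills up.
    push(frames, onChunk) {
        let offset = 0;
        while (offset < frames.length) {
            const n = Math.min(frames.length - offset, this.chunk.length - this.filled);
            this.chunk.set(frames.subarray(offset, offset + n), this.filled);
            this.filled += n;
            offset += n;
            if (this.filled === this.chunk.length) {
                onChunk(this.chunk.slice());
                this.filled = 0;
            }
        }
    }
}

// Only defined inside an AudioWorkletGlobalScope, where these globals exist.
if (typeof AudioWorkletProcessor !== 'undefined') {
    class CaptureProcessor extends AudioWorkletProcessor {
        constructor() {
            super();
            this.accumulator = new ChunkAccumulator(4096);
        }

        process(inputs) {
            const channel = inputs[0][0]; // first channel of the first input
            if (channel) {
                this.accumulator.push(channel, (chunk) => this.port.postMessage(chunk));
            }
            return true; // keep the processor alive
        }
    }
    registerProcessor('capture-processor', CaptureProcessor);
}
```

On the main thread this file would be loaded with `context.audioWorklet.addModule(...)` and connected through an `AudioWorkletNode` created with the same processor name.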

Web Audio API (Draft)

The Web Audio API provides a powerful and versatile system for controlling audio on the Web, allowing developers to choose audio sources, add effects to audio, create audio visualizations, apply spatial effects (such as panning) and much more.

A brief history of audio in the browser:

Flash audio playback -> <audio> element -> Web Audio API (audio processing outside the main thread)

https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API

Record Audio from Browser methods

  • Use the native Web Audio API.
  • Use recorder.js, which is no longer actively maintained. (It cannot provide a streaming buffer; the audio is only available after stop.)
  • Use RecordRTC.js, which is actively maintained and supports almost all browsers. (It cannot provide a streaming buffer; the audio is only available after stop.)

Audio Glitching

Audio glitches are caused by an interruption of the normal continuous audio stream, resulting in loud clicks and pops. It is considered to be a catastrophic failure of a multi-media system and MUST be avoided. It can be caused by problems with the threads responsible for delivering the audio stream to the hardware, such as scheduling latencies caused by threads not having the proper priority and time-constraints. It can also be caused by the audio DSP trying to do more work than is possible in real-time given the CPU’s speed.

The ScriptProcessorNode Interface - DEPRECATED

The ScriptProcessorNode is constructed with a bufferSize which MUST be one of the following values: 256, 512, 1024, 2048, 4096, 8192, 16384. This value controls how frequently the onaudioprocess event is dispatched and how many sample-frames need to be processed each call. onaudioprocess events are only dispatched if the ScriptProcessorNode has at least one input or one output connected. Lower numbers for bufferSize will result in a lower (better) latency. Higher numbers will be necessary to avoid audio breakup and glitches.
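
The latency trade-off follows directly from the buffer size: each onaudioprocess event covers bufferSize / sampleRate seconds of audio. The sketch below computes that duration; `bufferDurationMs` is a name made up for this example.

```javascript
// Buffer sizes accepted by createScriptProcessor, as listed above.
const VALID_BUFFER_SIZES = [256, 512, 1024, 2048, 4096, 8192, 16384];

// Duration of one onaudioprocess buffer in milliseconds.
function bufferDurationMs(bufferSize, sampleRate) {
    if (!VALID_BUFFER_SIZES.includes(bufferSize)) {
        throw new RangeError(`bufferSize must be one of ${VALID_BUFFER_SIZES.join(', ')}`);
    }
    return (bufferSize / sampleRate) * 1000;
}

// At 48000 Hz, a 4096-frame buffer holds about 85 ms of audio,
// while a 256-frame buffer holds about 5 ms.
```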

Use HTTPS to develop across hosts

Use mkcert to make certificates.

mkcert: A simple zero-config tool to make locally trusted development certificates with any names you'd like.

mkcert -key-file key.pem -cert-file cert.pem localhost <host ip>

Audio Downsampling

There are several ways to downsample audio in the browser:

  • OfflineAudioContext (native code with a built-in downsampling feature), currently used.
  • A Web Worker running a self-implemented downsampling method, written in JavaScript or WebAssembly.
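
For the self-implemented option, a simple approach is to average each group of input samples that maps onto one output sample. The function below is a minimal sketch of that idea, not the method this project uses; the averaging acts only as a crude low-pass filter, so a proper anti-aliasing filter would give better quality. The name `downsample` is chosen for illustration.

```javascript
// Downsamples mono samples from `fromRate` to `toRate` by block averaging.
function downsample(samples, fromRate, toRate) {
    if (toRate > fromRate) {
        throw new RangeError('toRate must not exceed fromRate');
    }
    if (toRate === fromRate) {
        return Float32Array.from(samples);
    }
    const ratio = fromRate / toRate;
    const outLength = Math.floor(samples.length / ratio);
    const out = new Float32Array(outLength);
    for (let i = 0; i < outLength; i++) {
        // Input range that corresponds to output sample i.
        const start = Math.floor(i * ratio);
        const end = Math.min(Math.floor((i + 1) * ratio), samples.length);
        let sum = 0;
        for (let j = start; j < end; j++) {
            sum += samples[j];
        }
        out[i] = sum / (end - start);
    }
    return out;
}

// Usage inside a Web Worker (message shape is an assumption):
// self.onmessage = (e) => self.postMessage(downsample(e.data, 48000, 16000));
```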

TODO