Multi-Speaker-Diarization

Automated Multi Speaker diarization API for meetings, calls, interviews, press-conference etc.

DeepAffects Speaker diarization API tries to figure out "Who Speaks When". It essentially splits audio clip into segments corresponding to a unique speaker.

POST Request

POST https://proxy.api.deepaffects.com/audio/generic/api/v2/async/diarize

Sample Code

Shell

curl -X POST "https://proxy.api.deepaffects.com/audio/generic/api/v2/async/diarize?apikey=<API_KEY>&webhook=<WEBHOOK_URL>&request_id=<REQUEST_ID>" -H 'content-type: application/json' -d @data.json

# contents of data.json
{"content": "bytesEncodedAudioString", "sampleRate": 8000, "encoding": "FLAC", "languageCode": "en-US", "speakers": 2, "audioType": "callcenter"}

Javascript

var DeepAffects = require("deep-affects");
var defaultClient = DeepAffects.ApiClient.instance;

// Configure API key authorization: UserSecurity
var UserSecurity = defaultClient.authentications["UserSecurity"];
UserSecurity.apiKey = "<API_KEY>";

var apiInstance = new DeepAffects.DiarizeApiV2();

var body = DeepAffects.DiarizeAudio.fromFile("/path/to/file"); // DiarizeAudio | Audio object that needs to be diarized.

var callback = function(error, data, response) {
  if (error) {
    console.error(error);
  } else {
    console.log("API called successfully. Returned data: " + data);
  }
};

webhook = "https://your/webhook/";
// async request
apiInstance.asyncDiarizeAudio(body, webhook, callback);

Python

import requests
import base64

url = "https://proxy.api.deepaffects.com/audio/generic/api/v2/async/diarize"

querystring = {"apikey":"<API_KEY>", "webhook":"<WEBHOOK_URL>", "request_id":"<OPTIONAL_REQUEST_ID>"}

payload = {
    "encoding": "Wave",
    "languageCode": "en-US",
    "speakers": -1,
    "doVad": true
}

# The api accepts data either as a url or as base64 encoded content
# passing payload as url:
payload["url"] = "https://publicly-facing-url.wav"
# alternatively, passing payload as content:
with open(audio_file_name, 'rb') as fin:
    audio_content = fin.read()
payload["content"] = base64.b64encode(audio_content).decode('utf-8')

headers = {
    'Content-Type': "application/json",
}

response = requests.post(url, data=payload, headers=headers, params=querystring)

print(response.text)

Output

# Sync:

{
  "num_speakers": 2,
  "segments":[
        {
            "speaker_id": "speaker1",
            "start": 0,
            "end": 1
        }
    ]
}

# Async:

{
"request_id": "8bdd983a-c6bd-4159-982d-6a2471406d62",
"api": "requested_api_name"
}

# Webhook:

{
"request_id": "8bdd983a-c6bd-4159-982d-6a2471406d62",
"response": {
  "num_speakers": 2,
  "segments":[
        {
            "speaker_id": "speaker1",
            "start": 0,
            "end": 1
        }
    ]
  }
}

Body Parameters

Parameter	Type	Description	Notes
encoding	String	Encoding of audio file like MP3, WAV etc.
sampleRate	Number	Sample rate of the audio file.
languageCode	String	Language spoken in the audio file.	[default to 'en-US']
content	String	base64 encoding of the audio file.
speakers	Number	Number of speakers in the file (-1 for unknown speakers)	[default to -1]
audioType	String	Type of the audio based on number of speakers	[default to callcenter]
speakerIds	List[String]	Optional set of speakers to be identified from the call	[default to []]
doVad	Bool	Apply voice activity detection	[default to False]

audioType: can have two values 1) callcenter 2) meeting. We recommend using callcenter when there are two speakers expected to be identified and meeting when multiple speakers are expected. doVad: Default=False. This parameters is required if you want silence & noise segments removed from the diarization output.

Query Parameters

Parameter	Type	Description	Notes
api_key	String	The apikey	Required for authentication inside all requests
webhook	String	The webhook url at which the responses will be sent	Required for async requests
request_id	Number	An optional unique id to link async response with the original request	Optional

Output Parameters (Async)

Parameter	Type	Description	Notes
request_id	String	The request id	This defaults to the originally sent id or is generated by the api
api	String	The api method which was called

Output Parameters (Webhook)

Parameter	Type	Description	Notes
request_id	String	The request id	This defaults to the originally sent id or is generated by the api
response	Object	The actual output of the diarization	The Diarized object is defined below

Diarized Object

Parameter	Type	Description	Notes
num_speakers	Number	The number of speakers detected	The number of speaker will be detected only when the request set speakers to -1
segments	List	List of diarized segments	The Diarized Segment is defined below

Diarized Segment

Parameter	Type	Description
speaker_id	Number	The speaker id for the corresponding audio segment
start	Number	Start time of the audio segment in seconds
end	Number	End time of the audio segment in seconds

About

DeepAffects is a speech analysis platform for Developers. We offer a number of speech analysis apis like, Speech Enhancement, Multi-Speaker Diarization, Emotion Recognition, Voice-prints, Conversation Metrics etc. For more information, checkout our developer portal

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Multi-Speaker-Diarization

POST Request

Sample Code

Shell

Javascript

Python

Output

Body Parameters

Query Parameters

Output Parameters (Async)

Output Parameters (Webhook)

Diarized Object

Diarized Segment

About

Files

README.md

Latest commit

History

README.md

File metadata and controls

Multi-Speaker-Diarization

POST Request

Sample Code

Shell

Javascript

Python

Output

Body Parameters

Query Parameters

Output Parameters (Async)

Output Parameters (Webhook)

Diarized Object

Diarized Segment

About