Skip to content

Commit

Permalink
Turn improvements and profiling (#1896)
Browse files Browse the repository at this point in the history
* Use timeout value instead of counter for allocation

* Add timeout for get credentials too and change all timeouts to a standard 5 seconds

* Added internal profiling and SM labels

* Add IPv4 check while adding peer

* Nit

* debug string cleanup

* Profile create perm and bind channel only first time and profile per peer

* Add README

* Fix compiler ubused variable warnings
  • Loading branch information
disa6302 authored Jan 16, 2024
1 parent 95e8124 commit 4eaa8d1
Show file tree
Hide file tree
Showing 10 changed files with 130 additions and 54 deletions.
18 changes: 18 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -429,6 +429,24 @@ The threadpool is enabled by default, and starts with 5 threads that it can incr

To disable threadpool, run `cmake .. -DENABLE_KVS_THREADPOOL=OFF`

### Setting ICE related timeouts

There are some default timeout values set for different steps in ICE in the [KvsRtcConfiguration](https://awslabs.github.io/amazon-kinesis-video-streams-webrtc-sdk-c/structKvsRtcConfiguration.html). These are configurable in the application. While the defaults are generous, there could be applications that might need more flexibility to improve chances of connection establishment because of poor network.

You can find the default setting in the logs:
```
2024-01-08 19:43:44.433 INFO iceAgentValidateKvsRtcConfig():
iceLocalCandidateGatheringTimeout: 10000 ms
iceConnectionCheckTimeout: 12000 ms
iceCandidateNominationTimeout: 12000 ms
iceConnectionCheckPollingInterval: 50 ms
```
Let us look into when each of these could be changed:
1. `iceCandidateNominationTimeout`: Say the connection with host/srflx could not be established and TURN seems to be the only resort. Let us assume it takes about 15 seconds to gather the first local relay candidate, the application could set the timeout to a value more than 15 seconds to ensure candidate pairs with the local relay candidate are tried for success. If the value is set to less than 15 seconds in this case, the SDK would lose out on trying a potential candidate pair leading to connection establishment failure
2. `iceLocalCandidateGatheringTimeout`: Say the host candidates would not work and srflx/relay candidates need to be tried. Due to poor network, it is anticipated the candidates are gathered slowly and the application does not want to spend more than 20 seconds on this step. The goal is to try all possible candidate pairs. Increasing the timeout helps in giving some more time to gather more potential candidates to try for connection. Also note, this parameter increase would not make a difference in the situation unless `iceCandidateNominationTimeout` > `iceLocalCandidateGatheringTimeout` since nomination step should also be given time to work with the new candidates
3. `iceConnectionCheckTimeout`: It is useful to increase this timeout in unstable/slow network where the packet exchange takes time and hence the binding request/response. Essentially, increasing it will allow atleast one candidate pair to be tried for nomination by the other peer.
4. `iceConnectionCheckPollingInterval`: This value is set to a default of 50 ms per [spec](https://datatracker.ietf.org/doc/html/rfc8445#section-14.2). Changing this would change the frequency of connectivity checks and essentially, the ICE state machine transitions. Decreasing the value could help in faster connection establishment in a reliable high performant network setting with good system resources. Increasing the value could help in reducing the network load, however, the connection establishment could slow down. Unless there is a strong reasoning, it is **NOT** recommended to deviate from spec/default.

## Documentation
All Public APIs are documented in our [Include.h](https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/blob/master/src/include/com/amazonaws/kinesis/video/webrtcclient/Include.h), we also generate a [Doxygen](https://awslabs.github.io/amazon-kinesis-video-streams-webrtc-sdk-c/) each commit for easier navigation.

Expand Down
2 changes: 1 addition & 1 deletion samples/Samples.h
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ extern "C" {

#define CA_CERT_PEM_FILE_EXTENSION ".pem"

#define FILE_LOGGING_BUFFER_SIZE (10 * 1024)
#define FILE_LOGGING_BUFFER_SIZE (10000 * 1024)
#define MAX_NUMBER_OF_LOG_FILES 5

#define SAMPLE_HASH_TABLE_BUCKET_COUNT 50
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,8 @@ extern "C" {
#define STATUS_ICE_CANDIDATE_STRING_MISSING_TYPE STATUS_ICE_BASE + 0x00000029
#define STATUS_TURN_CONNECTION_ALLOCATION_FAILED STATUS_ICE_BASE + 0x0000002a
#define STATUS_TURN_INVALID_STATE STATUS_ICE_BASE + 0x0000002b
#define STATUS_TURN_CONNECTION_GET_CREDENTIALS_FAILED STATUS_ICE_BASE + 0x0000002c

/*!@} */

/////////////////////////////////////////////////////
Expand Down Expand Up @@ -1179,7 +1181,7 @@ typedef struct {
//!< USE_CANDIDATE attribute. If client is ice controlled, this is the timeout for receiving binding request
//!< that has USE_CANDIDATE attribute after connection check is done. Use default value if 0.

UINT32 iceConnectionCheckPollingInterval; //!< Ta in https://tools.ietf.org/html/rfc8445
UINT32 iceConnectionCheckPollingInterval; //!< Ta in https://datatracker.ietf.org/doc/html/rfc8445#section-14.2
//!< rate at which binding request packets are sent during connection check. Use default interval if 0.

INT32 generatedCertificateBits; //!< GeneratedCertificateBits controls the amount of bits the locally generated self-signed certificate uses
Expand Down
6 changes: 3 additions & 3 deletions src/source/Ice/IceAgent.c
Original file line number Diff line number Diff line change
Expand Up @@ -360,7 +360,7 @@ STATUS iceAgentValidateKvsRtcConfig(PKvsRtcConfiguration pKvsRtcConfiguration)
pKvsRtcConfiguration->iceConnectionCheckPollingInterval = KVS_ICE_CONNECTION_CHECK_POLLING_INTERVAL;
}

DLOGD("\n\ticeLocalCandidateGatheringTimeout: %u ms"
DLOGI("\n\ticeLocalCandidateGatheringTimeout: %u ms"
"\n\ticeConnectionCheckTimeout: %u ms"
"\n\ticeCandidateNominationTimeout: %u ms"
"\n\ticeConnectionCheckPollingInterval: %u ms",
Expand Down Expand Up @@ -517,7 +517,8 @@ STATUS iceAgentAddRemoteCandidate(PIceAgent pIceAgent, PCHAR pIceCandidateString
pLocalIceCandidate = (PIceCandidate) pCurNode->data;
pCurNode = pCurNode->pNext;

if (pLocalIceCandidate->iceCandidateType == ICE_CANDIDATE_TYPE_RELAYED) {
// TODO: Remove IPv4 check once IPv6 TURN relay candidates are chosen. Disabling this to reduce the number of TURN permissions we create
if (pLocalIceCandidate->iceCandidateType == ICE_CANDIDATE_TYPE_RELAYED && IS_IPV4_ADDR(&pLocalIceCandidate->ipAddress)) {
CHK_STATUS(turnConnectionAddPeer(pLocalIceCandidate->pTurnConnection, &pIceCandidate->ipAddress));
}
}
Expand Down Expand Up @@ -2400,7 +2401,6 @@ STATUS incomingRelayedDataHandler(UINT64 customData, PSocketConnection pSocketCo

CHK(pRelayedCandidate != NULL && pSocketConnection != NULL, STATUS_NULL_ARG);

DLOGV("Candidate id: %s", pRelayedCandidate->id);
CHK_STATUS(turnConnectionIncomingDataHandler(pRelayedCandidate->pTurnConnection, pBuffer, bufferLen, pSrc, pDest, turnChannelData,
&turnChannelDataCount));
for (i = 0; i < turnChannelDataCount; ++i) {
Expand Down
16 changes: 10 additions & 6 deletions src/source/Ice/IceAgent.h
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,16 @@ extern "C" {
#define KVS_ICE_MAX_REMOTE_CANDIDATE_COUNT 100
#define KVS_ICE_MAX_LOCAL_CANDIDATE_COUNT 100
#define KVS_ICE_GATHER_REFLEXIVE_AND_RELAYED_CANDIDATE_TIMEOUT (10 * HUNDREDS_OF_NANOS_IN_A_SECOND)
#define KVS_ICE_CONNECTIVITY_CHECK_TIMEOUT (10 * HUNDREDS_OF_NANOS_IN_A_SECOND)
#define KVS_ICE_CANDIDATE_NOMINATION_TIMEOUT (10 * HUNDREDS_OF_NANOS_IN_A_SECOND)
#define KVS_ICE_SEND_KEEP_ALIVE_INTERVAL (15 * HUNDREDS_OF_NANOS_IN_A_SECOND)
#define KVS_ICE_TURN_CONNECTION_SHUTDOWN_TIMEOUT (1 * HUNDREDS_OF_NANOS_IN_A_SECOND)
#define KVS_ICE_DEFAULT_TIMER_START_DELAY (3 * HUNDREDS_OF_NANOS_IN_A_MILLISECOND)
#define KVS_ICE_SHORT_CHECK_DELAY (50 * HUNDREDS_OF_NANOS_IN_A_MILLISECOND)
#define KVS_ICE_CONNECTIVITY_CHECK_TIMEOUT \
(12 * HUNDREDS_OF_NANOS_IN_A_SECOND) // This should be greater than KVS_ICE_GATHER_REFLEXIVE_AND_RELAYED_CANDIDATE_TIMEOUT to ensure there is
// buffer wait time for connectivity checks with the pairs formed with near timeout created pairs
#define KVS_ICE_CANDIDATE_NOMINATION_TIMEOUT \
(12 * HUNDREDS_OF_NANOS_IN_A_SECOND) // This should be greater than KVS_ICE_GATHER_REFLEXIVE_AND_RELAYED_CANDIDATE_TIMEOUT to ensure there is some
// buffer for nomination with near timeout generated candidates
#define KVS_ICE_SEND_KEEP_ALIVE_INTERVAL (15 * HUNDREDS_OF_NANOS_IN_A_SECOND)
#define KVS_ICE_TURN_CONNECTION_SHUTDOWN_TIMEOUT (1 * HUNDREDS_OF_NANOS_IN_A_SECOND)
#define KVS_ICE_DEFAULT_TIMER_START_DELAY (3 * HUNDREDS_OF_NANOS_IN_A_MILLISECOND)
#define KVS_ICE_SHORT_CHECK_DELAY (50 * HUNDREDS_OF_NANOS_IN_A_MILLISECOND)

// Ta in https://tools.ietf.org/html/rfc8445
#define KVS_ICE_CONNECTION_CHECK_POLLING_INTERVAL (50 * HUNDREDS_OF_NANOS_IN_A_MILLISECOND)
Expand Down
2 changes: 1 addition & 1 deletion src/source/Ice/IceAgentStateMachine.c
Original file line number Diff line number Diff line change
Expand Up @@ -605,7 +605,7 @@ STATUS executeReadyIceAgentState(UINT64 customData, UINT64 time)

if (pIceAgent->iceAgentStartTime != 0) {
PROFILE_WITH_START_TIME_OBJ(pIceAgent->iceAgentStartTime, pIceAgent->iceAgentProfileDiagnostics.iceAgentSetUpTime,
"Time taken to get ICE Agent ready for media exchange");
"ICE Agent ready for media exchange from check connection start");
pIceAgent->iceAgentStartTime = 0;
}

Expand Down
53 changes: 44 additions & 9 deletions src/source/Ice/TurnConnection.c
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,6 @@ STATUS createTurnConnection(PIceServer pTurnServer, TIMER_QUEUE_HANDLE timerQueu
pTurnConnection = (PTurnConnection) MEMCALLOC(
1, SIZEOF(TurnConnection) + DEFAULT_TURN_MESSAGE_RECV_CHANNEL_DATA_BUFFER_LEN * 2 + DEFAULT_TURN_MESSAGE_SEND_CHANNEL_DATA_BUFFER_LEN);
CHK(pTurnConnection != NULL, STATUS_NOT_ENOUGH_MEMORY);

pTurnConnection->lock = MUTEX_CREATE(TRUE);
pTurnConnection->sendLock = MUTEX_CREATE(FALSE);
pTurnConnection->freeAllocationCvar = CVAR_CREATE();
Expand Down Expand Up @@ -215,6 +214,7 @@ STATUS turnConnectionHandleStun(PTurnConnection pTurnConnection, PBYTE pBuffer,
PStunAttributeAddress pStunAttributeAddress = NULL;
PStunAttributeLifetime pStunAttributeLifetime = NULL;
PStunPacket pStunPacket = NULL;
CHAR profileDebugStr[MAX_TURN_PROFILE_LOG_DESC_LEN];
CHAR ipAddrStr[KVS_IP_ADDRESS_STRING_BUFFER_LEN];
BOOL locked = FALSE;
ATOMIC_BOOL hasAllocation = FALSE;
Expand Down Expand Up @@ -245,12 +245,18 @@ STATUS turnConnectionHandleStun(PTurnConnection pTurnConnection, PBYTE pBuffer,

// convert lifetime to 100ns and store it
pTurnConnection->allocationExpirationTime = (pStunAttributeLifetime->lifetime * HUNDREDS_OF_NANOS_IN_A_SECOND) + currentTime;
DLOGD("TURN Allocation succeeded. Life time: %u seconds. Allocation expiration epoch %" PRIu64, pStunAttributeLifetime->lifetime,
pTurnConnection->allocationExpirationTime / DEFAULT_TIME_UNIT_IN_NANOS);

pStunAttributeAddress = (PStunAttributeAddress) pStunAttr;
pTurnConnection->relayAddress = pStunAttributeAddress->address;
ATOMIC_STORE_BOOL(&pTurnConnection->hasAllocation, TRUE);
getIpAddrStr(&pTurnConnection->relayAddress, ipAddrStr, ARRAY_SIZE(ipAddrStr));
SNPRINTF(profileDebugStr, MAX_TURN_PROFILE_LOG_DESC_LEN, "%p - %s:%d - %s", (PVOID) pTurnConnection, ipAddrStr,
pTurnConnection->relayAddress.port, "TURN allocation");
DLOGD("[%p - %s:%d] TURN Allocation succeeded. Life time: %u seconds. Allocation expiration epoch %" PRIu64, pTurnConnection, ipAddrStr,
pTurnConnection->relayAddress.port, pStunAttributeLifetime->lifetime,
pTurnConnection->allocationExpirationTime / DEFAULT_TIME_UNIT_IN_NANOS);
PROFILE_WITH_START_TIME_OBJ(pTurnConnection->turnProfileDiagnostics.createAllocationStartTime,
pTurnConnection->turnProfileDiagnostics.createAllocationTime, profileDebugStr);

if (!pTurnConnection->relayAddressReported && pTurnConnection->turnConnectionCallbacks.relayAddressAvailableFn != NULL) {
pTurnConnection->relayAddressReported = TRUE;
Expand Down Expand Up @@ -291,7 +297,13 @@ STATUS turnConnectionHandleStun(PTurnConnection pTurnConnection, PBYTE pBuffer,
if (pTurnPeer->connectionState == TURN_PEER_CONN_STATE_CREATE_PERMISSION) {
pTurnPeer->connectionState = TURN_PEER_CONN_STATE_BIND_CHANNEL;
CHK_STATUS(getIpAddrStr(&pTurnPeer->address, ipAddrStr, ARRAY_SIZE(ipAddrStr)));
DLOGD("create permission succeeded for peer %s", ipAddrStr);
DLOGD("[%p] Create permission succeeded for peer %s:%d", pTurnConnection, ipAddrStr, pTurnPeer->address.port);
if (pTurnPeer->firstTimeCreatePermResponse) {
pTurnPeer->firstTimeCreatePermResponse = FALSE;
SNPRINTF(profileDebugStr, MAX_TURN_PROFILE_LOG_DESC_LEN, "%p - %s:%d - %s", (PVOID) pTurnConnection, ipAddrStr,
pTurnPeer->address.port, "TURN create permission");
PROFILE_WITH_START_TIME_OBJ(pTurnPeer->createPermissionStartTime, pTurnPeer->createPermissionTime, profileDebugStr);
}
}

pTurnPeer->permissionExpirationTime = TURN_PERMISSION_LIFETIME + currentTime;
Expand All @@ -313,8 +325,14 @@ STATUS turnConnectionHandleStun(PTurnConnection pTurnConnection, PBYTE pBuffer,
pTurnPeer->connectionState = TURN_PEER_CONN_STATE_READY;

CHK_STATUS(getIpAddrStr(&pTurnPeer->address, ipAddrStr, ARRAY_SIZE(ipAddrStr)));
DLOGD("Channel bind succeeded with peer %s, port: %u, channel number %u", ipAddrStr, (UINT16) getInt16(pTurnPeer->address.port),
pTurnPeer->channelNumber);
DLOGD("[%p] Channel bind succeeded with peer %s, port: %d, channel number %u", pTurnConnection, ipAddrStr,
pTurnPeer->address.port, pTurnPeer->channelNumber);
if (pTurnPeer->firstTimeBindChannelResponse) {
pTurnPeer->firstTimeBindChannelResponse = FALSE;
SNPRINTF(profileDebugStr, MAX_TURN_PROFILE_LOG_DESC_LEN, "%p - %s:%d:%u - %s", (PVOID) pTurnConnection, ipAddrStr,
pTurnPeer->address.port, pTurnPeer->channelNumber, "TURN bind channel");
PROFILE_WITH_START_TIME_OBJ(pTurnPeer->bindChannelStartTime, pTurnPeer->bindChannelTime, profileDebugStr);
}

break;
}
Expand Down Expand Up @@ -360,6 +378,7 @@ STATUS turnConnectionHandleStunError(PTurnConnection pTurnConnection, PBYTE pBuf
PStunPacket pStunPacket = NULL;
BOOL locked = FALSE, iterate = TRUE;
PTurnPeer pTurnPeer = NULL;
CHAR profileDebugStr[MAX_TURN_PROFILE_LOG_DESC_LEN];
UINT32 i;

CHK(pTurnConnection != NULL, STATUS_NULL_ARG);
Expand Down Expand Up @@ -412,7 +431,9 @@ STATUS turnConnectionHandleStunError(PTurnConnection pTurnConnection, PBYTE pBuf
pTurnConnection->turnRealm[pStunAttributeRealm->attribute.length] = '\0';

pTurnConnection->credentialObtained = TRUE;

SNPRINTF(profileDebugStr, MAX_TURN_PROFILE_LOG_DESC_LEN, "%p - %s", (PVOID) pTurnConnection, "TURN Get Credentials");
PROFILE_WITH_START_TIME_OBJ(pTurnConnection->turnProfileDiagnostics.getCredentialsStartTime,
pTurnConnection->turnProfileDiagnostics.getCredentialsTime, profileDebugStr);
CHK_STATUS(turnConnectionUpdateNonce(pTurnConnection));
break;

Expand Down Expand Up @@ -561,8 +582,8 @@ STATUS turnConnectionHandleChannelDataTcpMode(PTurnConnection pTurnConnection, P
/* process only one channel data and return. Because channel data can be intermixed with STUN packet.
* need to check remainingBufLen too because channel data could be incomplete. */
while (remainingBufLen != 0 && channelDataCount == 0) {
DLOGV("currRecvDataLen: %d", pTurnConnection->currRecvDataLen);
if (pTurnConnection->currRecvDataLen != 0) {
DLOGV("currRecvDataLen: %d", pTurnConnection->currRecvDataLen);
if (pTurnConnection->currRecvDataLen >= TURN_DATA_CHANNEL_SEND_OVERHEAD) {
/* pTurnConnection->recvDataBuffer always has channel data start */
paddedChannelDataLen = ROUND_UP((UINT32) getInt16(*(PINT16) (pTurnConnection->recvDataBuffer + SIZEOF(channelNumber))), 4);
Expand Down Expand Up @@ -673,6 +694,10 @@ STATUS turnConnectionAddPeer(PTurnConnection pTurnConnection, PKvsIpAddress pPee
pTurnPeer->channelNumber = (UINT16) pTurnConnection->turnPeerCount + TURN_CHANNEL_BIND_CHANNEL_NUMBER_BASE;
pTurnPeer->permissionExpirationTime = INVALID_TIMESTAMP_VALUE;
pTurnPeer->ready = FALSE;
pTurnPeer->firstTimeCreatePermReq = TRUE;
pTurnPeer->firstTimeBindChannelReq = TRUE;
pTurnPeer->firstTimeCreatePermResponse = TRUE;
pTurnPeer->firstTimeBindChannelResponse = TRUE;

CHK_STATUS(xorIpAddress(&pTurnPeer->xorAddress, NULL)); /* only work for IPv4 for now */
CHK_STATUS(createTransactionIdStore(DEFAULT_MAX_STORED_TRANSACTION_ID_COUNT, &pTurnPeer->pTransactionIdStore));
Expand Down Expand Up @@ -852,6 +877,7 @@ STATUS turnConnectionRefreshPermission(PTurnConnection pTurnConnection, PBOOL pN
UINT64 currTime = 0;
PTurnPeer pTurnPeer = NULL;
BOOL needRefresh = FALSE;
CHAR ipAddr[KVS_IP_ADDRESS_STRING_BUFFER_LEN];
UINT32 i;

CHK(pTurnConnection != NULL && pNeedRefresh != NULL, STATUS_NULL_ARG);
Expand All @@ -863,7 +889,8 @@ STATUS turnConnectionRefreshPermission(PTurnConnection pTurnConnection, PBOOL pN
pTurnPeer = &pTurnConnection->turnPeerList[i];
if (IS_VALID_TIMESTAMP(pTurnPeer->permissionExpirationTime) &&
currTime + DEFAULT_TURN_PERMISSION_REFRESH_GRACE_PERIOD >= pTurnPeer->permissionExpirationTime) {
DLOGD("Refreshing turn permission");
getIpAddrStr(&pTurnPeer->address, ipAddr, ARRAY_SIZE(ipAddr));
DLOGD("[%p] Refreshing turn permission for %s:%d", pTurnConnection, ipAddr, pTurnPeer->address.port);
needRefresh = TRUE;
}
}
Expand Down Expand Up @@ -1012,6 +1039,10 @@ STATUS checkTurnPeerConnections(PTurnConnection pTurnConnection)
pTurnPeer = &pTurnConnection->turnPeerList[i];

if (pTurnPeer->connectionState == TURN_PEER_CONN_STATE_CREATE_PERMISSION) {
if (pTurnPeer->firstTimeCreatePermReq) {
pTurnPeer->createPermissionStartTime = GETTIME();
pTurnPeer->firstTimeCreatePermReq = FALSE;
}
// update peer address;
CHK_STATUS(getStunAttribute(pTurnConnection->pTurnCreatePermissionPacket, STUN_ATTRIBUTE_TYPE_XOR_PEER_ADDRESS,
(PStunAttributeHeader*) &pStunAttributeAddress));
Expand All @@ -1028,6 +1059,10 @@ STATUS checkTurnPeerConnections(PTurnConnection pTurnConnection)
pTurnConnection->pControlChannel, NULL, FALSE);

} else if (pTurnPeer->connectionState == TURN_PEER_CONN_STATE_BIND_CHANNEL) {
if (pTurnPeer->firstTimeBindChannelReq) {
pTurnPeer->bindChannelStartTime = GETTIME();
pTurnPeer->firstTimeBindChannelReq = FALSE;
}
// update peer address;
CHK_STATUS(getStunAttribute(pTurnConnection->pTurnChannelBindPacket, STUN_ATTRIBUTE_TYPE_XOR_PEER_ADDRESS,
(PStunAttributeHeader*) &pStunAttributeAddress));
Expand Down
Loading

0 comments on commit 4eaa8d1

Please sign in to comment.