
Commit policy

Andrea Fiore edited this page Jan 26, 2023 · 3 revisions

The connector accumulates data into files before uploading them to EMS. See the How it works section for details.

The commit policy is the set of rules the connector applies to decide when accumulated data is uploaded. The goal is to avoid uploading small files (only a few kilobytes in size) without delaying records for too long.

Three configuration parameters control this behaviour:

  • parquet file size (connect.ems.commit.size.bytes)
  • number of records in the file (connect.ems.commit.records)
  • time since the last write (connect.ems.commit.interval.ms)

After each record is written to the file associated with its source topic-partition, the sink checks whether the file should be committed. If either of the first two criteria is met, the file is uploaded.
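The check performed after each write can be sketched as follows. This is a minimal illustration, not the connector's code: the `CommitPolicy` class, its field names, and the use of `>=` for the comparisons are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitPolicy:
    """Hypothetical holder for the three commit thresholds."""
    max_file_size_bytes: int  # mirrors connect.ems.commit.size.bytes
    max_records: int          # mirrors connect.ems.commit.records
    max_interval_ms: int      # mirrors connect.ems.commit.interval.ms

    def should_commit_after_write(self, file_size_bytes: int, record_count: int) -> bool:
        # Evaluated right after a record is appended to the file:
        # the file is committed when EITHER threshold is reached.
        # (Assumption: thresholds are inclusive.)
        return (
            file_size_bytes >= self.max_file_size_bytes
            or record_count >= self.max_records
        )


policy = CommitPolicy(max_file_size_bytes=10_000_000, max_records=10_000, max_interval_ms=30_000)

# A 2.5 MB file holding exactly 10,000 records is committed on record count.
print(policy.should_commit_after_write(2_500_000, 10_000))
# One record fewer and neither threshold is met yet.
print(policy.should_commit_after_write(2_500_000, 9_999))
```

The time-based criterion is deliberately left out of this method, since it matters precisely when no write is happening.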

The time since the last write bounds how long data can wait before it is uploaded. Data does not always arrive in a Kafka topic every few milliseconds or seconds. Depending on the context, there can be a gap of minutes or even hours before new data arrives in a topic; in the extreme case, no further record ever arrives. With such gaps, the first two criteria can take hours to be reached, or may never be reached at all, yet data that has already accumulated should not be held back from EMS. The time since the last write therefore acts as a stopgap that ensures accumulated data is always uploaded.
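The stopgap can be pictured as a check that runs independently of incoming records. The sketch below is an assumption about how such a check might look; the function name `is_idle_commit_due`, its parameters, and the idea that it is called periodically are hypothetical and not taken from the connector.

```python
def is_idle_commit_due(last_write_ms: int, now_ms: int, interval_ms: int, record_count: int) -> bool:
    """Return True when a non-empty file has seen no write for at least interval_ms."""
    if record_count == 0:
        # Nothing has accumulated, so there is nothing to upload.
        return False
    return now_ms - last_write_ms >= interval_ms


# With a 30 s interval: the last write happened at t = 0 ms and 42 records are waiting.
print(is_idle_commit_due(last_write_ms=0, now_ms=29_999, interval_ms=30_000, record_count=42))
print(is_idle_commit_due(last_write_ms=0, now_ms=30_000, interval_ms=30_000, record_count=42))
```

Because the clock is measured from the last write, every new record resets it; the check only fires once the topic-partition has gone quiet.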

Examples

  • Every 10 MB, every 10k records, or 30 seconds since the last write
connect.ems.commit.size.bytes=10000000
connect.ems.commit.records=10000
connect.ems.commit.interval.ms=30000
  • Every 25 MB, every 100k records, or 120 seconds since the last write
connect.ems.commit.size.bytes=25000000
connect.ems.commit.records=100000
connect.ems.commit.interval.ms=120000
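To get a feel for which threshold is likely to fire first under a given configuration, a rough estimate such as the one below can help. It is only a sketch under simplifying assumptions: records arrive at a fixed gap, each record adds a fixed number of bytes to the file (Parquet encoding and compression are ignored), and the `first_trigger` helper is hypothetical.

```python
def first_trigger(size_bytes: int, max_records: int, interval_ms: int,
                  avg_record_bytes: int, gap_ms: int) -> str:
    """Estimate which commit criterion is reached first for a steady stream."""
    if gap_ms >= interval_ms:
        # The pause between two records is at least the idle interval,
        # so the time since the last write expires before the next record.
        return "interval"
    # Number of records needed to reach the size threshold (ceiling division).
    records_to_fill = -(-size_bytes // avg_record_bytes)
    # On a tie the size threshold is reported.
    return "size" if records_to_fill <= max_records else "records"


# First example above: 10 MB, 10k records, 30 seconds.
print(first_trigger(10_000_000, 10_000, 30_000, avg_record_bytes=500, gap_ms=10))
print(first_trigger(10_000_000, 10_000, 30_000, avg_record_bytes=2_000, gap_ms=10))
print(first_trigger(10_000_000, 10_000, 30_000, avg_record_bytes=500, gap_ms=60_000))
```

With 500-byte records the record count is reached at 5 MB, well before the size limit; with 2,000-byte records the file hits 10 MB after 5,000 records; and when records arrive only once a minute, the 30-second interval decides.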