From 1ed5b0af694e39f83b056dbfeda492a89a62404c Mon Sep 17 00:00:00 2001
From: Pravin Bhat
Date: Mon, 21 Oct 2024 23:08:00 -0400
Subject: [PATCH] Update README.md

Fix typos

Co-authored-by: Madhavan
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b0949f2d..b6b971c0 100644
--- a/README.md
+++ b/README.md
@@ -157,7 +157,7 @@ spark-submit --properties-file cdm.properties \
 - Performance bottleneck are usually the result of
   - Low resource availability on `Origin` OR `Target` cluster
   - Low resource availability on CDM VMs, [see recommendations here](https://docs.datastax.com/en/data-migration/deployment-infrastructure.html#_machines)
-  - Bad schema design which could be cause by Out of balance `Origin` cluster, large partitions (> 100 MB), large rows (> 10MB) and/or high column count
+  - Bad schema design which could be caused by out of balance `Origin` cluster, large partitions (> 100 MB), large rows (> 10MB) and/or high column count.
   - Incorrect configuration of below properties
     - `numParts`: Default is 5K, but ideal value is usually around table-size/10MB.
     - `batchSize`: Default is 5, but this should be set to 1 for tables where primary-key=partition-key OR where average row-size is > 20 KB. Similarly, this should be set to a value > 5, if row-size is small (< 1KB) and most partitions have several rows (100+).
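
The tuning guidance quoted in this hunk maps to entries in the file passed via `--properties-file cdm.properties`. Below is a minimal sketch of how those two settings might look; the fully-qualified key names (the `spark.cdm.perfops.*` prefix) and the values shown are assumptions for illustration, not part of this patch, so verify them against the cdm.properties template shipped with your CDM version.

# Sketch only: assumed key names, defaults shown as a starting point.
# numParts ~ table-size / 10MB (default 5000).
spark.cdm.perfops.numParts    5000
# batchSize: 1 when primary-key = partition-key or average row-size > 20 KB;
# > 5 when rows are small (< 1 KB) and most partitions hold 100+ rows.
spark.cdm.perfops.batchSize   5

These keys would then be picked up by the same `spark-submit --properties-file cdm.properties` invocation referenced in the hunk header above.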