Commit

Merge pull request #170 from rzvoncek/radovan/fix-readme
Fix class names in README examples + Make target ks.t not required
msmygit authored Jun 12, 2023
2 parents 5d20f4a + 53297a7 commit 06cc507
Showing 2 changed files with 9 additions and 10 deletions.
18 changes: 9 additions & 9 deletions README.md
@@ -36,7 +36,7 @@ tar -xvzf spark-3.3.1-bin-hadoop3.tgz
./spark-submit --properties-file cdm.properties /
--conf spark.cdm.schema.origin.keyspaceTable="<keyspace-name>.<table-name>" /
--master "local[*]" /
- --class datastax.cdm.job.Migrate cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
+ --class com.datastax.cdm.job.Migrate cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
```

Note:
@@ -46,18 +46,18 @@ Note:
./spark-submit --properties-file cdm.properties /
--conf spark.cdm.schema.origin.keyspaceTable="<keyspace-name>.<table-name>" /
--master "local[*]" --driver-memory 25G --executor-memory 25G /
- --class datastax.cdm.job.Migrate cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
+ --class com.datastax.cdm.job.Migrate cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
```

# Steps for Data-Validation:

- - To run the job in Data validation mode, use class option `--class datastax.cdm.job.DiffData` as shown below
+ - To run the job in Data validation mode, use class option `--class com.datastax.cdm.job.DiffData` as shown below

```
./spark-submit --properties-file cdm.properties /
--conf spark.cdm.schema.origin.keyspaceTable="<keyspace-name>.<table-name>" /
--master "local[*]" /
- --class datastax.cdm.job.DiffData cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
+ --class com.datastax.cdm.job.DiffData cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
```

- Validation job will report differences as “ERRORS” in the log file as shown below
@@ -83,12 +83,12 @@ Note:
- The validation job will never delete records from target i.e. it only adds or updates data on target

# Migrating specific partition ranges
- - You can also use the tool to migrate specific partition ranges using class option `--class datastax.cdm.job.MigratePartitionsFromFile` as shown below
+ - You can also use the tool to migrate specific partition ranges using class option `--class com.datastax.cdm.job.MigratePartitionsFromFile` as shown below
```
./spark-submit --properties-file cdm.properties /
--conf spark.cdm.schema.origin.keyspaceTable="<keyspace-name>.<table-name>" /
--master "local[*]" /
- --class datastax.cdm.job.MigratePartitionsFromFile cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
+ --class com.datastax.cdm.job.MigratePartitionsFromFile cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
```

When running in above mode the tool assumes a `partitions.csv` file to be present in the current folder in the below format, where each line (`min,max`) represents a partition-range
@@ -107,12 +107,12 @@ This mode is specifically useful to processes a subset of partition-ranges that
grep "ERROR CopyJobSession: Error with PartitionRange" /path/to/logfile_name.txt | awk '{print $13","$15}' > partitions.csv
```
# Data validation for specific partition ranges
- - You can also use the tool to validate data for a specific partition ranges using class option `--class datastax.cdm.job.DiffPartitionsFromFile` as shown below,
+ - You can also use the tool to validate data for a specific partition ranges using class option `--class com.datastax.cdm.job.DiffPartitionsFromFile` as shown below,
```
./spark-submit --properties-file cdm.properties /
--conf spark.origin.keyspaceTable="<keyspace-name>.<table-name>" /
--master "local[*]" /
- --class datastax.cdm.job.DiffPartitionsFromFile cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
+ --class com.datastax.cdm.job.DiffPartitionsFromFile cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
```

When running in above mode the tool assumes a `partitions.csv` file to be present in the current folder.
@@ -124,7 +124,7 @@ When running in above mode the tool assumes a `partitions.csv` file to be presen
--conf spark.origin.keyspaceTable="<keyspace-name>.<table-name>" /
--conf spark.cdm.feature.guardrail.colSizeInKB=10000 /
--master "local[*]" /
- --class datastax.cdm.job.GuardrailCheck cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
+ --class com.datastax.cdm.job.GuardrailCheck cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
```

# Features

@@ -87,7 +87,6 @@ public enum PropertyType {

static {
types.put(TARGET_KEYSPACE_TABLE, PropertyType.STRING);
- required.add(TARGET_KEYSPACE_TABLE);
}

//==========================================================================
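
Wrapper scripts copied from the earlier README would still pass the old `datastax.cdm.job.*` class names to `--class`. A minimal sketch of updating them to the `com.datastax.cdm.job.*` form used in this commit is shown below. The helper name `fix_cdm_class_names` is hypothetical and not part of the tool; the pattern assumes the old names are never preceded by another package segment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cassandra-data-migrator): rewrite the old
# job class prefix to the com.-prefixed one introduced by this commit.
fix_cdm_class_names() {
  # The leading group matches start-of-line or a character that cannot be part
  # of a package name, so names already written as com.datastax.cdm.job.* are
  # left untouched (they are preceded by a dot).
  sed -E 's/(^|[^.[:alnum:]_])datastax\.cdm\.job\./\1com.datastax.cdm.job./g' "$@"
}
```

The function reads the files given as arguments, or standard input when none are given, and writes the result to standard output, for example `fix_cdm_class_names run_migration.sh > run_migration.fixed.sh`, where `run_migration.sh` is a placeholder file name.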

0 comments on commit 06cc507