Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add CSV-to-DB importer script #738

Merged
merged 2 commits into from
Apr 30, 2024

Conversation

marcospb19-cw
Copy link
Contributor

No description provided.

@marcospb19-cw marcospb19-cw requested a review from a team as a code owner April 30, 2024 16:10
Copy link

PR Review

⏱️ Estimated effort to review [1-5]

3, because the script involves multiple operations including file handling, database operations, and conditional logic which requires careful review to ensure correctness and efficiency.

🧪 Relevant tests

No

🔍 Possible issues

Possible Bug: The script does not handle potential failures in database operations. If a database operation fails, the script will continue executing without any retries or error handling.

Performance Issue: The script lists all CSV files in the directory and then filters them in a loop, which might be inefficient for directories with a large number of files.

🔒 Security concerns

Sensitive information exposure: The script includes a hardcoded database URL in the example usage comments, which might lead to accidental exposure of credentials if used in real environments.

Code feedback:
relevant fileutils/import-csv-to-db.sh
suggestion      

Consider adding error handling for database import operations to manage failures gracefully. This could include retry mechanisms or logging errors to a specific file. [important]

relevant linetime psql "$DATABASE_URL" -c "\copy $table FROM '$file' DELIMITER E'\t' CSV HEADER"

relevant fileutils/import-csv-to-db.sh
suggestion      

To improve performance, consider using find with -exec instead of ls and a loop. This method can directly handle files within the specified range, reducing unnecessary iterations. [important]

relevant linefiles=$(ls -w 1 -v $DATA_FOLDER/*.csv)

relevant fileutils/import-csv-to-db.sh
suggestion      

Remove or mask the database URL in the example usage comments to prevent potential sensitive information exposure. Use placeholder values like <database_url>. [important]

relevant line# bash csv-to-db-importer.sh ../data 0 36000000 postgres://postgres:postgres@localhost/stratus

relevant fileutils/import-csv-to-db.sh
suggestion      

Add a final check after the import loop to verify if any files were processed and handle the case where no files meet the criteria. This could prevent confusion when no files are imported due to range mismatches. [medium]

relevant lineecho "Imported $import_count files, check the file 'imported.logs'"


✨ Review tool usage guide:

Overview:
The review tool scans the PR code changes, and generates a PR review which includes several types of feedbacks, such as possible PR issues, security threats and relevant test in the PR. More feedbacks can be added by configuring the tool.

The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR.

  • When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:
/review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...
[pr_reviewer]
some_config1=...
some_config2=...

See the review usage page for a comprehensive guide on using this tool.

Copy link

PR Code Suggestions

CategorySuggestions                                                                                                                                                       
Enhancement
Improve the reliability and predictability of file listing.

Replace the use of ls for file listing with a more robust method such as find. The ls
command can behave unpredictably with special characters or large numbers of files, and
its output is not guaranteed to be in a consistent order, even with -v flag.

utils/import-csv-to-db.sh [37]

-files=$(ls -w 1 -v $DATA_FOLDER/*.csv)
+files=$(find $DATA_FOLDER -name '*.csv' -print | sort -V)
 
Add error handling for database import operations.

Add error handling for the psql command to ensure that the script stops or handles errors
when a database import fails.

utils/import-csv-to-db.sh [47]

-time psql "$DATABASE_URL" -c "\copy $table FROM '$file' DELIMITER E'\t' CSV HEADER"
+if ! time psql "$DATABASE_URL" -c "\copy $table FROM '$file' DELIMITER E'\t' CSV HEADER"; then
+    echo "Failed to import $file" >&2
+    exit 1
+fi
 
Improve file writing robustness and error handling.

Replace the use of echo for appending to files with tee -a for better handling of
permissions and errors.

utils/import-csv-to-db.sh [49]

-echo "$file" >> imported.logs
+echo "$file" | tee -a imported.logs >/dev/null
 
Enhance the robustness of table name extraction from filenames.

Consider using a more robust method for extracting the table name from the filename to
handle potential variations or complex patterns.

utils/import-csv-to-db.sh [44]

-table=$(echo "$file" | grep -oP "(?<=/)\w+(?=-)")
+table=$(basename "$file" | grep -oP "^\w+")
 
Best practice
Prevent word splitting and globbing issues by quoting variable expansions.

Use double quotes around variable expansions to prevent word splitting and globbing
issues, which can lead to unexpected behavior or security vulnerabilities.

utils/import-csv-to-db.sh [37]

-files=$(ls -w 1 -v $DATA_FOLDER/*.csv)
+files=$(ls -w 1 -v "$DATA_FOLDER"/*.csv)
 

✨ Improve tool usage guide:

Overview:
The improve tool scans the PR code changes, and automatically generates suggestions for improving the PR code. The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on a PR.

  • When commenting, to edit configurations related to the improve tool (pr_code_suggestions section), use the following template:
/improve --pr_code_suggestions.some_config1=... --pr_code_suggestions.some_config2=...
[pr_code_suggestions]
some_config1=...
some_config2=...

See the improve usage page for a comprehensive guide on using this tool.

@marcospb19-cw marcospb19-cw force-pushed the chore-add-csv-importer-script branch from 03dc41a to 1405fba Compare April 30, 2024 16:34
@marcospb19-cw marcospb19-cw enabled auto-merge (squash) April 30, 2024 16:38
@marcospb19-cw marcospb19-cw merged commit 5c9eeee into main Apr 30, 2024
24 checks passed
@marcospb19-cw marcospb19-cw deleted the chore-add-csv-importer-script branch April 30, 2024 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant