Support file wildcards in GSProcessing inputs #1107
Labels
good first issue
Good for newcomers
gsprocessing
For issues and PRs related to the GSProcessing library
Currently we read files in GSProcessing by directly using the path provided by the user in the config in a
spark.read.parquet/csv(filepath)
call. Spark doesn't support wildcards when used like this, but GConstruct does support filepath wildcards. To ensure better compatibility between the two, we should support wildcards for S3 paths in GSProcessing as well. One option is to use boto to list all files under the parent path, apply the wildcard rule to that listing, and pass the resulting list of files to the input.
This can happen at config parsing time.
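A minimal sketch of the boto-based option described above, not the GSProcessing implementation. The function name `expand_s3_wildcard` and its `s3_client` parameter are hypothetical. It assumes `boto3` is available when no client is passed in, and it uses Python's `fnmatch` for the wildcard rule, in which `*` also matches `/`; GConstruct's exact matching semantics may differ.

```python
import fnmatch
import re

# Characters that mark the start of a wildcard in fnmatch-style patterns.
_WILDCARD_CHARS = re.compile(r"[*?\[]")


def expand_s3_wildcard(path, s3_client=None):
    """Expand an s3:// path containing wildcards into a sorted list of s3:// URIs.

    Hypothetical helper: paths without wildcards are returned unchanged
    as a single-element list.
    """
    if not path.startswith("s3://"):
        raise ValueError(f"Expected an s3:// path, got: {path}")

    bucket, _, key_pattern = path[len("s3://"):].partition("/")
    match = _WILDCARD_CHARS.search(key_pattern)
    if match is None:
        return [path]

    # Parent prefix: everything up to the last "/" before the first wildcard.
    prefix = key_pattern[: key_pattern.rfind("/", 0, match.start()) + 1]

    if s3_client is None:
        import boto3  # imported lazily so a client can be injected for testing

        s3_client = boto3.client("s3")

    # list_objects_v2 returns at most 1000 keys per call, so paginate.
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Apply the wildcard rule to the full object key.
            if fnmatch.fnmatchcase(obj["Key"], key_pattern):
                keys.append(obj["Key"])

    return [f"s3://{bucket}/{key}" for key in sorted(keys)]
```

The resulting list could then be handed to the reader at config parsing time, for example `spark.read.parquet(*paths)` or `spark.read.csv(paths)`, both of which accept multiple paths. An empty result probably deserves an explicit error so that a mistyped pattern fails early.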