Initial code review #1

wfondrie · 2021-07-13T22:52:17Z

Overall I think it looks great and very useful. Here are my comments for my initial code review of the repository:

The README needs some more details 😄
What is this src/py.typed? It seems to be empty.
Typically, I think folks use from pathlib import Path. It might be something you want to consider, but it doesn't really matter.

talus-aws-utils/src/talus_aws_utils/s3.py

Line 3 in b17407c

import pathlib

Why would you use this private function instead of just boto3.Session() directly?

talus-aws-utils/src/talus_aws_utils/s3.py

Lines 17 to 24 in b17407c

    
    def _get_boto_session() -> boto3.Session:
        """Creates and returns an active boto3 session.

        Returns:
            boto3.Session: An active boto3 Session.
        """
        session = boto3.Session()
        return session

Any reason why you decided to have the bucket and key separate in your parameters, instead of, say, using the S3 URI? I guess a more general AWS question: Is the S3 URI always in the format s3://<bucket>/<key>? If so, you could always figure out the bucket and key from a single parameter.
I think you'll run into problems if the user has specified sep in **kwargs.

talus-aws-utils/src/talus_aws_utils/s3.py

Line 98 in b17407c

return pd.read_csv(data, sep="\t", **kwargs)
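
To illustrate the `sep` concern without touching S3 or pandas, here is a minimal sketch. `fake_read_csv` is a hypothetical stand-in for `pd.read_csv`, and `read_naive` / `read_with_default` are made-up names, not functions from the repository. Using `setdefault` is just one possible fix.

```python
def fake_read_csv(data, sep=",", **kwargs):
    """Hypothetical stand-in for pd.read_csv; it only reports the separator used."""
    return sep


def read_naive(data, **kwargs):
    # Mirrors the reviewed line: a hard-coded sep plus forwarded kwargs.
    return fake_read_csv(data, sep="\t", **kwargs)


def read_with_default(data, **kwargs):
    # Treat tab as a default that the caller is allowed to override.
    kwargs.setdefault("sep", "\t")
    return fake_read_csv(data, **kwargs)


try:
    read_naive("a,b", sep=",")
    collided = False
except TypeError:
    # Python rejects the call because 'sep' is supplied twice.
    collided = True
```

With this pattern, `read_with_default("a,b", sep=",")` uses the caller's separator, and omitting `sep` falls back to a tab.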

This should mention it is inferred from the suffix by default:

talus-aws-utils/src/talus_aws_utils/s3.py

Lines 120 to 122 in b17407c

    
    outputformat (Optional[str], optional): The target output format.
                                            Can be one of {parquet, txt, csv, tsv}.
                                            Defaults to None.
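
For reference, this is roughly the behavior I would expect the docstring to describe. It is a sketch under my own assumptions, not the code in the repository: `infer_outputformat` and `SUPPORTED_FORMATS` are hypothetical names, and the real function may handle unknown suffixes differently.

```python
from pathlib import PurePosixPath

# Formats listed in the docstring under review.
SUPPORTED_FORMATS = {"parquet", "txt", "csv", "tsv"}


def infer_outputformat(key, outputformat=None):
    """Return the explicit format, or fall back to the key's suffix (hypothetical helper)."""
    if outputformat is None:
        # S3 keys use forward slashes, so PurePosixPath parses them on any OS.
        outputformat = PurePosixPath(key).suffix.lstrip(".").lower()
    if outputformat not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported output format: {outputformat!r}")
    return outputformat
```

If that matches the implementation, the docstring could say something like "Defaults to None, in which case the format is inferred from the key's suffix."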

You've replicated a subset of the pandas saving and loading utilities here. However, I think that functionality is already in pandas using the S3 URI. From https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html:

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.

Have you tried this?
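
Related to the single-parameter question earlier: if you did want to accept a URI and still call boto3 with a bucket and key, the split could look something like this. It is a minimal sketch that assumes the URI has the form s3://<bucket>/<key>; `split_s3_uri` is a hypothetical helper, and the bucket and key in the example are made up.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://<bucket>/<key>' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Caveat: urlparse treats '?' and '#' as query/fragment markers, so keys
    # containing those characters would need extra handling.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://my-bucket/data/peptides.tsv")
```

The same URI string could then be handed to pandas directly, which, as far as I know, requires the optional s3fs dependency to be installed.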
