Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial code review #1

Open
8 tasks
wfondrie opened this issue Jul 13, 2021 · 0 comments
Open
8 tasks

Initial code review #1

wfondrie opened this issue Jul 13, 2021 · 0 comments

Comments

@wfondrie
Copy link
Member

Overall I think it looks great and very useful. Here are my comments for my initial code review of the repository:

  • The README needs some more details 😄

  • What is this src/py.typed? It seems to be empty.

  • Typically, I think folks use from pathlib import Path. It might be something you want to consider, but it doesn't really matter.

  • Why would you use this private function instead of just boto3.Session() directly?

    def _get_boto_session() -> boto3.Session:
    """Creates and returns an active boto3 session.
    Returns:
    boto3.Session: An active boto3 Session.
    """
    session = boto3.Session()
    return session

  • Any reason why you decided to have the bucket and key separate in you parameters, instead of say using the S3 URI? I guess a more general AWS question: Is the S3 URI always in the format s3://<bucket>/key? If so, you could always figure out the bucket and key from a single parameter.

  • I think you'll run into problems if the user has specified sep in **kwargs.

    return pd.read_csv(data, sep="\t", **kwargs)

  • This should mention it is inferred from the suffix by default:

    outputformat (Optional[str], optional): The target output format.
    Can be one of {parquet, txt, csv, tsv}.
    Defaults to None.

  • You've replicated a subset of the pandas saving a loading utilities here. However, I think that functionality is already in pandas using the s3 URI. From https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html:

    Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.

    Have you tried this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant