DOC: Provide examples of using read_parquet #49739

wjones127 · 2022-11-17T02:45:23Z

Pandas version checks

I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_parquet.html

Documentation problem

For the pyarrow engine, there are some important features behind the kwargs that aren't described here, and it might not be obvious to users where to look in the PyArrow documentation. For example:

Using filters, users can prune which files and/or row groups are read.
Using filesystem, users can configure a filesystem such as S3.

Suggested fix for documentation

At the very least, we should document for each engine where those kwargs are passed. But it might even be worthwhile to provide examples of filters, reading partitioned datasets, and configuring remote filesystems. Does that seem reasonable?


rhshadrach · 2022-11-17T11:37:39Z

Thanks for the report! +1 on naming the function that's called from pandas and linking to its documentation. However, if we were to document the kwargs themselves, there's a bit more maintenance burden in keeping them in sync (e.g. what can be passed for filters just changed in 10.0.0), and I don't think it provides significant benefit to the user.

phofl · 2022-11-17T17:39:03Z

Yep, agreed. I would rather link to the functions themselves too.

* DOC: Provide examples of using read_parquet #49739
* DOC: Provide examples of using read_parquet #49739
* DOC: Provide examples of using read_parquet #49739 (with minor fixes)
* DOC: Provide examples of using read_parquet #49739 Fixed typos that were causing tests to fail. Oops.
* DOC: Provide examples of using read_parquet #49739 - fix formatting failed checks
* DOC: Provide examples of using read_parquet #49739 - removed read_parquet from code_checks.sh as requested by @mroeschke

---------

Co-authored-by: Vijay Vaidyanathan <[email protected]>

LuchiLucs · 2023-11-27T15:03:09Z

I'm interested in examples of:

leveraging the filters argument to filter rows based on their index; for instance, if the index is a datetime, to return the rows in a given datetime interval
leveraging the columns argument when it is not known whether the columns exist. For instance, if all the given columns exist, filter based on them; if only a subset exists, filter based only on the existing ones without raising.

wjones127 added the Docs and Needs Triage labels Nov 17, 2022

rhshadrach added the IO Parquet label and removed the Needs Triage label Nov 17, 2022

phofl mentioned this issue Nov 17, 2022

DOC: Documentation improvements noatamir/pyladies-berlin-sprints#4

Open

12 tasks

phofl mentioned this issue Apr 9, 2023

Small documentation improvements noatamir/pyladies-workshop#6

Open

12 tasks

vvaidy pushed a commit to vvaidy/pandas that referenced this issue Jul 15, 2023

DOC: Provide examples of using read_parquet pandas-dev#49739

b3cadf3

vvaidy mentioned this issue Jul 15, 2023

DOC: Provide examples of using read_parquet #49739 #54150

Merged

5 tasks

vvaidy pushed a commit to vvaidy/pandas that referenced this issue Jul 15, 2023

DOC: Provide examples of using read_parquet pandas-dev#49739

694700e

vvaidy pushed a commit to vvaidy/pandas that referenced this issue Jul 15, 2023

DOC: Provide examples of using read_parquet pandas-dev#49739 (with mi…

0ec5d45

…nor fixes)

vvaidy pushed a commit to vvaidy/pandas that referenced this issue Jul 16, 2023

DOC: Provide examples of using read_parquet pandas-dev#49739

bc3c5d4

Fixed typos that were causing tests to fail. Oops.

vvaidy pushed a commit to vvaidy/pandas that referenced this issue Jul 18, 2023

DOC: Provide examples of using read_parquet pandas-dev#49739 - fix fo…

4bb7434

…rmatting failed checks

vvaidy pushed a commit to vvaidy/pandas that referenced this issue Jul 18, 2023

DOC: Provide examples of using read_parquet pandas-dev#49739 - remove…

b2d3a5d

…d read_parquet from code_checks.sh as requested by @mroeschke

phofl mentioned this issue Jun 23, 2024

Small documentation fixes phofl/pydata-yerevan-sprint#2

Open

8 tasks
