DataFrame is all NaT and None after loading #127
Comments
Hi @Sondos-Omar, could you please provide some sample input and output of the current behavior vs the behavior you would expect?
Hi @ShaneHarvey, thank you for your reply. Here is a sample of the output. Loading this data using collection.find and then appending the results to pandas in batches works and loads the expected data (the dates, names and ids), but it takes a lot of space and time. I double-checked the schema in MongoDB Compass: prop is an object, prop.Start is a datetime, and the rest are strings.
Hi! Thank you for raising this issue. This happens because Schemas do not support the "dot" notation used in MongoDB projections. At this time, the best workaround seems to be to flatten the data with an aggregation pipeline before ingesting it into PyMongoArrow. I opened a ticket for implementing a real solution to this issue, and a ticket here for us to update our documentation with the correct workaround.
An example aggregation pipeline you can use would be:
Which would yield something that looks like this:
@Sondos-Omar here is a more detailed example:
Hi @Sondos-Omar! We have updated our documentation in this PR to show more examples for using nested data: #130
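For anyone landing on this thread, here is a minimal sketch of the flattening workaround described above. It is not the maintainers' exact snippet. It assumes `collection` is a PyMongo collection whose documents are shaped like those in the report, and the output names `start`, `name` and `object_id` are hypothetical, picked only because they contain no dots.

```python
from datetime import datetime


def build_flatten_pipeline(start_date, end_date):
    """Filter on the nested date, then copy nested values to top-level fields."""
    return [
        # Dotted paths are fine inside the query itself.
        {"$match": {"prop.Start": {"$gte": start_date, "$lte": end_date}}},
        # Rename each nested value to a dot-free top-level field
        # (hypothetical names) so the Schema keys can match them.
        {"$project": {
            "_id": 0,
            "start": "$prop.Start",
            "name": "$prop.Name",
            "object_id": "$_id.objectId",
        }},
    ]


def load_flat_frame(collection, start_date, end_date):
    """Run the pipeline through PyMongoArrow and return a pandas DataFrame."""
    # Imported here so the pipeline builder works without PyMongoArrow installed.
    from pymongoarrow.api import Schema, aggregate_pandas_all

    # Schema keys are the flattened names, not the dotted paths.
    schema = Schema({"start": datetime, "name": str, "object_id": str})
    pipeline = build_flatten_pipeline(start_date, end_date)
    return aggregate_pandas_all(collection, pipeline, schema=schema)
```

Usage would be something like `df = load_flat_frame(collection, start_date, end_date)`. The point of the design is that the dotted paths live only in the pipeline, while the Schema sees plain top-level names.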
I am trying PyMongoArrow to load a dataset from MongoDB. It loads only the selected columns, which saves space, but the resulting DataFrame contains nothing but NaT and None values. Is this a common issue, and how can I fix it?
Thanks in advance
df = collection.find_pandas_all(
    {"prop.Start": {"$gte": start_date, "$lte": end_date}},
    schema=Schema({
        "prop.Start": datetime,
        "prop.Name": str,
        "_id.objectId": str,
    }),
)