Provide convenient way for loading query results into a data frame #83
Comments
@avisrivastava254084 can you describe the schema of the tables you are reading and provide an example with rows from this table? In other words, could you describe the data that you're querying from Presto and the expected values in the pandas |
@ggreg sorry for responding late. Can we close this issue and discuss the main topic, "Creating pandas data frame in presto client", instead? This issue |
This is the repository for the Presto Python client. The other issue is in the Presto engine codebase. |
@avisrivastava254084 Aviral, prestodb/presto-python-client (this repo) is the proper place for Python client enhancements. The other repo, prestodb/presto, contains the code for Presto itself. Are you thinking of making changes in the Presto client or in the Presto code? |
I am so sorry, I am thinking of the Presto client only. Let's continue here. So, do you guys think we should have pandas data frame creation in here as well? I could copy the code from Airflow's PrestoHook because I use both. But I think it'd be easier for Presto users who code in Python if they are given this feature: creating a pandas data frame using Presto. Does this make sense?
|
@avisrivastava254084 Makes sense to me. |
I'd like to avoid introducing a dependency to pandas. I was thinking of providing an interface to return the data in the right format. |
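One possible reading of "an interface to return the data in the right format", as a minimal sketch rather than anything the client actually ships: a helper that turns a DB-API 2.0 cursor result into a column-oriented dict, which pandas.DataFrame accepts directly without the client importing pandas. The name fetch_columns is hypothetical, and sqlite3 stands in for a Presto connection here only so the example runs without a server.

```python
import sqlite3


def fetch_columns(cursor, sql):
    """Run sql on a DB-API 2.0 cursor and return {column_name: [values, ...]}.

    Hypothetical helper: it needs no pandas import, yet its result can be
    passed straight to pandas.DataFrame(...) by callers who do use pandas.
    """
    cursor.execute(sql)
    rows = cursor.fetchall()
    # Read the column names after fetching, so the metadata is populated.
    names = [desc[0] for desc in cursor.description]
    # Transpose the row-oriented result into one list per column.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Stand-in data source (assumption: any DB-API 2.0 cursor behaves alike here).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "")])
columns = fetch_columns(cur, "SELECT id, name FROM t ORDER BY id")
conn.close()
```

A caller with pandas installed could then write pandas.DataFrame(columns), keeping the dependency on the caller's side.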
@ggreg Not only pandas: we should also think about implementing the same for PySpark, i.e. creating a Spark data frame from Presto. But I would like your input on this and your thought process; maybe I am biased. |
@avisrivastava254084 I appreciate the value of supporting this. My main concern is to not conflate too many things into the Presto Python client library. What do you think about providing these features in another library, called presto-python-common (I welcome better names too :) )? |
Name sounds good. Shall I begin? |
I'm preparing the repo so that it follows the requirements for an open source repository and makes it easy to contribute new modules. I'll ping when it is ready. Meanwhile, what should we call the module to manipulate dataframes? Should we just call it prestodb.dataframe and provide a get(sql, **kwargs) -> pandas.DataFrame function? |
Looks good to me. |
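For illustration only, the proposed get(sql, **kwargs) -> pandas.DataFrame function might look roughly like the sketch below. No such module exists in the thread, so everything here is an assumption: the keyword-only connect parameter is a hypothetical addition that lets the example run against sqlite3 instead of a Presto server, and forwarding **kwargs to prestodb.dbapi.connect is a guess at the intended meaning of the keyword arguments.

```python
import sqlite3


def get(sql, *, connect=None, **kwargs):
    """Run sql and return the result as a pandas.DataFrame (sketch).

    kwargs are forwarded to the connection factory. `connect` is a
    hypothetical hook for tests; by default prestodb.dbapi.connect is used.
    """
    import pandas  # lazy import: only this function needs pandas

    if connect is None:
        from prestodb import dbapi  # assumption: kwargs = host, port, user, ...
        connect = dbapi.connect

    conn = connect(**kwargs)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        rows = cur.fetchall()
        # Column names come from the DB-API cursor description.
        names = [desc[0] for desc in cur.description]
    finally:
        conn.close()
    return pandas.DataFrame(rows, columns=names)


def connect_sqlite(**kwargs):
    # Stand-in connection factory so the sketch runs without a Presto server.
    return sqlite3.connect(":memory:")


df = get("SELECT 1 AS id, 'a' AS name", connect=connect_sqlite)
```

Importing pandas inside the function is one way to keep it an optional dependency, which matches the concern raised earlier in the thread.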
Is there any progress on this? It would be incredibly helpful for my use cases. |
@ggreg any update? I am still up for this. |
@ggreg no longer works on the project. @mayankgarg1990 Do you know what might help here? |
Thanks @mbasmanova ! @mayankgarg1990 hi at aviralsrivastava dot com |
I want to create a data frame from Hive using Presto. I am done with it, but with one exception: there are empty strings in my data which would be NaN if the corresponding CSV file were read using pandas (pd.read_csv()). I read several pieces of documentation, but none address this issue.
My code for creating a data frame from Presto is pretty straightforward:
As you can see, there exists a line df = pandas.DataFrame(data). Now, I read the documentation of pandas.DataFrame() and could not find any way of setting keep_default_na to True so that the empty strings '' are treated as NaN, as pandas would treat them.
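For context, keep_default_na is a parameter of pd.read_csv(), which parses text; pandas.DataFrame() has no equivalent, so the conversion has to be done explicitly after the frame is built. A minimal sketch, assuming the rows arrive as tuples from a cursor (the sample data and column names below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for cursor.fetchall() output.
data = [("alice", ""), ("", "nyc")]
df = pd.DataFrame(data, columns=["name", "city"])

# Replace exact empty strings with NaN, which is how read_csv treats
# empty fields by default.
df = df.replace("", np.nan)
```

Note that this only covers empty strings; read_csv's default NA list also includes markers such as "NA" and "NULL", which would need to be replaced the same way if that behavior is wanted.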