Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yandex.Disk Query runner #6598

Merged
merged 22 commits into from
Nov 22, 2023
Merged

Conversation

denisov-vlad
Copy link
Member

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • New Query Runner (Data Source)
  • New Alert Destination
  • Other

Description

Added query runner for Yandex.Disk, supports showing CSV, TSV, XLS(X) files

get_schema is used to show all available files

Example query:

path: /file_path.xlsx
sheet_name: 0  -- optional, used standard pandas syntax and allows both indexes and names

How is this tested?

  • Unit tests (pytest, jest)
  • E2E Tests (Cypress)
  • Manually
  • N/A

Related Tickets & Documents

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

Copy link

codecov bot commented Nov 14, 2023

Codecov Report

Merging #6598 (f377b9d) into master (8bfc574) will increase coverage by 0.38%.
The diff coverage is 90.27%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6598      +/-   ##
==========================================
+ Coverage   61.89%   62.28%   +0.38%     
==========================================
  Files         158      160       +2     
  Lines       12992    13114     +122     
  Branches     1774     1790      +16     
==========================================
+ Hits         8042     8168     +126     
+ Misses       4673     4663      -10     
- Partials      277      283       +6     
Files Coverage Δ
redash/settings/__init__.py 98.65% <ø> (ø)
redash/utils/pandas.py 90.00% <90.00%> (ø)
redash/query_runner/yandex_disk.py 95.19% <95.19%> (ø)
redash/query_runner/python.py 62.06% <40.00%> (+3.46%) ⬆️

@justinclift
Copy link
Member

Re-running the frontend-e2e test, as it looks like it failed on the first attempt due to a GitHub infrastructure problem. 🤦

@justinclift
Copy link
Member

Oh hang on. It's failing on this:

ModuleNotFoundError: No module named 'numpy'

Sounds like numpy might need to be added to the dependency list or similar?

@denisov-vlad
Copy link
Member Author

@justinclift looks strange, because numpy is the core dependency of pandas, and pandas is included in all_ds. What do you think about moving pandas+numpy to main deps?

@justinclift
Copy link
Member

@guidopetri Do you have time to look at this one?

@konnectr
Copy link
Collaborator

@denisov-vlad
func get_column_types_from_dataframe and pandas_to_result
I probably would have done it simpler and put the code in the utils folder, and not in init.py

I believe that there is no need to combine the dependencies of the main Redash and QR into one list. This can later lead to unnecessary dependencies

Copy link
Contributor

@guidopetri guidopetri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. Thanks for implementing this :)

Comment on lines 116 to 130
if column_type == np.bool_:
redash_type = TYPE_BOOLEAN
elif column_type == np.inexact:
redash_type = TYPE_FLOAT
elif column_type == np.integer:
redash_type = TYPE_INTEGER
elif column_type in (np.datetime64, np.dtype("<M8[ns]")):
if df.empty:
redash_type = TYPE_DATETIME
elif len(df[column_name].head(1).astype(str).loc[0]) > 10:
redash_type = TYPE_DATETIME
else:
redash_type = TYPE_DATE
else:
redash_type = TYPE_STRING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this a mapping?

column_type_mappings = {
  np.bool_: TYPE_BOOLEAN,
  np.inexact: TYPE_FLOAT,
  ...
}

redash_type = column_type_mappings[column_type]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, this was moved from elsewhere. I'd be fine keeping it as is then, but obviously better if it's made more readable :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original code was also written by me :)

Thought about mapping here, but using extra if section for date/datetime prevented me from doing this.

redash/query_runner/yandex_disk.py Outdated Show resolved Hide resolved
redash/query_runner/yandex_disk.py Outdated Show resolved Hide resolved
redash/query_runner/yandex_disk.py Show resolved Hide resolved
tests/query_runner/test_yandex_disk.py Outdated Show resolved Hide resolved


class TestYandexDisk(TestCase):
def test_xlsx(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are great, but we only have ~30% coverage of the new query runner. Could we add some more tests to have >=90%?

@denisov-vlad denisov-vlad changed the title Yandex.Disk Query runner WIP: Yandex.Disk Query runner Nov 20, 2023
@denisov-vlad
Copy link
Member Author

Having the same issue as before with missed numpy package.

Also could you please update Codecov report?

@denisov-vlad denisov-vlad changed the title WIP: Yandex.Disk Query runner Yandex.Disk Query runner Nov 20, 2023
@guidopetri
Copy link
Contributor

The frontend e2e tests build the docker image without all_ds dependencies. So that's why it's not finding numpy/pandas. (this wasn't clear to me, so pointing it out more explicitly) Maybe we can add an import guard like in the query runner files?

@denisov-vlad
Copy link
Member Author

@guidopetri it was hard, but I've done it :) Could you review it again please?

Copy link
Contributor

@guidopetri guidopetri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome! Thanks so much @denisov-vlad , merging now :)

@guidopetri guidopetri merged commit a07b8a6 into getredash:master Nov 22, 2023
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants