Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rollout pandas version 2 #12

Open
2 of 4 tasks
victorlin opened this issue May 22, 2024 · 15 comments
Open
2 of 4 tasks

Rollout pandas version 2 #12

victorlin opened this issue May 22, 2024 · 15 comments
Labels
enhancement New feature or request

Comments

@victorlin
Copy link
Member

victorlin commented May 22, 2024

Pandas version 2 is supported by Augur and it's likely that our runtimes will eventually resolve to this new version with potentially breaking changes.

Tasks

Related issues

@corneliusroemer
Copy link
Member

The latest pyarrow no longer supports pandas <2 it seems so this is another good reason to allow pandas 2.

@corneliusroemer
Copy link
Member

An even bigger motivation: we can't support python 3.12 until we support pandas v2

Would be really cool if you could figure this out @victorlin 😄

@corneliusroemer
Copy link
Member

Any progress on this? We'd like to use augur in some parts of Loculus and we've been using some Python 3.12 features (f-strings, type hints) that make downgrading a pain. Would be really cool if we could get Python 3.12 support done, especially now that it's not even the latest anymore (since Python 3.13 release).

@victorlin
Copy link
Member Author

@corneliusroemer thanks for the bump – I've been busy with some Auspice stuff but will try to take a look next week. If anyone else wants to look in the meantime, feel free to do so!

@victorlin
Copy link
Member Author

victorlin commented Oct 17, 2024

I just tested locally and Augur+pandas are installable on Python 3.12 as-is. The lack of 3.12 support on pandas 1.x just means that it has to be built from source instead of installing pre-built wheels. I haven't done any actual testing, but maybe this is good enough to get things going on your end.

@corneliusroemer
Copy link
Member

corneliusroemer commented Oct 17, 2024

Thanks @victorlin - we'll have a look, this means we have to pip-install augur in the conda environment.yaml. Seems to work as a workaround for now, good idea! Though we'll also have to downgrade pandas from 2->1 then.

@corneliusroemer
Copy link
Member

corneliusroemer commented Nov 10, 2024

I've just had another look at supporting pandas v2, tests pass up to latest pandas v2.2.3

I think we should just go and officially sort it, all the tests pass: unit and functional. If something does come up, we can fix it. Would be good to run pathogen-ci with it put that's about it.

Update: pathogen-CI works with v2.2.3 of pandas 🎉
nextstrain/augur#1672 (comment)

@victorlin
Copy link
Member Author

victorlin commented Nov 11, 2024

@corneliusroemer thanks for looking! Did you mean to create 2 PRs for this (nextstrain/augur#1671, nextstrain/augur#1672)? The only difference between the two is that one supports pandas v1 while the other doesn't. I think it would be good to keep support for v1 if trivial. The tests are passing so it seems fine?

I think pandas changes should be good as long as tests are passing. But I'll want to review the Cram stuff in more detail. Could you rebase the pandas PR so that it's not based on the changes from nextstrain/augur#1667?

@corneliusroemer
Copy link
Member

Yes the two PRs are absolutely on purpose. I agree we should keep v1 compatibility for now. The reason I made the second PR excluding V1 is that pathogen CI will still use pandas v1 in tests unless you exclude v1. Does that make sense?

The reason is that we use the conda base and then pip install augur into it in CI. Since conda base already has pandas v1 and v1 is allowed pip keeps it around and doesn't upgrade to V2. So the actual V2 PR doesn't actually check V2 in pathogen CI. Hence the need for the test PR.

I expect that we can upgrade pandas to V2 in conda base once augur supports it so excluding V1 is just a temporary test PR

@corneliusroemer
Copy link
Member

@victorlin I've cleaned up the v2 pandas PR nextstrain/augur#1671 - I think we're pretty much good to go, let's get this done it unblocks Python 3.12 and other stuff :)

@corneliusroemer
Copy link
Member

Progress, I merged the PR! Should we wait a few days for any potential 26.1.0 bugs to surface first so we know bugs are not from pandas 2? Then release in say a week.

@corneliusroemer corneliusroemer changed the title Support pandas version 2 Rollout pandas version 2 Nov 13, 2024
@victorlin
Copy link
Member Author

@joverlee521 re: your edits to issue description (thanks for adding!)

fauna: Can we remove the ==1.* ceiling? From the commit that added it, I don't see a reason why pandas v2 can't be supported.

ncov-ingest: there is no ceiling pin. From a quick skim, none of the pandas usage sticks out as potentially breaking with v2, but the easiest way to know for sure is to just let it run.

@joverlee521
Copy link

Wanted to flag these two that specifically mentioned pandas v1 so we don't have any surprises.

fauna: I think we can just remove the ceiling.

For context, the download scripts that are used seasonal-flu within the Nextstrain docker runtime don't even use pandas. Our manual upload scripts that do use pandas also use xlrd v1.2.0 which errors with Python >=3.8 so I actually have a separate pyenv with Python 3.7 for that process. The use of Python 3.7 automatically limits pandas to v1. (It's definitely a bit of a mess that I need to sort out eventually...)

ncov-ingest: If we want to be cautious, we could create a branch-* image then use it fora a trial run of the workflow. But as you said, I'm not super concerned and we could just let it run to tackle any breakages.

@victorlin
Copy link
Member Author

victorlin commented Nov 20, 2024

  • Ensure the pins are propagated to conda-base and docker-base

The runtimes still resolve to pandas v1 due to the version constraints from evofr and fauna (this is only somewhat obvious in logs from the docker-base rebuild). I'll make separate issues in those repos.

@victorlin
Copy link
Member Author

The work in Augur is done. I think it's still beneficial to keep this open to be aware of if/when Nextstrain runtimes resolve to Pandas v2, so I'll transfer this to the nextstrain/public repo.

@victorlin victorlin transferred this issue from nextstrain/augur Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants