Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes numpy >2.0 compatibility for asr while replicating existing behavior #11446

Closed
wants to merge 18 commits into from

Conversation

andylamp
Copy link

@andylamp andylamp commented Dec 2, 2024

What does this PR do ?

The PR attempts to restore functionality with recent numpy versions; numpy 2.0 removed sctypes and this is breaking asr functionality.

edit: closed the PR due to commits not being signed-off properly and rebasing caused issues :)

Collection: asr

Changelog

Usage

The usage is that ASR tasks that call _convert_samples_to_float32 functions now succeed. As noted above, while this has been attempted to be addressed in the PRs mentioned above, I believe that the functionality is not replicated accurately.

More concretely, if you check the output of sctype for the types of int and float in supported versions of numpy you'd get the following:

print(np.sctypes["int"])
[<class 'numpy.int8'>, <class 'numpy.int16'>, <class 'numpy.int32'>, <class 'numpy.int64'>]
print(np.sctypes["float"])
[<class 'numpy.float16'>, <class 'numpy.float32'>, <class 'numpy.float64'>]

However, if we use the issubdtype to perform this the set will be wider, case on point for floating point:

import numpy as np
float128 = np.floating[128]
np.issubdtype(float128, np.floating)
Out[6]: True

And, also for integers the set mostly covers the output of sctype for signed ones. However any subclass from signedinteger or unsignedinteger will return true. A more concrete example would be,

import numpy as np
int96 = np.signedinteger[96]
np.issubdtype(int96, np.integer)
Out[4]: True
uint96 = np.unsignedinteger[96]
np.issubdtype(uint96, np.integer)
Out[5]: True

Therefore, in this case not only we consider int{8,16,32,64} but also uint{8,16,32,64} which is not the expected result when querying sctype['int'] leading to potentially unexpected behavior. Case on point,

import numpy as np
np.issubdtype(np.uint8, np.integer)
Out[3]: True

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests? No new tests were required.
  • Did you add or update any necessary documentation? No documentation update required.
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc) No optional components are affected.
  • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

The fix is backwards compatible with all supported numpy 1.x versions as well. Therefore, it should pose minimal risk wrt to integration.

@github-actions github-actions bot added the ASR label Dec 2, 2024
andylamp and others added 16 commits December 2, 2024 01:40
Signed-off-by: andylamp <[email protected]>
Signed-off-by: andylamp <[email protected]>
* ci: Allow dry-run of release

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* finalize

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: andylamp <[email protected]>
* fix dtype when init HF model from config

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: andylamp <[email protected]>
…VIDIA#11435)

* Remove try / catch block to propagate import errors

Signed-off-by: Jan Lasek <[email protected]>

* Small rewrite to handle import errors in export/deploy scripts

Signed-off-by: Jan Lasek <[email protected]>

* Apply isort and black reformatting

Signed-off-by: janekl <[email protected]>

---------

Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: janekl <[email protected]>
Co-authored-by: janekl <[email protected]>
Signed-off-by: andylamp <[email protected]>
Signed-off-by: andylamp <[email protected]>
Signed-off-by: andylamp <[email protected]>
Copy link
Contributor

github-actions bot commented Dec 2, 2024

beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:


------------------------------------
Your code has been rated at 10.00/10

Thank you for improving NeMo's documentation!

@andylamp andylamp closed this Dec 2, 2024
@andylamp andylamp deleted the fix-numpy-compatibility branch December 2, 2024 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants