Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

We should have type annotations #1709

Open
abhisrkckl opened this issue Jan 4, 2024 · 13 comments
Open

We should have type annotations #1709

abhisrkckl opened this issue Jan 4, 2024 · 13 comments

Comments

@abhisrkckl
Copy link
Contributor

abhisrkckl commented Jan 4, 2024

Type hinting was introduced in Python 3.5. Now that we no longer support older python versions (<3.8), there is no reason why we cannot have type annotations everywhere. It makes life so much easier while using an IDE with code highlighting.

30-05-2024 Update: This is gradually being implemented.

@scottransom
Copy link
Member

I just found another really amazing usecase with vscode that is enabled by type hinting.

In the example fit_NGC6440E.py we have the following:

...
from pint.models import get_model_and_toas
...
# Read the timing model and the TOAs
m, t = get_model_and_toas(parfile, timfile)

If we are editing code below that, the editor does not know what m and t are.

However, if I simply put a wrapper around get_model_and toas() like so:

import pint.models.timing_model
def foo(par: str, tim: str) -> tuple[pint.models.timing_model.TimingModel, pint.toa.TOAs]:
    return get_model_and_toas(par, tim)

m, t = foo(parfile, timfile)

Then anywhere later in the code, vscode knows that m and t are TimingModel and TOAs class instances, respectively, and you get tab completion (and autocompletion) on all of the methods and attributes of those classes.

That's really cool. I definitely think we should add type hints to many of our most common methods and functions for this reason alone.

@scottransom
Copy link
Member

scottransom commented Jan 5, 2024

Interestingly, you can get the same behavior without adding type hints to the definition of get_model_and_toas() but by pre-hinting the variables that you are going to use in fit_NGC6440E.py itself:

import pint.toa
import pint.models.timing_model

m: pint.models.timing_model.TimingModel
t: pint.toa.TOAs
m, t = get_model_and_toas("1748-2446as.par", "Ter5as_forDMs.tim")

That works just fine, and the editor knows what m and t are and allows autocompletion etc.

@scottransom
Copy link
Member

It looks like there are a couple projects that do code analysis to guess type hinting information and then automatically modify your code. So this could be a first step if we wanted to do this with much of or most of PINT:
https://github.com/Instagram/MonkeyType

@aarchiba
Copy link
Contributor

Type hinting is also something that can reasonably be done incrementally. In my experience it often turns up awkward corner cases (oops, I forgot that could return None) so it can be useful to go through the code fixing these at the same time. There will be a certain amount of fighting with the type checker - many interfaces don't really work especially well for static typing (notably dynamically generated class attributes like model.F0). Liberally sprinkling assert isinstance(thing, TypeIKnowItReallyIs) and typing.cast() can help.

A further benefit is that the generated documentation can (does?) automatically include type information for function arguments and return values; for fully documented functions this doesn't necessarily add much, but it can save some effort when documenting new functions, and it makes the docs for poorly or un-documented functions more usable.

@aarchiba
Copy link
Contributor

There's some discussion in #1717 on the best way to do this. Some points that I think came out of that discussion (others may disagree!):

  • Typing can mostly be done incrementally using the Scout rule ("leave the place better than you found it"), adding types to functions when working on them.
  • It would be a good idea to define some commonly used types in a global way (pint.types?); in addition to simplifying the typing process, this would encourage us to use these standard (combinations of) types rather than cooking up our own inconsistent versions.
  • For as long as we want to retain 3.8 compatibility we are stuck with using the more verbose Union[type1, type2] rather than type1 | type2. While there is from __future__ import annotations, this is to support an upcoming feature that has been deferred for several python releases already, and it is not clear how this interacts with type checkers.

There are some questions:

  • How aggressively should we attempt to type Quantities? In principle it is possible to specify that an argument should be a Quantity with units of time, but it's not clear to me how powerful/flexible/effective/useful this is.
  • How aggressively should we attempt to type numpy arrays? Their types can include at least dtype information, but it's not clear that one can capture the runtime flexibility of numpy-based functions in their types. The documentation on typing for numpy is also not great.

@aarchiba
Copy link
Contributor

It looks like there are a couple projects that do code analysis to guess type hinting information and then automatically modify your code. So this could be a first step if we wanted to do this with much of or most of PINT: https://github.com/Instagram/MonkeyType

Often the correct types for function arguments require some thought. But many IDEs - including VSCode - can be persuaded to include "inlay hints" showing inferred types for function arguments, return values, and variables; generally double-clicking on these will fill in the guess as source code. So a semi-automated procedure can be quite quick and convenient.

@dlakaplan
Copy link
Contributor

There's some discussion in #1717 on the best way to do this. Some points that I think came out of that discussion (others may disagree!):

  • Typing can mostly be done incrementally using the Scout rule ("leave the place better than you found it"), adding types to functions when working on them.
  • It would be a good idea to define some commonly used types in a global way (pint.types?); in addition to simplifying the typing process, this would encourage us to use these standard (combinations of) types rather than cooking up our own inconsistent versions.
  • For as long as we want to retain 3.8 compatibility we are stuck with using the more verbose Union[type1, type2] rather than type1 | type2. While there is from __future__ import annotations, this is to support an upcoming feature that has been deferred for several python releases already, and it is not clear how this interacts with type checkers.

There are some questions:

  • How aggressively should we attempt to type Quantities? In principle it is possible to specify that an argument should be a Quantity with units of time, but it's not clear to me how powerful/flexible/effective/useful this is.
  • How aggressively should we attempt to type numpy arrays? Their types can include at least dtype information, but it's not clear that one can capture the runtime flexibility of numpy-based functions in their types. The documentation on typing for numpy is also not great.

I generally agree with this plan. In terms of typing quantities, how would that interact with quantity_input annotations? We have some of those (although not uniformly, but e.g., in src/pint/derived_quantities.py). Can you have separate quantity_input and type annotations, or do we need to only use one?

@aarchiba
Copy link
Contributor

If we decide to take typing seriously, we could add mypy (or another type checker) to our CI. It is (they are) reasonably well-designed for a codebase that is only partially annotated, and they support incrementally annotating more and more of the code. They also support linter-like comments to suppress known non-problems.

@aarchiba
Copy link
Contributor

I generally agree with this plan. In terms of typing quantities, how would that interact with quantity_input annotations? We have some of those (although not uniformly, but e.g., in src/pint/derived_quantities.py). Can you have separate quantity_input and type annotations, or do we need to only use one?

I really haven't worked with typing and Quantities, so I'm not sure how this would look. I think a conservative approach might be to simply mark Quantities as Quantities?

@aarchiba
Copy link
Contributor

The PR #1725 is an initial setup getting static type checking into our CI. There are also some notes on what a gradual typing workflow would look like.

@aarchiba
Copy link
Contributor

aarchiba commented Mar 4, 2024

I generally agree with this plan. In terms of typing quantities, how would that interact with quantity_input annotations? We have some of those (although not uniformly, but e.g., in src/pint/derived_quantities.py). Can you have separate quantity_input and type annotations, or do we need to only use one?

For Astropy version 6.0, we can have the best of both worlds: we can write

@u.quantity_input
def spin_period(spin_frequency: u.Quantity[u.Hz]) -> u.Quantity[u.s]: ...

and it will do the right thing in terms of both static type checking and informing quantity_input what to do.

Unfortunately, it looks like pre-6.0 - oldestdeps uses Astropy 4.0 - this is not so convenient. I think it should work to separate the two effects:

@u.quantity_input(spin_frequency=u.Hz)
def spin_period(spin_frequency: u.Quantity) -> u.Quantity: ...

but I'll have to try that out.

[Edited to add:] Looks like this works; not as nice as the astropy 6.0 version but seems to work okay.

@dlakaplan
Copy link
Contributor

As we do more of this, we might want to standardize some type definitions of our own. For instance, we often have things like:
Union[float, np.ndarray, u.Quantity]
or
Union[float, np.ndarray, u.Quantity, Time]
and we could define types like those (in the top-level package?) to use them uniformly. Same with some file-like type (Union[str, Path, IO] or whatever).

@dlakaplan
Copy link
Contributor

There were some issues combining type hints and unit decorators with old astropy.

@u.quantity_input(x=u.m)
def v(x: u.Quantity) -> u.Quantity
    return x/(10*u.s)

seems to fail with the older astropy since it interprets the return type for a return unit. Without the @u.quantity_input this works fine, and in newer astropy it works. But not with the old ones. Instead we need to do:

@u.quantity_input(x=u.m)
def v(x: u.Quantity) -> u.m/u.s
    return x/(10*u.s)

and specify the units for the return rather than the type/class.

This may exclusively be in derived_quantities. But this is just a note to check & update that (and the other annotations) when we update the astropy oldestdeps version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants