Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static typing with mypy, flake8, and lots of subjective code changes for Python3... #53

Open
wants to merge 102 commits into
base: master
Choose a base branch
from

Conversation

cfcohen
Copy link
Contributor

@cfcohen cfcohen commented Mar 20, 2021

This was never intended to be a pull request that I actually expected you to merge, but rather something I felt that I needed to do for my own benefit in order to understand this code. ;-) That said, I see no reason why I shouldn't let you know about my changes. You are welcome to accept as many, or as few, or even zero commits as you see fit.

The subject matter doesn't make for the simplest code in the world to begin with. I found several of the code paradigms such as the passing of dictionaries, which creates ambiguity about defined keys, and the heavy use of aliasing of constants in the module, class, and function scopes very confusing. The 1000+ line calculo() in iapws95.py was particularly perplexing to me, with even minor changes to the code demonstrating that I had no real comprehension of which variables were defined where.

Since the static typing features I've used necessitate Python3, and I've made no real efforts to understand the consequences for early Python3 support either, I suspect that your primary complaint about these changes will be that they substantially reduce the supported Python versions. That wasn't really a concern for my purposes.

While I made some effort to not modify the external API, I did not slavishly adhere to that as a requirement. There's a reasonable chance that the external API shifted minorly around some corner cases, and certainly did in a number of internal functions as identified by leading underscores. In at least one place, I think I now return 'nan' instead of None in the external API (which wasn't test for or documented anyway).

The benefits of the branch I hope will be obvious. It passes static type checking with mypy, and is clean of errors for a reasonable flake8 configuration. At least for me (which I realize can be very subjective) this branch is much more easily understood, and provides greater assurances about code correctness. It passes your unit tests with fairly minimal changes to the API and no changes to results, which I leaned on heavily to check correctness while changing the code.

I'm probably done working on this for a little while, but I intend to return to it again in the future with more emphasis on my understanding the IAPWS standard now that I understand the code a little better. If you're interested in doing something with this pull request let me know and I'll see if I can help. If not, that's fine too.

cfcohen added 30 commits March 13, 2021 16:24
DeprecationWarning: scipy.exp is deprecated and will be removed in
SciPy 2.0.0, use numpy.exp instead

DeprecationWarning: scipy.log is deprecated and will be removed in
SciPy 2.0.0, use numpy.lib.scimath.log instead
./iapws/iapws08.py:665:13: E117 over-indented
./docs/conf.py:243:3: E121 continuation line under-indented for hanging indent
./docs/conf.py:287:3: E121 continuation line under-indented for hanging indent
E122 continuation line missing indentation or outdented
E123 closing bracket does not match indentation of opening bracket's line
E126 continuation line over-indented for hanging indent
W391 blank line at end of file
C812 missing trailing comma
C101 Coding magic comment not found
C407 Unnecessary list comprehension - 'sum' can take a generator.
See: https://github.com/adamchainz/flake8-comprehensions
N813 camelcase 'Boltzmann' imported as lowercase 'kb'
N811 constant 'M' imported as non constant 'Mw'

Also avoid some ambiguity with local 'Mw' at line 926, but it would
still be nice for there to be more clarity.
W504 line break after binary operator
D100 Missing docstring in public module
F841 local variable 'st' is assigned to but never used

Since these tests are not currently working, I added a test that was
almost certainly correct to suppress the flake8 complaint.  Obviously,
addressing the tests would be a better fix. ;-)
D202 No blank lines allowed after function docstring
D209 Multi-line docstring closing quotes should be on a separate line
D204 1 blank line required after class docstring
D101 Missing docstring in public class

Perhaps a better class documentation string is required?
D104 Missing docstring in public package

This is another case where perhaps a more descriptive doc string would
be appropriate, but I don't know exactly what to say here.
D102 Missing docstring in public method
E241 multiple spaces after ','
E302 expected 2 blank lines, found 1
All conditional function variants must have identical signatures

Mypy doesn't like to have the same symbol declared with different
types, even if they're in differen conditional scopes.
iapws/iapws97.py:4520: error: Unsupported target for indexed assignment ("Tuple[float, float]")
iapws/iapws97.py:4522: error: Unsupported target for indexed assignment ("Tuple[float, float]")

Because tuples are not modifable, the previously code generated mypy
errors despite being what I thought was the intended API.  I thought
that I could modify the code to remove the stealthy update of the
CALLERS par list, but it trurns out that this change is required, and
causes tests to fail.  Was this update intentional?  If so, perhaps a
comment is appropriate?  If not, I think this is a bug.
The line breaks that make sense to me don't actually work.  This
solution is rather ugly but it makes both mypy and flake8 happy.
Or were declared with types (None) that didn't match the types on the
Derived classes.  The mypy errors were roughly:

error: "MEoS" has no attribute "rhoc"
error: "MEoS" has no attribute "_constants"
error: Incompatible types in assignment (expression has type
 "Dict[str, List[float]]", base class "MEoS" defined the type as "None")
The types on np.linspace probably shouldn't be needed if we had proper
stubs.  The remaining mypy errors are about missing stubs form numpy,
scipy, and matplotlib.
This may be a controversial change, but for mypy to make much sense of
the static types, we need _constants to be more stongly typed (not
dependent on the key string).  Perhaps there's something I can do to
restore this later, but not right now wre' still getting static types.
cfcohen added 21 commits March 16, 2021 23:25
It's now clearer to me how _fase is initialized.  Document the origin
of M and name in MEoS more clearly.
It's much more difficult to have a static assurances of the keys and
values of a dictionary than a custom class.  The resulting variable
names are slightly different, but not obviously worse to me.  More
importantly, the type is differentiated from alll other dictionaries
that are keyed by str and have float values, of which there are
several, including some with overlapping keys (e.g. _fav).
Better static typing and generally more readble code.  For region 4, I
returned 'nan' for the undefined properties, which migth cause a
subtle API change.  But now that there's a class, I can change back to
Optional[float] easily enough if needed.
More static typing and readability improvements.
Maybe I'm still missing something here, but there seems to be a lots
of fancy for what appears to be a pair of constants.  There's still
the self.rhoc and self.Tc variants, but they seem to be constants as
well.  Hopefully this hasn't broken anything...
The object properties are apparently never assigned to except during
initialization, where they're set to match the global constant, which
is typically imported from _iapws but is sometimes defined in the
module with a different value?
This approach make it clear how the remaining constants are scoped to
a specific set of functions.  The constants do not change and how the
"missing" contants are handled.  I also think the data flow from the
constants to where they're used is a lot more straightforward.
IAPWS95 is rather larger than I think is comfortable, and so I took
this opportunity to break a clearly delinated class out into a new
file.
Maybe there's a "traditional form" of these equations or something
like that where seeing the gamma2 terms makes them more familar.  To
me is was just additional unnecessary complexity.
The size of this function was one of the primary reasons that I
started making changes to IAPWS.  While I understood the overall
structure of this function, I was unable to understand what the inputs
and output of each calculation "mode" was because everything was in a
single scope with processing before and processing after the "switch".
This version is far from perfect, and is about 150 lines larger than
the previous version, but I think it's a lot more readable, and it's
likely that the size will return to something more like the orginal
after some additional cleanup, but this seemed like a good place to
commit.
While there was a lot of farily arbitrary restructuring in the
previous commit, this change was the one that seemed to be the most
substantively different.  These two variables really do seem to have
just been unused, whereas most of the other changes were more
obviously stylistic or caused by changes in data flow.
This is beneficial for mypy (and also for humans to some extent)
because it struggles with code paradigms like _mode where variables
are tested for types in one place, then summarized in _mode, and then
used logically in distant code that presumes the logic is sound.  By
placing the actual tests closer to the usage, mypy is able to figure
out, and humans can more easily verify that the constraints are
enforced.
Remove last remaining type ignores in iapws95.py, by more consistently
checking for "is not None", which mypy understands.
Also avoids confusion with the method of the same name on several
other classes.
In comparison to the IAPWS95 changes, these changes were hardly needed
because the code was already so much clearer.  There were a couple of
places where mypy had to be "accommodated" to remove the "type:
noginore" comments.  The first was moving the "is not None" tests
closer the calls that used the variables, and the second was splitting
out the _solve_X_X() functions because mypy doesn't cope well with
sub-function declarations and conditional logic.  Both changes were
beneficial anyway in my opinion, drawing attention to an unusual
computation of sigma in two cases, and generally providing strong
assurances that the entire _solve_X_X() call chain doesn't otherwise
update the object.  I wish we had that feature in IAPWS95.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 98.955% when pulling b7d2e09 on cfcohen:python3 into de7a093 on jjgomera:master.

@jjgomera
Copy link
Owner

Really great work!!
Althought python2 is discontinued, I would not like to break python2 compatibility because many system still use as default. I'll try to filter the pull-request to integrate the code cleaning section.

@cfcohen
Copy link
Contributor Author

cfcohen commented Mar 25, 2021

Sounds great. To be honest I didn't fully realize that the static typing annotations was a Python 3 only thing because I've been exclusively using Python3 for so long now. Take whatever is beneficial for you and leave the rest. This was really just so that I could better understand your code. I suggest looking at commits 38ae29f and fbfd2fe a little more closely, because I think they might reflect things that were truly wrong/suspicious in the original code. The other 99% of the code all checked out as correct once I understood what was happening.

@jjgomera
Copy link
Owner

jjgomera commented Mar 26, 2021

I just cherry-pick the style commit. I need more time to give a try to your changes, thanks, really great work

@jjgomera
Copy link
Owner

wow, really valuable speed increase with your code, so i work with that to remove static typing and add to module, thanks

@cfcohen
Copy link
Contributor Author

cfcohen commented Mar 27, 2021

Huh. I hadn't actually noticed. I'm really glad to see that you're finding value in the changes. They were fairly arbitrary, and non-trivial in scope. It seems to me that the performance improvement would have to be because of one of these two primary changes:

  1. The consistency of enforcement that floats were Python native floats and not a strange mixture of numpy.ndarrays and so forth.
  2. The move away from dictionaries to custom objects for passing multiple values between functions.

Numpy is really fast when the arrays are large. I've always suspected that it was mildly harmful when there was a single float inside an ndarray. In particular, I find the subtly different semantics between native float division and numpy float division very confusing. That's why I was on a little bit of a war path to exterminate the ndarray types from the places that were documented as being native floats.

I think the dictionary thing is less likely but there is a fair bit of string comparison that's gone now. I suppose that could be it. Are you sure it's not just a Python2 versus Python3 thing?

@cfcohen
Copy link
Contributor Author

cfcohen commented Mar 27, 2021

Since I probably know my changes better, here are the commits I think of as being primarily related to the numpy elimination:

38d0a95 Convert to math.exp() and math.log() where possible.
fa1eb5d Isolate numpy.log() and numpy.exp() calls as much as possible.
aaeca01 A working version with no numpy log()/exp() dependency.
f9abf87 More comprehensively cast from numpy.Float64 to Python float.
b05ee95 This commit demonstrates that numpy.Float64 is mostly gone.

I'm far from certain that's the cause of the performance improvement but it's my best theory. I never would have been able to figure this out without static typing by the way... Which is a big part of why I wanted static typing in the first place. I suppose you could try to check results with assert(type(x)==float) used in many places.

Another possibility might be to accept the changes more completely, and then remove the static typing with a search-and-replace regexp... :.*[=,] I could do this if you'd like.

@k-a-mendoza
Copy link

One comment I'll add is while this does appear to be faster for single P-T calculations, the necessity of creating a new object at every P-T point creates enormous overhead when trying to use this in a vectorized format. For example, my use case has a 400x5000x400 grid of P-T values which need to be solved. Creating 800,000,000 objects that need to be garbage collected is not ideal from a performance perspective.

@3x380V
Copy link

3x380V commented Oct 23, 2024

@cfcohen, could you rebase this PR?
@jjgomera, does python2 support argument still stand? (seems so based on recent commit in master branch)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants