
Use of models beyond the limits for which they were validated or intended #433

Open
adriesse opened this issue Feb 20, 2018 · 18 comments

@adriesse
Member

It occurs to me that it could be an interesting feature of PVLIB to produce a certain class of warnings whenever input parameters fall outside the range for which models were developed or validated. One possible example: a latitude range in the case of tilted irradiance models. Thoughts?

@cwhanse
Member

cwhanse commented Feb 20, 2018

A good idea, but determining the conditions for validation is probably not practical. It's not always clear what filtering was done on the data. Also, some models were developed using hourly data, for example; would we warn if they were applied to minute data?

@wholmgren
Member

I agree that could be interesting but challenging. The first step would probably be to add more and better documentation of model limitations to pvlib. Perhaps put these in a special box in the rendered documentation.

@markcampanelli
Contributor

Wouldn’t it be nice to better know what parts of pvlib are used most? That’s a can of worms too, I suppose.

@adriesse
Member Author

Good ideas! I like the warning for mismatch on the time scale too.

Yes, it would be challenging and could probably never be done everywhere, and documentation could come first. But what would speak against doing it for a few easy-ish cases, as examples for future coders to follow? The time-scale check could be appropriate for detect_clearsky(), for example, where the limits are already in the documentation.

I must admit I don't know quite how to do this. Can you create warning classes similar to exception classes or something?

@cwhanse
Member

cwhanse commented Feb 22, 2018 via email

@wholmgren
Member

Can you create warning classes similar to exception classes or something?

See PVLibDeprecationWarning in the examples in #427 (comment)

The gist of the warnings in #427 is

import warnings

class PVLibDeprecationWarning(UserWarning):
    pass

message = complicated_stuff_copied_from_matplotlib()
warnings.warn(message, PVLibDeprecationWarning)

You could imagine a pvlibModelValidationWarning defined and used in this simple way. The complicated stuff in #427 is to autofill information from the decorated function, but we'd skip that for this feature (at least initially).

I think this strategy would be much better than print or logging because users that understand the risk can easily silence the warnings using the python warning machinery.

We'd also want to add tests to ensure that the warnings are issued and not issued as appropriate.
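For illustration, a minimal sketch of how that could look, including tests; the function, the latitude limit, and the exact class spelling are invented here (following the PVLibDeprecationWarning naming), not actual pvlib API:

import warnings

import numpy as np
import pytest


class PVLibModelValidationWarning(UserWarning):
    """Hypothetical warning class for inputs outside a model's validated range."""
    pass


def hypothetical_transposition_model(surface_tilt, latitude):
    # The 60 degree latitude bound is made up purely for this sketch.
    if np.any(np.abs(latitude) > 60):
        warnings.warn('latitude is outside the range this model was validated for',
                      PVLibModelValidationWarning)
    return surface_tilt  # placeholder for the actual calculation


def test_warns_outside_validated_range():
    with pytest.warns(PVLibModelValidationWarning):
        hypothetical_transposition_model(20, latitude=75)


def test_silent_inside_validated_range(recwarn):
    hypothetical_transposition_model(20, latitude=35)
    assert not any(isinstance(w.message, PVLibModelValidationWarning)
                   for w in recwarn.list)

Users who understand the risk could then silence just this class with warnings.simplefilter('ignore', PVLibModelValidationWarning).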

@mikofski
Member

mikofski commented Oct 31, 2019

TL;DR: maybe this could work in a data-model checker/validator (skip to the very bottom), and there are already several packages that do this kind of validation (see about halfway down), but I would prefer to keep pvlib as a fast, bare-metal library with stern warnings in the documentation.

random rambles

In principle I like the idea, but I have my concerns as well:

  1. My first concern is how all these warnings and checks would affect performance. This may be one of the arguments for duck typing and EAFP vs. LBYL. I think it's possible to implement a few warnings if they use the try/except format, which should in theory have minimal impact.

    In practice, trying to write this makes me think it needs some type of validation decorator that raises the exception. Something like:

import functools


def validator(a, b):
    """Decorator factory: only call the wrapped function if a <= x <= b."""
    def innerfunc(f):
        @functools.wraps(f)  # preserve the wrapped function's name and docstring
        def wrapper(x):
            if a <= x <= b:
                return f(x)
            else:
                raise Exception('Validation error')
        return wrapper
    return innerfunc


@validator(2, 4)
def g(x):
    return x/2


>>> g(3)
1.5

>>> g(1)
Traceback (most recent call last):
  ...
Exception: Validation error

>>> g(5)
Traceback (most recent call last):
  ...
Exception: Validation error

So my concern about performance was reinforced: I don't see how to check the limits on x without if statements and condition tests, which take time. For a time series 100k steps long this will add overhead, especially if every function has this level of checking.
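To make that concern concrete, here is a rough sketch (with invented limits, assuming numpy array inputs) of the kind of per-argument bounds check that each function would have to run on every call:

import numpy as np


def check_range(x, lower, upper, name='x'):
    # One vectorized comparison per checked argument; the cost scales with
    # the length of the series and with how many arguments get checked.
    x = np.asarray(x)
    out_of_range = (x < lower) | (x > upper)
    if out_of_range.any():
        raise ValueError('{}: {} of {} values outside [{}, {}]'.format(
            name, out_of_range.sum(), x.size, lower, upper))
    return x


# e.g. a 100,000-step zenith series checked against made-up limits
zenith = np.random.uniform(0, 90, 100_000)
check_range(zenith, 0, 180, name='zenith')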

Trying to write the decorator also made me think that there's probably already something out there like this: a quick Google search for "python" + "validator" turned up a few existing validation packages, and there may be more.

  2. My second concern is readability. I sometimes find it tricky to wrap code in try/except; although it is easier to read than if/else, it needs to be thought out well and explained with lots of comments, which then look like documentation, which in turn makes me think that the warning should just go there, in the documentation.

So I don't know what's best. I appreciate the concern, but I'm not sure how it can be implemented in a way that addresses my concerns. I guess I prefer the idea of stressing input limits in the documentation.

I also talked to some colleagues, and they suggested that perhaps there could be either an optional or separate application for folks who wanted this type of rigorous checking. It occurred to me that validators are typically used for UIs, whereas pvlib is more of a bare metal library.

Data Model Checker

One other idea, perhaps spurred by my colleagues' comments: if we implement a data model as suggested in the pvlib roadmap user needs and v1.0 milestone, then a user could optionally run the data model through a rigorous checker/validator.

Basically,

  1. a user would create a data model, perhaps in JSON or YAML,
  2. then they could optionally pass it through the checker (a rough sketch follows below)
  3. or just pass it straight into the model chain
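A rough sketch of that flow, assuming a JSON data model and using the third-party jsonschema package as one possible off-the-shelf checker (the field names and limits are invented):

import json

import jsonschema  # third-party; one possible checker among several


# 1. a hypothetical system data model, as the user might write it
system_json = '{"surface_tilt": 30, "surface_azimuth": 180, "latitude": 35.0}'

# 2. the optional rigorous check against a schema encoding validated ranges
schema = {
    'type': 'object',
    'properties': {
        'surface_tilt': {'type': 'number', 'minimum': 0, 'maximum': 90},
        'surface_azimuth': {'type': 'number', 'minimum': 0, 'maximum': 360},
        'latitude': {'type': 'number', 'minimum': -90, 'maximum': 90},
    },
    'required': ['surface_tilt', 'surface_azimuth', 'latitude'],
}

system = json.loads(system_json)
jsonschema.validate(system, schema)  # raises ValidationError if out of range

# 3. pass `system` straight on to the model chain, unchanged

That would keep the strict checking outside the core functions, so the library itself stays lean.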

@adriesse
Member Author

Thanks for pointing out the new roadmap--I hadn't noticed it before.

A small example from numpy, for what it's worth:

In[1]: import numpy as np
In[2]: np.arccos([0,1,2])
__main__:1: RuntimeWarning: invalid value encountered in arccos
Out[2]: array([1.57079633, 0.        ,        nan])

@wholmgren
Member

I agree with Mark that decorators could be a good way to approach this. I agree with Anton that something like the simple numpy RuntimeWarning could be a good way to alert the user.

In any case, adding validation code will also require adding clear documentation about what's expected. So I still think improving documentation is the fastest way to make progress on this topic.

Historical context... The circa 2014 PVSC version of pvlib python had some validation code that was sort of ported from Matlab. Check out the prearizona tag to browse the repo at that point in time. I removed it because I didn't like the pattern (not "pythonic"), I didn't understand some of the code, some of the tests were overly restrictive, and I wanted NaNs instead of exceptions. It's clear that Rob put a lot of effort into that code. My big concern is that we're going to repeat the situation: someone puts a lot of effort into this hard problem and the solution is not accepted.

@cwhanse
Member

cwhanse commented Nov 1, 2019

@wholmgren you are referring to code like this?

Expect = {'SurfTilt': ('num', 'x>=0'),
          'SunZen': ('x>=-180'),
          'DHI': ('x>=0'),
          'GHI': ('x>=0')
          }

The Matlab PVLIB has input checking in each function. I couldn't say we had a specific reason for doing that; likely we just thought it would be helpful. It looks like that checking was carried forward in the initial port.
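For illustration only, one hypothetical way such an Expect table could drive input checks; this is a sketch, not the original Matlab or prearizona implementation:

def check_inputs(expect, **kwargs):
    # Each entry maps an argument name to one or more constraint strings
    # such as 'x>=0'; 'num' entries (type checks) are skipped in this sketch.
    for name, constraints in expect.items():
        if isinstance(constraints, str):
            constraints = (constraints,)
        for constraint in constraints:
            if constraint == 'num':
                continue
            if not eval(constraint, {'x': kwargs[name]}):
                raise ValueError('{} failed constraint {!r}'.format(name, constraint))


Expect = {'SurfTilt': ('num', 'x>=0'),
          'SunZen': ('x>=-180'),
          'DHI': ('x>=0'),
          'GHI': ('x>=0')}

check_inputs(Expect, SurfTilt=30, SunZen=45, DHI=100, GHI=500)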

+1 for @wholmgren's suggestion to improve documentation first.
+1 for @mikofski's concerns about maintaining readability.

The data model checker is a good idea, and could be added when we implement the data model. Just to clarify in case my choice of "data model" isn't clear, the development plan envisions something like a UML diagram that guides definitions of classes and methods. An instance of the system data model could be recorded using JSON or YAML.

And to @adriesse's point, we haven't called attention to the development plan, which was posted just a few weeks ago. I intend to do that once v0.7 is released.

@wholmgren
Member

@cwhanse yes that's the pattern I'm referring to.

@mikofski
Member

mikofski commented Nov 1, 2019

Maybe this is a question that could be asked in a user survey?

@adriesse
Member Author

adriesse commented Nov 2, 2019

Is the Development Roadmap complete?

@cwhanse
Member

cwhanse commented Nov 4, 2019

Is the Development Roadmap complete?

Not sure what you are asking. It includes all objectives that were discussed, I believe. We can consider more or different objectives for each milestone. I think it's important to have a plan of this nature.

@adriesse
Member Author

adriesse commented Nov 4, 2019

I am curious about the roadmap development process. I missed the discussion that you refer to. (Or at least I can't recall it at the moment.)

@cwhanse
Member

cwhanse commented Nov 4, 2019

I am curious about the roadmap development process. I missed the discussion that you refer to. (Or at least I can't recall it at the moment.)

That discussion is in an email chain, which resulted in the document posted to the wiki. The roadmap isn't complete; once v0.7 is released we can post an issue for v0.8, call attention to it, and invite comments.

@wholmgren
Member

Maybe we should go ahead and start one issue for the roadmap and a second issue for finalizing 0.7, so that we don't pollute this thread.

@markcampanelli
Contributor

markcampanelli commented Nov 6, 2019

I’ll throw another design pattern into the ring from my Spectral Mismatch Correction Factor, M, calculator. The pattern is “functional” for the “type-hinted” computations, but uses classes to wrap the function inputs to ensure that they are valid (via __init__() checks during the data object construction). In particular, the M computation requires the integration of products of certain non-negative functions with strictly positive domains represented by data, and I ensure that the provided data validly represents such functions using a subclassing data structure starting here:

https://github.com/markcampanelli/pvfit/blob/master/pvfit/measurement/spectral_correction/api.py#L13

In particular, this makes the function calls clean/lightweight, because I can very reasonably assume that the inputs are correct so long as the suggested input classes have been used as per the type hints. For example, if my algorithm requires ordered domain data for efficiency, then I accept unordered domain and range data in the data class constructor (for user convenience), but also order it during object construction to ensure that it works in the function calls for which it is intended. An exception is thrown during object construction when invalidity is detected.
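A stripped-down sketch of that pattern (the class, field names, and checks here are invented, not the pvfit API): validate and sort once in __init__, then let downstream functions rely on the type-hinted contract.

import numpy as np


class SpectralData:
    """Hypothetical wrapper: functions that accept a SpectralData can assume
    strictly positive, sorted wavelengths and non-negative values."""

    def __init__(self, wavelength, value):
        wavelength = np.asarray(wavelength, dtype=float)
        value = np.asarray(value, dtype=float)
        if wavelength.shape != value.shape:
            raise ValueError('wavelength and value must have the same shape')
        if np.any(wavelength <= 0):
            raise ValueError('wavelengths must be strictly positive')
        if np.any(value < 0):
            raise ValueError('values must be non-negative')
        order = np.argsort(wavelength)  # accept unordered input, sort once
        self.wavelength = wavelength[order]
        self.value = value[order]


def integrate(data: SpectralData) -> float:
    # No re-checking needed here; the constructor already enforced validity.
    dx = np.diff(data.wavelength)
    return float(np.sum(dx * (data.value[:-1] + data.value[1:]) / 2))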
