
Use of models beyond the limits for which they were validated or intended #433

Open
adriesse opened this issue Feb 20, 2018 · 18 comments

@adriesse
Member

It occurs to me that it could be an interesting feature of PVLIB to produce a certain class of warnings whenever input parameters fall outside the range for which models were developed or validated. One possible example: a latitude range in the case of tilted irradiance models. Thoughts?

@cwhanse
Member

cwhanse commented Feb 20, 2018

A good idea, but determining the conditions for validation is probably not practical. It's not always clear what filtering was done on the data. Also, some models were developed using hourly data, for example; would we warn if they were applied to minute data?

@wholmgren
Member

I agree that could be interesting but challenging. The first step would probably be to add more and better documentation of model limitations to pvlib. Perhaps put these in a special box in the rendered documentation.

@markcampanelli
Contributor

Wouldn’t it be nice to better know what parts of pvlib are used most? That’s a can of worms too, I suppose.

@adriesse
Member Author

Good ideas! I like the warning for mismatch on the time scale too.

Yes, it would be challenging and could probably never be done everywhere, and documentation could come first. But what would speak against doing it for a few easy-ish cases, as examples for future coders to follow? The time-scale check could be appropriate for detect_clearsky(), for example, where the limits are already in the documentation.

I must admit I don't know quite how to do this. Can you create warning classes similar to exception classes or something?

@cwhanse
Member

cwhanse commented Feb 22, 2018 via email

@wholmgren
Member

Can you create warning classes similar to exception classes or something?

See PVLibDeprecationWarning in the examples in #427 (comment)

The gist of the warnings in #427 is

import warnings

class PVLibDeprecationWarning(UserWarning):
    pass

message = complicated_stuff_copied_from_matplotlib()
warnings.warn(message, PVLibDeprecationWarning)

You could imagine a pvlibModelValidationWarning defined and used in this simple way. The complicated stuff in #427 is to autofill information from the decorated function, but we'd skip that for this feature (at least initially).

I think this strategy would be much better than print or logging because users that understand the risk can easily silence the warnings using the python warning machinery.

We'd also want to add tests to ensure that the warnings are issued and not issued as appropriate.
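For illustration, a minimal sketch of how that could look, including tests; the function, the latitude limit, and the exact class spelling are invented here (following the PVLibDeprecationWarning naming), not actual pvlib API:

import warnings

import numpy as np
import pytest


class PVLibModelValidationWarning(UserWarning):
    """Hypothetical warning class for inputs outside a model's validated range."""
    pass


def hypothetical_transposition_model(surface_tilt, latitude):
    # The 60 degree latitude bound is made up purely for this sketch.
    if np.any(np.abs(latitude) > 60):
        warnings.warn('latitude is outside the range this model was validated for',
                      PVLibModelValidationWarning)
    return surface_tilt  # placeholder for the actual calculation


def test_warns_outside_validated_range():
    with pytest.warns(PVLibModelValidationWarning):
        hypothetical_transposition_model(20, latitude=75)


def test_silent_inside_validated_range(recwarn):
    hypothetical_transposition_model(20, latitude=35)
    assert not any(isinstance(w.message, PVLibModelValidationWarning)
                   for w in recwarn.list)

Users who understand the risk could then silence just this class with warnings.simplefilter('ignore', PVLibModelValidationWarning).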

@mikofski
Member

mikofski commented Oct 31, 2019

TL;DR: maybe this could work in a data-model checker/validator (skip to the very bottom), and there are already several packages that do this kind of validation (see about halfway down), but I would prefer to keep pvlib as a fast, bare-metal library with stern warnings in the documentation.

random rambles

In principle I like the idea, but I have my concerns as well:

  1. My first concern is how all these warnings and checks would affect performance. This may be one of the arguments for duck typing and EAFP vs. LBYL. I think it's possible to implement a few warnings if they use the try/except format, which should in theory have minimal impact.

    In practice, trying to write this makes me think it needs some type of validation decorator that raises the exception. Something like:

import functools


def validator(a, b):
    """Decorator factory: only call the wrapped function if a <= x <= b."""
    def innerfunc(f):
        @functools.wraps(f)  # preserve the wrapped function's name and docstring
        def wrapper(x):
            if a <= x <= b:
                return f(x)
            else:
                raise Exception('Validation error')
        return wrapper
    return innerfunc


@validator(2, 4)
def g(x):
    return x/2


>>> g(3)
1.5

>>> g(1)
Traceback (most recent call last):
  ...
Exception: Validation error

>>> g(5)
Traceback (most recent call last):
  ...
Exception: Validation error

So my concern about performance was reinforced: I don't see how to check the limits on x without if statements and condition tests, which take time. For a time series 100k steps long this will add overhead, especially if every function has this level of checking.
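To make that concern concrete, here is a rough sketch (with invented limits, assuming numpy array inputs) of the kind of per-argument bounds check that each function would have to run on every call:

import numpy as np


def check_range(x, lower, upper, name='x'):
    # One vectorized comparison per checked argument; the cost scales with
    # the length of the series and with how many arguments get checked.
    x = np.asarray(x)
    out_of_range = (x < lower) | (x > upper)
    if out_of_range.any():
        raise ValueError('{}: {} of {} values outside [{}, {}]'.format(
            name, out_of_range.sum(), x.size, lower, upper))
    return x


# e.g. a 100,000-step zenith series checked against made-up limits
zenith = np.random.uniform(0, 90, 100_000)
check_range(zenith, 0, 180, name='zenith')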

Trying to write the decorator also made me think that there's probably already something out there like this: a quick Google search for "python" + "validator" turned up a few existing validation packages, and there may be more.

  2. My second concern is readability. I sometimes find it tricky to wrap code in try/except; although it is easier to read than if/else, it needs to be thought out well and explained with lots of comments, which then look like documentation, which in turn makes me think that the warning should just go there, in the documentation.

So I don't know what's best. I appreciate the concern, but I'm not sure how it can be implemented in a way that addresses my concerns. I guess I prefer the idea of stressing input limits in the documentation.

I also talked to some colleagues, and they suggested that perhaps there could be either an optional or separate application for folks who wanted this type of rigorous checking. It occurred to me that validators are typically used for UIs, whereas pvlib is more of a bare metal library.

Data Model Checker

One other idea, perhaps spurred by my colleagues' comments: if we implement a data model as suggested in the pvlib roadmap user needs and v1.0 milestone, then a user could optionally run the data model through a rigorous checker/validator.

Basically,

  1. a user would create a data model, perhaps in JSON or YAML,
  2. then they could optionally pass it through the checker (a rough sketch follows below)
  3. or just pass it straight into the model chain
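A rough sketch of that flow, assuming a JSON data model and using the third-party jsonschema package as one possible off-the-shelf checker (the field names and limits are invented):

import json

import jsonschema  # third-party; one possible checker among several


# 1. a hypothetical system data model, as the user might write it
system_json = '{"surface_tilt": 30, "surface_azimuth": 180, "latitude": 35.0}'

# 2. the optional rigorous check against a schema encoding validated ranges
schema = {
    'type': 'object',
    'properties': {
        'surface_tilt': {'type': 'number', 'minimum': 0, 'maximum': 90},
        'surface_azimuth': {'type': 'number', 'minimum': 0, 'maximum': 360},
        'latitude': {'type': 'number', 'minimum': -90, 'maximum': 90},
    },
    'required': ['surface_tilt', 'surface_azimuth', 'latitude'],
}

system = json.loads(system_json)
jsonschema.validate(system, schema)  # raises ValidationError if out of range

# 3. pass `system` straight on to the model chain, unchanged

That would keep the strict checking outside the core functions, so the library itself stays lean.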

@adriesse
Member Author

Thanks for pointing out the new roadmap--I hadn't noticed it before.

A small example from numpy, for what it's worth:

In[1]: import numpy as np
In[2]: np.arccos([0,1,2])
__main__:1: RuntimeWarning: invalid value encountered in arccos
Out[2]: array([1.57079633, 0.        ,        nan])

@wholmgren
Member

I agree with Mark that decorators could be a good way to approach this. I agree with Anton that something like the simple numpy RuntimeWarning could be a good way to alert the user.

In any case, adding validation code will also require adding clear documentation about what's expected. So I still think improving documentation is the fastest way to make progress on this topic.

Historical context... The circa 2014 PVSC version of pvlib python had some validation code that was sort of ported from Matlab. Check out the prearizona tag to browse the repo at that point in time. I removed it because I didn't like the pattern (not "pythonic"), I didn't understand some of the code, some of the tests were overly restrictive, and I wanted NaNs instead of exceptions. It's clear that Rob put a lot of effort into that code. My big concern is that we're going to repeat the situation: someone puts a lot of effort into this hard problem and the solution is not accepted.

@cwhanse
Member

cwhanse commented Nov 1, 2019

@wholmgren you are referring to code like this?

Expect = {'SurfTilt': ('num', 'x>=0'),
          'SunZen': ('x>=-180'),
          'DHI': ('x>=0'),
          'GHI': ('x>=0')
          }

The Matlab PVLIB has input checking in each function. I couldn't say we had a specific reason for doing that; likely we just thought it would be helpful. It looks like that checking was carried forward in the initial port.
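For illustration only, one hypothetical way such an Expect table could drive input checks; this is a sketch, not the original Matlab or prearizona implementation:

def check_inputs(expect, **kwargs):
    # Each entry maps an argument name to one or more constraint strings
    # such as 'x>=0'; 'num' entries (type checks) are skipped in this sketch.
    for name, constraints in expect.items():
        if isinstance(constraints, str):
            constraints = (constraints,)
        for constraint in constraints:
            if constraint == 'num':
                continue
            if not eval(constraint, {'x': kwargs[name]}):
                raise ValueError('{} failed constraint {!r}'.format(name, constraint))


Expect = {'SurfTilt': ('num', 'x>=0'),
          'SunZen': ('x>=-180'),
          'DHI': ('x>=0'),
          'GHI': ('x>=0')}

check_inputs(Expect, SurfTilt=30, SunZen=45, DHI=100, GHI=500)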

+1 for @wholmgren's suggestion to improve documentation first.
+1 for @mikofski's concerns about maintaining readability.

The data model checker is a good idea, and could be added when we implement the data model. Just to clarify in case my choice of "data model" isn't clear, the development plan envisions something like a UML diagram that guides definitions of classes and methods. An instance of the system data model could be recorded using JSON or YAML.

And to @adriesse's point, we haven't called attention to the development plan, which was posted just a few weeks ago. I intend to do that once v0.7 is released.

@wholmgren
Member

@cwhanse yes that's the pattern I'm referring to.

@mikofski
Member

mikofski commented Nov 1, 2019

Maybe this is a question that could be asked in a user survey?

@adriesse
Member Author

adriesse commented Nov 2, 2019

Is the Development Roadmap complete?

@cwhanse
Member

cwhanse commented Nov 4, 2019

Is the Development Roadmap complete?

Not sure what you are asking. It includes all objectives that were discussed, I believe. We can consider more or different objectives for each milestone. I think it's important to have a plan of this nature.

@adriesse
Member Author

adriesse commented Nov 4, 2019

I am curious about the roadmap development process. I missed the discussion that you refer to. (Or at least I can't recall it at the moment.)

@cwhanse
Member

cwhanse commented Nov 4, 2019

I am curious about the roadmap development process. I missed the discussion that you refer to. (Or at least I can't recall it at the moment.)

That discussion is in an email chain, which resulted in the document posted to the wiki. The roadmap isn't complete; once v0.7 is released we can post an issue for v0.8, call attention to it, and invite comments.

@wholmgren
Member

Maybe we should go ahead and start one issue for the roadmap and a second issue for finalizing 0.7, so that we don't pollute this thread.

@markcampanelli
Contributor

markcampanelli commented Nov 6, 2019

I’ll throw another design pattern into the ring from my Spectral Mismatch Correction Factor, M, calculator. The pattern is “functional” for the “type-hinted” computations, but uses classes to wrap the function inputs to ensure that they are valid (via __init__() checks during the data object construction). In particular, the M computation requires the integration of products of certain non-negative functions with strictly positive domains represented by data, and I ensure that the provided data validly represents such functions using a subclassing data structure starting here:

https://github.com/markcampanelli/pvfit/blob/master/pvfit/measurement/spectral_correction/api.py#L13

In particular, this makes the function calls clean/lightweight, because I can very reasonably assume that the inputs are correct so long as the suggested input classes have been used as per the type hints. For example, if my algorithm requires ordered domain data for efficiency, then I accept unordered domain and range data in the data class constructor (for user convenience), but also order it during object construction to ensure that it works in the function calls for which it is intended. An exception is thrown during object construction when invalidity is detected.
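A stripped-down sketch of that pattern (the class, field names, and checks here are invented, not the pvfit API): validate and sort once in __init__, then let downstream functions rely on the type-hinted contract.

import numpy as np


class SpectralData:
    """Hypothetical wrapper: functions that accept a SpectralData can assume
    strictly positive, sorted wavelengths and non-negative values."""

    def __init__(self, wavelength, value):
        wavelength = np.asarray(wavelength, dtype=float)
        value = np.asarray(value, dtype=float)
        if wavelength.shape != value.shape:
            raise ValueError('wavelength and value must have the same shape')
        if np.any(wavelength <= 0):
            raise ValueError('wavelengths must be strictly positive')
        if np.any(value < 0):
            raise ValueError('values must be non-negative')
        order = np.argsort(wavelength)  # accept unordered input, sort once
        self.wavelength = wavelength[order]
        self.value = value[order]


def integrate(data: SpectralData) -> float:
    # No re-checking needed here; the constructor already enforced validity.
    dx = np.diff(data.wavelength)
    return float(np.sum(dx * (data.value[:-1] + data.value[1:]) / 2))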
