compute_rpss is not working on generated lightning data #5
Comments
Is this an issue only for the `compute_rpss` function? At the moment, the fix could simply be changing this hardcoded value. Standardizing the IO by enforcing attributes and checking routines would be a good upgrade to the entire code base, and I do not think it would be much of a hassle to implement. That said, perhaps this can go into the next version? Do you agree?
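For context, the failure mode is easy to reproduce: any xarray reduction that names a dimension the data does not actually have fails immediately. A minimal sketch with synthetic data (not the package's own test set):

```python
import numpy as np
import xarray as xr

# Data with a real-world dimension name, not the default "dim_0"
da = xr.DataArray(np.random.rand(4), dims=["time"])

# Reducing along a hardcoded "dim_0" fails because that dimension does not exist
try:
    da.cumsum("dim_0")
except ValueError as e:
    print("raised:", e)

# Reducing along the actual dimension name works
print(da.cumsum("time").dims)  # ('time',)
```

Note that xarray only auto-names dimensions `dim_0`, `dim_1`, … when a DataArray is built without explicit dims, which is why the hardcoded name happened to work on the generated test data but not on real files.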
Did a major fix to `compute_rpss`:

```python
def compute_rpss(self, threshold, dim=None):
    """
    Compute the Ranked Probability Skill Score (RPSS) for a given threshold.

    Args:
        threshold (float): The threshold value for binary classification.
        dim (str, list, or None): The dimension(s) along which to compute the RPSS.
            If None, compute the RPSS over the entire data.

    Returns:
        xarray.DataArray: The computed RPSS values.
    """
    # Convert data to binary based on the threshold
    obs_binary = (self.obs_data >= threshold).astype(int)
    model_binary = (self.model_data >= threshold).astype(int)

    # Calculate the RPS for the model data
    rps_model = ((model_binary.cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)

    # Calculate the RPS for the climatology (base rate)
    base_rate = obs_binary.mean(dim=dim)
    rps_climo = ((xr.full_like(model_binary, 0).cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
    rps_climo = rps_climo + base_rate * (1 - base_rate)

    # Calculate the RPSS
    rpss = 1 - rps_model / rps_climo
    return rpss
```

**In-depth explanation of what this means (for new users):** the updated `compute_rpss` takes a `dim` argument instead of assuming a hardcoded `dim_0` dimension. In the context of xarray, a scalar value is a single value that does not depend on any dimension: it is 0-dimensional. A non-scalar value, on the other hand, is a DataArray that depends on one or more dimensions and has corresponding coordinates. To illustrate the difference, suppose we have a dataset with dimensions "time", "lat", and "lon", containing a variable "temperature" with coordinates for each dimension.
In the updated function, the climatological RPS is computed by these lines:

```python
rps_climo = ((xr.full_like(model_binary, 0).cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
rps_climo = rps_climo + base_rate * (1 - base_rate)
```

If `dim` is None, all of these reductions run over every dimension and the results are scalars; if `dim` names one or more dimensions, the results keep the remaining dimensions and their coordinates. Now, whether this will work with data of different coordinates: the updated function computes along whatever dimension names are passed in, so it no longer depends on a `dim_0` dimension existing in the input. However, it's important to note that if the coordinates of the observation and model data do not match, xarray will align them automatically during arithmetic, and non-overlapping coordinates become NaN. In summary, the updated `compute_rpss` should work with arbitrary dimension names, provided the observation and model data share the same grid.
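To sanity-check that the rewritten logic is dimension-agnostic, here is a hedged, standalone sketch of the same computation (the method body extracted into a plain function, run on synthetic data; the names `obs` and `model` are illustrative):

```python
import numpy as np
import xarray as xr

def compute_rpss(obs_data, model_data, threshold, dim=None):
    # Standalone copy of the method above, for demonstration only
    obs_binary = (obs_data >= threshold).astype(int)
    model_binary = (model_data >= threshold).astype(int)
    rps_model = ((model_binary.cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
    base_rate = obs_binary.mean(dim=dim)
    rps_climo = ((xr.full_like(model_binary, 0).cumsum(dim) - obs_binary.cumsum(dim)) ** 2).mean(dim=dim)
    rps_climo = rps_climo + base_rate * (1 - base_rate)
    return 1 - rps_model / rps_climo

rng = np.random.default_rng(0)
dims = ("time", "lat", "lon")
obs = xr.DataArray(rng.random((8, 3, 4)), dims=dims)
model = xr.DataArray(rng.random((8, 3, 4)), dims=dims)

# Reducing over "time" leaves a 2-D (lat, lon) map -- no "dim_0" required
rpss_map = compute_rpss(obs, model, threshold=0.5, dim="time")
print(rpss_map.dims)  # ('lat', 'lon')

# dim=None reduces over everything, giving a 0-D (scalar) result
rpss_scalar = compute_rpss(obs, model, threshold=0.5, dim=None)
print(rpss_scalar.ndim)  # 0
```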
@harry9713 I have created a new branch. Can you test this and let me know if the bug is gone? Then we can close this, with the "IO" changes going in as a major enhancement in the next major update or so.
@Debasish-Mahapatra it does fix the runtime error for now, so we can merge this. I will check the rest of the methods as well. However, I did not get a realistic result: the sample data give NaN values every time. We will have to release a new version since this is a major fix.
@harry9713 I will merge this with main. Were you testing the code with the added lightning data? It might be giving NaN values because of the threshold; did you check that? It might also be a good idea to check with some "real case data" before releasing the next version, to see whether we get realistic results. I am still keeping this issue open; we can close it depending on how the tests with ERA data go.
I have checked with different thresholds, but it resulted in the same. It would be nice if we had real cases to test with.
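One plausible source of the NaNs (an assumption from reading the formula, not confirmed against the actual data): if the threshold is never exceeded in the observations, the base rate is 0, the climatological RPS collapses to 0, and the final `1 - rps_model / rps_climo` divides 0 by 0. A minimal sketch:

```python
import numpy as np
import xarray as xr

obs = xr.DataArray(np.zeros(5), dims=["time"])    # threshold never exceeded
model = xr.DataArray(np.zeros(5), dims=["time"])

obs_binary = (obs >= 1.0).astype(int)             # all zeros
model_binary = (model >= 1.0).astype(int)         # all zeros

rps_model = ((model_binary.cumsum("time") - obs_binary.cumsum("time")) ** 2).mean("time")
base_rate = obs_binary.mean("time")               # 0.0
rps_climo = ((xr.full_like(model_binary, 0).cumsum("time") - obs_binary.cumsum("time")) ** 2).mean("time")
rps_climo = rps_climo + base_rate * (1 - base_rate)   # still 0.0

rpss = 1 - rps_model / rps_climo                  # 1 - 0/0 -> NaN
print(float(rpss))
```

If this is what is happening, lowering the threshold relative to the generated data's range (or guarding against a zero climatological RPS) would be worth trying.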
In the new branch, I have added some plots in nwpeval/examples... check the following -> metrics_time_avg.png and metrics_diurnal.png. These are from the generated "test" lightning data. Can you take a look at this?

```python
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import nwpeval as nw

# Load observation and model data
obs_data = xr.open_dataset("Era_rain.nc")
model_data = xr.open_dataset("Era_rain.nc")

# Extract precipitation variables
obs_tp = obs_data["tp"] * 10000  # Convert from m to mm
model_tp = model_data["tp"] * 1000  # Convert from m to mm

# Create an instance of the NWP_Stats class
metrics_obj = nw.NWP_Stats(obs_tp, model_tp)

# Define the thresholds for metric calculations
thresholds = {
    'RPSS': 1,
}

# Calculate time-averaged metrics
metrics_time_avg = {}
for metric, threshold in thresholds.items():
    metrics_time_avg[metric] = metrics_obj.compute_metrics(
        [metric], thresholds={metric: threshold}, dim="time")[metric]

# Calculate area-averaged diurnal cycle metrics
metrics_diurnal = {}
for metric, threshold in thresholds.items():
    metrics_diurnal[metric] = metrics_obj.compute_metrics(
        [metric], thresholds={metric: threshold},
        dim=["latitude", "longitude"])[metric].groupby("time.hour").mean()

# Plot time-averaged metrics
for metric in thresholds.keys():
    plt.figure(figsize=(9, 6))
    metrics_time_avg[metric].plot(cmap="coolwarm", vmin=-1, vmax=1)
    plt.title(f"Time-Averaged {metric}")
    plt.xlabel("Longitude")
    plt.ylabel("Latitude")
    plt.tight_layout()
    plt.savefig(f"metrics_time_avg_{metric}.png")
    plt.show()

# Plot area-averaged diurnal cycle metrics
for metric in thresholds.keys():
    plt.figure(figsize=(9, 6))
    metrics_diurnal[metric].plot()
    plt.title(f"Area-Average Diurnal Cycle {metric}")
    plt.xlabel("Hour")
    plt.ylabel(metric)
    plt.grid(True)
    plt.tight_layout()
    plt.savefig(f"metrics_diurnal_{metric}.png")
    plt.show()
```

I have done this with the attached data.
Definitely. I will check with the provided data later today. I think the conversion from m to mm is wrong for the observations, since it should be 1000 instead of 10000. Please check whether it is the same on your side as well. Other than that, it seems to be working.
Oops..! My bad, I should have mentioned this earlier. As I was too lazy to download two data sets, match the grids, and go through all the pre-processing steps, I downloaded just one data set from ERA5 and multiplied in these "factors" (10000 and 1000) so that there is some variability when you calculate something. 🤣🤣🤣 And then I forgot to change the comments. Not a bug! It was intentional...😂

```python
# Extract precipitation variables
obs_tp = obs_data["tp"] * 10000  # offset to create data variability
model_tp = model_data["tp"] * 1000  # offset to create data variability
```
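As a side note, if the goal is just variability between the two copies, an alternative to scaling factors is adding reproducible noise to one copy, which keeps the units honest. A hypothetical sketch (a random stand-in field here instead of the real `tp` variable):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(42)

# Stand-in for the single downloaded field (same grid for obs and model)
tp = xr.DataArray(rng.random((4, 3, 3)), dims=("time", "latitude", "longitude"))

obs_tp = tp * 1000  # m -> mm, correct conversion for both copies
# Perturb the model copy with ~10%-of-spread Gaussian noise
model_tp = obs_tp + rng.normal(0, 0.1 * float(obs_tp.std()), obs_tp.shape)

print(bool((obs_tp != model_tp).any()))  # True
```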
I tried the given examples, and the function that calculates RPSS gives errors about the absence of `dim_0`.
version: 1.5.0
python: 3.10.2
code:
Error:
When I tracked the error, it came from the `compute_rpss` function, which expects `dim_0` as a native dimension of the input. I am not sure this will be the case for every dataset that we provide.

Possible solution:
Standardise the output and input structure of data by enforcing attributes and checking routines.
Please let me know what you think.