Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented dbscan for RGDR #57

Merged
merged 10 commits into from
Aug 3, 2022
Merged

Implemented dbscan for RGDR #57

merged 10 commits into from
Aug 3, 2022

Conversation

BSchilperoort
Copy link
Contributor

@BSchilperoort BSchilperoort commented Jul 27, 2022

Implements dbscan clustering functionality. The code included in here is sufficient to reproduce the prototype_RGDR notebook.
See the wip notebook


From @geek-yang:

  • Implement DBSCAN clustering function
  • Check if the longitude is between -180 : 180 and convert to this range if not
    not required, haversine uses radians.
  • Handle input with lags dimension to vectorize DBSCAN clustering and keep labels
    --> currently supports n_targets == 1, see Supporting multiple target intervals in map_correlation. #56
  • Handle clustering for +/- correlation separately
  • Mask data ( [lat, lon] pairs) using significance (p_value)

When these methods are implemented, we also need to validate the results:

  • Check if it is necessary to have weight by area for DBSCAN calculation (the impact of area is already reflected by the haversine distance)
    weighing by area is required for reducing clusters to the mean of each cluster. This is implemented
  • Add relevant checks for the results and raise warnings (e.g. check whether the results are noise or not).
  • Verify DBSCAN clustering results

Note: For RGDR we will want to support grouping the DBSCAN operations over splits, due to the computational intensity when working with full resolution data, see #58

This PR fixes #55


Still to be implemented, but out of scope for this PR:

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@BSchilperoort BSchilperoort requested a review from Peter9192 August 2, 2022 15:17
@BSchilperoort
Copy link
Contributor Author

Note that the Python 3.7 Windows build fails. This is due to an issue with the netcdf4 package, which has become a dependency to be able to use the example data from the proto.

To silence this issue we can specify the version for netcdf4, which might resolve the error. However, for how long do we want to keep supporting Python 3.7? It will reach its end of life in a couple of months.

@BSchilperoort BSchilperoort marked this pull request as ready for review August 3, 2022 09:04
@Peter9192 Peter9192 mentioned this pull request Aug 3, 2022
Copy link
Contributor

@Peter9192 Peter9192 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Perhaps combine all in one module (rgdr.py) and consider adding a transform method.

s2spy/rgdr/_map_regions.py Outdated Show resolved Hide resolved
s2spy/rgdr/_map_regions.py Outdated Show resolved Hide resolved
s2spy/rgdr/_map_regions.py Outdated Show resolved Hide resolved
s2spy/rgdr/_map_regions.py Outdated Show resolved Hide resolved
s2spy/rgdr/_map_analysis.py Outdated Show resolved Hide resolved
s2spy/rgdr/_map_regions.py Outdated Show resolved Hide resolved
s2spy/rgdr/_map_utils.py Outdated Show resolved Hide resolved
s2spy/rgdr/_map_utils.py Outdated Show resolved Hide resolved
@sonarqubecloud
Copy link

sonarqubecloud bot commented Aug 3, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

96.6% 96.6% Coverage
0.0% 0.0% Duplication

Copy link
Contributor

@Peter9192 Peter9192 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, let's follow it up in the next PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

missing docstring info in _map_analysis.py/correlation
2 participants