Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MeanCenter trait #1166

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Conversation

JosiahParry
Copy link
Contributor

@JosiahParry JosiahParry commented Mar 30, 2024

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

This PR adds a new trait MeanCenter which calculates the euclidean mean center of a set of coordinates. It is implemented for all geometry types.

@lnicola
Copy link
Member

lnicola commented Mar 30, 2024

I think this is normally called a centroid.

@JosiahParry
Copy link
Contributor Author

There are subtle differences between centroids and the mean center of point sets. I've added some tests that compare the centroid for each geometry with the mean center. There are differences, notably for a polygons.

Here's a blog post that (briefly) describes its utility. I was planning on adding a fn weighted_mean_center() as well as a follow up. I can add that to this PR if you think it would make the trait more motivating .

@urschrei
Copy link
Member

There are subtle differences between centroids and the mean center of point sets. I've added some tests that compare the centroid for each geometry with the mean center. There are differences, notably for a polygons.

Here's a blog post that (briefly) describes its utility. I was planning on adding a fn weighted_mean_center() as well as a follow up. I can add that to this PR if you think it would make the trait more motivating .

It's obviously an automatic approval from me because the blog post uses Irish CSO data, but I'd love to see the weighted mean centre method too.

@JosiahParry
Copy link
Contributor Author

🤣 heard. I've already pushed the weighted_mean_center() method and doc changes. I just need to add tests which I'll find time for tonight or tomorrow. 🫡

Copy link
Member

@frewsxcv frewsxcv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an entry for this in the algorithm index? https://docs.rs/geo/latest/geo/#algorithms

@JosiahParry
Copy link
Contributor Author

JosiahParry commented Apr 2, 2024

I think this is normally called a centroid.

Yup, you're right! OOPS! 🙈

The mean center for MultiLineString, MultiPolygon, and GeomtryCollection is where this differs. Since this is a point pattern technique, there must be one point per feature. Each geometry's centroid is used to represent it. Then the mean center (centroid) of those is used.

This technique is most useful for feature collections. For example, finding the weighted mean center of 311 reports grouped by category (MultiPoint per category) and the weight would be days since report or estimate cost to fix etc.

Related doc from arcgis implementation

I've updated the trait doc and implementation to reflect this.

I will need to rewrite the tests, though! Thank you for your help :)

Copy link
Member

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an application for this algorithm? I haven't really grokked its utility and am a little skeptical of its specification.

I'm not allergic to merging it if you have a use for it and can fix it up.

let centroids = self
.iter()
.map(|g| g.centroid())
.filter(|c| c.is_some())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can replace the map/filter/map(unwrap) with flat_map(|g| g.centroid())

(this occurs in several places in this PR)

///
/// The `weights` argument takes a slice which is used as an iterator which is cycled.
/// If there are fewer weights than coordinates, the weights will be recycled until the
/// calculation has completed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are fewer weights than coordinates, the weights will be recycled until the calculation has completed.

Is there a useful reason for this cycle behavior or some precedent?

Or is this more like "do something, anything, so long as we don't explode"?

return None;
}

for ((i, coord), weight) in self.coords_iter().enumerate().zip(weights.iter().cycle()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. The blog post uses centroids here as well right?:

        xy_arr[i] = (ft_geom.Centroid().GetX() * weight, ft_geom.Centroid().GetY() * weight)

};
}

impl_mean_center_option!(LineString);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're computing centroid here.

Two things:

  1. The blog uses one weight per geometry, not per coord in the geometry.

  2. For linear features like a LineString, the blog post uses midpoint instead of centroid. Though I can't think of an application of this algorithm where using a midpoint would be more useful than a centroid. I know there are applications where you want the "middle" of the geometry to be within a geometry, which might not happen with centroid (which is why people love polylabel), but in this case, we're eventually averaging a bunch of points together, so there's no guarantee that the output point will be inside of any geometry anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants