Distance Metrics in the data sets #20

Open
aginensky opened this issue Nov 18, 2019 · 4 comments

Comments

@aginensky

I want to understand which distance metrics you have used in the scripts. I have found a couple of packages that convert latitude/longitude data into distances. If you have a preferred package, I can just use that. Secondly, once a distance package has been installed, I want to do some sort of 'clustering' to determine how many unique accident sites there are. I've done nothing yet, but I'm wondering whether there are instances in which distinct locations are actually 10 ft apart, which likely means they are the same site. It would also be interesting to look at accidents by some sort of grid search: for example, write code to see how many accidents are within 10/50/100/250 feet of a given accident. Thoughts? Suggestions?
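As a rough illustration of the radius idea above, here is a minimal Python sketch that counts, for each crash, how many other crashes fall within 10/50/100/250 feet. It is not taken from the project's scripts: the function names, the `(lat, lon)` tuple layout, and the sample coordinates are all assumptions made up for the example, and the distance is a plain haversine great-circle distance on a spherical Earth.

```python
import math

# Mean Earth radius converted to feet (spherical approximation).
EARTH_RADIUS_M = 6_371_008.8
EARTH_RADIUS_FT = EARTH_RADIUS_M / 0.3048


def haversine_ft(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))


def counts_within(points, radii_ft=(10, 50, 100, 250)):
    """For each (lat, lon) point, count the other points within each radius.

    Brute force, O(n^2): fine for a first look, too slow for very large files.
    """
    result = []
    for i, (lat_i, lon_i) in enumerate(points):
        counts = {r: 0 for r in radii_ft}
        for j, (lat_j, lon_j) in enumerate(points):
            if i == j:
                continue
            d = haversine_ft(lat_i, lon_i, lat_j, lon_j)
            for r in radii_ft:
                if d <= r:
                    counts[r] += 1
        result.append(counts)
    return result


# Hypothetical sample coordinates (not from the crash data).
sample_points = [
    (41.88000, -87.63),  # reference point
    (41.88001, -87.63),  # a few feet north of the reference
    (41.88020, -87.63),  # roughly 70 feet north
    (41.90000, -87.63),  # more than a mile away
]

for point, counts in zip(sample_points, counts_within(sample_points)):
    print(point, counts)
```

Points that repeatedly show a neighbor within the 10 ft radius would be natural candidates for "same site" merging. For the full data set, a spatial index or nearest-neighbor library would likely be needed in place of the double loop.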

@sas1336
Collaborator

sas1336 commented Nov 19, 2019

I think this is a great idea. I would also try to understand how we get the coordinates in the first place. Until recently (i.e., until 2017), IDOT would geocode the crash locations. I myself am not totally clear on this, but as I understand it, the geocoding is done according to the addresses that reporting officers put in. Thus, if there is a long block with only one address, are all the crashes that happen along that block plotted to only one location? I will do a deeper dive on this question myself too.

@hneaz

hneaz commented Nov 19, 2019

I have worked with spatial data before in my current role. I used the geosphere package to compute Haversine distances and the RANN package to run k-nearest-neighbor searches in order to cluster locations to their nearest points. This is an interesting analysis to look into. I might be able to help since I have some experience.
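For reference, the haversine distance mentioned above is the standard great-circle formula on a sphere. The symbols are chosen here for illustration: $\varphi_1, \varphi_2$ are the latitudes and $\lambda_1, \lambda_2$ the longitudes of the two points in radians, and $r$ is the assumed Earth radius.

```latex
d = 2r \arcsin\left(
      \sqrt{
        \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
        + \cos\varphi_1 \cos\varphi_2 \,
          \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)
      }
    \right)
```

Because it treats the Earth as a sphere, the result carries a small error relative to an ellipsoidal model, which should be negligible at the 10 to 250 ft scales discussed in this thread.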

@aginensky
Author

aginensky commented Nov 20, 2019 via email

@hneaz

hneaz commented Nov 20, 2019

@aginensky I got the longitude and latitude data for the hospital list provided by @sas1336 using the SmartyStreets API. Another question to look into is the distance between the hospitals and the accident sites, as well as measuring response time, etc.

Here is the file.
illinois_hospital_list.csv.zip
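The hospital-to-crash distance question could be sketched roughly as follows in Python. This is only an illustration: the column layout of the attached CSV is not shown in this thread, so the sketch uses hypothetical in-memory tuples of `(name, lat, lon)` instead of reading the file, and the hospital names and coordinates are made up. Straight-line distance is at best a crude proxy for response time, since it ignores the road network.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_hospital(crash, hospitals):
    """Return (name, distance_km) for the hospital closest to a (lat, lon) crash.

    `hospitals` is a list of (name, lat, lon) tuples (hypothetical layout).
    """
    lat, lon = crash
    name, h_lat, h_lon = min(
        hospitals, key=lambda h: haversine_km(lat, lon, h[1], h[2])
    )
    return name, haversine_km(lat, lon, h_lat, h_lon)


# Hypothetical example data (not from illinois_hospital_list.csv).
hospitals = [
    ("Hospital A", 41.90, -87.63),
    ("Hospital B", 41.80, -87.63),
]
crash = (41.88, -87.63)

name, dist_km = nearest_hospital(crash, hospitals)
print(f"Nearest: {name} at {dist_km:.2f} km")
```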

3 participants