This repository has been archived by the owner on Jun 14, 2019. It is now read-only.

Historical information is included #3

Open
seansummers opened this issue Feb 28, 2018 · 5 comments
Labels
help wanted

Comments

@seansummers

It looks like historical information is included in the dataset, which means all renewals also show up for a given property:

License_Status: RN
License_Status_Description: Renewed

It appears that licenses for a business can overlap or leave gaps in coverage between renewals (from one record's License_Expir_Date to the next record's Issue_Date), and that Classification_Codes can overlap for each business as well.

It also looks like Business_Name (and possibly address information) may be the only key linking the renewals to each other, as the FID changes with each record.
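
For anyone who wants to check the gaps and overlaps described above, here is a minimal Python sketch. Only the column names (FID, Business_Name, Issue_Date, License_Expir_Date) come from the dataset; the sample rows, the ISO date format, and the choice to treat the expiry date as the last covered day are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; only the column names come from the dataset.
records = [
    {"FID": 1, "Business_Name": "Cafe A",
     "Issue_Date": "2015-01-01", "License_Expir_Date": "2015-12-31"},
    {"FID": 2, "Business_Name": "Cafe A",
     "Issue_Date": "2016-03-01", "License_Expir_Date": "2017-02-28"},
    {"FID": 3, "Business_Name": "Cafe A",
     "Issue_Date": "2017-02-01", "License_Expir_Date": "2018-01-31"},
]


def find_gaps_and_overlaps(rows):
    """Compare consecutive licenses per business, ordered by Issue_Date."""
    by_name = defaultdict(list)
    for row in rows:
        by_name[row["Business_Name"]].append(row)

    findings = []
    for name, group in by_name.items():
        group.sort(key=lambda r: date.fromisoformat(r["Issue_Date"]))
        for prev, cur in zip(group, group[1:]):
            expired = date.fromisoformat(prev["License_Expir_Date"])
            issued = date.fromisoformat(cur["Issue_Date"])
            days = (issued - expired).days
            if days > 1:
                # Assumption: the expiry date itself is still covered.
                findings.append((name, "gap", prev["FID"], cur["FID"], days - 1))
            elif days < 1:
                findings.append((name, "overlap", prev["FID"], cur["FID"], 1 - days))
    return findings


for finding in find_gaps_and_overlaps(records):
    print(finding)
```

This only compares each license with the next one issued, so an overlap between non-adjacent records would go unreported.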

@seansummers added the help wanted label Feb 28, 2018
@trwiley
Contributor

trwiley commented Mar 2, 2018

I deduped the data set on restaurant name with a single line of R code; I believe this may have addressed the issue.
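
The R one-liner is not shown in this thread, so the following is not that code. It is a rough Python sketch of one way to dedupe on name, under two assumptions: the record with the latest License_Expir_Date is the one to keep, and names that differ only in case or surrounding whitespace refer to the same business. The sample rows are hypothetical.

```python
def dedupe_latest(rows, key="Business_Name"):
    """Keep one row per normalized key: the one with the latest expiry date."""
    latest = {}
    for row in rows:
        normalized = row[key].strip().casefold()
        kept = latest.get(normalized)
        # ISO-formatted date strings sort correctly as plain strings.
        if kept is None or row["License_Expir_Date"] > kept["License_Expir_Date"]:
            latest[normalized] = row
    return list(latest.values())


# Hypothetical sample rows; only the column names come from the dataset.
rows = [
    {"FID": 10, "Business_Name": "Cafe A", "License_Expir_Date": "2016-12-31"},
    {"FID": 11, "Business_Name": "cafe a ", "License_Expir_Date": "2017-12-31"},
    {"FID": 12, "Business_Name": "Cafe B", "License_Expir_Date": "2017-06-30"},
]

deduped = dedupe_latest(rows)
print([row["FID"] for row in deduped])
```

Keeping the latest record rather than the first one encountered matters here because the renewals are historical rows for the same license.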

@seansummers
Author

I'm not sure this is adequate; I've commented on commit c2f7818.

@danrneumann
Member

One thing that might be worth noting: if you use address, it is the address of the business office, not necessarily the address of the dining establishment, if I recall correctly from looking at the dataset some time ago.

@trwiley
Contributor

trwiley commented Mar 2, 2018

@danrneumann Noted. Also, if you happen to recall, which attribute of the original data set did you and Josh use to dedupe it? I used name, but the problem is that there are cases where two different addresses are associated with the same business name. However, I'm not sure how much of an issue this will be: I'm envisioning a good amount of "human filtering" for this dataset in the future anyway.

Update: I asked Josh today about it and he said you guys deduped the set based on address.

@trwiley
Contributor

trwiley commented Mar 9, 2018

In commit 042fcd7 (the latest one), I created a data set that's a union of the name-deduped set and the address-deduped set. Is this adequate for the issue?
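
One way to check whether such a union is adequate is to count how many business names still appear more than once in it. The Python sketch below is an illustration only: the commit's actual code is not shown in this thread, the `Address` column name and the sample rows are hypothetical, and the union is assumed to be keyed on FID.

```python
from collections import Counter


def dedupe_first(rows, key):
    """Keep the first row seen for each distinct value of `key`."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


def union_by_fid(left, right):
    """Union of two row lists, treating rows with the same FID as identical."""
    merged = {row["FID"]: row for row in left}
    for row in right:
        merged.setdefault(row["FID"], row)
    return list(merged.values())


# Hypothetical rows: one business name with two addresses, and one
# address shared by two business names.
rows = [
    {"FID": 1, "Business_Name": "Cafe A", "Address": "1 Main St"},
    {"FID": 2, "Business_Name": "Cafe A", "Address": "9 Office Rd"},
    {"FID": 3, "Business_Name": "Cafe B", "Address": "9 Office Rd"},
]

by_name = dedupe_first(rows, "Business_Name")
by_address = dedupe_first(rows, "Address")
combined = union_by_fid(by_name, by_address)

name_counts = Counter(row["Business_Name"] for row in combined)
repeated = [name for name, count in name_counts.items() if count > 1]
print(repeated)
```

With rows like these, the union brings back a row that the name dedupe had dropped, so a repeated-name count on the real data would show whether that happens in practice.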


3 participants