Historical information is included #3
I deduped the data set by restaurant name with a single line of R code; I believe this may have addressed the issue.
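For reference, here is a minimal Python sketch of name-based deduplication. It is not the R line mentioned above, which isn't shown here. It keeps the first row seen for each Business_Name, and the FID values and rows are made up for illustration. Because historical renewals are in the data, "first row" may be an old license rather than the current one, so the order of the rows going in matters.

```python
from typing import Dict, Iterable, List


def dedupe_by(records: Iterable[Dict[str, str]], key: str) -> List[Dict[str, str]]:
    """Keep only the first record seen for each distinct value of `key`."""
    seen = set()
    kept = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            kept.append(rec)
    return kept


# Made-up rows; FID 2 is a renewal of FID 1.
rows = [
    {"FID": "1", "Business_Name": "Joe's Diner"},
    {"FID": "2", "Business_Name": "Joe's Diner"},
    {"FID": "3", "Business_Name": "Cafe Rio"},
]
print([r["FID"] for r in dedupe_by(rows, "Business_Name")])  # ['1', '3']
```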
I'm not sure this is adequate; I've commented on commit c2f7818.
Something worth noting: if you use the address, it is the address of the business office, not necessarily the address of the dining establishment, if I recall correctly from looking at the dataset some time ago.
@danrneumann Noted. Also, if you happen to recall, which attribute of the original data set did you and Josh use to dedupe it? I used the name, but there are cases where two different addresses are associated with the same business name. However, I'm not sure how much of an issue this will be, since I'm envisioning a good amount of "human filtering" for this dataset in the future anyway. Update: I asked Josh about it today, and he said you guys deduped the set based on address.
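To gauge how common the same-name, different-address case is, a minimal sketch that lists every business name mapped to more than one distinct address. The `Address` column name and the sample rows are hypothetical. Addresses are only trimmed and upper-cased, so spelling variants such as "St" vs "Street" would still count as distinct, and per the earlier comment the address may be a business office rather than the restaurant itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def names_with_multiple_addresses(
    records: Iterable[Dict[str, str]],
    name_key: str = "Business_Name",
    addr_key: str = "Address",  # hypothetical column name
) -> Dict[str, List[str]]:
    """Map each business name to its distinct addresses, keeping only names with 2+."""
    addresses: Dict[str, set] = defaultdict(set)
    for rec in records:
        # Light normalization only: trim whitespace and ignore case.
        addresses[rec[name_key]].add(rec[addr_key].strip().upper())
    return {name: sorted(addrs) for name, addrs in addresses.items() if len(addrs) > 1}


# Made-up rows; Cafe Rio's two addresses differ only in case and spacing.
rows = [
    {"FID": "1", "Business_Name": "Joe's Diner", "Address": "1 Main St"},
    {"FID": "2", "Business_Name": "Joe's Diner", "Address": "22 Elm St"},
    {"FID": "3", "Business_Name": "Cafe Rio", "Address": "9 Oak Ave"},
    {"FID": "4", "Business_Name": "Cafe Rio", "Address": " 9 oak ave"},
]
print(names_with_multiple_addresses(rows))
# {"Joe's Diner": ['1 MAIN ST', '22 ELM ST']}
```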
In commit 042fcd7 (the latest one), I created a data set that is the union of the name-deduped set and the address-deduped set. Is this adequate for the issue?
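A minimal sketch of what that union could look like, not necessarily what the commit does. It assumes each record has a unique FID (the issue notes FID changes with every record) and uses a hypothetical `Address` column. Each dedupe pass keeps the first row per key, and the union keeps any row that survived either pass.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def dedupe_by(records: Iterable[Record], key: str) -> List[Record]:
    """Keep only the first record seen for each distinct value of `key`."""
    seen = set()
    kept = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            kept.append(rec)
    return kept


def union_by_fid(*subsets: List[Record]) -> List[Record]:
    """Combine subsets of the same table, keeping each FID once (first-seen order)."""
    merged: Dict[str, Record] = {}
    for subset in subsets:
        for rec in subset:
            merged.setdefault(rec["FID"], rec)
    return list(merged.values())


# Made-up rows: FID 2 is a second address for the same name, FID 3 repeats FID 1.
rows = [
    {"FID": "1", "Business_Name": "Joe's Diner", "Address": "1 Main St"},
    {"FID": "2", "Business_Name": "Joe's Diner", "Address": "22 Elm St"},
    {"FID": "3", "Business_Name": "Joe's Diner", "Address": "1 Main St"},
    {"FID": "4", "Business_Name": "Cafe Rio", "Address": "9 Oak Ave"},
]
by_name = dedupe_by(rows, "Business_Name")  # FIDs 1, 4
by_address = dedupe_by(rows, "Address")     # FIDs 1, 2, 4
combined = union_by_fid(by_name, by_address)
print(sorted(r["FID"] for r in combined))   # ['1', '2', '4']
```

In the sample, FID 2 (same name, second address) is recovered by the address pass, while FID 3 (a repeat of both name and address) is dropped by both passes. The reverse case, two different businesses sharing one office address, would be collapsed by the address pass and recovered by the name pass.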
It looks like historical information is included in the dataset, which means every renewal also shows up for a given property.
It appears that a business's licenses can overlap, can have gaps in coverage between renewals (from one license's License_Expir_Date to the next license's Issue_Date), and can have overlapping Classification_Codes.
It also looks like Business_Name (and possibly the address information) may be the only key linking renewals to each other, since the FID changes with each record.
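A minimal sketch for surfacing those gaps and overlaps: group records by Business_Name (the only linking key, per the note above), order each group by Issue_Date, and compare each License_Expir_Date with the next Issue_Date. It assumes dates are ISO `YYYY-MM-DD` strings and that expiry dates are inclusive; the dataset's real date format is not confirmed here, and the rows are made up. Classification_Codes overlap is not handled.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Record = Dict[str, str]


def classify(days: int) -> str:
    """Interpret days between one expiry and the next issue (inclusive end dates assumed)."""
    if days > 1:
        return "gap"         # at least one day with no license
    if days == 1:
        return "contiguous"  # next license starts the day after expiry
    return "overlap"         # next license issued on or before the expiry date


def renewal_gaps(records: List[Record]) -> Dict[str, List[Tuple[str, str, int, str]]]:
    """Group by Business_Name, order by Issue_Date, and compare consecutive licenses."""
    by_name: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_name[rec["Business_Name"]].append(rec)

    report = {}
    for name, recs in by_name.items():
        recs.sort(key=lambda r: date.fromisoformat(r["Issue_Date"]))
        pairs = []
        for prev, nxt in zip(recs, recs[1:]):
            expires = date.fromisoformat(prev["License_Expir_Date"])
            issued = date.fromisoformat(nxt["Issue_Date"])
            days = (issued - expires).days
            pairs.append((prev["FID"], nxt["FID"], days, classify(days)))
        report[name] = pairs
    return report


# Made-up rows, deliberately out of order.
rows = [
    {"FID": "12", "Business_Name": "Joe's Diner", "Issue_Date": "2016-01-15", "License_Expir_Date": "2017-01-14"},
    {"FID": "10", "Business_Name": "Joe's Diner", "Issue_Date": "2014-01-01", "License_Expir_Date": "2014-12-31"},
    {"FID": "20", "Business_Name": "Cafe Rio", "Issue_Date": "2015-01-01", "License_Expir_Date": "2015-12-31"},
    {"FID": "11", "Business_Name": "Joe's Diner", "Issue_Date": "2015-03-01", "License_Expir_Date": "2016-02-28"},
    {"FID": "21", "Business_Name": "Cafe Rio", "Issue_Date": "2016-01-01", "License_Expir_Date": "2016-12-31"},
]
for name, pairs in renewal_gaps(rows).items():
    print(name, pairs)
```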