Proof of concept: parsing PDF tree felling permits #590

electricmonk · 2022-11-28T08:59:06Z

Hi
Following #587, here's a proof of concept that reads from the Tel Aviv city hall website, downloads PDFs and then attempts to parse them.

Currently the results are so-so: only about 90% of the PDFs contain text; the rest are scanned images, which would require integrating OCR. Even the text PDFs are not parsed deterministically, and extracting reliable data from them would take more work. Currently about 70% of the data is parsed, but I'm scoring all fields equally.
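To make "scoring all fields equally" concrete, here is a minimal sketch of an equal-weight completeness score. The field names are hypothetical (this description does not list the fields that are extracted), and this is not necessarily how the code in this branch computes its score.

```python
from typing import Iterable, Mapping, Optional, Sequence

# Hypothetical field names, for illustration only.
FIELDS = ("street", "city", "permit_number", "tree_count", "reason")


def completeness_score(
    record: Mapping[str, Optional[str]],
    fields: Sequence[str] = FIELDS,
) -> float:
    """Fraction of expected fields that are present and non-blank.

    Every field carries the same weight, so a missing address counts
    exactly as much as a missing free-text reason.
    """
    if not fields:
        return 0.0
    filled = sum(1 for name in fields if (record.get(name) or "").strip())
    return filled / len(fields)


def average_score(
    records: Iterable[Mapping[str, Optional[str]]],
    fields: Sequence[str] = FIELDS,
) -> float:
    """Mean completeness over all parsed PDFs (0.0 for an empty batch)."""
    records = list(records)
    if not records:
        return 0.0
    return sum(completeness_score(r, fields) for r in records) / len(records)
```

Weighting critical fields (such as the address) more heavily than optional ones would likely give a more honest picture than a flat average.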

The important question before I move forward is: is this better than nothing? Should I invest more time?

…-trees

CLAassistant · 2022-11-28T08:59:11Z

All committers have signed the CLA.

gruppin · 2022-11-28T12:18:21Z

Hi, unfortunately this is not better than nothing.
Our integrity toward our users depends on having all the tree licenses, not only part of them; they rely on us to notify them about every tree license.
So in this sense, notifying about a partial set of licenses is even worse than not notifying at all, since we might mislead our users with a false picture of reality.
I don't think you should invest more time in it.

electricmonk · 2022-11-28T15:25:47Z

However, we can extract 100% of street addresses from the PDF file names, and we can conceivably create a unique id from the street address, city, and publication date. So we won't miss any petition - we would just have missing data for 10%-30% of them. Wdyt?
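A rough sketch of the unique id idea, assuming the address, city, and publication date are already available as values. The function name `permit_id`, the normalization steps, and the 16-character truncation are illustrative choices, not code from this branch.

```python
import hashlib
import unicodedata
from datetime import date


def _normalize(text: str) -> str:
    """Unify Unicode forms, collapse whitespace, and ignore letter case."""
    return " ".join(unicodedata.normalize("NFKC", text).split()).casefold()


def permit_id(street_address: str, city: str, published: date) -> str:
    """Derive a stable id from street address, city, and publication date.

    Hypothetical helper: the same inputs always yield the same id, so a
    permit seen again on a later crawl maps to the same record.
    """
    key = "|".join(
        (_normalize(street_address), _normalize(city), published.isoformat())
    )
    # Truncated SHA-256 hex digest; 16 hex characters is an arbitrary choice.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
```

One caveat with this scheme: two different permits published on the same day for the same address would get the same id, so it only works if that combination really is unique.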

electricmonk added 2 commits November 28, 2022 10:50

proof of concept: parsing Tel Aviv PDFs

c474240

Merge branch 'master' of github.com:electricmonk/meirim into tel-aviv…

5ac6ec8

…-trees

electricmonk changed the title ~~Proof of concept: parsing PDF tree licenses~~ Proof of concept: parsing PDF tree permits Nov 28, 2022

electricmonk changed the title ~~Proof of concept: parsing PDF tree permits~~ Proof of concept: parsing PDF tree felling permits Nov 28, 2022

added scoring by existing fields

6341661
