Context
We have an Action to populate LabelStudio. It uses the Common Crawler script, which tries to find needles in the Common Crawl haystack that are related to selected keywords. Then, it uploads the results to LabelStudio.
We have #74, which may be referenced or continued (related: #76).
Requirements
- Rename the "Populate LabelStudio" action to "Common Crawler to LabelStudio", for specificity.
- Add a new action called "Google Searcher to LabelStudio".
  - Instead of Common Crawl, it should accept arguments for making targeted Google searches:
    - a county, in "Allegheny, PA" format (including the state!) or as a County FIPS code in "12345" format (these are in our DB!)
    - custom, comma-separated keywords to apply to every agency in the county
      - pre-populated with: data portal, public records, documents
  - It generates search terms by iterating through the agencies in the PDAP database and concatenating each agency name with each keyword:
    - agencies.submitted_name + "data portal"
    - agencies.submitted_name + "public records"
    - agencies.submitted_name + "documents"
  - It should generate a batch with the first 10 results from each search and send them to LabelStudio in the same way as the existing action,
    - i.e. combine, deduplicate, check for duplicates in LS; log batches and cache.
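
To make the requirements above concrete, here is a rough sketch of the input handling and batching logic in Python. It is only an illustration under stated assumptions: the function names (`parse_county`, `parse_keywords`, `build_search_terms`, `build_batch`) are hypothetical, agency names are passed in as a plain list rather than queried from the PDAP database, search results are passed in as a dict rather than fetched from Google, and "pre-populated" is interpreted as "used when no custom keywords are given". The LabelStudio upload, logging, and caching steps are left out.

```python
import re
from typing import Dict, Iterable, List, Optional

# Default keywords listed in the requirements.
DEFAULT_KEYWORDS = ["data portal", "public records", "documents"]


def parse_county(value: str) -> Dict[str, str]:
    """Classify the county argument as a FIPS code or a "Name, ST" pair."""
    value = value.strip()
    # County FIPS codes are five digits, e.g. "12345".
    if re.fullmatch(r"\d{5}", value):
        return {"fips": value}
    # Otherwise require "County, ST" with a two-letter state abbreviation.
    match = re.fullmatch(r"(.+?),\s*([A-Za-z]{2})", value)
    if match is None:
        raise ValueError(f"Expected 'Allegheny, PA' or '12345', got {value!r}")
    return {"name": match.group(1).strip(), "state": match.group(2).upper()}


def parse_keywords(raw: Optional[str]) -> List[str]:
    """Split comma-separated keywords; fall back to defaults (assumption)."""
    if raw is None or not raw.strip():
        return list(DEFAULT_KEYWORDS)
    return [kw.strip() for kw in raw.split(",") if kw.strip()]


def build_search_terms(agency_names: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Concatenate every agency name with every keyword."""
    keywords = list(keywords)
    return [f"{name} {kw}" for name in agency_names for kw in keywords]


def build_batch(
    results_by_term: Dict[str, List[str]],
    urls_already_in_ls: Iterable[str],
    limit: int = 10,
) -> List[str]:
    """Combine the first `limit` results per search, dropping duplicates
    within the batch and URLs already known to LabelStudio."""
    seen = set(urls_already_in_ls)
    batch: List[str] = []
    for urls in results_by_term.values():
        for url in urls[:limit]:
            if url not in seen:
                seen.add(url)
                batch.append(url)
    return batch
```

For example, `build_search_terms(["Example Police Department"], parse_keywords(None))` would yield one search term per default keyword for that (made-up) agency name.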