diff --git a/annotation_pipeline/README.md b/annotation_pipeline/README.md
index 2aaf2af..a494bac 100644
--- a/annotation_pipeline/README.md
+++ b/annotation_pipeline/README.md
@@ -18,15 +18,15 @@ This Python script automates the process of crawling for relevant URLs, scraping
 `pip install pandas argparse huggingface-hub`
 2. Setup Environment variables in annotation_pipeline/dev.env
- LABEL_STUDIO_ACCESS_TOKEN=...
- LABEL_STUDIO_PROJECT_ID=...
- LABEL_STUDIO_ORGANIZATION=...
+ - LABEL_STUDIO_ACCESS_TOKEN=...
+ - LABEL_STUDIO_PROJECT_ID=...
+ - LABEL_STUDIO_ORGANIZATION=...
 As well as in data_source_identification/.env
- HUGGINGFACE_ACCESS_TOKEN=...
- LABEL_STUDIO_ACCESS_TOKEN=...
- LABEL_STUDIO_PROJECT_ID=...
- LABEL_STUDIO_ORGANIZATION=...
+ - HUGGINGFACE_ACCESS_TOKEN=...
+ - LABEL_STUDIO_ACCESS_TOKEN=...
+ - LABEL_STUDIO_PROJECT_ID=...
+ - LABEL_STUDIO_ORGANIZATION=...
 ## Usage
@@ -38,4 +38,4 @@ This Python script automates the process of crawling for relevant URLs, scraping
 - `--pages num_pages`: Number of pages to search
 - `--record-type record_type` (optional): Assumed rescord type for pre-annotation.
-e.g. `python annotation_pipeline.py CC-MAIN-2024-10 '*.gov' arrest --pages 2 --record-type Arrest Records`
\ No newline at end of file
+e.g. `python annotation_pipeline.py CC-MAIN-2024-10 '*.gov' arrest --pages 2 --record-type 'Arrest Records'`
\ No newline at end of file
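
Review note on the second hunk: quoting `'Arrest Records'` matters because the shell otherwise splits the value into two tokens. The sketch below illustrates the difference with a hypothetical `argparse` parser. The positional names (`common_crawl_id`, `url`, `keyword`) and the defaults are assumptions made for illustration; only `--pages` and `--record-type` appear in the diff, and the real `annotation_pipeline.py` may define its arguments differently.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors the example command in the README.
    # Positional names and defaults are assumptions, not the script's real API.
    parser = argparse.ArgumentParser(prog="annotation_pipeline.py")
    parser.add_argument("common_crawl_id")
    parser.add_argument("url")
    parser.add_argument("keyword")
    parser.add_argument("--pages", type=int, default=1)
    parser.add_argument("--record-type", default=None)
    return parser


parser = build_parser()

# shlex.split tokenizes the string the way a POSIX shell would.
quoted = shlex.split(
    "CC-MAIN-2024-10 '*.gov' arrest --pages 2 --record-type 'Arrest Records'"
)
args = parser.parse_args(quoted)
print(args.record_type)  # Arrest Records

# Without quotes, "Records" becomes a separate, unexpected token.
unquoted = shlex.split(
    "CC-MAIN-2024-10 '*.gov' arrest --pages 2 --record-type Arrest Records"
)
known, extra = parser.parse_known_args(unquoted)
print(known.record_type, extra)  # Arrest ['Records']
```

With a plain `parse_args` call, a parser shaped like this one would exit with an "unrecognized arguments: Records" error on the unquoted form, which is the failure the corrected example avoids.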