Post-processor

Post-processing for results from the domain and Twitter crawlers.

  • the post-processor scripts are under processor/
  • the old back-end scripts are in archived/
  • scripts/ contains automation scripts that prepare data for post-processing:
    • cleaner removes unnecessary headers and duplicate records from the Twitter crawler's output CSV files
    • metascraper can be used to populate the title_metascraper, author_metascraper, date, html_content, and article_text JSON fields produced by the domain crawler
    • url_expander expands shortened URLs in tweets
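The kind of cleanup the cleaner script performs can be illustrated with a short sketch. This is not the repository's cleaner. It assumes that "unnecessary headers" means header rows repeated mid-file (for example, when several crawler runs are appended to one CSV) and that a duplicate record is a row identical to an earlier one. The names `clean_rows` and `clean_csv` are hypothetical.

```python
import csv
from typing import Iterable, List, Optional


def clean_rows(rows: Iterable[List[str]]) -> List[List[str]]:
    """Keep the first header row, then drop repeated headers and exact duplicate rows.

    Hypothetical sketch: assumes repeated headers are byte-identical to the first
    row and that duplicates are exact row matches (not matches on a tweet ID column).
    """
    cleaned: List[List[str]] = []
    header: Optional[List[str]] = None
    seen = set()
    for row in rows:
        if not row:
            # csv.reader yields [] for blank lines; skip them.
            continue
        if header is None:
            header = row
            cleaned.append(row)
            continue
        if row == header:
            # A header repeated mid-file, e.g. from appending several crawler runs.
            continue
        key = tuple(row)
        if key in seen:
            # Exact duplicate of a record already kept.
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def clean_csv(in_path: str, out_path: str) -> None:
    """Read a crawler CSV, clean it with clean_rows, and write the result."""
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = clean_rows(csv.reader(src))
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(rows)
```

For example, `clean_csv("tweets_raw.csv", "tweets_clean.csv")` (hypothetical file names) writes a copy with a single header and no repeated records. If the real crawler output can contain the same tweet with differing fields, keying on an ID column would be a better deduplication rule than whole-row matching.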