Handling post-processing of results from the domain or Twitter crawlers
- The post-processor scripts are in `processor/`.
- The old back-end scripts are in `archived/`.
- `scripts/` contains handy automation scripts for preparing data for post-processing:
  - `cleaner` removes unnecessary headers and duplicate records from the Twitter crawler's output CSV files.
  - `metascraper` can be used to populate the `title_metascraper`, `author_metascraper`, `date`, `html_content`, and `article_text` JSON fields produced by the domain crawler.
  - `url_expander` expands shortened URLs in tweets to their full form.
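The `cleaner` step listed above boils down to two filters over CSV rows. Below is a minimal Python sketch of the idea, not the actual `cleaner` script. It assumes that "unnecessary headers" means header rows re-emitted mid-file (for example, when crawl batches are appended to the same CSV) and that a "duplicate record" is an exact copy of an earlier row. The names `clean_rows` and `clean_csv` are hypothetical.

```python
import csv
from typing import Iterable, List, Optional


def clean_rows(rows: Iterable[List[str]]) -> List[List[str]]:
    """Keep the first header row; drop repeated headers, blank lines, and exact duplicates."""
    cleaned: List[List[str]] = []
    header: Optional[List[str]] = None
    seen = set()
    for row in rows:
        if not row:  # csv.reader yields [] for blank lines
            continue
        if header is None:  # the first non-blank row is treated as the header
            header = row
            cleaned.append(row)
            continue
        if row == header:  # assumption: a header repeated by an appended crawl batch
            continue
        key = tuple(row)
        if key in seen:  # assumption: duplicates are byte-for-byte identical rows
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def clean_csv(in_path: str, out_path: str) -> None:
    """Hypothetical file wrapper: read a crawler CSV, clean it, write the result."""
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = clean_rows(csv.reader(src))
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(rows)
```

For example, `clean_rows([["id", "text"], ["1", "hello"], ["id", "text"], ["1", "hello"], ["2", "bye"]])` returns `[["id", "text"], ["1", "hello"], ["2", "bye"]]`. If the real crawler output can contain the same tweet with slightly different fields, the deduplication key would need to be a tweet ID column instead of the whole row.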
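A common way to expand short links like the ones `url_expander` handles is to follow the HTTP redirects of each short URL and keep the final address. The Python sketch below illustrates that approach. It is an assumption about how the script works, not its actual code. The names `follow_redirects` and `expand_urls` are hypothetical, and the URL regex is a deliberately simple approximation that may capture trailing punctuation.

```python
import re
import urllib.request
from typing import Callable, Dict

# Simple approximation of a URL inside tweet text.
URL_RE = re.compile(r"https?://\S+")


def follow_redirects(url: str, timeout: float = 10.0) -> str:
    """Return the final URL after HTTP redirects (urlopen follows 3xx responses by default)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.geturl()
    except (OSError, ValueError):
        # Network errors, HTTP errors (some servers reject HEAD), or malformed URLs:
        # keep the original URL unchanged.
        return url


def expand_urls(text: str, resolve: Callable[[str], str] = follow_redirects) -> str:
    """Replace every URL in `text` with its resolved form, resolving each distinct URL once."""
    cache: Dict[str, str] = {}

    def _sub(match: "re.Match[str]") -> str:
        short = match.group(0)
        if short not in cache:
            cache[short] = resolve(short)
        return cache[short]

    return URL_RE.sub(_sub, text)
```

The `resolve` parameter is injectable so that the text-rewriting logic can be exercised without network access. The per-call cache avoids resolving the same short link twice within one tweet.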
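The `metascraper` step above fills five named fields in the domain crawler's JSON output. The Python sketch below shows only that field-filling logic. It does not call the metascraper library. The `extract` callable stands in for whatever metadata extractor is used. Several details are assumptions: the mapping from extractor keys (`title`, `author`, `date`, `html`, `text`) to record fields, the presence of a `url` field in each record, and the JSON file being a single array of records. `populate_record` and `populate_file` are hypothetical names.

```python
import json
from typing import Any, Callable, Dict

# Field names come from the README; the extractor keys on the right are assumptions.
FIELD_MAP = {
    "title_metascraper": "title",
    "author_metascraper": "author",
    "date": "date",
    "html_content": "html",
    "article_text": "text",
}

Extractor = Callable[[str], Dict[str, Any]]


def populate_record(record: Dict[str, Any], extract: Extractor) -> Dict[str, Any]:
    """Fill empty metadata fields of one crawler record; existing values are kept."""
    missing = [field for field in FIELD_MAP if not record.get(field)]
    if not missing:
        return record  # nothing to do, so skip the (possibly slow) extraction
    meta = extract(record["url"])  # assumption: every record carries its page URL
    for field in missing:
        value = meta.get(FIELD_MAP[field])
        if value:
            record[field] = value
    return record


def populate_file(path: str, extract: Extractor) -> None:
    """Hypothetical wrapper: rewrite a JSON array of crawler records in place."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    records = [populate_record(r, extract) for r in records]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

Only empty fields are filled, so rerunning the step over a partially processed file leaves earlier results untouched.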