The GeoNewsMiner (GNM): An interactive spatial humanities tool to visualize geographical references in historical newspapers
Updated Feb 21, 2022 - Jupyter Notebook
Convert ALTO XML to plain text + minimal metadata
A Toponym Resolution Pipeline for Digitised Historical Newspapers
🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.
Awesome historical newspaper analysis tools and literature
Tools for the use of Tesseract OCR in R
Repository of JSON schemas used in the Impresso project.
This repository is part of an NLP course for humanities and cultural studies that uses historical newspapers as a source and applies NLP methods to them. NLP tasks covered: tokenization, lemmatization, TF-IDF, part-of-speech tagging, semantic search with transformers, article extraction and OCR post-correction with LLMs, NER, and text classification.
Dataset from the paper "Information Extraction from Public Meeting Articles"
This repository contains code and sample data related to running the impresso corpus through the text reuse detection software passim.
The Hongkong News headline analysis project was conducted by the Chinese University of Hong Kong Library.
Everything to reproduce the CLEF HIPE 2020 campaign results.
Source code for cleaning pipeline and web app pairing the Press Directories dataset with general elections results.
This repository contains Jupyter Notebooks that are being used with MogonOndemand, a web-based platform based on OpenOnDemand that simplifies access to MOGON NHR HPC resources.