GitHub - sooshie/scrape_pdf: Python script to pull various IOCs from PDFs

A basic script that uses PDFMiner to decompress streams and then looks inside them for indicators of compromise (IOCs).

Currently it attempts to pull out IPs, hashes, URLs, and hostnames.
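As a rough illustration of the kind of matching involved, the sketch below pulls IPv4 addresses, MD5/SHA-1/SHA-256-length hex strings, and URLs out of a block of already-decoded text using regular expressions. The patterns and the `extract_iocs` name are assumptions made for this example; they are not taken from scrape_pdf.py, which may match differently.

```python
import re

# Illustrative patterns only (assumptions, not the ones scrape_pdf.py uses).
# IPv4: four dot-separated octets, each limited to 0-255.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
# Hashes: hex strings of exactly 64, 40, or 32 characters
# (the lengths of SHA-256, SHA-1, and MD5 digests).
HASH_RE = re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b")
# URLs: http(s) scheme followed by anything up to whitespace, a quote,
# or an angle bracket. Trailing punctuation is not stripped.
URL_RE = re.compile(r'https?://[^\s<>"]+', re.IGNORECASE)


def extract_iocs(text):
    """Return sorted, de-duplicated candidate IOCs found in text."""
    return {
        "ips": sorted(set(IPV4_RE.findall(text))),
        "hashes": sorted(set(HASH_RE.findall(text))),
        "urls": sorted(set(URL_RE.findall(text))),
    }
```

Matches from patterns like these are only candidates: a 32-character hex string is not necessarily an MD5, so results usually need review before use.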

Requires:

Once the requirements are installed, you'll likely want to fetch the newest TLD list.
Open a Python interpreter and run:

import uniaccept
uniaccept.refreshtlddb("/tmp/tld-list.txt")

Feel free to change the location of the tld-list.txt file, but note that the scrape_pdf.py script expects to find it in the current working directory.
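To show why a TLD list is useful for hostname extraction, here is a minimal sketch that filters candidate hostnames by their final label. It assumes the list is a plain-text file with one TLD per line and optional `#` comment lines; the actual format written by uniaccept may differ. The `load_tlds` and `has_known_tld` helpers are hypothetical and are not part of uniaccept or scrape_pdf.py.

```python
def load_tlds(path="tld-list.txt"):
    """Load TLDs from a one-per-line text file (assumed format).

    Blank lines and lines starting with '#' are skipped, and
    entries are lower-cased for case-insensitive comparison.
    """
    with open(path, encoding="utf-8") as handle:
        return {
            line.strip().lower()
            for line in handle
            if line.strip() and not line.startswith("#")
        }


def has_known_tld(hostname, tlds):
    """Return True when the last label of hostname is in the TLD set."""
    last_label = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    return last_label in tlds
```

A check like this helps discard strings such as `report.pdf` that look like hostnames but do not end in a real TLD.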
