sooshie/scrape_pdf

A basic script that uses PDFMiner to decompress the streams in a PDF and then looks inside them.

Currently it attempts to pull out IPs, hashes, URLs, and hostnames.
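As a rough illustration of this kind of extraction, here is a minimal sketch that pulls IPs, hashes, URLs, and hostnames from already-decompressed text using regular expressions. The patterns and the `extract_iocs` name are assumptions made for this example and are not taken from scrape-pdf.py, which may well do it differently. The sketch uses only the standard library.

```python
import re

# Illustrative patterns only (assumptions, not the ones used by scrape-pdf.py).
# IPv4 address with each octet limited to 0-255.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
# Hex strings of SHA-256 (64), SHA-1 (40), or MD5 (32) length.
HASH_RE = re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b")
# http/https URLs, stopping at whitespace and common closing delimiters.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
# Dotted hostnames whose last label is alphabetic (so bare IPs do not match).
HOST_RE = re.compile(
    r"\b(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}\b"
)


def extract_iocs(text):
    """Return the unique candidate indicators found in text, grouped by type."""
    return {
        "ips": sorted(set(IPV4_RE.findall(text))),
        "hashes": sorted(set(HASH_RE.findall(text))),
        "urls": sorted(set(URL_RE.findall(text))),
        "hostnames": sorted(set(HOST_RE.findall(text))),
    }


if __name__ == "__main__":
    sample = (
        "Beacon to http://evil.example.com/payload from 192.168.1.10, "
        "md5 d41d8cd98f00b204e9800998ecf8427e"
    )
    for kind, values in extract_iocs(sample).items():
        print(kind, values)
```

Patterns like these over-match (for example, file names such as report.pdf look like hostnames), which is one reason to check candidate hostnames against a TLD list.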

Requires:

  • pip install dnspython
  • grab uniaccept from here
  • pip install pdfminer

Then after you've done that, you'll likely want to get the newest TLD list.
Open a Python interpreter then:

import uniaccept
uniaccept.refreshtlddb("/tmp/tld-list.txt")

Feel free to change the location of the tld-list.txt file, but note that the scrape-pdf.py script expects to find it in the current working directory.
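To show how a TLD list can be used to filter hostname candidates, here is a small sketch. It assumes the file holds one TLD per line with optional `#` comment lines, which is an assumption about the file format, and it does not use uniaccept's own API. The `load_tlds` and `has_known_tld` names are hypothetical helpers for this example.

```python
def load_tlds(path):
    """Read a TLD list, assuming one TLD per line and '#' comment lines."""
    tlds = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            entry = line.strip().lower()
            # Skip blank lines and comment lines.
            if entry and not entry.startswith("#"):
                tlds.add(entry)
    return tlds


def has_known_tld(hostname, tlds):
    """Return True if the last label of hostname appears in the TLD set."""
    last_label = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    return last_label in tlds


if __name__ == "__main__":
    # Path assumed to be in the current working directory.
    known = load_tlds("tld-list.txt")
    for candidate in ("evil.example.com", "report.pdf"):
        print(candidate, has_known_tld(candidate, known))
```

A check like this discards strings that merely look like hostnames, such as file names with extensions that are not valid TLDs.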

Python script to pull various IOCs from PDFs
