OCR Text Extraction for Instagram images

UPDATE 07/01/2021 --> Local Screenshot image OCR

Please see screenshot-transcribe-ocr.ipynb, a related but separate project from the Instagram image analysis described below.

Etienne P Jacquot - 06/30/2021

Getting Started

The other night I read the following article: https://www.thedp.com/article/2021/03/black-ivy-stories-penn-chemistry-stem

My idea was to scrape each image from this Instagram account, run Azure OCR on it (see Microsoft Azure Computer Vision below), and sort the posts by Ivy school to identify the ones specifically about Penn. From there, we can try the Instagram commentGetter containerized web scraper to retrieve comments and engagement for those posts.
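The "sort posts by Ivy school" step can be sketched as keyword matching on each post's OCR text. This is a minimal sketch, not the notebook's actual method: the `IVY_KEYWORDS` lists and the `schools_mentioned` helper are hypothetical. Plain keyword matching is noisy (for example, "Penn State" would count as Penn and "brown" can be a color), so treat the results as a first pass.

```python
import re

# Hypothetical keyword lists; extend them (e.g. with school nicknames) as needed.
IVY_KEYWORDS = {
    "brown": ["brown"],
    "columbia": ["columbia"],
    "cornell": ["cornell"],
    "dartmouth": ["dartmouth"],
    "harvard": ["harvard"],
    "penn": ["penn", "upenn", "university of pennsylvania"],
    "princeton": ["princeton"],
    "yale": ["yale"],
}


def schools_mentioned(ocr_text: str) -> set:
    """Return the set of Ivy schools whose keywords appear in the OCR text."""
    text = ocr_text.lower()
    found = set()
    for school, keywords in IVY_KEYWORDS.items():
        for keyword in keywords:
            # \b word boundaries stop "penn" from matching inside "pennsylvania"
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                found.add(school)
                break
    return found
```

A post that names several schools is returned with all of them, so it will later count once toward each school.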

UPDATE --> totals per Ivy school, based on Azure OCR text extraction of the stories:

princeton    93
penn         57
brown        48
columbia     43
cornell      42
harvard      41
dartmouth    21
yale         14
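Totals like the ones above can be produced by tallying per-post matches. A minimal sketch with `collections.Counter`, assuming each post has already been reduced to the set of schools it mentions; the `ivy_totals` helper and the toy input are hypothetical and are not the project's data.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def ivy_totals(posts_schools: Iterable[Set[str]]) -> List[Tuple[str, int]]:
    """Count posts per school, most frequent first.

    Each element of posts_schools is the set of schools one post mentions,
    so a post naming two schools adds 1 to both totals.
    """
    counts = Counter()
    for schools in posts_schools:
        counts.update(schools)
    return counts.most_common()


# Toy input for illustration only (not the project's data)
toy_posts = [{"penn"}, {"penn", "yale"}, {"princeton"}, {"penn"}]
for school, n in ivy_totals(toy_posts):
    print(f"{school:<12} {n}")
```

Counting sets rather than raw keyword hits means a post that mentions "Penn" five times still contributes 1 to Penn's total.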

Get Instagram timeline posts w/ PhantomBuster

Instagram account: https://www.instagram.com/blackivystories/ @BlackIvyStories

Posts extracted w/ PhantomBuster: https://phantombuster.com/automations/instagram/12766/instagram-posts-extractor
- I don't believe you can use the Instagram API v2 for this timeline extraction, since it requires users to authorize your app. Alternatively, you could web scrape the profile yourself, but that is involved, so we use PhantomBuster, a service that is free and easy to use.
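PhantomBuster results can be downloaded as a CSV file. A minimal sketch for pulling image URLs out of such an export with Python's `csv` module; the column names `postUrl` and `imgUrl` are assumptions, so check the header row of your own export and adjust them. The `image_urls` helper is hypothetical.

```python
import csv
from typing import Iterator, TextIO, Tuple

# Assumption: placeholder column names; match them to your export's header row.
POST_URL_COLUMN = "postUrl"
IMAGE_URL_COLUMN = "imgUrl"


def image_urls(csv_file: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (post_url, image_url) pairs, skipping rows without an image URL."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        # Short rows get None for missing fields, so normalize to ""
        image = (row.get(IMAGE_URL_COLUMN) or "").strip()
        if image:
            yield (row.get(POST_URL_COLUMN) or ""), image
```

To use it, open the downloaded export with `newline=""` (as the `csv` docs recommend) and iterate over `image_urls(f)`; the file path is wherever you saved the download.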

Microsoft Azure Computer Vision

Code to run: instagram_ocr_azure.ipynb
- This requires an Azure account. The Computer Vision endpoint has a free tier limited to 20 images per minute.
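At 20 images per minute, a batch loop should space its requests about 3 seconds apart (60 / 20). A minimal sketch of that throttling, assuming one API call per image; `throttled_map` is a hypothetical helper, and its `sleep`/`clock` parameters exist only so the timing can be tested without actually waiting. If your OCR call makes more than one request per image (for example, polling an asynchronous operation), lower `calls_per_minute` accordingly.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(func: Callable[[T], R], items: Iterable[T],
                  calls_per_minute: int = 20,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> List[R]:
    """Apply func to each item, starting calls at least 60/calls_per_minute s apart."""
    interval = 60.0 / calls_per_minute  # 20 calls/min -> one call every 3 s
    results = []
    last_start = None
    for item in items:
        if last_start is not None:
            # Only sleep for whatever part of the interval has not yet elapsed
            wait = interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(func(item))
    return results
```

Measuring from the start of the previous call means slow OCR requests eat into the wait instead of adding to it.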

Create your Computer Vision endpoint

More information here for getting started: https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/

Select the free pricing tier (F0) when creating the resource!

Set your `configs/config.ini` for Azure Computer Vision endpoint

Navigate to your Computer Vision resource in the Azure portal (https://portal.azure.com/) and copy a key and the endpoint from its Keys and Endpoint page.

For example (you can use key1 or key2):

```ini
[ASC-COMPUTERVISION]
key1 = 7cffe7....
endpoint = https://asc-computervision.cognitiveservices.azure.com/
region = eastus
```
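A sketch of how the config above might be read and used, under stated assumptions and not taken from the notebook: it parses the section with Python's `configparser`, then calls the Computer Vision Read API (v3.2) over plain HTTPS with `urllib`, posting an image URL and polling the returned `Operation-Location` URL until the status is `succeeded`. The helper names `azure_settings`, `lines_from_result`, and `read_image_url` are hypothetical, and the notebook may use the Azure SDK or a different API version instead.

```python
import configparser
import json
import time
import urllib.request
from typing import List, Tuple

# Assumption: Read API v3.2 path; the notebook may use another version or the SDK.
READ_PATH = "/vision/v3.2/read/analyze"


def azure_settings(config: configparser.ConfigParser,
                   section: str = "ASC-COMPUTERVISION") -> Tuple[str, str]:
    """Return (key, endpoint) from the parsed config; key1 is preferred over key2."""
    values = config[section]
    key = values.get("key1") or values.get("key2")
    if not key:
        raise KeyError(f"no key1/key2 in [{section}]")
    return key, values["endpoint"].rstrip("/")


def lines_from_result(result: dict) -> List[str]:
    """Flatten a succeeded Read result into its lines of text, page by page."""
    return [line["text"]
            for page in result["analyzeResult"]["readResults"]
            for line in page["lines"]]


def read_image_url(image_url: str, key: str, endpoint: str,
                   poll_seconds: float = 1.0, max_polls: int = 30) -> List[str]:
    """Submit a public image URL to the Read API and poll until the text is ready."""
    submit = urllib.request.Request(
        endpoint + READ_PATH,
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/json"},
        method="POST",
    )
    # The service answers 202 Accepted with an Operation-Location header to poll
    with urllib.request.urlopen(submit) as response:
        operation_url = response.headers["Operation-Location"]

    for _ in range(max_polls):
        time.sleep(poll_seconds)
        poll = urllib.request.Request(
            operation_url, headers={"Ocp-Apim-Subscription-Key": key})
        with urllib.request.urlopen(poll) as response:
            result = json.load(response)
        status = result.get("status")
        if status == "succeeded":
            return lines_from_result(result)
        if status == "failed":
            raise RuntimeError(f"Read operation failed for {image_url}")
    raise TimeoutError(f"Read operation still running after {max_polls} polls")


# Example usage (makes real network calls, so it is commented out):
# config = configparser.ConfigParser()
# config.read("configs/config.ini")
# key, endpoint = azure_settings(config)
# print(read_image_url("https://example.com/post.jpg", key, endpoint))
```

Only the key and endpoint are needed for these calls; the `region` entry is not used by this sketch.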

Repository: atnjqt/ig_pennstories_ocr

Folders and files

Latest commit

History

Repository files navigation

OCR Text Extraction for Instagram images

UPDATE 07/01/2021 --> Local Screenshot image OCR

Please review screenshot-transcribe-ocr.ipynb, which is a similar but different project to the Instagram image analysis described below

Getting Started

Get Instagram timeline posts w/ PhantomBuster

Microsoft Azure Computer Vision

Create your Computer Vision endpoint

Set your configs/config.ini for Azure Computer Vision endpoint

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Set your `configs/config.ini` for Azure Computer Vision endpoint

Packages