This project scrapes public domain images from the rawpixel website.
- Install Chrome Web Browser.
- Install Chrome WebDriver, making sure its version is compatible with your Chrome Web Browser version.
- Install Python 3.7.
- The `requirements.txt` file lists the Selenium and tqdm libraries.
- Install the requirements with `pip install -r requirements.txt`. It's good practice to install dependencies in an isolated Python virtual environment.
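Before running the scripts, you can check that your environment matches the list above. The snippet below is a minimal sketch and not part of this project: the helper names `missing_requirements` and `python_version_ok` are hypothetical, and the check only confirms that `selenium` and `tqdm` are importable and that the interpreter is at least Python 3.7.

```python
import importlib.util
import sys
from typing import Iterable, List


# Hypothetical helper, not part of this project's code.
def missing_requirements(modules: Iterable[str] = ("selenium", "tqdm")) -> List[str]:
    """Return the names of top-level modules that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_version_ok(minimum=(3, 7)) -> bool:
    """True if the running interpreter is at least the given (major, minor) version."""
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    if not python_version_ok():
        print("Python 3.7 or newer is required.")
    missing = missing_requirements()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Run: pip install -r requirements.txt")
    else:
        print("All required packages are installed.")
```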
To use this project, you need a rawpixel account; create one on the rawpixel website if you don't have one yet.
- Open Chrome Web Browser in debugging mode:
  - Navigate to the folder where your Chrome Web Browser application (`chrome.exe`) is installed, and copy that folder's path.
  - Add the path to the `PATH` system environment variable. Make sure not to include `chrome.exe` itself in the path.
  - Create a new directory for the browser to use as its user data directory. This avoids conflicts with the profile of your already installed Chrome Web Browser.
  - Open a command prompt (`cmd`) and enter the following command, which launches a Chrome Web Browser window in debugging mode:
    `chrome.exe --remote-debugging-port=9014 --user-data-dir="<absolute path to the created directory>"`
  - In the opened window (tab), navigate to the rawpixel website and log in with your account credentials, then go to a public domain image album of your choice.

  NOTE: This feature works only on Chrome Web Browser version 63 or later.
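If you prefer to start the debugging browser from Python rather than typing the command by hand, the following is a minimal sketch. The function names are hypothetical, and it assumes `chrome.exe` is reachable through `PATH` as set up above. Because the arguments are passed as a list, no shell quoting is needed around the user data directory.

```python
import subprocess
from typing import List


def chrome_debug_command(user_data_dir: str, port: int = 9014,
                         chrome: str = "chrome.exe") -> List[str]:
    """Build the argument list that starts Chrome with remote debugging enabled."""
    return [
        chrome,
        f"--remote-debugging-port={port}",
        # No surrounding quotes: each list item is passed as a single argument.
        f"--user-data-dir={user_data_dir}",
    ]


def launch_chrome(user_data_dir: str, port: int = 9014,
                  chrome: str = "chrome.exe") -> subprocess.Popen:
    """Start Chrome in the background and return the process handle."""
    return subprocess.Popen(chrome_debug_command(user_data_dir, port, chrome))
```

For example, `launch_chrome(r"C:\chrome-debug-profile")` opens the debugging window; the directory shown is only a placeholder for the one you created.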
- Next, run the `get_session_cookies.py` Python script to save your session cookies:
  `python code/get_session_cookies.py --webdriver="<absolute path to chrome webdriver>"`
  The script saves the session cookies in the `cookies.pkl` file.
- Finally, run the `img_scraper_rawpixel.py` Python script to download images into your specified directory:
  `python code/img_scraper_rawpixel.py --webdriver="<absolute path to chrome webdriver>" --output_dir="<absolute path to output directory>" --url="<url of your choice>"`

  Arguments:
  - `--webdriver`: absolute path to the Chrome WebDriver executable (`chromedriver.exe`).
  - `--output_dir`: directory where the downloaded images are saved.
  - `--url`: URL of a public domain image collection on the rawpixel website (e.g. https://www.rawpixel.com/board/574376/les-roses-pierre-joseph-redoute-free-cc0-roses-illustrations?sort=curated&mode=shop&page=1).
K Tonpa.