GitHub - jules-sommer/go-drugsdata-scraper: Golang web scraper laser focused on scraping DrugBank chemical information, extendible, hackable - totally concurrent, support for proxies for rapid and undeterred scraping of mass datasets, handles many datatypes and structures the scraped data in an intelligent and dynamic way.

Go DrugBank Scraper

Description

The Go DrugBank Scraper is a sophisticated ( ehm... messy but functional ) web scraping tool written in Go, targeting the DrugBank database to extract comprehensive drug information. It utilizes Go's concurrency features for efficient data retrieval and is equipped with functionalities to handle different modes of data scraping. it's also super hackable, this makes it a great tool for learning how to scrape data from the web using Go - as I did with it!! It was certainly a fun project and I hope you get something out of it too! Feel free to extend, PR, feature request, repurpose it entirely, or whatever you want to do with it. I'm open to any suggestions and contributions!

Features

Concurrent Page Processing: Employs Go's concurrency for efficient data scraping across multiple pages.
Customizable Querying: Allows specification of page ranges or individual drug IDs for scraping.
Resilient and Intelligent: Implements delay strategies to manage rate limiting and ensure uninterrupted scraping.
Detailed Data Extraction: Gathers extensive information about drugs, including molecular details, pharmacodynamics, interactions, and more.
Output Serialization: Provides scraped data in structured JSON format, ready for various applications.
Easy to Use: Requires only a few command-line arguments to execute.
Extensible & Modular: Easily extendable and modular for future updates. Example, handler functions for new data fields can be added to the `main.go` file, they are called automatically by the scraper depending on the data field being scraped.

Installation

Clone the Repository:
```
gh repo clone jules-sommer/go-drugsdata-scraper
```
or
```
git clone https://github.com/jules-sommer/go-drugsdata-scraper
```
- I think the first is generally preferable on Unix machines, as it is on my copy of Ubuntu, and the later on Windows; but I don't rightly know why there are two commands and CLIs for git.
Install Dependencies: Navigate to the cloned directory and install necessary Go modules:
```
cd go-drugbank-scraper
go mod tidy
```

Usage

Execute the scraper with command-line arguments to define the scraping mode:

Scraping by Page Range:

go run main.go <MODE={ID,numPages}> <number_of_pages>

Modes

Page Range: [0-508]: As of writing, the maximum number of pages available for scraping is 508. There's a to-do in place to automatically detect the number of available pages in the future.
Single ID: use ID as the value for MODE cli arg, then enter the prompted value at runtime. Input the DrugBank ID as a 4-digit integer (e.g., 0123). The prefix "DB0" will be automatically added to the ID. As of now, specifying the prefix manually is not supported. Future updates will include features to fuzz the IDs and thus the pages to scrape.

[NOTE]: Currently the program prompts at runtime for the actual value of the ID or numPages args, this will be supported both ways in future.

Contributing

I welcome contributions! For guidelines on how to contribute, please read our CONTRIBUTING.md.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

DrugBank for providing extensive drug information.
The Go community for their invaluable resources and continuous support.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
logs		logs
results		results
README.md		README.md
drugbank.json		drugbank.json
go.mod		go.mod
go.sum		go.sum
main.go		main.go
random.py		random.py
runone.json		runone.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Go DrugBank Scraper

Description

Features

Installation

Usage

Modes

Contributing

License

Acknowledgments

About

Releases

Packages

Languages

jules-sommer/go-drugsdata-scraper

Folders and files

Latest commit

History

Repository files navigation

Go DrugBank Scraper

Description

Features

Installation

Usage

Modes

Contributing

License

Acknowledgments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages