DigiWebScraper is a tool built for research in Natural Language Processing (NLP). It scrapes the Digikala website, a popular online marketplace.
The scraper rotates proxies and engines and inserts random delays between requests. Both measures reduce the chance that the target website identifies it as a bot.
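The following is a minimal sketch of proxy rotation with random delays, not the scraper's actual code. It assumes that "engines" refers to the browser User-Agent strings sent with each request. The proxy addresses, User-Agent strings, delay bounds, and the function names `random_delay`, `make_opener`, and `fetch_all` are all hypothetical.

```python
import itertools
import random
import time
import urllib.request

# Placeholder values for illustration only; none come from DigiWebScraper.
PROXIES = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/2.0",
]


def random_delay(min_s=1.0, max_s=4.0):
    """Return a random pause length in seconds, between min_s and max_s."""
    return random.uniform(min_s, max_s)


def make_opener(proxy, user_agent):
    """Build a urllib opener that routes through one proxy with one User-Agent."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch_all(urls, proxies=PROXIES, user_agents=USER_AGENTS, sleep=time.sleep):
    """Fetch each URL with the next proxy and User-Agent, pausing in between."""
    proxy_cycle = itertools.cycle(proxies)
    agent_cycle = itertools.cycle(user_agents)
    pages = []
    for url in urls:
        opener = make_opener(next(proxy_cycle), next(agent_cycle))
        with opener.open(url, timeout=10) as response:
            pages.append(response.read())
        sleep(random_delay())  # random lag so requests are not evenly spaced
    return pages
```

Passing `sleep` as a parameter keeps the pause replaceable, so the rotation logic can be exercised without waiting.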
Starting from the Digikala homepage, DigiWebScraper navigates to each individual product page and collects the product’s name, its star rating (a measure of customer satisfaction), and its price.
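One way to hold these fields is a small record type, sketched below. The `Product` class, the `parse_price` helper, and the `to_json_line` helper are hypothetical names, not part of DigiWebScraper. The sketch assumes that prices are whole numbers and that the scraped price text may contain thousands separators, currency labels, or non-ASCII decimal digits.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class Product:
    """Hypothetical record for the fields collected from one product page."""
    name: str
    rating: float  # star rating
    price: int     # assumed to be a whole number of currency units


def parse_price(text):
    """Keep only decimal digits (ASCII or not) and convert them to an int."""
    digits = "".join(ch for ch in text if ch.isdecimal())
    if not digits:
        raise ValueError(f"no digits found in price text: {text!r}")
    return int(digits)


def to_json_line(product):
    """Serialize one product as a single JSON line, keeping non-ASCII text."""
    return json.dumps(asdict(product), ensure_ascii=False)
```

Writing one JSON object per line is a common choice for text corpora because records can be appended and read back one at a time.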
The scraper also retrieves descriptions, which detail the product’s features and specifications. A distinguishing feature is its ability to gather comments: it mimics user behavior in the browser and makes AJAX calls to load the user feedback posted on each product.
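Collecting comments through AJAX calls can be sketched as a paginated loop. This is an illustration under stated assumptions, not the scraper's actual method. The endpoint URL is a placeholder. The response shape (a JSON object with `comments` and `has_next` keys) and the function names `fetch_page_http` and `collect_comments` are hypothetical.

```python
import json
import urllib.request

# Placeholder endpoint; the real Digikala comment URL is not given here.
COMMENTS_URL = "https://example.invalid/products/{product_id}/comments?page={page}"


def fetch_page_http(product_id, page):
    """Request one page of comments the way a browser's AJAX call might."""
    url = COMMENTS_URL.format(product_id=product_id, page=page)
    request = urllib.request.Request(
        url,
        headers={
            "X-Requested-With": "XMLHttpRequest",  # conventional AJAX marker
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def collect_comments(product_id, fetch_page=fetch_page_http, max_pages=50):
    """Gather comments page by page until the assumed 'has_next' flag is false."""
    comments = []
    for page in range(1, max_pages + 1):
        data = fetch_page(product_id, page)
        comments.extend(data.get("comments", []))
        if not data.get("has_next"):
            break
    return comments
```

The `max_pages` cap bounds the loop in case the response never signals a last page.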
Together, these fields give a detailed view of each product, which makes the collected data useful for NLP research.
Please note that DigiWebScraper was written some time ago and may no longer be operational because of platform and API changes on the Digikala website.
For those interested in NLP research, the tool offers data from a major online marketplace, subject to the limitations noted above and to any changes in the target website’s structure or policies.