Datasource: website grabber

Rello edited this page Sep 8, 2022 · 4 revisions

With the HTML Regex datasource, you can extract data from external web pages.

Use cases

grab weather data
grab data from Amazon (see example)

Conditions

the website must not require a login or use any browser detection logic
the regex is executed with preg_match_all in PHP
the pattern must be prototyped externally; DA only executes the pattern it is given
use sites like https://regex101.com to develop the regex
the regex must include 2 named groups: (?<value>xxx) and (?<dimension>xxx)
the datasource is only available in the advanced config with scheduling, due to the required parameters
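
To prototype a pattern locally before pasting it into the datasource, something like the following Python sketch may help. It is not the datasource's own code: the helper names, the weather-style HTML, and the pattern are all made up for illustration. Python writes named groups as (?P<name>...), so the helper rewrites the PHP-style (?<name>...) spelling first; any flags after the closing delimiter are ignored.

```python
import re


def pcre_to_python(pattern: str) -> str:
    """Drop the '/' delimiters and rewrite (?<name>...) as (?P<name>...)."""
    # Everything between the first character and the last '/' is the body;
    # flags after the closing delimiter are ignored in this sketch.
    body = pattern[1:pattern.rindex("/")]
    # Lookbehinds, (?<= and (?<!, are left untouched.
    return re.sub(r"\(\?<(?![=!])", "(?P<", body)


def extract_rows(pattern: str, html: str) -> list:
    """Return one (dimension, value) pair per match, like a match-all call."""
    regex = re.compile(pcre_to_python(pattern))
    missing = {"value", "dimension"} - set(regex.groupindex)
    if missing:
        raise ValueError(f"pattern lacks named group(s): {sorted(missing)}")
    return [(m.group("dimension"), m.group("value")) for m in regex.finditer(html)]


# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<tr><td class="city">Berlin</td><td class="temp">21.5</td></tr>
<tr><td class="city">Hamburg</td><td class="temp">18.0</td></tr>
"""

# Hypothetical pattern in the PHP-style notation used on this page.
PATTERN = r'/<td class="city">(?<dimension>[^<]+)<\/td><td class="temp">(?<value>[\d.]+)/'

rows = extract_rows(PATTERN, SAMPLE_HTML)
print(rows)  # [('Berlin', '21.5'), ('Hamburg', '18.0')]
```

Each match becomes one dimension/value pair, which is why both named groups are required. PCRE and Python differ in some details, so a final check on regex101 with the PCRE flavor is still advisable.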

Usage

Report -> advanced config
add "HTML Regex" Dataload
maintain the parameters
run a test until the required data arrives

Example

Grab the sales rank of a book from Amazon
This will extract the number from the section "Nr. 1 in Vietnamesisch lernen (Bücher)"

url: https://www.amazon.de/dp/3964433578

regex: /("> Nr. )(?<value>.*)( in)(.*)(hrsr_books'>)(?<dimension>Vietnamesisch lernen)/
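
As a rough check of what this pattern captures, the Python sketch below applies it to an invented HTML fragment. The fragment is an assumption made for illustration and is not the actual Amazon markup. The pattern is the one above with the delimiters removed and the named groups written in Python's (?P<name>...) form.

```python
import re

# The pattern from this page, without '/' delimiters and with Python-style
# named groups; everything else is unchanged.
PATTERN = re.compile(
    r'''("> Nr. )(?P<value>.*)( in)(.*)(hrsr_books'>)(?P<dimension>Vietnamesisch lernen)'''
)

# Invented fragment for illustration only; the real page markup may differ.
SAMPLE_HTML = (
    '''<span class="rank"> Nr. 1 in '''
    '''<a href='/example/ref=zg_hrsr_books'>Vietnamesisch lernen</a> (Bücher)</span>'''
)

# Collect one row per match, keeping only the two named groups.
rows = [
    {"dimension": m.group("dimension"), "value": m.group("value")}
    for m in PATTERN.finditer(SAMPLE_HTML)
]
print(rows)  # [{'dimension': 'Vietnamesisch lernen', 'value': '1'}]
```

Because (?<value>.*) is greedy, it can capture more than the number if the text " in" appears again later on the same line of the real page. A narrower class such as [\d.]+ for the value group is one possible way to tighten it.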