Datasource: website grabber
Rello edited this page Sep 8, 2022 · 4 revisions
With the HTML Regex datasource, you can extract data from external web pages, for example to:
- grab weather data
- grab data from Amazon (see example)
- the website must not require a login or use any kind of browser detection
- the regex is executed with PHP's `preg_match_all`. DA only runs the pattern exactly as given, so it must be prototyped externally
- use a site such as https://regex101.com (PCRE flavor) to develop the regex
- the regex must contain two named groups, `(?<value>xxx)` and `(?<dimension>xxx)`: `value` captures the number, `dimension` its label
- because of its required parameters, the datasource is only available in the advanced configuration (with scheduling):
  - Report -> advanced config
  - add an "HTML Regex" dataload
  - fill in the parameters
  - test-run the dataload until it returns the expected values
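Because DA only executes the finished pattern, it can help to try it locally first. Below is a minimal Python sketch of the extraction step, assuming each match becomes one (dimension, value) pair; how the datasource actually stores the matches is not described on this page. The function name `extract_rows` and the weather-style HTML are hypothetical. Python's `re` module writes named groups as `(?P<name>...)`, a spelling PCRE accepts as well.

```python
import re


def extract_rows(pattern: str, html: str) -> list[tuple[str, str]]:
    """Collect (dimension, value) pairs from every match, in page order.

    Loosely mirrors running preg_match_all over the page; pairing the
    groups into rows is an assumption made for this sketch.
    """
    regex = re.compile(pattern)
    # Both named groups are required by the datasource; fail early otherwise.
    missing = {"value", "dimension"} - set(regex.groupindex)
    if missing:
        raise ValueError(f"pattern lacks named group(s): {sorted(missing)}")
    return [(m.group("dimension"), m.group("value")) for m in regex.finditer(html)]


# Hypothetical weather-page fragment, used only to exercise the pattern.
html = """
<td class="city">Berlin</td><td class="temp">21</td>
<td class="city">Hamburg</td><td class="temp">18</td>
"""
pattern = r'city">(?P<dimension>[^<]+)</td><td class="temp">(?P<value>[^<]+)<'

print(extract_rows(pattern, html))  # → [('Berlin', '21'), ('Hamburg', '18')]
```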
Grab the sales rank of a book from Amazon
This extracts the rank number from the section "Nr. 1 in Vietnamesisch lernen (Bücher)" ("No. 1 in Learning Vietnamese (Books)").
url: https://www.amazon.de/dp/3964433578
regex: `/("> Nr. )(?<value>.*)( in)(.*)(hrsr_books'>)(?<dimension>Vietnamesisch lernen)/`
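To check this pattern before adding it to a dataload, it can be run against a saved copy of the page. The sketch below uses a made-up HTML fragment shaped to match the pattern (the real Amazon markup is not reproduced here and may have changed). The PHP delimiters `/.../` are dropped and the named groups are rewritten in Python's `(?P<name>...)` form.

```python
import re

# The Amazon pattern from above, with the PHP delimiters /.../ removed and the
# named groups written in Python's (?P<name>...) form.
pattern = re.compile(
    r"(\"> Nr. )(?P<value>.*)( in)(.*)(hrsr_books'>)(?P<dimension>Vietnamesisch lernen)"
)

# Hypothetical fragment shaped like the rank section; real Amazon markup may differ.
html = (
    '<span class="rank"> Nr. 1 in '
    "<a href='/gp/bestsellers/books/ref=pd_zg_hrsr_books'>Vietnamesisch lernen</a></span>"
)

rows = [(m.group("dimension"), m.group("value")) for m in pattern.finditer(html)]
print(rows)  # → [('Vietnamesisch lernen', '1')]
```

Since `.` does not match line breaks by default in either PCRE or Python, the greedy `.*` groups can only span a single line. If that line is long, a narrower value group such as `[\d.]+` (German pages use `.` as the thousands separator) is less likely to capture surrounding text.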