
Datasource: website grabber

Rello edited this page Sep 8, 2022 · 4 revisions

With the HTML Regex datasource, you can extract data from external web pages.

Use cases

  • grab weather data
  • grab data from Amazon (see example)

Conditions

  • the website must not require a login or use browser detection logic
  • the regex is executed with preg_match_all in PHP
  • the pattern must be prototyped externally. DA only executes the pattern it is given
  • use sites like https://regex101.com to develop the regex
  • the regex must include 2 named groups: (?<value>xxx) and (?<dimension>xxx)
  • the datasource is only available in the advanced config with scheduling, due to the required parameters
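
To prototype a pattern locally, the two required groups can be tried out with a small script. This is a minimal sketch in Python, not the code DA runs: DA uses preg_match_all in PHP, and Python's re module spells named groups (?P<name>...) instead of (?<name>...), so the sketch converts the two group names. The helper names, the sample HTML and the sample pattern are invented for illustration.

```python
import re


def to_python_pattern(pcre_pattern):
    """Strip the PHP "/" delimiters and switch to Python's named-group spelling."""
    body = pcre_pattern.strip()
    if len(body) >= 2 and body.startswith("/") and body.endswith("/"):
        body = body[1:-1]
    # Only the two group names required by the datasource are converted.
    return (
        body.replace("(?<value>", "(?P<value>")
        .replace("(?<dimension>", "(?P<dimension>")
    )


def extract_pairs(html, pcre_pattern):
    """Return one (dimension, value) pair per match, similar in spirit to preg_match_all."""
    regex = re.compile(to_python_pattern(pcre_pattern))
    return [(m.group("dimension"), m.group("value")) for m in regex.finditer(html)]


# Hypothetical weather page fragment, invented for this sketch.
sample_html = (
    '<tr><td class="city">Berlin</td><td class="temp">21</td></tr>'
    '<tr><td class="city">Hamburg</td><td class="temp">18</td></tr>'
)

# Hypothetical pattern; "/" inside the pattern is escaped because "/" is the delimiter.
pattern = r'/class="city">(?<dimension>[^<]+)<\/td><td class="temp">(?<value>[^<]+)<\/td>/'

pairs = extract_pairs(sample_html, pattern)
print(pairs)
```

Each match yields one dimension and one value. A pattern that works here should still be checked against the PHP (PCRE) flavor, for example on regex101.com, before it is entered in DA.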

Usage

  1. Report -> advanced config
  2. add the "HTML Regex" dataload
  3. maintain the parameters
  4. run a test until the required values are returned

Example

Grab the sales rank of a book from Amazon.
This extracts the number from the section "Nr. 1 in Vietnamesisch lernen (Bücher)".

url: https://www.amazon.de/dp/3964433578

regex: /("> Nr. )(?<value>.*)( in)(.*)(hrsr_books'>)(?<dimension>Vietnamesisch lernen)/
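
The behaviour of this regex can be checked offline before it is scheduled. The following is a sketch in Python under stated assumptions: the HTML fragment is invented and real Amazon markup may differ, the PHP "/" delimiters are dropped, and the named groups use Python's (?P<name>...) spelling.

```python
import re

# The regex from the example, without the PHP "/" delimiters and with
# Python's (?P<name>...) spelling for the two named groups.
pattern = re.compile(
    r"""("> Nr. )(?P<value>.*)( in)(.*)(hrsr_books'>)(?P<dimension>Vietnamesisch lernen)"""
)

# Hypothetical one-line fragment, invented for this sketch.
fragment = (
    '<span class="rank"> Nr. 1 in '
    "<a href='/gp/bestsellers/books/ref=pd_zg_hrsr_books'>Vietnamesisch lernen</a>"
    " (Bücher)</span>"
)

match = pattern.search(fragment)
rank = match.group("value") if match else None
category = match.group("dimension") if match else None
print(category, rank)
```

Note that .* is greedy and does not cross line breaks by default. If a page lists several ranks on one line, a narrower expression for the value group, such as \d+, may be more robust.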