read_wfs(): enable to work with all 272 WFS datasets on FIS-Broker #7

Open
mrustl opened this issue Sep 26, 2022 · 4 comments

mrustl commented Sep 26, 2022

# Enable repository from kwb-r
options(repos = c(
  kwbr = 'https://kwb-r.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'))
# Download and install kwb.fisbroker in R
install.packages('kwb.fisbroker')

# Load the pipe operator (%>%) used below
library(magrittr)

get_id <- function(data) {
  data %>% 
    dplyr::filter(.data$parameter %in% c("Rechneradresse",  "ATOM-Feed-Url")) %>% 
    dplyr::mutate(id = basename(.data$value))
}

get_metadata_from_ghpages <- function() {
  paths_list <- list(
    base_url = "https://kwb-r.github.io/kwb.fisbroker/metadata_",
    file_format = ".json",
    atom = "<base_url>atom<file_format>",
    wfs =  "<base_url>wfs<file_format>",
    wms =  "<base_url>wms<file_format>"
  )

  paths <- kwb.utils::resolve(paths_list)

  meta_paths <- paths[names(paths) %in% c("atom", "wfs", "wms")]

  stats::setNames(lapply(meta_paths, function(path) {
    jsonlite::read_json(path, simplifyVector = TRUE) %>%
      get_id()
  }), names(meta_paths))
}


meta <- get_metadata_from_ghpages()

data_wfs <- stats::setNames(lapply(meta$wfs$id, function(id) kwb.fisbroker::read_wfs(dataset_id = id)),
                meta$wfs$identifier)
@mrustl mrustl added bug Something isn't working question Further information is requested labels Sep 26, 2022
@mrustl mrustl self-assigned this Sep 26, 2022

mrustl commented Sep 27, 2022

For example, the import fails with: read_xml.raw(x, encoding = encoding, ...): internal error: Huge input lookup [1]. Setting read_xml(path, options = "HUGE"), as described here (davidgohel/ggiraph#87 (comment)), might help.

#install.packages("kwb.fisbroker")
library(kwb.fisbroker)
get_id <- function(data) {
  data %>%
    dplyr::filter(.data$parameter %in% c("Rechneradresse",
                                         "ATOM-Feed-Url")) %>%
    dplyr::mutate(id = basename(.data$value))
}


get_metadata_from_ghpages <- function() {
  paths_list <- list(
    base_url = "https://kwb-r.github.io/kwb.fisbroker/metadata_",
    file_format = ".json",
    atom = "<base_url>atom<file_format>",
    wfs =  "<base_url>wfs<file_format>",
    wms =  "<base_url>wms<file_format>"
  )
  
  paths <- kwb.utils::resolve(paths_list)
  
  
  meta_paths <- paths[names(paths) %in% c("atom", "wfs", "wms")]
  
  
  stats::setNames(lapply(meta_paths, function(path) {
    jsonlite::read_json(path, simplifyVector = TRUE) %>%
      get_id()
  }), names(meta_paths))
  
}


meta <- get_metadata_from_ghpages()

overview <- kwb.fisbroker::get_dataset_overview()
#> Login to FIS-Broker ... ok. (1.98s) 
#> Getting HTML text from 'https://fbinter.stadt-be...d=navigationFrameResult' ... ok. (0.69s)

### There are more "overview" identifiers than "ids" (i.e. WFS files). Thus multiple
### identifiers use the same WFS
meta_wfs <- meta$wfs %>%
  dplyr::left_join(overview) %>%
  dplyr::filter(category_name %in% c("Umweltbeobachtung", "Umweltschutz"))
#> Joining, by = "identifier"
nrow(meta_wfs)
#> [1] 214

ids <- unique(meta_wfs$id)[order(unique(meta_wfs$id))]
length(ids)
#> [1] 76

data_wfs <- stats::setNames(lapply(ids, function(id) {
  kwb.fisbroker::read_wfs(dataset_id = id)
}),
ids)
#> Importing WFS dataset_id 'co2veg_block' from FIS-Broker ... ok. (13.44s) 
#> Importing WFS dataset_id 'co2veg_strasse' from FIS-Broker ... ok. (4.18s) 
#> Importing WFS dataset_id 's_02_20_zeMHGW_li_2016' from FIS-Broker ... ok. (2.46s) 
#> Importing WFS dataset_id 's_04_02_1ltempmittel_bl_8110' from FIS-Broker ... ok. (13.28s) 
#> Importing WFS dataset_id 's_04_02_1ltempmittel_fl_8110' from FIS-Broker ... ok. (1.33s) 
#> Importing WFS dataset_id 's_04_02_1ltempmittel_str_8110' from FIS-Broker ...
#> Error in read_xml.raw(x, encoding = encoding, ...): internal error: Huge input lookup [1]

Created on 2022-09-27 by the reprex package (v2.0.1)
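
The workaround suggested above can be sketched as follows. This is only a minimal illustration, assuming the failure is caused by the default size limits of the XML parser. read_xml_huge() is a hypothetical helper that is not part of kwb.fisbroker, and the small raw vector merely stands in for a large WFS response. Whether read_wfs() can pass this option through has not been tested here.

```r
# Hypothetical helper (not part of kwb.fisbroker): parse XML with the
# parser's hard-coded size limits relaxed via the "HUGE" option of xml2.
read_xml_huge <- function(x, ...) {
  # "NOBLANKS" is the xml2 default; "HUGE" is added on top of it
  xml2::read_xml(x, options = c("NOBLANKS", "HUGE"), ...)
}

# Small stand-in for a WFS response body (raw vector, as in the error above)
body <- charToRaw("<root><member id='1'/><member id='2'/></root>")
doc <- read_xml_huge(body)

# Number of <member> elements found in the parsed document
n_members <- length(xml2::xml_find_all(doc, "//member"))
```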

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23 ucrt)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  German_Germany.utf8
#>  ctype    German_Germany.utf8
#>  tz       Europe/Berlin
#>  date     2022-09-27
#>  pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version    date (UTC) lib source
#>  assertthat      0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  class           7.3-20     2022-01-16 [2] CRAN (R 4.2.1)
#>  classInt        0.4-7      2022-06-10 [1] CRAN (R 4.2.0)
#>  cli             3.3.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  curl            4.3.2      2021-06-23 [1] CRAN (R 4.2.0)
#>  DBI             1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  digest          0.6.29     2021-12-01 [1] CRAN (R 4.2.0)
#>  dplyr           1.0.10     2022-09-01 [1] CRAN (R 4.2.1)
#>  e1071           1.7-11     2022-06-07 [1] CRAN (R 4.2.0)
#>  ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate        0.15       2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi           1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap         1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  fs              1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
#>  generics        0.1.3      2022-07-05 [1] CRAN (R 4.2.0)
#>  glue            1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  highr           0.9        2021-04-16 [1] CRAN (R 4.2.0)
#>  hms             1.1.2      2022-08-19 [1] CRAN (R 4.2.1)
#>  htmltools       0.5.2      2021-08-25 [1] CRAN (R 4.2.0)
#>  httr            1.4.4      2022-08-17 [1] CRAN (R 4.2.1)
#>  jsonlite        1.8.0      2022-02-22 [1] CRAN (R 4.2.0)
#>  KernSmooth      2.23-20    2021-05-03 [2] CRAN (R 4.2.1)
#>  knitr           1.39       2022-04-26 [1] CRAN (R 4.2.0)
#>  kwb.fisbroker * 0.0.0.9000 2022-09-26 [1] https://kwb-r.r-universe.dev (R 4.2.1)
#>  kwb.utils       0.13.0     2022-08-23 [1] Github (KWB-R/kwb.utils@2a6faaa)
#>  lifecycle       1.0.2      2022-09-09 [1] CRAN (R 4.2.1)
#>  magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  pillar          1.8.1      2022-08-19 [1] CRAN (R 4.2.1)
#>  pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  proxy           0.4-27     2022-06-09 [1] CRAN (R 4.2.0)
#>  purrr           0.3.4      2020-04-17 [1] CRAN (R 4.2.0)
#>  R6              2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp            1.0.9      2022-07-08 [1] CRAN (R 4.2.1)
#>  readr           2.1.2      2022-01-30 [1] CRAN (R 4.2.0)
#>  reprex          2.0.1      2021-08-05 [1] CRAN (R 4.2.1)
#>  rlang           1.0.5      2022-08-31 [1] CRAN (R 4.2.1)
#>  rmarkdown       2.14       2022-04-25 [1] CRAN (R 4.2.0)
#>  rstudioapi      0.13       2020-11-12 [1] CRAN (R 4.2.0)
#>  rvest           1.0.3      2022-08-19 [1] CRAN (R 4.2.1)
#>  selectr         0.4-2      2019-11-20 [1] CRAN (R 4.2.0)
#>  sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.2.1)
#>  sf              1.0-8      2022-07-14 [1] CRAN (R 4.2.1)
#>  stringi         1.7.6      2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr         1.4.1      2022-08-20 [1] CRAN (R 4.2.1)
#>  tibble          3.1.7      2022-05-03 [1] CRAN (R 4.2.0)
#>  tidyr           1.2.1      2022-09-08 [1] CRAN (R 4.2.1)
#>  tidyselect      1.1.2      2022-02-21 [1] CRAN (R 4.2.0)
#>  tzdb            0.3.0      2022-03-28 [1] CRAN (R 4.2.0)
#>  units           0.8-0      2022-02-05 [1] CRAN (R 4.2.0)
#>  utf8            1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs           0.4.1      2022-04-13 [1] CRAN (R 4.2.0)
#>  withr           2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun            0.31       2022-05-10 [1] CRAN (R 4.2.0)
#>  xml2            1.3.3      2021-11-30 [1] CRAN (R 4.2.0)
#>  yaml            2.3.5      2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] C:/Users/mrustl/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

mrustl added a commit that referenced this issue Sep 27, 2022
mrustl added a commit that referenced this issue Sep 27, 2022
added (as shapefile export leads to errors due to limited allowed characters as column names!)

but multi WFS import only tested for 76 ids (i.e. for categories "Umweltbeobachtung" and "Umweltschutz"), others (196/272) still on the to-do list for testing

mrustl commented Sep 27, 2022

For the remaining 197 ids the following 8 errors occurred:

meta <- get_metadata_from_ghpages()
str(meta)

overview <- kwb.fisbroker::get_dataset_overview()
str(overview)

### There are more "overview" identifiers than "ids" (i.e. WFS files). Thus multiple
### identifiers use the same WFS
### (exclude the 2 categories already tested: "Umweltbeobachtung" and "Umweltschutz")
meta_wfs <- meta$wfs %>%
  dplyr::left_join(overview) %>%
  dplyr::filter(!category_name %in% c("Umweltbeobachtung", "Umweltschutz"))
str(meta_wfs)
nrow(meta_wfs)

ids <- unique(meta_wfs$id)[order(unique(meta_wfs$id))]
length(ids)

ids <- unique(meta_wfs$id)[order(unique(meta_wfs$id))]
system.time(data_wfs <- stats::setNames(lapply(ids, function(id) {
     try(kwb.fisbroker::read_wfs(dataset_id = id))
}), ids))
Importing WFS dataset_id 're_aktive_stadt' from FIS-Broker ... Error in httr_get_or_fail(full_url) : 
  Request 'https://fbinter.stadt-berlin.de/fb/wfs/data/senstadt/re_aktive_stadt?service=WFS&version=2.0.0&request=GetFeature&typenames=fis:re_aktive_stadt&srsName=EPSG:25833' failed
Importing WFS dataset_id 're_fgeb_denkmal' from FIS-Broker ... Error in httr_get_or_fail(full_url) : 
  Request 'https://fbinter.stadt-berlin.de/fb/wfs/data/senstadt/re_fgeb_denkmal?service=WFS&version=2.0.0&request=GetFeature&typenames=fis:re_fgeb_denkmal&srsName=EPSG:25833' failed
Importing WFS dataset_id 're_friedh' from FIS-Broker ... Error in httr_get_or_fail(full_url) : 
  Request 'https://fbinter.stadt-berlin.de/fb/wfs/data/senstadt/re_friedh?service=WFS&version=2.0.0&request=GetFeature&typenames=fis:re_friedh&srsName=EPSG:25833' failed
Importing WFS dataset_id 're_gebaeudeatlas_25833' from FIS-Broker ... Error in httr_get_or_fail(full_url) : 
  Request 'https://fbinter.stadt-berlin.de/fb/wfs/data/senstadt/re_gebaeudeatlas_25833?service=WFS&version=2.0.0&request=GetFeature&typenames=fis:re_gebaeudeatlas_25833&srsName=EPSG:25833' failed
Importing WFS dataset_id 're_umweltzone2007' from FIS-Broker ... Error in httr_get_or_fail(full_url) : 
  Request 'https://fbinter.stadt-berlin.de/fb/wfs/data/senstadt/re_umweltzone2007?service=WFS&version=2.0.0&request=GetFeature&typenames=fis:re_umweltzone2007&srsName=EPSG:25833' failed
Importing WFS dataset_id 's_fussgaengernetz' from FIS-Broker ... Error in UseMethod("write_xml") : 
  no applicable method for 'write_xml' applied to an object of class "raw"
Importing WFS dataset_id 's_vms_tempolimits_spatial' from FIS-Broker ... Error in UseMethod("write_xml") : 
  no applicable method for 'write_xml' applied to an object of class "raw"
In addition: Warning message:
In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Message 1: Value '5135
' of field s_luftbild1964.bild parsed incompletely to integer 5135.
Importing WFS dataset_id 's_wfs_alkis_gebaeudeflaechen' from FIS-Broker ... Integer64 values larger than 9.0072e+15 lost significance after conversion to double;
use argument int64_as_string = TRUE to import them lossless, as character
ok. (9.24s) 
which(sapply(data_wfs, kwb.utils::isTryError))
          re_aktive_stadt           re_fgeb_denkmal                 re_friedh 
                        1                         2                         3 
   re_gebaeudeatlas_25833         re_umweltzone2007         s_fussgaengernetz 
                        4                         5                        54 
s_vms_tempolimits_spatial               s_wfs_alkis 
                      140                       147 
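
The pattern used above (wrap each import in try() and look up the failures afterwards) can be sketched without network access. import_all() and fake_reader() are hypothetical names used only for illustration; in the real run the reader would be kwb.fisbroker::read_wfs().

```r
# Hypothetical helper: run `reader` for each id, keep going on errors and
# split the outcome into successful results and error messages.
import_all <- function(ids, reader) {
  results <- stats::setNames(lapply(ids, function(id) {
    try(reader(id), silent = TRUE)
  }), ids)
  failed <- vapply(results, inherits, logical(1), what = "try-error")
  list(
    ok = results[!failed],
    errors = vapply(results[failed], function(x) {
      conditionMessage(attr(x, "condition"))
    }, character(1))
  )
}

# Stand-in reader that fails for one id (assumption, replaces the WFS request)
fake_reader <- function(id) {
  if (id == "re_friedh") stop("Request failed")
  data.frame(id = id)
}

res <- import_all(c("co2veg_block", "re_friedh"), fake_reader)
names(res$errors)
```

Keeping the error messages next to the ids makes it easier to group failures by cause, e.g. failed requests versus write_xml errors.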

@mrustl mrustl changed the title from "read_wfs() hangs for other than test dataset" to "read_wfs(): enable to work with all 272 WFS datasets on FIS-Broker" Sep 28, 2022
mrustl added a commit that referenced this issue Sep 28, 2022
to do: remove "dataset_id" as argument as it is unsecure due to different WFS endpoints, i.e. "wfs/data" and "wfs/geometry"
@mrustl mrustl added this to the v0.1.0 milestone Sep 28, 2022

mrustl commented Sep 28, 2022

With the latest modifications, read_wfs() now works for all 272 different WFS ids available on FIS-Broker.

To do: discuss with @hsonne how to add this cleanly to the R package, and whether to remove the function argument dataset_id, since passing the url is more reliable (there are different WFS endpoints, mostly wfs/data but sometimes also wfs/geometry, see the failed imports above). In addition, I am unsure whether the default EPSG:25833 should be changed to 4326 with sf::st_transform() (4326 is what Leaflet uses by default, see here: https://rstudio.github.io/leaflet/projections.html) or not.
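
A rough sketch of both open points, under stated assumptions: wfs_url() and to_wgs84() are hypothetical helpers, not functions of kwb.fisbroker. The query string is copied from the failed requests shown earlier in this issue, while the exact path of the wfs/geometry endpoint is assumed to mirror the wfs/data one.

```r
# Hypothetical helper: build the GetFeature URL for either WFS endpoint.
# The path layout of the "geometry" endpoint is an assumption.
wfs_url <- function(dataset_id, endpoint = c("data", "geometry")) {
  endpoint <- match.arg(endpoint)
  sprintf(
    paste0(
      "https://fbinter.stadt-berlin.de/fb/wfs/%s/senstadt/%s",
      "?service=WFS&version=2.0.0&request=GetFeature",
      "&typenames=fis:%s&srsName=EPSG:25833"
    ),
    endpoint, dataset_id, dataset_id
  )
}

# Hypothetical helper: reproject an sf object to WGS 84 (EPSG:4326), the
# default expected by Leaflet. Defined only, not called here.
to_wgs84 <- function(x) sf::st_transform(x, 4326)

url_data <- wfs_url("re_friedh")
url_geometry <- wfs_url("re_friedh", endpoint = "geometry")
```

Making the endpoint explicit (or accepting the full url) avoids guessing it from dataset_id alone. Keeping the reprojection in a separate step would leave the choice of coordinate reference system to the caller.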

What remains is just a warning for:

Importing WFS dataset_id 's_wfs_alkis_gebaeudeflaechen' from FIS-Broker ... Integer64 values larger than 9.0072e+15 lost significance after conversion to double;
use argument int64_as_string = TRUE to import them lossless, as character
ok. (6.37s)
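
The reason for this warning can be shown in base R: the threshold 9.0072e+15 is 2^53, the largest value up to which every integer is represented exactly as a double. The wrapper read_lossless() below is a hypothetical name; it only illustrates where the int64_as_string argument named in the warning would be passed to sf::st_read().

```r
# 2^53 = 9007199254740992 is the limit up to which doubles represent
# every integer exactly (the 9.0072e+15 named in the warning).
limit <- 2^53

# Beyond the limit, neighbouring integers collapse to the same double
collapsed <- (limit + 1) == limit

# Kept as character strings, two such ids stay distinct
distinct <- "9007199254740993" != "9007199254740992"

# Hypothetical wrapper (not part of kwb.fisbroker): read a file with sf and
# keep 64-bit integer fields as character. Defined only, not called here.
read_lossless <- function(dsn, ...) {
  sf::st_read(dsn, int64_as_string = TRUE, quiet = TRUE, ...)
}
```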

In addition, not all exports to netCDF work, but this is a separate topic/issue:

exp_ds <- data_wfs[which(!sapply(data_wfs, kwb.utils::isTryError))]

export_nc <- stats::setNames(lapply(seq_len(length(exp_ds)), function(i) { try(write_wfs_as(format = "nc", data_wfs = exp_ds[i], target_dir = "./netCDF"))}), names(exp_ds))


Exporting (1/1): 's_fussgaengernetz' to 'C:/Users/mrustl/Documents/RProjects/kwb.fisbroker/netCDF/s_fussgaengernetz.nc' ... Writing layer `s_fussgaengernetz' to data source 
  `C:/Users/mrustl/Documents/RProjects/kwb.fisbroker/netCDF/s_fussgaengernetz.nc' using driver `netCDF'
Creating or updating layer s_fussgaengernetz failed.
Error in CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  : 
  Write error.

In addition: Warning message:
In CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  :
  GDAL Error 1: An error occurred while writing metadata to the netCDF file.
Unsupported or unrecognized feature type.
