
General jsonlite compatibility issues #3

Open
2 of 6 tasks
coolbutuseless opened this issue Aug 27, 2023 · 12 comments

Comments

@coolbutuseless
Owner

coolbutuseless commented Aug 27, 2023

{yyjsonr} has a "soft" goal to try and be compatible with {jsonlite} in the most common use cases.

Full compatibility with {jsonlite} is a non-goal.

This is a place to raise any compatibility issues you have, or to express support or disagreement with items on the list below. This will help prioritise programming effort.

  • Borrow (with permission!) the jsonlite test suite
  • Add a digits argument
  • promote_num_to_string - Option to promote numerics to strings if an array has a mixture of numbers and strings. Set to TRUE for jsonlite compatibility.
  • Add a 'json' class to match jsonlite's behaviour. Optional, set to FALSE by default. Set to TRUE for jsonlite compatibility.
  • Read/write arrays like jsonlite. Maybe.
  • A POSIXt = c('string', 'epoch', 'ISO8601') option, as in jsonlite::toJSON()
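As a sketch of the behaviour the promote_num_to_string task refers to: when a JSON array mixes numbers and strings, jsonlite's simplification promotes everything to character and returns an atomic vector (the proposed yyjsonr option would opt in to the same behaviour; the option name here is the proposal from the list above, not an existing argument):

```r
# jsonlite promotes the mixed array to a character vector:
jsonlite::fromJSON('[1, "a"]')
#> [1] "1" "a"

# Without such an option, a parser that preserves type information
# would instead return list(1, "a"), keeping 1 numeric.
```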
@shikokuchuo
Contributor

I think the above option should be the default, actually. Stability of return types is important for programmatic use. Granted, the type of the vector cannot be guaranteed, but it seems it would be easier for code to handle this than having to deal with different 'shapes' (vector vs. list).

@coolbutuseless
Owner Author

@shikokuchuo Just to clarify: you think that (by default) the return type should always be an atomic vector?

I'm a bit conflicted on how this might go, as I like the idea of defaulting to easy-to-handle atomic vectors, but I'm "offended" by the idea of just discarding type information as soon as types are mixed.

@shikokuchuo
Contributor

Yes, making it always return an atomic vector seems to be better behaviour.

If you think about it from the perspective of wanting to do something with the returned data, a list that retains the data types does not really give you any advantage. If you unlist (or coerce), that also loses the type information. That means handling code would need two branches: one for the case where it is an atomic vector, and one using lapply or similar when dealing with a list.

In reality, I think a lot of JSON is retrieved from web APIs, in which case you probably know the expected return type and can easily test for the case where the returned vector is not the expected type (perhaps the API occasionally returns a 'missing value' or similar). This handling can be included within vectorised code. Otherwise we would have to write code that handles the fact that we may get a list from time to time (or flatten by default, which gets us back to the same place).
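The two-branch handling code described above might look something like this (a hypothetical sketch; handle() is not part of either package):

```r
# Hypothetical caller-side handling when a parser may return either an
# atomic vector (types coerced) or a list (types preserved):
handle <- function(x) {
  if (is.list(x)) {
    # branch 1: list of mixed types - element-wise coercion needed
    vapply(x, as.character, character(1))
  } else {
    # branch 2: already an atomic vector - a single vectorised coercion
    as.character(x)
  }
}

handle(list(1, "a"))   # both branches produce c("1", "a")
handle(c("1", "a"))
```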

However, I don't have a real use case where this (may) happen (thankfully!), so this is purely theoretical and I defer to those who are actually dealing with such data.

@coolbutuseless
Owner Author

Thanks, I appreciate your thoughts. Mapping JSON-to-R is tricky and there are so many tradeoffs!

@shikokuchuo
Contributor

Totally! That's why I'm so glad someone like you has actually attempted this :)

@jonocarroll

(can be moved to a new issue if that helps)

Would it be useful to add a "json" class to the string output, as jsonlite does?

jsonlite <- jsonlite::toJSON(as.list(euro), auto_unbox = TRUE)
jsonlite
#> {"ATS":13.7603,"BEF":40.3399,"DEM":1.9558,"ESP":166.386,"FIM":5.9457,"FRF":6.5596,"IEP":0.7876,"ITL":1936.27,"LUF":40.3399,"NLG":2.2037,"PTE":200.482}
str(jsonlite)
#>  'json' chr "{\"ATS\":13.7603,\"BEF\":40.3399,\"DEM\":1.9558,\"ESP\":166.386,\"FIM\":5.9457,\"FRF\":6.5596,\"IEP\":0.7876,\""| __truncated__

yyjsonr <- yyjsonr::to_json_str(as.list(euro), opts = yyjsonr::to_opts(auto_unbox = TRUE))
yyjsonr
#> [1] "{\"ATS\":13.7603,\"BEF\":40.3399,\"DEM\":1.95583,\"ESP\":166.386,\"FIM\":5.94573,\"FRF\":6.55957,\"IEP\":0.787564,\"ITL\":1936.27,\"LUF\":40.3399,\"NLG\":2.20371,\"PTE\":200.482}"
str(yyjsonr)
#>  chr "{\"ATS\":13.7603,\"BEF\":40.3399,\"DEM\":1.95583,\"ESP\":166.386,\"FIM\":5.94573,\"FRF\":6.55957,\"IEP\":0.7875"| __truncated__

jsonlite:::print.json(yyjsonr)
#> {"ATS":13.7603,"BEF":40.3399,"DEM":1.95583,"ESP":166.386,"FIM":5.94573,"FRF":6.55957,"IEP":0.787564,"ITL":1936.27,"LUF":40.3399,"NLG":2.20371,"PTE":200.482}

class(yyjsonr) <- "json"
yyjsonr # since jsonlite is loaded, the print method is available, but this isn't necessarily the case
#> {"ATS":13.7603,"BEF":40.3399,"DEM":1.95583,"ESP":166.386,"FIM":5.94573,"FRF":6.55957,"IEP":0.787564,"ITL":1936.27,"LUF":40.3399,"NLG":2.20371,"PTE":200.482}
str(yyjsonr)
#>  'json' chr "{\"ATS\":13.7603,\"BEF\":40.3399,\"DEM\":1.95583,\"ESP\":166.386,\"FIM\":5.94573,\"FRF\":6.55957,\"IEP\":0.7875"| __truncated__

Created on 2023-09-06 with reprex v2.0.2

This would allow yyjsonr to have its own print.json as well but, more importantly, I found this to be a requirement when using the JSON as input to an htmlwidget (not tested thoroughly, but that was the solution in at least one case I encountered).

@shikokuchuo
Contributor

This would allow yyjsonr to have its own print.json as well but, more importantly, I found this to be a requirement when using the JSON as input to an htmlwidget (not tested thoroughly, but that was the solution in at least one case I encountered).

If this is necessary for some use cases, it could perhaps live behind a (non-default) option. If a key raison d'être of yyjsonr is performance, adding a class is probably the single worst thing you can do to kill it. This is because every generic function, including the standard subsetting operators, then needs to perform S3 dispatch. It is of course possible to use .subset and .subset2, but a lot of those in code is not fun and kills readability. Just my perspective from looking to push the boundaries of what's possible, not just "oh well, it's already 2x faster than jsonlite".

@jonocarroll

I'm not sure I follow - I'm suggesting adding a class to the resulting JSON string. If you're calling a generic on that, it will go through dispatch regardless, but will hit the default method if the object has no formal class.

If the suggestion was to make parsing a generic then yes, that would impact performance, but I'm not suggesting that at all.

@shikokuchuo
Contributor

shikokuchuo commented Sep 6, 2023

I'm not sure I follow - I'm suggesting adding a class to the resulting JSON string.

Right - when I typed my response I was thinking of the other direction (my main use case). Assuming it's less likely you'd want to manipulate the resulting JSON string, I guess it's less of a concern.

If you're calling a generic on that then it will go through dispatch regardless, but will hit the default method if it was without a formal class.

Just to clarify what I meant: if the object is not classed, its OBJECT bit is not set on the SEXP and S3 dispatch is not performed. Example with the generic function [[ (yes, it does nothing meaningful in this case):

cstr <- str <- yyjsonr::to_json_str(list(a = 1, b = 2))
str
#> [1] "{\"a\":[1.0],\"b\":[2.0]}"
class(cstr) <- "json"
cstr
#> [1] "{\"a\":[1.0],\"b\":[2.0]}"
#> attr(,"class")
#> [1] "json"

# generic `[[`:
microbenchmark::microbenchmark(str[[1L]], cstr[[1L]])
#> Unit: nanoseconds
#>        expr min    lq   mean median    uq  max neval
#>   str[[1L]]  77  89.5 137.26   97.0 104.0 3557   100
#>  cstr[[1L]] 302 314.0 410.14  327.5 347.5 5027   100

# non-generic .subset2:
microbenchmark::microbenchmark(.subset2(str, 1L), .subset2(cstr, 1L))
#> Unit: nanoseconds
#>                expr min lq   mean median uq  max neval
#>   .subset2(str, 1L)  76 80 127.40   84.5 89 3386   100
#>  .subset2(cstr, 1L)  85 90 105.08   97.0 99  812   100

Created on 2023-09-06 with reprex v2.0.2

@msummersgill

Really cool work - it looks like there is lots of potential for wide benefits if this could be leveraged in some commonly used packages!

It seems like if compatibility were designed such that yyjsonr functions could be dropped in place of prominent jsonlite implementations, then users could benefit by improved shiny app performance, faster document render times, etc.

Some examples that come to mind: shiny::toJSON(), shiny::safeFromJSON(), htmlwidgets::toJSON(), plotly::to_JSON(), plotly::from_JSON() , rmarkdown::base64_encode_object(), etc...

Is this something you have in mind, or are you thinking of this as more of a bulk data ingest focused library that might not share the same design objectives of jsonlite?

I did some poking around with plotly::to_JSON() and was able to successfully render a plot, but with: Warning message: In yyjsonr::to_json_str(x, auto_unbox = TRUE) : serialize_core(): Unhandled SEXP: closure.

@coolbutuseless
Owner Author

coolbutuseless commented Sep 7, 2023

@msummersgill I did think about aiming for strict jsonlite drop-in compatibility but this was not a priority for the initial release.

@jangorecki

jangorecki commented Jan 5, 2024

+1 against a drop-in replacement. If there is something to improve, this is the place to do it. If jsonlite made some imperfect design decisions, they are already stuck with them for good. A new package like this is exactly where corrections can and should be made.

I would like to point out the potential of I(), the AsIs class, which could probably be put to good use when serializing to JSON needs an alternative representation for an object - preventing simplifying, wrapping up, boxing/unboxing, etc.
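For reference, jsonlite already interprets AsIs this way for unboxing (worth confirming against the jsonlite documentation; this sketch shows only that existing jsonlite behaviour, not anything yyjsonr currently does):

```r
# With auto_unbox = TRUE, plain length-1 vectors are unboxed to scalars,
# while wrapping a value in I() keeps it serialized as a JSON array.
jsonlite::toJSON(list(a = 1, b = I(1)), auto_unbox = TRUE)
# 'a' should come out as a bare number, 'b' as a one-element array.
```

Adopting the same convention in yyjsonr would give users a per-object escape hatch without adding new global options.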
