









The jsonlite R package provides functions for converting JSON data to and from R objects, as well as for streaming, validating, and formatting JSON data. This document describes the usage of functions such as flatten, rbind_pages, prettify, minify, read_json, serializeJSON, unserializeJSON, stream_in, and stream_out.










Version 1.8.
Title A Simple and Robust JSON Parser and Generator for R
License MIT + file LICENSE
Depends methods
URL https://arxiv.org/abs/1403.2805 (paper)
BugReports https://github.com/jeroen/jsonlite/issues
Maintainer Jeroen Ooms
VignetteBuilder knitr, R.rsp
Description A reasonably fast JSON parser and generator, optimized for statistical data and the web. Offers simple, flexible tools for working with JSON in R, and is particularly powerful for building pipelines and interacting with a web API. The implementation is based on the mapping described in the vignette (Ooms, 2014). In addition to converting JSON data from/to R objects, 'jsonlite' contains functions to stream, validate, and prettify JSON data. The unit tests included with the package verify that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications.
Suggests httr, curl, vctrs, testthat, knitr, rmarkdown, R.rsp, sf
RoxygenNote 7.1.
Encoding UTF-
NeedsCompilation yes
Author Jeroen Ooms [aut, cre], Duncan Temple Lang [ctb], Lloyd Hilaiel [cph] (author of bundled libyajl)
Repository CRAN
Date/Publication 2022-02-22 11:20:02 UTC
R topics documented:
base64
flatten
prettify, minify
rbind_pages
read_json
serializeJSON
stream_in, stream_out
toJSON, fromJSON
unbox
validate
base64 Encode/decode base64
Description
Simple in-memory base64 encoder and decoder. Used internally for converting raw vectors to text. Interchangeable with encoder from base64enc or openssl package.
Usage
base64_dec(input)
base64_enc(input)
Arguments
input string or raw vector to be encoded/decoded
Examples
str <- base64_enc(serialize(iris, NULL))
out <- unserialize(base64_dec(str))
stopifnot(identical(out, iris))
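As a further sketch, the same pair of functions can round-trip a plain string. The variable names are illustrative only; note that base64_dec() yields a raw vector, so the text has to be recovered explicitly.

```r
library(jsonlite)

# Encode a character string; the result is a base64 string
encoded <- base64_enc("hello")

# base64_dec() returns a raw vector, so convert back to text explicitly
decoded <- rawToChar(base64_dec(encoded))

print(encoded)
print(decoded)
```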
flatten Flatten nested data frames
Description
In a nested data frame, one or more of the columns consist of another data frame. These structures frequently appear when parsing JSON data from the web. Such data frames can be flattened into a regular two-dimensional tabular structure.
Usage
flatten(x, recursive = TRUE)
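A minimal sketch of what flattening looks like in practice, using a small hypothetical data frame (the column names id, stats, hp and attack are made up for illustration). The nested columns are expected to be promoted to top-level columns whose names are prefixed with the name of the parent column.

```r
library(jsonlite)

# Build a nested data frame: the 'stats' column is itself a data frame
x <- data.frame(id = 1:2)
x$stats <- data.frame(hp = c(10, 20), attack = c(3, 4))

# Before flattening there are two columns, one of them nested
str(x)

# After flattening every column is a plain vector
flat <- flatten(x)
names(flat)
```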
prettify, minify
Examples
myjson <- toJSON(cars)
cat(myjson)
prettify(myjson)
minify(myjson)
rbind_pages Combine pages into a single data frame
Description
The rbind_pages function is used to combine a list of data frames into a single data frame. This is often needed when working with a JSON API that limits the amount of data per request. If we need more data than what fits in a single request, we need to perform multiple requests that each retrieve a fragment of data, not unlike pages in a book. In practice this is often implemented using a page parameter in the API. The rbind_pages function can be used to combine these pages back into a single dataset.
Usage
rbind_pages(pages)
Arguments
pages a list of data frames, each representing a page of data
Details
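The rbind_pages function uses vctrs::vec_rbind() to bind the pages together. This generalizes base::rbind() in two ways: not every column has to be present in each of the individual data frames (missing columns are filled with NA values), and data frames can be nested (can contain other data frames).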
Examples
x <- data.frame(foo = rnorm(3), bar = c(TRUE, FALSE, TRUE))
y <- data.frame(foo = rnorm(2), col = c("blue", "red"))
rbind_pages(list(x, y))
baseurl <- "https://projects.propublica.org/nonprofits/api/v2/search.json"
pages <- list()
for(i in 0:20){
  mydata <- fromJSON(paste0(baseurl, "?order=revenue&sort_order=desc&page=", i))
  message("Retrieving page ", i)
  pages[[i+1]] <- mydata$organizations
}
organizations <- rbind_pages(pages)
nrow(organizations)
colnames(organizations)
read_json Read/write JSON
Description
These functions are similar to toJSON() and fromJSON() except they explicitly distinguish between path and literal input, and do not simplify by default.
Usage
read_json(path, simplifyVector = FALSE, ...)
parse_json(json, simplifyVector = FALSE, ...)
write_json(x, path, ...)
Arguments
path file on disk
simplifyVector simplifies nested lists into vectors and data frames. See fromJSON().
... additional conversion arguments, see also toJSON() or fromJSON()
json string with literal json or connection object to read from
x an object to be serialized to JSON
See Also
fromJSON(), stream_in()
Examples
tmp <- tempfile()
write_json(iris, tmp)
read_json(tmp)
read_json(tmp, simplifyVector = TRUE)
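The difference between path input and literal input, and the effect of not simplifying by default, can be sketched with parse_json() on a literal string. The JSON content and variable names here are made up for illustration.

```r
library(jsonlite)

# Literal JSON text goes to parse_json(); a file path would go to read_json()
json <- '{"name": "Ada", "scores": [1, 2, 3]}'

# Default: no simplification, so the array stays a list of length-1 elements
as_list <- parse_json(json)
str(as_list$scores)

# With simplifyVector = TRUE the array becomes an atomic vector
as_vec <- parse_json(json, simplifyVector = TRUE)
str(as_vec$scores)
```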
stream_in, stream_out Streaming JSON input/output
Description
The stream_in and stream_out functions implement line-by-line processing of JSON data over a connection, such as a socket, url, file or pipe. JSON streaming requires the ndjson format, which differs slightly from the format used by fromJSON() and toJSON(); see Details.
Usage
stream_in(con, handler = NULL, pagesize = 500, verbose = TRUE, ...)
stream_out(x, con = stdout(), pagesize = 500, verbose = TRUE, prefix = "", ...)
Arguments
con a connection object. If the connection is not open, stream_in and stream_out will automatically open and later close (and destroy) the connection. See details.
handler a custom function that is called on each page of JSON data. If not specified, the default handler stores all pages and binds them into a single data frame that will be returned by stream_in. See details.
pagesize number of lines to read/write from/to the connection per iteration.
verbose print some information on what is going on.
... arguments for fromJSON() and toJSON() that control JSON formatting/parsing where applicable. Use with caution.
x object to be streamed out. Currently only data frames are supported.
prefix string to write before each line (use "\u001e" to write rfc7464 text sequences)
Details
Because parsing huge JSON strings is difficult and inefficient, JSON streaming is done using lines of minified JSON records, a.k.a. ndjson. This is pretty standard: JSON databases such as dat or MongoDB use the same format to import/export datasets. Note that this means that the total stream combined is not valid JSON itself; only the individual lines are. Also note that because line-breaks are used as separators, prettified JSON is not permitted: the JSON lines must be minified. In this respect, the format is a bit different from fromJSON() and toJSON() where all lines are part of a single JSON structure with optional line breaks.

The handler is a callback function which is called for each page (batch) of JSON data with exactly one argument (usually a data frame with pagesize rows). If handler is missing or NULL, a default handler is used which stores all intermediate pages of data, and at the very end binds all pages together into one single data frame that is returned by stream_in. When a custom handler function is specified, stream_in does not store any intermediate results and always returns NULL. It is then up to the handler to process or store data pages. A handler function that does not store intermediate results in memory (for example by writing output to another connection) results in a pipeline that can process an unlimited amount of data. See example.

Note that a vector of JSON strings already in R can be parsed with stream_in by creating a connection to it with textConnection().

If a connection is not opened yet, stream_in and stream_out will automatically open and later close the connection. Because R destroys connections when they are closed, they cannot be reused. To use a single connection for multiple calls to stream_in or stream_out, it needs to be opened beforehand. See example.
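The two handler modes described above can be sketched on a small in-memory vector of ndjson lines, read through textConnection(). The records, the counter variable and the page size of 2 are illustrative assumptions, not part of the package documentation.

```r
library(jsonlite)

# Three ndjson records: one minified JSON object per element
lines <- c(
  '{"id": 1, "value": "a"}',
  '{"id": 2, "value": "b"}',
  '{"id": 3, "value": "c"}'
)

# Default handler: all pages are bound into one data frame
con <- textConnection(lines)
df <- stream_in(con, verbose = FALSE)
close(con)
nrow(df)

# Custom handler: called once per page; stream_in() itself returns NULL
total <- 0
con <- textConnection(lines)
res <- stream_in(con, handler = function(page) {
  # 'page' is a data frame with at most 'pagesize' rows
  total <<- total + nrow(page)
}, pagesize = 2, verbose = FALSE)
close(con)
total
```

Because textConnection() returns a connection that is already open, it is closed explicitly here rather than by stream_in().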
Value
The stream_out function always returns NULL. When no custom handler is specified, stream_in returns a data frame of all pages bound together. When a custom handler function is specified, stream_in always returns NULL.
References
MongoDB export format: https://docs.mongodb.com/database-tools/mongoexport/
Documentation for the JSON Lines text file format: https://jsonlines.org/
See Also
fromJSON(), read_json()
Examples
x <- iris[1:3,]
toJSON(x)
stream_out(x)
mydata <- stream_in(url("http://httpbin.org/stream/100"))
#stream large dataset to file and back
library(nycflights13)
stream_out(flights, file(tmp <- tempfile()))
flights2 <- stream_in(file(tmp))
unlink(tmp)
all.equal(flights2, as.data.frame(flights))
diamonds2 <- stream_in(url("http://jeroen.github.io/data/diamonds.json"))
flights3 <- stream_in(gzcon(url("http://jeroen.github.io/data/nycflights13.json.gz")))
all.equal(flights3, as.data.frame(flights))
toJSON, fromJSON
Usage
fromJSON(
  txt,
  simplifyVector = TRUE,
  simplifyDataFrame = simplifyVector,
  simplifyMatrix = simplifyVector,
  flatten = FALSE,
  ...
)
toJSON(
  x,
  dataframe = c("rows", "columns", "values"),
  matrix = c("rowmajor", "columnmajor"),
  Date = c("ISO8601", "epoch"),
  POSIXt = c("string", "ISO8601", "epoch", "mongo"),
  factor = c("string", "integer"),
  complex = c("string", "list"),
  raw = c("base64", "hex", "mongo", "int", "js"),
  null = c("list", "null"),
  na = c("null", "string"),
  auto_unbox = FALSE,
  digits = 4,
  pretty = FALSE,
  force = FALSE,
  ...
)
Arguments
txt a JSON string, URL or file
simplifyVector coerce JSON arrays containing only primitives into an atomic vector
simplifyDataFrame coerce JSON arrays containing only records (JSON objects) into a data frame
simplifyMatrix coerce JSON arrays containing vectors of equal mode and dimension into matrix or array
flatten automatically flatten() nested data frames into a single non-nested data frame
... arguments passed on to class specific print methods
x the object to be encoded
dataframe how to encode data.frame objects: must be one of 'rows', 'columns' or 'values'
matrix how to encode matrices and higher dimensional arrays: must be one of 'rowmajor' or 'columnmajor'.
Date how to encode Date objects: must be one of 'ISO8601' or 'epoch'
POSIXt how to encode POSIXt (datetime) objects: must be one of 'string', 'ISO8601', 'epoch' or 'mongo'
factor how to encode factor objects: must be one of 'string' or 'integer'
complex how to encode complex numbers: must be one of 'string' or 'list'
raw how to encode raw objects: must be one of 'base64', 'hex', 'mongo', 'int' or 'js'
null how to encode NULL values within a list: must be one of 'null' or 'list'
na how to print NA values: must be one of 'null' or 'string'. Defaults are class specific
auto_unbox automatically unbox() all atomic vectors of length 1. It is usually safer to avoid this and instead use the unbox() function to unbox individual elements. An exception is that objects of class AsIs (i.e. wrapped in I()) are not automatically unboxed. This is a way to mark single values as length-1 arrays.
digits max number of decimal digits to print for numeric values. Use I() to specify significant digits. Use NA for max precision.
pretty adds indentation whitespace to JSON output. Can be TRUE/FALSE or a number specifying the number of spaces to indent. See prettify()
force unclass/skip objects of classes with no defined JSON mapping
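As a rough illustration of the dataframe argument described above, the sketch below encodes one small, made-up data frame in each of the three layouts. The expected strings in the comments apply to this particular input only.

```r
library(jsonlite)

# A small illustrative data frame
df <- data.frame(x = 1:2, y = c("a", "b"))

# 'rows' (the default): one JSON object per row
by_rows <- toJSON(df)
by_rows  # [{"x":1,"y":"a"},{"x":2,"y":"b"}]

# 'columns': one JSON array per column, keyed by column name
by_cols <- toJSON(df, dataframe = "columns")
by_cols  # {"x":[1,2],"y":["a","b"]}

# 'values': the column names are dropped and only the values are kept
by_vals <- toJSON(df, dataframe = "values")
by_vals
```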
Details
The toJSON() and fromJSON() functions are drop-in replacements for the identically named functions in packages rjson and RJSONIO. Our implementation uses an alternative, somewhat more consistent mapping between R objects and JSON strings.

The serializeJSON() and unserializeJSON() functions in this package use an alternative system to convert between R objects and JSON, which supports more classes but is much more verbose.

A JSON string is always unicode, using UTF-8 by default, hence there is usually no need to escape any characters. However, the JSON format does support escaping of unicode characters, which are encoded using a backslash followed by a lower case "u" and 4 hex characters, for example: "Z\u00FCrich". The fromJSON function will parse such escape sequences but it is usually preferable to encode unicode characters in JSON using native UTF-8 rather than escape sequences.
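The handling of unicode escapes can be sketched as follows; the input strings are illustrative. Note that the backslash is doubled in the first R string so that the escape sequence reaches the JSON parser rather than being interpreted by R.

```r
library(jsonlite)

# JSON text containing a \u escape sequence (backslash doubled for R)
escaped <- fromJSON('["Z\\u00FCrich"]')

# JSON text containing the same character as native UTF-8
native <- fromJSON('["Z\u00fcrich"]')

# Both parse to the same R string
escaped == native
```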
References
Jeroen Ooms (2014). The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805. https://arxiv.org/abs/1403.2805
See Also
read_json(), stream_in()
Examples
jsoncars <- toJSON(mtcars, pretty=TRUE)
cat(jsoncars)
fromJSON(jsoncars)
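The Details note that serializeJSON() supports more classes than toJSON() at the cost of verbosity. A minimal sketch of that trade-off, using a made-up list containing a Date and a factor:

```r
library(jsonlite)

# An object whose classes are not preserved by the default mapping
x <- list(when = as.Date("2022-02-22"), f = factor(c("a", "b")))

# toJSON()/fromJSON(): compact, but the Date comes back as a string
lossy <- fromJSON(toJSON(x))
class(lossy$when)

# serializeJSON()/unserializeJSON(): verbose, but classes are restored
exact <- unserializeJSON(serializeJSON(x))
class(exact$when)
all.equal(exact, x)

# Compare the size of the two encodings
nchar(toJSON(x))
nchar(serializeJSON(x))
```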
unbox
Details
It is usually recommended to avoid this function and stick with the default encoding schema for the various R classes. The only use case for this function is if you are bound to some specific predefined JSON structure (e.g. to submit to an API), which has no natural R representation. Note that the default encoding for data frames naturally results in a collection of key-value pairs, without using unbox.
Value
Returns a singleton version of x.
References
https://en.wikipedia.org/wiki/Singleton_(mathematics)
Examples
toJSON(list(foo=123))
toJSON(list(foo=unbox(123)))
x <- list(x=1:3, y = 4, z = "foo", k = NULL)
toJSON(x)
toJSON(x, auto_unbox = TRUE)
x <- iris[1,]
toJSON(list(rec=x))
toJSON(list(rec=unbox(x)))
validate Validate JSON
Description
Test if a string contains valid JSON. Character vectors will be collapsed into a single string.
Usage
validate(txt)
Arguments
txt JSON string
Examples
#Output from toJSON and serializeJSON should pass validation
myjson <- toJSON(mtcars)
validate(myjson) #TRUE
#Something bad happened
truncated <- substring(myjson, 1, 100)
validate(truncated) #FALSE