library(tidyverse)
library(httr2)
library(DT)
First, let’s describe an example querying a single ISSN. Here, the object called issn_list is a list of three ISSNs as a minimal example.
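A minimal sketch of how such an object could be defined (the ISSN values below are placeholders, not real journals; substitute the ISSNs you are interested in):

# Placeholder ISSNs for illustration only; replace with real ISSNs
issn_list <- list("1234-5678", "2345-6789", "3456-7890")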
You will probably need to deal with a longer list of ISSNs; you can copy your list of ISSNs as a .csv file to the main directory of this workbook and use the following command to read the data into your work environment.
issn_list <- read.csv("the name of your file.csv")
Then we can define the base URL and a specific endpoint that we will query.
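For example, a sketch of that step (the object names base_url and endpoint are illustrative; the Crossref REST API exposes journal-level metadata under the journals endpoint):

# Base URL of the Crossref REST API and the journal-level endpoint
base_url <- "https://api.crossref.org"
endpoint <- "journals"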
For this example, we will query only the first element of our list.
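A possible version of that request, using the base_url and endpoint objects sketched above (the name my_resp for the response object is an assumption):

# Build and perform a request for the first ISSN in the list
my_resp <- request(base_url) |>
  req_url_path_append(endpoint) |>
  req_url_path_append(issn_list[[1]]) |>
  req_perform()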
And then, we can extract the body from the response in JSON format.
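For example (the name my_table is assumed here so that it matches the chunk that follows; simplifyVector = TRUE mirrors the later chunk):

# Parse the JSON body of the response
my_table <- resp_body_json(my_resp, simplifyVector = TRUE)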
Now we can extract specific information from the response, for example, as a table. The following chunk will extract information from the ‘message’ level of the response and then apply a custom function to build a table with some elements of interest:
my_table |>
  pluck('message') |>
  # anonymous function: build a one-row summary table from the 'message' element
  {\(y) {
    tibble(
      "issn" = y |> pluck("ISSN"),
      "publisher" = y |> pluck("publisher"),
      "current_dois" = y |> pluck("counts", "current-dois"),
      "backfile_dois" = y |> pluck("counts", "backfile-dois"),
      "total_dois" = y |> pluck("counts", "total-dois")
    ) |>
      cbind(
        # spread the 'coverage-type' element into one column per variable
        y |>
          pluck('coverage-type') |>
          unlist() |>
          data.frame() |>
          rownames_to_column('variable') |>
          pivot_wider(names_from = variable,
                      values_from = `unlist.pluck.y...coverage.type...`)
      )
  }}() |>
  datatable()
We can use these steps as a reference for more powerful functions that will let us make all our requests sequentially. The first thing we'll do is build (not execute) a list of requests, each including a rate specification that will keep our requests within the safe limits of the Public and Polite API.
We limit each IP address to sending 50 reqs/sec. Please avoid getting blocked by identifying yourself and keeping your rates below this value.
list_queries <- issn_list |>
  map(\(x){
    request("https://api.crossref.org/journals/") |>
      req_url_path_append(x) |>
      req_url_query() |>
      req_throttle(rate = 30/60)
  })
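If you also want to identify yourself (and so be routed to Crossref’s polite pool), one option is to pass a mailto parameter in the query. This is a sketch; the email address is a placeholder for your own:

# Same list of requests, but identifying yourself via the mailto parameter
list_queries <- issn_list |>
  map(\(x){
    request("https://api.crossref.org/journals/") |>
      req_url_path_append(x) |>
      req_url_query(mailto = "your.name@example.org") |>
      req_throttle(rate = 30/60)
  })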
Now that we have our list of requests, each of which includes a rate specification, we can use a specific function to perform each of those queries sequentially.
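The chunk that produced the output below is not shown; a likely version uses httr2’s req_perform_sequential(), with the responses stored as my_item to match the code further down:

# Perform the throttled requests one at a time, collecting the responses
my_item <- req_perform_sequential(list_queries)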
Waiting 2s for throttling delay
Waiting 2s for throttling delay
This new object contains a list of retrieved responses. We can parse their JSON bodies and explore their content. For example, let’s do this for the first element of the list.
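A minimal sketch of that step, using base R’s str() to inspect the parsed structure:

# Parse the first response and inspect the top levels of its structure
my_item[[1]] |>
  resp_body_json(simplifyVector = TRUE) |>
  str(max.level = 2)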
We can write a custom function to loop over our new list, parse each response, and build a combined table:
my_df <- my_item |>
  map(\(x){
    # parse each response and keep only the 'message' element
    resp_body_json(x, simplifyVector = TRUE) |>
      pluck('message') |>
      {\(y) {
        tibble(
          "issn" = y |> pluck("ISSN"),
          "publisher" = y |> pluck("publisher"),
          "current_dois" = y |> pluck("counts", "current-dois"),
          "backfile_dois" = y |> pluck("counts", "backfile-dois"),
          "total_dois" = y |> pluck("counts", "total-dois")
        ) |>
          cbind(
            # spread the 'coverage-type' element into one column per variable
            y |>
              pluck('coverage-type') |>
              unlist() |>
              data.frame() |>
              rownames_to_column('variable') |>
              pivot_wider(names_from = variable,
                          values_from = `unlist.pluck.y...coverage.type...`)
          )
      }}()
  }) |>
  bind_rows()
We can use the DT package to make interactive tables.
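For instance, passing the combined table from the previous chunk to datatable():

# Render the combined data frame as an interactive table
datatable(my_df)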