vignettes/intro.Rmd
The crrri package provides a Chrome Remote Interface for R. It is inspired by the node.js module chrome-remote-interface.
This vignette walks through several usage examples for crrri.
All of the examples come from the chrome-remote-interface or puppeteer documentation; the vignette shows how to reproduce them using crrri.
It is best to set the HEADLESS_CHROME environment variable beforehand to the path of a Chromium/Chrome binary on your system; crrri will use that binary. If you do not, you can pass the path to a Chromium/Chrome binary to Chrome$new(), or let the package guess it with find_chrome_binary().
The default behavior of crrri is equivalent to setting the environment variable like this:
Sys.setenv(HEADLESS_CHROME = crrri::find_chrome_binary())
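Before launching Chrome, it can help to check what the variable currently holds. The sketch below uses only base R; the helper name chrome_path_or_default() is hypothetical and is not part of crrri.

```r
# Hypothetical helper (not part of crrri): return the value of HEADLESS_CHROME
# if it is set to a non-empty string, otherwise a caller-supplied fallback.
chrome_path_or_default <- function(fallback = NA_character_) {
  path <- Sys.getenv("HEADLESS_CHROME", unset = "")
  if (nzchar(path)) path else fallback
}

chrome_path <- chrome_path_or_default()
if (is.na(chrome_path)) {
  message("HEADLESS_CHROME is not set; the package will have to guess the binary.")
} else {
  message("HEADLESS_CHROME points to: ", chrome_path)
}
```

The helper only reads the environment; it does not verify that the path points to a working browser.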
We need to load crrri, and also promises, which provides the tools for working with the promises that crrri is built on.
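For readers new to promises, the following sketch shows the promise pipe operators used throughout this vignette on a promise that does not involve Chrome at all. It assumes the promises and later packages are installed. promise_resolve(21) is only a stand-in for the promises that crrri methods return, and later::run_now() is called because a non-interactive script has to drive the event loop itself.

```r
library(promises)

result <- NULL

# promise_resolve() creates an already-fulfilled promise.
promise_resolve(21) %...>% {
  # runs once the promise is fulfilled; `.` holds the fulfilled value
  . * 2
} %...>% {
  # store the value produced by the previous step
  result <<- .
} %...!% {
  # runs only if one of the previous steps failed
  cat("Error:", .$message, "\n")
}

# Callbacks are scheduled on the later event loop; drain it so they run now.
while (!later::loop_empty()) later::run_now()

print(result)
```

In an interactive session the event loop runs whenever the console is idle, so the explicit loop at the end is usually unnecessary there.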
This first example is inspired by this post, which uses the chrome-remote-interface node.js package.
The first step is to launch Chromium/Chrome in headless mode:
chrome <- Chrome$new()
Then connect R to headless Chromium/Chrome with the connect() method. Since the connection process is not immediate, the connect() method returns a promise that is fulfilled when R is connected to Chrome. The value of this promise is the connection object.
client <- chrome$connect()
Next, write a function whose first parameter receives the client connection object.
screenshot_file <- tempfile(fileext = ".png")

screenshot <- function(client) {
  # some constants
  targetUrl <- "https://cran.rstudio.com"
  viewport <- c(1440, 900)
  screenshotDelay <- 2 # seconds

  # extract the domains you need
  Page <- client$Page
  Emulation <- client$Emulation

  # enable events for the Page domain
  Page$enable() %...>% {
    # modify the viewport settings
    Emulation$setDeviceMetricsOverride(
      width = viewport[1],
      height = viewport[2],
      deviceScaleFactor = 0,
      mobile = FALSE,
      dontSetVisibleSize = FALSE
    )
  } %...>% {
    # go to url
    Page$navigate(targetUrl)
    # wait until the page is loaded
    Page$loadEventFired()
  } %>%
    # add a delay
    wait(delay = screenshotDelay) %...>% {
    # capture screenshot
    Page$captureScreenshot(format = "png", fromSurface = TRUE)
  } %...>% {
    # decode the base64 payload and write it to disk
    .$data %>%
      jsonlite::base64_dec() %>%
      writeBin(screenshot_file)
  } %>%
    # close headless chrome (client connections are safely closed)
    finally(
      ~ client$disconnect()
    ) %...!% {
    cat("Error:", .$message, "\n")
  }
}
You can then take a screenshot by passing the connection promise to the screenshot() function:
client %...>% screenshot()
The screenshot is written to disk, at the path stored in screenshot_file.
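The last step of screenshot() decodes a base64 string and writes the resulting bytes to a file. The sketch below isolates that step without Chrome. The response list is a made-up stand-in for what Page$captureScreenshot() resolves to, and it carries only the 8-byte PNG signature rather than a real image.

```r
library(jsonlite)

# Stand-in payload (assumption): the 8-byte PNG file signature, not a real image.
png_signature <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))

# Mimic the shape used in screenshot(): a list with a base64-encoded `data` field.
response <- list(data = base64_enc(png_signature))

# Decode the base64 text to raw bytes and write them in binary mode.
out_file <- tempfile(fileext = ".png")
writeBin(base64_dec(response$data), out_file)

# Read the bytes back to confirm the round trip preserved them.
decoded <- readBin(out_file, what = "raw", n = file.size(out_file))
print(identical(decoded, png_signature))
```

Writing with writeBin() rather than writeLines() matters here, because image data is binary and must not go through text encoding.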
This example is inspired by this JavaScript script from the chrome-remote-interface wiki that dumps the DOM.
html_file <- tempfile(fileext = ".html")

client <- chrome$connect()

dump_DOM <- function(client) {
  # extract the domains you need
  Network <- client$Network
  Page <- client$Page
  Runtime <- client$Runtime

  # enable events for the Network and Page domains
  Network$enable() %...>% {
    Page$enable()
  } %...>% {
    # bypass the cache so the page is fetched fresh
    Network$setCacheDisabled(cacheDisabled = TRUE)
  } %...>% {
    Page$navigate(url = "https://github.com")
  } %...>% {
    # wait until the page is loaded
    Page$loadEventFired()
  } %...>% {
    # serialize the whole document from inside the page
    Runtime$evaluate(
      expression = 'document.documentElement.outerHTML'
    )
  } %...>% {
    writeLines(c(.$result$value, "\n"), con = html_file)
  } %>%
    # close the client connection whether or not the chain succeeded
    finally(
      ~ client$disconnect()
    ) %...!% {
    cat("Error:", .$message, "\n")
  }
}
Execute the task:
client %...>% dump_DOM()
Here are the first 20 lines of what we get in html_file:
<html lang="en"><head>
<meta charset="utf-8">
<link rel="dns-prefetch" href="https://assets-cdn.github.com">
<link rel="dns-prefetch" href="https://avatars0.githubusercontent.com">
<link rel="dns-prefetch" href="https://avatars1.githubusercontent.com">
<link rel="dns-prefetch" href="https://avatars2.githubusercontent.com">
<link rel="dns-prefetch" href="https://avatars3.githubusercontent.com">
<link rel="dns-prefetch" href="https://github-cloud.s3.amazonaws.com">
<link rel="dns-prefetch" href="https://user-images.githubusercontent.com/">
<link crossorigin="anonymous" media="all" integrity="sha512-lLo2nlsdl+bHLu6PGvC2j3wfP45RnK4wKQLiPnCDcuXfU38AiD+JCdMywnF3WbJC1jaxe3lAI6AM4uJuMFBLEw==" rel="stylesheet" href="https://assets-cdn.github.com/assets/frameworks-08fc49d3bd2694c870ea23d0906f3610.css">
<link crossorigin="anonymous" media="all" integrity="sha512-4kfWSrzu4OShEnC5m0lqUCfKkZfG7JH0ff4wnEtubTUTZqV5pS5oUMTOvWE2DDL7ttjZ9FpnZInl/0TLO3EIiA==" rel="stylesheet" href="https://assets-cdn.github.com/assets/github-6c1d4c04bb55a87b9cb81ffdbd683662.css">
<link crossorigin="anonymous" media="all" integrity="sha512-PcJMPDRp7jbbEAmTk9kaL2kRQqg69QZ26WsZf07xsPyaipKsi3wVG0805PZNYXxotPDAliKKFvNSQPhD8fp1FQ==" rel="stylesheet" href="https://assets-cdn.github.com/assets/site-50c740d9290419d070dd6213a7cd03b5.css">
This can be useful for parsing HTML with rvest after a page has loaded.
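As an illustration of that idea, here is a minimal sketch that extracts the dns-prefetch links from a dumped document. It assumes rvest 1.0.0 or later (for html_elements()) and parses a short hand-written excerpt stored in html_source, which is a stand-in rather than real output; with an actual dump you would pass the path in html_file to read_html() instead.

```r
library(rvest)

# Stand-in for the dumped page (assumption): a short hand-written excerpt.
html_source <- '<html lang="en"><head>
<meta charset="utf-8">
<link rel="dns-prefetch" href="https://assets-cdn.github.com">
<link rel="dns-prefetch" href="https://github-cloud.s3.amazonaws.com">
</head><body></body></html>'

# Parse the HTML, select the <link rel="dns-prefetch"> elements,
# and pull out their href attributes.
page <- read_html(html_source)
hrefs <- page %>%
  html_elements("link[rel='dns-prefetch']") %>%
  html_attr("href")

print(hrefs)
```

Because the DOM is dumped after the load event, the parsed document reflects the page as the browser saw it at that point rather than the raw server response.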