Headless browser

{{Short description|Web browser without a graphical user interface}}

{{Distinguish|Text-based web browser}}

A headless browser is a web browser without a graphical user interface.

Headless browsers provide automated control of a web page in an environment similar to popular web browsers, but they are executed via a command-line interface or using network communication. They are particularly useful for testing web pages because they render and interpret HTML the same way a regular browser would, including styling elements such as page layout, color and font selection, and because they execute JavaScript and Ajax, which are usually not available with other testing methods.{{cite web|url=http://blog.arhg.net/2009/10/what-is-headless-browser.html|title=What is a headless browser?|work=arhg.net|date=7 October 2009 }}

Google Chrome has had native support for remote control of the browser since version 59,{{cite web|url=https://developers.google.com/web/updates/2017/04/headless-chrome|title=Getting Started with Headless Chrome|work=developers.google.com|date=27 April 2017 }} and Firefox since version 56.{{cite web|url=https://developer.mozilla.org/en-US/Firefox/Releases/56#Other|title=Firefox 56 release notes|work=developer.mozilla.org|date=26 February 2023 }}{{cite web|url=https://developer.mozilla.org/en-US/Firefox/Headless_mode#Browser_support|title=Headless mode - browser support|work=developer.mozilla.org|access-date=2017-08-31|archive-date=2018-06-03|archive-url=https://web.archive.org/web/20180603191617/https://developer.mozilla.org/en-US/Firefox/Headless_mode#Browser_support|url-status=dead}} This made earlier efforts, notably PhantomJS, obsolete.{{cite web|url=http://phantomjs.org/quick-start.html|title=Quick Start|work=phantomjs.org}}
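As a rough illustration of running a browser through a command-line interface, the following Python sketch assembles a headless Chrome invocation. The `--headless`, `--dump-dom` and `--screenshot` flags are those described in Google's headless Chrome guide; the executable name `google-chrome` and the helper function names are assumptions made for this example and vary by platform.

```python
import shutil
import subprocess


def headless_chrome_command(url, binary="google-chrome", screenshot=None):
    """Build an argument list that runs Chrome without a visible window.

    `binary` is an assumed executable name; it differs between platforms.
    """
    command = [binary, "--headless"]
    if screenshot is not None:
        # Save an image of the rendered page to the given file.
        command.append(f"--screenshot={screenshot}")
    else:
        # Print the serialized DOM of the loaded page to standard output.
        command.append("--dump-dom")
    command.append(url)
    return command


def dump_dom(url, binary="google-chrome"):
    """Return the rendered DOM as text, or None if the browser is not installed."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        headless_chrome_command(url, binary),
        capture_output=True, text=True, timeout=60, check=True,
    )
    return result.stdout
```

Calling `dump_dom("https://example.com")` would return the page markup as the browser sees it after loading, provided a compatible browser is installed under the assumed name.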

Use cases

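The main use case for headless browsers is automated testing of web pages, since they render pages and execute JavaScript the same way a regular browser would.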

=Other uses=

Headless browsers are also useful for web scraping. Google stated in 2009 that using a headless browser could help its search engine index content from websites that use Ajax.{{cite web|url=http://googlewebmastercentral.blogspot.com.au/2009/10/proposal-for-making-ajax-crawlable.html|title=Official Google Webmaster Central Blog: A proposal for making AJAX crawlable|work=Official Google Webmaster Central Blog|date=2009-10-07|author=Mueller, John}}

Headless browsers have also been misused to:

  • Perform DDoS attacks on web sites.{{cite web|url=http://www.business2community.com/tech-gadgets/headless-browser-botnet-used-150-hour-ddos-attack-0688767|title=Headless Browser Botnet Used in 150 hour DDoS attack|work=Business 2 Community|date=2013-11-20|author=Rawlings, Matt}}
  • Increase advertisement impressions.{{cite web|url=http://www.ecommercetimes.com/story/80194.html|title=Headless Web Traffic Threatens Internet Economy|work=ecommercetimes.com|date=2014-03-25|author=Mello Jr., John P.}}
  • Automate web sites in unintended ways,{{cite web|url=http://www.itproportal.com/2014/04/01/headless-browsers-legitimate-software-enables-attack/|title=Headless browsers: legitimate software that enables attack|work=ITProPortal|author=Raywood, Dan|date=2014-04-01}} e.g. for credential stuffing.{{cite web|url=https://www.owasp.org/index.php/Credential_stuffing|title=Credential stuffing|work=owasp.org|author=Mueller, Neal}}

However, a 2018 study of browser traffic found no preference among malicious actors for headless browsers.{{Cite web|last=Bekerman|first=Dima|date=2018-11-28|title=Headless Chrome: DevOps Love It, So Do Hackers, Here's Why {{!}} Imperva|url=https://www.imperva.com/blog/headless-chrome-devops-love-it-so-do-hackers-heres-why/|access-date=2021-02-22|website=Blog|language=en-US}} It found no indication that headless browsers are used more frequently than non-headless browsers for malicious purposes such as DDoS attacks, SQL injection or cross-site scripting.

Usage

Because several major browsers natively support headless mode through APIs, software exists that performs browser automation through a unified interface. Examples include:

  • Selenium WebDriver - a W3C-compliant implementation of WebDriver{{cite web|url=https://www.lambdatest.com/blog/selenium4-w3c-webdriver-protocol/|title=Selenium 4 Is Now W3C Compliant: All You Need To Know|author=Sheth, Himanshu|date=2020-11-17}}
  • Playwright - a Node.js library to automate Chromium, Firefox and WebKit{{cite web|url=https://github.com/microsoft/playwright|title=GitHub - Playwright|website=GitHub|access-date=2021-04-11}}
  • Puppeteer - a Node.js library to automate Chrome{{cite web|url=https://github.com/puppeteer/puppeteer|title=Github - Puppeteer|website=GitHub|access-date=2021-04-11}}
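To give a sense of what a unified interface looks like at the protocol level, the sketch below builds the JSON body of a W3C WebDriver "New Session" request that asks for a headless browser. The vendor option keys `goog:chromeOptions` and `moz:firefoxOptions` are the ones used by ChromeDriver and geckodriver; the function names and the `driver_url` parameter are assumptions for illustration, and libraries such as Selenium normally construct and send this request on the caller's behalf.

```python
import json
import urllib.request


def new_session_payload(browser):
    """Build a W3C WebDriver "New Session" body requesting headless mode."""
    if browser == "chrome":
        options = {"goog:chromeOptions": {"args": ["--headless"]}}
    elif browser == "firefox":
        options = {"moz:firefoxOptions": {"args": ["-headless"]}}
    else:
        raise ValueError(f"unsupported browser: {browser}")
    return {"capabilities": {"alwaysMatch": {"browserName": browser, **options}}}


def start_session(driver_url, browser):
    """POST the payload to a running driver and return the new session id.

    `driver_url` is an assumption: the address of a locally started
    driver process such as ChromeDriver or geckodriver.
    """
    request = urllib.request.Request(
        f"{driver_url}/session",
        data=json.dumps(new_session_payload(browser)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["value"]["sessionId"]
```

Only the browser-specific options differ between the two payloads; the surrounding request is the same, which is what allows one client library to drive several browsers.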

=Test automation=

Some test automation software and frameworks include headless browsers as part of their testing apparatus.

  • Capybara uses headless browsing, via either WebKit or Headless Chrome, to mimic user behavior in its testing protocols.{{Cite web|last=Silva|first=Francisco|date=2019-05-29|title=From capybara-webkit to Headless Chrome and ChromeDriver|url=https://www.imaginarycloud.com/blog/from-capybara-webkit-to-headless-chrome-and-chromedriver/|access-date=2021-02-22|website=Blog {{!}} Imaginary Cloud|language=en}}
  • Jasmine runs browser tests with Selenium by default, but can also use WebKit or Headless Chrome.{{Cite web|last=Bintz|first=John|title=jasmine-headless-webkit -- The fastest way to run your Jasmine specs!|url=https://johnbintz.github.io/jasmine-headless-webkit/|access-date=2021-02-22|website=johnbintz.github.io}}
  • Cypress, a frontend testing framework.
  • QF-Test, a tool for automated testing of programs through their graphical user interface, which can also use a headless browser for testing.

=Alternatives=

Another approach is to use software that provides browser APIs without being a full browser. For example, Deno provides browser APIs as part of its design. For Node.js, jsdom{{cite web|url=https://github.com/jsdom/jsdom#pretending-to-be-a-visual-browser|title=JSDOM at GitHub - Pretending to be a visual browser|website=GitHub|access-date=2021-04-18}} is the most complete provider. While most such libraries support common browser features (HTML parsing, cookies, XHR, some JavaScript, etc.), they do not render the DOM and have limited support for DOM events. They usually perform faster than full browsers, but are unable to correctly interpret many popular websites.{{cite web|url=https://github.com/assaf/zombie/tree/v3.0.15#faq|title=assaf/zombie|work=GitHub}}{{cite web|url=http://www.envjs.com/doc/guides|title=ヘルペスが口や目からうつる?感染した時の症状と病院の治療方法とは|website=www.envjs.com|access-date=2015-03-13|archive-url=https://web.archive.org/web/20150223094542/http://www.envjs.com/doc/guides|archive-date=2015-02-23|url-status=dead}}{{cite web|url=http://www.javascriptmvc.com/docs/funcunit.envjs.html|title=JavaScriptMVC - EnvJS|work=javascriptmvc.com|access-date=2015-03-13|archive-date=2015-05-23|archive-url=https://web.archive.org/web/20150523003635/http://www.javascriptmvc.com/docs/funcunit.envjs.html|url-status=dead}}
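The trade-off can be seen at its most extreme with a plain HTML parser, which sits at the simplest end of this spectrum. The sketch below uses Python's standard `html.parser` module as a stand-in (it is not jsdom and executes no JavaScript at all); the sample page and the `LinkCollector` class are made up for this example. The parser finds the link present in the markup but not the one a script would add at run time.

```python
from html.parser import HTMLParser

# Hypothetical page: one link is in the markup, another is created by a script.
PAGE = """
<html><body>
  <a href="/static">Static link</a>
  <script>
    var a = document.createElement("a");
    a.href = "/dynamic";
    document.body.appendChild(a);
  </script>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags found in the markup itself."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)  # prints ['/static']; the script is never executed
```

A headless browser loading the same page would execute the script and expose both links, at the cost of starting a full browser engine.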

Another alternative is HtmlUnit, a headless browser written in Java. HtmlUnit uses the Rhino engine to provide JavaScript and Ajax support as well as partial rendering capability.{{cite web|url=http://htmlunit.sourceforge.net/|title=HtmlUnit – Welcome to HtmlUnit|author=Mike Bowler|work=sourceforge.net}}{{cite web|url=http://vaadin.com/download/release/7.3/7.3.4/docs/api/com/google/gwt/junit/Platform.html|title=Platform (Vaadin 7.3.4 API)|date=6 November 2014|work=vaadin.com}}

List of headless browsers

The following software provides headless browser APIs.

  • Splash is a headless web browser written in Python using the WebKit layout engine via Qt. It has an HTTP API, Lua scripting support and a built-in IPython (Jupyter)-based IDE. Development started at ScrapingHub in 2013; it is partially funded by DARPA.{{cite web|url=https://github.com/scrapinghub/splash|title=scrapinghub/splash|work=GitHub|date=20 December 2021}}{{Cite web |url=http://www.darpa.mil/opencatalog/MEMEX.html |title=DARPA - Open Catalog |access-date=2015-05-28 |archive-url=https://web.archive.org/web/20150528223527/http://www.darpa.mil/opencatalog/MEMEX.html |archive-date=2015-05-28 |url-status=dead }}
  • Zombie.js is a simulated browser environment for Node.js.{{cite web|url=http://zombie.labnotes.org/|title=Zombie|work=labnotes.org}}
  • SimpleBrowser is a headless web browser written in C# that supports .NET Standard 2.0.{{Citation|title=SimpleBrowserDotNet/SimpleBrowser|date=2021-02-10|url=https://github.com/SimpleBrowserDotNet/SimpleBrowser|publisher=SimpleBrowserDotNet|access-date=2021-02-22}}
  • DotNetBrowser is a proprietary .NET Chromium-based library that provides an off-screen rendering mode and can be used without embedding or displaying windows.{{Citation|title=DotNetBrowser Examples|date=2021-03-12|url=https://github.com/TeamDev-IP/DotNetBrowser-Examples|publisher=TeamDev|access-date=2021-03-12}}{{cite web|url=https://www.teamdev.com/dotnetbrowser|title=DotNetBrowser|date=2021-05-05|publisher=TeamDev}}
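As an example of controlling a headless browser through network communication, the sketch below builds a request to the HTTP API that Splash exposes. It assumes a Splash instance is already running at `http://localhost:8050` and uses its `render.html` endpoint with the `url` and `wait` parameters; the function names and default values are choices made for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def splash_render_url(page_url, splash="http://localhost:8050", wait=0.5):
    """Build the address of a Splash render.html request.

    `splash` is the assumed address of a locally running Splash instance;
    `wait` is the number of seconds to wait after the page loads.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash}/render.html?{query}"


def fetch_rendered_html(page_url, **kwargs):
    """Ask Splash to load the page and return the rendered HTML as text."""
    with urlopen(splash_render_url(page_url, **kwargs), timeout=60) as response:
        return response.read().decode("utf-8")
```

The client needs nothing beyond an HTTP library, since the browser itself runs inside the Splash service.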

Another notable early effort was envjs, released in 2008 by John Resig, a simulated browser environment written in JavaScript for the Rhino engine.{{cite web|url=https://github.com/jeresig/env-js|title=env-js: A pure-JavaScript browser environment|first=John|last=Resig|date=2008-10-12|publisher=|via=GitHub}}

References

{{reflist|30em}}

Category:Web browsers