class HtmlUnitBrowser extends Browser

A Browser implementation based on HtmlUnit, a GUI-less browser for Java programs. HtmlUnitBrowser thoroughly simulates a web browser: besides parsing and modelling the HTML content of pages, it executes their JavaScript code. It supports several compatibility modes, allowing it to emulate browsers such as Internet Explorer.

Both the net.ruippeixotog.scalascraper.model.Document and the net.ruippeixotog.scalascraper.model.Element instances obtained from HtmlUnitBrowser can be mutated in the background. JavaScript code can change the attributes and content of elements at any time, and those changes are reflected both in queries to the Document and in previously stored references to Elements. The Document instance always represents the current page in the browser's "window". This means the Document's location value can change, together with its root element, in the event of client-side page refreshes or redirections. Element instances, however, belong to a fixed DOM tree and stop being meaningful as soon as they are removed from the DOM or a client-side page reload occurs.
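
The background mutation described above can be illustrated with a minimal sketch. It assumes scala-scraper and HtmlUnit are on the classpath; the HTML snippet, the `#status` id and the object name `JsMutationSketch` are hypothetical and only serve the illustration. If the inline script is executed during parsing, the query should see the text written by the script rather than the text in the markup.

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object JsMutationSketch {
  // Hypothetical page: the inline script rewrites the content of the div.
  val html: String =
    """<html>
      |  <head><title>Demo</title></head>
      |  <body>
      |    <div id="status">loading</div>
      |    <script type="text/javascript">
      |      document.getElementById("status").innerHTML = "ready";
      |    </script>
      |  </body>
      |</html>""".stripMargin

  def statusText(): Option[String] = {
    val browser = new HtmlUnitBrowser()
    try {
      val doc = browser.parseString(html)
      // The query runs against the live DOM held by the browser window,
      // so it returns whatever the element contains at this moment.
      doc.root.select("#status").headOption.map(_.text)
    } finally browser.closeAll() // release the windows opened by this browser
  }
}
```

The text is copied into a plain String before `closeAll()` is called, because Element references are only meaningful while their DOM tree is alive.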

Linear Supertypes
Browser, AnyRef, Any

Instance Constructors

  1. new HtmlUnitBrowser(browserType: BrowserVersion = BrowserVersion.CHROME, proxy: Option[ProxyConfig] = None)

    browserType

    the browser type and version to simulate

    proxy

    an optional proxy configuration to use
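
As a hedged sketch of the constructor, the example below builds one browser with the default arguments and one with an explicit browser type. The import path `com.gargoylesoftware.htmlunit` is an assumption that holds for HtmlUnit 2.x; newer HtmlUnit releases use the `org.htmlunit` package instead. The object name `BrowserConstructionSketch` is hypothetical.

```scala
// Assumption: HtmlUnit 2.x package name; use org.htmlunit.BrowserVersion on newer releases.
import com.gargoylesoftware.htmlunit.BrowserVersion
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object BrowserConstructionSketch {
  // Default arguments: simulates Chrome and uses no proxy.
  val defaultBrowser = new HtmlUnitBrowser()

  // Explicit browser type; BEST_SUPPORTED is the constant HtmlUnit
  // provides for the browser it emulates best.
  val bestSupported = new HtmlUnitBrowser(BrowserVersion.BEST_SUPPORTED)

  // The user agent string reflects the simulated browser type.
  def userAgents: Seq[String] =
    Seq(defaultBrowser.userAgent, bestSupported.userAgent)
}
```

The proxy argument is left at its default here, since the ProxyConfig constructor differs between HtmlUnit versions.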

Type Members

  1. type DocumentType = HtmlUnitDocument

    The concrete type of documents created by this browser.

    Definition Classes
    HtmlUnitBrowser → Browser

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clearCookies(): Unit

    Clears the cookie store of this browser.

    Definition Classes
    HtmlUnitBrowser → Browser
  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @HotSpotIntrinsicCandidate() @native()
  7. def closeAll(): Unit

    Closes all windows opened in this browser.

  8. def cookies(url: String): Map[String, String]

    Returns the current set of cookies stored in this browser for a given URL.

    url

    the URL whose stored cookies are to be returned

    returns

    a mapping of cookie names to their respective values.

    Definition Classes
    HtmlUnitBrowser → Browser
  9. def defaultClientSettings(client: WebClient): Unit
    Attributes
    protected[this]
  10. def defaultRequestSettings(req: WebRequest): Unit
    Attributes
    protected[this]
  11. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  13. def exec(req: WebRequest): HtmlUnitDocument
  14. def get(url: String): HtmlUnitDocument

    Retrieves and parses a web page using a GET request.

    url

    the URL of the page to retrieve

    returns

    a Document containing the retrieved web page.

    Definition Classes
    HtmlUnitBrowser → Browser
  15. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  16. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  17. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  18. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  19. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  20. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  21. def parseFile(file: File, charset: String): HtmlUnitDocument

    Parses a local HTML file with a specified charset.

    file

    the HTML file to parse

    charset

    the charset of the file

    returns

    a Document containing the parsed web page.

    Definition Classes
    HtmlUnitBrowser → Browser
  22. def parseFile(path: String): DocumentType

    Parses a local HTML file encoded in UTF-8.

    path

    the path in the local filesystem where the HTML file is located

    returns

    a Document containing the parsed web page.

    Definition Classes
    Browser
  23. def parseFile(path: String, charset: String): DocumentType

    Parses a local HTML file with a specified charset.

    path

    the path in the local filesystem where the HTML file is located

    charset

    the charset of the file

    returns

    a Document containing the parsed web page.

    Definition Classes
    Browser
  24. def parseFile(file: File): DocumentType

    Parses a local HTML file encoded in UTF-8.

    file

    the HTML file to parse

    returns

    a Document containing the parsed web page.

    Definition Classes
    Browser
  25. def parseInputStream(inputStream: InputStream, charset: String): HtmlUnitDocument

    Parses an input stream with its content in a specified charset. The provided input stream is always closed before this method returns or throws an exception.

    inputStream

    the input stream to parse

    charset

    the charset of the input stream content

    returns

    a Document containing the parsed web page.

    Definition Classes
    HtmlUnitBrowser → Browser
  26. def parseResource(name: String, charset: String = "UTF-8"): DocumentType

    Parses a resource with a specified charset.

    name

    the name of the resource to parse

    charset

    the charset of the resource

    returns

    a Document containing the parsed web page.

    Definition Classes
    Browser
  27. def parseString(html: String): HtmlUnitDocument

    Parses an HTML string.

    html

    the HTML string to parse

    returns

    a Document containing the parsed web page.

    Definition Classes
    HtmlUnitBrowser → Browser
  28. def post(url: String, form: Map[String, String]): HtmlUnitDocument

    Submits a form via a POST request and parses the resulting page.

    url

    the URL of the page to retrieve

    form

    a map containing the form fields to submit with their respective values

    returns

    a Document containing the resulting web page.

    Definition Classes
    HtmlUnitBrowser → Browser
  29. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  30. def toString(): String
    Definition Classes
    AnyRef → Any
  31. lazy val underlying: WebClient
  32. def userAgent: String

    The user agent used by this browser to retrieve HTML pages from the web.

    Definition Classes
    HtmlUnitBrowser → Browser
  33. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  34. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  35. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  36. def withProxy(proxy: Proxy): HtmlUnitBrowser

    Returns a new browser that uses the provided proxy for all connections.

    Definition Classes
    HtmlUnitBrowser → Browser
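
Several of the value members listed above can be exercised without network access. The sketch below is one possible usage, assuming scala-scraper and HtmlUnit are on the classpath; the HTML content, the placeholder URL and the object name `ValueMembersSketch` are hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets

import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object ValueMembersSketch {
  // Hypothetical static page used for both parsing calls.
  val html: String =
    "<html><head><title>Sample</title></head><body><h1>Hello</h1></body></html>"

  def run(): (String, Seq[String], Map[String, String]) = {
    val browser = new HtmlUnitBrowser()
    try {
      // parseString: build a document from HTML held in memory.
      val fromString = browser.parseString(html)

      // parseInputStream: same content through a stream with an explicit charset.
      // The method closes the stream itself, so no manual close is needed.
      val stream = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))
      val fromStream = browser.parseInputStream(stream, "UTF-8")
      val headings = fromStream.root.select("h1").map(_.text).toSeq

      // cookies / clearCookies: no HTTP request was made in this sketch,
      // so the cookie store has nothing for the placeholder URL.
      val url = "https://example.com/" // placeholder URL, never fetched here
      browser.clearCookies()
      val cookies = browser.cookies(url)

      (fromString.title, headings, cookies)
    } finally browser.closeAll() // close every window opened by this browser
  }
}
```

`get` and `post` are used the same way as `parseString`, taking a URL (and a form map for `post`) and returning a document, but they need a reachable server and are therefore left out of this offline sketch.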

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)
