trait Browser extends AnyRef

A client able to retrieve and parse HTML pages from the web and from local resources.

An implementation of Browser can fetch pages via HTTP GET or POST requests, parse the downloaded page and return a net.ruippeixotog.scalascraper.model.Document instance, which can be queried via the scraper DSL or using its methods.

Different net.ruippeixotog.scalascraper.browser.Browser implementations can load pages with different runtime behavior. For example, some browsers may only parse the HTML content of a page without executing any of its scripts, while others may run JavaScript and produce Document instances with dynamic content. Read the documentation of each implementation for details on the semantics of its Document and net.ruippeixotog.scalascraper.model.Element implementations.
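
As a rough illustration of the description above, the following sketch writes two helpers against the Browser trait only and then runs them with one concrete implementation. It assumes JsoupBrowser, an implementation that is not named on this page, and the DSL imports shown in the library README; the object and method names (BrowserOverview, headline, pageTitle) are made up for the example.

```scala
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object BrowserOverview {
  // Depends only on the Browser trait, so any implementation can be passed in.
  def headline(browser: Browser, html: String): String = {
    val doc = browser.parseString(html)
    doc >> text("h1") // query through the scraper DSL
  }

  def pageTitle(browser: Browser, html: String): String =
    browser.parseString(html).title // query through a Document method

  def main(args: Array[String]): Unit = {
    val browser = JsoupBrowser() // assumption: the Jsoup-backed implementation
    val html =
      "<html><head><title>Demo</title></head><body><h1>Hello</h1></body></html>"
    println(headline(browser, html))
    println(pageTitle(browser, html))
  }
}
```

Because the helpers take a Browser rather than a specific class, an implementation that executes JavaScript could be swapped in without changing them, although the content of the returned documents may then differ.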

Linear Supertypes
AnyRef, Any

Type Members

  1. abstract type DocumentType <: Document

    The concrete type of documents created by this browser.
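
Since DocumentType is an abstract type member, the document type returned by a browser depends on the browser value itself. A minimal sketch of how that can be used, assuming JsoupBrowser as the concrete implementation (DocumentTypeSketch and parse are hypothetical names):

```scala
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}
import net.ruippeixotog.scalascraper.model.Document

object DocumentTypeSketch {
  // The result type is tied to the browser argument (a dependent method type).
  def parse(browser: Browser)(html: String): browser.DocumentType =
    browser.parseString(html)

  def main(args: Array[String]): Unit = {
    val browser: Browser = JsoupBrowser() // assumption: Jsoup-backed implementation
    val doc: browser.DocumentType =
      parse(browser)("<html><head><title>Hi</title></head><body></body></html>")

    // DocumentType <: Document, so the general Document API is always available.
    val general: Document = doc
    println(general.title)
  }
}
```

Code that only needs the general Document API can ignore the type member; it matters mainly when an implementation exposes extra operations on its own document class.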

Abstract Value Members

  1. abstract def clearCookies(): Unit

    Clears the cookie store of this browser.

  2. abstract def cookies(url: String): Map[String, String]

    Returns the current set of cookies stored in this browser for a given URL.

    url

    the URL whose stored cookies are to be returned

    returns

    a mapping of cookie names to their respective values.

  3. abstract def get(url: String): DocumentType

    Retrieves and parses a web page using a GET request.

    url

    the URL of the page to retrieve

    returns

    a Document containing the retrieved web page.

  4. abstract def parseFile(file: File, charset: String): DocumentType

    Parses a local HTML file with a specified charset.

    file

    the HTML file to parse

    charset

    the charset of the file

    returns

    a Document containing the parsed web page.

  5. abstract def parseInputStream(inputStream: InputStream, charset: String = "UTF-8"): DocumentType

    Parses an input stream with its content in a specified charset. The provided input stream is always closed before this method returns or throws an exception.

    inputStream

    the input stream to parse

    charset

    the charset of the input stream content

    returns

    a Document containing the parsed web page.

  6. abstract def parseString(html: String): DocumentType

    Parses an HTML string.

    html

    the HTML string to parse

    returns

    a Document containing the parsed web page.

  7. abstract def post(url: String, form: Map[String, String]): DocumentType

    Submits a form via a POST request and parses the resulting page.

    url

    the URL of the page to retrieve

    form

    a map containing the form fields to submit with their respective values

    returns

    a Document containing the resulting web page.

  8. abstract def userAgent: String

    The user agent used by this browser to retrieve HTML pages from the web.

  9. abstract def withProxy(proxy: Proxy): Browser

    Returns a new browser that uses the provided proxy for all connections.
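
The abstract members listed above can be exercised roughly as follows. This is a sketch under assumptions: JsoupBrowser stands in for a concrete implementation, and the URL and form fields in fetchAndInspect are placeholders. That method performs real network requests and is therefore defined but not called. withProxy is left out because the construction of its Proxy argument is not described on this page.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}

object AbstractMembersSketch {
  // Offline parsing: the same markup parsed from a string and from a stream.
  def parseOffline(browser: Browser): (String, String) = {
    val html =
      "<html><head><title>Offline</title></head><body><p>hello</p></body></html>"
    val fromString = browser.parseString(html)

    // The stream is closed by parseInputStream, so no explicit close() is needed.
    val stream = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))
    val fromStream = browser.parseInputStream(stream, "UTF-8")

    (fromString.title, fromStream.title)
  }

  // Network access: placeholder URL and form fields, not invoked in main.
  def fetchAndInspect(browser: Browser): Map[String, String] = {
    browser.get("http://example.com/login")
    browser.post(
      "http://example.com/login",
      Map("user" -> "alice", "password" -> "secret"))

    val stored = browser.cookies("http://example.com/") // cookie name -> value
    browser.clearCookies()                              // empty the cookie store
    stored
  }

  def main(args: Array[String]): Unit = {
    val browser = JsoupBrowser() // assumption: Jsoup-backed implementation
    println(s"user agent: ${browser.userAgent}")
    println(parseOffline(browser))
  }
}
```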

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @HotSpotIntrinsicCandidate() @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  8. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  9. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  10. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  11. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  13. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @HotSpotIntrinsicCandidate() @native()
  14. def parseFile(path: String): DocumentType

    Parses a local HTML file encoded in UTF-8.

    path

    the path in the local filesystem where the HTML file is located

    returns

    a Document containing the parsed web page.

  15. def parseFile(path: String, charset: String): DocumentType

    Parses a local HTML file with a specified charset.

    path

    the path in the local filesystem where the HTML file is located

    charset

    the charset of the file

    returns

    a Document containing the parsed web page.

  16. def parseFile(file: File): DocumentType

    Parses a local HTML file encoded in UTF-8.

    file

    the HTML file to parse

    returns

    a Document containing the parsed web page.

  17. def parseResource(name: String, charset: String = "UTF-8"): DocumentType

    Parses a resource with a specified charset.

    name

    the name of the resource to parse

    charset

    the charset of the resource

    returns

    a Document containing the parsed web page.

  18. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  19. def toString(): String
    Definition Classes
    AnyRef → Any
  20. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  21. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  22. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
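
The parseFile overloads and parseResource listed among the concrete members above differ only in how the source is named and whether the charset is explicit. A small sketch, again assuming JsoupBrowser as the implementation; the temporary file is created only for the example, and the resource name passed to fromResource is hypothetical, since how names are resolved on the classpath is not specified on this page:

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}

object ParseFileSketch {
  // Writes a small UTF-8 page to a temporary file and parses it with each overload.
  def titlesFromTempFile(browser: Browser): Seq[String] = {
    val tmp: File = File.createTempFile("browser-sketch", ".html")
    try {
      val html = "<html><head><title>Local</title></head><body></body></html>"
      Files.write(tmp.toPath, html.getBytes(StandardCharsets.UTF_8))
      Seq(
        browser.parseFile(tmp).title,                  // File, UTF-8 implied
        browser.parseFile(tmp, "UTF-8").title,         // File, explicit charset
        browser.parseFile(tmp.getPath).title,          // path, UTF-8 implied
        browser.parseFile(tmp.getPath, "UTF-8").title  // path, explicit charset
      )
    } finally {
      tmp.delete()
    }
  }

  // Not invoked here: `name` must refer to a resource that exists on the classpath.
  def fromResource(browser: Browser, name: String): String =
    browser.parseResource(name).title // charset defaults to "UTF-8"

  def main(args: Array[String]): Unit =
    println(titlesFromTempFile(JsoupBrowser()))
}
```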

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)
