trait Browser extends AnyRef
A client able to retrieve and parse HTML pages from the web and from local resources.
An implementation of Browser can fetch pages via HTTP GET or POST requests, parse the downloaded page and return a
net.ruippeixotog.scalascraper.model.Document instance, which can be queried via the scraper DSL or using its
methods.
Different net.ruippeixotog.scalascraper.browser.Browser implementations can embed pages with different runtime
behavior. For example, some browsers may limit themselves to parsing the HTML content of the page without
executing any scripts, while others may run JavaScript and allow for Document instances with dynamic
content. The documentation of each implementation should be read for more information on the semantics of its
Document and net.ruippeixotog.scalascraper.model.Element implementations.
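As a minimal sketch of the workflow described above, the snippet below parses an inline HTML string and queries the resulting Document both through one of its methods and through the scraper DSL. It assumes the library's JsoupBrowser implementation, which parses static HTML without executing scripts; the HTML content and the value names are made up for illustration.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Assumption: JsoupBrowser is the Browser implementation on the classpath.
val browser = JsoupBrowser()

// Illustrative HTML; any well-formed page would do.
val doc = browser.parseString(
  """<html>
    |  <head><title>Demo page</title></head>
    |  <body><h1 id="greeting">Hello</h1></body>
    |</html>""".stripMargin
)

// Query through a method of the Document...
val pageTitle: String = doc.title

// ...or through the scraper DSL, using a CSS selector.
val greeting: String = doc >> text("#greeting")
```

Other Browser implementations may produce Document instances with different runtime behavior, so the same queries are not guaranteed to return the same values on a page that relies on scripts.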
Type Members
Abstract Value Members
- abstract def clearCookies(): Unit
Clears the cookie store of this browser.
- abstract def cookies(url: String): Map[String, String]
Returns the current set of cookies stored in this browser for a given URL.
- url
the URL whose stored cookies are to be returned
- returns
a mapping of cookie names to their respective values.
- abstract def get(url: String): DocumentType
Retrieves and parses a web page using a GET request.
- url
the URL of the page to retrieve
- returns
a Document containing the retrieved web page.
- abstract def parseFile(file: File, charset: String): DocumentType
Parses a local HTML file with a specified charset.
- file
the HTML file to parse
- charset
the charset of the file
- returns
a Document containing the parsed web page.
- abstract def parseInputStream(inputStream: InputStream, charset: String = "UTF-8"): DocumentType
Parses an input stream with its content in a specified charset. The provided input stream is always closed before this method returns or throws an exception.
- inputStream
the input stream to parse
- charset
the charset of the input stream content
- returns
a Document containing the parsed web page.
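The charset parameter matters when the stream is not UTF-8. The sketch below, which assumes the JsoupBrowser implementation and uses made-up content and names, encodes a page as ISO-8859-1 in memory and passes the matching charset so that the accented character is decoded correctly.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

// Assumption: JsoupBrowser is the Browser implementation on the classpath.
val streamBrowser = JsoupBrowser()

// Illustrative page whose title contains a non-ASCII character.
val latin1Html = "<html><head><title>Café</title></head><body></body></html>"

// Encode the page as ISO-8859-1 to simulate a non-UTF-8 source.
val latin1Stream =
  new ByteArrayInputStream(latin1Html.getBytes(StandardCharsets.ISO_8859_1))

// Pass the same charset to the parser; the stream is closed by the call.
val streamDoc = streamBrowser.parseInputStream(latin1Stream, "ISO-8859-1")
val streamTitle: String = streamDoc.title
```

Because the stream is always closed before the method returns, it should not be reused afterwards.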
- abstract def parseString(html: String): DocumentType
Parses an HTML string.
- html
the HTML string to parse
- returns
a Document containing the parsed web page.
- abstract def post(url: String, form: Map[String, String]): DocumentType
Submits a form via a POST request and parses the resulting page.
- url
the URL of the page to retrieve
- form
a map containing the form fields to submit with their respective values
- returns
a Document containing the resulting web page.
- abstract def userAgent: String
The user agent used by this browser to retrieve HTML pages from the web.
- abstract def withProxy(proxy: Proxy): Browser
Returns a new browser that uses the provided proxy for all connections.
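The request and cookie members above can be combined into a simple session flow. The following is a sketch only: loginAndGetCookies, its parameters, and the form field names "username" and "password" are hypothetical, and JsoupBrowser is assumed as the implementation. The helper is written against the Browser trait, so it does not depend on a particular implementation.

```scala
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}

// Hypothetical helper: starts from an empty cookie store, submits a login
// form via POST, and returns the cookies stored for the login URL afterwards.
// The form field names are assumptions; a real site defines its own.
def loginAndGetCookies(
    browser: Browser,
    loginUrl: String,
    user: String,
    password: String
): Map[String, String] = {
  browser.clearCookies()
  browser.post(loginUrl, Map("username" -> user, "password" -> password))
  browser.cookies(loginUrl)
}

// Assumption: JsoupBrowser is the Browser implementation on the classpath.
val sessionBrowser: Browser = JsoupBrowser()

// The user agent sent with each request can be inspected before any call.
val agent: String = sessionBrowser.userAgent
```

Calling the helper performs a real HTTP request, so it is only defined here and not invoked.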
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @HotSpotIntrinsicCandidate() @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- def parseFile(path: String): DocumentType
Parses a local HTML file encoded in UTF-8.
- path
the path in the local filesystem where the HTML file is located
- returns
a Document containing the parsed web page.
- def parseFile(path: String, charset: String): DocumentType
Parses a local HTML file with a specified charset.
- path
the path in the local filesystem where the HTML file is located
- charset
the charset of the file
- returns
a Document containing the parsed web page.
- def parseFile(file: File): DocumentType
Parses a local HTML file encoded in UTF-8.
- file
the HTML file to parse
- returns
a Document containing the parsed web page.
- def parseResource(name: String, charset: String = "UTF-8"): DocumentType
Parses a resource with a specified charset.
- name
the name of the resource to parse
- charset
the charset of the resource
- returns
a Document containing the parsed web page.
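The parseFile overloads differ only in how the file is identified and whether a charset is given. As a sketch, assuming the JsoupBrowser implementation and a temporary file created purely for illustration, the same page can be parsed by path with the UTF-8 default or as a File with an explicit charset.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

// Assumption: JsoupBrowser is the Browser implementation on the classpath.
val fileBrowser = JsoupBrowser()

// Write an illustrative UTF-8 page to a temporary file.
val tempPath = Files.write(
  Files.createTempFile("browser-example", ".html"),
  "<html><head><title>Local page</title></head><body></body></html>"
    .getBytes(StandardCharsets.UTF_8)
)

// Overload taking a path string; the file is read as UTF-8.
val fromPath = fileBrowser.parseFile(tempPath.toString)

// Overload taking a File and an explicit charset.
val fromFile = fileBrowser.parseFile(tempPath.toFile, "UTF-8")
```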
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)