package browser
Type Members
- trait Browser extends AnyRef
A client able to retrieve and parse HTML pages from the web and from local resources.
An implementation of Browser can fetch pages via HTTP GET or POST requests, parse the downloaded page and return a net.ruippeixotog.scalascraper.model.Document instance, which can be queried via the scraper DSL or using its methods.
Different net.ruippeixotog.scalascraper.browser.Browser implementations can embed pages with different runtime behavior. For example, some browsers may limit themselves to parsing the HTML content inside the page without executing any scripts inside, while others may run JavaScript and allow for Document instances with dynamic content. The documentation of each implementation should be read for more information on the semantics of its Document and net.ruippeixotog.scalascraper.model.Element implementations.
- class HtmlUnitBrowser extends Browser
A Browser implementation based on HtmlUnit, a GUI-less browser for Java programs.
HtmlUnitBrowser thoroughly simulates a web browser, executing JavaScript code in the pages in addition to parsing and modelling their HTML content. It supports several compatibility modes, allowing it to emulate browsers such as Internet Explorer.
Both the net.ruippeixotog.scalascraper.model.Document and the net.ruippeixotog.scalascraper.model.Element instances obtained from HtmlUnitBrowser can be mutated in the background. JavaScript code can at any time change attributes and the content of elements, and those changes are reflected both in queries to Document and in previously stored references to Elements. The Document instance will always represent the current page in the browser's "window". This means the Document's location value can change, together with its root element, in the event of client-side page refreshes or redirections. However, Element instances belong to a fixed DOM tree and they stop being meaningful as soon as they are removed from the DOM or a client-side page reload occurs.
- class JsoupBrowser extends Browser
A Browser implementation based on jsoup, a Java HTML parser library.
JsoupBrowser provides powerful and efficient document querying, but it doesn't run JavaScript in the pages. As such, it is limited to working strictly with the HTML sent in the page source.
Currently, JsoupBrowser does not keep separate cookie stores for different domains and paths. In each request all cookies set previously will be sent, regardless of the domain they were set on. If you make requests to different domains and do not want this behavior, use different JsoupBrowser instances.
As the documents parsed by JsoupBrowser instances are not changed after loading, Document and Element instances obtained from them are guaranteed to be immutable.
- case class Proxy(host: String, port: Int, proxyType: Type) extends Product with Serializable
A proxy configuration to be used by Browsers.
- host
the proxy host
- port
the proxy port
- proxyType
the protocol used by a proxy (e.g. HTTP, SOCKS)
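As a minimal sketch of the three parameters above, a Proxy value can be built like any other case class. The host name and port below are hypothetical, and the snippet assumes the proxy types are exposed on the companion object as Proxy.HTTP and Proxy.SOCKS; how a given Browser implementation accepts a proxy is not covered here and should be checked in that implementation's documentation.

```scala
import net.ruippeixotog.scalascraper.browser.Proxy

// Hypothetical proxy endpoint, used only for illustration.
// Assumption: the protocol values are available as Proxy.HTTP and Proxy.SOCKS.
val httpProxy = Proxy(host = "proxy.example.com", port = 8080, proxyType = Proxy.HTTP)

// Being a case class, a variant can be derived with copy
// instead of repeating every field.
val socksProxy = httpProxy.copy(port = 1080, proxyType = Proxy.SOCKS)
```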
Value Members
- object HtmlUnitBrowser
- object JsoupBrowser
- object Proxy extends Serializable
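The Browser workflow described on this page (obtain a page, get back a Document, query it through the DSL or its own methods) can be sketched with JsoupBrowser as follows. This is an illustrative sketch rather than part of the API reference: the inline HTML and the selectors are made up for the example, and the page is parsed from a string so that no network access is needed. It is written as top-level definitions, as in a REPL session or a script.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical page content, used in place of a real HTTP request.
val html =
  """<html>
    |  <head><title>Example page</title></head>
    |  <body>
    |    <h1 id="header">Hello</h1>
    |    <p class="note">First</p>
    |    <p class="note">Second</p>
    |  </body>
    |</html>""".stripMargin

// JsoupBrowser only parses the HTML it is given; no scripts are executed.
val browser = JsoupBrowser()

// Parse the string into a Document. For a remote page, browser.get(url)
// would be used instead.
val doc = browser.parseString(html)

// Query the Document through its own methods...
val title = doc.title

// ...or through the scraper DSL.
val header = doc >> text("#header")
val notes = (doc >> texts(".note")).toList
```

Because JsoupBrowser documents are not changed after loading, the values extracted here stay consistent with doc for as long as it is kept around. With HtmlUnitBrowser the same queries could return different results over time, since scripts may mutate the page in the background.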