public class Icu4jEncodingDetector extends Object implements org.apache.tika.detect.EncodingDetector
| Constructor and Description |
|---|
| Icu4jEncodingDetector() |
| Modifier and Type | Method and Description |
|---|---|
| Charset | detect(InputStream input, org.apache.tika.metadata.Metadata metadata) |
| List<String> | getIgnoreCharsets() |
| int | getMarkLimit() |
| int | getMarkLimt() |
| boolean | isStripMarkup() |
| void | setIgnoreCharsets(List<String> charsetsToIgnore) |
| void | setMarkLimit(int markLimit)<br>How far into the stream to read for charset detection. |
| void | setStripMarkup(boolean stripMarkup)<br>Whether or not to attempt to strip html-ish markup from the stream before sending it to the underlying detector. |
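To make the idea behind `setStripMarkup` concrete, here is a minimal sketch of what "stripping html-ish markup" before detection could look like. It is not Tika's or ICU4J's implementation: the class `StripMarkupSketch` and the method `stripTags` are hypothetical names, and the sketch assumes an ASCII-compatible encoding in which `<` and `>` are single bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not part of Tika or ICU4J. */
final class StripMarkupSketch {

    /**
     * Drops every byte from '<' up to and including the next '>',
     * writing a single space in place of each closed tag.
     * Assumes an ASCII-compatible encoding (an assumption of this sketch).
     */
    static byte[] stripTags(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length);
        boolean inTag = false;
        for (byte b : raw) {
            if (b == '<') {
                inTag = true;            // start skipping tag bytes
            } else if (b == '>' && inTag) {
                inTag = false;           // tag closed
                out.write(' ');          // separator so adjacent words do not merge
            } else if (!inTag) {
                out.write(b);            // ordinary text byte: keep it
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // \u00e9 is encoded as a single non-ASCII byte in ISO-8859-1.
        byte[] html = "<p>caf\u00e9 <b>au</b> lait</p>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] text = stripTags(html);
        System.out.println(new String(text, StandardCharsets.ISO_8859_1));
    }
}
```

The motivation is that tag names and attributes are plain ASCII, so leaving them in can dilute the non-ASCII bytes that a statistical detector relies on.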
public Charset detect(InputStream input, org.apache.tika.metadata.Metadata metadata) throws IOException
Specified by: detect in interface org.apache.tika.detect.EncodingDetector
Throws: IOException

public boolean isStripMarkup()

@Field public void setStripMarkup(boolean stripMarkup)
The underlying detector may still apply its own stripping if this is set to false.
Parameters: stripMarkup - whether or not to attempt to strip markup before sending the stream to the underlying detector

public int getMarkLimit()

@Field public void setMarkLimit(int markLimit)
Parameters: markLimit -

public int getMarkLimt()
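The mark limit bounds how many bytes are sampled from the stream for detection. The sketch below shows one plausible way a detector could honor such a limit with the standard `InputStream.mark`/`reset` calls, so that the caller's stream is left at its original position. The class `MarkLimitSketch` and the method `peek` are hypothetical and use only the JDK; this is an assumption about the general pattern, not a copy of this class's code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of Tika: samples a stream prefix and rewinds. */
final class MarkLimitSketch {

    /** Reads at most markLimit bytes from a mark-supporting stream, then resets it. */
    static byte[] peek(InputStream input, int markLimit) throws IOException {
        if (!input.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        input.mark(markLimit);
        try {
            byte[] buffer = new byte[markLimit];
            int total = 0;
            int n;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (total < markLimit
                    && (n = input.read(buffer, total, markLimit - total)) != -1) {
                total += n;
            }
            return Arrays.copyOf(buffer, total);
        } finally {
            input.reset(); // rewind so the caller can still read from the start
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "only the first bytes are sampled".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] sample = peek(in, 8);
        System.out.println(new String(sample, StandardCharsets.US_ASCII));
    }
}
```

A larger limit gives the detector more evidence but costs more buffering. Wrapping a stream that lacks mark support in a `BufferedInputStream`, as in `main` above, is a common way to meet the mark/reset requirement.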
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.