| Constructor and Description |
|---|
| ReaderTools() |

| Modifier and Type | Method and Description |
|---|---|
| static InputStreamReader | createReaderScanBom(InputStream is)<br>Try to detect the Unicode transformation format (UTF encoding) from the BOM. |
| static InputStreamReader | createReaderScanMeta(InputStream is)<br>Try to detect the input stream encoding from the meta tags "$$$" embedded in the stream. |
| static TaggedReader | createTaggedReader(InputStream is, String defaultCharsetName, int size)<br>Create a TaggedReader and automatically detect the encoding using different heuristics. |
| static Map.Entry<String,String> | readEntry(Reader reader, char delimiter)<br>Read a Map.Entry object from reader. |
| static Map<String,String> | readMetaData(Reader reader)<br>Try to detect meta data embedded in the input. |
| static String | readMetaEncoding(Reader reader)<br>Try to detect encoding-specific meta data embedded in the input. |
| static String | readToken(Reader reader, char delimiter)<br>Read a string token from reader. |

public static InputStreamReader createReaderScanBom(InputStream is) throws IOException
Try to detect the Unicode transformation format (UTF encoding) from the BOM. The InputStream passed as is must support the mark operation.
For BOM marker bytes, see http://unicode.org/faq/utf_bom.html
| Bytes | Encoding Form |
|---|---|
| 00 00 FE FF | UTF-32, big-endian |
| FF FE 00 00 | UTF-32, little-endian |
| FE FF | UTF-16, big-endian |
| FF FE | UTF-16, little-endian |
| EF BB BF | UTF-8 |
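
The BOM check described above can be illustrated with a small standalone sketch. It is not the ReaderTools implementation: the class BomSketch and its methods matchBom, detect, and open are hypothetical names used only for illustration, and the sketch assumes the five byte patterns listed above are the only ones of interest. It marks the stream, peeks at up to four bytes, resets, and then skips the BOM if one was found.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

final class BomSketch {

    // BOM byte patterns as listed above. Order matters: the 4-byte UTF-32 marks
    // are tested first because FF FE 00 00 also begins with the UTF-16LE mark FF FE.
    private static final int[][] MARKS = {
        {0x00, 0x00, 0xFE, 0xFF},
        {0xFF, 0xFE, 0x00, 0x00},
        {0xFE, 0xFF},
        {0xFF, 0xFE},
        {0xEF, 0xBB, 0xBF},
    };
    private static final String[] NAMES = {
        "UTF-32BE", "UTF-32LE", "UTF-16BE", "UTF-16LE", "UTF-8",
    };

    /** Returns the index of the matching BOM in MARKS, or -1 if none matches. */
    static int matchBom(byte[] head, int len) {
        for (int i = 0; i < MARKS.length; i++) {
            int[] mark = MARKS[i];
            if (len < mark.length) {
                continue;
            }
            boolean same = true;
            for (int j = 0; j < mark.length; j++) {
                if ((head[j] & 0xFF) != mark[j]) {
                    same = false;
                    break;
                }
            }
            if (same) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the charset name implied by a leading BOM, or null if there is none. */
    static String detect(byte[] head, int len) {
        int i = matchBom(head, len);
        return i < 0 ? null : NAMES[i];
    }

    /** Opens a reader after the BOM, or at the start using the fallback charset. */
    static InputStreamReader open(InputStream is, String fallbackCharset) throws IOException {
        if (!is.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        is.mark(4);
        byte[] head = new byte[4];
        int len = 0;
        while (len < 4) {
            int n = is.read(head, len, 4 - len);
            if (n < 0) {
                break; // stream shorter than four bytes
            }
            len += n;
        }
        is.reset(); // rewind to the marked position
        int i = matchBom(head, len);
        if (i < 0) {
            return new InputStreamReader(is, fallbackCharset);
        }
        for (int k = 0; k < MARKS[i].length; k++) {
            is.read(); // consume the BOM bytes so the reader starts at the content
        }
        return new InputStreamReader(is, NAMES[i]);
    }
}
```

Wrapping a plain stream in a BufferedInputStream is one common way to obtain the mark support this approach relies on.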

Parameters: is
Returns: InputStreamReader with the correct encoding
Throws: IOException
public static InputStreamReader createReaderScanMeta(InputStream is) throws IOException
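Try to detect the input stream encoding from the meta tags "$$$" embedded in the stream. The InputStream passed as is must support the mark operation.
Parameters: is
Returns: InputStreamReader with the correct encoding
Throws: IOException
public static TaggedReader createTaggedReader(InputStream is, String defaultCharsetName, int size) throws IOException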
Create a TaggedReader and automatically detect the encoding using different heuristics. First, the BOM markers are checked; then embedded meta information (lines starting with '$$$') is scanned.
If no encoding can be guessed, either the defaultCharsetName or the platform encoding is used.
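The order of the heuristics can be sketched as a simple fallback chain. This is only an illustration, not the library's code: the class EncodingFallbackSketch and the method choose are hypothetical, the arguments bomCharset and metaCharset stand for the results of the two scans (null meaning nothing was detected), and treating a null defaultCharsetName as "use the platform encoding" is an assumption.

```java
import java.nio.charset.Charset;

final class EncodingFallbackSketch {

    /**
     * Picks the first available guess: BOM result, then meta tag result,
     * then the caller's default, then the platform encoding.
     * All three arguments may be null (assumption: null means "not available").
     */
    static Charset choose(String bomCharset, String metaCharset, String defaultCharsetName) {
        if (bomCharset != null) {
            return Charset.forName(bomCharset);
        }
        if (metaCharset != null) {
            return Charset.forName(metaCharset);
        }
        if (defaultCharsetName != null) {
            return Charset.forName(defaultCharsetName);
        }
        return Charset.defaultCharset(); // platform encoding as the last resort
    }
}
```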
Parameters: is, defaultCharsetName
Returns: TaggedReader with the correct encoding
Throws: IOException
public static Map.Entry<String,String> readEntry(Reader reader, char delimiter) throws IOException
Read a Map.Entry object from reader. The syntax for an entry is
ws* key ws* '=' value [delimiter | EOF]
value = string | quoted_string
quoted_string = '"' [ char | escape ]* '"'
escape = '\' escape_char
escape_char = '"' | '\' | 'n' | 'r' | 't' | '\n' | '\r' | '\t'
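To make the grammar concrete, here is a minimal standalone parser written against it. It is one possible reading of the rules, not the ReaderTools source: the class EntrySketch and the methods parseEntry and parseValue are hypothetical names, returning null when no '=' is found is an assumption, and escape characters other than n, r, and t are taken to stand for themselves.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.AbstractMap;
import java.util.Map;

final class EntrySketch {

    /** Reads a value: a quoted string with escapes, or raw text up to the delimiter or EOF. */
    static String parseValue(Reader r, char delimiter) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c = r.read();
        if (c == '"') {
            // quoted_string = '"' [ char | escape ]* '"'
            while ((c = r.read()) != -1 && c != '"') {
                if (c == '\\') {
                    c = r.read();
                    if (c == -1) {
                        break; // dangling backslash at EOF
                    }
                    switch (c) {
                        case 'n': c = '\n'; break;
                        case 'r': c = '\r'; break;
                        case 't': c = '\t'; break;
                        default: break; // '"', '\' and literal control chars stand for themselves
                    }
                }
                sb.append((char) c);
            }
            while ((c = r.read()) != -1 && c != delimiter) {
                // skip anything between the closing quote and the delimiter
            }
            return sb.toString();
        }
        // string = plain characters up to the delimiter or EOF
        while (c != -1 && c != delimiter) {
            sb.append((char) c);
            c = r.read();
        }
        return sb.toString();
    }

    /** Reads "ws* key ws* '=' value"; returns null if no '=' is found (assumption). */
    static Map.Entry<String, String> parseEntry(Reader r, char delimiter) throws IOException {
        StringBuilder key = new StringBuilder();
        int c;
        while ((c = r.read()) != -1 && c != '=') {
            key.append((char) c);
        }
        if (c == -1) {
            return null;
        }
        // trim() drops the optional whitespace around the key
        return new AbstractMap.SimpleEntry<>(key.toString().trim(), parseValue(r, delimiter));
    }
}
```

In this sketch the delimiter is consumed together with the value, so a following call continues with the next entry.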
Parameters: reader, delimiter
Throws: IOException
public static Map<String,String> readMetaData(Reader reader) throws IOException
Try to detect meta data embedded in the input. Meta data lines start with '$$$' immediately at the beginning of a line and end at the line end. Lines are scanned until one without meta data is found. Meta data is encoded as entries (see readEntry).
The maximum length for a meta data line is 1024.
After execution, reader is positioned after the last meta tag. The reader instance must support the "mark/reset" sequence.
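The scanning loop described above can be sketched with mark/reset as follows. This is a simplified illustration rather than the actual implementation: the class MetaDataSketch and the method scan are hypothetical, each line is assumed to hold a single key=value entry, and the entry is split at the first '=' instead of applying the full entry grammar with quoted strings.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

final class MetaDataSketch {

    static final String PREFIX = "$$$";
    static final int MAX_LINE = 1024; // maximum meta data line length from the text above

    /** Collects entries from leading "$$$" lines and leaves the reader after the last one. */
    static Map<String, String> scan(Reader r) throws IOException {
        if (!r.markSupported()) {
            throw new IllegalArgumentException("reader must support mark/reset");
        }
        Map<String, String> meta = new LinkedHashMap<>();
        while (true) {
            r.mark(MAX_LINE); // remember the start of the line
            String line = readLine(r);
            if (line == null || !line.startsWith(PREFIX)) {
                r.reset(); // rewind so the first non-meta line is not consumed
                return meta;
            }
            String body = line.substring(PREFIX.length());
            int eq = body.indexOf('=');
            if (eq >= 0) {
                // simplified: no quoted strings or escapes in this sketch
                meta.put(body.substring(0, eq).trim(), body.substring(eq + 1).trim());
            }
        }
    }

    /** Reads at most MAX_LINE chars of one line, consuming '\n'. Returns null at EOF. */
    private static String readLine(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int count = 0;
        int c;
        while (count < MAX_LINE && (c = r.read()) != -1) {
            count++;
            if (c == '\n') {
                return sb.toString();
            }
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return count == 0 ? null : sb.toString();
    }
}
```

Because the mark is set before every line, the reset on the first non-meta line returns the reader to exactly the position after the last meta tag.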
Parameters: reader
Returns: Map
Throws: IOException
public static String readMetaEncoding(Reader reader) throws IOException
Try to detect encoding-specific meta data embedded in the input. After execution, reader is positioned either at the start or after the "encoding" meta tag. The reader instance must support the "mark/reset" sequence.
For more information on meta data see readMetaData.
Parameters: reader
Throws: IOException
public static String readToken(Reader reader, char delimiter) throws IOException
Read a string token from reader. The syntax for a token is
value [delimiter | EOF]
value = string | quoted_string
quoted_string = '"' [ char | escape ]* '"'
escape = '\' escape_char
escape_char = '"' | '\' | 'n' | 'r' | 't' | '\n' | '\r' | '\t'
Parameters: reader, delimiter
Throws: IOException
Copyright © 2013 intarsys consulting GmbH. All Rights Reserved.