public class MarcXmlWriter extends Object implements MarcWriter
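Writes MARC records in MARCXML format as a SAX event stream to the given
OutputStream or Result object. It can be used in a SAX
pipeline to post-process the result. By default this class uses a null
transform, which passes the SAX events to the result unchanged. It is strongly
recommended to use a dedicated XML serializer.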
This class requires a JAXP-compliant XML parser and XSLT processor. The underlying SAX2 parser should be namespace-aware.
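To illustrate what a null transform does with a namespace-qualified SAX event stream, here is a minimal sketch that uses only the JDK's JAXP classes and not MarcXmlWriter itself. The class name `NullTransformSketch`, the method `serialize` and the sample leader value are hypothetical, and the sketch makes no claim about how MarcXmlWriter is implemented internally. The namespace URI is the MARCXML namespace.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class; it does not use marc4j.
final class NullTransformSketch {

    static final String MARCXML_NS = "http://www.loc.gov/MARC21/slim";

    /** Sends a few SAX events through an identity (null) transform and returns the XML. */
    static String serialize(String leader) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();

        // No stylesheet argument: the handler copies events to the Result unchanged.
        TransformerHandler handler = factory.newTransformerHandler();

        // Configure output properties before attaching the Result.
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        AttributesImpl noAttributes = new AttributesImpl();
        handler.startDocument();
        // Declare the default namespace so every element is namespace-qualified.
        handler.startPrefixMapping("", MARCXML_NS);
        handler.startElement(MARCXML_NS, "collection", "collection", noAttributes);
        handler.startElement(MARCXML_NS, "record", "record", noAttributes);
        handler.startElement(MARCXML_NS, "leader", "leader", noAttributes);
        handler.characters(leader.toCharArray(), 0, leader.length());
        handler.endElement(MARCXML_NS, "leader", "leader");
        handler.endElement(MARCXML_NS, "record", "record");
        handler.endElement(MARCXML_NS, "collection", "collection");
        handler.endPrefixMapping("");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("00714cam a2200205 a 4500"));
    }
}
```

Because the handler is an ordinary SAX ContentHandler, the same events could be sent to any other handler in a pipeline, which is what makes post-processing possible.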
The following example reads a file with MARC records and writes MARCXML records in UTF-8 encoding to the console:
// Read binary MARC records from a file
InputStream input = new FileInputStream("input.mrc");
MarcReader reader = new MarcStreamReader(input);
// Write MARCXML to the console; true enables indentation
MarcWriter writer = new MarcXmlWriter(System.out, true);
while (reader.hasNext()) {
    Record record = reader.next();
    writer.write(record);
}
writer.close();
To perform a character conversion such as MARC-8 to UCS/Unicode, register a
CharConverter:
writer.setConverter(new AnselToUnicode());
In addition, you can perform Unicode normalization, which the MARC-8 to UCS/Unicode converter does not do. Unicode normalization transforms text into the canonical composed form. For example, "a´bc" (an "a" followed by a combining acute accent) is normalized to "ábc" (a single precomposed "á"). To perform normalization, set Unicode normalization to true:
writer.setUnicodeNormalization(true);
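As a rough illustration of what canonical composition (NFC) does to a string, the sketch below uses the JDK's `java.text.Normalizer`. It is independent of marc4j; the class name `NfcSketch` and the method `toNfc` are hypothetical, and whether MarcXmlWriter relies on this JDK class internally is not stated here.

```java
import java.text.Normalizer;

// Hypothetical demo class; it does not use marc4j.
final class NfcSketch {

    /** Returns the canonical composed form (NFC) of the given text. */
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // 'a' followed by U+0301 COMBINING ACUTE ACCENT: four chars in total.
        String decomposed = "a\u0301bc";
        // NFC merges the pair into U+00E1, leaving three chars.
        String composed = toNfc(decomposed);
        System.out.println(decomposed.length() + " -> " + composed.length()); // prints "4 -> 3"
    }
}
```

Both strings render identically, but only the composed form compares equal to text that was typed with the precomposed character.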
Please note that converting normalized Unicode back to MARC-8 encoding using
UnicodeToAnsel is not guaranteed to work.
This class provides very basic formatting options. For more advanced options
create an instance of this class with a
SAXResult containing a
ContentHandler derived from a dedicated XML
serializer.
The following example uses
org.apache.xml.serialize.XMLSerializer to write MARC records to
XML using MARC-8 to UCS/Unicode conversion and Unicode normalization:
InputStream input = new FileInputStream("input.mrc");
MarcReader reader = new MarcStreamReader(input);
// Configure the serializer: XML output, UTF-8, indentation enabled
OutputFormat format = new OutputFormat("xml", "UTF-8", true);
OutputStream out = new FileOutputStream("output.xml");
XMLSerializer serializer = new XMLSerializer(out, format);
Result result = new SAXResult(serializer.asContentHandler());
MarcXmlWriter writer = new MarcXmlWriter(result);
writer.setConverter(new AnselToUnicode());
writer.setUnicodeNormalization(true);
while (reader.hasNext()) {
    Record record = reader.next();
    writer.write(record);
}
writer.close();
You can post-process the result using a Source object pointing
to a stylesheet resource and a Result object to hold the
transformation result tree. The example below converts MARC to MARCXML and
transforms the result tree to MODS using the stylesheet provided by The
Library of Congress:
String stylesheetUrl = "http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3.xsl";
Source stylesheet = new StreamSource(stylesheetUrl);
Result result = new StreamResult(System.out);
InputStream input = new FileInputStream("input.mrc");
MarcReader reader = new MarcStreamReader(input);
// The MARCXML events are passed through the stylesheet before reaching the result
MarcXmlWriter writer = new MarcXmlWriter(result, stylesheet);
writer.setConverter(new AnselToUnicode());
while (reader.hasNext()) {
    Record record = (Record) reader.next();
    writer.write(record);
}
writer.close();
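The general mechanism of piping SAX events through a stylesheet into a Result can be sketched with the JDK alone. The example below is an illustration under stated assumptions, not the implementation of MarcXmlWriter: the class `StylesheetPipelineSketch`, its methods, the inline stylesheet and the sample leader are all hypothetical. The stylesheet simply prints the leader of each record as plain text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class; it does not use marc4j.
final class StylesheetPipelineSketch {

    static final String MARCXML_NS = "http://www.loc.gov/MARC21/slim";

    // Hypothetical stylesheet: outputs the leader of every record as text.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:marc='" + MARCXML_NS + "'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//marc:record'>"
        + "<xsl:value-of select='marc:leader'/>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Feeds one minimal record through the stylesheet and returns the output. */
    static String transform(String leader) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        Source stylesheet = new StreamSource(new StringReader(STYLESHEET));

        // A handler built from a stylesheet transforms the events it receives.
        TransformerHandler handler = factory.newTransformerHandler(stylesheet);
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        emitRecord(handler, leader);
        return out.toString();
    }

    /** Emits a collection with a single record that contains only a leader. */
    static void emitRecord(ContentHandler handler, String leader) throws SAXException {
        AttributesImpl noAttributes = new AttributesImpl();
        handler.startDocument();
        handler.startPrefixMapping("", MARCXML_NS);
        handler.startElement(MARCXML_NS, "collection", "collection", noAttributes);
        handler.startElement(MARCXML_NS, "record", "record", noAttributes);
        handler.startElement(MARCXML_NS, "leader", "leader", noAttributes);
        handler.characters(leader.toCharArray(), 0, leader.length());
        handler.endElement(MARCXML_NS, "leader", "leader");
        handler.endElement(MARCXML_NS, "record", "record");
        handler.endElement(MARCXML_NS, "collection", "collection");
        handler.endPrefixMapping("");
        handler.endDocument();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("00714cam a2200205 a 4500"));
    }
}
```

The stylesheet matches elements by namespace URI, which is why the event source has to report namespace-qualified element names.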
It is also possible to write the result into a DOM Node:
InputStream input = new FileInputStream("input.mrc");
MarcReader reader = new MarcStreamReader(input);
DOMResult result = new DOMResult();
MarcXmlWriter writer = new MarcXmlWriter(result);
writer.setConverter(new AnselToUnicode());
while (reader.hasNext()) {
    Record record = (Record) reader.next();
    writer.write(record);
}
writer.close();
// The DOM tree is complete once the writer is closed
Document doc = (Document) result.getNode();
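For readers unfamiliar with DOMResult, the following sketch shows how a SAX event stream ends up as a DOM tree, using only JDK classes. It is a simplified stand-in for the marc4j example above: the class `DomResultSketch`, the method `buildDocument` and the sample leader are hypothetical, and the events are emitted by hand rather than by MarcXmlWriter.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class; it does not use marc4j.
final class DomResultSketch {

    static final String MARCXML_NS = "http://www.loc.gov/MARC21/slim";

    /** Builds a DOM Document from hand-written SAX events via an identity transform. */
    static Document buildDocument(String leader) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();

        // An empty DOMResult lets the transformer create the Document itself.
        DOMResult result = new DOMResult();
        handler.setResult(result);

        AttributesImpl noAttributes = new AttributesImpl();
        handler.startDocument();
        handler.startPrefixMapping("", MARCXML_NS);
        handler.startElement(MARCXML_NS, "collection", "collection", noAttributes);
        handler.startElement(MARCXML_NS, "record", "record", noAttributes);
        handler.startElement(MARCXML_NS, "leader", "leader", noAttributes);
        handler.characters(leader.toCharArray(), 0, leader.length());
        handler.endElement(MARCXML_NS, "leader", "leader");
        handler.endElement(MARCXML_NS, "record", "record");
        handler.endElement(MARCXML_NS, "collection", "collection");
        handler.endPrefixMapping("");
        handler.endDocument();

        // The node is only available after the end of the document has been signalled.
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildDocument("00714cam a2200205 a 4500");
        Element root = doc.getDocumentElement();
        System.out.println(root.getNodeName() + ": " + root.getTextContent());
    }
}
```

Once the Document is available it can be queried or modified with the standard DOM API, for example before it is serialized or handed to another component.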
| Modifier and Type | Field and Description |
|---|---|
| protected static String | COLLECTION |
| protected static String | CONTROL_FIELD |
| protected static String | DATA_FIELD |
| protected static String | LEADER |
| protected static String | RECORD |
| protected static String | SUBFIELD |
| Constructor and Description |
|---|
| MarcXmlWriter(OutputStream out): Constructs an instance with the specified output stream. |
| MarcXmlWriter(OutputStream out, boolean indent): Constructs an instance with the specified output stream and indentation. |
| MarcXmlWriter(OutputStream out, String encoding): Constructs an instance with the specified output stream and character encoding. |
| MarcXmlWriter(OutputStream out, String encoding, boolean indent): Constructs an instance with the specified output stream, character encoding and indentation. |
| MarcXmlWriter(Result result): Constructs an instance with the specified result. |
| MarcXmlWriter(Result result, Source stylesheet): Constructs an instance with the specified stylesheet source and result. |
| MarcXmlWriter(Result result, String stylesheetUrl): Constructs an instance with the specified stylesheet location and result. |
| Modifier and Type | Method and Description |
|---|---|
| void | close(): Closes the writer. |
| CharConverter | getConverter(): Returns the character converter. |
| protected char[] | getDataElement(String data) |
| boolean | getUnicodeNormalization(): Returns true if this writer will perform Unicode normalization, false otherwise. |
| boolean | hasIndent(): Returns true if indentation is active, false otherwise. |
| void | setConverter(CharConverter converter): Sets the character converter. |
| protected void | setHandler(Result result, Source stylesheet) |
| void | setIndent(boolean indent): Activates or deactivates indentation. |
| void | setUnicodeNormalization(boolean normalize): If set to true this writer will perform Unicode normalization on data elements using normalization form C (NFC). |
| protected void | toXml(Record record) |
| void | write(Record record): Writes a Record object to the result. |
| protected void | writeEndDocument(): Writes the root end tag to the result. |
| protected void | writeStartDocument(): Writes the root start tag to the result. |
protected static final String CONTROL_FIELD
protected static final String DATA_FIELD
protected static final String SUBFIELD
protected static final String COLLECTION
protected static final String RECORD
protected static final String LEADER
public MarcXmlWriter(OutputStream out)
Throws: MarcException

public MarcXmlWriter(OutputStream out, boolean indent)
Throws: MarcException

public MarcXmlWriter(OutputStream out, String encoding)
Throws: MarcException

public MarcXmlWriter(OutputStream out, String encoding, boolean indent)
Throws: MarcException

public MarcXmlWriter(Result result)
Parameters: result
Throws: SAXException

public MarcXmlWriter(Result result, String stylesheetUrl)
Parameters: result
Throws: SAXException

public MarcXmlWriter(Result result, Source stylesheet)
Parameters: result
Throws: SAXException

public void close()
Specified by: close in interface MarcWriter

public CharConverter getConverter()
Specified by: getConverter in interface MarcWriter

public void setConverter(CharConverter converter)
Specified by: setConverter in interface MarcWriter
Parameters: converter - the character converter

public void setUnicodeNormalization(boolean normalize)
Parameters: normalize - true if this writer performs Unicode normalization, false otherwise

public boolean getUnicodeNormalization()

protected void setHandler(Result result, Source stylesheet) throws MarcException
Throws: MarcException

protected void writeStartDocument()
Throws: SAXException

protected void writeEndDocument()
Throws: SAXException

public void write(Record record)
Specified by: write in interface MarcWriter
Parameters: record - the Record object
Throws: SAXException

public boolean hasIndent()

public void setIndent(boolean indent)
Parameters: indent

protected void toXml(Record record) throws SAXException
Throws: SAXException

protected char[] getDataElement(String data)
Copyright © 2014 FreeLibrary. All Rights Reserved.