Package org.apache.tika.metadata
Interface TikaCoreProperties
public interface TikaCoreProperties
Contains a core set of basic Tika metadata properties, which all parsers
will attempt to supply (where the file format permits). These are all
defined in terms of other standard namespaces.
Users of Tika who want consistent metadata across file formats
can use these Properties, knowing that where present they carry the
same semantic meaning in every file format. For example, whether one
file format calls it Title, another Long-Title and another Long-Name,
if they all mean the same thing as defined by DublinCore.TITLE, they
will all be reported under that property.
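The mapping described above can be sketched without Tika on the classpath. The following is a minimal illustration only: the class `TitleNormalizer`, its alias table, and the method `normalize` are hypothetical names, and the canonical key `dc:title` is assumed to match the name behind DublinCore.TITLE. It does not show how Tika parsers are implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a parser-side mapping; not part of Tika's API.
final class TitleNormalizer {
    // Canonical key, assumed to match the name behind DublinCore.TITLE.
    static final String CANONICAL_TITLE = "dc:title";

    // Format-specific names from the text above, all meaning "title".
    private static final Map<String, String> ALIASES = Map.of(
            "Title", CANONICAL_TITLE,
            "Long-Title", CANONICAL_TITLE,
            "Long-Name", CANONICAL_TITLE);

    // Returns a copy of raw in which known aliases are stored under the canonical key.
    static Map<String, String> normalize(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            out.put(ALIASES.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return out;
    }
}
```

A caller can then read the canonical key without knowing which format the values came from; keys without an alias pass through unchanged.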
For now, most of these properties are composite ones that include the deprecated
non-prefixed String properties from the Metadata class. In Tika 2.0, most
of these will revert to simple assignments.
Since:
Apache Tika 1.2
Nested Class Summary
Nested Classes
Modifier and Type: static enum
Description: A file might contain different types of embedded documents.
Field Summary
Fields (Modifier and Type, Field: Description)
static final Property ALTITUDE
static final Property COMMENTS
static final Property CONTENT_TYPE_HINT: This is currently used to identify Content-Type that may be included within a document, such as in html documents.
static final Property CONTENT_TYPE_OVERRIDE
static final Property CONTRIBUTOR
static final Property COVERAGE
static final Property CREATED
static final Property CREATOR
static final Property CREATOR_TOOL
static final Property DESCRIPTION
static final Property EMBEDDED_RESOURCE_TYPE: Embedded resource type property
static final String EMBEDDED_RESOURCE_TYPE_KEY
static final Property FORMAT
static final Property HAS_SIGNATURE
static final Property IDENTIFIER
static final Property KEYWORDS: DublinCore.SUBJECT; should include both subject and keywords if a document format has both.
static final Property LANGUAGE
static final Property LATITUDE
static final Property LONGITUDE
static final Property METADATA_DATE
static final Property MODIFIED
static final Property MODIFIER
static final Property ORIGINAL_RESOURCE_NAME: Some file formats can store information about their original file name/location or about their attachment's original file name/location.
static final Property PRINT_DATE
static final Property PUBLISHER
static final Property RATING
static final Property RELATION
static final Property RIGHTS
static final Property SOURCE
static final Property TIKA_META_EXCEPTION_EMBEDDED_STREAM: Use this to store exceptions caught while trying to read the stream of an embedded resource.
static final String TIKA_META_EXCEPTION_PREFIX: Use this to store parse exception information in the Metadata object.
static final Property TIKA_META_EXCEPTION_WARNING: Use this to store exceptions caught during a parse that are non-fatal.
static final String TIKA_META_PREFIX: Use this to prefix metadata properties that store information about the parsing process.
static final Property TITLE
static final Property TRANSITION_KEYWORDS_TO_DC_SUBJECT: Deprecated. Use TikaCoreProperties#KEYWORDS
static final Property TRANSITION_SUBJECT_TO_DC_DESCRIPTION: Deprecated. Use TikaCoreProperties#DESCRIPTION
static final Property TRANSITION_SUBJECT_TO_DC_TITLE: Deprecated. Use TikaCoreProperties#TITLE
static final Property TRANSITION_SUBJECT_TO_OO_SUBJECT: Deprecated. Use OfficeOpenXMLCore#SUBJECT
static final Property TYPE
Field Details
TIKA_META_PREFIX
Use this to prefix metadata properties that store information about the parsing process. Users should be able to distinguish between metadata that was contained within the document and metadata about the parsing process. In Tika 2.0 (or earlier?), let's change X-ParsedBy to X-TIKA-Parsed-By.
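As a rough illustration of why a dedicated prefix helps, the sketch below splits a flat key/value map into parse-process entries and document entries. It is an assumption-laden example: `ParseInfoSplitter` is a hypothetical helper, the prefix value `X-TIKA:` is assumed rather than read from TIKA_META_PREFIX, and the sample keys are made up.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; the prefix value is an assumption, not read from Tika.
final class ParseInfoSplitter {
    static final String ASSUMED_PREFIX = "X-TIKA:";

    // Entries that describe the parsing process (key starts with the prefix).
    static Map<String, String> parseProcessEntries(Map<String, String> all) {
        return select(all, true);
    }

    // Entries that came from the document itself (key lacks the prefix).
    static Map<String, String> documentEntries(Map<String, String> all) {
        return select(all, false);
    }

    private static Map<String, String> select(Map<String, String> all, boolean prefixed) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : all.entrySet()) {
            if (e.getKey().startsWith(ASSUMED_PREFIX) == prefixed) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

With real Tika code, the constant TIKA_META_PREFIX would presumably take the place of `ASSUMED_PREFIX`.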
TIKA_META_EXCEPTION_PREFIX
Use this to store parse exception information in the Metadata object.
TIKA_META_EXCEPTION_WARNING
Use this to store exceptions caught during a parse that are non-fatal, e.g. if a parser is in lenient mode and more content can be extracted if we ignore an exception thrown by a dependency.
TIKA_META_EXCEPTION_EMBEDDED_STREAM
Use this to store exceptions caught while trying to read the stream of an embedded resource. Do not use this if there is a parse exception on the embedded resource.
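The distinction between the warning property and the embedded-stream property can be sketched as two separate multi-valued slots. This is only an illustration under stated assumptions: `ExceptionRecorder` and both key strings are hypothetical placeholders, not the values behind TIKA_META_EXCEPTION_WARNING or TIKA_META_EXCEPTION_EMBEDDED_STREAM, and the map stands in for a Metadata object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical recorder; the keys below are placeholders, not Tika's values.
final class ExceptionRecorder {
    static final String WARNING_KEY = "example:exception-warning";
    static final String EMBEDDED_STREAM_KEY = "example:embedded-stream-exception";

    // Multi-valued store: one parse can produce several entries per key.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Non-fatal problem during the parse; parsing continues afterwards.
    void recordWarning(Exception e) {
        add(WARNING_KEY, e);
    }

    // Failure while reading an embedded resource's stream (not a parse failure).
    void recordEmbeddedStreamFailure(Exception e) {
        add(EMBEDDED_STREAM_KEY, e);
    }

    List<String> get(String key) {
        return values.getOrDefault(key, List.of());
    }

    private void add(String key, Exception e) {
        values.computeIfAbsent(key, k -> new ArrayList<>())
              .add(e.getClass().getName() + ": " + e.getMessage());
    }
}
```

Keeping the two slots apart lets a consumer tell "the parser skipped something and carried on" from "an attachment's bytes could not be read at all".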
EMBEDDED_RESOURCE_TYPE_KEY
ORIGINAL_RESOURCE_NAME
Some file formats can store information about their original file name/location or about their attachment's original file name/location.
CONTENT_TYPE_HINT
This is currently used to identify a Content-Type that may be included within a document, such as in HTML documents, or the value might come from outside the document. This information may be faulty and should be treated only as a hint.
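One way to treat such a value "only as a hint" is to let it decide the type only when nothing better is available. The sketch below shows that policy; it is an assumption for illustration, not a description of Tika's detection logic, and `ContentTypeChooser`, `choose`, and the fallback type are hypothetical.

```java
// Hypothetical policy: a detected type always wins over a declared hint.
final class ContentTypeChooser {
    static final String FALLBACK = "application/octet-stream";

    // detected: type derived from the content itself, or null if unknown.
    // hint: type declared inside or alongside the document, or null.
    static String choose(String detected, String hint) {
        if (detected != null && !detected.trim().isEmpty()) {
            return detected.trim();
        }
        if (hint != null && !hint.trim().isEmpty()) {
            return hint.trim();
        }
        return FALLBACK;
    }
}
```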
CONTENT_TYPE_OVERRIDE
FORMAT
IDENTIFIER
CONTRIBUTOR
COVERAGE
CREATOR
MODIFIER
CREATOR_TOOL
LANGUAGE
PUBLISHER
RELATION
RIGHTS
SOURCE
TYPE
TITLE
DESCRIPTION
KEYWORDS
DublinCore.SUBJECT; should include both subject and keywords if a document format has both. See also Office.KEYWORDS and OfficeOpenXMLCore.SUBJECT.
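For a format that carries both a subject list and a keyword list, the combined value could be built along these lines. This is a minimal sketch: `KeywordMerger` and `merge` are hypothetical names, and the trimming and de-duplication rules are assumptions rather than documented Tika behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper that builds one multi-valued keywords list.
final class KeywordMerger {
    // Subjects first, then keywords; blanks dropped, duplicates kept once.
    static List<String> merge(List<String> subjects, List<String> keywords) {
        Set<String> merged = new LinkedHashSet<>();
        addAllTrimmed(merged, subjects);
        addAllTrimmed(merged, keywords);
        return new ArrayList<>(merged);
    }

    private static void addAllTrimmed(Set<String> target, List<String> values) {
        for (String v : values) {
            String t = v.trim();
            if (!t.isEmpty()) {
                target.add(t);
            }
        }
    }
}
```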
CREATED
MODIFIED
PRINT_DATE
METADATA_DATE
LATITUDE
LONGITUDE
ALTITUDE
RATING
COMMENTS
TRANSITION_KEYWORDS_TO_DC_SUBJECT
Deprecated. Use TikaCoreProperties#KEYWORDS
TRANSITION_SUBJECT_TO_DC_DESCRIPTION
Deprecated. Use TikaCoreProperties#DESCRIPTION
TRANSITION_SUBJECT_TO_DC_TITLE
Deprecated. Use TikaCoreProperties#TITLE
TRANSITION_SUBJECT_TO_OO_SUBJECT
Deprecated. Use OfficeOpenXMLCore#SUBJECT
EMBEDDED_RESOURCE_TYPE
Embedded resource type property.
HAS_SIGNATURE