Jericho HTML Parser
Release Notes

1.5   (2004-10-??)
       - All programs written for previous versions of the library will have
         to be recompiled with the new version, regardless of whether any
         changes are required.  This is because several methods, including the
         Source constructor, now expect a CharSequence as an argument instead
         of a String.
       - Changes that could require modifications to existing programs:
         - The toString() method of Segment and all subclasses now returns the
           source text of the segment instead of a string useful for debugging
           purposes.  This change was necessary because Segment now
           implements CharSequence.
         - For consistency, the toString() methods of all IOutputSegment
           implementations now return the output string instead of a string
           useful for debugging purposes.
         - The return type of the OutputDocument.getSourceText() method is now
           CharSequence instead of String.
         - Character references in Attribute.getValue() are now decoded
         - Element.getContent() now returns a zero-length segment instead of
           null for an empty element.
         - FormField.getPredefinedValues() now returns an empty collection
           instead of null if the form field has no predefined values.
         - Attributes segment now ends immediately after the last attribute
           instead of immediately before the end-of-tag delimiter.
         - Modified Segment.isWhiteSpace(char) to match the HTML specification
       - Fixed the following bugs:
         - [1065861] Named StartTag search did not find a tag immediately
           following a comment
         - Unnamed StartTag search did not find a comment if the search starts
           at the first character of the comment
         - Character references in FormField.getPredefinedValues() items were
           not decoded
         - FormControlType.SELECT_SINGLE.allowsMultipleValues() returned false
           instead of the correct value of true, resulting in the same
           incorrect value from FormField.allowMultipleValues() when multiple
           SELECT_SINGLE controls with the same name were present in the form
       - Removed public fields in Attribute class that were deprecated in 1.2
       - Removed Source.getSourceTextLowerCase() method deprecated in 1.3
       - Removed Source.findEnd(int pos, SpecialTag) method which was
         accidentally added as a public method in 1.4
       - Deprecated the following methods:
         - Attributes.getList()
         - Segment.findWords()
         - Segment.getSourceText()
         - Segment.getSourceTextNoWhitespace()
         - StartTag.getFormControlType()
         - StartTag.getFollowingTextSegment()
         - FormControlType.getAdditionalSubmitNames(String name)
         - FormControlType.isPotentialControl(String tagName)
         - FormControlType.allowsMultipleValues()
       - Segment class now implements CharSequence and Comparable
       - Added getDebugInfo() to Segment and all subclasses to replace the
         previous functionality of the toString() method
       - IOutputSegment interface now extends CharSequence
       - Added getDebugInfo() to the IOutputSegment interface to replace the
         previous functionality of the toString() method
       - Added FormControl class
       - FormFields class now implements Collection
       - Added FormFields.FieldNameCaseSensitive static property
       - Added FormFields(Collection formControls) constructor
       - Added FormFields.clearValues()
       - Added FormFields.getValuesMap()
       - Added FormFields.setValuesMap(Map)
       - Added FormFields.setValue(String name, CharSequence value)
       - Added FormFields.addValue(String name, CharSequence value)
       - Added FormFields.getFormControls()
       - Added FormField.getFormControls()
       - Added FormField.clearValues()
       - Added FormField.getValues()
       - Added FormField.setValues(Collection)
       - Added FormField.setValue(CharSequence value)
       - Added FormField.addValue(CharSequence value)
       - Added FormControlType.isSelect()
       - Added Source.getText()
       - Added Source.getElementById(String id)
       - Added Source.findNextStartTag(int pos, String attributeName,
                                      String value, boolean valueCaseSensitive)
       - Added Segment.findAllStartTags(String attributeName, String value,
                                      boolean valueCaseSensitive)
       - Added StartTag.regenerateHTML()
       - Added StartTag.generateHTML(String tagName, Map, boolean)
       - Added EndTag.generateHTML(String tagName)
       - Attributes class now implements List
       - Added Attributes.getValue(String name) convenience method
       - Added Attributes.populateMap(Map)
       - Added Attributes.generateHTML(Map)
       - Added CharacterReference.ApostropheEncoded static property
       - Added CharacterReference.encodeWithWhiteSpaceFormatting(CharSequence)
       - Added CharacterReference.reencode(CharSequence)
       - Added CharacterReference.decodeCollapseWhiteSpace(CharSequence)
       - Added OutputDocument.add(FormControl)
       - Added OutputDocument.add(FormFields)
       - Added AttributesOutputSegment class
       - Added Util class
       - Added OverlappingOutputSegmentsException class
       - Documentation improvements
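
The toString() change in 1.5 follows from the CharSequence contract, which
requires toString() to return exactly the characters of the sequence.  The
sketch below illustrates that contract with a hypothetical TextSegment class;
it is not the library's Segment class, the format of its getDebugInfo() output
is an assumption, and it uses current Java syntax rather than the Java version
the library targeted at the time.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a parser segment: a view over part of a source text.
final class TextSegment implements CharSequence {
    private final CharSequence source;
    private final int begin;
    private final int end;

    TextSegment(CharSequence source, int begin, int end) {
        if (begin < 0 || end < begin || end > source.length()) {
            throw new IndexOutOfBoundsException("begin=" + begin + ", end=" + end);
        }
        this.source = source;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public int length() {
        return end - begin;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return source.charAt(begin + index);
    }

    @Override
    public CharSequence subSequence(int start, int finish) {
        if (start < 0 || finish < start || finish > length()) {
            throw new IndexOutOfBoundsException("start=" + start + ", finish=" + finish);
        }
        return new TextSegment(source, begin + start, begin + finish);
    }

    // Required by the CharSequence contract: the characters of the sequence.
    @Override
    public String toString() {
        return source.subSequence(begin, end).toString();
    }

    // Debugging output lives in a separate method (format is an assumption).
    String getDebugInfo() {
        return "(" + begin + "," + end + ")";
    }
}

class TextSegmentDemo {
    public static void main(String[] args) {
        TextSegment content = new TextSegment("<p>Hello</p>", 3, 8);
        System.out.println(content);                // the segment's source text
        System.out.println(content.getDebugInfo()); // position information
        // Any API that accepts a CharSequence can take the segment directly.
        System.out.println(Pattern.matches("[A-Z][a-z]+", content));
    }
}
```

Code that previously logged a segment through string concatenation would, under
this contract, print the source text; debugging output has to be requested
explicitly through the separate method.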

1.4   (2004-09-02)
       - Added CharacterEntityReference and NumericCharacterReference classes
       - Added CharOutputSegment class
       - Attributes allow whitespace around '=' sign
       - Added convenience method Element.getAttributes()
       - Some documentation improvements

1.3   (2004-07-25)
       - Deprecated Source.getSourceTextLowerCase()
       - Added ignoreWhenParsing methods to Source and Segment classes
         (See sample called JSPTest)
       - Added parseAttributes methods to Source, Segment and StartTag classes
       - Added ability to search for tags in a specified namespace
       - Added BlankOutputSegment class
       - Fixed bug relating to HTML comments with alphabetic characters
         immediately following the opening <!-- characters

1.2   (2004-06-16)
       - Deprecated public fields in Attribute class in favour of accessor
         methods
       - The following methods now return an empty list instead of null if
         there are no results:
         (WARNING - This could possibly break existing programs)
          Segment.findAllStartTags(String name)
          Segment.findAllComments()
          Segment.findAllElements(String name)
          Segment.findAllElements()
       - Added hashCode() method to Segment class
       - Server tags such as ASP, JSP, PSP, PHP and Mason are now recognised
       - Basic parser logging introduced (see Source.setLogWriter() method)
       - Start tags with too many badly formed attributes rejected
         (reduces number of false positives when searching for start tags)
       - Added public IOutputSegment.COMPARATOR field
       - Improved caching
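
The warning in 1.2 about empty lists concerns callers that used a null check
to detect "no result": such a check no longer fires once an empty list is
returned.  A minimal sketch of the pattern, using a hypothetical
findAllByName helper over plain strings rather than any method of the library:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EmptyListSketch {
    // Hypothetical finder: returns every matching name, or an empty list.
    // It never returns null, mirroring the behaviour described for 1.2.
    static List<String> findAllByName(List<String> tagNames, String name) {
        List<String> matches = new ArrayList<>();
        for (String tagName : tagNames) {
            if (tagName.equalsIgnoreCase(name)) {
                matches.add(tagName);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> tagNames = Arrays.asList("p", "div", "P");
        List<String> spans = findAllByName(tagNames, "span");

        // Old-style check: never true now, so this branch is dead code.
        if (spans == null) {
            System.out.println("no span tags (null check)");
        }

        // Check that works with the empty-list convention.
        if (spans.isEmpty()) {
            System.out.println("no span tags (isEmpty check)");
        }

        // Iteration needs no guard at all.
        for (String match : findAllByName(tagNames, "p")) {
            System.out.println("found " + match);
        }
    }
}
```

Programs written against 1.0 or 1.1 that relied on the null check would need
it replaced with an isEmpty() test, or removed where the result is only
iterated.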

1.1   (2004-03-07)
       - All elements defined in HTML 4.01 are recognised and their properties
         used to aid analysis
       - StartTag.getElement() method enhanced to return the correct span of
         elements which have a missing optional end tag
       - StartTag.isEndTagForbidden() method enhanced to also check the name of
         the tag against the list of elements in the HTML spec whose end tags
         are forbidden
       - Numerous new methods
       - Huge performance enhancement from the use of internal caching
       - Bug Fixes:
         [909944] Parser does not work with unclosed comments.

1.0   (2004-02-07) Initial Release

