Class RFC3986

java.lang.Object
org.apache.jena.rfc3986.RFC3986

public class RFC3986 extends Object
Implementation of RFC 3986 (URI), RFC 3987 (IRI).

See the Package Overview.

As is common, both RFCs are referred to here as "3986", just as java.net.URI also covers IRIs. java.net.URI parses and allocates, and follows RFC 2396 with modifications (several of which are in RFC 3986).

This provides a fast checking operation which does not copy the various parts of the IRI and which creates a single object. The cost of extracting and allocating strings is incurred only when the getter for a component is called.

Implements the algorithms specified in RFC 3986 for:

  • Checking that a string matches the IRI grammar.
  • Extracting the components of an IRI.
  • Resolving an IRI against a base IRI.
  • Normalizing an IRI.
  • Relativizing an IRI for a given base IRI.
  • Building an IRI from components.

Usage

Check

Check conformance with the RFC 3986 grammar:
     RFC3986.checkSyntax(string);
 
Check conformance with the RFC 3986 grammar and any applicable scheme specific rules:
     IRI3986 iri = RFC3986.create(string);
     iri.hasViolations();
 

Extract the components of an IRI

     IRI3986 iri = RFC3986.create(string);
     iri.path();
     ...
 

Resolve

     IRI3986 base = RFC3986.create(baseIRIString);
     IRI3986 iri = RFC3986.create(string);
     IRI3986 iri2 = RFC3986.resolve(base, iri);
 

Normalize

     IRI3986 iri  = RFC3986.create(string);
     IRI3986 iri2 = RFC3986.normalize(iri);
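
One part of RFC 3986 syntax-based normalization is the removal of "." and ".." path segments (RFC 3986, section 5.2.4). The following is a minimal, JDK-only sketch of that single step, included to show what the normalized path looks like. It is not the Jena implementation; the class name DotSegments and its methods are hypothetical, and case and percent-encoding normalization are not covered.

```java
public class DotSegments {

    /** Remove "." and ".." segments from a path, following the steps of RFC 3986 section 5.2.4. */
    public static String removeDotSegments(String path) {
        String in = path;                        // the input buffer
        StringBuilder out = new StringBuilder(); // the output buffer
        while (!in.isEmpty()) {
            if (in.startsWith("../")) {
                in = in.substring(3);            // step A: drop a leading "../"
            } else if (in.startsWith("./")) {
                in = in.substring(2);            // step A: drop a leading "./"
            } else if (in.startsWith("/./")) {
                in = in.substring(2);            // step B: "/./" becomes "/"
            } else if (in.equals("/.")) {
                in = "/";                        // step B: a final "/." becomes "/"
            } else if (in.startsWith("/../")) {
                in = in.substring(3);            // step C: "/../" becomes "/" ...
                dropLastSegment(out);            // ... and the previous output segment goes
            } else if (in.equals("/..")) {
                in = "/";                        // step C: a final "/.." becomes "/"
                dropLastSegment(out);
            } else if (in.equals(".") || in.equals("..")) {
                in = "";                         // step D: a lone "." or ".." is removed
            } else {
                // step E: move the first segment (with its leading "/", if any) to the output
                int next = in.indexOf('/', 1);
                if (next < 0) {
                    next = in.length();
                }
                out.append(in, 0, next);
                in = in.substring(next);
            }
        }
        return out.toString();
    }

    /** Remove the last segment of the output buffer, together with its preceding "/". */
    private static void dropLastSegment(StringBuilder out) {
        int i = out.lastIndexOf("/");
        out.setLength(i < 0 ? 0 : i);
    }

    public static void main(String[] args) {
        // Both inputs are the worked examples given in RFC 3986 section 5.2.4.
        System.out.println(removeDotSegments("/a/b/c/./../../g"));   // prints /a/g
        System.out.println(removeDotSegments("mid/content=5/../6")); // prints mid/6
    }
}
```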
 

Relative IRI

     IRI3986 base = RFC3986.create(baseIRIString);
     IRI3986 target = RFC3986.create(string);
     IRI3986 relative = RFC3986.relativize(base, target);
     // then base.resolve(relative) equals target
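
The round-trip property in the comment above (resolving the relative form against the base gives back the target) can be illustrated with the JDK alone. The sketch below uses java.net.URI as a stand-in, not the Jena classes, so it only demonstrates the property itself; java.net.URI follows RFC 2396 and may return different relative forms than RFC3986.relativize for the same inputs. The class name RelativizeRoundTrip and the example IRIs are assumptions.

```java
import java.net.URI;

public class RelativizeRoundTrip {

    /** Relative form of target with respect to base, as computed by java.net.URI. */
    public static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    /** Resolve a (possibly relative) reference against base, as computed by java.net.URI. */
    public static String resolve(String base, String reference) {
        return URI.create(base).resolve(URI.create(reference)).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/dir/";
        String target = "http://example.org/dir/sub/page.html";

        String relative = relativize(base, target);
        System.out.println(relative);                                // prints sub/page.html
        System.out.println(resolve(base, relative).equals(target)); // prints true
    }
}
```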
 

Build an IRI3986 from components

     IRI3986 iri = RFC3986.newBuilder()
                       .scheme("http")
                       .host("example.org")
                       .path("/dir/page.html")
                       .build();
     System.out.println(iri.str());
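
For reference, RFC 3986 section 5.3 describes how components are recomposed into a string. The following JDK-only sketch follows that recomposition rule to show what the builder example above is expected to produce as text. It is an illustration under that assumption rather than the Jena Builder implementation; the class name Recompose is hypothetical, and a null argument stands for an undefined component.

```java
public class Recompose {

    /**
     * Recompose components into a URI reference string, following RFC 3986 section 5.3.
     * A null scheme, authority, query or fragment means "undefined"; a null path is
     * treated as the empty path.
     */
    public static String recompose(String scheme, String authority, String path,
                                   String query, String fragment) {
        StringBuilder sb = new StringBuilder();
        if (scheme != null) {
            sb.append(scheme).append(':');
        }
        if (authority != null) {
            sb.append("//").append(authority);
        }
        sb.append(path == null ? "" : path);
        if (query != null) {
            sb.append('?').append(query);
        }
        if (fragment != null) {
            sb.append('#').append(fragment);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same components as the builder example: scheme, host (as authority) and path.
        System.out.println(recompose("http", "example.org", "/dir/page.html", null, null));
        // prints http://example.org/dir/page.html
    }
}
```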
 

RFC Regular Expression

An IRI can be created using the regular expression of RFC 3986. This regular expression identifies the components without checking for correct use of characters within components. It may be useful when an IRI does not conform to the details of the RFC 3986 syntax, for example when it has spaces in the path component.
  • Field Details

    • rfc3986regex

      public static final Pattern rfc3986regex
      RFC 3986 regular expression. It assumes a well-formed URI reference, so it will also accept malformed strings.
      • Group 2 : scheme, without ':' (group 1 is the scheme with ':')
      • Group 4 : authority, without '//' (group 3 is the authority with '//')
      • Group 5 : path, with leading '/' if any
      • Group 7 : query, without '?' (group 6 is the query with '?')
      • Group 9 : fragment, without '#' (group 8 is the fragment with '#')
       "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?"
         12            3  4          5       6   7        8 9
       
      • scheme = $2
      • authority = $4
      • path = $5
      • query = $7
      • fragment = $9
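
The group numbering above can be tried out with java.util.regex alone. The sketch below compiles the same expression and reads groups 2, 4, 5, 7 and 9; it is a JDK-only illustration, not the Jena createByRegex implementation, and the class name Rfc3986RegexDemo, the split method and the example strings are assumptions. A component that is absent from the input comes back as null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Rfc3986RegexDemo {

    // The regular expression from RFC 3986, written as a Java string literal.
    static final Pattern RFC3986_REGEX =
        Pattern.compile("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Split a string into { scheme, authority, path, query, fragment }; absent parts are null. */
    public static String[] split(String str) {
        Matcher m = RFC3986_REGEX.matcher(str);
        if (!m.matches()) {
            throw new IllegalArgumentException("No match: " + str);
        }
        return new String[] { m.group(2), m.group(4), m.group(5), m.group(7), m.group(9) };
    }

    public static void main(String[] args) {
        String[] parts = split("http://example.org/dir/page.html?k=v#frag");
        System.out.println(String.join(" | ", parts));
        // prints http | example.org | /dir/page.html | k=v | frag

        // The expression does not check characters: a space in the path is accepted.
        System.out.println(split("http://example.org/my file.txt")[2]);
        // prints /my file.txt
    }
}
```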
  • Constructor Details

    • RFC3986

      public RFC3986()
  • Method Details

    • checkSyntax

      public static void checkSyntax(String iristr)
      Check that the string conforms to the IRI syntax, throwing an exception if it does not. This operation checks the string against the RFC 3986/3987 grammar; it does not apply scheme-specific rules.
    • create

      public static IRI3986 create(String iristr)
      Parse the string in accordance with the general IRI grammar. If the string does not conform to the grammar, an exception is thrown.

      This reports scheme-specific violations: see IRI3986.hasViolations() and IRI3986.forEachViolation(java.util.function.Consumer<org.apache.jena.rfc3986.Violation>).

    • createAny

      public static IRI3986 createAny(String iristr)
      Create an IRI3986 object; report errors and warnings. This operation always returns an object; it does not throw an exception, nor return null. The object may not be a valid IRI.

      Errors and warnings may be accessed with IRI3986.hasViolations() and IRI3986.forEachViolation(java.util.function.Consumer<org.apache.jena.rfc3986.Violation>).

    • newBuilder

      public static Builder newBuilder()
      Create an IRI builder
    • create

      public static IRI3986 create(IRI iriOther)
      Ensure an IRI is an IRI3986.
    • normalize

      public static IRI3986 normalize(IRI3986 iri)
      Normalize an IRI (RFC 3986 - Syntax-Based Normalization)
    • resolve

      public static IRI3986 resolve(IRI3986 base, IRI3986 iri)
      Resolve an IRI against a base.
    • relativize

      public static IRI3986 relativize(IRI3986 base, IRI3986 iri)
      For a given base, return (if possible) an IRI that is relative to base. If the input iri is already relative, it is returned unchanged.
    • createByRegex

      public static IRI3986 createByRegex(String iriStr)
      Create an IRI using the regular expression of RFC 3986. Throws an exception if the regular expression does not match. The regular expression assumes a valid RFC 3986 IRI and splits out the components. This may be useful for extracting the components of an IRI with bad syntax. It does not check the character rules of the syntax, nor does it check scheme-specific rules. Use the resulting IRI3986 with care.