Object UrlUtils

    public class UrlUtils
    
                        
    • Method Detail

      • isInternal

         final static Boolean isInternal(String url)

        Test if the url is an internal URL. Internal URLs are URLs that are used to identify internal resources and will never be fetched from the internet.

        Parameters:
        url - The url to test
        Returns:

        true if the given url is an internal URL, false otherwise

      • isNotInternal

         final static Boolean isNotInternal(String url)

        Test if the url is not an internal URL. Internal URLs are URLs that are used to identify internal resources and will never be fetched from the internet.

        Parameters:
        url - The url to test
        Returns:

        true if the given url is not an internal URL, false otherwise

      • isLocalFile

         final static Boolean isLocalFile(String url)

        Check if the given url is a local file url, that is, a url that starts with AppConstants#LOCAL_FILE_SERVE_PREFIX.

      • pathToLocalURL

         final static String pathToLocalURL(Path path)

        Convert a path to a URL: the path is encoded to Base64 and appended to AppConstants#LOCAL_FILE_FAKE_SERVER_HOME.

        For example:

        C:\Users\pereg\AppData\Local\Temp\pulsar\test.txt will be converted to: http://localfile.org?path=QzpcVXNlcnNccGVyZWdcQXBwRGF0YVxMb2NhbFxUZW1wXHB1bHNhclx0ZXN0LnR4dA==

        Parameters:
        path - The path to convert
        Returns:

        The local file URL that represents the path

        TODO: consider just using path.

      • localURLToPath

         final static Path localURLToPath(String url)

        Convert a local file URL back to a path: the prefix AppConstants#LOCAL_FILE_SERVE_PREFIX is removed and the remainder is decoded from Base64.
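
        The Base64 round trip described for isLocalFile, pathToLocalURL and localURLToPath can be sketched as follows. This is a minimal sketch, not the project's implementation: the single constant LOCAL_FILE_PREFIX is a hypothetical stand-in for both AppConstants#LOCAL_FILE_SERVE_PREFIX and AppConstants#LOCAL_FILE_FAKE_SERVER_HOME, its value is copied from the example above, and the standard Base64 alphabet is assumed because it reproduces that example exactly.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Base64;

final class LocalFileUrls {
    // Hypothetical prefix, copied from the documented example; the real value
    // comes from AppConstants and may differ.
    static final String LOCAL_FILE_PREFIX = "http://localfile.org?path=";

    static boolean isLocalFile(String url) {
        return url.startsWith(LOCAL_FILE_PREFIX);
    }

    // Encode the path string as Base64 (standard alphabet) and append it to the prefix.
    static String pathToLocalURL(Path path) {
        byte[] bytes = path.toString().getBytes(StandardCharsets.UTF_8);
        return LOCAL_FILE_PREFIX + Base64.getEncoder().encodeToString(bytes);
    }

    // Remove the prefix, then decode the remaining Base64 text back into a path.
    static Path localURLToPath(String url) {
        if (!isLocalFile(url)) {
            throw new IllegalArgumentException("Not a local file URL: " + url);
        }
        String encoded = url.substring(LOCAL_FILE_PREFIX.length());
        byte[] bytes = Base64.getDecoder().decode(encoded);
        return Path.of(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

        Standard Base64 output can contain + and /, which need escaping inside a query string; the example above happens to contain neither, so it does not reveal whether the real implementation uses the URL-safe alphabet instead.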

      • isBrowserURL

         final static Boolean isBrowserURL(String str)

        Checks if the given string is a browser-specific url.

        This function determines whether the string is a browser-specific url by checking if it exists in the internal URL list (INTERNAL_URLS), or if it starts with any of the internal URL prefixes (INTERNAL_URL_PREFIXES).

        Parameters:
        str - The string to be checked.
        Returns:

        Returns true if the string is a browser-specific url; otherwise, returns false.
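
        The two-step check performed by isBrowserURL (exact match, then prefix match) can be sketched as below. The contents of INTERNAL_URLS and INTERNAL_URL_PREFIXES here are hypothetical placeholders; the real lists are not shown in this document.

```java
import java.util.List;
import java.util.Set;

final class BrowserUrlCheck {
    // Hypothetical contents; the real INTERNAL_URLS and INTERNAL_URL_PREFIXES
    // are defined elsewhere.
    static final Set<String> INTERNAL_URLS = Set.of("about:blank", "about:srcdoc");
    static final List<String> INTERNAL_URL_PREFIXES = List.of("chrome://", "edge://");

    static boolean isBrowserURL(String str) {
        // First an exact lookup in the list, then a match against any known prefix.
        return INTERNAL_URLS.contains(str)
                || INTERNAL_URL_PREFIXES.stream().anyMatch(str::startsWith);
    }
}
```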

      • isMappedBrowserURL

         final static Boolean isMappedBrowserURL(String url)

        Checks if the given URL is a browser-specific URL by verifying if it starts with a predefined prefix.

        Parameters:
        url - The URL to check.
        Returns:

        Returns true if the URL starts with the browser-specific url prefix, otherwise false.

      • browserURLToStandardURL

         final static String browserURLToStandardURL(String url)

        Converts a browser url string into a complete URL. The function URL-encodes the url string and appends it to a predefined prefix to form the final URL.

        Parameters:
        url - The browser url string to be converted.
        Returns:

        Returns the complete URL string containing the prefix and the encoded url parameter.
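
        A minimal sketch of browserURLToStandardURL and isMappedBrowserURL, assuming UTF-8 form encoding via java.net.URLEncoder. BROWSER_URL_PREFIX and its value are hypothetical; the actual predefined prefix is not documented here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class BrowserUrlMapping {
    // Hypothetical prefix; the real predefined prefix is not given in this document.
    static final String BROWSER_URL_PREFIX = "http://browser.example/?url=";

    // Wrap a browser url such as chrome://settings/ in a standard http URL.
    static String browserURLToStandardURL(String url) {
        return BROWSER_URL_PREFIX + URLEncoder.encode(url, StandardCharsets.UTF_8);
    }

    // A mapped URL is recognised purely by its prefix.
    static boolean isMappedBrowserURL(String url) {
        return url.startsWith(BROWSER_URL_PREFIX);
    }
}
```

        With this sketch, chrome://settings/ maps to http://browser.example/?url=chrome%3A%2F%2Fsettings%2F.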

      • standardURLToBrowserURL

         final static String standardURLToBrowserURL(String url)

        Extracts the browser url from a given URL and re-encodes it. The function retrieves the url parameter from the URL, re-encodes it, and reconstructs the URL.

        Parameters:
        url - The URL containing the browser url.
        Returns:

        Returns the reconstructed URL with the re-encoded url parameter.

      • getURLOrNull

         final static URL getURLOrNull(String spec)

        Creates a URL object from its String representation.

        Parameters:
        spec - the String to parse as a URL.
        Returns:

        the URL parsed from spec, or null if no protocol is specified, an unknown protocol is found, spec is null, or the parsed URL fails to comply with the specific syntax of the associated protocol.
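
        One way to obtain the documented behavior is to wrap the java.net.URL constructor, whose MalformedURLException covers the failure cases listed above. A minimal sketch, assuming no further validation is done:

```java
import java.net.MalformedURLException;
import java.net.URL;

final class UrlParsing {
    // Returns null instead of throwing when the spec cannot be parsed as a URL.
    @SuppressWarnings("deprecation") // the URL(String) constructor is deprecated in recent JDKs
    static URL getURLOrNull(String spec) {
        if (spec == null) {
            return null;
        }
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            // Thrown for a missing or unknown protocol, or a spec the protocol handler rejects.
            return null;
        }
    }
}
```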

      • isStandard

         final static Boolean isStandard(String str)

        Test if the str is a standard URL.

        Parameters:
        str - The string to test
        Returns:

        true if the given str is a standard URL, false otherwise

      • isAllowed

         final static Boolean isAllowed(String str)

        Test if the str is an allowed URL.

        Parameters:
        str - The string to test
        Returns:

        true if the given str is an allowed URL, false otherwise

      • normalize

         final static URL normalize(String url, Boolean ignoreQuery)

        Normalize a url spec.

        A URL may have appended to it a "fragment", also known as a "ref" or a "reference". The fragment is indicated by the sharp sign character "#" followed by more characters. For example: http://java.sun.com/index.html#chapter1

        The fragment will be removed after the normalization. If ignoreQuery is true, the query string will be removed.

        Parameters:
        url - The url to normalize; a trailing argument list is allowed and will be removed
        ignoreQuery - If true, the result url does not contain a query string
        Returns:

        The normalized URL

      • normalizeOrEmpty

         final static String normalizeOrEmpty(String url, Boolean ignoreQuery)

        Normalize a url spec.

        A URL may have appended to it a "fragment", also known as a "ref" or a "reference". The fragment is indicated by the sharp sign character "#" followed by more characters. For example: http://java.sun.com/index.html#chapter1

        The fragment will be removed after the normalization. If ignoreQuery is true, the query string will be removed.

        Parameters:
        url - The url to normalize; a trailing argument list is allowed and will be removed
        ignoreQuery - If true, the result url does not contain a query string
        Returns:

        The normalized url, or an empty string ("") if the given string violates RFC 2396

      • normalizeOrNull

         final static String normalizeOrNull(String url, Boolean ignoreQuery)

        Normalize a url spec.

        A URL may have appended to it a "fragment", also known as a "ref" or a "reference". The fragment is indicated by the sharp sign character "#" followed by more characters. For example: http://java.sun.com/index.html#chapter1

        The fragment will be removed after the normalization. If ignoreQuery is true, the query string will be removed.

        Parameters:
        url - The url to normalize; a trailing argument list is allowed and will be removed
        ignoreQuery - If true, the result url does not contain a query string
        Returns:

        The normalized url, or null if the given string violates RFC 2396
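
        The normalization rules above (drop a trailing argument list, drop the fragment, optionally drop the query) can be sketched with java.net.URI, which parses according to RFC 2396. This sketch of normalizeOrNull assumes the argument list is separated from the url by whitespace and accepts only absolute, hierarchical urls; the real implementation may normalize further.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UrlNormalization {
    static String normalizeOrNull(String url, boolean ignoreQuery) {
        if (url == null) {
            return null;
        }
        // Assumption: a trailing argument list is separated from the url by whitespace.
        String spec = url.trim().split("\\s+", 2)[0];
        final URI uri;
        try {
            uri = new URI(spec); // java.net.URI parses according to RFC 2396
        } catch (URISyntaxException e) {
            return null;
        }
        if (!uri.isAbsolute() || uri.isOpaque() || uri.getRawAuthority() == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder()
                .append(uri.getScheme()).append("://")
                .append(uri.getRawAuthority())
                .append(uri.getRawPath());
        if (!ignoreQuery && uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        // The fragment is never appended, so it is removed in both modes.
        return sb.toString();
    }
}
```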

      • normalizeUrls

         final static List<String> normalizeUrls(Iterable<String> urls, Boolean ignoreQuery)

        Normalize a collection of url specs.

        A URL may have appended to it a "fragment", also known as a "ref" or a "reference". The fragment is indicated by the sharp sign character "#" followed by more characters. For example: http://java.sun.com/index.html#chapter1

        The fragment will be removed after the normalization. If ignoreQuery is true, the query string will be removed.

        Parameters:
        urls - The urls to normalize; a trailing argument list is allowed and will be removed
        ignoreQuery - If true, the result urls do not contain a query string
        Returns:

        The normalized URLs

      • getQueryParameters

         final String getQueryParameters(String url, String parameterName)

        Get the value of a query parameter of a url.

        Parameters:
        url - The url to parse
        parameterName - The name of the query parameter
        Returns:

        The value of the query parameter

      • removeQueryParameters

         final String removeQueryParameters(String url, String parameterNames)

        Remove the given query parameters from a url.

        Parameters:
        url - The url to process
        parameterNames - The names of the query parameters to remove
        Returns:

        The url without the given query parameters

      • keepQueryParameters

         final String keepQueryParameters(String url, String parameterNames)

        Keep the given query parameters of a url, and remove all others.

        Parameters:
        url - The url to process
        parameterNames - The names of the query parameters to keep
        Returns:

        The url with only the given query parameters
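
        The three query helpers can be sketched by splitting the raw query on & and =. This is illustrative, not the library's code: it assumes the parameter-name arguments are varargs (the signatures above show a single String), compares names without percent-decoding them, and returns raw values.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class QueryParams {
    // Raw (still percent-encoded) value of the first matching parameter, or null.
    static String getQueryParameters(String url, String parameterName) {
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv[0].equals(parameterName)) {
                return kv.length > 1 ? kv[1] : "";
            }
        }
        return null;
    }

    static String removeQueryParameters(String url, String... parameterNames) {
        Set<String> names = new HashSet<>(Arrays.asList(parameterNames));
        return filterQuery(url, name -> !names.contains(name));
    }

    static String keepQueryParameters(String url, String... parameterNames) {
        Set<String> names = new HashSet<>(Arrays.asList(parameterNames));
        return filterQuery(url, names::contains);
    }

    // Rebuild the url, keeping only the name=value pairs whose name passes the filter.
    private static String filterQuery(String url, Predicate<String> keepName) {
        URI uri = URI.create(url);
        String query = uri.getRawQuery();
        if (query == null) {
            return url;
        }
        String kept = Arrays.stream(query.split("&"))
                .filter(pair -> keepName.test(pair.split("=", 2)[0]))
                .collect(Collectors.joining("&"));
        // In a valid URI the first '?' starts the query.
        String base = url.substring(0, url.indexOf('?'));
        String fragment = uri.getRawFragment();
        return base
                + (kept.isEmpty() ? "" : "?" + kept)
                + (fragment == null ? "" : "#" + fragment);
    }
}
```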

      • resolveURL

         final static URL resolveURL(URL base, String targetUrl)

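        Resolve relative URLs, and fix a java.net.URL error in handling URLs with pure query targets, that is, targets that consist of a query only, such as "?page=2".

        Parameters:
        base - base url
        targetUrl - the url to resolve against base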
        Returns:

        resolved absolute url.
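
        A minimal sketch of the pure-query fix, assuming it is the error the description refers to: java.net.URL resolves a target such as ?page=2 against the directory of the base path, dropping the last path segment, so the sketch re-attaches that segment before resolving.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class UrlResolution {
    @SuppressWarnings("deprecation") // the URL(URL, String) constructor is deprecated in recent JDKs
    static URL resolveURL(URL base, String targetUrl) throws MalformedURLException {
        String target = targetUrl.trim();
        if (target.startsWith("?")) {
            // java.net.URL would resolve "?q" against "/dir/page" to "/dir/?q";
            // prefixing the last segment yields "/dir/page?q" instead.
            String basePath = base.getPath();
            String lastSegment = basePath.substring(basePath.lastIndexOf('/') + 1);
            target = lastSegment + target;
        }
        return new URL(base, target);
    }
}
```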

      • splitUrlArgs

         final static Pair<String, String> splitUrlArgs(String configuredUrl)

        Split url and args

        Parameters:
        configuredUrl - url and args in $url $args format
        Returns:

        url and args pair

      • mergeUrlArgs

         final static String mergeUrlArgs(String url, String args)

        Merge url and args

        Parameters:
        url - url
        args - args
        Returns:

        url and args in $url $args format
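
        A sketch of splitUrlArgs and mergeUrlArgs, assuming the url and the argument list are separated by whitespace, and using Map.Entry in place of the Pair type in the signature.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

final class UrlArgs {
    // Split "$url $args" at the first run of whitespace; args is "" when absent.
    static Map.Entry<String, String> splitUrlArgs(String configuredUrl) {
        String[] parts = configuredUrl.trim().split("\\s+", 2);
        String args = parts.length > 1 ? parts[1] : "";
        return new SimpleImmutableEntry<>(parts[0], args);
    }

    // Join url and args with a single space, omitting empty args.
    static String mergeUrlArgs(String url, String args) {
        String a = args == null ? "" : args.trim();
        return a.isEmpty() ? url : url + " " + a;
    }
}
```

        For an illustrative input such as "https://example.com/list -expires 1d", splitUrlArgs yields the pair ("https://example.com/list", "-expires 1d"), and mergeUrlArgs joins them back into the original string.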

      • reverseUrl

         final static String reverseUrl(String url)

        Reverses a url's domain. This form is better for storing in HBase, because scans within the same domain are faster.

        E.g. "http://bar.foo.com:8983/to/index.html?a=b" becomes "com.foo.bar:http:8983/to/index.html?a=b".

        Parameters:
        url - url to be reversed
        Returns:

        Reversed url

      • reverseUrl

         final static String reverseUrl(URL url)

        Reverses a url's domain. This form is better for storing in HBase, because scans within the same domain are faster.

        E.g. "http://bar.foo.com:8983/to/index.html?a=b" becomes "com.foo.bar:http:8983/to/index.html?a=b".

        Parameters:
        url - url to be reversed
        Returns:

        Reversed url
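
        The reversed form shown in the examples, <reversed host>:<protocol>[:<port>]<path and query>, can be built from the parts of a java.net.URL. A minimal sketch of reverseUrl(URL) under that assumption, ignoring tenant ids:

```java
import java.net.URL;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class UrlReversal {
    // Builds "<reversed host>:<protocol>[:<port>]<file>".
    static String reverseUrl(URL url) {
        StringBuilder buf = new StringBuilder(reverseHost(url.getHost()));
        buf.append(':').append(url.getProtocol());
        if (url.getPort() != -1) {
            buf.append(':').append(url.getPort());
        }
        String file = url.getFile(); // path plus query, e.g. "/to/index.html?a=b"
        if (!file.isEmpty() && file.charAt(0) != '/') {
            buf.append('/');
        }
        return buf.append(file).toString();
    }

    // "bar.foo.com" -> "com.foo.bar"
    static String reverseHost(String host) {
        List<String> labels = Arrays.asList(host.split("\\."));
        Collections.reverse(labels);
        return String.join(".", labels);
    }
}
```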

      • reverseUrl

         final static String reverseUrl(Integer tenantId, String unreversedUrl)

        Get the reversed and tenanted form of unreversedUrl. unreversedUrl can be either tenanted or not tenanted. This method might change the tenant id of the original url.

        A tenant id of zero means no tenant.

        Parameters:
        tenantId - the expected tenant id
        unreversedUrl - the unreversed url, can be either tenanted or not tenanted
        Returns:

        the tenanted and reversed url of unreversedUrl

      • reverseUrlOrEmpty

         final static String reverseUrlOrEmpty(String url)

        Reverses a url's domain. This form is better for storing in HBase, because scans within the same domain are faster.

        E.g. "http://bar.foo.com:8983/to/index.html?a=b" becomes "com.foo.bar:http:8983/to/index.html?a=b".

        Parameters:
        url - url to be reversed
        Returns:

        Reversed url or empty string if the url is invalid

      • reverseUrlOrNull

         final static String reverseUrlOrNull(String url)

        Reverses a url's domain. This form is better for storing in HBase, because scans within the same domain are faster.

        E.g. "http://bar.foo.com:8983/to/index.html?a=b" becomes "com.foo.bar:http:8983/to/index.html?a=b".

        Parameters:
        url - url to be reversed
        Returns:

        Reversed url or null if the url is invalid

      • unreverseUrl

         final static String unreverseUrl(String reversedUrl)

        Get the unreversed url of a reversed url.

        Returns:

        the unreversed url of reversedUrl
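
        Unreversing splits the part before the first / into the reversed host, the protocol and an optional port. A sketch of unreverseUrl(String), assuming the input has the <reversed host>:<protocol>[:<port>]<path> form shown in the reverseUrl examples and carries no tenant id:

```java
final class UrlUnreversal {
    // "com.foo.bar:http:8983/to/index.html?a=b" -> "http://bar.foo.com:8983/to/index.html?a=b"
    static String unreverseUrl(String reversedUrl) {
        int pathBegin = reversedUrl.indexOf('/');
        if (pathBegin == -1) {
            pathBegin = reversedUrl.length();
        }
        // The head is "<reversed host>:<protocol>[:<port>]".
        String[] head = reversedUrl.substring(0, pathBegin).split(":", -1);
        if (head.length < 2 || head.length > 3) {
            throw new IllegalArgumentException("Not a reversed url: " + reversedUrl);
        }
        StringBuilder buf = new StringBuilder()
                .append(head[1]).append("://")
                .append(reverseLabels(head[0]));
        if (head.length == 3) {
            buf.append(':').append(head[2]);
        }
        return buf.append(reversedUrl.substring(pathBegin)).toString();
    }

    // "com.foo.bar" -> "bar.foo.com"
    private static String reverseLabels(String host) {
        String[] labels = host.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]);
            if (i > 0) {
                sb.append('.');
            }
        }
        return sb.toString();
    }
}
```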

      • unreverseUrl

         final static String unreverseUrl(Integer tenantId, String reversedUrl)

        Get the unreversed and tenanted url of reversedUrl. reversedUrl can be either tenanted or not tenanted. This method might change the tenant id of the original url.

        Parameters:
        tenantId - the expected tenant id of the reversedUrl
        reversedUrl - the reversed url, can be both tenanted or not tenanted
        Returns:

        the unreversed url of reversedUrl

      • unreverseUrlOrNull

         final static String unreverseUrlOrNull(String reversedUrl)

        Get the unreversed url of a reversed url.

        Returns:

        the unreversed url of reversedUrl or null if the url is invalid

      • getStartKey

         final static String getStartKey(Integer tenantId, String unreversedUrl)

        Get the start key for a tenanted table.

        Parameters:
        tenantId - the tenant id
        unreversedUrl - unreversed key, which is the original url
        Returns:

        the reversed and tenanted key

      • getStartKey

         final static String getStartKey(String unreversedUrl)

        Get the start key for a non-tenanted table.

        Parameters:
        unreversedUrl - unreversed key, which is the original url
        Returns:

        the reversed key

      • getEndKey

         final static String getEndKey(String unreversedUrl)

        Get the end key for non-tenanted tables.

        Parameters:
        unreversedUrl - unreversed key, which is the original url
        Returns:

        the reversed key, with the key bound decoded

      • getEndKey

         final static String getEndKey(Integer tenantId, String unreversedUrl)

        Get the end key for tenanted tables.

        Parameters:
        tenantId - the tenant id
        unreversedUrl - unreversed key, which is the original url
        Returns:

        the reversed and tenanted key, with the key bound decoded

      • decodeKeyLowerBound

         final static String decodeKeyLowerBound(String startKey)

        We use the Unicode character \u0001 as the lower key bound, but clients usually encode the character as the string "\\u0001" or "\\\\u0001", so these strings are decoded back into the character.

        Note that the character is displayed as <U+0001> in some output systems.

        All three forms, the character \u0001 and the strings "\\u0001" and "\\\\u0001", are treated as the lower key bound.

      • decodeKeyUpperBound

         final static String decodeKeyUpperBound(String endKey)

        We use the Unicode character \uFFFF as the upper key bound, but clients usually encode the character as the string "\\uFFFF" or "\\\\uFFFF", so these strings are decoded back into the character.

        Note that the character may be displayed as <U+FFFF> in some output systems.

        All three forms, the character \uFFFF and the strings "\\uFFFF" and "\\\\uFFFF", are treated as the upper key bound.
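
        The decoding described for decodeKeyLowerBound and decodeKeyUpperBound amounts to string replacement. A minimal sketch, assuming the escaped forms may appear anywhere in the key:

```java
final class KeyBounds {
    static final char LOWER_BOUND = '\u0001';
    static final char UPPER_BOUND = '\uFFFF';

    // The two-backslash form is replaced first; otherwise the one-backslash
    // replacement would match inside it and leave a stray backslash behind.
    static String decodeKeyLowerBound(String startKey) {
        String bound = String.valueOf(LOWER_BOUND);
        return startKey.replace("\\\\u0001", bound).replace("\\u0001", bound);
    }

    static String decodeKeyUpperBound(String endKey) {
        String bound = String.valueOf(UPPER_BOUND);
        return endKey.replace("\\\\uFFFF", bound).replace("\\uFFFF", bound);
    }
}
```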

      • getReversedHost

         final static String getReversedHost(String reversedUrl)

        Given a reversed url, returns the reversed host. E.g. "com.foo.bar:http:8983/to/index.html?a=b" -> "com.foo.bar".

        Parameters:
        reversedUrl - Reversed url
        Returns:

        Reversed host

      • reverseHost

         final static String reverseHost(String hostName)

        Reverse the host name.

        Parameters:
        hostName - host name
        Returns:

        reversed host name

      • unreverseHost

         final static String unreverseHost(String reversedHostName)

        Unreverse the host name.

        Parameters:
        reversedHostName - reversed host name
        Returns:

        host name
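
        A sketch of getReversedHost, reverseHost and unreverseHost, assuming host names are plain dot-separated labels and that the reversed host ends at the first colon of a reversed url:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class HostReversal {
    // The reversed host is everything before the first ':' of a reversed url.
    static String getReversedHost(String reversedUrl) {
        int colon = reversedUrl.indexOf(':');
        return colon == -1 ? reversedUrl : reversedUrl.substring(0, colon);
    }

    // Reversing the dot-separated labels is its own inverse.
    static String reverseHost(String hostName) {
        List<String> labels = Arrays.asList(hostName.split("\\."));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    static String unreverseHost(String reversedHostName) {
        return reverseHost(reversedHostName);
    }
}
```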

      • isPublicSuffix

         final Boolean isPublicSuffix(String domain)

        Indicates whether the given domain name represents a public suffix, as defined by the Mozilla Foundation's Public Suffix List (PSL). A public suffix is one under which Internet users can directly register names, such as com, co.uk or pvt.k12.wy.us. Examples of domain names that are not public suffixes include google.com, foo.co.uk, and myblog.blogspot.com.

        Public suffixes are a proper superset of registry suffixes (see isRegistrySuffix). The list of public suffixes additionally contains privately owned domain names under which Internet users can register subdomains. An example of a public suffix that is not a registry suffix is blogspot.com. Note that all public suffixes have registry suffixes, since domain name registries collectively control all internet domain names.

        For considerations on whether the public suffix or registry suffix designation is more suitable for your application, see this article.

        Returns:

        true if the given domain name appears exactly on the public suffix list

      • isTopPrivateDomain

         final Boolean isTopPrivateDomain(URL url)

        Indicates whether the host of the given url is composed of exactly one subdomain component followed by a public suffix (see isPublicSuffix). For example, returns true for google.com, foo.co.uk, and myblog.blogspot.com, but not for www.google.com, co.uk, or blogspot.com.

        This method can be used to determine whether a domain is probably the highest level for which cookies may be set, though even that depends on individual browsers' implementations of cookie controls. See RFC 2109 for details.

      • getTopPrivateDomain

         final String getTopPrivateDomain(URL url)

        Returns the portion of the url's domain name that is one level beneath the public suffix (see isPublicSuffix). For example, for x.adwords.google.co.uk it returns google.co.uk, since co.uk is a public suffix. Similarly, for myblog.blogspot.com it returns the same domain, myblog.blogspot.com, since blogspot.com is a public suffix.

        If isTopPrivateDomain is true, the domain name itself is returned.

        This method can be used to determine the probable highest level parent domain for which cookies may be set, though even that depends on individual browsers' implementations of cookie controls.

      • getTopPrivateDomain

         final String getTopPrivateDomain(String url)

        Returns the portion of the url's domain name that is one level beneath the public suffix (see isPublicSuffix). For example, for x.adwords.google.co.uk it returns google.co.uk, since co.uk is a public suffix. Similarly, for myblog.blogspot.com it returns the same domain, myblog.blogspot.com, since blogspot.com is a public suffix.

        If isTopPrivateDomain is true, the domain name itself is returned.

        This method can be used to determine the probable highest level parent domain for which cookies may be set, though even that depends on individual browsers' implementations of cookie controls.

      • getTopPrivateDomainOrNull

         final String getTopPrivateDomainOrNull(String url)

        Returns the portion of the url's domain name that is one level beneath the public suffix (see isPublicSuffix), or null if it cannot be determined. For example, for x.adwords.google.co.uk it returns google.co.uk, since co.uk is a public suffix. Similarly, for myblog.blogspot.com it returns the same domain, myblog.blogspot.com, since blogspot.com is a public suffix.

        If isTopPrivateDomain is true, the domain name itself is returned.

        This method can be used to determine the probable highest level parent domain for which cookies may be set, though even that depends on individual browsers' implementations of cookie controls.
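
        The public-suffix logic behind isTopPrivateDomain and getTopPrivateDomain can be illustrated with a toy suffix set. This sketch is hypothetical and deliberately simplified: PUBLIC_SUFFIXES holds four entries instead of the full Public Suffix List, the list's wildcard and exception rules are ignored, and the methods take a host name rather than a URL. A production implementation would rely on a library backed by the full list, such as Guava's InternetDomainName.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

final class TopPrivateDomain {
    // Hypothetical stand-in for the Public Suffix List.
    static final Set<String> PUBLIC_SUFFIXES = Set.of("com", "uk", "co.uk", "blogspot.com");

    // Public suffix plus one more label, or null if the host is itself a
    // public suffix or has no known suffix.
    static String topPrivateDomainOrNull(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        for (int i = 0; i < labels.length; i++) {
            // Scanning from the left finds the longest matching suffix first.
            String suffix = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (PUBLIC_SUFFIXES.contains(suffix)) {
                return i == 0 ? null : String.join(".", Arrays.copyOfRange(labels, i - 1, labels.length));
            }
        }
        return null;
    }

    static boolean isTopPrivateDomain(String host) {
        String top = topPrivateDomainOrNull(host);
        return top != null && top.equals(host.toLowerCase(Locale.ROOT));
    }
}
```

        On the examples above, this returns google.co.uk for x.adwords.google.co.uk, myblog.blogspot.com for itself, and null for co.uk, because co.uk is itself a public suffix.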

      • getOrigin

         final String getOrigin(String url)

        Returns the lowercase origin for the url.

        Parameters:
        url - The url to check.
        Returns:

        The lowercase origin for the url.

      • getOriginOrNull

         final String getOriginOrNull(String url)

        Returns the lowercase origin for the url or null if the url is not well-formed.

        Parameters:
        url - The url to check.
        Returns:

        The lowercase origin for the url, or null if the url is not well-formed.

      • getHostName

         final String getHostName(String url)

        Returns the lowercase hostname for the url.

        Parameters:
        url - The url to check.
        Returns:

        The hostname for the url.

      • getHostNameOrNull

         final String getHostNameOrNull(String url)

        Returns the lowercase hostname for the url or null if the url is not well-formed.

        Parameters:
        url - The url to check.
        Returns:

        The hostname for the url, or null if the url is not well-formed.
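
        A sketch of getOriginOrNull and getHostNameOrNull using java.net.URI, assuming the origin is scheme://host[:port] in lowercase. Unlike a browser origin, this sketch keeps an explicitly written default port such as :443.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

final class UrlOrigin {
    // scheme://host[:port], lowercased; null when the url cannot be parsed or has no host.
    static String getOriginOrNull(String url) {
        try {
            URI uri = new URI(url);
            if (uri.getScheme() == null || uri.getHost() == null) {
                return null;
            }
            String origin = uri.getScheme() + "://" + uri.getHost();
            if (uri.getPort() != -1) {
                origin += ":" + uri.getPort();
            }
            return origin.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Lowercased host, or null for malformed urls and urls without a host.
    static String getHostNameOrNull(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```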