Package org.datavec.api.writable
Class Text
- java.lang.Object
-
- org.datavec.api.io.BinaryComparable
-
- org.datavec.api.writable.Text
-
- All Implemented Interfaces:
Serializable, Comparable<BinaryComparable>, WritableComparable<BinaryComparable>, Writable
public class Text extends BinaryComparable implements WritableComparable<BinaryComparable>
- See Also:
- Serialized Form
-
-
Nested Class Summary
Modifier and Type | Class | Description
static class | Text.Comparator | A WritableComparator optimized for Text keys.
-
Method Summary
Modifier and Type | Method | Description
void | append(byte[] utf8, int start, int len) | Append a range of bytes to the end of the given text.
static int | bytesToCodePoint(ByteBuffer bytes) | Returns the next code point at the current position in the buffer.
int | charAt(int position) | Returns the Unicode scalar value (32-bit integer value) for the character at position.
void | clear() | Clear the string to empty.
static String | decode(byte[] utf8) | Converts the provided byte array to a String using the UTF-8 encoding.
static String | decode(byte[] utf8, int start, int length) |
static String | decode(byte[] utf8, int start, int length, boolean replace) | Converts the provided byte array to a String using the UTF-8 encoding.
static ByteBuffer | encode(String string) | Converts the provided String to bytes using the UTF-8 encoding.
static ByteBuffer | encode(String string, boolean replace) | Converts the provided String to bytes using the UTF-8 encoding.
boolean | equals(Object o) | Returns true iff o is a Text with the same contents.
int | find(String what) |
int | find(String what, int start) | Finds any occurrence of what in the backing buffer, starting at position start.
byte[] | getBytes() | Returns the raw bytes; however, only data up to getLength() is valid.
int | getLength() | Returns the number of bytes in the byte array.
WritableType | getType() | Get the type of the writable.
int | hashCode() | Return a hash of the bytes returned from getBytes().
void | readFields(DataInput in) | Deserialize.
static String | readString(DataInput in) | Read a UTF-8 encoded string from in.
void | set(byte[] utf8) | Set to a UTF-8 byte array.
void | set(byte[] utf8, int start, int len) | Set the Text to a range of bytes.
void | set(String string) | Set to contain the contents of a string.
void | set(Text other) | Copy a text.
static void | skip(DataInput in) | Skips over one Text in the input.
double | toDouble() | Convert Writable to double.
float | toFloat() | Convert Writable to float.
int | toInt() | Convert Writable to int.
long | toLong() | Convert Writable to long.
String | toString() | Convert text back to string.
static int | utf8Length(String string) | For the given string, returns the number of UTF-8 bytes required to encode the string.
static void | validateUTF8(byte[] utf8) | Check if a byte array contains valid UTF-8.
static void | validateUTF8(byte[] utf8, int start, int len) | Check to see if a byte array is valid UTF-8.
void | write(DataOutput out) | Serialize this object to out; the length uses zero-compressed encoding.
static int | writeString(DataOutput out, String s) | Write a UTF-8 encoded string to out.
void | writeType(DataOutput out) | Write the type (a single short value) to the DataOutput.
-
Methods inherited from class org.datavec.api.io.BinaryComparable
compareTo, compareTo
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface java.lang.Comparable
compareTo
-
-
-
-
Method Detail
-
getBytes
public byte[] getBytes()
Returns the raw bytes; however, only data up to getLength() is valid.
- Specified by:
getBytes in class BinaryComparable
-
getLength
public int getLength()
Returns the number of bytes in the byte array.
- Specified by:
getLength in class BinaryComparable
-
charAt
public int charAt(int position)
Returns the Unicode scalar value (32-bit integer value) for the character at position. Note that this method avoids using the converter or doing String instantiation.
- Returns:
- the Unicode scalar value at position, or -1 if the position is invalid or points to a trailing byte
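To illustrate what "-1 if the position is invalid or points to a trailing byte" means for a UTF-8 buffer, here is a minimal sketch that decodes one code point at a byte offset using only the JDK. It is not the DataVec implementation: the names `CodePointAtSketch` and `codePointAt` are hypothetical, and the treatment of truncated or malformed sequences (also returning -1) is an assumption.

```java
import java.nio.charset.StandardCharsets;

final class CodePointAtSketch {
    /**
     * Decodes the code point whose first byte is at index pos.
     * Returns -1 if pos is out of range, is a continuation (trailing)
     * byte, or starts a truncated or malformed sequence.
     * Overlong encodings are not rejected in this simplified sketch.
     */
    static int codePointAt(byte[] utf8, int length, int pos) {
        if (pos < 0 || pos >= length) {
            return -1;
        }
        int b0 = utf8[pos] & 0xFF;
        int extra; // number of continuation bytes expected
        int cp;    // accumulated code point bits
        if (b0 < 0x80) {
            return b0;             // single-byte (ASCII)
        } else if (b0 < 0xC0) {
            return -1;             // 10xxxxxx: a trailing byte
        } else if (b0 < 0xE0) {
            extra = 1; cp = b0 & 0x1F;
        } else if (b0 < 0xF0) {
            extra = 2; cp = b0 & 0x0F;
        } else if (b0 < 0xF8) {
            extra = 3; cp = b0 & 0x07;
        } else {
            return -1;             // not a legal UTF-8 lead byte
        }
        if (pos + extra >= length) {
            return -1;             // sequence runs past the valid data
        }
        for (int i = 1; i <= extra; i++) {
            int b = utf8[pos + i] & 0xFF;
            if ((b & 0xC0) != 0x80) {
                return -1;         // expected a continuation byte
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        return cp;
    }

    public static void main(String[] args) {
        byte[] bytes = "a\u00E9\u20AC".getBytes(StandardCharsets.UTF_8);
        for (int pos = 0; pos < bytes.length; pos++) {
            System.out.println(pos + " -> " + codePointAt(bytes, bytes.length, pos));
        }
    }
}
```

Positions are byte offsets, so an index that lands in the middle of a multi-byte character yields -1 rather than a partial character.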
-
find
public int find(String what)
-
find
public int find(String what, int start)
Finds any occurrence of what in the backing buffer, starting at position start. The starting position is measured in bytes and the return value is in terms of byte position in the buffer. The backing buffer is not converted to a string for this operation.
- Returns:
- byte position of the first occurrence of the search string in the UTF-8 buffer, or -1 if not found
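Because positions are measured in bytes rather than characters, results can differ from `String.indexOf` whenever the text contains non-ASCII characters. The sketch below shows the idea with a plain byte search over a UTF-8 array; `ByteFindSketch` and its `find` helper are hypothetical names used only for illustration and are not the library's code.

```java
import java.nio.charset.StandardCharsets;

final class ByteFindSketch {
    /**
     * Searches the first `length` bytes of buffer for the UTF-8 encoding
     * of `what`, beginning at byte offset `start`.
     * Returns the byte offset of the first match, or -1 if there is none.
     */
    static int find(byte[] buffer, int length, String what, int start) {
        byte[] needle = what.getBytes(StandardCharsets.UTF_8);
        for (int i = Math.max(start, 0); i + needle.length <= length; i++) {
            int j = 0;
            while (j < needle.length && buffer[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "h\u00E9llo w\u00F6rld";
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // The byte offset and the char index differ because U+00E9 takes two bytes.
        System.out.println("byte offset: " + find(utf8, utf8.length, "w\u00F6rld", 0));
        System.out.println("char index:  " + s.indexOf("w\u00F6rld"));
    }
}
```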
-
set
public void set(String string)
Set to contain the contents of a string.
-
set
public void set(byte[] utf8)
Set to a UTF-8 byte array.
-
set
public void set(Text other)
Copy a text.
-
set
public void set(byte[] utf8, int start, int len)
Set the Text to a range of bytes.
- Parameters:
utf8 - the data to copy from
start - the first position of the new string
len - the number of bytes of the new string
-
append
public void append(byte[] utf8, int start, int len)
Append a range of bytes to the end of the given text.
- Parameters:
utf8 - the data to copy from
start - the first position to append from utf8
len - the number of bytes to append
-
clear
public void clear()
Clear the string to empty.
-
toString
public String toString()
Convert text back to string.
- Overrides:
toString in class Object
- See Also:
Object.toString()
-
readFields
public void readFields(DataInput in) throws IOException
Deserialize.
- Specified by:
readFields in interface Writable
- Parameters:
in - DataInput to deserialize this object from.
- Throws:
IOException
-
writeType
public void writeType(DataOutput out) throws IOException
Description copied from interface: Writable
Write the type (a single short value) to the DataOutput. See WritableFactory for details.
- Specified by:
writeType in interface Writable
- Parameters:
out - DataOutput to write to
- Throws:
IOException - For errors during writing
-
skip
public static void skip(DataInput in) throws IOException
Skips over one Text in the input.
- Throws:
IOException
-
write
public void write(DataOutput out) throws IOException
Serializes this object to out. The length is written using a zero-compressed encoding.
- Specified by:
write in interface Writable
- Parameters:
out - DataOutput to serialize this object into.
- Throws:
IOException
- See Also:
Writable.write(DataOutput)
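The page does not spell out the zero-compressed format. The sketch below shows one such scheme, modelled on the variable-length integer encoding used by Hadoop's Writable utilities; treating that as the format used here is an assumption, and the class `ZeroCompressedLengthSketch` with its helper methods is hypothetical. Small lengths take one byte, while larger ones take a marker byte followed by only the non-zero high-order bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class ZeroCompressedLengthSketch {
    /** Writes a non-negative length in 1 to 5 bytes (assumed format). */
    static void writeLength(DataOutput out, int n) throws IOException {
        if (n < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        if (n <= 127) {
            out.writeByte(n);            // fits in a single byte
            return;
        }
        int extra = 0;                   // count of significant bytes in n
        for (int tmp = n; tmp != 0; tmp >>>= 8) {
            extra++;
        }
        out.writeByte(-112 - extra);     // marker byte encodes the byte count
        for (int idx = extra; idx > 0; idx--) {
            out.writeByte((n >>> ((idx - 1) * 8)) & 0xFF); // big-endian payload
        }
    }

    /** Reads a length written by writeLength. */
    static int readLength(DataInput in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;                // single-byte value
        }
        int extra = -112 - first;
        int n = 0;
        for (int i = 0; i < extra; i++) {
            n = (n << 8) | (in.readByte() & 0xFF);
        }
        return n;
    }

    /** Writes a length-prefixed UTF-8 string. */
    static void writeText(DataOutput out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLength(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Reads a string written by writeText. */
    static String readText(DataInput in) throws IOException {
        byte[] utf8 = new byte[readLength(in)];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        writeText(new DataOutputStream(bos), "h\u00E9llo");
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(readText(in));
    }
}
```

The prefix counts bytes, not characters, which is consistent with getLength() reporting a byte count.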
-
equals
public boolean equals(Object o)
Returns true iff o is a Text with the same contents.
- Overrides:
equals in class BinaryComparable
-
hashCode
public int hashCode()
Description copied from class: BinaryComparable
Return a hash of the bytes returned from getBytes().
- Overrides:
hashCode in class BinaryComparable
- See Also:
org.apache.hadoop.io.WritableComparator#hashBytes(byte[],int)
-
decode
public static String decode(byte[] utf8) throws CharacterCodingException
Converts the provided byte array to a String using the UTF-8 encoding. If the input is malformed, it is replaced by a default value.
- Throws:
CharacterCodingException
-
decode
public static String decode(byte[] utf8, int start, int length) throws CharacterCodingException
- Throws:
CharacterCodingException
-
decode
public static String decode(byte[] utf8, int start, int length, boolean replace) throws CharacterCodingException
Converts the provided byte array to a String using the UTF-8 encoding. If replace is true, then malformed input is replaced with the substitution character, which is U+FFFD. Otherwise the method throws a MalformedInputException.
- Throws:
CharacterCodingException
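The replace-versus-throw behaviour described for decode can be reproduced with the JDK's `CharsetDecoder` and `CodingErrorAction`. The following is a minimal sketch under that assumption; `Utf8DecodeSketch` is a hypothetical class and is not taken from the DataVec source.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8DecodeSketch {
    /**
     * Decodes utf8[start, start + length) as UTF-8.
     * With replace == true, malformed input becomes U+FFFD;
     * otherwise a CharacterCodingException (typically its subclass
     * MalformedInputException) is thrown.
     */
    static String decode(byte[] utf8, int start, int length, boolean replace)
            throws CharacterCodingException {
        CodingErrorAction action =
                replace ? CodingErrorAction.REPLACE : CodingErrorAction.REPORT;
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        return decoder.decode(ByteBuffer.wrap(utf8, start, length)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] bad = {0x61, (byte) 0xFF, 0x62}; // 0xFF never appears in valid UTF-8
        System.out.println(decode(bad, 0, bad.length, true));
        try {
            decode(bad, 0, bad.length, false);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

A fresh decoder is created per call in this sketch because `CharsetDecoder` instances are stateful and not safe for concurrent use.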
-
encode
public static ByteBuffer encode(String string) throws CharacterCodingException
Converts the provided String to bytes using the UTF-8 encoding. If the input is malformed, invalid chars are replaced by a default value.
- Returns:
- ByteBuffer: the bytes are stored in ByteBuffer.array() and the length is ByteBuffer.limit()
- Throws:
CharacterCodingException
-
encode
public static ByteBuffer encode(String string, boolean replace) throws CharacterCodingException
Converts the provided String to bytes using the UTF-8 encoding. If replace is true, then malformed input is replaced with the substitution character, which is U+FFFD. Otherwise the method throws a MalformedInputException.
- Returns:
- ByteBuffer: the bytes are stored in ByteBuffer.array() and the length is ByteBuffer.limit()
- Throws:
CharacterCodingException
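The note on the return value matters because the backing array of a `ByteBuffer` may be larger than the encoded data, so only the bytes up to `limit()` should be read. The sketch below demonstrates this with the JDK's own `CharsetEncoder` standing in for `encode`; `EncodedBufferSketch` and `validBytes` are hypothetical names, and the standard encoder is an assumption rather than the library's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EncodedBufferSketch {
    /** Encodes a string as UTF-8 into a newly allocated heap ByteBuffer. */
    static ByteBuffer encode(String s) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
    }

    /**
     * Copies only the valid region of an array-backed buffer.
     * Bytes in array() beyond limit() are spare capacity, not data.
     */
    static byte[] validBytes(ByteBuffer buf) {
        int from = buf.arrayOffset() + buf.position();
        int to = buf.arrayOffset() + buf.limit();
        return Arrays.copyOfRange(buf.array(), from, to);
    }

    public static void main(String[] args) throws CharacterCodingException {
        ByteBuffer buf = encode("h\u00E9llo");
        System.out.println("array length: " + buf.array().length);
        System.out.println("limit:        " + buf.limit());
        System.out.println(new String(validBytes(buf), StandardCharsets.UTF_8));
    }
}
```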
-
readString
public static String readString(DataInput in) throws IOException
Read a UTF-8 encoded string from in.
- Throws:
IOException
-
writeString
public static int writeString(DataOutput out, String s) throws IOException
Write a UTF-8 encoded string to out.
- Throws:
IOException
-
validateUTF8
public static void validateUTF8(byte[] utf8) throws MalformedInputException
Check if a byte array contains valid UTF-8.
- Parameters:
utf8 - byte array
- Throws:
MalformedInputException - if the byte array contains invalid UTF-8
-
validateUTF8
public static void validateUTF8(byte[] utf8, int start, int len) throws MalformedInputException
Check to see if a byte array is valid UTF-8.
- Parameters:
utf8 - the array of bytes
start - the offset of the first byte in the array
len - the length of the byte sequence
- Throws:
MalformedInputException - if the byte array contains invalid bytes
-
bytesToCodePoint
public static int bytesToCodePoint(ByteBuffer bytes)
Returns the next code point at the current position in the buffer. The buffer's position will be incremented. Any mark set on this buffer will be changed by this method!
-
utf8Length
public static int utf8Length(String string)
For the given string, returns the number of UTF-8 bytes required to encode the string.
- Parameters:
string - text to encode
- Returns:
- number of UTF-8 bytes required to encode
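The byte count can be computed without actually encoding the string, since each code point maps to a fixed number of UTF-8 bytes. A minimal sketch of that calculation follows; `Utf8LengthSketch` is a hypothetical class, and it assumes well-formed input (no unpaired surrogates).

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthSketch {
    /**
     * Counts the UTF-8 bytes needed for s by walking its code points.
     * Assumes s contains no unpaired surrogates.
     */
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                bytes += 1;        // U+0000..U+007F
            } else if (cp < 0x800) {
                bytes += 2;        // U+0080..U+07FF
            } else if (cp < 0x10000) {
                bytes += 3;        // U+0800..U+FFFF
            } else {
                bytes += 4;        // U+10000..U+10FFFF
            }
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return bytes;
    }

    public static void main(String[] args) {
        String s = "h\u00E9llo \u20AC";
        System.out.println(utf8Length(s));
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```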
-
toDouble
public double toDouble()
Description copied from interface: Writable
Convert Writable to double. Whether this is supported depends on the specific writable.
-
toFloat
public float toFloat()
Description copied from interface: Writable
Convert Writable to float. Whether this is supported depends on the specific writable.
-
toInt
public int toInt()
Description copied from interface: Writable
Convert Writable to int. Whether this is supported depends on the specific writable.
-
toLong
public long toLong()
Description copied from interface: Writable
Convert Writable to long. Whether this is supported depends on the specific writable.
-
getType
public WritableType getType()
Description copied from interface: Writable
Get the type of the writable.
-
-