Package org.apache.drill.exec.util
Class Text
java.lang.Object
org.apache.drill.exec.util.Text
A simplified byte wrapper similar to Hadoop's Text class, without all the dependencies. Lifted from Hadoop 2.7.1.
-
Nested Class Summary
-
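Field Summary
Modifier and Type | Field
static final int | DEFAULT_MAX_LEN
-
Constructor Summary
Constructor | Description
Text() |
Text(byte[] utf8) | Construct from a byte array.
Text(String string) | Construct from a string.
Text(Text utf8) | Construct from another text.
-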
Method Summary
Modifier and Type | Method | Description
void | append(byte[] utf8, int start, int len) | Append a range of bytes to the end of the given text.
static int | bytesToCodePoint(ByteBuffer bytes) | Returns the next code point at the current position in the buffer.
int | charAt(int position) | Returns the Unicode Scalar Value (32-bit integer value) for the character at position.
void | clear() | Clear the string to empty.
byte[] | copyBytes() | Get a copy of the bytes that is exactly the length of the data.
static String | decode(byte[] utf8) | Converts the provided byte array to a String using the UTF-8 encoding.
static String | decode(byte[] utf8, int start, int length) |
static String | decode(byte[] utf8, int start, int length, boolean replace) | Converts the provided byte array to a String using the UTF-8 encoding.
static ByteBuffer | encode(String string) | Converts the provided String to bytes using the UTF-8 encoding.
static ByteBuffer | encode(String string, boolean replace) | Converts the provided String to bytes using the UTF-8 encoding.
boolean | equals(Object o) | Returns true iff o is a Text with the same contents.
int | find(String what) |
int | find(String what, int start) | Finds any occurrence of what in the backing buffer, starting at position start.
byte[] | getBytes() | Returns the raw bytes; however, only data up to getLength() is valid.
int | getLength() | Returns the number of valid bytes in the byte array.
int | hashCode() |
void | readWithKnownLength(DataInput in, int len) | Read a Text object whose length is already known.
void | set(byte[] utf8) | Set to a UTF-8 byte array.
void | set(byte[] utf8, int start, int len) | Set the Text to a range of bytes.
void | set(String string) | Set to contain the contents of a string.
void | set(Text other) | Copy a text.
String | toString() | Convert text back to string.
static int | utf8Length(String string) | For the given string, returns the number of UTF-8 bytes required to encode the string.
static void | validateUTF8(byte[] utf8) | Check if a byte array contains valid UTF-8.
static void | validateUTF8(byte[] utf8, int start, int len) | Check to see if a byte array is valid UTF-8.
-
Field Details
-
DEFAULT_MAX_LEN
public static final int DEFAULT_MAX_LEN
-
-
Constructor Details
-
Text
public Text() -
Text
public Text(String string) Construct from a string. -
Text
public Text(Text utf8) Construct from another text. -
Text
public Text(byte[] utf8) Construct from a byte array.
-
-
Method Details
-
copyBytes
public byte[] copyBytes() Get a copy of the bytes that is exactly the length of the data. See getBytes() for faster access to the underlying array. -
getBytes
public byte[] getBytes() Returns the raw bytes; however, only data up to getLength() is valid. Please use copyBytes() if you need the returned array to be precisely the length of the data. -
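Because getBytes() exposes the backing array, bytes past getLength() can be left over from earlier contents. The JDK-only sketch below illustrates this with a hypothetical class, BackingBufferSketch, under the assumption that the array is reallocated only when it is too small; it is not Drill's source.

```java
import java.util.Arrays;

// Hypothetical stand-in for the backing-buffer behavior described above; not Drill's code.
final class BackingBufferSketch {
  private byte[] bytes = new byte[0]; // backing array, may be longer than the data
  private int length;                 // number of valid bytes in `bytes`

  // Copies utf8 in, reallocating only when the current array is too small (assumption).
  void set(byte[] utf8) {
    if (bytes.length < utf8.length) {
      bytes = new byte[utf8.length];
    }
    System.arraycopy(utf8, 0, bytes, 0, utf8.length);
    length = utf8.length;
  }

  // Like getBytes(): the raw array; only the first getLength() bytes are valid.
  byte[] getBytes() {
    return bytes;
  }

  int getLength() {
    return length;
  }

  // Like copyBytes(): a new array exactly getLength() bytes long.
  byte[] copyBytes() {
    return Arrays.copyOf(bytes, length);
  }
}
```

With this sketch, calling set on the bytes of "hello" and then of "hi" leaves getBytes().length at 5 (the array now holds "hillo"), while getLength() and copyBytes().length are both 2.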
getLength
public int getLength() Returns the number of valid bytes in the byte array, that is, the length of the data. -
charAt
public int charAt(int position) Returns the Unicode Scalar Value (32-bit integer value) for the character at position. Note that this method avoids using the converter or doing String instantiation.
- Returns:
- the Unicode scalar value at position, or -1 if the position is invalid or points to a trailing byte
-
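To make the return contract of charAt concrete, here is a minimal sketch that decodes one UTF-8 sequence directly from a byte array without building a String. The class and method names are hypothetical, the valid length is passed in as a parameter, and it is not the class's actual implementation.

```java
// Hypothetical helper approximating charAt(int position) on a raw UTF-8 buffer whose
// valid data is bytes[0 .. length). It decodes in place without creating a String.
// Simplification: overlong encodings and encoded surrogates are not rejected.
final class CharAtSketch {
  private CharAtSketch() {}

  static int charAt(byte[] bytes, int length, int position) {
    if (position < 0 || position >= length) {
      return -1; // outside the valid data
    }
    int lead = bytes[position] & 0xFF;
    if (lead < 0x80) {
      return lead; // single-byte (ASCII) sequence
    }
    int extra; // number of continuation bytes after the lead byte
    int cp;    // code point bits accumulated so far
    if ((lead & 0xE0) == 0xC0) {
      extra = 1;
      cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      extra = 2;
      cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      extra = 3;
      cp = lead & 0x07;
    } else {
      return -1; // trailing byte (10xxxxxx) or invalid lead byte
    }
    if (position + extra >= length) {
      return -1; // sequence runs past the valid data
    }
    for (int i = 1; i <= extra; i++) {
      int c = bytes[position + i] & 0xFF;
      if ((c & 0xC0) != 0x80) {
        return -1; // expected a continuation byte
      }
      cp = (cp << 6) | (c & 0x3F);
    }
    return cp;
  }
}
```

For the 10 UTF-8 bytes of "a\u00e9\u20ac\uD83D\uDE00", positions 0, 1, 3 and 6 return 0x61, 0xE9, 0x20AC and 0x1F600, while position 2 (the trailing byte of é) and position 10 return -1.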
find
public int find(String what) -
find
public int find(String what, int start) Finds any occurrence of what in the backing buffer, starting at position start. The starting position is measured in bytes and the return value is in terms of byte position in the buffer. The backing buffer is not converted to a string for this operation.
- Returns:
- byte position of the first occurrence of the search string in the UTF-8 buffer, or -1 if not found
-
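The byte-oriented contract of find can be illustrated with a naive search over the UTF-8 encoding of the pattern. FindSketch is a hypothetical helper; it encodes the pattern with String.getBytes rather than the class's own encode, an assumption made to keep it self-contained.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper approximating find(String what, int start) over the valid
// bytes buf[0 .. length). Returns a byte offset, not a character index.
final class FindSketch {
  private FindSketch() {}

  static int find(byte[] buf, int length, String what, int start) {
    // Assumption: JDK encoding of the pattern instead of the class's encode().
    byte[] pattern = what.getBytes(StandardCharsets.UTF_8);
    int last = length - pattern.length; // last offset where the pattern still fits
    for (int i = Math.max(start, 0); i <= last; i++) {
      int j = 0;
      while (j < pattern.length && buf[i + j] == pattern[j]) {
        j++;
      }
      if (j == pattern.length) {
        return i; // first match at or after start
      }
    }
    return -1;
  }
}
```

Searching the bytes of "h\u00e9llo w\u00f6rld" for "w\u00f6" returns 7, a byte offset, even though "w" is at index 6 of the Java String, because "é" occupies two bytes.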
set
public void set(String string) Set to contain the contents of a string. -
set
public void set(byte[] utf8) Set to a UTF-8 byte array. -
set
public void set(Text other) Copy a text. -
set
public void set(byte[] utf8, int start, int len) Set the Text to a range of bytes.
- Parameters:
utf8 - the data to copy from
start - the first position of the new string
len - the number of bytes of the new string
-
append
public void append(byte[] utf8, int start, int len) Append a range of bytes to the end of the given text.
- Parameters:
utf8 - the data to copy from
start - the first position to append from utf8
len - the number of bytes to append
-
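set(byte[], int, int) and append(byte[], int, int) both copy a byte range into the backing array; append additionally has to grow it while keeping the existing data. The sketch below shows one way to do that. The class AppendSketch and its growth policy (grow to the larger of the required size and twice the current length) are assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of range set/append on a growable backing array; not Drill's code.
final class AppendSketch {
  private byte[] bytes = new byte[0];
  private int length; // valid bytes in `bytes`

  void set(byte[] utf8, int start, int len) {
    ensureCapacity(len, false); // old contents are overwritten, no need to keep them
    System.arraycopy(utf8, start, bytes, 0, len);
    length = len;
  }

  void append(byte[] utf8, int start, int len) {
    ensureCapacity(length + len, true); // must preserve the current data
    System.arraycopy(utf8, start, bytes, length, len);
    length += len;
  }

  // Grows the array when it is smaller than `needed`; copies old data only if keepData.
  private void ensureCapacity(int needed, boolean keepData) {
    if (bytes.length < needed) {
      bytes = keepData
          ? Arrays.copyOf(bytes, Math.max(needed, length << 1))
          : new byte[needed];
    }
  }

  byte[] copyBytes() {
    return Arrays.copyOf(bytes, length);
  }
}
```

Calling set(src, 0, 5) and then append(src, 5, 7) on the bytes of "hello, world" yields "hello, world". Growing geometrically keeps a run of appends amortized linear instead of reallocating on every call.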
clear
public void clear() Clear the string to empty. Note: For performance reasons, this call does not clear the underlying byte array that is retrievable via getBytes(). In order to free the byte-array memory, call set(byte[]) with an empty byte array (for example, new byte[0]). -
toString
public String toString() Convert text back to string. -
readWithKnownLength
public void readWithKnownLength(DataInput in, int len) throws IOException Read a Text object whose length is already known. This allows creating Text from a stream which uses a different serialization format.
- Throws:
IOException
-
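readWithKnownLength is useful when some other format has already told the caller how many bytes belong to the text, so no length prefix is read here. A minimal sketch of that idea using DataInput.readFully follows; KnownLengthReader is a hypothetical name.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical sketch: the caller learned `len` from its own framing, so exactly
// `len` bytes are consumed and nothing else.
final class KnownLengthReader {
  private KnownLengthReader() {}

  static byte[] readWithKnownLength(DataInput in, int len) throws IOException {
    byte[] buf = new byte[len];
    in.readFully(buf, 0, len); // throws EOFException if fewer than len bytes remain
    return buf;
  }
}
```

For example, if a stream carries a 4-byte length written by DataOutputStream.writeInt followed by the UTF-8 bytes, the caller reads the int itself and passes it as len; any bytes after the payload are left unread.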
equals
public boolean equals(Object o) Returns true iff o is a Text with the same contents. -
hashCode
public int hashCode() -
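Since the backing array can be longer than the data, content-based equals and hashCode must look only at the first getLength() bytes. The following hypothetical class sketches that; the hash function is an assumption (the same 31-based recurrence as java.util.Arrays.hashCode), not necessarily what the class uses.

```java
// Hypothetical sketch: equality and hashing consider only bytes[0 .. length), so stale
// bytes beyond `length` in a reused backing array never affect the result.
final class TextEqualitySketch {
  private final byte[] bytes;
  private final int length;

  TextEqualitySketch(byte[] bytes, int length) {
    this.bytes = bytes;
    this.length = length;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof TextEqualitySketch)) {
      return false;
    }
    TextEqualitySketch other = (TextEqualitySketch) o;
    if (length != other.length) {
      return false;
    }
    for (int i = 0; i < length; i++) {
      if (bytes[i] != other.bytes[i]) {
        return false;
      }
    }
    return true;
  }

  @Override
  public int hashCode() {
    // Assumed hash: Arrays.hashCode-style recurrence over the valid range only.
    int h = 1;
    for (int i = 0; i < length; i++) {
      h = 31 * h + bytes[i];
    }
    return h;
  }
}
```

An instance wrapping {'h','i','x','x'} with length 2 and one wrapping {'h','i'} with length 2 compare equal and hash identically.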
decode
public static String decode(byte[] utf8) throws CharacterCodingException Converts the provided byte array to a String using the UTF-8 encoding. If the input is malformed, it is replaced by a default value.
- Throws:
CharacterCodingException
-
decode
public static String decode(byte[] utf8, int start, int length) throws CharacterCodingException
- Throws:
CharacterCodingException
-
decode
public static String decode(byte[] utf8, int start, int length, boolean replace) throws CharacterCodingException Converts the provided byte array to a String using the UTF-8 encoding. If replace is true, then malformed input is replaced with the substitution character, which is U+FFFD. Otherwise the method throws a MalformedInputException.
- Throws:
CharacterCodingException
-
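The replace flag maps naturally onto the JDK's CodingErrorAction. This sketch, using only java.nio.charset, reproduces the behavior described above; it is not the class's implementation, and DecodeSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// JDK-only sketch of decode(byte[], int, int, boolean): REPLACE substitutes U+FFFD
// for malformed input, REPORT makes decoding throw MalformedInputException.
final class DecodeSketch {
  private DecodeSketch() {}

  static String decode(byte[] utf8, int start, int length, boolean replace)
      throws CharacterCodingException {
    CodingErrorAction action = replace ? CodingErrorAction.REPLACE : CodingErrorAction.REPORT;
    // A fresh decoder per call: CharsetDecoder instances are not safe for concurrent use.
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(action)
        .onUnmappableCharacter(action);
    return decoder.decode(ByteBuffer.wrap(utf8, start, length)).toString();
  }
}
```

Decoding the malformed pair 0xC3 0x28 (a two-byte lead followed by a non-continuation byte) returns "\uFFFD(" when replace is true and throws MalformedInputException when replace is false.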
encode
public static ByteBuffer encode(String string) throws CharacterCodingException Converts the provided String to bytes using the UTF-8 encoding. If the input is malformed, invalid chars are replaced by a default value.
- Returns:
- ByteBuffer: bytes are stored at ByteBuffer.array() and the length is ByteBuffer.limit()
- Throws:
CharacterCodingException
-
encode
public static ByteBuffer encode(String string, boolean replace) throws CharacterCodingException Converts the provided String to bytes using the UTF-8 encoding. If replace is true, then malformed input is replaced with the substitution character, which is U+FFFD. Otherwise the method throws a MalformedInputException.
- Returns:
- ByteBuffer: bytes are stored at ByteBuffer.array() and the length is ByteBuffer.limit()
- Throws:
CharacterCodingException
-
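The encoding direction works the same way with a CharsetEncoder. In this JDK-only sketch (EncodeSketch is a hypothetical name) the replacement is set explicitly to EF BF BD, the UTF-8 encoding of U+FFFD, to match the description above. The returned buffer follows the documented convention: the valid bytes run from array()[0] up to limit().

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// JDK-only sketch of encode(String, boolean); not the class's implementation.
final class EncodeSketch {
  // UTF-8 bytes of U+FFFD, used as the replacement sequence (set explicitly).
  private static final byte[] U_FFFD = {(byte) 0xEF, (byte) 0xBF, (byte) 0xBD};

  private EncodeSketch() {}

  static ByteBuffer encode(String string, boolean replace) throws CharacterCodingException {
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    if (replace) {
      encoder.onMalformedInput(CodingErrorAction.REPLACE)
          .onUnmappableCharacter(CodingErrorAction.REPLACE)
          .replaceWith(U_FFFD);
    } // otherwise the default action, REPORT, makes encode() throw
    // Valid bytes are array()[0 .. limit()); the backing array may be longer.
    return encoder.encode(CharBuffer.wrap(string));
  }
}
```

Encoding "a\uD800b", which contains an unpaired high surrogate, yields the 5 bytes 61 EF BF BD 62 when replace is true and throws MalformedInputException when replace is false.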
validateUTF8
public static void validateUTF8(byte[] utf8) throws MalformedInputException Check if a byte array contains valid UTF-8.
- Parameters:
utf8 - byte array
- Throws:
MalformedInputException - if the byte array contains invalid UTF-8
-
validateUTF8
public static void validateUTF8(byte[] utf8, int start, int len) throws MalformedInputException Check to see if a byte array is valid UTF-8.
- Parameters:
utf8 - the array of bytes
start - the offset of the first byte in the array
len - the length of the byte sequence
- Throws:
MalformedInputException - if the byte array contains invalid bytes
-
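Both validateUTF8 overloads can be approximated by decoding with CodingErrorAction.REPORT and discarding the result. This is a JDK-based stand-in (Utf8Validator is a hypothetical name); the class's own validator may accept or reject some edge cases differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// JDK-only approximation of validateUTF8: decode with REPORT, throw on bad input.
final class Utf8Validator {
  private Utf8Validator() {}

  static void validateUTF8(byte[] utf8, int start, int len) throws MalformedInputException {
    try {
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(utf8, start, len)); // only bytes [start, start + len)
    } catch (MalformedInputException e) {
      throw e;
    } catch (CharacterCodingException e) {
      // Unmappable-character errors are not expected for UTF-8; report them as malformed.
      throw new MalformedInputException(len);
    }
  }

  static void validateUTF8(byte[] utf8) throws MalformedInputException {
    validateUTF8(utf8, 0, utf8.length);
  }
}
```

The bytes of "h\u00e9llo" pass, {0xC3, 0x28} throws MalformedInputException, and the range (0, 2) of {'o', 'k', 0xC3} passes because the incomplete sequence lies outside it.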
bytesToCodePoint
public static int bytesToCodePoint(ByteBuffer bytes) Returns the next code point at the current position in the buffer. The buffer's position will be incremented. Any mark set on this buffer will be changed by this method! -
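The warning about the mark is consistent with peeking at the lead byte through mark() and reset() before consuming the sequence. The hypothetical sketch below uses that pattern as an assumption, not as a copy of the source: it returns -1 without advancing for a trailing byte, and otherwise leaves the position just after the code point.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of bytesToCodePoint(ByteBuffer); not Drill's implementation.
final class CodePointReader {
  private CodePointReader() {}

  static int bytesToCodePoint(ByteBuffer bytes) {
    bytes.mark();                     // overwrites any mark the caller had set
    int lead = bytes.get() & 0xFF;    // peek at the lead byte...
    bytes.reset();                    // ...then move back onto it
    int extra; // continuation bytes that follow the lead byte
    int mask;  // bits of the lead byte that carry code point data
    if (lead < 0x80) {
      extra = 0;
      mask = 0x7F;
    } else if ((lead & 0xE0) == 0xC0) {
      extra = 1;
      mask = 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      extra = 2;
      mask = 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      extra = 3;
      mask = 0x07;
    } else {
      return -1; // trailing (10xxxxxx) or invalid lead byte; position not advanced
    }
    // Assumes a complete sequence: a truncated one makes get() throw BufferUnderflowException.
    int cp = (bytes.get() & 0xFF) & mask;
    for (int i = 0; i < extra; i++) {
      cp = (cp << 6) | (bytes.get() & 0x3F); // append 6 payload bits per continuation byte
    }
    return cp;
  }
}
```

On the bytes of "a\u20ac", two successive calls return 0x61 (leaving the position at 1) and 0x20AC (leaving it at 4).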
utf8Length
public static int utf8Length(String string) For the given string, returns the number of UTF-8 bytes required to encode the string.
- Parameters:
string - text to encode
- Returns:
- number of UTF-8 bytes required to encode
-
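utf8Length can count bytes per char without encoding the string: 1 byte up to U+007F, 2 up to U+07FF, 4 for a surrogate pair, and 3 otherwise. A minimal sketch follows; Utf8LengthSketch is a hypothetical name, and counting an unpaired surrogate as 3 bytes is an assumption here.

```java
// Hypothetical sketch of utf8Length(String): counts UTF-8 bytes without encoding.
// Assumption: an unpaired surrogate is counted as 3 bytes.
final class Utf8LengthSketch {
  private Utf8LengthSketch() {}

  static int utf8Length(String string) {
    int size = 0;
    for (int i = 0; i < string.length(); i++) {
      char ch = string.charAt(i);
      if (ch < 0x80) {
        size += 1; // U+0000..U+007F
      } else if (ch < 0x800) {
        size += 2; // U+0080..U+07FF
      } else if (Character.isHighSurrogate(ch)
          && i + 1 < string.length()
          && Character.isLowSurrogate(string.charAt(i + 1))) {
        size += 4; // supplementary code point stored as a surrogate pair
        i++;       // skip the low surrogate
      } else {
        size += 3; // rest of the BMP (and, by assumption, lone surrogates)
      }
    }
    return size;
  }
}
```

For well-formed strings the result equals s.getBytes(StandardCharsets.UTF_8).length, for example 15 for "h\u00e9llo \u20ac \uD83D\uDE00", without allocating the byte array.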