public class BytesEncodingChecker
extends java.lang.Object
Provides UCS encoding information for a given byte array, allowing the client to determine whether its encoding is UTF-8 or UTF-16, whether in the latter case it is big- or little-endian, whether or not there is a BOM, and if so the number of bytes devoted to the BOM.
(The range of UCS encodings supported is thus somewhat limited.)
This class also defines a utility method to decode its byte array as a string.
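The classification described above can be illustrated with a small standalone sketch. It is not this class's implementation. The names `EncodingGuess` and `detect` are hypothetical. The NUL-byte fallback for input without a BOM is also an assumption, suggested only by the presence of a `NUL` constant.

```java
/**
 * Hypothetical sketch of BOM-first encoding detection; not the
 * BytesEncodingChecker implementation. The NUL-byte fallback is an assumption.
 */
final class EncodingGuess {
    // Encoding name usable with new String(byte[], int, int, String), or null if not determined.
    final String encoding;
    // Number of leading bytes occupied by a BOM (0 if none).
    final int bomLength;

    private EncodingGuess(String encoding, int bomLength) {
        this.encoding = encoding;
        this.bomLength = bomLength;
    }

    static EncodingGuess detect(byte[] b) {
        // Read up to three leading bytes as unsigned values (0-255); -1 marks a missing byte.
        int b0 = b.length > 0 ? b[0] & 0xFF : -1;
        int b1 = b.length > 1 ? b[1] & 0xFF : -1;
        int b2 = b.length > 2 ? b[2] & 0xFF : -1;

        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            return new EncodingGuess("UTF-8", 3);      // UTF-8 BOM
        }
        if (b0 == 0xFE && b1 == 0xFF) {
            return new EncodingGuess("UTF-16BE", 2);   // big-endian BOM
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return new EncodingGuess("UTF-16LE", 2);   // little-endian BOM
        }
        // No BOM: assuming ASCII-range text, a NUL in one byte of the first
        // 16-bit unit suggests UTF-16 and reveals its byte order.
        if (b0 == 0x00 && b1 > 0x00) {
            return new EncodingGuess("UTF-16BE", 0);
        }
        if (b0 > 0x00 && b1 == 0x00) {
            return new EncodingGuess("UTF-16LE", 0);
        }
        return new EncodingGuess(null, 0);             // not determined
    }
}
```

Checking the BOM before any heuristic means an explicit BOM always wins. Without a BOM, plain ASCII input is reported as undetermined rather than assumed to be UTF-8.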
Modifier and Type | Field and Description |
---|---|
static int | BOM_0: Byte 0 (big-endian) of a 16-bit BOM character. |
static int | BOM_1: Byte 1 (big-endian) of a 16-bit BOM character. |
static int | NUL: NUL byte value. |
static int | UTF8_BOM_0: Byte 0 of the UTF-8 representation of a BOM character. |
static int | UTF8_BOM_1: Byte 1 of the UTF-8 representation of a BOM character. |
static int | UTF8_BOM_2: Byte 2 of the UTF-8 representation of a BOM character. |
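The documentation does not list the constants' values. For reference, the BOM bytes they describe can be derived from the JDK's standard encoders by encoding U+FEFF. This is a sketch: the class `BomBytes` is hypothetical, and the source does not state whether the int constants hold these unsigned values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: derives BOM byte sequences from the JDK's encoders (illustrative only). */
final class BomBytes {
    // U+FEFF is the byte order mark character.
    static final String BOM = "\uFEFF";

    /** Converts signed Java bytes to unsigned values in the range 0-255. */
    static int[] unsigned(byte[] bytes) {
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF;
        }
        return out;
    }

    static int[] utf16BE() { return unsigned(BOM.getBytes(StandardCharsets.UTF_16BE)); }
    static int[] utf16LE() { return unsigned(BOM.getBytes(StandardCharsets.UTF_16LE)); }
    static int[] utf8()    { return unsigned(BOM.getBytes(StandardCharsets.UTF_8)); }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(utf16BE())); // [254, 255]       (0xFE 0xFF)
        System.out.println(Arrays.toString(utf16LE())); // [255, 254]       (0xFF 0xFE)
        System.out.println(Arrays.toString(utf8()));    // [239, 187, 191]  (0xEF 0xBB 0xBF)
    }
}
```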
Constructor and Description |
---|
BytesEncodingChecker(byte[] bytes): Constructs a new checker for the given byte array and determines its encoding parameters. |
Modifier and Type | Method and Description |
---|---|
int | countBOMPrefixBytes(): Returns the number of bytes devoted to an initial BOM, or zero if there is no initial BOM. |
java.lang.String | encodingForUCS16(): Assuming the encoding is known to be some form of UTF-16, returns the appropriate encoding name with explicit endianness, i.e. "UTF-16BE" (big-endian) or "UTF-16LE" (little-endian). |
boolean | encodingIsKnown16Bit(): Returns true iff the encoding is some form of UTF-16. |
boolean | encodingIsKnownUTF8(): Returns true iff the encoding is known to be UTF-8; to convert the bytes to a string using this encoding, the first countBOMPrefixBytes() bytes should be omitted. |
java.lang.String | getDecodedString(java.lang.String enc, int nskip): Returns the string obtained by decoding the byte array associated with this checker, assuming the given encoding and skipping the specified number of initial bytes. |
boolean | utf16IsBE(): Assuming the encoding is known to be some form of UTF-16, returns true iff it is big-endian. |
boolean | utf16IsLE(): Assuming the encoding is known to be some form of UTF-16, returns true iff it is little-endian. |
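The byte orders reported by utf16IsBE() and utf16IsLE() are not interchangeable: the same two bytes decode to different characters depending on which one is assumed. A minimal sketch using only the JDK (the class `EndiannessDemo` is hypothetical):

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: why the UTF-16 byte order must be known before decoding. */
final class EndiannessDemo {
    // One 16-bit code unit with no BOM; which byte comes first decides the result.
    static final byte[] BYTES = {0x00, 0x41};

    static String asBigEndian() {
        return new String(BYTES, StandardCharsets.UTF_16BE);   // "A" (U+0041)
    }

    static String asLittleEndian() {
        return new String(BYTES, StandardCharsets.UTF_16LE);   // U+4100, an unrelated CJK character
    }
}
```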
public static final int BOM_0
public static final int BOM_1
public static final int UTF8_BOM_0
public static final int UTF8_BOM_1
public static final int UTF8_BOM_2
public static final int NUL
public BytesEncodingChecker(byte[] bytes)
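public boolean encodingIsKnown16Bit()
Returns true iff the encoding is some form of UTF-16.

public boolean utf16IsBE()
Assuming the encoding is known to be some form of UTF-16, returns true iff it is big-endian.

public boolean utf16IsLE()
Assuming the encoding is known to be some form of UTF-16, returns true iff it is little-endian.

public boolean encodingIsKnownUTF8()
Returns true iff the encoding is known to be UTF-8. To convert the bytes to a string using this encoding, the first countBOMPrefixBytes() bytes should be omitted.

public java.lang.String encodingForUCS16()
Assuming the encoding is known to be some form of UTF-16, returns the appropriate encoding name with explicit endianness, i.e. "UTF-16BE" (big-endian) or "UTF-16LE" (little-endian). To convert the bytes to a string using this encoding, the first countBOMPrefixBytes() bytes should be omitted.

public int countBOMPrefixBytes()
Returns the number of bytes devoted to an initial BOM, or zero if there is no initial BOM.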
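The advice to omit the first countBOMPrefixBytes() bytes matters because, per the `java.nio.charset.Charset` documentation, the JDK's UTF-16BE and UTF-16LE decoders interpret an initial BOM as a ZERO WIDTH NO-BREAK SPACE (U+FEFF) rather than removing it. The JDK's UTF-8 decoder likewise leaves a UTF-8 BOM in the result. A minimal sketch (the class `BomSkipDemo` is hypothetical; the argument 2 stands in for what countBOMPrefixBytes() would report for this input):

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: why BOM bytes should be skipped before decoding. */
final class BomSkipDemo {
    // "Hi" in UTF-16LE, preceded by the little-endian BOM FF FE.
    static final byte[] LE_WITH_BOM = {(byte) 0xFF, (byte) 0xFE, 0x48, 0x00, 0x69, 0x00};

    /** Decodes every byte: the UTF-16LE decoder keeps the BOM as U+FEFF. */
    static String withoutSkipping() {
        return new String(LE_WITH_BOM, StandardCharsets.UTF_16LE);
    }

    /** Decodes after skipping bomLength leading bytes. */
    static String skippingBom(int bomLength) {
        return new String(LE_WITH_BOM, bomLength, LE_WITH_BOM.length - bomLength,
                StandardCharsets.UTF_16LE);
    }
}
```

Here `withoutSkipping()` yields a three-character string beginning with U+FEFF, while `skippingBom(2)` yields `"Hi"`.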
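public java.lang.String getDecodedString(java.lang.String enc, int nskip) throws java.io.UnsupportedEncodingException
Returns the string obtained by decoding the byte array associated with this checker, assuming the given encoding and skipping the specified number of initial bytes.
Throws: java.io.UnsupportedEncodingException if the named encoding is not supported.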
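A method with this contract could plausibly be built on the JDK constructor `String(byte[], int, int, String)`, which throws `UnsupportedEncodingException` for an unsupported charset name. The following is a sketch under that assumption, not the class's actual code; `DecodeSketch` and `decodeSkipping` are hypothetical names. For a UTF-8 array with a BOM, the call would be `decodeSkipping(bytes, "UTF-8", 3)`.

```java
import java.io.UnsupportedEncodingException;

/** Hypothetical stand-in for getDecodedString(enc, nskip); not the class's actual code. */
final class DecodeSketch {
    static String decodeSkipping(byte[] bytes, String enc, int nskip)
            throws UnsupportedEncodingException {
        // Decode bytes[nskip..end) with the named charset. The String constructor
        // throws UnsupportedEncodingException if the charset is not supported,
        // and IndexOutOfBoundsException if nskip is outside 0..bytes.length.
        return new String(bytes, nskip, bytes.length - nskip, enc);
    }
}
```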