java.lang.Object | |
↳ | java.lang.Character |
The wrapper for the primitive type char. This class also provides a number of utility methods for working with characters.
Character data is kept up to date as Unicode evolves. See the Locale data section of the Locale documentation for details of the Unicode versions implemented by current and historical Android releases.
The Unicode specification, character tables, and other information are available at http://www.unicode.org/ .
Unicode characters are referred to as code points. The range of valid code points is U+0000 to U+10FFFF. The Basic Multilingual Plane (BMP) is the code point range U+0000 to U+FFFF. Characters above the BMP are referred to as Supplementary Characters. On the Java platform, UTF-16 encoding and char pairs are used to represent code points in the supplementary range. A pair of char values that represents a supplementary character is made up of a high surrogate with a value range of 0xD800 to 0xDBFF and a low surrogate with a value range of 0xDC00 to 0xDFFF.
On the Java platform a char value represents either a single BMP code point or a UTF-16 unit that's part of a surrogate pair. The int type is used to represent all Unicode code points.
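As a rough illustration of the char versus code point distinction described above, the following sketch walks a string one code point at a time. The class name CodePointWalk and the helper codePointsOf are invented for this example; codePointAt and charCount are the methods documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointWalk {
    /** Returns the code points of s in order, stepping over surrogate pairs. */
    static List<Integer> codePointsOf(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = Character.codePointAt(s, i); // reads one or two char values
            out.add(cp);
            i += Character.charCount(cp);         // 2 for supplementary, otherwise 1
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 written as a surrogate pair, 'b': four chars, three code points.
        String s = "a\uD83D\uDE00b";
        System.out.println("chars: " + s.length());
        for (int cp : codePointsOf(s)) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```

Indexing by char alone would split the pair in the middle, which is why the loop advances by charCount instead of by one.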
Unicode categories
Here's a list of the Unicode character categories and the corresponding Java constants, grouped semantically to provide a convenient overview. This table is also useful in conjunction with \p and \P in regular expressions.
Cn | Unassigned | UNASSIGNED |
Cc | Control | CONTROL |
Cf | Format | FORMAT |
Co | Private use | PRIVATE_USE |
Cs | Surrogate | SURROGATE |
||
Lu | Uppercase letter | UPPERCASE_LETTER |
Ll | Lowercase letter | LOWERCASE_LETTER |
Lt | Titlecase letter | TITLECASE_LETTER |
Lm | Modifier letter | MODIFIER_LETTER |
Lo | Other letter | OTHER_LETTER |
||
Mn | Non-spacing mark | NON_SPACING_MARK |
Me | Enclosing mark | ENCLOSING_MARK |
Mc | Combining spacing mark | COMBINING_SPACING_MARK |
||
Nd | Decimal digit number | DECIMAL_DIGIT_NUMBER |
Nl | Letter number | LETTER_NUMBER |
No | Other number | OTHER_NUMBER |
||
Pd | Dash punctuation | DASH_PUNCTUATION |
Ps | Start punctuation | START_PUNCTUATION |
Pe | End punctuation | END_PUNCTUATION |
Pc | Connector punctuation | CONNECTOR_PUNCTUATION |
Pi | Initial quote punctuation | INITIAL_QUOTE_PUNCTUATION |
Pf | Final quote punctuation | FINAL_QUOTE_PUNCTUATION |
Po | Other punctuation | OTHER_PUNCTUATION |
||
Sm | Math symbol | MATH_SYMBOL |
Sc | Currency symbol | CURRENCY_SYMBOL |
Sk | Modifier symbol | MODIFIER_SYMBOL |
So | Other symbol | OTHER_SYMBOL |
||
Zs | Space separator | SPACE_SEPARATOR |
Zl | Line separator | LINE_SEPARATOR |
Zp | Paragraph separator | PARAGRAPH_SEPARATOR |
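The category codes in the table can be checked in two ways, sketched below under the assumption of a standard java.util.regex implementation: by comparing Character.getType against a constant, or by using the two-letter code with \p (matches the category) and \P (matches everything else). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class CategoryDemo {
    /** True if cp has general category Lu, judged by Character.getType. */
    static boolean isLu(int cp) {
        return Character.getType(cp) == Character.UPPERCASE_LETTER;
    }

    /** True if s is exactly one character in category Lu, judged by a regex. */
    static boolean matchesLu(String s) {
        return Pattern.matches("\\p{Lu}", s);
    }

    /** True if s contains no character in category Nd (decimal digit number). */
    static boolean hasNoDigits(String s) {
        return Pattern.matches("\\P{Nd}*", s);
    }

    public static void main(String[] args) {
        System.out.println(isLu('A'));
        System.out.println(matchesLu("a"));
        System.out.println(hasNoDigits("ab3"));
    }
}
```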
Nested Classes | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
|
Character.Subset | ||||||||||
|
Character.UnicodeBlock | Represents a block of Unicode characters. |
Constants | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
byte | COMBINING_SPACING_MARK | Unicode category constant Mc. | |||||||||
byte | CONNECTOR_PUNCTUATION | Unicode category constant Pc. | |||||||||
byte | CONTROL | Unicode category constant Cc. | |||||||||
byte | CURRENCY_SYMBOL | Unicode category constant Sc. | |||||||||
byte | DASH_PUNCTUATION | Unicode category constant Pd. | |||||||||
byte | DECIMAL_DIGIT_NUMBER | Unicode category constant Nd. | |||||||||
byte | DIRECTIONALITY_ARABIC_NUMBER | Unicode bidirectional constant AN. | |||||||||
byte | DIRECTIONALITY_BOUNDARY_NEUTRAL | Unicode bidirectional constant BN. | |||||||||
byte | DIRECTIONALITY_COMMON_NUMBER_SEPARATOR | Unicode bidirectional constant CS. | |||||||||
byte | DIRECTIONALITY_EUROPEAN_NUMBER | Unicode bidirectional constant EN. | |||||||||
byte | DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR | Unicode bidirectional constant ES. | |||||||||
byte | DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR | Unicode bidirectional constant ET. | |||||||||
byte | DIRECTIONALITY_LEFT_TO_RIGHT | Unicode bidirectional constant L. | |||||||||
byte | DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING | Unicode bidirectional constant LRE. | |||||||||
byte | DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE | Unicode bidirectional constant LRO. | |||||||||
byte | DIRECTIONALITY_NONSPACING_MARK | Unicode bidirectional constant NSM. | |||||||||
byte | DIRECTIONALITY_OTHER_NEUTRALS | Unicode bidirectional constant ON. | |||||||||
byte | DIRECTIONALITY_PARAGRAPH_SEPARATOR | Unicode bidirectional constant B. | |||||||||
byte | DIRECTIONALITY_POP_DIRECTIONAL_FORMAT | Unicode bidirectional constant PDF. | |||||||||
byte | DIRECTIONALITY_RIGHT_TO_LEFT | Unicode bidirectional constant R. | |||||||||
byte | DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC | Unicode bidirectional constant AL. | |||||||||
byte | DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING | Unicode bidirectional constant RLE. | |||||||||
byte | DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE | Unicode bidirectional constant RLO. | |||||||||
byte | DIRECTIONALITY_SEGMENT_SEPARATOR | Unicode bidirectional constant S. | |||||||||
byte | DIRECTIONALITY_UNDEFINED | Unicode bidirectional constant. | |||||||||
byte | DIRECTIONALITY_WHITESPACE | Unicode bidirectional constant WS. | |||||||||
byte | ENCLOSING_MARK | Unicode category constant Me. | |||||||||
byte | END_PUNCTUATION | Unicode category constant Pe. | |||||||||
byte | FINAL_QUOTE_PUNCTUATION | Unicode category constant Pf. | |||||||||
byte | FORMAT | Unicode category constant Cf. | |||||||||
byte | INITIAL_QUOTE_PUNCTUATION | Unicode category constant Pi. | |||||||||
byte | LETTER_NUMBER | Unicode category constant Nl. | |||||||||
byte | LINE_SEPARATOR | Unicode category constant Zl. | |||||||||
byte | LOWERCASE_LETTER | Unicode category constant Ll. | |||||||||
byte | MATH_SYMBOL | Unicode category constant Sm. | |||||||||
int | MAX_CODE_POINT | The maximum code point value, U+10FFFF. | |||||||||
char | MAX_HIGH_SURROGATE | The maximum value of a high surrogate or leading surrogate unit in UTF-16 encoding, '\uDBFF'. | |||||||||
char | MAX_LOW_SURROGATE | The maximum value of a low surrogate or trailing surrogate unit in UTF-16 encoding, '\uDFFF'. | |||||||||
int | MAX_RADIX | The maximum radix used for conversions between characters and integers. | |||||||||
char | MAX_SURROGATE | The maximum value of a surrogate unit in UTF-16 encoding, '\uDFFF'. | |||||||||
char | MAX_VALUE | The maximum Character value. | |||||||||
int | MIN_CODE_POINT | The minimum code point value, U+0000. | |||||||||
char | MIN_HIGH_SURROGATE | The minimum value of a high surrogate or leading surrogate unit in UTF-16 encoding, '\uD800'. | |||||||||
char | MIN_LOW_SURROGATE | The minimum value of a low surrogate or trailing surrogate unit in UTF-16 encoding, '\uDC00'. | |||||||||
int | MIN_RADIX | The minimum radix used for conversions between characters and integers. | |||||||||
int | MIN_SUPPLEMENTARY_CODE_POINT | The minimum value of a supplementary code point, U+010000. | |||||||||
char | MIN_SURROGATE | The minimum value of a surrogate unit in UTF-16 encoding, '\uD800'. | |||||||||
char | MIN_VALUE | The minimum Character value. | |||||||||
byte | MODIFIER_LETTER | Unicode category constant Lm. | |||||||||
byte | MODIFIER_SYMBOL | Unicode category constant Sk. | |||||||||
byte | NON_SPACING_MARK | Unicode category constant Mn. | |||||||||
byte | OTHER_LETTER | Unicode category constant Lo. | |||||||||
byte | OTHER_NUMBER | Unicode category constant No. | |||||||||
byte | OTHER_PUNCTUATION | Unicode category constant Po. | |||||||||
byte | OTHER_SYMBOL | Unicode category constant So. | |||||||||
byte | PARAGRAPH_SEPARATOR | Unicode category constant Zp. | |||||||||
byte | PRIVATE_USE | Unicode category constant Co. | |||||||||
int | SIZE | The number of bits required to represent a Character value in unsigned form. | |||||||||
byte | SPACE_SEPARATOR | Unicode category constant Zs. | |||||||||
byte | START_PUNCTUATION | Unicode category constant Ps. | |||||||||
byte | SURROGATE | Unicode category constant Cs. | |||||||||
byte | TITLECASE_LETTER | Unicode category constant Lt. | |||||||||
byte | UNASSIGNED | Unicode category constant Cn. | |||||||||
byte | UPPERCASE_LETTER | Unicode category constant Lu. |
Fields | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
|
TYPE |
The
Class
object that represents the primitive type
char
.
|
Public Constructors | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
|
Constructs a new
Character
with the specified primitive char
value.
|
Public Methods | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
|
Calculates the number of
char
values required to represent the
specified Unicode code point.
|
||||||||||
|
Gets the primitive value of this character.
|
||||||||||
|
Returns the code point at
index
in the specified array of
character units.
|
||||||||||
|
Returns the code point at
index
in the specified array of
character units, where
index
has to be less than
limit
.
|
||||||||||
|
Returns the code point at
index
in the specified sequence of
character units.
|
||||||||||
|
Returns the code point that precedes
index
in the specified
sequence of character units.
|
||||||||||
|
Returns the code point that precedes the
index
in the specified
array of character units and is not less than
start
.
|
||||||||||
|
Returns the code point that precedes
index
in the specified
array of character units.
|
||||||||||
|
Counts the number of Unicode code points in the subsequence of the
specified character sequence, as delineated by
beginIndex
and
endIndex
.
|
||||||||||
|
Counts the number of Unicode code points in the subsequence of the
specified char array, as delineated by
offset
and
count
.
|
||||||||||
|
Compares two
char
values.
|
||||||||||
|
Compares this object to the specified character object to determine their
relative order.
|
||||||||||
|
Convenience method to determine the value of the specified character
c
in the supplied radix.
|
||||||||||
|
Convenience method to determine the value of the character
codePoint
in the supplied radix.
|
||||||||||
|
Compares this object with the specified object and indicates if they are
equal.
|
||||||||||
|
Returns the character which represents the specified digit in the
specified radix.
|
||||||||||
|
Gets the Unicode directionality of the specified character.
|
||||||||||
|
Gets the Unicode directionality of the specified character.
|
||||||||||
|
Returns a human-readable name for the given code point,
or null if the code point is unassigned.
|
||||||||||
|
Gets the numeric value of the specified Unicode code point.
|
||||||||||
|
Returns the numeric value of the specified Unicode character.
|
||||||||||
|
Gets the general Unicode category of the specified character.
|
||||||||||
|
Gets the general Unicode category of the specified code point.
|
||||||||||
|
Returns an integer hash code for this object.
|
||||||||||
|
Returns the high surrogate for the given code point.
|
||||||||||
|
Returns true if the given code point is alphabetic.
|
||||||||||
|
Returns true if the given code point is in the Basic Multilingual Plane (BMP).
|
||||||||||
|
Indicates whether the specified code point is defined in the Unicode
specification.
|
||||||||||
|
Indicates whether the specified character is defined in the Unicode
specification.
|
||||||||||
|
Indicates whether the specified character is a digit.
|
||||||||||
|
Indicates whether the specified code point is a digit.
|
||||||||||
|
Indicates whether
ch
is a high- (or leading-) surrogate code unit
that is used for representing supplementary characters in UTF-16
encoding.
|
||||||||||
|
Indicates whether the specified character is an ISO control character.
|
||||||||||
|
Indicates whether the specified code point is an ISO control character.
|
||||||||||
|
Indicates whether the specified character is ignorable in a Java or
Unicode identifier.
|
||||||||||
|
Indicates whether the specified code point is ignorable in a Java or
Unicode identifier.
|
||||||||||
|
Returns true if the given code point is a CJKV ideographic character.
|
||||||||||
|
Indicates whether the specified code point is a valid part of a Java
identifier other than the first character.
|
||||||||||
|
Indicates whether the specified character is a valid part of a Java
identifier other than the first character.
|
||||||||||
|
Indicates whether the specified character is a valid first character for
a Java identifier.
|
||||||||||
|
Indicates whether the specified code point is a valid first character for
a Java identifier.
|
||||||||||
|
This method was deprecated
in API level 1.
Use
isJavaIdentifierStart(char)
instead.
|
||||||||||
|
This method was deprecated
in API level 1.
Use
isJavaIdentifierPart(char)
instead.
|
||||||||||
|
Indicates whether the specified character is a letter.
|
||||||||||
|
Indicates whether the specified code point is a letter.
|
||||||||||
|
Indicates whether the specified character is a letter or a digit.
|
||||||||||
|
Indicates whether the specified code point is a letter or a digit.
|
||||||||||
|
Indicates whether
ch
is a low- (or trailing-) surrogate code unit
that is used for representing supplementary characters in UTF-16
encoding.
|
||||||||||
|
Indicates whether the specified code point is a lower case letter.
|
||||||||||
|
Indicates whether the specified character is a lower case letter.
|
||||||||||
|
Indicates whether the specified character is mirrored.
|
||||||||||
|
Indicates whether the specified code point is mirrored.
|
||||||||||
|
This method was deprecated
in API level 1.
Use
isWhitespace(char)
instead.
|
||||||||||
|
See
isSpaceChar(int)
.
|
||||||||||
|
Returns true if the given code point is a Unicode space character.
|
||||||||||
|
Indicates whether
codePoint
is within the supplementary code
point range.
|
||||||||||
|
Returns true if the given character is a high or low surrogate.
|
||||||||||
|
Indicates whether the specified character pair is a valid surrogate pair.
|
||||||||||
|
Indicates whether the specified code point is a titlecase character.
|
||||||||||
|
Indicates whether the specified character is a titlecase character.
|
||||||||||
|
Indicates whether the specified code point is valid as part of a Unicode
identifier other than the first character.
|
||||||||||
|
Indicates whether the specified character is valid as part of a Unicode
identifier other than the first character.
|
||||||||||
|
Indicates whether the specified character is a valid initial character
for a Unicode identifier.
|
||||||||||
|
Indicates whether the specified code point is a valid initial character
for a Unicode identifier.
|
||||||||||
|
Indicates whether the specified code point is an upper case letter.
|
||||||||||
|
Indicates whether the specified character is an upper case letter.
|
||||||||||
|
Indicates whether
codePoint
is a valid Unicode code point.
|
||||||||||
|
See
isWhitespace(int)
.
|
||||||||||
|
Returns true if the given code point is a Unicode whitespace character.
|
||||||||||
|
Returns the low surrogate for the given code point.
|
||||||||||
|
Determines the index in the specified character sequence that is offset
codePointOffset
code points from
index
.
|
||||||||||
|
Determines the index in a subsequence of the specified character array
that is offset
codePointOffset
code points from
index
.
|
||||||||||
|
Reverses the order of the first and second byte in the specified
character.
|
||||||||||
|
Converts the specified Unicode code point into a UTF-16 encoded sequence
and returns it as a char array.
|
||||||||||
|
Converts the specified Unicode code point into a UTF-16 encoded sequence
and copies the value(s) into the char array
dst
, starting at
index
dstIndex
.
|
||||||||||
|
Converts a surrogate pair into a Unicode code point.
|
||||||||||
|
Returns the lower case equivalent for the specified character if the
character is an upper case letter.
|
||||||||||
|
Returns the lower case equivalent for the specified code point if it is
an upper case letter.
|
||||||||||
|
Converts the specified character to its string representation.
|
||||||||||
|
Returns a string containing a concise, human-readable description of this
object.
|
||||||||||
|
Returns the title case equivalent for the specified character if it
exists.
|
||||||||||
|
Returns the title case equivalent for the specified code point if it
exists.
|
||||||||||
|
Returns the upper case equivalent for the specified character if the
character is a lower case letter.
|
||||||||||
|
Returns the upper case equivalent for the specified code point if the
code point is a lower case letter.
|
||||||||||
|
Returns a
Character
instance for the
char
value passed.
|
Inherited Methods
|
|||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
From class
java.lang.Object
|
|||||||||||
From interface
java.lang.Comparable
|
The
Class
object that represents the primitive type
char
.
Constructs a new
Character
with the specified primitive char
value.
value | the primitive char value to store in the new instance. |
---|
Calculates the number of
char
values required to represent the
specified Unicode code point. This method checks if the
codePoint
is greater than or equal to
0x10000
, in which case
2
is
returned, otherwise
1
. To test if the code point is valid, use
the
isValidCodePoint(int)
method.
codePoint | the code point for which to calculate the number of required chars. |
---|
2
if
codePoint >= 0x10000
;
1
otherwise.
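Because charCount only compares against 0x10000, it also answers for values that are not code points at all. A minimal sketch of pairing it with isValidCodePoint follows; checkedCharCount and its -1 sentinel are choices made for this example, not part of the class.

```java
final class CharCountDemo {
    /** Number of chars needed for cp, or -1 if cp is not a valid code point. */
    static int checkedCharCount(int cp) {
        // charCount itself does not validate, so check the range first.
        return Character.isValidCodePoint(cp) ? Character.charCount(cp) : -1;
    }

    public static void main(String[] args) {
        System.out.println(Character.charCount('A'));      // BMP code point
        System.out.println(Character.charCount(0x1F600));  // supplementary code point
        System.out.println(Character.charCount(0x110000)); // beyond U+10FFFF, still answers
        System.out.println(checkedCharCount(0x110000));
    }
}
```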
Gets the primitive value of this character.
Returns the code point at
index
in the specified array of
character units. If the unit at
index
is a high-surrogate unit,
index + 1
is less than the length of the array and the unit at
index + 1
is a low-surrogate unit, then the supplementary code
point represented by the pair is returned; otherwise the
char
value at
index
is returned.
seq |
the source array of
char
units.
|
---|---|
index |
the position in
seq
from which to retrieve the code
point.
|
the Unicode code point or char value at index in seq.
NullPointerException |
if
seq
is
null
.
|
---|---|
IndexOutOfBoundsException |
if the
index
is negative or greater than or equal to
the length of
seq
.
|
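The pairing rule described above can be seen on a small array. In this sketch (the class name and the sample data are made up), a complete pair yields the supplementary code point, while a lone low surrogate or a high surrogate at the end of the array comes back as its own char value.

```java
final class CodePointAtDemo {
    // 'x', the surrogate pair for U+1F600, then a high surrogate with no partner.
    static final char[] UNITS = {'x', '\uD83D', '\uDE00', '\uD83D'};

    static int at(int index) {
        return Character.codePointAt(UNITS, index);
    }

    public static void main(String[] args) {
        for (int i = 0; i < UNITS.length; i++) {
            System.out.println(i + ": U+" + Integer.toHexString(at(i)).toUpperCase());
        }
    }
}
```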
Returns the code point at
index
in the specified array of
character units, where
index
has to be less than
limit
.
If the unit at
index
is a high-surrogate unit,
index + 1
is less than
limit
and the unit at
index + 1
is a
low-surrogate unit, then the supplementary code point represented by the
pair is returned; otherwise the
char
value at
index
is
returned.
seq |
the source array of
char
units.
|
---|---|
index |
the position in
seq
from which to get the code point.
|
limit |
the index after the last unit in
seq
that can be used.
|
the Unicode code point or char value at index in seq.
NullPointerException |
if
seq
is
null
.
|
---|---|
IndexOutOfBoundsException |
if
index < 0
,
index >= limit
,
limit < 0
or if
limit
is greater than the
length of
seq
.
|
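The effect of limit is easiest to see when it cuts a surrogate pair in half. The sketch below uses a hypothetical three-element array: with the limit after the low surrogate the pair is combined, and with the limit between the two halves only the high surrogate is returned.

```java
final class CodePointAtLimitDemo {
    // 'x' followed by the surrogate pair for U+1F600.
    static final char[] UNITS = {'x', '\uD83D', '\uDE00'};

    /** Reads the code point at index, looking no further than limit. */
    static int at(int index, int limit) {
        return Character.codePointAt(UNITS, index, limit);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(at(1, 3))); // pair is inside the limit
        System.out.println(Integer.toHexString(at(1, 2))); // low surrogate is cut off
    }
}
```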
Returns the code point at
index
in the specified sequence of
character units. If the unit at
index
is a high-surrogate unit,
index + 1
is less than the length of the sequence and the unit at
index + 1
is a low-surrogate unit, then the supplementary code
point represented by the pair is returned; otherwise the
char
value at
index
is returned.
seq |
the source sequence of
char
units.
|
---|---|
index |
the position in
seq
from which to retrieve the code
point.
|
the Unicode code point or char value at index in seq.
NullPointerException |
if
seq
is
null
.
|
---|---|
IndexOutOfBoundsException |
if the
index
is negative or greater than or equal to
the length of
seq
.
|
Returns the code point that precedes
index
in the specified
sequence of character units. If the unit at
index - 1
is a
low-surrogate unit,
index - 2
is not negative and the unit at
index - 2
is a high-surrogate unit, then the supplementary code
point represented by the pair is returned; otherwise the
char
value at
index - 1
is returned.
seq |
the source sequence of
char
units.
|
---|---|
index |
the position in
seq
following the code
point that should be returned.
|
the Unicode code point or char value before index in seq.
NullPointerException |
if
seq
is
null
.
|
---|---|
IndexOutOfBoundsException |
if the
index
is less than 1 or greater than the
length of
seq
.
|
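codePointBefore makes it possible to traverse a sequence from the end without splitting surrogate pairs. A minimal sketch, with the class and method names invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

final class ReverseWalk {
    /** Returns the code points of s from last to first. */
    static List<Integer> reversedCodePoints(CharSequence s) {
        List<Integer> out = new ArrayList<>();
        for (int i = s.length(); i > 0; ) {
            int cp = Character.codePointBefore(s, i); // inspects i - 1 and, if needed, i - 2
            out.add(cp);
            i -= Character.charCount(cp);             // step back over one or two chars
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reversedCodePoints("a\uD83D\uDE00b"));
    }
}
```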
Returns the code point that precedes the
index
in the specified
array of character units and is not less than
start
. If the unit
at
index - 1
is a low-surrogate unit,
index - 2
is not
less than
start
and the unit at
index - 2
is a
high-surrogate unit, then the supplementary code point represented by the
pair is returned; otherwise the
char
value at
index - 1
is returned.
seq |
the source array of
char
units.
|
---|---|
index |
the position in
seq
following the code point that
should be returned.
|
start |
the index of the first element in
seq
.
|
the Unicode code point or char value before index in seq.
NullPointerException |
if
seq
is
null
.
|
---|---|
IndexOutOfBoundsException |
if index <= start, start < 0, index is greater than the length of seq, or start is equal to or greater than the length of seq.
|
Returns the code point that precedes
index
in the specified
array of character units. If the unit at
index - 1
is a
low-surrogate unit,
index - 2
is not negative and the unit at
index - 2
is a high-surrogate unit, then the supplementary code
point represented by the pair is returned; otherwise the
char
value at
index - 1
is returned.
seq |
the source array of
char
units.
|
---|---|
index |
the position in
seq
following the code
point that should be returned.
|
the Unicode code point or char value before index in seq.
NullPointerException |
if
seq
is
null
.
|
---|---|
IndexOutOfBoundsException |
if the
index
is less than 1 or greater than the
length of
seq
.
|
Counts the number of Unicode code points in the subsequence of the specified character sequence, as delineated by beginIndex and endIndex. Any unpaired surrogate in the subsequence is counted as one code point.
seq |
the
CharSequence
to look through.
|
---|---|
beginIndex | the inclusive index to begin counting at. |
endIndex | the exclusive index to stop counting at. |
NullPointerException |
if
seq
is
null
.
|
---|---|
IndexOutOfBoundsException |
if
beginIndex < 0
,
beginIndex > endIndex
or
if
endIndex
is greater than the length of
seq
.
|
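The handling of unpaired surrogates shows up when the counted range cuts through a pair. In the sketch below, the sample string and the wrapper method are assumptions made for the example.

```java
final class CountDemo {
    // 'a', the pair for U+1F600, 'b', then a high surrogate with no partner.
    static final String SAMPLE = "a\uD83D\uDE00b\uD83D";

    static int count(int beginIndex, int endIndex) {
        return Character.codePointCount(SAMPLE, beginIndex, endIndex);
    }

    public static void main(String[] args) {
        System.out.println(count(0, SAMPLE.length())); // whole string
        System.out.println(count(0, 2));               // range ends inside the pair
        System.out.println(count(1, 3));               // exactly the pair
    }
}
```

A range that ends between the two halves of a pair sees only a lone high surrogate, which still counts as one code point.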
Counts the number of Unicode code points in the subsequence of the specified char array, as delineated by offset and count. Any unpaired surrogate in the subsequence is counted as one code point.
seq | the char array to look through |
---|---|
offset | the inclusive index to begin counting at. |
count |
the number of
char
values to look through in
seq
.
|
NullPointerException |
if
seq
is
null
.
|
---|---|
IndexOutOfBoundsException |
if
offset < 0
,
count < 0
or if
offset + count
is greater than the length of
seq
.
|
Compares two
char
values.
Compares this object to the specified character object to determine their relative order.
c | the character object to compare this object to. |
---|
0
if the value of this character and the value of
c
are equal; a positive value if the value of this
character is greater than the value of
c
; a negative
value if the value of this character is less than the value of
c
.
Convenience method to determine the value of the specified character
c
in the supplied radix. The value of
radix
must be
between MIN_RADIX and MAX_RADIX.
c | the character to determine the value of. |
---|---|
radix | the radix. |
Convenience method to determine the value of the character
codePoint
in the supplied radix. The value of
radix
must
be between MIN_RADIX and MAX_RADIX.
codePoint | the character, including supplementary characters. |
---|---|
radix | the radix. |
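One possible use of digit is a small radix parser. The sketch below relies on digit returning -1 when the code point is not a digit in the given radix or when the radix is out of range; that is the behavior of the standard Java method and is not spelled out in the text above. The class name, the -1 failure value and the lack of overflow handling are simplifications for this example.

```java
final class DigitDemo {
    /** Parses digits in the given radix; returns -1 if any code point is not a digit. */
    static long parse(String s, int radix) {
        long value = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int d = Character.digit(cp, radix); // -1 if cp is not a digit in this radix
            if (d < 0) {
                return -1;
            }
            value = value * radix + d;
            i += Character.charCount(cp);
        }
        return value; // note: an empty string yields 0, and overflow is not checked
    }

    public static void main(String[] args) {
        System.out.println(parse("ff", 16));
        System.out.println(parse("19", 8));
    }
}
```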
Compares this object with the specified object and indicates if they are
equal. In order to be equal,
object
must be an instance of
Character
and have the same char value as this object.
object | the object to compare this character with. |
---|
true
if the specified object is equal to this
Character
;
false
otherwise.
Returns the character which represents the specified digit in the specified radix. The radix must be between MIN_RADIX and MAX_RADIX inclusive; digit must be non-negative and smaller than radix. If any of these conditions does not hold, 0 is returned.
digit | the integer value. |
---|---|
radix | the radix. |
the character which represents the digit in the radix.
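forDigit is the inverse of digit for a single position. The following sketch builds a string in an arbitrary radix from it; format and its argument check are hypothetical, added so the 0 result for out-of-range input never reaches the output.

```java
final class ForDigitDemo {
    /** Formats a non-negative value in the given radix using Character.forDigit. */
    static String format(int value, int radix) {
        if (value < 0 || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new IllegalArgumentException("unsupported value or radix");
        }
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(Character.forDigit(value % radix, radix)); // least significant digit first
            value /= radix;
        } while (value > 0);
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(format(255, 16));
        System.out.println((int) Character.forDigit(16, 16)); // digit not smaller than radix
    }
}
```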
Gets the Unicode directionality of the specified code point.
codePoint | the Unicode code point to get the directionality of. |
---|
the Unicode directionality of codePoint.
Gets the Unicode directionality of the specified character.
c | the character to get the directionality of. |
---|
the Unicode directionality of c.
Returns a human-readable name for the given code point, or null if the code point is unassigned.
As a fallback mechanism this method returns strings consisting of the Unicode block name (with underscores replaced by spaces), a single space, and the uppercase hex value of the code point, using as few digits as necessary.
Examples:
Character.getName(0)
returns "NULL".
Character.getName('e')
returns "LATIN SMALL LETTER E".
Character.getName('٦')
returns "ARABIC-INDIC DIGIT SIX".
Character.getName(0xe000)
returns "PRIVATE USE AREA E000".
Note that the exact strings returned will vary from release to release.
IllegalArgumentException |
if
codePoint
is not a valid code point.
|
---|
Gets the numeric value of the specified Unicode code point. For example, the code point 'Ⅻ' stands for the Roman number XII, which has the numeric value 12.
There are two points of divergence between this method and the Unicode specification. This method treats the letters a-z (in both upper and lower cases, and their full-width variants) as numbers from 10 to 35. The Unicode specification also supports the idea of code points with non-integer numeric values; this method does not (except to the extent of returning -2 for such code points).
codePoint | the code point |
---|
the numeric value if a numeric value for codePoint exists, -1 if there is no numeric value for codePoint, -2 if the numeric value cannot be represented as an integer.
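The three kinds of result described above can be told apart as in this sketch. The describe helper and its wording are invented for the example; U+216B is the Roman numeral twelve mentioned above, and U+00BD (vulgar fraction one half) is included as a code point whose value is not an integer.

```java
final class NumericValueDemo {
    /** Describes the result of Character.getNumericValue for one code point. */
    static String describe(int cp) {
        int v = Character.getNumericValue(cp);
        if (v == -1) {
            return "no numeric value";
        }
        if (v == -2) {
            return "not representable as an integer";
        }
        return Integer.toString(v);
    }

    public static void main(String[] args) {
        System.out.println(describe('7'));
        System.out.println(describe('a'));    // letters count as 10 to 35
        System.out.println(describe(0x216B)); // ROMAN NUMERAL TWELVE
        System.out.println(describe(0x00BD)); // VULGAR FRACTION ONE HALF
        System.out.println(describe(' '));
    }
}
```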
Returns the numeric value of the specified Unicode character.
See
getNumericValue(int)
.
c | the character |
---|
the numeric value if a numeric value for c exists, -1 if there is no numeric value for c, -2 if the numeric value cannot be represented as an integer.
Gets the general Unicode category of the specified character.
c | the character to get the category of. |
---|
the general Unicode category of c.
Gets the general Unicode category of the specified code point.
codePoint | the Unicode code point to get the category of. |
---|
the general Unicode category of codePoint.
Returns an integer hash code for this object. By contract, any two
objects for which
equals(Object)
returns
true
must return
the same hash code value. This means that subclasses of
Object
usually override both methods or neither method.
Note that hash values must not change over time unless information used in equals comparisons also changes.
See Writing a correct hashCode method if you intend to implement your own hashCode method.
Returns the high surrogate for the given code point. The result is meaningless if the given code point is not a supplementary character.
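Since the result is only meaningful for supplementary characters, a caller may want to guard the call. A minimal sketch, assuming the companion lowSurrogate and toCodePoint methods listed in the summary; the split helper is hypothetical.

```java
final class SurrogateSplit {
    /** Encodes a supplementary code point as its two UTF-16 units. */
    static char[] split(int cp) {
        if (!Character.isSupplementaryCodePoint(cp)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        return new char[] {Character.highSurrogate(cp), Character.lowSurrogate(cp)};
    }

    public static void main(String[] args) {
        char[] pair = split(0x1F600);
        System.out.println(Integer.toHexString(pair[0]) + " " + Integer.toHexString(pair[1]));
        // toCodePoint reverses the split.
        System.out.println(Integer.toHexString(Character.toCodePoint(pair[0], pair[1])));
    }
}
```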
Returns true if the given code point is alphabetic. That is, if it is in any of the Lu, Ll, Lt, Lm, Lo, Nl, or Other_Alphabetic categories.
Returns true if the given code point is in the Basic Multilingual Plane (BMP).
Such code points can be represented by a single
char
.
Indicates whether the specified code point is defined in the Unicode specification.
codePoint | the code point to check. |
---|
true
if the general Unicode category of the code point is
not
UNASSIGNED
;
false
otherwise.
Indicates whether the specified character is defined in the Unicode specification.
c | the character to check. |
---|
true
if the general Unicode category of the character is
not
UNASSIGNED
;
false
otherwise.
Indicates whether the specified character is a digit.
c | the character to check. |
---|
true
if
c
is a digit;
false
otherwise.
Indicates whether the specified code point is a digit.
codePoint | the code point to check. |
---|
true
if
codePoint
is a digit;
false
otherwise.
Indicates whether
ch
is a high- (or leading-) surrogate code unit
that is used for representing supplementary characters in UTF-16
encoding.
ch | the character to test. |
---|
true
if
ch
is a high-surrogate code unit;
false
otherwise.
Indicates whether the specified character is an ISO control character.
c | the character to check. |
---|
true
if
c
is an ISO control character;
false
otherwise.
Indicates whether the specified code point is an ISO control character.
c | the code point to check. |
---|
true
if
c
is an ISO control character;
false
otherwise.
Indicates whether the specified character is ignorable in a Java or Unicode identifier.
c | the character to check. |
---|
true
if
c
is ignorable;
false
otherwise.
Indicates whether the specified code point is ignorable in a Java or Unicode identifier.
codePoint | the code point to check. |
---|
true
if
codePoint
is ignorable;
false
otherwise.
Returns true if the given code point is a CJKV ideographic character.
Indicates whether the specified code point is a valid part of a Java identifier other than the first character.
codePoint | the code point to check. |
---|
true if codePoint is valid as part of a Java identifier; false otherwise.
Indicates whether the specified character is a valid part of a Java identifier other than the first character.
c | the character to check. |
---|
true
if
c
is valid as part of a Java identifier;
false
otherwise.
Indicates whether the specified character is a valid first character for a Java identifier.
c | the character to check. |
---|
true
if
c
is a valid first character of a Java
identifier;
false
otherwise.
Indicates whether the specified code point is a valid first character for a Java identifier.
codePoint | the code point to check. |
---|
true
if
codePoint
is a valid start of a Java
identifier;
false
otherwise.
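The start and part checks are typically combined to validate a whole name. The sketch below is one way to do that; `IdentifierCheck` and `isValidIdentifier` are hypothetical names, and the method deliberately does not reject reserved words such as `class`.

```java
final class IdentifierCheck {
    /** Returns true if every code point is legal at its position in a Java identifier. */
    static boolean isValidIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier("foo1"));
        System.out.println(isValidIdentifier("1foo"));
    }
}
```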
This method was deprecated
in API level 1.
Use
isJavaIdentifierStart(char)
instead.
Indicates whether the specified character is a Java letter.
c | the character to check. |
---|
true
if
c
is a Java letter;
false
otherwise.
This method was deprecated
in API level 1.
Use
isJavaIdentifierPart(char)
instead.
Indicates whether the specified character is a Java letter or digit character.
c | the character to check. |
---|
true
if
c
is a Java letter or digit;
false
otherwise.
Indicates whether the specified character is a letter.
c | the character to check. |
---|
true
if
c
is a letter;
false
otherwise.
Indicates whether the specified code point is a letter.
codePoint | the code point to check. |
---|
true
if
codePoint
is a letter;
false
otherwise.
Indicates whether the specified character is a letter or a digit.
c | the character to check. |
---|
true
if
c
is a letter or a digit;
false
otherwise.
Indicates whether the specified code point is a letter or a digit.
codePoint | the code point to check. |
---|
true
if
codePoint
is a letter or a digit;
false
otherwise.
Indicates whether
ch
is a low- (or trailing-) surrogate code unit
that is used for representing supplementary characters in UTF-16
encoding.
ch | the character to test. |
---|
true
if
ch
is a low-surrogate code unit;
false
otherwise.
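The high and low surrogate checks can be used together to detect malformed UTF-16. The following is a sketch under that assumption; `SurrogateScan` and `hasUnpairedSurrogate` are illustrative names only.

```java
final class SurrogateScan {
    /** Returns true if the text contains a surrogate that is not part of a pair. */
    static boolean hasUnpairedSurrogate(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (Character.isHighSurrogate(ch)) {
                boolean followedByLow = i + 1 < text.length()
                        && Character.isLowSurrogate(text.charAt(i + 1));
                if (!followedByLow) {
                    return true; // high surrogate with no trailing unit
                }
                i++; // skip the low half of a well-formed pair
            } else if (Character.isLowSurrogate(ch)) {
                return true; // low surrogate with no leading unit
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasUnpairedSurrogate("a\uD83D\uDE00b"));
        System.out.println(hasUnpairedSurrogate("a\uD83Db"));
    }
}
```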
Indicates whether the specified code point is a lower case letter.
codePoint | the code point to check. |
---|
true
if
codePoint
is a lower case letter;
false
otherwise.
Indicates whether the specified character is a lower case letter.
c | the character to check. |
---|
true
if
c
is a lower case letter;
false
otherwise.
Indicates whether the specified character is mirrored.
c | the character to check. |
---|
true
if
c
is mirrored;
false
otherwise.
Indicates whether the specified code point is mirrored.
codePoint | the code point to check. |
---|
true
if
codePoint
is mirrored;
false
otherwise.
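Mirrored characters are those, such as brackets, whose glyph is flipped in right-to-left text. A small sketch that counts them follows; `MirrorDemo` and `countMirrored` are hypothetical names.

```java
final class MirrorDemo {
    /** Counts the code points that are flagged as mirrored. */
    static int countMirrored(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isMirrored(cp)) {
                count++;
            }
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        // The two parentheses are mirrored; letters and '+' are not.
        System.out.println(countMirrored("(a+b)"));
    }
}
```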
This method was deprecated
in API level 1.
Use
isWhitespace(char)
instead.
Returns true if the given code point is a Unicode space character.
The exact set of characters considered as whitespace varies with Unicode version.
Note that non-breaking spaces are considered whitespace.
Note also that line separators are not considered whitespace; see
isWhitespace(char)
for an alternative.
Indicates whether
codePoint
is within the supplementary code
point range.
codePoint | the code point to test. |
---|
true
if
codePoint
is within the supplementary
code point range;
false
otherwise.
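A common use of this check is working out how many `char` values a code point needs. The sketch below pairs it with `isValidCodePoint`; `Utf16Length` and `unitsFor` are names invented for the example.

```java
final class Utf16Length {
    /** Returns the number of char values needed to encode the code point in UTF-16. */
    static int unitsFor(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a code point: " + codePoint);
        }
        // Supplementary code points (above U+FFFF) need a surrogate pair.
        return Character.isSupplementaryCodePoint(codePoint) ? 2 : 1;
    }

    public static void main(String[] args) {
        System.out.println(unitsFor('A'));
        System.out.println(unitsFor(0x1F600));
    }
}
```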
Returns true if the given character is a high or low surrogate.
Indicates whether the specified character pair is a valid surrogate pair.
high | the high surrogate unit to test. |
---|---|
low | the low surrogate unit to test. |
true
if
high
is a high-surrogate code unit and
low
is a low-surrogate code unit;
false
otherwise.
Indicates whether the specified code point is a titlecase character.
codePoint | the code point to check. |
---|
true
if
codePoint
is a titlecase character;
false
otherwise.
Indicates whether the specified character is a titlecase character.
c | the character to check. |
---|
true
if
c
is a titlecase character;
false
otherwise.
Indicates whether the specified code point is valid as part of a Unicode identifier other than the first character.
codePoint | the code point to check. |
---|
true
if
codePoint
is valid as part of a Unicode
identifier;
false
otherwise.
Indicates whether the specified character is valid as part of a Unicode identifier other than the first character.
c | the character to check. |
---|
true
if
c
is valid as part of a Unicode
identifier;
false
otherwise.
Indicates whether the specified character is a valid initial character for a Unicode identifier.
c | the character to check. |
---|
true
if
c
is a valid first character for a
Unicode identifier;
false
otherwise.
Indicates whether the specified code point is a valid initial character for a Unicode identifier.
codePoint | the code point to check. |
---|
true
if
codePoint
is a valid first character for
a Unicode identifier;
false
otherwise.
Indicates whether the specified code point is an upper case letter.
codePoint | the code point to check. |
---|
true
if
codePoint
is an upper case letter;
false
otherwise.
Indicates whether the specified character is an upper case letter.
c | the character to check. |
---|
true
if
c
is an upper case letter;
false
otherwise.
Indicates whether
codePoint
is a valid Unicode code point.
codePoint | the code point to test. |
---|
true
if
codePoint
is a valid Unicode code point;
false
otherwise.
Returns true if the given code point is a Unicode whitespace character.
The exact set of characters considered as whitespace varies with Unicode version.
Note that non-breaking spaces are not considered whitespace.
Note also that line separators are considered whitespace; see
isSpaceChar(char)
for an alternative.
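The difference between `isSpaceChar` and `isWhitespace` is easiest to see side by side. The sketch below labels a code point according to both checks; `SpaceKind` and `classify` are hypothetical names.

```java
final class SpaceKind {
    /** Describes which of the two space-related checks accept the code point. */
    static String classify(int codePoint) {
        boolean space = Character.isSpaceChar(codePoint);
        boolean whitespace = Character.isWhitespace(codePoint);
        if (space && whitespace) {
            return "both";
        } else if (space) {
            return "space only";
        } else if (whitespace) {
            return "whitespace only";
        }
        return "neither";
    }

    public static void main(String[] args) {
        System.out.println(classify(' '));     // ordinary space
        System.out.println(classify(0x00A0));  // non-breaking space
        System.out.println(classify('\n'));    // line feed
    }
}
```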
Returns the low surrogate for the given code point. The result is meaningless if the given code point is not a supplementary character.
Determines the index in the specified character sequence that is offset
codePointOffset
code points from
index
.
seq | the character sequence to find the index in. |
---|---|
index |
the start index in
seq
.
|
codePointOffset | the number of code points to look backwards or forwards; may be a negative or positive value. |
the index in
seq
that is
codePointOffset
code
points away from
index
.
NullPointerException |
if
seq
is
null
.
|
---|---|
IndexOutOfBoundsException |
if
index < 0
,
index
is greater than the
length of
seq
, or if there are not enough values in
seq
to skip
codePointOffset
code points
forwards or backwards (if
codePointOffset
is
negative) from
index
.
|
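One use of this method is cutting a string after a given number of code points without splitting a surrogate pair. The following is a sketch only; `CodePointPrefix` and `takeCodePoints` are illustrative names, and `n` is assumed to be non-negative.

```java
final class CodePointPrefix {
    /** Returns the prefix of s holding at most n code points (n must be >= 0). */
    static String takeCodePoints(String s, int n) {
        int total = s.codePointCount(0, s.length());
        // Clamp n so that offsetByCodePoints never runs past the end of s.
        int end = Character.offsetByCodePoints(s, 0, Math.min(n, total));
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // The pair D83D DE00 is one code point, so two code points span three chars.
        System.out.println(takeCodePoints("a\uD83D\uDE00b", 2).length());
    }
}
```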
Determines the index in a subsequence of the specified character array
that is offset
codePointOffset
code points from
index
.
The subsequence is delineated by
start
and
count
.
seq | the character array to find the index in. |
---|---|
start | the inclusive index that marks the beginning of the subsequence. |
count |
the number of
char
values to include within the
subsequence.
|
index | the start index in the subsequence of the char array. |
codePointOffset | the number of code points to look backwards or forwards; may be a negative or positive value. |
the index in
seq
that is
codePointOffset
code
points away from
index
.
NullPointerException |
if
seq
is
null
.
|
---|---|
IndexOutOfBoundsException |
if
start < 0
,
count < 0
,
index < start
,
index > start + count
,
start + count
is greater than the length of
seq
, or if there are not enough values in
seq
to skip
codePointOffset
code points
forward or backward (if
codePointOffset
is
negative) from
index
.
|
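For the array overload, note that the result is an index into the whole array, not into the window. The sketch below skips code points from the start of a window and reports failure with `-1`; `WindowSkip` and `skipWithin` are hypothetical names.

```java
final class WindowSkip {
    /**
     * Skips the given number of code points from the start of the window
     * [start, start + count) and returns the resulting array index,
     * or -1 if the window does not hold that many code points.
     */
    static int skipWithin(char[] seq, int start, int count, int codePoints) {
        try {
            return Character.offsetByCodePoints(seq, start, count, start, codePoints);
        } catch (IndexOutOfBoundsException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        char[] buf = {'x', 'a', '\uD83D', '\uDE00', 'b', 'y'};
        // The window covers indexes 1..4: 'a', one surrogate pair, 'b'.
        System.out.println(skipWithin(buf, 1, 4, 2));
    }
}
```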
Reverses the order of the first and second byte in the specified character.
c | the character to reverse. |
---|
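Byte reversal is mostly useful when UTF-16 data arrives in the opposite byte order. The following sketch swaps every unit of an array; `ByteSwap` and `swapAll` are names made up for the example.

```java
final class ByteSwap {
    /** Returns a new array in which each char has its two bytes exchanged. */
    static char[] swapAll(char[] units) {
        char[] swapped = new char[units.length];
        for (int i = 0; i < units.length; i++) {
            swapped[i] = Character.reverseBytes(units[i]);
        }
        return swapped;
    }

    public static void main(String[] args) {
        // 0xFFFE is what a byte order mark (0xFEFF) looks like in the wrong byte order.
        char fixed = swapAll(new char[] {(char) 0xFFFE})[0];
        System.out.println(Integer.toHexString(fixed));
    }
}
```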
Converts the specified Unicode code point into a UTF-16 encoded sequence and returns it as a char array.
codePoint | the Unicode code point to encode. |
---|
the UTF-16 encoded char sequence. If
codePoint
is a
supplementary code point
,
then the returned array contains two characters, otherwise it
contains just one character.
IllegalArgumentException |
if
codePoint
is not a valid code point.
|
---|
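A typical use is turning a single code point into a `String`. The sketch below wraps the call; `CodePointText` and `fromCodePoint` are hypothetical names.

```java
final class CodePointText {
    /** Builds a one-code-point string; throws IllegalArgumentException for invalid input. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // A BMP code point yields one char, a supplementary one yields two.
        System.out.println(fromCodePoint('A').length());
        System.out.println(fromCodePoint(0x1F600).length());
    }
}
```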
Converts the specified Unicode code point into a UTF-16 encoded sequence
and copies the value(s) into the char array
dst
, starting at
index
dstIndex
.
codePoint | the Unicode code point to encode. |
---|---|
dst | the destination array to copy the encoded value into. |
dstIndex |
the index in
dst
from where to start copying.
|
the number of
char
value units copied into
dst
.
IllegalArgumentException |
if
codePoint
is not a valid code point.
|
---|---|
NullPointerException |
if
dst
is
null
.
|
IndexOutOfBoundsException |
if
dstIndex
is negative, greater than or equal to
dst.length
or equals
dst.length - 1
when
codePoint
is a
supplementary code point
.
|
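The returned count makes it straightforward to fill a preallocated buffer. The following is a minimal sketch of that pattern; `Utf16Encoder` and `encode` are illustrative names, not part of this class.

```java
final class Utf16Encoder {
    /** Encodes the code points into a char array sized exactly for the result. */
    static char[] encode(int[] codePoints) {
        int length = 0;
        for (int cp : codePoints) {
            length += Character.charCount(cp);
        }
        char[] dst = new char[length];
        int pos = 0;
        for (int cp : codePoints) {
            // toChars returns how many char values it wrote (1 or 2).
            pos += Character.toChars(cp, dst, pos);
        }
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(encode(new int[] {'h', 0x1F600, 'i'}).length);
    }
}
```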
Converts a surrogate pair into a Unicode code point. This method assumes
that the pair are valid surrogates. If the pair are
not
valid
surrogates, then the result is indeterminate. The
isSurrogatePair(char, char)
method should be used prior to this
method to validate the pair.
high | the high surrogate unit. |
---|---|
low | the low surrogate unit. |
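Because the result is indeterminate for invalid input, a checked wrapper may be preferable. The sketch below validates with `isSurrogatePair` first; `PairDecoder` and `decodePair` are hypothetical names.

```java
final class PairDecoder {
    /** Decodes a surrogate pair, rejecting input that is not a valid pair. */
    static int decodePair(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("Not a surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(decodePair('\uD83D', '\uDE00')));
    }
}
```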
Returns the lower case equivalent for the specified character if the character is an upper case letter. Otherwise, the specified character is returned unchanged.
c | the character to convert. |
---|
if
c
is an upper case character then its lower case
counterpart, otherwise just
c
.
Returns the lower case equivalent for the specified code point if it is an upper case letter. Otherwise, the specified code point is returned unchanged.
codePoint | the code point to convert. |
---|
if
codePoint
is an upper case character then its lower
case counterpart, otherwise just
codePoint
.
Converts the specified character to its string representation.
value | the character to convert. |
---|
Returns a string containing a concise, human-readable description of this object. Subclasses are encouraged to override this method and provide an implementation that takes into account the object's type and data. The default implementation is equivalent to the following expression:
getClass().getName() + '@' + Integer.toHexString(hashCode())
See
Writing a useful
toString
method
if you intend to implement your own
toString
method.
Returns the title case equivalent for the specified character if it exists. Otherwise, the specified character is returned unchanged.
c | the character to convert. |
---|
the title case equivalent for
c
if it exists, otherwise
c
.
Returns the title case equivalent for the specified code point if it exists. Otherwise, the specified code point is returned unchanged.
codePoint | the code point to convert. |
---|
the title case equivalent for
codePoint
if it exists,
otherwise
codePoint
.
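Title case differs from upper case for a few digraph characters, which matters when capitalizing a word. The following is a sketch of that use; `Capitalizer` and `capitalize` are names invented for the example.

```java
final class Capitalizer {
    /** Converts the first code point of the word to title case. */
    static String capitalize(String word) {
        if (word.isEmpty()) {
            return word;
        }
        int first = word.codePointAt(0);
        return new StringBuilder(word.length())
                .appendCodePoint(Character.toTitleCase(first))
                .append(word, Character.charCount(first), word.length())
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("hello"));
        // U+01C6 (dz with caron digraph) has the title case form U+01C5, not U+01C4.
        System.out.println(capitalize("\u01C6ep"));
    }
}
```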
Returns the upper case equivalent for the specified character if the character is a lower case letter. Otherwise, the specified character is returned unchanged.
c | the character to convert. |
---|
if
c
is a lower case character then its upper case
counterpart, otherwise just
c
.
Returns the upper case equivalent for the specified code point if the code point is a lower case letter. Otherwise, the specified code point is returned unchanged.
codePoint | the code point to convert. |
---|
if
codePoint
is a lower case character then its upper
case counterpart, otherwise just
codePoint
.
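The code point overloads of the case conversions also work above the BMP. The sketch below inverts the case of each code point; `CaseSwap` and `swapCase` are hypothetical names, and the one-to-one mapping shown here is not a substitute for locale-aware string conversion.

```java
final class CaseSwap {
    /** Swaps upper and lower case, one code point at a time. */
    static String swapCase(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isUpperCase(cp)) {
                out.appendCodePoint(Character.toLowerCase(cp));
            } else if (Character.isLowerCase(cp)) {
                out.appendCodePoint(Character.toUpperCase(cp));
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(swapCase("Hello, World 42"));
    }
}
```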
Returns a
Character
instance for the
char
value passed.
If it is not necessary to get a new
Character
instance, it is
recommended to use this method instead of the constructor, since it
maintains a cache of instances which may result in better performance.
c |
the char value for which to get a
Character
instance.
|
---|
a
Character
instance for
c
.
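The effect of the cache can be observed by comparing references. This sketch assumes a runtime that caches at least the ASCII range, as standard JDKs document; `BoxingDemo` and `sameInstance` are hypothetical names, and application code should compare `Character` objects with `equals` rather than rely on identity.

```java
final class BoxingDemo {
    /** Returns true if two valueOf calls for the same char yield the same object. */
    static boolean sameInstance(char c) {
        return Character.valueOf(c) == Character.valueOf(c);
    }

    public static void main(String[] args) {
        // Assumption: 'a' falls inside the cached range on this runtime.
        System.out.println(sameInstance('a'));
        // equals is reliable whether or not the value is cached.
        System.out.println(Character.valueOf('\u20AC').equals(Character.valueOf('\u20AC')));
    }
}
```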