Java Coding: How To Find Number of Characters in a String (!= String.length())
The length of a string can be interpreted in several ways:
- the number of chars (16-bit code units) in the string
- the number of characters (Unicode code points) in the string
- the number of bytes in the string, which depends on the encoding
String.length() gives you the number of chars in the string accurately.
However, a char is not necessarily a complete character. Why?
Unicode includes supplementary characters. These are characters whose code points lie above the base set (the Basic Multilingual Plane), so their values are greater than 0xFFFF. They extend all the way up to 0x10FFFF.
In Java, these supplementary characters are represented as surrogate pairs, pairs of char units that fall in a specific range. The leading or high surrogate value is in the 0xD800 through 0xDBFF range. The trailing or low surrogate value is in the 0xDC00 through 0xDFFF range.
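To make the surrogate ranges concrete, here is a minimal sketch that splits one supplementary code point into its char units. The code point U+1D11E (MUSICAL SYMBOL G CLEF) and the helper name describe are my own choices for illustration, not part of the original article.

```java
public class SurrogatePairDemo {

    // Hypothetical helper: lists the char units used to store one code point.
    static String describe(int codePoint) {
        // toChars returns 1 char for a base-set code point, 2 for a supplementary one.
        char[] units = Character.toChars(codePoint);
        StringBuilder sb = new StringBuilder();
        for (char unit : units) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // example supplementary code point (assumption for this sketch)

        System.out.println(Character.isSupplementaryCodePoint(clef)); // true
        System.out.println(describe(clef)); // 0xD834 0xDD1E
        System.out.println(describe('A'));  // 0x0041

        char[] units = Character.toChars(clef);
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true
    }
}
```

The first unit falls in the 0xD800 through 0xDBFF range and the second in the 0xDC00 through 0xDFFF range, matching the description above.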
J2SE 5.0 added a String method, codePointCount(int beginIndex, int endIndex), which tells you how many Unicode code points lie between the two indices. The index values refer to code unit (char) positions, so for the entire String, endIndex - beginIndex equals the String's length.
So, first get the number of chars as before:
int charLength = myString.length();
Then count the code points:
int characterLength = myString.codePointCount(0, charLength);
Unless your text contains supplementary characters, which becomes more likely once you internationalize (for example, rare ideographs in software sold to China or Japan), you are unlikely to encounter any difference between charLength and characterLength.
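As a quick illustration of when the two counts diverge, the sketch below compares them for a plain ASCII string and for a string containing one supplementary character. The class name, method names, and sample strings are assumptions made for this example.

```java
public class StringLengthDemo {

    // Number of chars (UTF-16 code units).
    static int charLength(String s) {
        return s.length();
    }

    // Number of Unicode code points over the whole string.
    static int characterLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String plain = "hello";
        // "G" followed by U+1D11E, written as its surrogate pair.
        String withClef = "G\uD834\uDD1E";

        System.out.println(charLength(plain));         // 5
        System.out.println(characterLength(plain));    // 5
        System.out.println(charLength(withClef));      // 3
        System.out.println(characterLength(withClef)); // 2
    }
}
```

For the ASCII string the counts agree. For the second string, length() reports one more than codePointCount because the supplementary character occupies two chars.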
So how many bytes are in a String?
int byteCount = myString.getBytes().length;
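getBytes() with no arguments encodes the String's Unicode characters using the platform's default charset and returns the result as a byte array. That default may be a legacy charset or UTF-8, which is a multibyte encoding of Unicode and not a legacy charset, so the byte count depends on the encoding in use.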
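Because the result depends on the charset, it can help to name the charset explicitly. The following sketch assumes Java 7 or later (for java.nio.charset.StandardCharsets, which did not exist in J2SE 5.0). The helper name byteCount and the sample strings are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteCountDemo {

    // Hypothetical helper: number of bytes the string occupies in the given charset.
    static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String accented = "h\u00E9";    // "h" plus e-acute (U+00E9)
        String clef = "\uD834\uDD1E";   // U+1D11E as a surrogate pair

        System.out.println(byteCount(accented, StandardCharsets.ISO_8859_1)); // 2
        System.out.println(byteCount(accented, StandardCharsets.UTF_8));      // 3
        System.out.println(byteCount(accented, StandardCharsets.UTF_16BE));   // 4
        System.out.println(byteCount(clef, StandardCharsets.UTF_8));          // 4
        System.out.println(byteCount(clef, StandardCharsets.UTF_16BE));       // 4
    }
}
```

The same two-char string takes 2, 3, or 4 bytes depending on the encoding, which is why a bare getBytes() can return different lengths on different machines.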
November 11th, 2005 at 11:12 am
So you saying that you should always use “int byteCount = myString.getBytes().length;” instead of “myString.length()”, just in case you Internationalize later?
August 30th, 2007 at 1:26 am
Is there an example for this.
August 30th, 2007 at 3:12 am
Yes, Jason.
August 30th, 2007 at 3:13 am
I gave the example code in the article. What other examples are you looking for?
October 1st, 2007 at 3:41 pm
Is there a way to filter out the punctuation similar to setting delimiters to where the program only counts the letters?
April 1st, 2008 at 11:13 pm
how can i find a length of string with out using any function in java.. plz help me..