Java的类型与运算符

Published on 2016 - 10 - 11

The Char Type

In 1991, Unicode 1.0 was released, using slightly less than half of the available 65,536 code values. Java was designed from the ground up to use 16-bit Unicode characters, which was a major advance over other programming languages that used 8-bit characters.

Unfortunately, over time, the inevitable happened. Unicode grew beyond 65,536 characters, primarily due to the addition of a very large set of ideographs used for Chinese, Japanese, and Korean. Now, the 16-bit char type is insufficient to describe all Unicode characters.

We need a bit of terminology to explain how this problem is resolved in Java, beginning with Java SE 5.0. A code point is a code value that is associated with a character in an encoding scheme. In the Unicode standard, code points are written in hexadecimal and prefixed with U+, such as U+0041 for the code point of the Latin letter A. Unicode has code points that are grouped into 17 code planes. The first code plane, called the basic multilingual plane, consists of the “classic” Unicode characters with code points U+0000 to U+FFFF. Sixteen additional planes, with code points U+10000 to U+10FFFF, hold the supplementary characters.

The UTF-16 encoding represents all Unicode code points in a variable-length code. The characters in the basic multilingual plane are represented as 16-bit values, called code units. The supplementary characters are encoded as consecutive pairs of code units. Each of the values in such an encoding pair falls into a range of 2048 unused values of the basic multilingual plane, called the surrogates area (U+D800 to U+DBFF for the first code unit, U+DC00 to U+DFFF for the second code unit). This is rather clever, because you can immediately tell whether a code unit encodes a single character or it is the first or second part of a supplementary character. For example, the mathematical symbol for the set of integers ZZ has code point U+1D56B and is encoded by the two code units U+D835 and U+DD6B.

In Java, the char type describes a code unit in the UTF-16 encoding.

Our strong recommendation is not to use the char type in your programs unless you are actually manipulating UTF-16 code units. You are almost always better off treating strings as abstract data types.

批注:65535已经无法包含所有的字符,之后扩充的字符将通过一定的方式进行扩充。java 5.0给出了一种扩充的解决方案,code point代表一个utf-16字符,utf-16字符包含了常规字符和附加的字符。常规字符可以使用1个16位的char表示。而附加字符需要使用2个16位的char表示

Operators

The double type uses 64 bits to store a numeric value, but some processors use 80-bit floating-point registers. These registers yield added precision in intermediate steps of a computation. For example, consider the following computation:

double w = x * y / z;

Many Intel processors compute x 乘 y , leave the result in an 80-bit register, then divide by z, and finally truncate the result back to 64 bits. That can yield a more accurate result, and it can avoid exponent overflow. But the result may be different from a computation that uses 64 bits throughout. For that reason, the initial specification of the Java virtual machine mandated that all intermediate computations must be truncated. The numeric community hated it. Not only can the truncated computations cause overflow, they are actually slower than the more precise computations because the truncation operations take time. For that reason, the Java programming language was updated to recognize the conflicting demands for optimum performance and perfect reproducibility. By default, virtual machine designers are now permitted to use extended precision for intermediate computations. However, methods tagged with the strictfp keyword must use strict floating-point operations that yield reproducible results.

public static strictfp void main(String[] args)

批注:strictfp关键字的用途

The methods in the Math class use the routines in the computer’s floating-point unit for fastest performance. If completely predictable results are more important than fast performance, use the StrictMath class instead. It implements the algorithms from the “Freely Distributable Math Library” fdlibm, guaranteeing identical results on all platforms.

Strings

Do not use the == operator to test whether two strings are equal! It only determines whether or not the strings are stored in the same location. Sure, if strings are in the same location, they must be equal. But it is entirely possible to store multiple copies of identical strings in different places.

The StringBuilder class was introduced in JDK 5.0. Its predecessor, StringBuffer, is slightly less efficient, but it allows multiple threads to add or remove characters. If all string editing happens in a single thread (which is usually the case), you should use StringBuilder instead. The APIs of both classes are identical.

Input and Output

if you use an integrated development environment, the starting directory is controlled by the IDE. You can find the directory location with this call:

String dir = System.getProperty("user.dir");

批注:这个方法可以获取到当前路径。

When you launch a program from a command shell, you can use the redirection syntax of your shell and attach any file to System.in and System.out:

java MyProg < myfile.txt > output.txt

Then, you need not worry about handling the FileNotFoundException.

批注:这样System.in和System.out将会通过指定的文件进行读写。

Reference