Class UCharacterProperty

java.lang.Object
com.ibm.icu.impl.UCharacterProperty

public final class UCharacterProperty extends Object

Internal class used for Unicode character property database.

This class stores binary data read from uprops.icu. It cannot parse the data into higher-level information; it only returns the raw property bits when required.

Because of the form most commonly used for retrieval, an array of char is used to store the binary data.

UCharacterPropertyDB also contains information on accessing indexes to significant points in the binary data.

Responsibility for molding the binary data into a more meaningful form lies with UCharacter.

Since:
release 2.1, February 1st 2002
  • Field Details

    • INSTANCE

      public static final UCharacterProperty INSTANCE
    • m_trie_

      public Trie2_16 m_trie_
      Trie data
    • m_unicodeVersion_

      public VersionInfo m_unicodeVersion_
      Unicode version
    • LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE_

      public static final char LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE_
      Latin capital letter i with dot above
    • LATIN_SMALL_LETTER_DOTLESS_I_

      public static final char LATIN_SMALL_LETTER_DOTLESS_I_
      Latin small letter dotless i
    • LATIN_SMALL_LETTER_I_

      public static final char LATIN_SMALL_LETTER_I_
      Latin lowercase i
    • TYPE_MASK

      public static final int TYPE_MASK
      Character type mask
    • SRC_NONE

      public static final int SRC_NONE
      No source, not a supported property.
    • SRC_CHAR

      public static final int SRC_CHAR
      From uchar.c/uprops.icu main trie
    • SRC_PROPSVEC

      public static final int SRC_PROPSVEC
      From uchar.c/uprops.icu properties vectors trie
    • SRC_NAMES

      public static final int SRC_NAMES
      From unames.c/unames.icu
    • SRC_CASE

      public static final int SRC_CASE
      From ucase.c/ucase.icu
    • SRC_BIDI

      public static final int SRC_BIDI
      From ubidi_props.c/ubidi.icu
    • SRC_CHAR_AND_PROPSVEC

      public static final int SRC_CHAR_AND_PROPSVEC
      From uchar.c/uprops.icu main trie as well as properties vectors trie
    • SRC_CASE_AND_NORM

      public static final int SRC_CASE_AND_NORM
      From ucase.c/ucase.icu as well as unorm.cpp/unorm.icu
    • SRC_NFC

      public static final int SRC_NFC
      From normalizer2impl.cpp/nfc.nrm
    • SRC_NFKC

      public static final int SRC_NFKC
      From normalizer2impl.cpp/nfkc.nrm
    • SRC_NFKC_CF

      public static final int SRC_NFKC_CF
      From normalizer2impl.cpp/nfkc_cf.nrm
    • SRC_NFC_CANON_ITER

      public static final int SRC_NFC_CANON_ITER
      From normalizer2impl.cpp/nfc.nrm canonical iterator data
    • SRC_INPC

      public static final int SRC_INPC
    • SRC_INSC

      public static final int SRC_INSC
    • SRC_VO

      public static final int SRC_VO
    • SRC_EMOJI

      public static final int SRC_EMOJI
    • SRC_IDSU

      public static final int SRC_IDSU
    • SRC_ID_COMPAT_MATH

      public static final int SRC_ID_COMPAT_MATH
    • SRC_COUNT

      public static final int SRC_COUNT
      One more than the highest UPropertySource (SRC_) constant.
    • MY_MASK

      static final int MY_MASK
    • GC_CN_MASK

      private static final int GC_CN_MASK
    • GC_CC_MASK

      private static final int GC_CC_MASK
    • GC_CS_MASK

      private static final int GC_CS_MASK
    • GC_ZS_MASK

      private static final int GC_ZS_MASK
    • GC_ZL_MASK

      private static final int GC_ZL_MASK
    • GC_ZP_MASK

      private static final int GC_ZP_MASK
    • GC_Z_MASK

      private static final int GC_Z_MASK
      Mask constant for multiple UCharCategory bits (Z Separators).
    • ID_COMPAT_MATH_CONTINUE

      private static final int[] ID_COMPAT_MATH_CONTINUE
      Ranges (start/limit pairs) of ID_Compat_Math_Continue (only), from UCD PropList.txt.
    • ID_COMPAT_MATH_START

      private static final int[] ID_COMPAT_MATH_START
      ID_Compat_Math_Start characters, from UCD PropList.txt.
    • binProps

    • gcbToHst

      private static final int[] gcbToHst
    • intProps

    • m_additionalTrie_

      Trie2_16 m_additionalTrie_
      Extra property trie
    • m_additionalVectors_

      int[] m_additionalVectors_
      Extra property vectors, 1st column for age and second for binary properties.
    • m_additionalColumnsCount_

      int m_additionalColumnsCount_
      Number of additional columns
    • m_maxBlockScriptValue_

      int m_maxBlockScriptValue_
      Maximum values for block, bits used as in vector word 0
    • m_maxJTGValue_

      int m_maxJTGValue_
      Maximum values for the properties stored in vector word 2, bits used as in that vector word
    • m_scriptExtensions_

      public char[] m_scriptExtensions_
      Script_Extensions data
    • DATA_FILE_NAME_

      private static final String DATA_FILE_NAME_
      Default name of the datafile
    • NUMERIC_TYPE_VALUE_SHIFT_

      private static final int NUMERIC_TYPE_VALUE_SHIFT_
      Numeric types and values in the main properties words.
    • NTV_NONE_

      private static final int NTV_NONE_
      No numeric value.
    • NTV_DECIMAL_START_

      private static final int NTV_DECIMAL_START_
      Decimal digits: nv=0..9
    • NTV_DIGIT_START_

      private static final int NTV_DIGIT_START_
      Other digits: nv=0..9
    • NTV_NUMERIC_START_

      private static final int NTV_NUMERIC_START_
      Small integers: nv=0..154
    • NTV_FRACTION_START_

      private static final int NTV_FRACTION_START_
      Fractions: ((ntv>>4)-12) / ((ntv&0xf)+1) = -1..17 / 1..16
    • NTV_LARGE_START_

      private static final int NTV_LARGE_START_
      Large integers: ((ntv>>5)-14) * 10^((ntv&0x1f)+2) = (1..9)*(10^2..10^33) (only one significant decimal digit)
    • NTV_BASE60_START_

      private static final int NTV_BASE60_START_
      Sexagesimal numbers: ((ntv>>2)-0xbf) * 60^((ntv&3)+1) = (1..9)*(60^1..60^4)
    • NTV_FRACTION20_START_

      private static final int NTV_FRACTION20_START_
      Fraction-20 values: frac20 = ntv-0x324 = 0..0x17 -> 1|3|5|7 / 20|40|80|160|320|640; numerator: num = 2*(frac20&3)+1; denominator: den = 20<<(frac20>>2)
    • NTV_FRACTION32_START_

      private static final int NTV_FRACTION32_START_
      Fraction-32 values: frac32 = ntv-0x34c = 0..15 -> 1|3|5|7 / 32|64|128|256; numerator: num = 2*(frac32&3)+1; denominator: den = 32<<(frac32>>2)
    • NTV_RESERVED_START_

      private static final int NTV_RESERVED_START_
      No numeric value (yet).
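      The formulas given for the NTV_*_START_ constants above can be sketched as plain Java. This is an illustration only, not the class's own decoding code: NtvSketch and its method names are hypothetical, and each method assumes the caller has already determined which range the ntv value falls in.

```java
// Sketch of the numeric-type-value (ntv) formulas listed for the NTV_*_START_ constants.
// Each method assumes the caller already knows which range ntv belongs to.
final class NtvSketch {
    // Integer power by repeated multiplication.
    static double pow(int base, int exp) {
        double result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
        }
        return result;
    }

    // Fractions: ((ntv>>4)-12) / ((ntv&0xf)+1)
    static double fraction(int ntv) {
        return ((ntv >> 4) - 12) / (double) ((ntv & 0xf) + 1);
    }

    // Large integers: ((ntv>>5)-14) * 10^((ntv&0x1f)+2)
    static double large(int ntv) {
        return ((ntv >> 5) - 14) * pow(10, (ntv & 0x1f) + 2);
    }

    // Sexagesimal numbers: ((ntv>>2)-0xbf) * 60^((ntv&3)+1)
    static double base60(int ntv) {
        return ((ntv >> 2) - 0xbf) * pow(60, (ntv & 3) + 1);
    }

    // Fraction-20: frac20 = ntv-0x324, num = 2*(frac20&3)+1, den = 20<<(frac20>>2)
    static double fraction20(int ntv) {
        int frac20 = ntv - 0x324;
        return (2 * (frac20 & 3) + 1) / (double) (20 << (frac20 >> 2));
    }

    // Fraction-32: frac32 = ntv-0x34c, num = 2*(frac32&3)+1, den = 32<<(frac32>>2)
    static double fraction32(int ntv) {
        int frac32 = ntv - 0x34c;
        return (2 * (frac32 & 3) + 1) / (double) (32 << (frac32 >> 2));
    }
}
```

      For example, NtvSketch.fraction(0xd1) evaluates to 1/2 and NtvSketch.fraction20(0x324) to 1/20.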
    • SCRIPT_X_MASK

      public static final int SCRIPT_X_MASK
      Script_Extensions: mask includes Script
    • SCRIPT_HIGH_MASK

      public static final int SCRIPT_HIGH_MASK
    • SCRIPT_HIGH_SHIFT

      public static final int SCRIPT_HIGH_SHIFT
    • MAX_SCRIPT

      public static final int MAX_SCRIPT
    • EAST_ASIAN_MASK_

      private static final int EAST_ASIAN_MASK_
      Integer properties mask and shift values for East Asian cell width. Equivalent to icu4c UPROPS_EA_MASK
    • EAST_ASIAN_SHIFT_

      private static final int EAST_ASIAN_SHIFT_
      Integer properties mask and shift values for East Asian cell width. Equivalent to icu4c UPROPS_EA_SHIFT
    • BLOCK_MASK_

      private static final int BLOCK_MASK_
      Integer properties mask and shift values for blocks. Equivalent to icu4c UPROPS_BLOCK_MASK
    • BLOCK_SHIFT_

      private static final int BLOCK_SHIFT_
      Integer properties mask and shift values for blocks. Equivalent to icu4c UPROPS_BLOCK_SHIFT
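      Mask and shift pairs such as these are typically applied together to pull one field out of a packed property word. The sketch below shows that generic pattern; BitFieldSketch and extract are hypothetical names, and the numbers in the usage note are illustrative assumptions rather than the constants' actual values.

```java
// Sketch only: generic mask-and-shift extraction of a field from a packed int.
final class BitFieldSketch {
    // Isolates the bits selected by mask, then moves them down to bit 0.
    static int extract(int word, int mask, int shift) {
        return (word & mask) >>> shift;
    }
}
```

      With an assumed mask of 0x0001ff00 and an assumed shift of 8, BitFieldSketch.extract(0x00012345, 0x0001ff00, 8) yields 0x123.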
    • SCRIPT_LOW_MASK

      public static final int SCRIPT_LOW_MASK
      Integer properties mask and shift values for scripts. Equivalent to icu4c UPROPS_SCRIPT_LOW_MASK.
    • SCRIPT_X_WITH_COMMON

      public static final int SCRIPT_X_WITH_COMMON
    • SCRIPT_X_WITH_INHERITED

      public static final int SCRIPT_X_WITH_INHERITED
    • SCRIPT_X_WITH_OTHER

      public static final int SCRIPT_X_WITH_OTHER
    • WHITE_SPACE_PROPERTY_

      private static final int WHITE_SPACE_PROPERTY_
      Additional properties used in internal trie data
    • DASH_PROPERTY_

      private static final int DASH_PROPERTY_
    • HYPHEN_PROPERTY_

      private static final int HYPHEN_PROPERTY_
    • QUOTATION_MARK_PROPERTY_

      private static final int QUOTATION_MARK_PROPERTY_
    • TERMINAL_PUNCTUATION_PROPERTY_

      private static final int TERMINAL_PUNCTUATION_PROPERTY_
    • MATH_PROPERTY_

      private static final int MATH_PROPERTY_
    • HEX_DIGIT_PROPERTY_

      private static final int HEX_DIGIT_PROPERTY_
    • ASCII_HEX_DIGIT_PROPERTY_

      private static final int ASCII_HEX_DIGIT_PROPERTY_
    • ALPHABETIC_PROPERTY_

      private static final int ALPHABETIC_PROPERTY_
    • IDEOGRAPHIC_PROPERTY_

      private static final int IDEOGRAPHIC_PROPERTY_
    • DIACRITIC_PROPERTY_

      private static final int DIACRITIC_PROPERTY_
    • EXTENDER_PROPERTY_

      private static final int EXTENDER_PROPERTY_
    • NONCHARACTER_CODE_POINT_PROPERTY_

      private static final int NONCHARACTER_CODE_POINT_PROPERTY_
    • GRAPHEME_EXTEND_PROPERTY_

      private static final int GRAPHEME_EXTEND_PROPERTY_
    • IDS_BINARY_OPERATOR_PROPERTY_

      private static final int IDS_BINARY_OPERATOR_PROPERTY_
    • IDS_TRINARY_OPERATOR_PROPERTY_

      private static final int IDS_TRINARY_OPERATOR_PROPERTY_
    • RADICAL_PROPERTY_

      private static final int RADICAL_PROPERTY_
    • UNIFIED_IDEOGRAPH_PROPERTY_

      private static final int UNIFIED_IDEOGRAPH_PROPERTY_
    • DEFAULT_IGNORABLE_CODE_POINT_PROPERTY_

      private static final int DEFAULT_IGNORABLE_CODE_POINT_PROPERTY_
    • DEPRECATED_PROPERTY_

      private static final int DEPRECATED_PROPERTY_
    • LOGICAL_ORDER_EXCEPTION_PROPERTY_

      private static final int LOGICAL_ORDER_EXCEPTION_PROPERTY_
    • XID_START_PROPERTY_

      private static final int XID_START_PROPERTY_
    • XID_CONTINUE_PROPERTY_

      private static final int XID_CONTINUE_PROPERTY_
    • ID_START_PROPERTY_

      private static final int ID_START_PROPERTY_
    • ID_CONTINUE_PROPERTY_

      private static final int ID_CONTINUE_PROPERTY_
    • GRAPHEME_BASE_PROPERTY_

      private static final int GRAPHEME_BASE_PROPERTY_
    • S_TERM_PROPERTY_

      private static final int S_TERM_PROPERTY_
    • VARIATION_SELECTOR_PROPERTY_

      private static final int VARIATION_SELECTOR_PROPERTY_
    • PATTERN_SYNTAX

      private static final int PATTERN_SYNTAX
    • PATTERN_WHITE_SPACE

      private static final int PATTERN_WHITE_SPACE
    • PREPENDED_CONCATENATION_MARK

      private static final int PREPENDED_CONCATENATION_MARK
    • LB_MASK

      private static final int LB_MASK
    • LB_SHIFT

      private static final int LB_SHIFT
    • SB_MASK

      private static final int SB_MASK
    • SB_SHIFT

      private static final int SB_SHIFT
    • WB_MASK

      private static final int WB_MASK
    • WB_SHIFT

      private static final int WB_SHIFT
    • GCB_MASK

      private static final int GCB_MASK
    • GCB_SHIFT

      private static final int GCB_SHIFT
    • DECOMPOSITION_TYPE_MASK_

      private static final int DECOMPOSITION_TYPE_MASK_
      Integer properties mask for decomposition type. Equivalent to icu4c UPROPS_DT_MASK.
    • FIRST_NIBBLE_SHIFT_

      private static final int FIRST_NIBBLE_SHIFT_
      First nibble shift
    • LAST_NIBBLE_MASK_

      private static final int LAST_NIBBLE_MASK_
      Second nibble mask
    • AGE_SHIFT_

      private static final int AGE_SHIFT_
      Age value shift
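      The names FIRST_NIBBLE_SHIFT_, LAST_NIBBLE_MASK_ and AGE_SHIFT_ suggest that an age is packed into one byte, with the major and minor version in one nibble each. The sketch below illustrates that reading only; the constant values, the AgeSketch class and the unpackAge method are assumptions, and the actual layout may differ between ICU versions.

```java
// Sketch only: assumes the age occupies the top byte of a 32-bit word and packs
// major and minor version numbers into one nibble each. The values below are
// illustrative assumptions, not taken from this page.
final class AgeSketch {
    static final int AGE_SHIFT = 24;          // assumed
    static final int FIRST_NIBBLE_SHIFT = 4;  // assumed
    static final int LAST_NIBBLE_MASK = 0xF;  // assumed

    // Returns {major, minor} unpacked from the given property vector word.
    static int[] unpackAge(int vectorWord) {
        int version = vectorWord >>> AGE_SHIFT;
        int major = (version >> FIRST_NIBBLE_SHIFT) & LAST_NIBBLE_MASK;
        int minor = version & LAST_NIBBLE_MASK;
        return new int[] {major, minor};
    }
}
```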
    • DATA_FORMAT

      private static final int DATA_FORMAT
    • TAB

      private static final int TAB
    • CR

      private static final int CR
    • U_A

      private static final int U_A
    • U_F

      private static final int U_F
    • U_Z

      private static final int U_Z
    • U_a

      private static final int U_a
    • U_f

      private static final int U_f
    • U_z

      private static final int U_z
    • DEL

      private static final int DEL
    • NL

      private static final int NL
    • NBSP

      private static final int NBSP
    • CGJ

      private static final int CGJ
    • FIGURESP

      private static final int FIGURESP
    • HAIRSP

      private static final int HAIRSP
    • RLM

      private static final int RLM
    • NNBSP

      private static final int NNBSP
    • WJ

      private static final int WJ
    • INHSWAP

      private static final int INHSWAP
    • NOMDIG

      private static final int NOMDIG
    • U_FW_A

      private static final int U_FW_A
    • U_FW_F

      private static final int U_FW_F
    • U_FW_Z

      private static final int U_FW_Z
    • U_FW_a

      private static final int U_FW_a
    • U_FW_f

      private static final int U_FW_f
    • U_FW_z

      private static final int U_FW_z
    • ZWNBSP

      private static final int ZWNBSP
  • Constructor Details

    • UCharacterProperty

      private UCharacterProperty() throws IOException
      Constructor
      Throws:
      IOException - thrown when reading the data fails or the data is corrupted
  • Method Details

    • getProperty

      public final int getProperty(int ch)
      Gets the main property value for code point ch.
      Parameters:
      ch - code point whose property value is to be retrieved
      Returns:
      property value of code point
    • getAdditional

      public int getAdditional(int codepoint, int column)
      Gets the additional Unicode properties. Java version of C u_getUnicodeProperties().
      Parameters:
      codepoint - code point whose additional properties are to be retrieved
      column - The column index.
      Returns:
      Unicode properties
    • getAge

      public VersionInfo getAge(int codepoint)

      Get the "age" of the code point.

      The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.

      This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.

      The data is from the UCD file DerivedAge.txt.

      This API does not check the validity of the codepoint.

      Parameters:
      codepoint - The code point.
      Returns:
      the Unicode version number
    • isgraphPOSIX

      private static final boolean isgraphPOSIX(int c)
      Checks if c is in [^\p{space}\p{gc=Control}\p{gc=Surrogate}\p{gc=Unassigned}] with space=\p{Whitespace} and Control=Cc. Implements UCHAR_POSIX_GRAPH.
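      The set expression above can be approximated with JDK facilities alone. The following is a minimal sketch, not this class's implementation: PosixGraphSketch and isGraphPosix are hypothetical names, java.lang.Character supplies the general category, and java.util.regex supplies the White_Space property.

```java
// Sketch only: follows the set expression in the description using JDK facilities.
final class PosixGraphSketch {
    // \p{IsWhite_Space} is java.util.regex's name for the Unicode White_Space property.
    private static final java.util.regex.Pattern WHITE_SPACE =
            java.util.regex.Pattern.compile("\\p{IsWhite_Space}");

    static boolean isGraphPosix(int c) {
        int type = Character.getType(c);
        // Exclude gc=Control (Cc), gc=Surrogate (Cs) and gc=Unassigned (Cn).
        if (type == Character.CONTROL
                || type == Character.SURROGATE
                || type == Character.UNASSIGNED) {
            return false;
        }
        // Exclude anything carrying the White_Space property.
        String s = new String(Character.toChars(c));
        return !WHITE_SPACE.matcher(s).matches();
    }
}
```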
    • hasBinaryProperty

      public boolean hasBinaryProperty(int c, int which)
    • getType

      public int getType(int c)
    • getIntPropertyValue

      public int getIntPropertyValue(int c, int which)
    • getIntPropertyMaxValue

      public int getIntPropertyMaxValue(int which)
    • getSource

      final int getSource(int which)
    • getMaxValues

      public int getMaxValues(int column)
      Gets the maximum values for some enum/int properties.
      Returns:
      maximum values for the integer properties.
    • getMask

      public static final int getMask(int type)
      Gets the type mask
      Parameters:
      type - character type
      Returns:
      mask
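      As a rough illustration of how such a mask can be used, the sketch below assumes the mask is the single bit at the position of the type value, so that several categories (as in GC_Z_MASK) can be tested with one AND. It relies on java.lang.Character rather than this class; TypeMaskSketch, maskOf and isSeparator are hypothetical names.

```java
// Sketch only: assumes a type mask is the single bit 1 << type.
final class TypeMaskSketch {
    // Hypothetical helper mirroring the idea behind getMask(type).
    static int maskOf(int type) {
        return 1 << type;
    }

    // Combined mask for the three Z (separator) categories, in the spirit of GC_Z_MASK.
    static final int Z_MASK = maskOf(Character.SPACE_SEPARATOR)
            | maskOf(Character.LINE_SEPARATOR)
            | maskOf(Character.PARAGRAPH_SEPARATOR);

    // True if the code point's general category is Zs, Zl, or Zp.
    static boolean isSeparator(int codePoint) {
        return (maskOf(Character.getType(codePoint)) & Z_MASK) != 0;
    }
}
```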
    • getEuropeanDigit

      public static int getEuropeanDigit(int ch)
      Returns the digit values of characters like 'A' - 'Z', normal, half-width and full-width. This method assumes that the other digit characters are checked by the calling method.
      Parameters:
      ch - character to test
      Returns:
      -1 if ch is not a character of the form 'A' - 'Z', otherwise its corresponding digit will be returned.
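      A minimal sketch of such a lookup follows. It assumes, beyond what the description states, that letters map to the digit values 10 to 35 and that lowercase and fullwidth forms are treated like their uppercase ASCII counterparts; EuropeanDigitSketch and europeanDigit are hypothetical names.

```java
// Sketch only: assumes letters map to digit values 10..35, case-insensitively.
final class EuropeanDigitSketch {
    static int europeanDigit(int ch) {
        if (ch >= 'A' && ch <= 'Z') {
            return ch - 'A' + 10;
        }
        if (ch >= 'a' && ch <= 'z') {
            return ch - 'a' + 10;
        }
        // Fullwidth Latin capital letters U+FF21..U+FF3A.
        if (ch >= 0xFF21 && ch <= 0xFF3A) {
            return ch - 0xFF21 + 10;
        }
        // Fullwidth Latin small letters U+FF41..U+FF5A.
        if (ch >= 0xFF41 && ch <= 0xFF5A) {
            return ch - 0xFF41 + 10;
        }
        return -1; // not a Latin letter in any of the handled forms
    }
}
```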
    • digit

      public int digit(int c)
    • getNumericValue

      public int getNumericValue(int c)
    • getUnicodeNumericValue

      public double getUnicodeNumericValue(int c)
    • getNumericTypeValue

      private static final int getNumericTypeValue(int props)
    • ntvGetType

      private static final int ntvGetType(int ntv)
    • mergeScriptCodeOrIndex

      public static final int mergeScriptCodeOrIndex(int scriptX)
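      Judging from the SCRIPT_HIGH_MASK, SCRIPT_HIGH_SHIFT and SCRIPT_LOW_MASK constants, a merge of this kind plausibly joins a high bit field and a low bit field into one contiguous value. The sketch below shows that idea under assumed constant values; ScriptMergeSketch and merge are hypothetical names, and none of the numbers are taken from this page.

```java
// Sketch only: the mask and shift values below are illustrative assumptions.
final class ScriptMergeSketch {
    static final int SCRIPT_HIGH_MASK = 0x00300000; // assumed: two high bits
    static final int SCRIPT_HIGH_SHIFT = 12;        // assumed: moves them to bits 8..9
    static final int SCRIPT_LOW_MASK = 0x000000ff;  // assumed: low eight bits

    // Joins the high and low bit fields into one contiguous code or index.
    static int merge(int scriptX) {
        return ((scriptX & SCRIPT_HIGH_MASK) >> SCRIPT_HIGH_SHIFT)
                | (scriptX & SCRIPT_LOW_MASK);
    }
}
```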
    • addPropertyStarts

      public UnicodeSet addPropertyStarts(UnicodeSet set)
    • upropsvec_addPropertyStarts

      public void upropsvec_addPropertyStarts(UnicodeSet set)
    • ulayout_addPropertyStarts

      static UnicodeSet ulayout_addPropertyStarts(int src, UnicodeSet set)
    • mathCompat_addPropertyStarts

      static void mathCompat_addPropertyStarts(UnicodeSet set)