Unicode Information
- class jkUnicode.UniInfo(uni=None)
The main Unicode Info object. It gets its Unicode information from the submodules aglfn, uniCase, uniCat, uniDecomposition, uniName, and uniRangesBits, which are generated from the official Unicode data. You can find tools to download and regenerate the data in the tools subfolder.
The Unicode Info object is meant to be instantiated once and then reused to get information about different codepoints. Avoid instantiating it often, because each instantiation is expensive in disk access.
Initialize the Info object with a dummy codepoint or None, e.g. before a loop, and then assign the actual codepoints you want information about inside the loop by setting the unicode instance variable. This automatically updates the other instance variables with the correct information from the Unicode standard.
- Parameters
uni (int) – The codepoint.
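The reuse pattern described above can be sketched as follows. To keep the example runnable without the generated data files, it uses `SketchInfo`, a hypothetical stand-in built on the standard library's `unicodedata`. It is not jkUnicode's implementation and only mimics the documented behaviour that assigning to `unicode` refreshes the other attributes. With jkUnicode installed, the same loop would presumably use `UniInfo` in its place.

```python
import unicodedata


class SketchInfo:
    """Hypothetical stand-in that mimics the documented UniInfo interface."""

    def __init__(self, uni=None):
        # Assigning through the property triggers the lookup below.
        self.unicode = uni

    @property
    def unicode(self):
        return self._unicode

    @unicode.setter
    def unicode(self, value):
        # Setting the codepoint refreshes every dependent attribute.
        self._unicode = value
        if value is None:
            self.char = None
            self.name = None
            self.category_short = None
        else:
            self.char = chr(value)
            self.name = unicodedata.name(self.char, None)
            self.category_short = unicodedata.category(self.char)


# Create the object once, before the loop, with None as a dummy codepoint.
info = SketchInfo(None)

results = []
for codepoint in (0x41, 0xE4, 0x20AC):
    # Reuse the same object; only the codepoint changes.
    info.unicode = codepoint
    results.append((codepoint, info.name, info.category_short))
```

Creating the object outside the loop means any expensive setup happens once, while each assignment inside the loop only performs lookups.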
- property category
The name of the category for the current Unicode value as a string.
- property category_short
The short name of the category for the current Unicode value as a string.
- property char
The character for the current Unicode value.
- property decomposition_mapping
The decomposition mapping for the current Unicode value as a list of integer codepoints.
- property glyphname
The AGLFN glyph name for the current Unicode value as a string.
- property lc_mapping
The lowercase mapping for the current Unicode value as an integer, or None.
- property name
The Unicode name for the current Unicode value as a string.
- property nice_name
The Unicode name for the current Unicode value as a string.
- property uc_mapping
The uppercase mapping for the current Unicode value as an integer, or None.
- property unicode
The Unicode value as an integer. Setting this value will look up and fill the other pieces of information, like category, range, decomposition mapping, and case mapping.
- jkUnicode.getUnicodeChar(code)
Return the Unicode character for a Unicode number. This supports “high” unicodes (> 0xffff) even on 32-bit builds.
- Parameters
code (int) – The codepoint
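One way to build a character from a codepoint above 0xffff without relying on how the interpreter stores strings is to round-trip through UTF-32 bytes. The helper below, `sketch_unicode_char`, is a hypothetical illustration of that idea and is not necessarily how `getUnicodeChar` is implemented.

```python
def sketch_unicode_char(code):
    """Return the character for an integer codepoint via UTF-32 decoding."""
    # Four little-endian bytes hold any codepoint up to 0x10FFFF.
    return code.to_bytes(4, "little").decode("utf-32-le")


basic = sketch_unicode_char(0x41)     # codepoint inside the 16-bit range
high = sketch_unicode_char(0x1F600)   # codepoint above 0xffff

# In UTF-16 the high codepoint needs a surrogate pair, i.e. four bytes.
utf16_length = len(high.encode("utf-16-le"))
```

On current Python 3 versions the built-in `chr()` accepts the full codepoint range directly, so a helper like this mainly matters for older interpreters.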
- jkUnicode.get_expanded_glyph_list(unicodes)
“Expand” or annotate a list of unicodes. For unicodes that have a case mapping (UC or LC), the target unicode of the case mapping will be added to the list. AGLFN glyph names are added to the list too, so the returned list contains tuples of (unicode, glyphname), sorted by unicode value.
- Parameters
unicodes (list) – A list of unicodes (int)
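The expansion described above can be approximated with the standard library alone. This is a rough sketch under stated assumptions: `sketch_expand` and the small `glyph_names` table are hypothetical, Python's own `str.upper()` and `str.lower()` stand in for the generated case-mapping data, and a codepoint without a known glyph name is assumed to get None.

```python
def sketch_expand(unicodes, glyph_names):
    """Add case-mapped counterparts, then pair each codepoint with a name."""
    expanded = set(unicodes)
    for codepoint in unicodes:
        char = chr(codepoint)
        for mapped in (char.upper(), char.lower()):
            # Skip mappings that expand to several characters (e.g. ß -> SS).
            if len(mapped) == 1:
                expanded.add(ord(mapped))
    # Codepoints are unique, so sorting the tuples orders by codepoint.
    return sorted((cp, glyph_names.get(cp)) for cp in expanded)


# Tiny hypothetical name table; the real names come from the AGLFN data.
glyph_names = {0x41: "A", 0x42: "B", 0x61: "a", 0x62: "b"}

expanded_list = sketch_expand([0x61, 0x42], glyph_names)
```

Here the input contains only `a` and `B`, and the result also includes their case counterparts `A` and `b`.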