Unicode Information

class jkUnicode.UniInfo(uni=None)

The main Unicode Info object. It gets its Unicode information from the submodules aglfn, uniCase, uniCat, uniDecomposition, uniName, and uniRangesBits, which are generated from the official Unicode data. Tools to download and regenerate the data are in the tools subfolder.

The Unicode Info object is meant to be instantiated once and then reused to get information about different codepoints. Avoid instantiating it repeatedly, because each instantiation is expensive in terms of disk access.

Initialize the Info object with a dummy codepoint or None, for example before a loop. Inside the loop, set the unicode instance variable to each codepoint you want information about. This automatically updates the other instance variables with the corresponding information from the Unicode standard.
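The reuse pattern described above might look like the following minimal sketch. To stay self-contained it does not import jkUnicode: InfoSketch is a hypothetical stand-in built on the standard library's unicodedata module, and it only mimics the documented behaviour of refreshing the other values when unicode is assigned. With the real library, an instance created as jkUnicode.UniInfo(None) would take its place.

```python
import unicodedata


class InfoSketch:
    """Hypothetical stand-in for UniInfo, backed by the stdlib unicodedata."""

    def __init__(self, uni=None):
        self.unicode = uni  # goes through the setter below

    @property
    def unicode(self):
        return self._unicode

    @unicode.setter
    def unicode(self, value):
        # Setting the codepoint refreshes the dependent values.
        self._unicode = value
        if value is None:
            self.char = None
            self.name = None
            self.category_short = None
        else:
            self.char = chr(value)
            self.name = unicodedata.name(self.char, None)
            self.category_short = unicodedata.category(self.char)


# Create the object once, outside the loop ...
info = InfoSketch(None)

# ... then reuse it for every codepoint of interest.
results = []
for codepoint in (0x41, 0xE4, 0x20AC):
    info.unicode = codepoint
    results.append((codepoint, info.name, info.category_short))

for codepoint, name, category in results:
    print(f"U+{codepoint:04X} {name} ({category})")
```

The point of the pattern is that the costly setup happens once, while each assignment to unicode is a cheap lookup.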

Parameters

uni (int) – The codepoint.

property category

The name of the category for the current Unicode value as string.

property category_short

The short name of the category for the current Unicode value as string.

property char

The character for the current Unicode value.

property decomposition_mapping

The decomposition mapping for the current Unicode value as a list of integer codepoints.
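To illustrate what a decomposition mapping as a list of integer codepoints looks like, here is a small sketch based on the standard library's unicodedata module rather than on jkUnicode itself. The helper name decomposition_sketch is hypothetical, and returning an empty list for codepoints without a decomposition is an assumption of this sketch, not documented behaviour of the property.

```python
import unicodedata


def decomposition_sketch(codepoint):
    """Return the decomposition of a codepoint as a list of integers."""
    raw = unicodedata.decomposition(chr(codepoint))  # e.g. "0041 0308"
    # Skip compatibility tags such as "<compat>"; keep the hex codepoints.
    # Assumption: an empty list stands for "no decomposition".
    return [int(part, 16) for part in raw.split() if not part.startswith("<")]


# U+00C4 (A with diaeresis) decomposes into A plus the combining diaeresis.
print(decomposition_sketch(0xC4))
# U+0041 (A) has no decomposition.
print(decomposition_sketch(0x41))
```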

property glyphname

The AGLFN glyph name for the current Unicode value as string.

property lc_mapping

The lowercase mapping for the current Unicode value as integer or None.

property name

The Unicode name for the current Unicode value as string.

property nice_name

The Unicode name for the current Unicode value as string.

property uc_mapping

The uppercase mapping for the current Unicode value as integer or None.

property unicode

The Unicode value as integer. Setting this value will look up and fill the other pieces of information, like category, range, decomposition mapping, and case mapping.

jkUnicode.getUnicodeChar(code)

Return the Unicode character for a Unicode number. This supports “high” unicodes (> 0xffff) even on 32-bit builds.
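As background on why codepoints above 0xffff need special care, the sketch below shows the UTF-16 surrogate pair arithmetic that represents such a codepoint as two 16-bit units. It is not the implementation of getUnicodeChar; the helper surrogate_pair is a hypothetical name used only for illustration. In Python 3, the built-in chr already returns a single character for the whole Unicode range.

```python
def surrogate_pair(code):
    """Split a codepoint above 0xFFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code <= 0x10FFFF:
        raise ValueError("codepoint is outside the supplementary planes")
    offset = code - 0x10000
    high = 0xD800 + (offset >> 10)   # upper 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # lower 10 bits of the offset
    return high, low


code = 0x1F600
char = chr(code)  # Python 3: a single character
high, low = surrogate_pair(code)
print(f"U+{code:X} -> {high:04X} {low:04X}")
```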

Parameters

code (int) – The codepoint.

jkUnicode.get_expanded_glyph_list(unicodes)

“Expand” or annotate a list of unicodes. For each unicode that has a case mapping (UC or LC), the target unicode of that mapping is added to the list. AGLFN glyph names are added as well, so the returned list contains tuples of (unicode, glyphname), sorted by unicode value.

Parameters

unicodes (list) – A list of unicodes (int)
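The expansion described for get_expanded_glyph_list can be sketched roughly as follows. This is not the library's code: GLYPH_NAMES is a hypothetical four-entry excerpt of a glyph name table, case mappings come from Python's own str.upper and str.lower, and both the collapsing of duplicates and the use of None for unknown glyph names are assumptions of this sketch.

```python
# Hypothetical excerpt of a glyph-name table; the real data comes from AGLFN.
GLYPH_NAMES = {0x41: "A", 0x61: "a", 0xC4: "Adieresis", 0xE4: "adieresis"}


def case_partner(codepoint):
    """Return the codepoint of the other case, or None if there is none."""
    char = chr(codepoint)
    for mapped in (char.upper(), char.lower()):
        # Ignore mappings that expand to more than one character.
        if mapped != char and len(mapped) == 1:
            return ord(mapped)
    return None


def expand_sketch(unicodes):
    """Add case partners, attach glyph names, and sort by codepoint."""
    expanded = set(unicodes)  # assumption: duplicates are collapsed
    for codepoint in unicodes:
        partner = case_partner(codepoint)
        if partner is not None:
            expanded.add(partner)
    # Assumption: codepoints without a known glyph name get None.
    return [(cp, GLYPH_NAMES.get(cp)) for cp in sorted(expanded)]


print(expand_sketch([0x61, 0xC4]))
```

Passing a lowercase a and an uppercase A with diaeresis yields four tuples, because each input pulls in its case partner.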