Module eyekit.text
Defines the TextBlock and InterestArea objects for handling texts.
Classes
class Box
-
Representation of a bounding box, which provides an underlying framework for Character, InterestArea, and TextBlock.
Subclasses
Instance variables
var x : float
-
X-coordinate of the center of the bounding box
var y : float
-
Y-coordinate of the center of the bounding box
var x_tl : float
-
X-coordinate of the top-left corner of the bounding box
var y_tl : float
-
Y-coordinate of the top-left corner of the bounding box
var x_br : float
-
X-coordinate of the bottom-right corner of the bounding box
var y_br : float
-
Y-coordinate of the bottom-right corner of the bounding box
var width : float
-
Width of the bounding box
var height : float
-
Height of the bounding box
var box : tuple
-
The bounding box represented as x_tl, y_tl, width, and height
var center : tuple
-
XY-coordinates of the center of the bounding box
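The instance variables listed for Box are all derivable from the two corner points. The following is a minimal sketch of those relationships, not Eyekit's own implementation: the class name `SketchBox` is hypothetical, and it assumes screen coordinates in which y increases downward, so the top-left corner has the smaller y value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchBox:
    """Hypothetical stand-in for a bounding box; not Eyekit's own class."""

    x_tl: float  # x of the top-left corner
    y_tl: float  # y of the top-left corner
    x_br: float  # x of the bottom-right corner
    y_br: float  # y of the bottom-right corner

    @property
    def width(self) -> float:
        return self.x_br - self.x_tl

    @property
    def height(self) -> float:
        return self.y_br - self.y_tl

    @property
    def x(self) -> float:
        # Center lies halfway between the left and right edges
        return self.x_tl + self.width / 2

    @property
    def y(self) -> float:
        # Center lies halfway between the top and bottom edges
        return self.y_tl + self.height / 2

    @property
    def center(self) -> tuple:
        return (self.x, self.y)

    @property
    def box(self) -> tuple:
        return (self.x_tl, self.y_tl, self.width, self.height)


sketch = SketchBox(x_tl=100.0, y_tl=50.0, x_br=180.0, y_br=70.0)
print(sketch.box)     # (100.0, 50.0, 80.0, 20.0)
print(sketch.center)  # (140.0, 60.0)
```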
class Character (char, x_tl, y_tl, x_br, y_br, baseline)
-
Representation of a single character of text. A Character object is essentially a one-letter string that occupies a position in space and has a bounding box. It is not usually necessary to create Character objects manually; they are created automatically during the instantiation of a TextBlock.
Ancestors
Instance variables
var baseline : float
-
The y position of the character baseline
Inherited members
class InterestArea (chars, id, padding=0)
-
Representation of an interest area: a portion of a TextBlock object that is of potential interest. It is not usually necessary to create InterestArea objects manually; they are created automatically when you slice a TextBlock object or when you iterate over lines, words, characters, ngrams, or zones parsed from the raw text.
Ancestors
Instance variables
var id : str
-
Interest area ID. By default, these IDs have the form 1:5:10, which represents the line number and column indices of the InterestArea in its parent TextBlock. However, IDs can also be changed to any arbitrary string.
var baseline : float
-
The y position of the text baseline
var text : str
-
String representation of the interest area
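As an illustration of the default ID format described for id above, the sketch below splits a string such as 1:5:10 into its three integer fields. The names `ParsedID` and `parse_default_id` are hypothetical helpers, not part of Eyekit, and the documentation does not state whether the indices are zero-based or whether the end column is inclusive, so the sketch makes no claim about either.

```python
from typing import NamedTuple


class ParsedID(NamedTuple):
    """Hypothetical container for the three fields of a default-style ID."""

    line: int
    col_start: int
    col_end: int


def parse_default_id(ia_id: str) -> ParsedID:
    """Split an ID such as '1:5:10' into integers.

    Raises ValueError for IDs that do not follow the default pattern,
    for example a custom string assigned by the user.
    """
    parts = ia_id.split(":")
    if len(parts) != 3:
        raise ValueError(f"not a default-style ID: {ia_id!r}")
    line, col_start, col_end = (int(part) for part in parts)
    return ParsedID(line, col_start, col_end)


parsed = parse_default_id("1:5:10")
print(parsed)  # ParsedID(line=1, col_start=5, col_end=10)
```

Because IDs can be changed to arbitrary strings, a helper like this should only be applied to interest areas whose IDs are known to be in the default form.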
Inherited members
class TextBlock (text: list, position: tuple = None, font_face: str = None, font_size: float = None, line_height: float = None, alphabet: str = None)
-
Representation of a piece of text, which may be a word, sentence, or entire multiline passage.
Initialized with:
-
text
The line of text (string) or lines of text (list of strings). Optionally, these can be marked up with arbitrary interest areas (zones); for example, The quick brown fox jump[ed]{past-suffix} over the lazy dog.
-
position
XY-coordinates of the left edge of the first baseline of the block of text.
-
font_face
Name of a font available on your system. The keywords italic and/or bold can also be included to select the desired style, e.g., Minion Pro bold italic.
-
font_size
Font size in points (at 72dpi). To convert a font size from some other dpi, use font_size_at_72dpi().
-
line_height
Height of a line of text in points. Generally speaking, for single line spacing, the line height is equal to the font size, for double line spacing, the line height is equal to 2 × the font size, etc. By default, the line height is assumed to be the same as the font size (single line spacing). This parameter also effectively determines the height of the bounding boxes around interest areas.
-
alphabet
A string of characters that are to be considered alphabetical, which determines, for example, what is considered a word. By default, Eyekit considers the standard Latin, Greek, and Cyrillic alphabets to be alphabetical, plus the special and accented characters from most European languages. However, if you need support for some other alphabet, or if you want to modify Eyekit's default behavior, you can set an alternative alphabet here. This parameter is case sensitive, so you must supply upper- and lower-case variants. For example, if you wanted to treat apostrophes and hyphens as alphabetical, you might use alphabet="A-Za-z'-". This would allow, for example, "Where's the orang-utan?" to be treated as three words rather than five.
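The effect of the alphabet parameter on word segmentation can be reproduced with the standard library alone. The sketch below assumes that the alphabet string behaves like the body of a regular-expression character class, which is what the A-Za-z'- example suggests; the helper `split_words` is hypothetical and is not how Eyekit is necessarily implemented.

```python
import re


def split_words(text: str, alphabet: str) -> list:
    """Return maximal runs of characters drawn from `alphabet`.

    `alphabet` is used as the body of a regex character class, so ranges
    such as A-Z work and a trailing hyphen is taken literally.
    """
    return re.findall(f"[{alphabet}]+", text)


sentence = "Where's the orang-utan?"
print(split_words(sentence, "A-Za-z"))    # ['Where', 's', 'the', 'orang', 'utan']
print(split_words(sentence, "A-Za-z'-"))  # ["Where's", 'the', 'orang-utan']
```

With apostrophes and hyphens excluded the sentence splits into five words; including them gives the three words mentioned in the parameter description.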
Ancestors
Static methods
def defaults(position: tuple = None, font_face: str = None, font_size: float = None, line_height: float = None, alphabet: str = None)
-
Set default TextBlock parameters. If you plan to create several TextBlocks with the same parameters, it may be useful to set the default parameters at the top of your script or at the start of your session:
    import eyekit
    eyekit.TextBlock.defaults(font_face='Helvetica')
    txt = eyekit.TextBlock('The quick brown fox')
    print(txt.font_face) # 'Helvetica'
Instance variables
var text : str
-
String representation of the text
var position : tuple
-
Position of the TextBlock
var font_face : str
-
Name of the font
var font_size : float
-
Font size in points
var line_height : float
-
Line height in points
var alphabet : str
-
Characters that are considered alphabetical
var n_rows : int
-
Number of rows in the text (i.e. the number of lines)
var n_cols : int
-
Number of columns in the text (i.e. the number of characters in the widest line)
var line_positions : list
-
Y-coordinates of the center of each line of text
var word_centers : list
-
XY-coordinates of the center of each word
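For plain text, the n_rows and n_cols values described above can be approximated directly from the list of lines. This is only a sketch under two assumptions: the lines contain no zone markup, and the widest line is taken to be the one with the most characters. The helper `grid_shape` is hypothetical.

```python
def grid_shape(lines: list) -> tuple:
    """Return (n_rows, n_cols) for a list of plain-text lines."""
    n_rows = len(lines)
    # Longest line by character count; 0 if there are no lines at all
    n_cols = max((len(line) for line in lines), default=0)
    return n_rows, n_cols


lines = [
    "The quick brown fox",
    "jumped over the",
    "lazy dog",
]
print(grid_shape(lines))  # (3, 19)
```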
Methods
def zones(self)
-
Iterate over each marked-up zone as an InterestArea.
def which_zone(self, fixation)
-
Return the marked-up zone that the fixation falls inside as an InterestArea.
def lines(self)
-
Iterate over each line as an InterestArea.
def which_line(self, fixation)
-
Return the line that the fixation falls inside as an InterestArea.
def words(self, pattern=None, line_n=None, add_padding=True)
-
Iterate over each word as an InterestArea. Optionally, you can supply a regex pattern to define what constitutes a word or to pick out specific words. For example, r'\b[Tt]he\b' gives you all occurrences of the word "the", and '[a-z]+ing' gives you all words ending with -ing. add_padding adds half of the width of a space character to the left and right edges of the word's bounding box, so that a fixation that falls on a space between two words will still fall inside the bounding box of one of them.
def which_word(self, fixation, pattern=None, line_n=None, add_padding=True)
-
Return the word that the fixation falls inside as an InterestArea. For the meaning of pattern and add_padding see TextBlock.words().
def characters(self, line_n=None, alphabetical_only=True)
-
Iterate over each character as an InterestArea.
def which_character(self, fixation, line_n=None, alphabetical_only=True)
-
Return the character that the fixation falls inside as an InterestArea.
def ngrams(self, n, line_n=None, alphabetical_only=True)
-
Iterate over each ngram, for given n, as an InterestArea.
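The pattern and add_padding options of words() and which_word() can be illustrated without Eyekit. The sketch below is a simplified model, not the library's implementation: it assumes a single line set in a monospaced font, and `CHAR_WIDTH`, `LEFT_EDGE`, `word_spans`, and `which_word_sketch` are all hypothetical names. Only the horizontal coordinate of the fixation is considered.

```python
import re

CHAR_WIDTH = 10.0  # hypothetical width of every character, in pixels
LEFT_EDGE = 100.0  # hypothetical x-coordinate where the line starts


def word_spans(line, pattern=r"[A-Za-z]+", add_padding=True):
    """Yield (word, x_left, x_right) for each regex match in `line`."""
    # Half the width of a space; in a monospaced font a space is CHAR_WIDTH wide
    padding = CHAR_WIDTH / 2 if add_padding else 0.0
    for match in re.finditer(pattern, line):
        x_left = LEFT_EDGE + match.start() * CHAR_WIDTH - padding
        x_right = LEFT_EDGE + match.end() * CHAR_WIDTH + padding
        yield match.group(), x_left, x_right


def which_word_sketch(line, fixation_x, **kwargs):
    """Return the first word whose horizontal span contains `fixation_x`."""
    for word, x_left, x_right in word_spans(line, **kwargs):
        if x_left <= fixation_x < x_right:
            return word
    return None


line = "The quick brown fox"
# x = 132 lies inside the space between "The" and "quick" (130 to 140).
print(which_word_sketch(line, 132.0))                     # The
print(which_word_sketch(line, 132.0, add_padding=False))  # None
print([w for w, _, _ in word_spans(line, pattern=r"\b[Tt]he\b")])  # ['The']
```

With padding, the gap between two words is shared equally between them, so every fixation on an inter-word space is assigned to a word; without padding the same fixation matches nothing.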
Inherited members