Composable API

LAParams

class pdfminer.layout.LAParams(line_overlap=0.5, char_margin=2.0, line_margin=0.5, word_margin=0.1, boxes_flow=0.5, detect_vertical=False, all_texts=False)

Parameters for layout analysis.

Parameters
  • line_overlap – If two characters have more overlap than this they are considered to be on the same line. The overlap is specified relative to the minimum height of both characters.

  • char_margin – If two characters are closer together than this margin they are considered to be part of the same line. The margin is specified relative to the width of the character.

  • word_margin – If two characters on the same line are further apart than this margin they are considered to be two separate words, and an intermediate space is inserted for readability. The margin is specified relative to the width of the character.

  • line_margin – If two lines are close together they are considered to be part of the same paragraph. The margin is specified relative to the height of a line.

  • boxes_flow – Specifies how much the horizontal and the vertical position of a text box matter when determining the order of text boxes. The value should be within the range of -1.0 (only horizontal position matters) to +1.0 (only vertical position matters).

  • detect_vertical – If vertical text should be considered during layout analysis.

  • all_texts – If layout analysis should be performed on text in figures.

Todo:

  • PDFDevice
    • TextConverter

    • PDFPageAggregator

  • PDFPageInterpreter