docs for lmcat v0.1.2
A Python tool for concatenating files and directory structures into a single document, perfect for sharing code with language models. It respects `.gitignore` and `.lmignore` patterns and provides configurable output formatting.
Features:

- Respects `.gitignore` patterns (can be disabled) as well as additional `.lmignore` patterns
- Configurable via `pyproject.toml`, `lmcat.toml`, or `lmcat.json`
- Custom processing: define `glob_process` or `decider_process` entries to run on files, for example to convert a notebook to a markdown file

Install from PyPI:
```
pip install lmcat
```

or, install with support for counting tokens:

```
pip install lmcat[tokenizers]
```
Basic usage - concatenate the current directory:

```bash
# Basic usage - concatenate current directory
python -m lmcat

# Only show directory tree
python -m lmcat --tree-only

# Write output to file
python -m lmcat --output summary.md

# Print current configuration
python -m lmcat --print-cfg
```
The output will include a directory tree and the contents of each non-ignored file.
Command-line options:

- `-t`, `--tree-only`: Only print the directory tree, not file contents
- `-o`, `--output`: Specify an output file (defaults to stdout)
- `-h`, `--help`: Show help message

lmcat is best configured via a `[tool.lmcat]` section in `pyproject.toml`:
```toml
[tool.lmcat]
# Tree formatting
tree_divider = "│ "         # Vertical lines in tree
tree_indent = " "           # Indentation
tree_file_divider = "├── "  # File/directory entries
content_divider = "``````"  # File content delimiters

# Processing pipeline
tokenizer = "gpt2"                 # or "whitespace-split"
tree_only = false                  # Only show tree structure
on_multiple_processors = "except"  # Behavior when multiple processors match

# File handling
ignore_patterns = ["*.tmp", "*.log"]  # Additional patterns to ignore
ignore_patterns_files = [".gitignore", ".lmignore"]

# Processors
[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"
```
To set up the development environment:

```
git clone https://github.com/mivanit/lmcat
cd lmcat
make setup
```
The project uses `make` for common development tasks:

- `make dep`: Install/update dependencies
- `make format`: Format code using ruff and pycln
- `make test`: Run tests
- `make typing`: Run type checks
- `make check`: Run all checks (format, test, typing)
- `make clean`: Clean temporary files
- `make docs`: Generate documentation
- `make build`: Build the package
- `make publish`: Publish to PyPI (maintainers only)

Run `make help` to see all available commands.
Run tests:

```
make test
```

For verbose output:

```
VERBOSE=1 make test
```
lmcat
`def main() -> None`
Main entry point for the script
lmcat.file_stats
TOKENIZERS_PRESENT: bool = True
class TokenizerWrapper:

Tokenizer wrapper. Stores the tokenizer name and provides an `n_tokens` method. Uses splitting by whitespace (`whitespace-split`) as a fallback.

`TokenizerWrapper(name: str = 'whitespace-split')`
name: str
use_fallback: bool
tokenizer: Optional[tokenizers.Tokenizer]
`def n_tokens(self, text: str) -> int`
Return number of tokens in text
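A minimal usage sketch; the default `whitespace-split` name means no `tokenizers` install is required:

```python
from lmcat.file_stats import TokenizerWrapper

tok = TokenizerWrapper()  # defaults to the 'whitespace-split' fallback
assert tok.name == "whitespace-split"
print(tok.n_tokens("for x in range(10): print(x)"))  # counts whitespace-separated tokens
```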
class FileStats:
Statistics for a single file
`FileStats(lines: int, chars: int, tokens: Optional[int] = None)`
lines: int
chars: int
tokens: Optional[int] = None
```
def from_file(
    cls,
    path: pathlib.Path,
    tokenizer: lmcat.file_stats.TokenizerWrapper,
) -> lmcat.file_stats.FileStats
```

Get statistics for a single file

Parameters:

- `path : Path` - Path to the file to analyze
- `tokenizer : Optional[tokenizers.Tokenizer]` - Tokenizer to use for counting tokens, if any

Returns:

- `FileStats` - Statistics for the file

class TreeEntry(typing.NamedTuple):
Entry in the tree output with optional stats
`TreeEntry(line: str, stats: Optional[lmcat.file_stats.FileStats] = None)`
Create new instance of TreeEntry(line, stats)
line: str
Alias for field number 0
stats: Optional[lmcat.file_stats.FileStats]
Alias for field number 1
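Putting the two together, a sketch of collecting stats for a file and pairing them with a rendered tree line (the file path and tree line here are placeholders):

```python
import pathlib

from lmcat.file_stats import FileStats, TokenizerWrapper, TreeEntry

tok = TokenizerWrapper()  # whitespace fallback, no extra dependencies
stats = FileStats.from_file(pathlib.Path("pyproject.toml"), tokenizer=tok)
print(stats.lines, stats.chars, stats.tokens)

entry = TreeEntry(line="├── pyproject.toml", stats=stats)
```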
lmcat.lmcat
class LMCatConfig(muutils.json_serialize.serializable_dataclass.SerializableDataclass):
Configuration dataclass for lmcat
```
LMCatConfig(
    *,
    content_divider: str = '``````',
    tree_only: bool = False,
    ignore_patterns: list[str] = <factory>,
    ignore_patterns_files: list[pathlib.Path] = <factory>,
    plugins_file: pathlib.Path | None = None,
    allow_plugins: bool = False,
    glob_process: dict[str, str] = <factory>,
    decider_process: dict[str, str] = <factory>,
    on_multiple_processors: Literal['warn', 'except', 'do_first', 'do_last', 'skip'] = 'except',
    tokenizer: str = 'gpt2',
    tree_divider: str = '│ ',
    tree_file_divider: str = '├── ',
    tree_indent: str = ' ',
    output: str | None = None,
)
```
content_divider: str = '``````'
tree_only: bool = False
ignore_patterns: list[str]
ignore_patterns_files: list[pathlib.Path]
plugins_file: pathlib.Path | None = None
allow_plugins: bool = False
glob_process: dict[str, str]
decider_process: dict[str, str]
on_multiple_processors: Literal['warn', 'except', 'do_first', 'do_last', 'skip'] = 'except'
tokenizer: str = 'gpt2'
Tokenizer to use for tokenizing the output. `gpt2` by default; passed to `tokenizers.Tokenizer.from_pretrained()`. If a tokenizer is specified and `tokenizers` is not installed, an exception will be raised. The fallback `whitespace-split` can be used to avoid this exception when `tokenizers` is not installed.
tree_divider: str = '│ '
tree_file_divider: str = '├── '
tree_indent: str = ' '
output: str | None = None
`def get_tokenizer_obj(self) -> lmcat.file_stats.TokenizerWrapper`
Get the tokenizer object
`def get_processing_pipeline(self) -> lmcat.processing_pipeline.ProcessingPipeline`
Get the processing pipeline object
`def read(cls, root_dir: pathlib.Path) -> lmcat.lmcat.LMCatConfig`
Attempt to read config from pyproject.toml, lmcat.toml, or lmcat.json.
`def serialize(self) -> dict[str, typing.Any]`

returns the class as a dict, implemented by using the `@serializable_dataclass` decorator
`def load(cls, data: Union[dict[str, Any], ~T]) -> Type[~T]`

takes in an appropriately structured dict and returns an instance of the class, implemented by using the `@serializable_dataclass` decorator
```
def validate_fields_types(
    self: muutils.json_serialize.serializable_dataclass.SerializableDataclass,
    on_typecheck_error: muutils.errormode.ErrorMode = ErrorMode.Except,
) -> bool
```

validate the types of all the fields on a `SerializableDataclass`. calls `SerializableDataclass__validate_field_type` for each field
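A sketch of loading and inspecting a config, assuming the current directory contains one of the supported config files:

```python
import pathlib

from lmcat.lmcat import LMCatConfig

config = LMCatConfig.read(pathlib.Path("."))  # pyproject.toml, lmcat.toml, or lmcat.json
print(config.tokenizer, config.tree_only)

tok = config.get_tokenizer_obj()  # TokenizerWrapper matching config.tokenizer
```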
class IgnoreHandler:
Handles all ignore pattern matching using igittigitt
`IgnoreHandler(root_dir: pathlib.Path, config: lmcat.lmcat.LMCatConfig)`
root_dir: pathlib.Path
config: lmcat.lmcat.LMCatConfig
parser: igittigitt.igittigitt.IgnoreParser
`def is_ignored(self, path: pathlib.Path) -> bool`
Check if a path should be ignored
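For example, a sketch in which the checked path is a placeholder:

```python
import pathlib

from lmcat.lmcat import IgnoreHandler, LMCatConfig

root = pathlib.Path(".")
config = LMCatConfig.read(root)
handler = IgnoreHandler(root, config)
print(handler.is_ignored(root / "build" / "out.log"))  # True if an ignore pattern matches
```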
`def sorted_entries(directory: pathlib.Path) -> list[pathlib.Path]`
Return directory contents sorted: directories first, then files
```
def walk_dir(
    directory: pathlib.Path,
    ignore_handler: lmcat.lmcat.IgnoreHandler,
    config: lmcat.lmcat.LMCatConfig,
    tokenizer: lmcat.file_stats.TokenizerWrapper,
    prefix: str = '',
) -> tuple[list[lmcat.file_stats.TreeEntry], list[pathlib.Path]]
```
Recursively walk a directory, building tree lines and collecting file paths
```
def format_tree_with_stats(
    entries: list[lmcat.file_stats.TreeEntry],
    show_tokens: bool = False,
) -> list[str]
```

Format tree entries with aligned statistics

Parameters:

- `entries : list[TreeEntry]` - List of tree entries with optional stats
- `show_tokens : bool` - Whether to show token counts

Returns:

- `list[str]` - Formatted tree lines with aligned stats

```
def walk_and_collect(
    root_dir: pathlib.Path,
    config: lmcat.lmcat.LMCatConfig,
) -> tuple[list[str], list[pathlib.Path]]
```

Walk filesystem from root_dir and gather tree listing plus file paths
`def assemble_summary(root_dir: pathlib.Path, config: lmcat.lmcat.LMCatConfig) -> str`

Assemble the summary output and return it
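These two functions form the high-level API; a sketch of producing a summary programmatically rather than via the CLI:

```python
import pathlib

from lmcat.lmcat import LMCatConfig, assemble_summary, walk_and_collect

root = pathlib.Path(".")
config = LMCatConfig.read(root)

tree_lines, file_paths = walk_and_collect(root, config)  # tree listing + files to include
summary = assemble_summary(root, config)                 # tree plus file contents
print(summary)
```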
`def main() -> None`
Main entry point for the script
lmcat.processing_pipeline
OnMultipleProcessors = typing.Literal['warn', 'except', 'do_first', 'do_last', 'skip']
`def load_plugins(plugins_file: pathlib.Path) -> None`

Load plugins from a Python file.

Parameters:

- `plugins_file : Path` - Path to plugins file

class ProcessingPipeline:
Manages the processing pipeline for files.

Attributes:

- `glob_process : dict[str, ProcessorName]` - Maps glob patterns to processor names
- `decider_process : dict[DeciderName, ProcessorName]` - Maps decider names to processor names
- `_compiled_globs : dict[str, re.Pattern]` - Cached compiled glob patterns for performance

```
ProcessingPipeline(
    plugins_file: pathlib.Path | None,
    decider_process_keys: dict[str, str],
    glob_process_keys: dict[str, str],
    on_multiple_processors: Literal['warn', 'except', 'do_first', 'do_last', 'skip'],
)
```
plugins_file: pathlib.Path | None
decider_process_keys: dict[str, str]
glob_process_keys: dict[str, str]
on_multiple_processors: Literal['warn', 'except', 'do_first', 'do_last', 'skip']
`def get_processors_for_path(self, path: pathlib.Path) -> list[typing.Callable[[pathlib.Path], str]]`

Get all applicable processors for a given path.

Parameters:

- `path : Path` - Path to get processors for

Returns:

- `list[ProcessorFunc]` - List of applicable path processors

`def process_file(self, path: pathlib.Path) -> tuple[str, str | None]`

Process a file through the pipeline.

Parameters:

- `path : Path` - Path to process the content of

Returns:

- `tuple[str, str | None]` - Processed content and the processor name; if no processor is found, this will be `(path.read_text(), None)`
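A sketch of running one file through the pipeline built from a config; the notebook path is a placeholder:

```python
import pathlib

from lmcat.lmcat import LMCatConfig

config = LMCatConfig.read(pathlib.Path("."))
pipeline = config.get_processing_pipeline()

# Returns (content, processor_name); falls back to (path.read_text(), None)
# when no glob or decider matches.
content, processor_name = pipeline.process_file(pathlib.Path("notebook.ipynb"))
```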
lmcat.processors
ProcessorName = <class 'str'>
DeciderName = <class 'str'>
ProcessorFunc = typing.Callable[[pathlib.Path], str]
DeciderFunc = typing.Callable[[pathlib.Path], bool]
PROCESSORS: dict[str, typing.Callable[[pathlib.Path], str]] = {'remove_comments': <function remove_comments>, 'compress_whitespace': <function compress_whitespace>, 'to_relative_path': <function to_relative_path>, 'ipynb_to_md': <function ipynb_to_md>, 'makefile_recipes': <function makefile_recipes>, 'csv_preview_5_lines': <function csv_preview_5_lines>}
DECIDERS: dict[str, typing.Callable[[pathlib.Path], bool]] = {'is_over_10kb': <function is_over_10kb>, 'is_documentation': <function is_documentation>}
`def register_processor(func: Callable[[pathlib.Path], str]) -> Callable[[pathlib.Path], str]`

Register a function as a path processor

`def register_decider(func: Callable[[pathlib.Path], bool]) -> Callable[[pathlib.Path], bool]`

Register a function as a decider
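A sketch pairing a custom decider with a custom processor; both names and behaviors are invented for illustration:

```python
import pathlib

from lmcat.processors import register_decider, register_processor


@register_decider
def is_changelog(path: pathlib.Path) -> bool:
    """Hypothetical decider: does the filename look like a changelog?"""
    return path.name.lower().startswith("changelog")


@register_processor
def first_20_lines(path: pathlib.Path) -> str:
    """Hypothetical processor: keep only the first 20 lines of the file."""
    return "\n".join(path.read_text().splitlines()[:20])
```

With these registered, an entry like `"is_changelog" = "first_20_lines"` under `[tool.lmcat.decider_process]` would route matching files through the processor.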
`def is_over_10kb(path: pathlib.Path) -> bool`
Check if file is over 10KB.
`def is_documentation(path: pathlib.Path) -> bool`
Check if file is documentation.
`def remove_comments(path: pathlib.Path) -> str`
Remove single-line comments from code.
`def compress_whitespace(path: pathlib.Path) -> str`
Compress multiple whitespace characters into single spaces.
`def to_relative_path(path: pathlib.Path) -> str`

Return the path to the file as a string
`def ipynb_to_md(path: pathlib.Path) -> str`
Convert an IPython notebook to markdown.
`def makefile_recipes(path: pathlib.Path) -> str`
Process a Makefile to show only target descriptions and basic structure.
Preserves:

- Comments above .PHONY targets up to first empty line
- The .PHONY line and target line
- First line after target if it starts with `@echo`
Parameters:

- `path : Path` - Path to the Makefile to process

Returns:

- `str` - Processed Makefile content

`def csv_preview_5_lines(path: pathlib.Path) -> str`

Preview first few lines of a CSV file (up to 5)
Reads only first 1024 bytes and splits into lines. Does not attempt to parse CSV structure.
Parameters:

- `path : Path` - Path to CSV file

Returns:

- `str` - First few lines of the file
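Processors are plain `Path -> str` callables, so they can also be invoked directly or looked up through the `PROCESSORS` registry; a sketch in which the CSV path is a placeholder:

```python
import pathlib

from lmcat.processors import PROCESSORS, csv_preview_5_lines

preview = csv_preview_5_lines(pathlib.Path("data.csv"))

# Equivalent lookup through the registry:
preview = PROCESSORS["csv_preview_5_lines"](pathlib.Path("data.csv"))
```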