lmcat
A Python tool for concatenating files and directory structures into a single document, perfect for sharing code with language models. It respects .gitignore and .lmignore patterns and provides configurable output formatting.
Features
- Tree view of directory structure with file statistics (lines, characters, tokens)
- Includes file contents with clear delimiters
- Respects .gitignore patterns (can be disabled)
- Supports custom ignore patterns via .lmignore
- Configurable via pyproject.toml, lmcat.toml, or lmcat.json
  - you can specify glob_process or decider_process to run on files, for example to convert a notebook to a markdown file
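The per-file statistics shown in the tree view (lines, characters, tokens) can be illustrated with a minimal sketch. This is not lmcat's own implementation: `FileStats` and `compute_stats` are hypothetical names, and tokens are counted by splitting on whitespace, which is only an assumption about how a whitespace-based tokenizer might behave.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical container for the three statistics shown per file."""

    lines: int
    chars: int
    tokens: int


def compute_stats(text: str) -> FileStats:
    """Count lines, characters, and whitespace-separated tokens in `text`."""
    return FileStats(
        lines=len(text.splitlines()),
        chars=len(text),
        tokens=len(text.split()),  # assumption: one token per whitespace-separated chunk
    )


sample = "def add(a, b):\n    return a + b\n"
stats = compute_stats(sample)
print(stats)
```

A model-specific tokenizer (installed via the optional tokenizers extra) would generally report a different token count than this whitespace approximation.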
Installation
Install from PyPI:
pip install lmcat
Or, to install with support for counting tokens:
pip install lmcat[tokenizers]
Usage
Basic usage - concatenate current directory:
# Only show directory tree
python -m lmcat --tree-only
# Write output to file
python -m lmcat --output summary.md
# Print current configuration
python -m lmcat --print-cfg
The output will include a directory tree and the contents of each non-ignored file.
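To drive lmcat from a script, the invocations above can be assembled programmatically. The following is a minimal sketch, assuming lmcat is installed in the current interpreter's environment; `build_lmcat_command` and `run_lmcat` are hypothetical helpers, and only flags documented in this README are used.

```python
import subprocess
import sys
from typing import List, Optional


def build_lmcat_command(tree_only: bool = False, output: Optional[str] = None) -> List[str]:
    """Assemble a `python -m lmcat` invocation from the documented flags."""
    cmd = [sys.executable, "-m", "lmcat"]
    if tree_only:
        cmd.append("--tree-only")
    if output is not None:
        cmd.extend(["--output", output])
    return cmd


def run_lmcat(cmd: List[str]) -> str:
    """Run the command and return its stdout (requires lmcat to be installed).

    Raises subprocess.CalledProcessError if the process exits with an error.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Building the command has no side effects; nothing is executed here.
print(build_lmcat_command(tree_only=True))
print(build_lmcat_command(output="summary.md"))
```

Calling `run_lmcat(build_lmcat_command(tree_only=True))` would then return the directory tree of the current working directory as a string.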
Command Line Options
- -t, --tree-only: Only print the directory tree, not file contents
- -o, --output: Specify an output file (defaults to stdout)
- -h, --help: Show help message
Configuration
lmcat is best configured via a tool.lmcat section in pyproject.toml:
[tool.lmcat]
# Tree formatting
tree_divider = "│ " # Vertical lines in tree
tree_indent = " " # Indentation
tree_file_divider = "├── " # File/directory entries
content_divider = "``````" # File content delimiters
# Processing pipeline
tokenizer = "gpt2" # or "whitespace-split"
tree_only = false # Only show tree structure
on_multiple_processors = "except" # Behavior when multiple processors match
# File handling
ignore_patterns = ["*.tmp", "*.log"] # Additional patterns to ignore
ignore_patterns_files = [".gitignore", ".lmignore"]
# processors
[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"
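To make the configuration above more concrete, here is a rough sketch of how such a section could be read and applied. It is not lmcat's code: `is_ignored` and `pick_processor` are hypothetical helpers, `fnmatch` only approximates gitignore-style matching, and treating `on_multiple_processors = "except"` as "raise an error when more than one pattern matches" is an assumption based on the option's name.

```python
import fnmatch

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # older interpreters need: pip install tomli
    import tomli as tomllib

PYPROJECT = '''
[tool.lmcat]
tokenizer = "whitespace-split"
on_multiple_processors = "except"
ignore_patterns = ["*.tmp", "*.log"]

[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"
'''

# Extract the [tool.lmcat] table as a plain dict
cfg = tomllib.loads(PYPROJECT)["tool"]["lmcat"]


def is_ignored(name, patterns):
    """Return True if the file name matches any ignore pattern."""
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)


def pick_processor(name, glob_process, on_multiple="except"):
    """Return the processor name whose glob matches `name`, or None.

    Assumed behavior: with on_multiple == "except", several matches are an error;
    otherwise the first match wins.
    """
    matches = [
        processor
        for pattern, processor in glob_process.items()
        if fnmatch.fnmatchcase(name, pattern)
    ]
    if len(matches) > 1 and on_multiple == "except":
        raise ValueError(f"multiple processors match {name!r}: {matches}")
    return matches[0] if matches else None


for filename in ["Makefile", "notes.ipynb", "debug.log", "main.py"]:
    print(
        filename,
        is_ignored(filename, cfg["ignore_patterns"]),
        pick_processor(filename, cfg["glob_process"], cfg["on_multiple_processors"]),
    )
```

Keeping the pattern-to-processor mapping in a table means new file types can be handled by adding one line of configuration rather than changing code.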
Development
Setup
- Clone the repository:
git clone https://github.com/mivanit/lmcat
cd lmcat
- Set up the development environment:
make setup
Development Commands
The project uses make for common development tasks:
- make dep: Install/update dependencies
- make format: Format code using ruff and pycln
- make test: Run tests
- make typing: Run type checks
- make check: Run all checks (format, test, typing)
- make clean: Clean temporary files
- make docs: Generate documentation
- make build: Build the package
- make publish: Publish to PyPI (maintainers only)
Run make help to see all available commands.
Running Tests
make test
For verbose output:
VERBOSE=1 make test
Roadmap
- more processors and deciders, like:
  - only first n lines if file is too large
  - first few lines of a csv file
  - json schema of a big json/toml/yaml file
  - metadata extraction from images
- better tests, I feel like gitignore/lmignore interaction is broken
- llm summarization and caching of those summaries in .lmsummary/
- reasonable defaults for file extensions to ignore
- web interface
"""
.. include:: ../README.md
"""

from lmcat.lmcat import main

__all__ = [
    # funcs
    "main",
    # submodules
    "lmcat",
    "file_stats",
    "processing_pipeline",
    "processors",
]
def main() -> None:
def main() -> None:
    """Main entry point for the script"""
    arg_parser = argparse.ArgumentParser(
        description="lmcat - list tree and content, combining .gitignore + .lmignore",
        add_help=False,
    )
    arg_parser.add_argument(
        "-t",
        "--tree-only",
        action="store_true",
        default=False,
        help="Only print the tree, not the file contents.",
    )
    arg_parser.add_argument(
        "-o",
        "--output",
        action="store",
        default=None,
        help="Output file to write the tree and contents to.",
    )
    arg_parser.add_argument(
        "-h", "--help", action="help", help="Show this help message and exit."
    )
    arg_parser.add_argument(
        "--print-cfg",
        action="store_true",
        default=False,
        help="Print the configuration as json and exit.",
    )
    arg_parser.add_argument(
        "--allow-plugins",
        action="store_true",
        default=False,
        help="Allow plugins to be loaded from the plugins file. WARNING: this will execute arbitrary code found in the file pointed to by `config.plugins_file`, and **is a security risk**.",
    )

    args: argparse.Namespace = arg_parser.parse_known_args()[0]
    root_dir: Path = Path(".").resolve()
    config: LMCatConfig = LMCatConfig.read(root_dir)

    # CLI overrides
    config.output = args.output
    config.tree_only = args.tree_only
    config.allow_plugins = args.allow_plugins

    # print cfg and exit if requested
    if args.print_cfg:
        print(json.dumps(config.serialize(), indent="\t"))
        return

    # assemble summary
    summary: str = assemble_summary(root_dir=root_dir, config=config)

    # Write output
    if config.output:
        output_path: Path = Path(args.output)
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(summary, encoding="utf-8")
    else:
        if sys.platform == "win32":
            sys.stdout = io.TextIOWrapper(
                sys.stdout.buffer, encoding="utf-8", errors="replace"
            )
            sys.stderr = io.TextIOWrapper(
                sys.stderr.buffer, encoding="utf-8", errors="replace"
            )

        print(summary)
Main entry point for the script
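One detail of main is worth illustrating: it calls `parse_known_args()` and keeps only the first element of the result, so flags the parser does not recognize are dropped rather than reported as errors. The sketch below uses a reduced copy of the parser (two of the options only) and an invented flag, `--bogus`, purely for demonstration.

```python
import argparse

# Reduced copy of the parser in main(); only two options are reproduced here
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("-t", "--tree-only", action="store_true", default=False)
parser.add_argument("-o", "--output", action="store", default=None)

# "--bogus" is a made-up flag that the parser does not define
args, unknown = parser.parse_known_args(["--tree-only", "--bogus", "-o", "out.md"])

print(args.tree_only)  # recognized flag is set
print(args.output)     # recognized option keeps its value
print(unknown)         # unrecognized arguments are collected, not rejected
```

With `parse_args()` instead, the same input would make argparse print an error and exit, so a typo in a flag name passed to lmcat may go unnoticed.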