docs for lmcat v0.1.2

lmcat

A Python tool for concatenating files and directory structures into a single document, perfect for sharing code with language models. It respects .gitignore and .lmignore patterns and provides configurable output formatting.

Features

  • Tree view of directory structure with file statistics (lines, characters, tokens)
  • Includes file contents with clear delimiters
  • Respects .gitignore patterns (can be disabled)
  • Supports custom ignore patterns via .lmignore
  • Configurable via pyproject.toml, lmcat.toml, or lmcat.json
    • You can specify glob_process or decider_process entries to run a processor on matching files, for example to convert a notebook to Markdown
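
How glob patterns might be mapped to processors can be sketched as follows. This is a minimal illustration built on the standard-library fnmatch module, not lmcat's actual implementation: the helper name pick_processor is hypothetical, and raising on several matches is only one plausible reading of the on_multiple_processors = "except" setting shown in the configuration example.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Optional

# Pattern -> processor name, copied from the configuration example in this README.
GLOB_PROCESS = {
    "[mM]akefile": "makefile_recipes",
    "*.ipynb": "ipynb_to_md",
}


def pick_processor(path: Path, glob_process: dict[str, str]) -> Optional[str]:
    """Return the processor name whose glob matches the file name, if any.

    Hypothetical helper: raises ValueError when more than one pattern matches.
    """
    matches = [
        name
        for pattern, name in glob_process.items()
        if fnmatchcase(path.name, pattern)
    ]
    if len(matches) > 1:
        raise ValueError(f"multiple processors match {path}: {matches}")
    return matches[0] if matches else None


print(pick_processor(Path("notebooks/demo.ipynb"), GLOB_PROCESS))
print(pick_processor(Path("Makefile"), GLOB_PROCESS))
print(pick_processor(Path("README.md"), GLOB_PROCESS))
```

Matching on the file name alone keeps the sketch short; a real pipeline may match against the full relative path instead.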

Installation

Install from PyPI:

pip install lmcat

Or install with support for counting tokens:

pip install lmcat[tokenizers]
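
The per-file statistics (lines, characters, tokens) can be approximated without the optional tokenizer dependency. The sketch below assumes a whitespace-based token count, in the spirit of the "whitespace-split" tokenizer option named in the configuration. The names SimpleStats and simple_stats are hypothetical and are not part of lmcat's API.

```python
from dataclasses import dataclass


@dataclass
class SimpleStats:
    lines: int
    chars: int
    tokens: int


def simple_stats(text: str) -> SimpleStats:
    """Count lines, characters, and whitespace-separated tokens in a string."""
    return SimpleStats(
        lines=len(text.splitlines()),
        chars=len(text),
        tokens=len(text.split()),
    )


print(simple_stats("def f(x):\n    return x + 1\n"))
```

A model-specific tokenizer such as "gpt2" will generally report a different token count than this whitespace approximation.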

Usage

Basic usage, run from the directory you want to concatenate:

# Print the directory tree and file contents to stdout
python -m lmcat

# Only show directory tree
python -m lmcat --tree-only

# Write output to file
python -m lmcat --output summary.md

# Print current configuration
python -m lmcat --print-cfg

The output will include a directory tree and the contents of each non-ignored file.
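
The general shape of that output, a listing followed by delimited file contents, can be sketched like this. It is a simplified illustration rather than lmcat's implementation: the function sketch_summary is hypothetical, the listing is flat instead of a nested tree, and ignore handling uses plain fnmatch globs on file names, which is less expressive than real .gitignore semantics.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path


def sketch_summary(root: Path, ignore_patterns: list[str], divider: str = "``````") -> str:
    """Build a flat file listing followed by each non-ignored file's contents."""
    files = sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and not any(fnmatch(p.name, pat) for pat in ignore_patterns)
    )
    rel_paths = [p.relative_to(root).as_posix() for p in files]
    parts = [root.name] + [f"├── {rel}" for rel in rel_paths]
    for rel, path in zip(rel_paths, files):
        # Wrap each file's contents in the divider, labelled with its relative path.
        text = path.read_text(encoding="utf-8", errors="replace")
        parts += ["", f"{divider} {rel}", text, divider]
    return "\n".join(parts)


# Demonstrate on a throwaway directory so nothing outside it is read.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "a.py").write_text("print('hi')\n", encoding="utf-8")
    (demo_root / "debug.log").write_text("noise\n", encoding="utf-8")
    print(sketch_summary(demo_root, ignore_patterns=["*.log"]))
```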

Command Line Options

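  • -t, --tree-only: Only print the directory tree, not file contents
  • -o, --output: Write the tree and contents to the given file (defaults to stdout)
  • -h, --help: Show the help message and exit
  • --print-cfg: Print the configuration as JSON and exit
  • --allow-plugins: Allow plugins to be loaded from the file pointed to by config.plugins_file. This executes arbitrary code found in that file and is a security risk.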

Configuration

lmcat is best configured via a tool.lmcat section in pyproject.toml:

[tool.lmcat]
# Tree formatting
tree_divider = "│   "    # Vertical lines in tree
tree_indent = " "        # Indentation
tree_file_divider = "├── "  # File/directory entries
content_divider = "``````"  # File content delimiters

# Processing pipeline
tokenizer = "gpt2"  # or "whitespace-split"
tree_only = false   # Only show tree structure
on_multiple_processors = "except"  # Behavior when multiple processors match

# File handling
ignore_patterns = ["*.tmp", "*.log"]  # Additional patterns to ignore
ignore_patterns_files = [".gitignore", ".lmignore"]

# processors
[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"

Development

Setup

  1. Clone the repository:
git clone https://github.com/mivanit/lmcat
cd lmcat
  2. Set up the development environment:
make setup

Development Commands

The project uses make for common development tasks:

  • make dep: Install/update dependencies
  • make format: Format code using ruff and pycln
  • make test: Run tests
  • make typing: Run type checks
  • make check: Run all checks (format, test, typing)
  • make clean: Clean temporary files
  • make docs: Generate documentation
  • make build: Build the package
  • make publish: Publish to PyPI (maintainers only)

Run make help to see all available commands.

Running Tests

make test

For verbose output:

VERBOSE=1 make test

Roadmap

  • more processors and deciders, like:
    • only first n lines if file is too large
    • first few lines of a csv file
    • json schema of a big json/toml/yaml file
    • metadata extraction from images
  • better tests, I feel like gitignore/lmignore interaction is broken
  • llm summarization and caching of those summaries in .lmsummary/
  • reasonable defaults for file extensions to ignore
  • web interface

"""
.. include:: ../README.md
"""

from lmcat.lmcat import main

__all__ = [
	# funcs
	"main",
	# submodules
	"lmcat",
	"file_stats",
	"processing_pipeline",
	"processors",
]

def main() -> None:
	"""Main entry point for the script"""
	arg_parser = argparse.ArgumentParser(
		description="lmcat - list tree and content, combining .gitignore + .lmignore",
		add_help=False,
	)
	arg_parser.add_argument(
		"-t",
		"--tree-only",
		action="store_true",
		default=False,
		help="Only print the tree, not the file contents.",
	)
	arg_parser.add_argument(
		"-o",
		"--output",
		action="store",
		default=None,
		help="Output file to write the tree and contents to.",
	)
	arg_parser.add_argument(
		"-h", "--help", action="help", help="Show this help message and exit."
	)
	arg_parser.add_argument(
		"--print-cfg",
		action="store_true",
		default=False,
		help="Print the configuration as json and exit.",
	)
	arg_parser.add_argument(
		"--allow-plugins",
		action="store_true",
		default=False,
		help="Allow plugins to be loaded from the plugins file. WARNING: this will execute arbitrary code found in the file pointed to by `config.plugins_file`, and **is a security risk**.",
	)

	args: argparse.Namespace = arg_parser.parse_known_args()[0]
	root_dir: Path = Path(".").resolve()
	config: LMCatConfig = LMCatConfig.read(root_dir)

	# CLI overrides
	config.output = args.output
	config.tree_only = args.tree_only
	config.allow_plugins = args.allow_plugins

	# print cfg and exit if requested
	if args.print_cfg:
		print(json.dumps(config.serialize(), indent="\t"))
		return

	# assemble summary
	summary: str = assemble_summary(root_dir=root_dir, config=config)

	# Write output
	if config.output:
		output_path: Path = Path(args.output)
		output_path.parent.mkdir(parents=True, exist_ok=True)
		output_path.write_text(summary, encoding="utf-8")
	else:
		if sys.platform == "win32":
			sys.stdout = io.TextIOWrapper(
				sys.stdout.buffer, encoding="utf-8", errors="replace"
			)
			sys.stderr = io.TextIOWrapper(
				sys.stderr.buffer, encoding="utf-8", errors="replace"
			)

		print(summary)

Main entry point for the script
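
The listing above takes element [0] of parse_known_args() rather than calling parse_args(). The short sketch below uses only the standard argparse module and a reduced set of the same flags to show the difference: unrecognized arguments are returned separately instead of causing an error exit. The flag --not-a-flag is made up for the demonstration.

```python
import argparse

parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("-t", "--tree-only", action="store_true", default=False)
parser.add_argument("-o", "--output", action="store", default=None)

# parse_known_args returns (namespace, leftover_args) instead of exiting
# when it meets a flag it does not recognize.
args, unknown = parser.parse_known_args(
    ["-t", "--output", "summary.md", "--not-a-flag"]
)
print(args.tree_only, args.output, unknown)
# prints: True summary.md ['--not-a-flag']
```

One consequence of this design is that a mistyped flag is silently ignored rather than reported.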