Name | Chatterlang Name | Description |
---|---|---|
talkpipe.app.apiendpoint | ||
JSONReceiverSegment | jsonReceiver | Segment for receiving JSON data via FastAPI with configurable form |
talkpipe.chatterlang.compiler | ||
Accum | accum | Accumulates items from the input stream both in an internal buffer and in the specified variable. |
Snippet | snippet | A segment that loads a chatterlang script from a file and compiles it, after which it functions as a normal segment that can be integrated into a pipeline. |
talkpipe.data.email | ||
readEmail | readEmail | A source that monitors an email inbox and yields new unread emails. |
sendEmail | sendEmail | Send emails for each item in the input iterable using SMTP. |
talkpipe.data.extraction | ||
FileExtractor | extract | A class for extracting text content from different file types. |
readdocx | readdocx | Read and extract text from Microsoft Word (.docx) files. |
readtxt | readtxt | Reads text files from given file paths and yields their contents. |
talkpipe.data.html | ||
downloadURLSegment | downloadURL | Download a URL segment and return its content. |
htmlToTextSegment | htmlToText | Converts HTML content to text segment. |
talkpipe.data.mongo | ||
MongoInsert | mongoInsert | Insert items from the input stream into a MongoDB collection. |
MongoSearch | mongoSearch | Search a MongoDB collection and yield results. |
talkpipe.data.rss | ||
rss_source | rss | Generator function that monitors and yields new entries from an RSS feed. |
talkpipe.llm.chat | ||
LlmExtractTerms | llmExtractTerms | For each piece of text read from the input stream, extract terms from the text. |
LLMPrompt | llmPrompt | Interactive, optionally multi-turn, chat with an llm. |
LlmScore | llmScore | For each piece of text read from the input stream, compute a score and an explanation for that score. |
talkpipe.llm.embedding | ||
LLMEmbed | llmEmbed | Read strings from the input stream and emit an embedding for each string using a language model. |
talkpipe.operations.filtering | ||
distinctBloomFilter | distinctBloomFilter | Filter items using a Bloom Filter to yield only distinct elements based on specified fields. |
talkpipe.operations.matrices | ||
ReduceTSNE | reduceTSNE | Use t-SNE to reduce dimensionality of provided matrix. |
ReduceUMAP | reduceUMAP | Use UMAP to reduce dimensionality of provided matrix. |
talkpipe.operations.signatures | ||
SignSegment | sign | Sign items using a private key. |
VerifySegment | verify | Verify signatures on items using a public key. |
talkpipe.operations.thread_ops | ||
threadedSegment | threaded | Links the input stream to a threaded queue system. |
talkpipe.operations.transforms | ||
fill_null | fillNull | Fills null (None) values in a sequence of dictionaries with specified defaults. |
MakeLists | makeLists | |
regex_replace | regexReplace | Transform items by applying regex pattern replacement. |
talkpipe.pipe.basic | ||
appendAs | appendAs | Appends the specified fields to the input item. |
call_func | call_func | Call a function on each item in the input stream. |
Cast | cast | Casts the input data to a specified type. |
concat | concat | Concatenates specified fields from each item with a delimiter. |
ConfigureLogger | configureLogger | Configures loggers based on the provided logger levels and files. |
DescribeData | describe | Returns a dictionary of all attributes of the input data. |
EvalExpression | lambda | Evaluate a Python expression on each item in the input stream. |
everyN | everyN | Yields every nth item from the input stream. |
exec | exec | Executes a command and yields each line passed to stdout as an item. |
fillTemplate | fillTemplate | Fill a template string with values from the input item. |
FilterExpression | lambdaFilter | Filter items from the input stream based on a Python expression. |
First | firstN | Passes on the first N items from the input stream. |
flatten | flatten | Flattens a nested list of items. |
Hash | hash | Hashes the input data using the specified algorithm. |
isIn | isIn | Filters items based on whether a field contains a specified value. |
isNotIn | isNotIn | Filters items based on whether a field does not contain a specified value. |
slice | slice | Slices a sequence using start and end indices. |
ToDataFrame | toDataFrame | Drain all items from the input stream and emit a single DataFrame. |
ToDict | toDict | Creates a dictionary from the input data. |
ToList | toList | Drains the input stream and emits a list of all items. |
talkpipe.pipe.io | ||
dumpsJsonl | dumpsJsonl | Drains the input stream and dumps each item as a jsonl string. |
echo | echo | A source that generates input from a string. |
loadsJsonl | loadsJsonl | Reads each item from the input stream, interpreting it as a jsonl string. |
Log | log | An operation that logs each item from the input stream. |
| | print | An operation that prints and passes on each item from the input stream. |
Prompt | prompt | A source that generates input from a prompt. |
readJsonl | readJsonl | Reads each item from the input stream as a path to a jsonl file. Loads each line of each file as a json object and yields each individually. |
writePickle | writePickle | Drains the input stream into a list and then writes the list as a pickle file. |
talkpipe.pipe.math | ||
arange | range | Generate a range of integers between lower (inclusive) and upper (exclusive). |
eq | eq | Filter items where a specified field's value equals a number. |
gt | gt | Filter items where a specified field's value is greater than a number. |
gte | gte | Filter items where a specified field's value is greater than or equal to a number. |
lt | lt | Filters items based on a field value being less than a specified number. |
lte | lte | Filter items where a specified field's value is less than or equal to a number. |
neq | neq | Filter items where a specified field's value does not equal a number. |
randomInts | randomInts | Generate n random integers between lower and upper. |
scale | scale | Scale each item in the input stream by the multiplier. |
Chatterlang Name: jsonReceiver
Segment for receiving JSON data via FastAPI with configurable form
Base Classes: AbstractSource
Chatterlang Name: accum
Accumulates items from the input stream both in an internal buffer and in the specified variable. This is useful for accumulating the results of running the pipeline multiple times. Args: variable (Union[VariableName, str], optional): The name of the variable to store the accumulated data in. Defaults to None. reset (bool, optional): Whether to reset the accumulator each time the segment is run. Defaults to True.
Base Classes: io.AbstractSegment
Chatterlang Name: snippet
A segment that loads a chatterlang script from a file and compiles it, after which it functions as a normal segment that can be integrated into a pipeline. Args: file (str): The path to the chatterlang script file. runtime (RuntimeComponent, optional): The runtime component to use. Defaults to None.
Base Classes: io.AbstractSegment
Chatterlang Name: readEmail
A source that monitors an email inbox and yields new unread emails. This source periodically checks for new unread emails, marks them as read, and yields their content and metadata. It connects using IMAP and can be configured to poll at specific intervals. Args: poll_interval_minutes (int, optional): Minutes between email checks. Defaults to 10. folder (str, optional): Mailbox folder to check. Defaults to 'INBOX'. mark_as_read (bool, optional): Whether to mark emails as read. Defaults to True. limit (int, optional): Maximum number of emails to fetch per check. Defaults to 100. If -1, fetch all. imap_server (str, optional): IMAP server address. If None, uses config. email_address (str, optional): Email address. If None, uses config. password (str, optional): Password. If None, uses config. Yields: dict: Email metadata and content including: - message_id: Unique message ID - subject: Email subject - from: Sender address - to: Recipient address(es) - cc: CC address(es) - date: Datetime object of when email was sent - date_str: Date string from email header - plain_text: Plain text content if available - html_content: HTML content if available - headers: Dictionary of all email headers - raw_email: Full raw email content
Chatterlang Name: sendEmail
Send emails for each item in the input iterable using SMTP. This function processes a list of items and sends an email for each one, using the specified fields for subject and body content. It supports both HTML and plain text email formats. Args: subject_field (str): Field name in the item to use as email subject body_fields (list[str]): List of field names to include in email body sender_email (str, optional): Sender's email address. If None, uses config value recipient_email (str, optional): Recipient's email address. If None, uses config value smtp_server (str, optional): SMTP server address. Defaults to 'smtp.gmail.com' port (int, optional): SMTP server port. Defaults to 587 Yields: item: Returns each processed item after sending its corresponding email Raises: AssertionError: If subject_field or body_fields are None ValueError: If required fields are missing in items Example: >>> items = [{'title': 'Hello', 'content': 'World'}] >>> for item in sendEmail(items, 'title', ['content'], 'sender@email.com', 'recipient@email.com'): ... print(f"Processed {item}") Notes: - Requires valid SMTP credentials in config - Supports HTML formatting in email body - Uses TLS encryption for email transmission
Chatterlang Name: extract
A class for extracting text content from different file types. This class implements the AbstractSegment interface and provides functionality to extract text content from various file formats using registered extractors. It supports multiple file formats and can be extended with additional extractors. Attributes: _extractors (dict): A dictionary mapping file extensions to their corresponding extractor functions. Methods: register_extractor(file_extension: str, extractor): Register a new file extractor for a specific extension. extract(file_path: Union[str, PosixPath]): Extract content from a single file. transform(input_iter): Transform an iterator of file paths into an iterator of their contents. Example: >>> extractor = FileExtractor() >>> content = extractor.extract("document.txt") >>> for text in extractor.transform(["file1.txt", "file2.docx"]): ... print(text) Raises: Exception: When trying to extract content from a file with an unsupported extension.
Base Classes: AbstractSegment
Chatterlang Name: readdocx
Read and extract text from Microsoft Word (.docx) files. This function takes an iterable of file paths to .docx documents and yields the extracted text content from each document, with paragraphs joined by spaces. Yields: str: The full text content of each document with paragraphs joined by spaces Raises: Exception: If there are issues reading the .docx files Example: >>> paths = ['doc1.docx', 'doc2.docx'] >>> for text in readdocx(paths): ... print(text)
Chatterlang Name: readtxt
Reads text files from given file paths and yields their contents. Args: file_paths (Iterable[str]): An iterable containing paths to text files to be read. Yields: str: The contents of each text file. Raises: FileNotFoundError: If a file path does not exist. IOError: If there is an error reading any of the files. Example: >>> files = ['file1.txt', 'file2.txt'] >>> for content in readtxt(files): ... print(content)
Chatterlang Name: downloadURL
Download a URL segment and return its content. This function is a wrapper around downloadURL that specifically handles URL segments. It attempts to download content from the specified URL with configurable error handling and timeout settings. Args: fail_on_error (bool, optional): If True, raises exceptions on download errors. If False, returns None on errors. Defaults to True. timeout (int, optional): The timeout in seconds for the download request. Defaults to 10 seconds. Returns: bytes|None: The downloaded content as bytes if successful, None if fail_on_error is False and an error occurs. Raises: Various exceptions from downloadURL function when fail_on_error is True and an error occurs during download.
Chatterlang Name: htmlToText
Converts HTML content to text segment. This function takes HTML content and converts it to plain text format. If cleanText is enabled, the resulting text will also be cleaned so it tries to retain only the main body content. Args: raw (str): The raw HTML content to be converted cleanText (bool, optional): Whether to clean and normalize the output text. Defaults to True. field (str): The field name to be used for the segment. If None, assumes the incoming item is html. append_as (str): The name of the field to append the text to. If None, just pass on the cleaned text. Returns: str: The extracted text content from the HTML See Also: htmlToText: The underlying function used for HTML to text conversion
Chatterlang Name: mongoInsert
Insert items from the input stream into a MongoDB collection. For each item received, this segment inserts it into the specified MongoDB collection and then yields the item back to the pipeline. This allows for both persisting data and continuing to process it in subsequent pipeline stages. Args: connection_string (str, optional): MongoDB connection string. If not provided, will attempt to get from config using the key "mongo_connection_string". database (str): Name of the MongoDB database to use. collection (str): Name of the MongoDB collection to use. field (str, optional): Field to extract from each item for insertion. If not provided, inserts the entire item. Default is "_". fields (str, optional): Comma-separated list of fields to extract and include in the document, in the format "field1:name1,field2:name2". If provided, this creates a new document with the specified fields. Cannot be used with 'field' parameter. append_as (str, optional): If provided, adds the MongoDB insertion result to the item using this field name. Default is None. create_index (str, optional): If provided, creates an index on this field. Default is None. unique_index (bool, optional): If True and create_index is provided, creates a unique index. Default is False.
Base Classes: core.AbstractSegment
Chatterlang Name: mongoSearch
Search a MongoDB collection and yield results. This segment performs a query against a MongoDB collection and yields the matching documents one by one as they are returned from the database. Args: field (str): The field in the incoming item to use as a query. Default is "_". connection_string (str, optional): MongoDB connection string. If not provided, will attempt to get from config using the key "mongo_connection_string". database (str): Name of the MongoDB database to use. collection (str): Name of the MongoDB collection to use. project (str, optional): JSON string defining the projection for returned documents. Default is None (returns all fields). sort (str, optional): JSON string defining the sort order. Default is None. limit (int, optional): Maximum number of results to return per query. Default is 0 (no limit). skip (int, optional): Number of documents to skip. Default is 0. append_as (str, optional): If provided, adds the MongoDB results to the incoming item using this field name. If not provided, the results themselves are yielded. as_list (bool, optional): If True and append_as is provided, all results are collected into a list and appended to the incoming item. Default is False.
Base Classes: core.AbstractSegment
Chatterlang Name: rss
Generator function that monitors and yields new entries from an RSS feed. This function continuously monitors an RSS feed at the specified URL and yields new entries as they become available. It uses a SQLite database to keep track of previously seen entries to avoid duplicates. Args: url (str): The URL of the RSS feed to monitor. If None, the URL is read from the config using the key "RSS_URL" db_path (str, optional): Path to the SQLite database file for storing entry history. Defaults to ':memory:' for an in-memory database. poll_interval_minutes (int, optional): Number of minutes to wait between polling the RSS feed for updates. Defaults to 10 minutes. Yields: dict: New entries from the RSS feed, containing feed item data. Example: >>> for entry in rss_source("http://example.com/feed.xml"): ... print(entry["title"])
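The duplicate tracking described above can be illustrated with the standard `sqlite3` module. This is a minimal sketch, not the `rss_source` implementation: the helper name `new_entries`, the `seen` table, and the use of an `id` key on each entry are all assumptions made for illustration, and feed downloading and the polling loop are left out.

```python
import sqlite3
from typing import Any, Dict, Iterable, Iterator


def new_entries(conn: sqlite3.Connection,
                entries: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield only entries whose 'id' has not been recorded in the database yet."""
    # Hypothetical schema: one row per entry id that has already been yielded.
    conn.execute("CREATE TABLE IF NOT EXISTS seen (entry_id TEXT PRIMARY KEY)")
    for entry in entries:
        row = conn.execute(
            "SELECT 1 FROM seen WHERE entry_id = ?", (entry["id"],)
        ).fetchone()
        if row is None:
            conn.execute("INSERT INTO seen (entry_id) VALUES (?)", (entry["id"],))
            conn.commit()
            yield entry


# ':memory:' keeps the history only for the lifetime of this connection.
conn = sqlite3.connect(":memory:")
first_poll = list(new_entries(conn, [{"id": "a"}, {"id": "b"}]))
second_poll = list(new_entries(conn, [{"id": "b"}, {"id": "c"}]))
print([entry["id"] for entry in second_poll])
```

With a file path instead of `':memory:'`, the history would survive restarts, which is presumably the reason `db_path` is configurable.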
Chatterlang Name: llmExtractTerms
For each piece of text read from the input stream, extract terms from the text. The system prompt must be provided and should explain the nature of the terms. For example, a system_prompt might be: "Extract keywords from the following text." See the LLMPrompt segment for more information on the other arguments.
Base Classes: AbstractLLMGuidedGeneration
Chatterlang Name: llmPrompt
Interactive, optionally multi-turn, chat with an llm. Reads prompts from the input stream and emits responses from the llm. The model name and source can be specified in three different ways. If explicitly included in the constructor, those values will be used. If not, the values will be loaded from environment variables (TALKPIPE_default_model_name and TALKPIPE_default_source). If those are not set, the values will be loaded from the configuration file (~/.talkpipe.toml). If none of those are set, an error will be raised. Args: name (str, optional): The name of the model to chat with. Defaults to None. source (ModelSource, optional): The source of the model. Defaults to None. Valid values are "openai" and "ollama." system_prompt (str, optional): The system prompt for the model. Defaults to "You are a helpful assistant.". multi_turn (bool, optional): Whether the chat is multi-turn. Defaults to True. pass_prompts (bool, optional): Whether to pass the prompts through to the output. Defaults to False. field (str, optional): The field in the input item containing the prompt. Defaults to None. append_as (str, optional): The field to append the response to. Defaults to None. temperature (float, optional): The temperature to use for the model. Defaults to 0.5. output_format (BaseModel, optional): A class used for guided generation. Defaults to None.
Base Classes: AbstractSegment
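The lookup order for the model name and source (explicit argument, then environment variable, then configuration file, then an error) can be sketched as follows. The helper `resolve_setting` is hypothetical and is not part of TalkPipe; the configuration file is represented by a plain dictionary rather than a parsed `~/.talkpipe.toml`, and the config key names are assumptions. Only the environment variable names come from the description above.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    config: Mapping[str, str],
    config_key: str,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the first value found: explicit argument, environment, then config."""
    if explicit is not None:
        return explicit
    if env_var in environ:
        return environ[env_var]
    if config_key in config:
        return config[config_key]
    raise ValueError(
        f"No value for {config_key!r}: pass it explicitly, set {env_var}, "
        "or add it to the configuration file."
    )


# Hypothetical stand-ins for the process environment and the config file.
fake_env = {"TALKPIPE_default_source": "ollama"}
fake_config = {"default_model_name": "some-model", "default_source": "openai"}

name = resolve_setting(None, "TALKPIPE_default_model_name", fake_config,
                       "default_model_name", environ=fake_env)
source = resolve_setting(None, "TALKPIPE_default_source", fake_config,
                         "default_source", environ=fake_env)
print(name, source)
```

Here the model name falls through to the config value, while the source is taken from the environment because it takes precedence over the config file.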
Chatterlang Name: llmScore
For each piece of text read from the input stream, compute a score and an explanation for that score. The system prompt must be provided and should explain the range of the score (which must be a range of integers) and the meaning of the score. For example, a system_prompt might be: "Score the following text according to how relevant it is to canines, where 0 means unrelated and 10 means highly related." See the LLMPrompt segment for more information on the other arguments.
Base Classes: AbstractLLMGuidedGeneration
Chatterlang Name: llmEmbed
Read strings from the input stream and emit an embedding for each string using a language model. This segment creates vector embeddings from text using the specified embedding model. It can extract text from a specific field in structured data or process the input directly. Attributes: embedder: The embedding adapter instance that performs the actual embedding. field: Optional field name to extract text from structured input. append_as: Optional field name to append embeddings to the original item.
Base Classes: AbstractSegment
Chatterlang Name: distinctBloomFilter
Filter items using a Bloom Filter to yield only distinct elements based on specified fields. A Bloom Filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not. Args: items (iterable): Input items to filter. capacity (int): Expected number of items to be added to the Bloom Filter. error_rate (float): Acceptable false positive probability (between 0 and 1). field_list (str, optional): Dot-separated string of nested fields to use for distinctness check. Defaults to "_" which uses the entire item. Yields: item: Items that have not been seen before according to the Bloom Filter. Example: >>> items = [{"id": 1, "name": "John"}, {"id": 2, "name": "John"}] >>> list(distinctBloomFilter(items, 1000, 0.01, "name")) [{'id': 1, 'name': 'John'}] # Only first item with name "John" is yielded Note: Due to the probabilistic nature of Bloom Filters, there is a small chance of false positives (items incorrectly identified as duplicates) based on the specified error_rate.
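To make the mechanism concrete, here is a small, self-contained Bloom filter built on `hashlib`. It is a sketch under stated assumptions and not the library's implementation: `TinyBloomFilter` and `distinct` are hypothetical names, the sizing uses the textbook formulas, and a single dictionary key stands in for the dot-separated `field_list`.

```python
import hashlib
import math
from typing import Any, Dict, Iterable, Iterator


class TinyBloomFilter:
    def __init__(self, capacity: int, error_rate: float) -> None:
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value: Any) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        data = repr(value).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(seed.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_if_new(self, value: Any) -> bool:
        """Set the value's bits; return True if any of them was previously unset."""
        is_new = False
        for pos in self._positions(value):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                is_new = True
                self.bits[byte] |= mask
        return is_new


def distinct(items: Iterable[Dict[str, Any]], capacity: int,
             error_rate: float, key: str) -> Iterator[Dict[str, Any]]:
    """Yield items whose value under `key` has (probably) not been seen before."""
    bloom = TinyBloomFilter(capacity, error_rate)
    for item in items:
        if bloom.add_if_new(item[key]):
            yield item


people = [{"id": 1, "name": "John"}, {"id": 2, "name": "John"}]
print(list(distinct(people, 1000, 0.01, "name")))
```

A repeated value always maps to bits that are already set, so it is never yielded twice. A genuinely new value can be dropped only if all of its bits happen to be set by earlier values, which is the false positive case the note above describes.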
Chatterlang Name: reduceTSNE
Use t-SNE to reduce dimensionality of provided matrix. This segment reduces the dimensionality of the provided matrix using t-SNE (t-Distributed Stochastic Neighbor Embedding). Parameters: n_components: The dimension of the space to embed into. Default is 2. perplexity: The perplexity is related to the number of nearest neighbors used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Default is 30. early_exaggeration: Controls how tight natural clusters in the original space are in the embedded space. Default is 12.0. learning_rate: The learning rate for t-SNE. Default is 200.0. max_iter: Maximum number of iterations for the optimization. Default is 1000. metric: Distance metric for t-SNE. Default is 'euclidean'. random_state: Random state for reproducibility. **tsne_kwargs: Additional keyword arguments to pass to TSNE.
Base Classes: AbstractSegment
Chatterlang Name: reduceUMAP
Use UMAP to reduce dimensionality of provided matrix. This segment reduces the dimensionality of the provided matrix using UMAP. Parameters: n_components: The dimension of the space to embed into. Default is 2. n_neighbors: Size of local neighborhood. Default is 15. min_dist: Minimum distance between embedded points. Default is 0.1. metric: Distance metric for UMAP. Default is 'euclidean'. random_state: Random state for reproducibility. **umap_kwargs: Additional keyword arguments to pass to UMAP.
Base Classes: AbstractSegment
Chatterlang Name: sign
Sign items using a private key. This segment signs each item in the input stream using RSA-PSS with SHA-256.
Base Classes: core.AbstractSegment
Chatterlang Name: verify
Verify signatures on items using a public key. This segment verifies the signature on each item in the input stream using RSA-PSS with SHA-256.
Base Classes: core.AbstractSegment
Chatterlang Name: threaded
Links the input stream to a threaded queue system. This segment takes an input stream and links it to a threaded queue system. It starts the queue system and then starts yielding from the queue. That way the upstream units don't have to wait for the downstream segments to draw from them.
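The decoupling described above can be sketched with the standard `queue` and `threading` modules. This is an illustrative sketch only: the function name `threaded`, the `maxsize` parameter, and the sentinel object are assumptions, and the real segment may handle errors and shutdown differently.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the upstream iterable


def threaded(items: Iterable[T], maxsize: int = 0) -> Iterator[T]:
    """Consume `items` on a background thread and yield them from a queue."""
    buffer: "queue.Queue" = queue.Queue(maxsize=maxsize)

    def producer() -> None:
        try:
            for item in items:
                buffer.put(item)  # blocks only if a bounded queue is full
        finally:
            buffer.put(_DONE)  # always signal completion, even after an error

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


print(list(threaded(range(5))))
```

Because the producer thread pulls from upstream as fast as it can, a slow downstream consumer no longer stalls upstream work. Note that in this sketch an exception raised upstream ends the stream early rather than propagating to the consumer.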
Chatterlang Name: fillNull
Fills null (None) values in a sequence of dictionaries with specified defaults. This generator function processes dictionaries by replacing None values with either a general default value or specific values for named fields. Args: items: An iterable of dictionaries to process. default (str, optional): The default value to use for any None values not specified in kwargs. Defaults to ''. **kwargs: Field-specific default values. Each keyword argument specifies a field name and the default value to use for that field. Yields: dict: The processed dictionary with None values replaced by defaults. Raises: AssertionError: If any item in the input is not a dictionary. TypeError: If any item doesn't support item assignment using square brackets. Examples: >>> data = [{'a': None, 'b': 1}, {'a': 2, 'b': None}] >>> list(fill_null(data, default='N/A')) [{'a': 'N/A', 'b': 1}, {'a': 2, 'b': 'N/A'}] >>> list(fill_null(data, b='EMPTY')) [{'a': None, 'b': 1}, {'a': 2, 'b': 'EMPTY'}]
Chatterlang Name: makeLists
Base Classes: AbstractSegment
Chatterlang Name: regexReplace
Transform items by applying regex pattern replacement. This segment transforms items by applying a regex pattern replacement to either the entire item (if field="_") or a specific field of the item. Args: items (Iterable): Input items to transform. pattern (str): Regular expression pattern to match. replacement (str): Replacement string for matched patterns. field (str, optional): Field to apply transformation to. Use "_" for entire item. Defaults to "_". Yields: Union[str, dict]: Transformed items. Returns string if field="_", otherwise returns modified item dict. Raises: TypeError: If extracted value is not a string or if item is not subscriptable when field != "_". Examples: >>> list(regex_replace(["hello world"], r"world", "everyone")) ['hello everyone'] >>> list(regex_replace([{"text": "hello world"}], r"world", "everyone", field="text")) [{'text': 'hello everyone'}]
Chatterlang Name: appendAs
Appends the specified fields to the input item. Equivalent to toDict except that the item is modified with the new key/value pairs rather than a new dictionary being returned. Assumes that the input item can have items assigned using bracket notation ([]).
Chatterlang Name: call_func
Call a function on each item in the input stream. Args: func (callable): The function to call on each item
Chatterlang Name: cast
Casts the input data to a specified type. The type can be specified by passing a type object or a string representation of the type. The cast will optionally fail silently if the data cannot be cast to the specified type. This lets this segment also be used as a filter to remove data that cannot be cast. The cast occurs by calling the type object on the data.
Base Classes: AbstractSegment
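The "cast or silently drop" behaviour can be sketched in a few lines. The function name `cast_items` and the parameter name `fail_silently` are assumptions for illustration, and this sketch accepts only a callable type object, not a string naming the type.

```python
from typing import Any, Callable, Iterable, Iterator


def cast_items(items: Iterable[Any], cast_type: Callable[[Any], Any],
               fail_silently: bool = True) -> Iterator[Any]:
    """Yield cast_type(item) for each item, dropping items that cannot be cast."""
    for item in items:
        try:
            yield cast_type(item)  # the cast is just a call to the type object
        except (TypeError, ValueError):
            if not fail_silently:
                raise
            # Otherwise skip the item, which makes the cast act as a filter.


print(list(cast_items(["1", "x", "3"], int)))
```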
Chatterlang Name: concat
Concatenates specified fields from each item with a delimiter. Args: items: Iterable of input items to process fields: String specifying fields to extract and concatenate delimiter (str, optional): String to insert between concatenated fields. Defaults to " " append_as (str, optional): If specified, adds concatenated result as new field with this name. Defaults to None. Yields: If append_as is specified, yields the original item with concatenated result added as new field. Otherwise, yields just the concatenated string.
Chatterlang Name: configureLogger
Configures loggers based on the provided logger levels and files. This segment configures loggers based on the provided logger levels and files. The logger levels are specified as a string in the format "logger:level,logger:level,...". The logger files are specified as a string in the format "logger:file,logger:file,...". It configures when the script is compiled or the object is instantiated and never again after that. It passes the input data through unchanged. Args: logger_levels (str): Logger levels in format 'logger:level,logger:level,...' logger_files (str): Logger files in format 'logger:file,logger:file,...'
Base Classes: AbstractSegment
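The `logger:level,logger:level,...` format can be parsed and applied with the standard `logging` module roughly as follows. The helper names `parse_pairs` and `configure_loggers` are hypothetical, and the sketch splits each pair on the first colon only, so paths that themselves contain a comma would need extra care.

```python
import logging
from typing import Dict


def parse_pairs(spec: str) -> Dict[str, str]:
    """Turn 'a:x,b:y' into {'a': 'x', 'b': 'y'}, ignoring empty chunks."""
    pairs: Dict[str, str] = {}
    for chunk in spec.split(","):
        if chunk.strip():
            name, _, value = chunk.partition(":")
            pairs[name.strip()] = value.strip()
    return pairs


def configure_loggers(logger_levels: str = "", logger_files: str = "") -> None:
    """Apply level and file settings to the named loggers."""
    for name, level in parse_pairs(logger_levels).items():
        logging.getLogger(name).setLevel(level.upper())  # accepts names like 'DEBUG'
    for name, path in parse_pairs(logger_files).items():
        logging.getLogger(name).addHandler(logging.FileHandler(path))


# The logger names below are placeholders chosen for this example.
configure_loggers(logger_levels="sketch.a:DEBUG, sketch.b:warning")
print(logging.getLogger("sketch.a").level == logging.DEBUG)
```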
Chatterlang Name: describe
Returns a dictionary of all attributes of the input data. This is useful mostly for debugging and understanding the structure of the data.
Base Classes: AbstractSegment
Chatterlang Name: lambda
Evaluate a Python expression on each item in the input stream. This segment pre-compiles the expression during initialization for efficiency and then applies it to each item during transformation. Expressions are evaluated in a restricted environment for security. The item is available in expressions as 'item'. If the item is a dictionary, its fields can be accessed directly as variables in the expression. Args: expression: The Python expression to evaluate field: If provided, extract this field from each item before evaluating append_as: If provided, append the result to each item under this field name fail_on_error: If True, raises exceptions when evaluation fails. If False, logs errors and returns None
Base Classes: AbstractSegment
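The compile-once, evaluate-per-item pattern described above can be sketched with the built-in `compile` and `eval`. This is not the segment's actual code: `eval_expression` and `SAFE_BUILTINS` are hypothetical names, and replacing `__builtins__` with a short whitelist is only a rough stand-in for whatever restricted environment the library uses. It should not be treated as a real sandbox. The sketch also omits the `field` and `append_as` options and does not log errors.

```python
from typing import Any, Iterable, Iterator

# A deliberately small whitelist of callables exposed to expressions (assumed).
SAFE_BUILTINS = {"len": len, "min": min, "max": max, "abs": abs,
                 "str": str, "int": int, "float": float}


def eval_expression(items: Iterable[Any], expression: str,
                    fail_on_error: bool = True) -> Iterator[Any]:
    """Evaluate `expression` against each item and yield the result."""
    code = compile(expression, "<expression>", "eval")  # compiled once, reused
    for item in items:
        # Dictionary fields become variables; 'item' always refers to the item.
        scope = dict(item) if isinstance(item, dict) else {}
        scope["item"] = item
        try:
            yield eval(code, {"__builtins__": SAFE_BUILTINS}, scope)
        except Exception:
            if fail_on_error:
                raise
            yield None


print(list(eval_expression([{"a": 2, "b": 3}], "a * b + len(item)")))
```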
Chatterlang Name: everyN
Yields every nth item from the input stream. Args: items: Iterable of items to process n: Number of items to skip between each yield Yields: Every nth item from the input stream.
Chatterlang Name: exec
Executes a command and yields each line passed to stdout as an item.
Chatterlang Name: fillTemplate
Fill a template string with values from the input item. Args: item: The input item containing values to fill the template template (str): The template string with placeholders for values Returns: str: The filled template string
Chatterlang Name: lambdaFilter
Filter items from the input stream based on a Python expression. This segment pre-compiles the expression during initialization for efficiency and then applies it to each item during transformation. Expressions are evaluated in a restricted environment for security. The item is available in expressions as 'item'. If the item is a dictionary, its fields can be accessed directly as variables in the expression. Args: expression: The Python expression to evaluate field: If provided, extract this field from each item before evaluating fail_on_error: If True, raises exceptions when evaluation fails. If False, logs errors and returns None
Base Classes: AbstractSegment
Chatterlang Name: firstN
Passes on the first N items from the input stream. Args: n (int): The number of items to pass on. Default is 1.
Chatterlang Name: flatten
Flattens a nested list of items. Args: items: Iterable of items to flatten Yields: Flattened list of items
Chatterlang Name: hash
Hashes the input data using the specified algorithm. This segment hashes the input data using the specified algorithm. Strings will be encoded and hashed. All other datatypes will be hashed using either pickle or repr(). Args: algorithm (str): Hash algorithm to use. Options include SHA1, SHA224, SHA256, SHA384, SHA512, SHA-3, and MD5. use_repr (bool): If True, the repr() version of the input data is hashed. If False, the input data is hashed via pickling. Using repr() will handle all objects, even those that can't be pickled, and won't be subject to changes in pickling formats. But the pickled version will include more state and generally be more reliable.
Base Classes: AbstractSegment
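The three code paths (strings, `repr()`, and pickling) can be sketched with `hashlib` and `pickle`. The function name `hash_item`, the UTF-8 encoding, the `sha256` default, and the use of `hashlib` algorithm names (for example `sha3_256` rather than "SHA-3") are assumptions of this sketch.

```python
import hashlib
import pickle
from typing import Any


def hash_item(item: Any, algorithm: str = "sha256", use_repr: bool = False) -> str:
    """Return a hex digest of the item using the named hashlib algorithm."""
    if isinstance(item, str):
        payload = item.encode("utf-8")  # strings are encoded directly
    elif use_repr:
        payload = repr(item).encode("utf-8")  # works even for unpicklable objects
    else:
        payload = pickle.dumps(item)  # captures more state than repr()
    return hashlib.new(algorithm.lower(), payload).hexdigest()


print(hash_item("abc"))
print(hash_item([1, 2], use_repr=True))
```

The pickled path depends on the pickle protocol in use, so digests produced that way may differ across Python versions, which is the trade-off the description mentions.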
Chatterlang Name: isIn
Filters items based on whether a field contains a specified value. Args: items: Iterable of items to filter field: Field name to check for value value: Value to check for in the field Yields: Items where the specified field contains the specified value.
Chatterlang Name: isNotIn
Filters items based on whether a field does not contain a specified value. Args: field: Field name to check for value value: Value to check for in the field Yields: Items where the specified field does not contain the specified value.
Chatterlang Name: slice
Slices a sequence using start and end indices. This function takes a sequence and a range string in the format "start:end" to slice the sequence. Both start and end indices are optional. Args: item: Any sequence that supports slicing (e.g., list, string, tuple) range (str, optional): String in format "start:end" where both start and end are optional. For example: "2:5", ":3", "4:", ":" are all valid. Defaults to None. Returns: The sliced sequence containing elements from start to end index. If range is None, returns a full copy of the sequence. Examples: >>> slice([1,2,3,4,5], "1:3") [2, 3] >>> slice("hello", ":3") "hel" >>> slice([1,2,3,4,5], "2:") [3, 4, 5]
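One way to interpret the `"start:end"` string is to split on the colon and treat an empty side as an open bound. The function below is a sketch with a hypothetical name, `slice_range`, and is not taken from the library.

```python
from typing import Optional, Sequence


def slice_range(item: Sequence, range_spec: Optional[str] = None) -> Sequence:
    """Slice `item` according to a 'start:end' string with optional bounds."""
    if range_spec is None:
        return item[:]  # full copy
    start_text, _, end_text = range_spec.partition(":")
    start = int(start_text) if start_text.strip() else None
    end = int(end_text) if end_text.strip() else None
    return item[start:end]


print(slice_range([1, 2, 3, 4, 5], "1:3"))
print(slice_range("hello", ":3"))
```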
Chatterlang Name: toDataFrame
Drain all items from the input stream and emit a single DataFrame. The input data stream should be composed of dictionaries, where each dictionary represents a row in the DataFrame.
Base Classes: AbstractSegment
Chatterlang Name: toDict
Creates a dictionary from the input data.
Base Classes: AbstractSegment
Chatterlang Name: toList
Drains the input stream and emits a list of all items.
Base Classes: AbstractSegment
Chatterlang Name: dumpsJsonl
Drains the input stream and dumps each item as a jsonl string.
Chatterlang Name: echo
A source that generates input from a string. This source will generate input from a string, splitting it on a delimiter.
Chatterlang Name: loadsJsonl
Reads each item from the input stream, interpreting it as a jsonl string.
Chatterlang Name: log
An operation that logs each item from the input stream.
Base Classes: AbstractSegment
Chatterlang Name: print
An operation that prints and passes on each item from the input stream.
Base Classes: AbstractSegment
Chatterlang Name: prompt
A source that generates input from a prompt. This source will generate input from a prompt until the user enters an EOF. It is for creating interactive pipelines. It uses prompt_toolkit under the hood to provide a nice prompt experience.
Base Classes: AbstractSource
Chatterlang Name: readJsonl
Reads each item from the input stream as a path to a jsonl file. Loads each line of each file as a json object and yields each individually.
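The file-reading behavior can be sketched in plain Python as below. `read_jsonl` is a hypothetical stand-in rather than the segment itself, and skipping blank lines is an assumption.

```python
import json
import os
import tempfile


def read_jsonl(paths):
    """Hypothetical stand-in: yield one parsed object per line of each file."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # assumption: blank lines are skipped
                    yield json.loads(line)


# Write a small two-line file so the sketch can be run end to end.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('{"id": 1}\n{"id": 2}\n')
    records = list(read_jsonl([path]))
```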
Chatterlang Name: writePickle
Drains the input stream into a list and then writes the list as a pickle file.
Args:
    fname (str): The name of the file to write.
    first_only (bool): If True, the segment will write only the first item in the input stream, throwing an exception if there is more than one. If False, the segment will write the entire input stream.
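The effect of `first_only` can be illustrated with a short sketch. `write_pickle` here is a hypothetical function, not the segment's code; the exception type (`ValueError`) and the decision to reject an empty stream as well are assumptions.

```python
import os
import pickle
import tempfile


def write_pickle(items, fname, first_only=False):
    """Hypothetical stand-in: drain items and pickle them to fname."""
    data = list(items)  # drain the whole stream before writing
    if first_only:
        # Assumption: anything other than exactly one item is an error.
        if len(data) != 1:
            raise ValueError(f"expected exactly one item, got {len(data)}")
        payload = data[0]
    else:
        payload = data
    with open(fname, "wb") as handle:
        pickle.dump(payload, handle)


with tempfile.TemporaryDirectory() as tmp:
    all_path = os.path.join(tmp, "all.pkl")
    write_pickle(iter([1, 2, 3]), all_path)
    with open(all_path, "rb") as handle:
        everything = pickle.load(handle)

    one_path = os.path.join(tmp, "one.pkl")
    write_pickle(iter(["only"]), one_path, first_only=True)
    with open(one_path, "rb") as handle:
        single = pickle.load(handle)
```

With `first_only=True` the file holds the bare item; otherwise it holds a list of everything that was in the stream.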
Chatterlang Name: range
Generate a range of integers between lower (inclusive) and upper (exclusive). This segment wraps the built-in range function, allowing you to specify the lower and upper bounds of the range.
Args:
    lower (int): Lower bound of the range (inclusive)
    upper (int): Upper bound of the range (exclusive)
Chatterlang Name: eq
Filter items where a specified field's value equals a number. For each item passed in, this segment yields only those where the value of the specified field is equal to the given number n.
Args:
    items: Iterable of items to filter
    field: String representing the field/property to compare. Note that an underscore "_" can be used to refer to the item itself.
    n: Item to compare against
Yields:
    Items where the specified field's value equals n
Raises:
    AttributeError: If the specified field is missing from any item
Base Classes: AbstractComparisonFilter
Chatterlang Name: gt
Filter items where a specified field's value is greater than a number. For each item passed in, this segment yields only those where the value of the specified field is greater than the given number n.
Args:
    items: Iterable of items to filter
    field: String representing the field/property to compare. Note that an underscore "_" can be used to refer to the item itself.
    n: Number to compare against
Yields:
    Items where the specified field's value is greater than n
Raises:
    AttributeError: If the specified field is missing from any item
Base Classes: AbstractComparisonFilter
Chatterlang Name: gte
Filter items where a specified field's value is greater than or equal to a number. For each item passed in, this segment yields only those where the value of the specified field is greater than or equal to the given number n.
Args:
    items: Iterable of items to filter
    field: String representing the field/property to compare. Note that an underscore "_" can be used to refer to the item itself.
    n: Number to compare against
Yields:
    Items where the specified field's value is greater than or equal to n
Raises:
    AttributeError: If the specified field is missing from any item
Base Classes: AbstractComparisonFilter
Chatterlang Name: lt
Filters items based on a field value being less than a specified number. For each item passed in, this segment yields items where the specified field value is less than the given number n.
Args:
    items (iterable): An iterable of items to filter
    field: String representing the field/property to compare. Note that an underscore "_" can be used to refer to the item itself.
    n (numeric): The number to compare against
Yields:
    item: Items where the specified field value is less than n
Raises:
    AttributeError: If the specified field does not exist on an item (due to fail_on_missing=True)
Base Classes: AbstractComparisonFilter
Chatterlang Name: lte
Filter items where a specified field's value is less than or equal to a number. For each item passed in, this segment yields only those where the value of the specified field is less than or equal to the given number n.
Args:
    items: Iterable of items to filter
    field: String representing the field/property to compare. Note that an underscore "_" can be used to refer to the item itself.
    n: Number to compare against
Yields:
    Items where the specified field's value is less than or equal to n
Raises:
    AttributeError: If the specified field is missing from any item
Base Classes: AbstractComparisonFilter
Chatterlang Name: neq
Filter items where a specified field's value does not equal a number. For each item passed in, this segment yields only those where the value of the specified field is not equal to the given number n.
Args:
    items: Iterable of items to filter
    field: String representing the field/property to compare. Note that an underscore "_" can be used to refer to the item itself.
    n: Item to compare against
Yields:
    Items where the specified field's value does not equal n
Raises:
    AttributeError: If the specified field is missing from any item
Base Classes: AbstractComparisonFilter
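The six comparison filters (eq, neq, gt, gte, lt, lte) share one pattern: look up a field, compare it with n, and pass the item through on success. The sketch below illustrates that pattern in plain Python; it is not the AbstractComparisonFilter implementation. The function name `compare_filter`, the dictionary/attribute lookup rules, and the use of the `operator` module are all assumptions.

```python
import operator


def compare_filter(items, field, n, comparator):
    """Hypothetical stand-in: yield items where comparator(value, n) holds."""
    for item in items:
        if field == "_":
            value = item  # underscore refers to the item itself
        elif isinstance(item, dict):
            if field not in item:
                raise AttributeError(f"field {field!r} missing from item")
            value = item[field]
        else:
            value = getattr(item, field)  # raises AttributeError if missing
        if comparator(value, n):
            yield item


rows = [{"score": 1}, {"score": 5}, {"score": 9}]
high = list(compare_filter(rows, "score", 5, operator.gt))
at_least = list(compare_filter(rows, "score", 5, operator.ge))
small_numbers = list(compare_filter([1, 5, 9], "_", 5, operator.lt))
```

Swapping the comparator (`operator.eq`, `operator.ne`, `operator.le`, and so on) gives the remaining variants without changing the loop.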
Chatterlang Name: randomInts
Generate n random integers between lower and upper.
Chatterlang Name: scale
Scale each item in the input stream by the multiplier.