Avro DBO 🚀
Avro DBO is a robust Python library designed for handling Apache Avro schemas. It facilitates seamless data serialization and schema management, making it ideal for data engineering pipelines and stream processing applications.
✨ Features
- 🏗️ Schema-First Development: Generate Python classes from Avro schemas.
- 🔄 Full Type Support: Supports Avro logical types such as decimal and timestamp-millis, as well as complex types such as arrays and enums.
- 🛠️ Custom Serialization: Offers flexible serializers and deserializers.
- 🌐 Schema Registry Integration: Integrates natively with Confluent Schema Registry.
- 🔒 Type Safety: Ensures full static type checking.
- ⚡ High Performance: Optimized for high-load production environments.
🚀 Quick Start
- Install from PyPI
```bash
pip install avro-dbo
```
Example Schemas for Each Avro Logical Type
- Decimal Type
```python
from attrs import field, define
from decimal import Decimal

@define
@avro_schema
class DecimalModel:
    amount: Decimal = field(
        default=Decimal("100.00"),
        metadata={
            "logicalType": "decimal",
            "precision": 10,
            "scale": 2
        }
    )
```
- Timestamp (millis) Type
```python
from attrs import field, define
import datetime

@define
@avro_schema
class TimestampModel:
    created_at: datetime.datetime = field(
        metadata={
            "logicalType": "timestamp-millis"
        }
    )
```
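Under the Avro specification, `timestamp-millis` annotates a `long` that holds the number of milliseconds since the Unix epoch in UTC. The sketch below shows that mapping with the standard library only. `to_millis` and `from_millis` are hypothetical helper names used for illustration; they are not part of Avro DBO's API.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
ONE_MS = datetime.timedelta(milliseconds=1)


def to_millis(dt: datetime.datetime) -> int:
    """Convert an aware datetime to whole milliseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Floor division of two timedeltas stays in exact integer arithmetic.
    return (dt - EPOCH) // ONE_MS


def from_millis(ms: int) -> datetime.datetime:
    """Convert milliseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + datetime.timedelta(milliseconds=ms)


stamp = datetime.datetime(2024, 1, 1, 12, 0, 0, 123456, tzinfo=datetime.timezone.utc)
millis = to_millis(stamp)
print(millis)
print(from_millis(millis))
```

Note that the round trip keeps only millisecond precision, so the trailing microsecond digits of `stamp` are dropped.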
- Enum Type
```python
from attrs import field, define
from enum import Enum

class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"

@define
@avro_schema
class EnumModel:
    status: Status = field(
        default=Status.ACTIVE,
        metadata={
            "logicalType": "enum",
            "symbols": list(Status)
        }
    )
```
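An Avro enum schema lists its symbols as plain strings, and the Avro specification requires the enum itself to be named. As a minimal sketch, assuming you want to derive that fragment from a Python `Enum` by hand, the hypothetical helper `enum_schema` below builds it from the member names. It does not show how Avro DBO does this internally.

```python
from enum import Enum
import json


class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


def enum_schema(enum_cls: type) -> dict:
    """Build an Avro enum schema fragment from a Python Enum class."""
    return {
        "type": "enum",
        "name": enum_cls.__name__,
        # Iterating an Enum class yields its members in definition order.
        "symbols": [member.name for member in enum_cls],
    }


status_schema = enum_schema(Status)
print(json.dumps(status_schema))
```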
- Array Type
```python
from attrs import field, define
from typing import List

@define
@avro_schema
class ArrayModel:
    tags: List[str] = field(
        factory=list,
        metadata={
            "logicalType": "array",
            "items": "string"
        }
    )
```
- Kitchen Sink Example
```python
from attrs import field, define
from decimal import Decimal
from enum import Enum
from typing import List
import datetime

class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"

@define
@avro_schema
class KitchenSinkModel:
    name: str = field(default="")
    amount: Decimal = field(
        default=Decimal("999.99"),
        metadata={
            "logicalType": "decimal",
            "precision": 10,
            "scale": 2
        }
    )
    status: Status = field(
        default=Status.ACTIVE,
        metadata={
            "logicalType": "enum",
            "symbols": list(Status)
        }
    )
    # attrs rejects a mandatory attribute after defaulted ones,
    # so this required field is declared keyword-only.
    created_at: datetime.datetime = field(
        kw_only=True,
        metadata={
            "logicalType": "timestamp-millis"
        }
    )
    tags: List[str] = field(
        factory=list,
        metadata={
            "logicalType": "array",
            "items": "string"
        }
    )
```
Example Avro Schema Output
You can use the `export_schema()` method to export the schema as a JSON object.

```python
print(KitchenSinkModel.export_schema())
```
The result will be a JSON object that can be used to define the schema in a Confluent Schema Registry.
```json
{
  "type": "record",
  "name": "KitchenSinkModel",
  "fields": [
    {"name": "name", "type": "string", "default": ""},
    {"name": "amount", "type": "decimal", "precision": 10, "scale": 2},
    {"name": "status", "type": "enum", "symbols": ["ACTIVE", "INACTIVE"]},
    {"name": "created_at", "type": "long", "logicalType": "timestamp-millis"},
    {"name": "tags", "type": "array", "items": "string"}
  ]
}
```
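For comparison, the Avro specification expresses a decimal as a `bytes` (or `fixed`) type annotated with `logicalType`, nests logical and complex types inside each field's `type`, and requires enums to be named. The sketch below builds a schema of that shape as a plain dictionary using only the standard library. It illustrates the specification's layout and is an assumption about what a registry-ready schema could look like, not the verified output of `export_schema()`.

```python
import json

kitchen_sink_schema = {
    "type": "record",
    "name": "KitchenSinkModel",
    "fields": [
        {"name": "name", "type": "string", "default": ""},
        {
            "name": "amount",
            # decimal is a logical type layered on top of bytes
            "type": {"type": "bytes", "logicalType": "decimal", "precision": 10, "scale": 2},
        },
        {
            "name": "status",
            # enums are named types with string symbols
            "type": {"type": "enum", "name": "Status", "symbols": ["ACTIVE", "INACTIVE"]},
            "default": "ACTIVE",
        },
        {
            "name": "created_at",
            # timestamp-millis is a logical type layered on top of long
            "type": {"type": "long", "logicalType": "timestamp-millis"},
        },
        {
            "name": "tags",
            "type": {"type": "array", "items": "string"},
            "default": [],
        },
    ],
}

print(json.dumps(kitchen_sink_schema, indent=2))
```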
Saving an Avro Schema to a File
Pass a `filename` argument to `export_schema()` to write the schema to a JSON file.

```python
KitchenSinkModel.export_schema(filename="kitchen_sink_model.json")
```
Coercing a Python Class Using Avro Schema Model
Avro DBO automatically coerces every field in the schema to its declared Python type, including datetime, date, decimal, enum, and array values.
Example with Decimal
```python
from attrs import field, define
from decimal import Decimal

@define
@avro_schema
class DecimalModel:
    amount: Decimal = field(
        default=Decimal("100.00"),
        metadata={
            "logicalType": "decimal",
            "precision": 10,
            "scale": 2
        }
    )

my_model = DecimalModel()
print(my_model.amount)
# > Decimal("100.00")

# extra precision is truncated to the scale
my_model.amount = Decimal("100.00383889328932")
print(my_model.amount)
# > Decimal("100.00")
```
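The truncation to scale described in the example can be reproduced with the standard `decimal` module. This is a minimal sketch, assuming that truncation means rounding toward zero (`ROUND_DOWN`). `truncate_to_scale` is a hypothetical helper, and Avro DBO's own rounding mode may differ.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_to_scale(value: Decimal, scale: int) -> Decimal:
    """Drop digits beyond `scale` decimal places without rounding up."""
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives a quantum of 0.01
    return value.quantize(quantum, rounding=ROUND_DOWN)


print(truncate_to_scale(Decimal("100.00383889328932"), 2))
print(truncate_to_scale(Decimal("100.009"), 2))
```

With `ROUND_DOWN`, both calls keep exactly two decimal places and discard the remaining digits instead of rounding the last kept digit up.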
📚 Documentation
For detailed usage instructions, type hints, and comprehensive examples, please refer to our documentation.
🤝 Contributing
We welcome contributions! To submit issues or propose changes, please visit our GitHub repository. See the CONTRIBUTING.md file for more information on how to contribute.
📜 License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.