Minarrow¶
Arrow-compatible data that moves cleanly between Rust and Python, stays ready for SIMD, and plugs into the rest of the Python data ecosystem.
Minarrow is a compact Python interface over Minarrow Rust. It gives Rust and Python the same columnar data model, with Apache Arrow-compatible memory layouts and 64-byte-aligned buffers. Data produced in Rust can be exposed directly to Python, operated on by native SIMD kernels, and passed to PyArrow, Polars, DuckDB and other Arrow-aware libraries through the Arrow PyCapsule interface.
- Fast - Data crosses between Rust and Python zero-copy, with no serialisation step.
- Compatible - Buffers sit on 64-byte boundaries, ready for AVX2 and AVX-512 SIMD kernels.
- Pluggable - Hand data to Polars, DuckDB, pandas and PyArrow through the Arrow PyCapsule interface.
- Simple - Array, Table, ChunkedArray and ChunkedTable cover flat columnar data. The same data model exists on both sides of the boundary.
Overview¶
Minarrow provides four primary Python containers:
- Array for typed columnar data
- Table for named collections of equal-length arrays
- ChunkedArray for one logical column split across multiple chunks
- ChunkedTable for a sequence of table batches
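The relationship between the four containers can be sketched in plain Python. This is a conceptual model only: the `Sketch*` classes and their `n_rows` property are hypothetical names for illustration and are not Minarrow's API.

```python
from dataclasses import dataclass


@dataclass
class SketchArray:
    """Hypothetical stand-in for a typed column."""
    values: list

    def __len__(self) -> int:
        return len(self.values)


@dataclass
class SketchTable:
    """Hypothetical stand-in for named, equal-length columns."""
    columns: dict  # column name -> SketchArray

    def __post_init__(self) -> None:
        # A table is only valid when every column has the same length.
        lengths = {len(col) for col in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns in a table must have the same length")

    @property
    def n_rows(self) -> int:
        return len(next(iter(self.columns.values()))) if self.columns else 0


@dataclass
class SketchChunkedArray:
    """One logical column stored as several chunks."""
    chunks: list  # list of SketchArray

    def __len__(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)


@dataclass
class SketchChunkedTable:
    """A sequence of table batches that together form one logical table."""
    batches: list  # list of SketchTable

    @property
    def n_rows(self) -> int:
        return sum(batch.n_rows for batch in self.batches)


batch_a = SketchTable({"id": SketchArray([1, 2]), "price": SketchArray([9.5, 10.0])})
batch_b = SketchTable({"id": SketchArray([3]), "price": SketchArray([11.2])})

chunked = SketchChunkedTable([batch_a, batch_b])
ids = SketchChunkedArray([batch_a.columns["id"], batch_b.columns["id"]])

print(chunked.n_rows)  # 3
print(len(ids))        # 3
```

The chunked variants add no new data; they group existing arrays or tables so that a column or table can be built up batch by batch.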
The Python package is backed by the Minarrow Rust core, which provides the underlying array types, schemas and aligned buffers.
Minarrow is intended for:
- Passing columnar data between Rust and Python
- Building Python extensions that operate on Arrow-compatible data
- Feeding Polars, DuckDB, PyArrow and other Arrow-aware libraries
- Running SIMD-oriented native kernels over 64-byte-aligned buffers
- Keeping binary size and dependency footprint small
Installation¶
Basic usage¶
```python
import minarrow as ma

table = ma.Table(
    {
        "id": [1, 2, 3],
        "price": [9.5, 10.0, 11.2],
    },
    name="prices",
)

print(table.columns)
# ['id', 'price']

print(table.dtypes)
# {'id': DType.Integer, 'price': DType.Float}
```
Create an individual array:
Pass data to another Arrow-aware library:
See the Quickstart for arrays, tables, indexing and schema operations.
Why Minarrow¶
Small typed API¶
Minarrow exposes a compact object model rather than reproducing the full Apache Arrow surface area.
Arrays retain a concrete data type, exposed through .dtype, while tables provide named access to their constituent arrays.
Arrow ecosystem integration¶
Array and Table implement the Arrow PyCapsule interface. Compatible libraries can import their schema and buffers without requiring an intermediate serialisation format.
This provides an interoperability path to:
- Polars
- DuckDB
- PyArrow
- pandas through an Arrow-compatible backend
- Native Python extensions that consume Arrow capsules
The PyCapsule integration is zero-copy.
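As a rough illustration of how a consumer recognises a PyCapsule producer, the sketch below checks an object for the three protocol methods defined by the Arrow PyCapsule interface. The helper `arrow_capsule_methods` and the `StandInTable` class are hypothetical and exist only for this example; they are not part of Minarrow.

```python
# Method names defined by the Arrow PyCapsule interface (CPU data).
ARROW_CAPSULE_METHODS = (
    "__arrow_c_schema__",
    "__arrow_c_array__",
    "__arrow_c_stream__",
)


def arrow_capsule_methods(obj) -> list[str]:
    """Return the PyCapsule protocol methods that `obj` exposes."""
    return [
        name
        for name in ARROW_CAPSULE_METHODS
        if callable(getattr(obj, name, None))
    ]


class StandInTable:
    """Hypothetical producer used only to exercise the check."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real producer returns a PyCapsule wrapping an ArrowArrayStream.
        raise NotImplementedError("stand-in only")


print(arrow_capsule_methods(StandInTable()))  # ['__arrow_c_stream__']
print(arrow_capsule_methods([1, 2, 3]))       # []
```

Consumers that support the interface perform a check of this kind and then call the method to obtain the capsule, so the producer and consumer do not need to depend on each other's packages.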
64-byte-aligned buffers¶
The Rust core stores supported data buffers with 64-byte alignment.
This is useful for native kernels using SIMD instruction sets such as AVX2 or AVX-512, as suitably aligned buffers can be processed without first copying them into a separate aligned allocation.
Alignment does not make an operation SIMD-accelerated by itself. It provides a predictable memory layout for native code that implements vectorised kernels.
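To make the alignment property concrete, the following sketch uses only the Python standard library to test whether an address is 64-byte aligned, and to carve an aligned view out of an over-sized allocation. The helpers `is_aligned` and `aligned_view` are hypothetical names for illustration; this is not how Minarrow allocates its buffers.

```python
import ctypes

ALIGNMENT = 64  # bytes: the width of one AVX-512 register


def is_aligned(address: int, alignment: int = ALIGNMENT) -> bool:
    """True when `address` is a multiple of `alignment`."""
    return address % alignment == 0


def aligned_view(n_bytes: int, alignment: int = ALIGNMENT):
    """Over-allocate, then return (backing, view) with `view` starting aligned."""
    # Up to `alignment - 1` padding bytes may be needed to reach a boundary.
    backing = (ctypes.c_char * (n_bytes + alignment - 1))()
    base = ctypes.addressof(backing)
    offset = (-base) % alignment
    # `view` shares memory with `backing`, starting at the aligned offset.
    view = (ctypes.c_char * n_bytes).from_buffer(backing, offset)
    return backing, view


backing, view = aligned_view(1024)
print(is_aligned(ctypes.addressof(view)))  # True
```

A general-purpose allocation gives no such guarantee, which is why native code that needs aligned loads would otherwise have to copy data into a separate aligned allocation first.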
Rust-backed implementation¶
The core data structures are implemented in Rust using concrete array types and enum-based dispatch. This allows type-specific paths to be compiled and optimised without requiring dynamic Python dispatch inside inner loops.
Python remains the orchestration layer, while storage and native operations are handled by the Rust implementation.
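The dispatch pattern can be approximated in Python as an analogy. In this sketch the data type is resolved once per column and a type-specific routine then runs over all values, so no type checks occur per element. `SketchDType`, `column_sum` and the kernel functions are hypothetical names; the actual Rust implementation is not shown here.

```python
import math
from enum import Enum, auto


class SketchDType(Enum):
    """Hypothetical data-type tag, loosely mirroring an enum of array types."""
    INTEGER = auto()
    FLOAT = auto()


def _sum_integers(values):
    # Integer path: exact arithmetic.
    return sum(values)


def _sum_floats(values):
    # Float path: compensated summation to limit rounding error.
    return math.fsum(values)


# One entry per variant, analogous to the arms of a match over an enum.
_SUM_KERNELS = {
    SketchDType.INTEGER: _sum_integers,
    SketchDType.FLOAT: _sum_floats,
}


def column_sum(dtype: SketchDType, values):
    """Select the kernel once per column, then run it over every value."""
    kernel = _SUM_KERNELS[dtype]
    return kernel(values)


print(column_sum(SketchDType.INTEGER, [1, 2, 3]))  # 6
print(column_sum(SketchDType.FLOAT, [9.5, 10.0, 11.2]))
```

In Rust the equivalent selection is resolved at compile time for each variant, which is what allows each type-specific path to be optimised separately.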
Interoperability¶
Minarrow is a columnar data container and bridge, not a full dataframe execution engine.
Use it to construct or receive data, operate on it through native extensions, and pass it zero-copy to an appropriate query or dataframe system.
| Requirement | Example integration |
|---|---|
| Dataframe expressions | Polars |
| SQL queries | DuckDB |
| General Arrow interchange | PyArrow |
| pandas workflows | pandas with an Arrow-compatible path |
| Custom native computation | Rust, C or C++ through Arrow capsules |
The Rust crate provides a minimal set of tabular data-processing capabilities, without full parallelism, as a basis for foundational operations in Rust. With the standard feature set it compiles in under 2 seconds, which helps keep you productive and your project lightweight.
See Ecosystem interoperability for supported conversion paths and ownership behaviour.
Documentation¶
- Quickstart - arrays, tables, indexing, fields and schemas
- Types and schemas - data types and Arrow schema mapping
- Ecosystem interoperability - PyCapsule and library integration
- API reference - public types, methods and properties
Support¶
Report issues through the Minarrow GitHub repository.
Minarrow is built and maintained by SpaceCell.