Minarrow¶

Arrow-compatible data that moves cleanly between Rust and Python, stays ready for SIMD, and plugs into the rest of the Python data ecosystem.

Minarrow is a compact Python interface over Minarrow Rust. It gives Rust and Python the same columnar data model, with Apache Arrow-compatible memory layouts and 64-byte-aligned buffers. Data produced in Rust can be exposed directly to Python, operated on by native SIMD kernels, and passed to PyArrow, Polars, DuckDB and other Arrow-aware libraries through the Arrow PyCapsule interface.

Fast

Data crosses between Rust and Python zero-copy, with no serialisation step.
Compatible

Buffers sit on SIMD 64-byte boundaries, ready for AVX2 and AVX-512 kernels.
Pluggable

Hand data to Polars, DuckDB, pandas and PyArrow Ecosystem through the Arrow PyCapsule interface.
Simple

Array, Table, ChunkedArray and ChunkedTable cover flat columnar data.

The same data model exists on both sides of the boundary.

PythonRust

import minarrow as ma

table = ma.Table(
    {"id": [1, 2, 3], "price": [9.5, 10.0, 11.2]},
    name="prices",
)

series = table["price"].to_polars()
relation = table.to_duckdb()

use minarrow::{Table, fa_i64, fa_f64};

let table = Table::new(
    "prices",
    Some(vec![
        fa_i64!("id", 1, 2, 3),
        fa_f64!("price", 9.5, 10.0, 11.2),
    ]),
);

let series = table.c[1].to_polars();
let batch = table.to_apache_arrow();

Overview¶

Minarrow provides four primary Python containers:

Array for typed columnar data
Table for named collections of equal-length arrays
ChunkedArray for one logical column split across multiple chunks
ChunkedTable for a sequence of table batches

The Python package is backed by the Minarrow Rust core, which provides the underlying array types, schemas and aligned buffers.

Minarrow is intended for:

Passing columnar data between Rust and Python
Building Python extensions that operate on Arrow-compatible data
Feeding Polars, DuckDB, PyArrow and other Arrow-aware libraries
Running SIMD-oriented native kernels over guaranteed aligned buffers
Keeping binary size and dependency footprint small

Installation¶

pip install minarrow

Basic usage¶

import minarrow as ma

table = ma.Table(
    {
        "id": [1, 2, 3],
        "price": [9.5, 10.0, 11.2],
    },
    name="prices",
)

print(table.columns)
# ['id', 'price']

print(table.dtypes)
# {'id': DType.Integer, 'price': DType.Float}

Create an individual array:

ids = ma.Array([1, 2, 3], name="id")

Pass data to another Arrow-aware library:

polars_series = ids.to_polars()
duckdb_relation = table.to_duckdb()

See the Quickstart for arrays, tables, indexing and schema operations.

Why Minarrow¶

Small typed API¶

Minarrow exposes a compact object model rather than reproducing the full Apache Arrow surface area.

Arrays retain a concrete data type, exposed through .dtype, while tables provide named access to their constituent arrays.

Arrow ecosystem integration¶

Array and Table implement the Arrow PyCapsule interface. Compatible libraries can import their schema and buffers without requiring an intermediate serialisation format.

This provides an interoperability path to:

Polars
DuckDB
PyArrow
pandas through an Arrow-compatible backend
Native Python extensions that consume Arrow capsules

The PyCapsule integration is zero-copy.

64-byte-aligned buffers¶

The Rust core stores supported data buffers with 64-byte alignment.

This is useful for native kernels using SIMD instruction sets such as AVX2 or AVX-512, as suitably aligned buffers can be processed without first copying them into a separate aligned allocation.

Alignment does not make an operation SIMD-accelerated by itself. It provides a predictable memory layout for native code that implements vectorised kernels.

Rust-backed implementation¶

The core data structures are implemented in Rust using concrete array types and enum-based dispatch. This allows type-specific paths to be compiled and optimised without requiring dynamic Python dispatch inside inner loops.

Python remains the orchestration layer, while storage and native operations are handled by the Rust implementation.

Interoperability¶

Minarrow is a columnar data container/bridge as opposed to a full dataframe execution engine.

Use it to construct or receive data, operate on it through native extensions, and pass it to an appropriate query or dataframe system zero-copy.

Requirement	Example integration
Dataframe expressions	Polars
SQL queries	DuckDB
General Arrow interchange	PyArrow
pandas workflows	pandas with an Arrow-compatible path
Custom native computation	Rust, C or C++ through Arrow capsules

The Rust crate does provide a minimal set of tabular data-processing capabilities but without full parallelism, which serves as a useful basis for foundational operations in that environment. It compiles in less than 2 seconds with the standard feature set to help you stay productive, and your project lightweight.

See Ecosystem interoperability for supported conversion paths and ownership behaviour.

Documentation¶

Quickstart - arrays, tables, indexing, fields and schemas
Types and schemas - data types and Arrow schema mapping
Ecosystem interoperability - PyCapsule and library integration
API reference - public types, methods and properties

Support¶

Report issues through the Minarrow GitHub repository.

Minarrow is built and maintained by SpaceCell.