gwkokab.analysis.utils.jenks

Bucketing and padding utilities using Jenks’ Natural Breaks algorithm.

This module partitions sequences of arrays into buckets by size using Jenks’ Natural Breaks algorithm, then pads and stacks the arrays within each bucket to a uniform shape. Because each array is padded only to the size needed within its own bucket, less memory is spent on padding when processing large datasets.
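To illustrate the bucketing step, here is a minimal NumPy sketch of one-dimensional natural-breaks classification applied to array sizes. It is not gwkokab's implementation: the helper name `jenks_labels` is hypothetical, and the sketch assumes the usual formulation of the criterion (contiguous classes in sorted order that minimise the total within-class sum of squared deviations), solved by dynamic programming.

```python
import numpy as np


def jenks_labels(sizes, n_buckets):
    """Hypothetical helper: label each size with a bucket index in [0, n_buckets)."""
    sizes = np.asarray(sizes, dtype=float)
    order = np.argsort(sizes, kind="stable")
    x = sizes[order]
    n = x.size
    k = min(n_buckets, n)

    # Prefix sums give the squared-deviation cost of any sorted slice in O(1).
    s1 = np.concatenate([[0.0], np.cumsum(x)])
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])

    def cost(i, j):
        # Sum of squared deviations from the mean for x[i:j], with j > i.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[c, j]: minimal cost of splitting x[:j] into c contiguous classes.
    best = np.full((k + 1, n + 1), np.inf)
    split = np.zeros((k + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1, i] + cost(i, j)
                if cand < best[c, j]:
                    best[c, j] = cand
                    split[c, j] = i

    # Walk the split table backwards to recover the class boundaries.
    labels_sorted = np.empty(n, dtype=int)
    j = n
    for c in range(k, 0, -1):
        i = split[c, j]
        labels_sorted[i:j] = c - 1
        j = i

    # Undo the sort so labels line up with the input order.
    labels = np.empty(n, dtype=int)
    labels[order] = labels_sorted
    return labels


labels = jenks_labels([3, 4, 100, 5, 98, 2], n_buckets=2)
```

In this example the four small sizes fall into one bucket and the two large sizes into the other, so the small arrays would never be padded out to length 100.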

Functions

pad_and_stack(...)

Pad and stack multiple arrays into buckets.

Module Contents

gwkokab.analysis.utils.jenks.pad_and_stack(*arrays: collections.abc.Sequence[jaxtyping.Array | numpy.ndarray], n_buckets: int | None = None, threshold: float = 3.0) → Tuple[collections.abc.Sequence[jaxtyping.Array], ...]

Pad and stack multiple arrays into buckets.

Parameters:
  • *arrays (Sequence[Union[Array, np.ndarray]]) – Variable number of array sequences to be bucketed. All sequences must have the same length.

  • n_buckets (Optional[int]) – The number of buckets to create from the data. If None, the function will partition the data into buckets based on the sizes of the arrays.

  • threshold (float) – If n_buckets is None, this value is used to determine the maximum size of the buckets.

Returns:

A tuple of lists of padded arrays. Each list corresponds to one of the input array sequences, and each element of a list is the stacked array for one bucket. The last element of the tuple is the list of mask arrays (one mask array per bucket).

Return type:

Tuple[Sequence[Array], …]
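
The pad-and-stack step with per-bucket masks can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper `pad_and_stack_bucket` and the hard-coded `labels` are hypothetical, and the sketch assumes zero padding along the leading axis with a boolean mask that is True on real entries. The documentation above does not specify the padding value, the padded axis, or the mask convention.

```python
import numpy as np


def pad_and_stack_bucket(arrays):
    """Hypothetical helper: zero-pad arrays along axis 0 and stack them.

    Returns (stacked, mask); mask is True where an entry is real data.
    """
    arrays = [np.asarray(a) for a in arrays]
    lengths = np.array([a.shape[0] for a in arrays])
    max_len = int(lengths.max())
    padded = [
        # Pad only the leading axis; trailing axes are left untouched.
        np.pad(a, [(0, max_len - a.shape[0])] + [(0, 0)] * (a.ndim - 1))
        for a in arrays
    ]
    mask = np.arange(max_len)[None, :] < lengths[:, None]
    return np.stack(padded), mask


arrays = [np.ones(3), np.ones(100), np.ones(4)]
labels = np.array([0, 1, 0])  # assumed bucket assignment, e.g. from a size-based split

buckets, masks = [], []
for b in np.unique(labels):
    members = [a for a, lab in zip(arrays, labels) if lab == b]
    stacked, mask = pad_and_stack_bucket(members)
    buckets.append(stacked)
    masks.append(mask)
```

Here the two buckets have shapes (2, 4) and (1, 100), so 108 entries are stored in total, compared with 300 if all three arrays were padded to the global maximum length.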