gwkokab.analysis.core.inference_io

Classes

AnalyticalPELoader

Loader for Analytical PE (Parameter Estimation) samples from files matching a regex.

DiscretePELoader

Loader for Discrete PE (Parameter Estimation) samples from files matching a regex.

FlowMCGlobalConfig

Configuration for the FlowMC sampler.

NumpyroGlobalConfig

Configuration for the Numpyro sampler, including both kernel and MCMC settings.

NumpyroMCMCConfig

Configuration for the Numpyro MCMC.

NumpyroNUTSSamplerConfig

Configuration for the Numpyro NUTS.

PoissonMeanEstimationLoader

Pydantic model wrapping one of the supported Poisson mean estimation loaders.

SamplerConfig

Factory interface for generating Sampler Configs.

Package Contents

class gwkokab.analysis.core.inference_io.AnalyticalPELoader(/, **data: Any)

Bases: pydantic.BaseModel

Loader for Analytical PE (Parameter Estimation) samples from files matching a regex.

This class handles the ingestion of gravitational-wave posterior samples, manages parameter aliasing, performs subsampling, and calculates log-prior weights for population inference.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_json(config_path: str) AnalyticalPELoader

Initializes the loader from a JSON configuration file.

Parameters:

config_path (str) – Path to the JSON file containing loader settings.

Returns:

An instance of AnalyticalPELoader.

Return type:

AnalyticalPELoader

Raises:
  • KeyError – If the ‘regex’ field is missing in the configuration.

  • FileNotFoundError – If no files match the provided regex pattern.
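The two error conditions above can be pictured with a small stand-in. This is only a sketch: `find_event_files` is a hypothetical helper, and treating the 'regex' entry as a Python `re` pattern matched against file names in one directory is an assumption, not a description of how AnalyticalPELoader resolves paths.

```python
import re
import tempfile
from pathlib import Path


def find_event_files(config: dict, root: Path) -> tuple[Path, ...]:
    """Hypothetical stand-in for the file lookup performed by from_json."""
    pattern = re.compile(config["regex"])  # KeyError if 'regex' is missing
    matches = tuple(
        sorted(p.resolve() for p in root.iterdir() if pattern.search(p.name))
    )
    if not matches:
        raise FileNotFoundError(f"no files match {config['regex']!r}")
    return matches


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("event_0.dat", "event_1.dat", "notes.txt"):
        (root / name).touch()

    found = [p.name for p in find_event_files({"regex": r"event_\d+\.dat"}, root)]

    # A pattern that matches nothing triggers the second error condition.
    try:
        find_event_files({"regex": r"missing_\d+"}, root)
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

The file names and the `.dat` extension are illustrative only.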

load(parameters: tuple[str, Ellipsis], seed: int = 37) dict[str, list[numpy.ndarray]]

Loads analytical PE data from disk.

This method reads the mean, covariance, and limits for each event specified in self.event_paths, validates that the necessary parameters are present, and returns them as stacked numpy arrays.

Parameters:
  • parameters (tuple[str, ...]) – The list of parameters to extract from each file.

  • seed (int, optional) – Random seed used for deterministic subsampling, by default 37

Returns:

A dictionary containing lists of arrays of mean, covariance, and limits for each event.

Return type:

dict[str, list[np.ndarray]]

classmethod load_file(filename: pathlib.Path | str, waveform_name: str) AnalyticalPEFileData

Loads a single analytical PE file, returning its samples and metadata.

Parameters:
  • filename (Path | str) – Path to the sample file.

  • waveform_name (str) – Name of the waveform model used.

Returns:

NamedTuple containing the samples and metadata from the file.

Return type:

AnalyticalPEFileData

alternate_waveforms: dict[str, str] = None

Mapping of filenames to alternate waveform names, if needed.

default_waveform: str = None

Default waveform name to use when loading samples.

event_paths: tuple[pathlib.Path, Ellipsis]

Tuple of absolute paths to the files containing PE samples.

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

parameter_aliases: dict[str, str] = None

Mapping of internal parameter names to the column names used in the CSV files.

sample_transformer: gwkokab.analysis.core.utils.SampleTransformer = None

An instance of a SampleTransformer that defines how to transform the samples from the analytical PE format to the model’s expected format.

This allows for flexible handling of different coordinate systems or parameterizations used in the PE samples.

class gwkokab.analysis.core.inference_io.DiscretePELoader(/, **data: Any)

Bases: pydantic.BaseModel

Loader for Discrete PE (Parameter Estimation) samples from files matching a regex.

This class handles the ingestion of gravitational-wave posterior samples, manages parameter aliasing, performs subsampling, and calculates log-prior weights for population inference.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_json(config_path: str) DiscretePELoader

Initializes the loader from a JSON configuration file.

Parameters:

config_path (str) – Path to the JSON file containing loader settings.

Returns:

An instance of DiscretePELoader.

Return type:

DiscretePELoader

Raises:
  • KeyError – If the ‘regex’ field is missing in the configuration.

  • FileNotFoundError – If no files match the provided regex pattern.

load(parameters: tuple[str, Ellipsis], seed: int = 37) tuple[list[numpy.ndarray], list[numpy.ndarray]]

Loads samples from disk and computes the corresponding log-prior weights.

It is inspired by evaluate_prior().

Parameters:
  • parameters (tuple[str, ...]) – The list of parameters to extract from each file.

  • seed (int, optional) – Random seed used for deterministic subsampling, by default 37

Returns:

A tuple containing:
  • A list of arrays (one per event) containing the requested parameters.

  • A list of arrays (one per event) containing the log-prior weights.

Return type:

tuple[list[np.ndarray], list[np.ndarray]]
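One common use of the two returned lists is the standard importance-sampling estimate of a per-event population likelihood, in which each posterior sample is reweighted by the ratio of the population density to the PE prior. The sketch below uses plain Python lists in place of arrays; `log_event_likelihood` and the toy numbers are hypothetical, and nothing here is taken from gwkokab's own likelihood code.

```python
import math


def log_event_likelihood(log_pop, log_prior):
    """Return log((1/N) * sum_i exp(log_pop_i - log_prior_i)), computed stably."""
    log_ratios = [lp - lq for lp, lq in zip(log_pop, log_prior)]
    peak = max(log_ratios)  # subtract the maximum before exponentiating
    total = sum(math.exp(r - peak) for r in log_ratios)
    return peak + math.log(total) - math.log(len(log_ratios))


# Toy event: the population density equals the PE prior at every sample,
# so each ratio is 1 and the estimated log-likelihood is 0.
log_prior = [-1.0, -2.0, -0.5]
log_pop = [-1.0, -2.0, -0.5]
estimate = log_event_likelihood(log_pop, log_prior)
```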

classmethod load_file(filename: pathlib.Path | str, datasets: str | tuple[str, Ellipsis]) pandas.DataFrame

Loads a single PE sample file into a DataFrame.

Parameters:
  • filename (Path | str) – Path to the sample file.

  • datasets (str | tuple[str, ...]) – Name or tuple of names of the dataset(s) to load from the HDF5 file, in order of preference.

Returns:

DataFrame containing the samples from the file.

Return type:

pd.DataFrame
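The "in order of preference" rule for datasets can be sketched as a first-match lookup. `pick_dataset` is a hypothetical helper and the dataset names are illustrative; how load_file actually inspects the HDF5 file is not shown in this reference.

```python
def pick_dataset(available, datasets):
    """Return the first preferred dataset name that is present in `available`."""
    preferred = (datasets,) if isinstance(datasets, str) else tuple(datasets)
    for name in preferred:
        if name in available:
            return name
    raise KeyError(f"none of {preferred} found in {sorted(available)}")


# Names that a hypothetical file might contain.
available = {"C01:IMRPhenomXPHM", "C01:SEOBNRv4PHM"}

# The first preference is absent, so the second one is used.
chosen = pick_dataset(available, ("C01:Mixed", "C01:IMRPhenomXPHM"))

# A bare string is treated as a single preference.
single = pick_dataset(available, "C01:SEOBNRv4PHM")
```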

alternate_datasets: dict[str, str] = None

Mapping of filenames to an alternate dataset name, overriding the default dataset(s).

alternate_distance_priors: dict[str, Literal[None, 'comoving', 'euclidean']] = None

Mapping of filenames to an alternate distance prior, overriding the default distance prior.

alternate_mass_priors: dict[str, Literal[None, 'flat-detector-components', 'flat-detector-chirp-mass-ratio', 'flat-source-components']] = None

Mapping of filenames to an alternate mass prior, overriding the default mass prior.

alternate_spin_priors: dict[str, Literal[None, 'component']] = None

Mapping of filenames to an alternate spin prior, overriding the default spin prior.

default_datasets: tuple[str, Ellipsis] = None

Default dataset names to look for in HDF5 files, in order of preference.

default_distance_prior: Literal[None, 'comoving', 'euclidean'] = None

The distance prior assumed; used to calculate volume-sensitive weights.

default_mass_prior: Literal[None, 'flat-detector-components', 'flat-detector-chirp-mass-ratio', 'flat-source-components'] = None

The mass prior assumed during the original PE run to be removed/reweighted.

default_spin_prior: Literal[None, 'component'] = None

The spin prior assumed during the original PE run.
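The relationship between a default_* prior and its alternate_* mapping can be sketched as a dictionary lookup with a fallback. `prior_for` is a hypothetical helper, and keying the mapping by the base file name (rather than the full path) is an assumption.

```python
from pathlib import Path


def prior_for(filename, alternates, default):
    """Return the per-file override if one exists, otherwise the default."""
    # dict.get returns the stored value even when that value is None,
    # so an explicit None override is respected.
    return alternates.get(Path(filename).name, default)


alternate_mass_priors = {"GW_event_A.h5": "flat-source-components"}
default_mass_prior = "flat-detector-components"

overridden = prior_for("/data/GW_event_A.h5", alternate_mass_priors, default_mass_prior)
fallback = prior_for("/data/GW_event_B.h5", alternate_mass_priors, default_mass_prior)
```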

filenames: tuple[pathlib.Path, Ellipsis]

Tuple of absolute paths to the sample files.

max_samples: pydantic.PositiveInt | None = None

If set, limits the number of samples loaded per event to this value.
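A minimal sketch of how max_samples and a fixed seed could combine to give reproducible subsampling. It uses the standard-library `random` module for illustration; the random generator and selection scheme used by the loader itself are not documented here, so `subsample` is hypothetical.

```python
import random


def subsample(samples, max_samples, seed=37):
    """Keep at most `max_samples` entries, chosen reproducibly from `seed`."""
    if max_samples is None or len(samples) <= max_samples:
        return list(samples)
    rng = random.Random(seed)
    # Draw indices without replacement and keep the original ordering.
    keep = sorted(rng.sample(range(len(samples)), max_samples))
    return [samples[i] for i in keep]


samples = list(range(100))
first = subsample(samples, 10)
second = subsample(samples, 10)  # same seed, so the same subset
untouched = subsample(samples, None)
```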

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

parameter_aliases: dict[str, str] = None

Mapping of internal parameter names to the column names used in the sample files.

class gwkokab.analysis.core.inference_io.FlowMCGlobalConfig(/, **data: Any)

Bases: pydantic.BaseModel

Configuration for the FlowMC sampler.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_json(config_path: str) FlowMCGlobalConfig

Initializes the configuration from a JSON file.

Parameters:

config_path (str) – Path to the JSON file containing sampler settings.

Returns:

An instance of FlowMCGlobalConfig.

Return type:

FlowMCGlobalConfig

batch_size: pydantic.PositiveInt = None

Number of samples per training batch for the Normalizing Flow.

chain_batch_size: Annotated[int, Field(ge=0)] = None

Batch size for processing chains.

If 0, processes all chains simultaneously.
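The special meaning of 0 can be sketched as follows. `chain_batches` is a hypothetical helper that only shows how chain indices would be grouped; it does not reproduce how the sampler schedules work internally.

```python
def chain_batches(n_chains, chain_batch_size):
    """Group chain indices into batches; a batch size of 0 means one batch."""
    size = n_chains if chain_batch_size == 0 else chain_batch_size
    return [
        list(range(start, min(start + size, n_chains)))
        for start in range(0, n_chains, size)
    ]


all_at_once = chain_batches(5, 0)
in_pairs = chain_batches(5, 2)
```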

global_thinning: pydantic.PositiveInt = None

Thinning factor applied to global (Normalizing Flow) proposals.

history_window: pydantic.PositiveInt = None

Size of the rolling history window used for training data or adaptation.

learning_rate: pydantic.PositiveFloat = None

Learning rate for the Normalizing Flow optimizer.

local_sampler_name: Literal['mala', 'hmc'] = None

The underlying local MCMC sampler to use (‘mala’ for MALA or ‘hmc’ for HMC).

local_thinning: pydantic.PositiveInt = None

Thinning factor applied to local sampler steps.

mass_matrix: pydantic.PositiveFloat | NumPyArrayTypeForPydantic = None

Mass matrix diagonal elements or scalar value for HMC trajectory dynamics.

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_NFproposal_batch_size: pydantic.PositiveInt = None

Batch size used when generating proposal steps from the Normalizing Flow.

n_chains: pydantic.PositiveInt = None

Number of chains to sample.

n_epochs: pydantic.PositiveInt = None

Number of training epochs for the Normalizing Flow per training loop.

n_global_steps: pydantic.PositiveInt = None

Number of global production/exploration steps to take using the NF proposal.

n_leapfrog: pydantic.PositiveInt = None

Number of leapfrog steps per HMC trajectory (ignored if using MALA).

n_local_steps: pydantic.PositiveInt = None

Number of local sampler steps to take between global updates.

n_max_examples: pydantic.PositiveInt = None

Maximum number of total samples/examples to store in the training history.

n_production_loops: pydantic.PositiveInt = None

Number of production loops to run after the model is trained.

n_training_loops: pydantic.PositiveInt = None

Number of initial loops dedicated to tuning and training the Normalizing Flow.

rq_spline_hidden_units: list[pydantic.PositiveInt] = None

Layer widths of the neural network conditioning the Rational-Quadratic Splines.

rq_spline_n_bins: pydantic.PositiveInt = None

Number of bins used in each Rational-Quadratic Spline transformation layer.

rq_spline_n_layers: pydantic.PositiveInt = None

Total number of flow layers (coupling blocks) in the Normalizing Flow.

rq_spline_range: tuple[float, float] = None

The bounding box interval (min, max) where the spline transformations are active.

sampler_name: Literal['flowMC'] = 'flowMC'

The name of the sampler to use.

Currently only ‘flowMC’ is supported.

step_size: pydantic.PositiveFloat = None

The initial step size (or integration step size) for the local sampler.

verbose: bool = None

If True, prints execution progress logs and loss metrics to the console.
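A sketch of what a JSON settings file for this class might look like. The keys are taken from the field names listed above; the flat layout, the numeric values, and the subset of fields shown are assumptions, and the file may need further fields to validate.

```python
import json
import tempfile
from pathlib import Path

# Illustrative values only; they are not recommended settings.
flowmc_settings = {
    "sampler_name": "flowMC",
    "local_sampler_name": "mala",
    "step_size": 0.01,
    "n_chains": 100,
    "n_local_steps": 50,
    "n_global_steps": 50,
    "n_training_loops": 20,
    "n_production_loops": 20,
    "n_epochs": 5,
    "learning_rate": 0.001,
    "batch_size": 10000,
    "chain_batch_size": 0,  # 0 processes all chains simultaneously
    "rq_spline_n_layers": 4,
    "rq_spline_n_bins": 8,
    "rq_spline_hidden_units": [64, 64],
    "rq_spline_range": [-10.0, 10.0],
    "verbose": False,
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "flowmc.json"
    config_path.write_text(json.dumps(flowmc_settings, indent=2))
    reloaded = json.loads(config_path.read_text())
    # With gwkokab installed, the file would then be passed to
    # FlowMCGlobalConfig.from_json(str(config_path)).
```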

class gwkokab.analysis.core.inference_io.NumpyroGlobalConfig(/, **data: Any)

Bases: pydantic.BaseModel

Configuration for the Numpyro sampler, including both kernel and MCMC settings.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_json(config_path: str) NumpyroGlobalConfig

Initializes the configuration from a JSON file.

Parameters:

config_path (str) – Path to the JSON file containing sampler settings.

Returns:

An instance of NumpyroGlobalConfig.

Return type:

NumpyroGlobalConfig

kernel: NumpyroNUTSSamplerConfig = None

Configuration for the NUTS sampler kernel.

mcmc: NumpyroMCMCConfig = None

Configuration for the MCMC sampling procedure.

model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

sampler_name: Literal['numpyro'] = 'numpyro'

The name of the sampler to use.

Currently only ‘numpyro’ is supported.
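Because kernel and mcmc are themselves configuration models, a JSON settings file would plausibly nest them. The sketch below assumes that nesting; the keys come from the field names of NumpyroNUTSSamplerConfig and NumpyroMCMCConfig, while the values are illustrative and the set of fields may be incomplete.

```python
import json
import tempfile
from pathlib import Path

numpyro_settings = {
    "sampler_name": "numpyro",
    "kernel": {
        "step_size": 1.0,
        "adapt_step_size": True,
        "adapt_mass_matrix": True,
        "dense_mass": False,
        "target_accept_prob": 0.8,
        "max_tree_depth": 8,
    },
    "mcmc": {
        "num_warmup": 1000,
        "num_samples": 2000,
        "num_chains": 4,
        "chain_method": "parallel",
        "thinning": 1,
        "progress_bar": True,
    },
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "numpyro.json"
    config_path.write_text(json.dumps(numpyro_settings, indent=2))
    reloaded = json.loads(config_path.read_text())
    # With gwkokab installed, the file would then be passed to
    # NumpyroGlobalConfig.from_json(str(config_path)).
```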

class gwkokab.analysis.core.inference_io.NumpyroMCMCConfig(/, **data: Any)

Bases: pydantic.BaseModel

Configuration for the Numpyro MCMC.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

chain_method: Literal['parallel', 'sequential', 'vectorized'] = None

The chain execution method: one of “parallel” (default), “sequential”, or “vectorized”.

The “parallel” method executes the drawing process in parallel on XLA devices (CPUs/GPUs/TPUs). If there are not enough devices for “parallel”, it falls back to the “sequential” method and draws chains one after another. The “vectorized” method is an experimental feature that vectorizes the drawing method, allowing samples to be collected in parallel on a single device.

jit_model_args: bool = None

If set to True, this will compile the potential energy computation as a function of model arguments.

As such, calling run() again on a different dataset of the same size will not incur additional compilation cost. Note that this currently has no effect when num_chains > 1 and chain_method == ‘parallel’.

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

num_chains: pydantic.PositiveInt = None

Number of MCMC chains to run.

By default, chains will be run in parallel using pmap(). If there are not enough devices available, chains will be run in sequence.

num_samples: pydantic.PositiveInt = None

Number of samples to generate from the Markov chain.

num_warmup: pydantic.PositiveInt = None

Number of warmup steps.

progress_bar: bool = None

Whether to enable progress bar updates.

Defaults to True.

progress_rate: pydantic.PositiveInt | None = None

Number of iterations per progress bar update.

Defaults to None, which is 5% of total iterations when there are more than 20 iterations, otherwise every iteration.
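The default rule can be written out as a short function. This is a sketch of the rule as stated, not NumPyro's code: `default_progress_rate` is a hypothetical name, and the use of integer division for the 5% figure is an assumption.

```python
def default_progress_rate(total_iterations):
    """Iterations per progress-bar update when progress_rate is None."""
    if total_iterations > 20:
        return total_iterations // 20  # 5% of the total, rounded down
    return 1  # update on every iteration


long_run = default_progress_rate(2000)
short_run = default_progress_rate(10)
```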

thinning: pydantic.PositiveInt = None

Positive integer that controls the fraction of post-warmup samples that are retained.

For example if thinning is 2 then every other sample is retained. Defaults to 1, i.e. no thinning.
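The effect of thinning on the number of retained draws can be sketched with a slice. `thin` is a hypothetical helper; which offset within each group of `thinning` draws is actually kept by the sampler is an assumption here.

```python
def thin(samples, thinning=1):
    """Keep one draw out of every `thinning` post-warmup draws."""
    return samples[::thinning]


draws = list(range(10))
kept = thin(draws, 2)       # every other draw
unchanged = thin(draws, 1)  # default: no thinning
```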

class gwkokab.analysis.core.inference_io.NumpyroNUTSSamplerConfig(/, **data: Any)

Bases: pydantic.BaseModel

Configuration for the Numpyro NUTS.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

adapt_mass_matrix: bool = None

Whether to adapt the mass matrix during the warm-up phase using the Welford scheme.

adapt_step_size: bool = None

Whether to adapt step_size during the warm-up phase using the Dual Averaging scheme.

dense_mass: bool | list[tuple[str, Ellipsis]] = None

This flag controls whether mass matrix is dense (i.e. full-rank) or diagonal (defaults to dense_mass=False).

To specify a structured mass matrix, users can provide a list of tuples of site names. Each tuple represents a block in the joint mass matrix. For example, assuming that the model has latent variables “x”, “y”, “z” (where each variable can be multi-dimensional), possible specifications and corresponding mass matrix structures are as follows:

  • dense_mass=[("x", "y")]: use a dense mass matrix for the joint (x, y) and a diagonal mass matrix for z

  • dense_mass=[] (equivalent to dense_mass=False): use a diagonal mass matrix for the joint (x, y, z)

  • dense_mass=[("x", "y", "z")] (equivalent to dense_mass=True): use a dense mass matrix for the joint (x, y, z)

  • dense_mass=[("x",), ("y",), ("z",)]: use dense mass matrices for each of x, y, and z (i.e. block-diagonal with 3 blocks)
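To make the block structures concrete, the sketch below builds the 0/1 pattern of entries a mass matrix may use for a given specification. `mass_matrix_pattern` and the site sizes are hypothetical; this only illustrates the structure described in the list and is not how NumPyro builds its mass matrix.

```python
def mass_matrix_pattern(site_sizes, dense_mass):
    """Return a square 0/1 pattern of the entries the mass matrix may use."""
    sites = list(site_sizes)
    if dense_mass is True:
        blocks = [tuple(sites)]  # one dense block over every site
    elif dense_mass is False:
        blocks = []  # purely diagonal
    else:
        blocks = [tuple(block) for block in dense_mass]

    # Flat index range occupied by each (possibly multi-dimensional) site.
    offsets, start = {}, 0
    for name in sites:
        offsets[name] = range(start, start + site_sizes[name])
        start += site_sizes[name]

    # Start from the diagonal, then fill in each requested dense block.
    pattern = [[int(i == j) for j in range(start)] for i in range(start)]
    for block in blocks:
        indices = [i for name in block for i in offsets[name]]
        for i in indices:
            for j in indices:
                pattern[i][j] = 1
    return pattern


scalar_sites = {"x": 1, "y": 1, "z": 1}
partial = mass_matrix_pattern(scalar_sites, [("x", "y")])

# With a two-dimensional x, one block per site gives a block-diagonal pattern.
per_site = mass_matrix_pattern({"x": 2, "y": 1, "z": 1}, [("x",), ("y",), ("z",)])
```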

find_heuristic_step_size: bool = None

Whether or not to use a heuristic function to adjust the step size at the beginning of each adaptation window.

Defaults to False.

forward_mode_differentiation: bool = None

Whether to use forward-mode differentiation or reverse-mode differentiation.

By default, reverse mode is used, but forward mode can improve performance in some cases. In addition, some JAX control-flow utilities, such as jax.lax.while_loop or jax.lax.fori_loop, only support forward-mode differentiation.

inverse_mass_matrix: None | NumPyArrayTypeForPydantic | dict = None

Initial value for inverse mass matrix.

This may be adapted during warmup if adapt_mass_matrix = True. If no value is specified, it is initialized to the identity matrix. For a potential_fn with general JAX pytree parameters, the entries of the mass matrix follow the order of the flattened pytree parameters obtained with tree_flatten(), which can be ambiguous (see https://jax.readthedocs.io/en/latest/pytrees.html). If model is not None, a structured block mass matrix can be specified as a dictionary whose keys are tuples of site names and whose values are the corresponding blocks of the mass matrix. For more information about structured mass matrices, see the dense_mass argument.

max_tree_depth: pydantic.PositiveInt | tuple[pydantic.PositiveInt, pydantic.PositiveInt] = None

Max depth of the binary tree created during the doubling scheme of NUTS sampler.

Defaults to 8. This argument also accepts a tuple of integers (d1, d2), where d1 is the max tree depth during warmup phase and d2 is the max tree depth during post warmup phase.
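The two accepted forms can be normalized into a (warmup, post-warmup) pair as sketched below. `split_tree_depth` is a hypothetical helper used only to illustrate the meaning of the tuple form.

```python
def split_tree_depth(max_tree_depth=8):
    """Return (warmup_depth, post_warmup_depth) for either accepted form."""
    if isinstance(max_tree_depth, int):
        # A single integer applies to both phases.
        return max_tree_depth, max_tree_depth
    warmup_depth, post_warmup_depth = max_tree_depth
    return warmup_depth, post_warmup_depth


same_for_both = split_tree_depth(8)
shallower_warmup = split_tree_depth((6, 10))
```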

model_config

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

regularize_mass_matrix: bool = None

step_size: pydantic.PositiveFloat = None

Determines the size of a single step taken by the Verlet integrator while computing the trajectory using Hamiltonian dynamics.

If not specified, it will be set to 1.

target_accept_prob: Annotated[float, Field(gt=0.0, le=1.0)] = None

Target acceptance probability for step size adaptation using Dual Averaging.

Increasing this value will lead to a smaller step size, hence the sampling will be slower but more robust. Defaults to 0.8.

class gwkokab.analysis.core.inference_io.PoissonMeanEstimationLoader(/, **data: Any)

Bases: pydantic.BaseModel

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_json(config_path: str, key: jaxtyping.PRNGKeyArray, parameters: Tuple[str, Ellipsis])

get_estimators() Tuple[Callable[[jaxtyping.Array], jaxtyping.Array] | None, Callable[Ellipsis, jaxtyping.Array], dict[str, Any]]

loader: NeuralVolumeTimeSensitivityPoissonMeanLoader | NeuralVolumeProbabilityOfDetectionPoissonMeanLoader | GWTCInjectionLoader | CustomPoissonMeanEstimationLoader = None

class gwkokab.analysis.core.inference_io.SamplerConfig

Factory interface for generating Sampler Configs.

static from_json(config_path: str) NumpyroGlobalConfig | FlowMCGlobalConfig

Initializes and returns the specific config instance directly from JSON.
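Since the two config classes are distinguished by their sampler_name literals ('numpyro' and 'flowMC'), a factory of this kind can be pictured as a lookup on that key. The sketch below is an assumption about the dispatch mechanism, not the actual implementation; `config_class_name` is hypothetical and returns a class name as a string so that it runs without gwkokab installed.

```python
import json
import tempfile
from pathlib import Path

# Class names from this module, keyed by their sampler_name literal.
CONFIG_CLASS_BY_SAMPLER = {
    "numpyro": "NumpyroGlobalConfig",
    "flowMC": "FlowMCGlobalConfig",
}


def config_class_name(config_path):
    """Pick the config class name from the 'sampler_name' entry of a JSON file."""
    settings = json.loads(Path(config_path).read_text())
    sampler_name = settings["sampler_name"]
    if sampler_name not in CONFIG_CLASS_BY_SAMPLER:
        raise ValueError(f"unsupported sampler_name: {sampler_name!r}")
    return CONFIG_CLASS_BY_SAMPLER[sampler_name]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sampler.json"
    path.write_text(json.dumps({"sampler_name": "flowMC"}))
    selected = config_class_name(path)
```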