gwkokab.analysis.core.inference_io¶

Classes¶

`AnalyticalPELoader`	Loader for Analytical PE (Parameter Estimation) samples from files matching a regex.
`DiscretePELoader`	Loader for Discrete PE (Parameter Estimation) samples from files matching a regex.
`FlowMCGlobalConfig`	Configuration for the FlowMC sampler.
`NumpyroGlobalConfig`	Configuration for the Numpyro sampler, including both kernel and MCMC settings.
`NumpyroMCMCConfig`	Configuration for the Numpyro MCMC.
`NumpyroNUTSSamplerConfig`	Configuration for the Numpyro NUTS.
`PoissonMeanEstimationLoader`	Pydantic model that wraps one of the supported Poisson mean estimation loaders.
`SamplerConfig`	Factory interface for generating Sampler Configs.

Package Contents¶

class gwkokab.analysis.core.inference_io.AnalyticalPELoader(/, **data: Any)¶

Bases: pydantic.BaseModel

Loader for Analytical PE (Parameter Estimation) samples from files matching a regex.

This class handles the ingestion of gravitational-wave posterior samples, manages parameter aliasing, performs subsampling, and calculates log-prior weights for population inference.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_json(config_path: str) → AnalyticalPELoader¶

Initializes the loader from a JSON configuration file.

Parameters:

config_path (str) – Path to the JSON file containing loader settings.

Returns:

An instance of AnalyticalPELoader.

Return type:

AnalyticalPELoader

Raises:

KeyError – If the ‘regex’ field is missing in the configuration.
FileNotFoundError – If no files match the provided regex pattern.
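
The lookup described above can be pictured with a small stand-alone sketch. It is not gwkokab's implementation: the helper name `resolve_event_paths`, the `directory` key, and the file names are hypothetical, and matching the pattern against file names with `re.fullmatch` is an assumption. Only the `regex` key and the two exceptions come from the description above.

```python
import json
import re
import tempfile
from pathlib import Path


def resolve_event_paths(config_path: str) -> tuple[Path, ...]:
    """Hypothetical helper: collect files whose names match config['regex']."""
    config = json.loads(Path(config_path).read_text())
    pattern = config["regex"]  # KeyError if the 'regex' field is missing
    directory = Path(config.get("directory", "."))  # assumed key, for illustration
    matches = tuple(
        sorted(p.resolve() for p in directory.iterdir() if re.fullmatch(pattern, p.name))
    )
    if not matches:
        raise FileNotFoundError(f"no files match {pattern!r} in {directory}")
    return matches


# Build a throwaway directory with two matching files and one that does not match.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("event_a.hdf5", "event_b.hdf5", "notes.txt"):
        (Path(tmp) / name).touch()
    config_file = Path(tmp) / "loader.json"
    config_file.write_text(json.dumps({"regex": r"event_.*\.hdf5", "directory": tmp}))
    found_names = [p.name for p in resolve_event_paths(str(config_file))]
```

Resolving the matches to absolute paths mirrors the `event_paths` field, which is documented as a tuple of absolute paths.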

load(parameters: tuple[str, Ellipsis], seed: int = 37) → dict[str, list[numpy.ndarray]]¶

Loads analytical PE data from disk.

This method reads the mean, covariance, and limits for each event specified in self.event_paths, validates that the necessary parameters are present, and returns them as stacked numpy arrays.

Parameters:

parameters (tuple[str, ...]) – The list of parameters to extract from each file.
seed (int, optional) – Random seed used for deterministic subsampling, by default 37

Returns:

A dictionary containing lists of arrays of mean, covariance, and limits for each event.

Return type:

dict[str, list[np.ndarray]]

classmethod load_file(filename: pathlib.Path | str, waveform_name: str) → AnalyticalPEFileData¶

Loads a single analytical PE file into an AnalyticalPEFileData tuple.

Parameters:

filename (Path | str) – Path to the sample file.
waveform_name (str) – Name of the waveform model used.

Returns:

NamedTuple containing the samples and metadata from the file.

Return type:

AnalyticalPEFileData

alternate_waveforms: dict[str, str] = None¶: Mapping of filenames to alternate waveform names, if needed.

default_waveform: str = None¶: Default waveform name to use when loading samples.

event_paths: tuple[pathlib.Path, Ellipsis]¶: Tuple of absolute paths to the files containing PE samples.

model_config¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

parameter_aliases: dict[str, str] = None¶: Mapping of internal parameter names to the column names used in the CSV files.

sample_transformer: gwkokab.analysis.core.utils.SampleTransformer = None¶

An instance of a SampleTransformer that defines how to transform the samples from the analytical PE format to the model’s expected format.

This allows for flexible handling of different coordinate systems or parameterizations used in the PE samples.

class gwkokab.analysis.core.inference_io.DiscretePELoader(/, **data: Any)¶

Bases: pydantic.BaseModel

Loader for Discrete PE (Parameter Estimation) samples from files matching a regex.

This class handles the ingestion of gravitational-wave posterior samples, manages parameter aliasing, performs subsampling, and calculates log-prior weights for population inference.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_json(config_path: str) → DiscretePELoader¶

Initializes the loader from a JSON configuration file.

Parameters:

config_path (str) – Path to the JSON file containing loader settings.

Returns:

An instance of DiscretePELoader.

Return type:

DiscretePELoader

Raises:

KeyError – If the ‘regex’ field is missing in the configuration.
FileNotFoundError – If no files match the provided regex pattern.

load(parameters: tuple[str, Ellipsis], seed: int = 37) → tuple[list[numpy.ndarray], list[numpy.ndarray]]¶

Loads samples from disk and computes the corresponding log-prior weights.

It is inspired by evaluate_prior().

Parameters:

parameters (tuple[str, ...]) – The list of parameters to extract from each file.
seed (int, optional) – Random seed used for deterministic subsampling, by default 37

Returns:

A tuple containing:

A list of arrays (one per event) containing the requested parameters.
A list of arrays (one per event) containing the log-prior weights.

Return type:

tuple[list[np.ndarray], list[np.ndarray]]
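
A fixed `seed` makes the subsampling repeatable: the same input and seed always select the same samples. The sketch below illustrates that idea with the standard library only. The function `subsample` is hypothetical, and the random number generator the loader actually uses is not documented here.

```python
import random
from typing import Optional


def subsample(samples: list, max_samples: Optional[int], seed: int = 37) -> list:
    """Return at most max_samples items, chosen reproducibly from a seeded RNG."""
    if max_samples is None or len(samples) <= max_samples:
        return list(samples)  # nothing to cut
    rng = random.Random(seed)
    # Draw indices without replacement, then restore the original ordering.
    keep = sorted(rng.sample(range(len(samples)), max_samples))
    return [samples[i] for i in keep]


posterior = list(range(100))  # stand-in for one event's posterior samples
first = subsample(posterior, max_samples=10)
second = subsample(posterior, max_samples=10)
```

Because both calls use the default seed, `first` and `second` contain the same ten samples.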

classmethod load_file(filename: pathlib.Path | str, datasets: str | tuple[str, Ellipsis]) → pandas.DataFrame¶

Loads a single PE sample file into a DataFrame.

Parameters:

filename (Path | str) – Path to the sample file.
datasets (str | tuple[str, ...]) – Name or tuple of names of the dataset(s) to load from the HDF5 file, in order of preference.

Returns:

DataFrame containing the samples from the file.

Return type:

pd.DataFrame

alternate_datasets: dict[str, str] = None¶: Mapping of filenames to an alternate dataset name, overriding the default dataset(s).

alternate_distance_priors: dict[str, Literal[None, 'comoving', 'euclidean']] = None¶: Mapping of filenames to an alternate distance prior, overriding the default distance prior.

alternate_mass_priors: dict[str, Literal[None, 'flat-detector-components', 'flat-detector-chirp-mass-ratio', 'flat-source-components']] = None¶: Mapping of filenames to an alternate mass prior, overriding the default mass prior.

alternate_spin_priors: dict[str, Literal[None, 'component']] = None¶: Mapping of filenames to an alternate spin prior, overriding the default spin prior.

default_datasets: tuple[str, Ellipsis] = None¶: Default dataset names to look for in HDF5 files, in order of preference.

default_distance_prior: Literal[None, 'comoving', 'euclidean'] = None¶: The distance prior assumed; used to calculate volume-sensitive weights.
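
As an illustration of what a distance-prior weight looks like: a prior that is uniform in Euclidean volume has density proportional to the square of the distance, so its logarithm is twice the log distance up to an additive constant. The function below is a hypothetical sketch of that single case. It does not reproduce the loader's actual computation, which may include normalization or redshift Jacobian terms.

```python
import math


def log_euclidean_distance_prior(distances: list[float]) -> list[float]:
    """Unnormalized log prior for p(d) proportional to d**2 (hypothetical helper)."""
    return [2.0 * math.log(d) for d in distances]


# Doubling the distance raises the log prior by 2 * ln(2).
log_prior = log_euclidean_distance_prior([100.0, 200.0, 400.0])
```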

default_mass_prior: Literal[None, 'flat-detector-components', 'flat-detector-chirp-mass-ratio', 'flat-source-components'] = None¶: The mass prior assumed during the original PE run to be removed/reweighted.

default_spin_prior: Literal[None, 'component'] = None¶: The spin prior assumed during the original PE run.

filenames: tuple[pathlib.Path, Ellipsis]¶: Tuple of absolute paths to the sample files.

max_samples: pydantic.PositiveInt | None = None¶: If set, limits the number of samples loaded per event to this value.

model_config¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

parameter_aliases: dict[str, str] = None¶: Mapping of internal parameter names to the column names used in the CSV files.

class gwkokab.analysis.core.inference_io.FlowMCGlobalConfig(/, **data: Any)¶

Bases: pydantic.BaseModel

Configuration for the FlowMC sampler.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_json(config_path: str) → FlowMCGlobalConfig¶

Initializes the configuration from a JSON file.

Parameters:

config_path (str) – Path to the JSON file containing the sampler settings.

Returns:

An instance of FlowMCGlobalConfig.

Return type:

FlowMCGlobalConfig

batch_size: pydantic.PositiveInt = None¶: Number of samples per training batch for the Normalizing Flow.

chain_batch_size: Annotated[int, Field(ge=0)] = None¶

Batch size for processing chains.

If 0, processes all chains simultaneously.
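
The convention that 0 means "all chains at once" can be sketched as follows. The helper `chain_batches` is hypothetical and only illustrates how chain indices would be grouped; it says nothing about how the sampler schedules the work internally.

```python
def chain_batches(n_chains: int, chain_batch_size: int) -> list[list[int]]:
    """Group chain indices into batches; a batch size of 0 yields a single batch."""
    if chain_batch_size < 0:
        raise ValueError("chain_batch_size must be >= 0")
    if chain_batch_size == 0:
        return [list(range(n_chains))]
    return [
        list(range(start, min(start + chain_batch_size, n_chains)))
        for start in range(0, n_chains, chain_batch_size)
    ]


all_at_once = chain_batches(5, 0)
in_pairs = chain_batches(5, 2)
```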

global_thinning: pydantic.PositiveInt = None¶: Thinning factor applied to global (Normalizing Flow) proposals.

history_window: pydantic.PositiveInt = None¶: Size of the rolling history window used for training data or adaptation.

learning_rate: pydantic.PositiveFloat = None¶: Learning rate for the Normalizing Flow optimizer.

local_sampler_name: Literal['mala', 'hmc'] = None¶: The underlying local MCMC sampler to use (‘mala’ for MALA or ‘hmc’ for HMC).

local_thinning: pydantic.PositiveInt = None¶: Thinning factor applied to local sampler steps.

mass_matrix: pydantic.PositiveFloat | NumPyArrayTypeForPydantic = None¶: Mass matrix diagonal elements or scalar value for HMC trajectory dynamics.

model_config¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

n_NFproposal_batch_size: pydantic.PositiveInt = None¶: Batch size used when generating proposal steps from the Normalizing Flow.

n_chains: pydantic.PositiveInt = None¶: Number of chains to sample.

n_epochs: pydantic.PositiveInt = None¶: Number of training epochs for the Normalizing Flow per training loop.

n_global_steps: pydantic.PositiveInt = None¶: Number of global production/exploration steps to take using the NF proposal.

n_leapfrog: pydantic.PositiveInt = None¶: Number of leapfrog steps per HMC trajectory (ignored if using MALA).

n_local_steps: pydantic.PositiveInt = None¶: Number of local sampler steps to take between global updates.

n_max_examples: pydantic.PositiveInt = None¶: Maximum number of total samples/examples to store in the training history.

n_production_loops: pydantic.PositiveInt = None¶: Number of production loops to run after the model is trained.

n_training_loops: pydantic.PositiveInt = None¶: Number of initial loops dedicated to tuning and training the Normalizing Flow.

rq_spline_hidden_units: list[pydantic.PositiveInt] = None¶: Layer widths of the neural network conditioning the Rational-Quadratic Splines.

rq_spline_n_bins: pydantic.PositiveInt = None¶: Number of bins used in each Rational-Quadratic Spline transformation layer.

rq_spline_n_layers: pydantic.PositiveInt = None¶: Total number of flow layers (coupling blocks) in the Normalizing Flow.

rq_spline_range: tuple[float, float] = None¶: The bounding box interval (min, max) where the spline transformations are active.

sampler_name: Literal['flowMC'] = 'flowMC'¶

The name of the sampler to use.

Currently only ‘flowMC’ is supported.

step_size: pydantic.PositiveFloat = None¶: The initial step size (or integration step size) for the local sampler.

verbose: bool = None¶: If True, prints execution progress logs and loss metrics to the console.

class gwkokab.analysis.core.inference_io.NumpyroGlobalConfig(/, **data: Any)¶

Bases: pydantic.BaseModel

Configuration for the Numpyro sampler, including both kernel and MCMC settings.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_json(config_path: str) → NumpyroGlobalConfig¶

Initializes the configuration from a JSON file.

Parameters:

config_path (str) – Path to the JSON file containing the sampler settings.

Returns:

An instance of NumpyroGlobalConfig.

Return type:

NumpyroGlobalConfig

kernel: NumpyroNUTSSamplerConfig = None¶: Configuration for the NUTS sampler kernel.

mcmc: NumpyroMCMCConfig = None¶: Configuration for the MCMC sampling procedure.

model_config¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

sampler_name: Literal['numpyro'] = 'numpyro'¶

The name of the sampler to use.

Currently only ‘numpyro’ is supported.
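
A JSON file for this configuration might be laid out as in the sketch below. This assumes the JSON keys mirror the field names documented for NumpyroGlobalConfig, NumpyroNUTSSamplerConfig, and NumpyroMCMCConfig. The values are illustrative only, not recommended settings, and the exact schema accepted by `from_json` should be checked against the library.

```python
import json

# Illustrative values only; key names are assumed to match the documented fields.
config = {
    "sampler_name": "numpyro",
    "kernel": {
        "dense_mass": False,
        "max_tree_depth": 8,
        "target_accept_prob": 0.8,
    },
    "mcmc": {
        "num_warmup": 1000,
        "num_samples": 2000,
        "num_chains": 4,
        "thinning": 1,
        "chain_method": "parallel",
    },
}

config_text = json.dumps(config, indent=2)  # text that could be saved to a file
```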

class gwkokab.analysis.core.inference_io.NumpyroMCMCConfig(/, **data: Any)¶

Bases: pydantic.BaseModel

Configuration for the Numpyro MCMC.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

chain_method: Literal['parallel', 'sequential', 'vectorized'] = None¶

One of “parallel” (default), “sequential”, or “vectorized”. (NumPyro's own chain_method argument also accepts a callable JAX transform such as vmap(); the type annotation above admits only the three string values.)

The “parallel” method draws chains in parallel on XLA devices (CPUs/GPUs/TPUs). If there are not enough devices for “parallel”, it falls back to the “sequential” method and draws chains one after another. The “vectorized” method is an experimental feature that vectorizes the drawing method, allowing samples to be collected in parallel on a single device.

jit_model_args: bool = None¶

If set to True, this will compile the potential energy computation as a function of model arguments.

As a result, calling run() again on a different dataset of the same size incurs no additional compilation cost. Note that this currently has no effect when num_chains > 1 and chain_method == ‘parallel’.

model_config¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

num_chains: pydantic.PositiveInt = None¶

Number of MCMC chains to run.

By default, chains will be run in parallel using pmap(). If there are not enough devices available, chains will be run in sequence.

num_samples: pydantic.PositiveInt = None¶: Number of samples to generate from the Markov chain.

num_warmup: pydantic.PositiveInt = None¶: Number of warmup steps.

progress_bar: bool = None¶

Whether to enable progress bar updates.

Defaults to True.

progress_rate: pydantic.PositiveInt | None = None¶

Number of iterations per progress bar update.

Defaults to None, which is 5% of total iterations when there are more than 20 iterations, otherwise every iteration.
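
The default rule stated above amounts to a one-line calculation. The function name below is hypothetical, and the integer rounding is an assumption about how the 5% figure is applied.

```python
def default_progress_rate(total_iterations: int) -> int:
    """Iterations between progress bar updates when progress_rate is None."""
    if total_iterations > 20:
        return total_iterations // 20  # 5% of the run, rounded down
    return 1  # short runs update on every iteration


long_run = default_progress_rate(3000)
short_run = default_progress_rate(10)
```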

thinning: pydantic.PositiveInt = None¶

Positive integer that controls the fraction of post-warmup samples that are retained.

For example if thinning is 2 then every other sample is retained. Defaults to 1, i.e. no thinning.
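
The effect of thinning on a chain can be sketched with plain list slicing. The helper `thin` is hypothetical; which sample of each stride is kept is an implementation detail of the sampler, and the offset used here is an assumption.

```python
def thin(samples: list, thinning: int = 1) -> list:
    """Keep one sample out of every `thinning` consecutive post-warmup samples."""
    if thinning < 1:
        raise ValueError("thinning must be a positive integer")
    return samples[thinning - 1 :: thinning]


draws = list(range(10))  # stand-in for ten post-warmup samples
every_other = thin(draws, thinning=2)
untouched = thin(draws, thinning=1)
```

With `thinning=2` half of the ten draws survive, and with `thinning=1` the chain is returned unchanged.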

class gwkokab.analysis.core.inference_io.NumpyroNUTSSamplerConfig(/, **data: Any)¶

Bases: pydantic.BaseModel

Configuration for the Numpyro NUTS.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

adapt_mass_matrix: bool = None¶: A flag to decide if we want to adapt mass matrix during warm-up phase using Welford scheme.

adapt_step_size: bool = None¶: A flag to decide if we want to adapt step_size during warm-up phase using Dual Averaging scheme.

dense_mass: bool | list[tuple[str, Ellipsis]] = None¶

This flag controls whether mass matrix is dense (i.e. full-rank) or diagonal (defaults to dense_mass=False).

To specify a structured mass matrix, users can provide a list of tuples of site names. Each tuple represents a block in the joint mass matrix. For example, assuming that the model has latent variables “x”, “y”, “z” (where each variable can be multi-dimensional), possible specifications and corresponding mass matrix structures are as follows:

dense_mass=[(“x”, “y”)]: use a dense mass matrix for the joint (x, y) and a diagonal mass matrix for z
dense_mass=[] (equivalent to dense_mass=False): use a diagonal mass matrix for the joint (x, y, z)
dense_mass=[(“x”, “y”, “z”)] (equivalent to dense_mass=True): use a dense mass matrix for the joint (x, y, z)
dense_mass=[(“x”,), (“y”,), (“z”,)]: use dense mass matrices for each of x, y, and z (i.e. block-diagonal with 3 blocks)
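
The block structures listed above can be visualized by marking which entries of the mass matrix are allowed to be non-zero. The sketch below is a stand-alone illustration, not NumPyro code: the function `mass_matrix_pattern` and the site dimensions are hypothetical, and it assumes the flattened coordinates are ordered site by site.

```python
def mass_matrix_pattern(site_dims: dict[str, int], dense_mass) -> list[list[int]]:
    """Return a 0/1 matrix marking which mass-matrix entries may be non-zero."""
    names = list(site_dims)
    if dense_mass is True:
        blocks = [tuple(names)]  # one dense block over every site
    elif dense_mass is False:
        blocks = []  # purely diagonal
    else:
        blocks = [tuple(block) for block in dense_mass]

    # Map each flattened coordinate to the site it belongs to.
    owner = [name for name in names for _ in range(site_dims[name])]
    # Sites listed in the same tuple share one dense block.
    block_of = {name: i for i, block in enumerate(blocks) for name in block}

    size = len(owner)
    pattern = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            same_block = (
                owner[i] in block_of
                and owner[j] in block_of
                and block_of[owner[i]] == block_of[owner[j]]
            )
            if i == j or same_block:
                pattern[i][j] = 1
    return pattern


dims = {"x": 2, "y": 1, "z": 1}  # x is two-dimensional, y and z are scalars
joint_xy = mass_matrix_pattern(dims, [("x", "y")])
diagonal = mass_matrix_pattern(dims, [])
```

Here `joint_xy` has a dense 3×3 block for the (x, y) coordinates plus a single diagonal entry for z, while `diagonal` is non-zero only on the main diagonal.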

find_heuristic_step_size: bool = None¶

Whether or not to use a heuristic function to adjust the step size at the beginning of each adaptation window.

Defaults to False.

forward_mode_differentiation: bool = None¶

Whether to use forward-mode differentiation or reverse-mode differentiation.

By default, reverse mode is used, but forward mode can improve performance in some cases. In addition, some JAX control-flow utilities, such as jax.lax.while_loop or jax.lax.fori_loop, only support forward-mode differentiation.

inverse_mass_matrix: None | NumPyArrayTypeForPydantic | dict = None¶

Initial value for inverse mass matrix.

This may be adapted during warmup if adapt_mass_matrix = True. If no value is specified, it is initialized to the identity matrix. For a potential_fn with general JAX pytree parameters, the entries of the mass matrix follow the order of the flattened pytree parameters obtained with tree_flatten(), which can be ambiguous (see https://jax.readthedocs.io/en/latest/pytrees.html).

If model is not None, a structured block mass matrix can be specified as a dictionary whose keys are tuples of site names and whose values are the corresponding blocks of the mass matrix. For more information about structured mass matrices, see the dense_mass argument.

max_tree_depth: pydantic.PositiveInt | tuple[pydantic.PositiveInt, pydantic.PositiveInt] = None¶

Max depth of the binary tree created during the doubling scheme of NUTS sampler.

Defaults to 8. This argument also accepts a tuple of integers (d1, d2), where d1 is the max tree depth during warmup phase and d2 is the max tree depth during post warmup phase.

model_config¶: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

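regularize_mass_matrix: bool = None¶

Whether to regularize the estimated mass matrix for numerical stability during the warmup phase.

This flag has no effect if adapt_mass_matrix is False.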

step_size: pydantic.PositiveFloat = None¶

Determines the size of a single step taken by the Verlet integrator while computing the trajectory using Hamiltonian dynamics.

If not specified, it will be set to 1.

target_accept_prob: Annotated[float, Field(gt=0.0, le=1.0)] = None¶

Target acceptance probability for step size adaptation using Dual Averaging.

Increasing this value will lead to a smaller step size, hence the sampling will be slower but more robust. Defaults to 0.8.

class gwkokab.analysis.core.inference_io.PoissonMeanEstimationLoader(/, **data: Any)¶

Bases: pydantic.BaseModel

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod from_json(config_path: str, key: jaxtyping.PRNGKeyArray, parameters: Tuple[str, Ellipsis])¶

get_estimators() → Tuple[Callable[[jaxtyping.Array], jaxtyping.Array] | None, Callable[Ellipsis, jaxtyping.Array], dict[str, Any]]¶

loader: NeuralVolumeTimeSensitivityPoissonMeanLoader | NeuralVolumeProbabilityOfDetectionPoissonMeanLoader | GWTCInjectionLoader | CustomPoissonMeanEstimationLoader = None¶

class gwkokab.analysis.core.inference_io.SamplerConfig¶

Factory interface for generating Sampler Configs.

static from_json(config_path: str) → NumpyroGlobalConfig | FlowMCGlobalConfig¶: Initializes and returns the specific config instance directly from JSON.
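
A factory of this kind typically inspects the file and dispatches to the matching config class. The sketch below shows that pattern with stand-in dataclasses rather than the real Pydantic models. That the selection keys on a `sampler_name` field is an assumption suggested by the literal-typed `sampler_name` fields of the two configs; the function `config_from_json` and both stand-in classes are hypothetical.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class NumpyroStandIn:
    """Stand-in for NumpyroGlobalConfig (hypothetical)."""
    settings: dict


@dataclass
class FlowMCStandIn:
    """Stand-in for FlowMCGlobalConfig (hypothetical)."""
    settings: dict


_REGISTRY = {"numpyro": NumpyroStandIn, "flowMC": FlowMCStandIn}


def config_from_json(config_path: str):
    """Pick the config class from the 'sampler_name' entry of a JSON file."""
    settings = json.loads(Path(config_path).read_text())
    name = settings.get("sampler_name")
    if name not in _REGISTRY:
        raise ValueError(f"unsupported sampler_name: {name!r}")
    return _REGISTRY[name](settings)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sampler.json"
    path.write_text(json.dumps({"sampler_name": "flowMC", "n_chains": 4}))
    chosen = config_from_json(str(path))
```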