ipsuite.configuration_selection package

Submodules

ipsuite.configuration_selection.base module

Base Node for ConfigurationSelection.

class ipsuite.configuration_selection.base.BatchConfigurationSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Base node for BatchConfigurationSelection.

Attributes

data: list[ase.Atoms]

The atoms data to process. This must be an input to the Node

train_data: list[ase.Atoms]

Batch active learning methods usually take into account the data a model was trained on. The training dataset has to be supplied with this argument.

atoms: list[ase.Atoms]

The processed atoms data. This is an output of the Node. It does not have to be ‘field.Atoms’ but can also be e.g. a ‘property’.

train_data: list[Atoms]
class ipsuite.configuration_selection.base.ConfigurationSelection(*args, **kwargs)[source]

Bases: IPSNode

Base Node for ConfigurationSelection.

Attributes

data: list[Atoms]|list[list[Atoms]]|utils.types.SupportsAtoms

the data to select from

exclude_configurations: dict[str, list]|utils.types.SupportsSelectedConfigurations

Atoms to exclude from the selection.

exclude: list[zntrack.Node]|zntrack.Node|None

Exclude the selected configurations from these nodes.

data: list[Atoms]
property excluded_frames: list[Atoms]

Get a list of the atoms objects that were not selected.

property frames: list[Atoms]

Get a list of the selected atoms objects.

get_data() → list[Atoms][source]

Get the atoms data to process.

img_selection: Path = PosixPath('$nwd$/selection.png')
run()[source]

ZnTrack Node Run method.

select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Run the selection method.

Attributes

atoms_lst: List[ase.Atoms]

List of ase Atoms objects to select configurations from.

Returns

List[int]:

A list of the selected ids from 0 .. len(atoms_lst)

selected_ids: list[int] = NOT_AVAILABLE
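
The relationship between selected_ids, frames and excluded_frames can be illustrated with a minimal sketch. It uses plain strings as stand-ins for ase.Atoms objects, and the helper name split_by_ids is hypothetical, not part of ipsuite.

```python
def split_by_ids(frames, selected_ids):
    """Partition frames into (selected, excluded) using a list of indices.

    Hypothetical helper for illustration only; not an ipsuite function.
    """
    chosen = set(selected_ids)
    # Selected frames follow the order given in selected_ids.
    selected = [frames[i] for i in selected_ids]
    # Excluded frames keep their original order.
    excluded = [frame for i, frame in enumerate(frames) if i not in chosen]
    return selected, excluded


frames = ["a", "b", "c", "d", "e"]  # stand-ins for ase.Atoms objects
selected, excluded = split_by_ids(frames, [0, 3])
print(selected)  # ['a', 'd']
print(excluded)  # ['b', 'c', 'e']
```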

ipsuite.configuration_selection.filter module

class ipsuite.configuration_selection.filter.FilterOutlier(*args, **kwargs)[source]

Bases: IPSNode

Remove outliers from the data based on a given property.

Attributes

key: str, default="energy"

The property to filter on.

threshold: float, default=3

The threshold for filtering in units of standard deviations.

direction: {"above", "below", "both"}, default="both"

The direction to filter in.

data: list[Atoms]
direction: Literal['above', 'below', 'both'] = 'both'
property excluded_frames
filtered_indices: list = NOT_AVAILABLE
property frames: list[Atoms]
histogram: str = PosixPath('$nwd$/histogram.png')
key: str = 'energy'
run()[source]
threshold: float = 3
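
A filter with a threshold in units of standard deviations could look like the sketch below. It assumes the population standard deviation and assumes that "above" and "below" refer to values on either side of the mean; the actual FilterOutlier implementation may differ. The function name outlier_indices is hypothetical.

```python
from statistics import mean, pstdev


def outlier_indices(values, threshold=3.0, direction="both"):
    """Return indices whose value deviates from the mean by more than
    `threshold` standard deviations (hypothetical helper, not ipsuite API)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing to flag
    flagged = []
    for i, value in enumerate(values):
        z = (value - mu) / sigma
        too_high = z > threshold
        too_low = z < -threshold
        if (direction in ("above", "both") and too_high) or (
            direction in ("below", "both") and too_low
        ):
            flagged.append(i)
    return flagged


energies = [1.0] * 20 + [100.0]  # one obvious high-energy outlier
print(outlier_indices(energies))  # [20]
```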

ipsuite.configuration_selection.index module

Select configurations by item, e.g. slice or list of indices.

class ipsuite.configuration_selection.index.IndexSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select configurations by explicit indices or slice parameters.

Parameters

data: list[ase.Atoms]

The atomic configurations to select from.

indices: list[int], optional

Explicit list of indices to select. Cannot be used with slice parameters.

start: int, optional

Start index for slice selection.

stop: int, optional

Stop index for slice selection.

step: int, optional

Step size for slice selection.

Attributes

selected_ids: list[int]

Indices of selected configurations.

frames: list[ase.Atoms]

The selected atomic configurations.

excluded_frames: list[ase.Atoms]

The atomic configurations that were not selected.

Examples

>>> with project:
...     data = ips.AddData(file="ethanol.xyz")
...     selector = ips.IndexSelection(data=data.frames, indices=[0, 5, 10, 15])
>>> project.repro()
>>> print(f"Selected {len(selector.selected_ids)} configurations with IDs: "
...       f"{selector.selected_ids}")
Selected 4 configurations with IDs: [0, 5, 10, 15]
data: list[ase.Atoms]
indices: list[int] | None = None
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Select Atoms by explicit indices or slice parameters.

start: int | None = None
step: int | None = None
stop: int | None = None
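
The two mutually exclusive modes, explicit indices or slice parameters, can be sketched as follows. The helper select_by_index and the choice of ValueError are assumptions for illustration; the source only states that indices cannot be combined with slice parameters.

```python
def select_by_index(n_frames, indices=None, start=None, stop=None, step=None):
    """Resolve explicit indices or slice parameters to a list of ids.

    Hypothetical helper; the error type is an assumption.
    """
    uses_slice = any(value is not None for value in (start, stop, step))
    if indices is not None and uses_slice:
        raise ValueError("Pass either indices or slice parameters, not both.")
    if indices is not None:
        return list(indices)
    # Apply standard Python slice semantics to the index range.
    return list(range(n_frames))[slice(start, stop, step)]


print(select_by_index(20, indices=[0, 5, 10, 15]))    # [0, 5, 10, 15]
print(select_by_index(10, start=2, stop=8, step=2))   # [2, 4, 6]
```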

ipsuite.configuration_selection.kernel module

ipsuite.configuration_selection.random module

Module for selecting Atoms randomly.

class ipsuite.configuration_selection.random.RandomSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select configurations randomly without replacement.

Parameters

data: list[ase.Atoms]

The atomic configurations to select from.

n_configurations: int

Number of configurations to select.

seed: int, default=1234

Random seed for reproducible selection.

Attributes

selected_ids: list[int]

Indices of selected configurations.

frames: list[ase.Atoms]

The selected atomic configurations.

excluded_frames: list[ase.Atoms]

The atomic configurations that were not selected.

Examples

>>> with project:
...     data = ips.AddData(file="ethanol.xyz")
...     selector = ips.RandomSelection(data=data.frames, n_configurations=10, seed=42)
>>> project.repro()
>>> print(f"Selected {len(selector.selected_ids)} configurations with IDs: "
...       f"{selector.selected_ids}")
Selected 10 configurations with IDs: [83, 53, 70, 45, 44, 39, 22, 80, 10, 0]
data: list[ase.Atoms]
n_configurations: int
seed: int = 1234
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Run the selection method.

Attributes

atoms_lst: List[ase.Atoms]

List of ase Atoms objects to select configurations from.

Returns

List[int]:

A list of the selected ids from 0 .. len(atoms_lst)
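
Seeded sampling without replacement can be sketched with the standard library. This sketch assumes Python's random module, so it will not reproduce the exact IDs shown in the example above, which depend on the random number generator ipsuite uses internally. The name random_ids is hypothetical.

```python
import random


def random_ids(n_frames, n_configurations, seed=1234):
    """Draw n_configurations distinct indices from range(n_frames).

    Hypothetical helper; uses random.Random rather than ipsuite's generator.
    """
    rng = random.Random(seed)  # local generator keeps the global state untouched
    return rng.sample(range(n_frames), n_configurations)


ids = random_ids(100, 10, seed=42)
print(len(ids), len(set(ids)))  # 10 10
```

Using a dedicated generator seeded per call means that repeated runs with the same seed return the same indices.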

ipsuite.configuration_selection.split module

Module for selecting a leading fraction of the Atoms.

class ipsuite.configuration_selection.split.SplitSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select the first n% of configurations from the dataset.

Parameters

data: list[ase.Atoms]

The atomic configurations to select from.

split: float

Fraction of the data to select (0.0 to 1.0).

Attributes

selected_ids: list[int]

Indices of selected configurations.

frames: list[ase.Atoms]

The selected atomic configurations.

excluded_frames: list[ase.Atoms]

The atomic configurations that were not selected.

Examples

>>> with project:
...     data = ips.AddData(file="ethanol.xyz")  # contains 100 frames
...     selector = ips.SplitSelection(data=data.frames, split=0.1)
>>> project.repro()
>>> print(f"Selected {len(selector.selected_ids)} configurations with IDs: "
...       f"{selector.selected_ids}")
Selected 10 configurations with IDs: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
data: list[ase.Atoms]
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Run the selection method.

Attributes

atoms_lst: List[ase.Atoms]

List of ase Atoms objects to select configurations from.

Returns

List[int]:

A list of the selected ids from 0 .. len(atoms_lst)

split: float
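
Selecting the leading fraction of a dataset can be sketched as below. Truncating the product with int() is an assumption; the library may round differently for fractions that do not divide the dataset evenly. The name split_ids is hypothetical.

```python
def split_ids(n_frames, split):
    """Return the indices of the first `split` fraction of n_frames.

    Hypothetical helper; truncation via int() is an assumption.
    """
    if not 0.0 <= split <= 1.0:
        raise ValueError("split must lie between 0.0 and 1.0")
    return list(range(int(n_frames * split)))


print(split_ids(100, 0.1))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```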

ipsuite.configuration_selection.threshold module

Selecting atoms based on a given threshold.

class ipsuite.configuration_selection.threshold.ThresholdSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select atoms based on a given threshold.

Select atoms above a given threshold or the n_configurations with the highest / lowest value. Typically useful for uncertainty-based selection.

Attributes

key: str

The key in ‘calc.results’ to select from

threshold: float, optional

All values above (or below if negative) this threshold will be selected. If n_configurations is given, ‘self.threshold’ will be prioritized, but a maximum of n_configurations will be selected.

reference: str, optional

For visualizing the selection a reference value can be given. For ‘energy_uncertainty’ this would typically be ‘energy’.

n_configurations: int, optional

Number of configurations to select.

min_distance: int, optional

Minimum distance between selected configurations.

dim_reduction: str, optional

Reduces the dimensionality of the chosen uncertainty along the specified axis by calculating either the maximum or mean value.

Choose from [“max”, “mean”]

reduction_axis: tuple(int), optional

Specifies the axis along which the reduction occurs.

data: list[ase.Atoms]
dim_reduction: str = None
key: str = 'energy_uncertainty'
min_distance: int = 1
n_configurations: int | None = None
reduction_axis: list[int] = (1, 2)
reference: str = 'energy'
select_atoms(atoms_lst: List[Atoms], save_fig: bool = True) → List[int][source]

Select atoms based on the configured threshold.

Parameters

atoms_lst: typing.List[ase.Atoms]

List of atoms objects to select from.

Returns

typing.List[int]:

list containing the taken indices

threshold: float | None = None
ipsuite.configuration_selection.threshold.check_dimension(values)[source]
ipsuite.configuration_selection.threshold.max_reduction(values, axis)[source]
ipsuite.configuration_selection.threshold.mean_reduction(values, axis)[source]
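
How threshold, n_configurations and min_distance might interact is sketched below for one scalar value per configuration. This is an assumption-laden illustration, not the library's algorithm: it treats min_distance as a spacing in frame index, ranks candidates by descending value, and omits negative thresholds and the dim_reduction step. The name threshold_ids is hypothetical.

```python
def threshold_ids(values, threshold=None, n_configurations=None, min_distance=1):
    """Pick indices with the largest values, subject to the given limits.

    Hypothetical helper. Assumes one scalar per configuration and a
    min_distance measured in frame index.
    """
    if threshold is None and n_configurations is None:
        raise ValueError("Provide threshold, n_configurations, or both.")
    # Rank indices from highest to lowest value.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    if threshold is not None:
        order = [i for i in order if values[i] > threshold]
    selected = []
    for i in order:
        if n_configurations is not None and len(selected) >= n_configurations:
            break
        # Skip candidates that sit too close to an already selected frame.
        if all(abs(i - j) >= min_distance for j in selected):
            selected.append(i)
    return sorted(selected)


uncertainty = [0.1, 0.9, 0.8, 0.2, 0.7]
print(threshold_ids(uncertainty, threshold=0.5))                  # [1, 2, 4]
print(threshold_ids(uncertainty, threshold=0.5, min_distance=2))  # [1, 4]
print(threshold_ids(uncertainty, n_configurations=2))             # [1, 2]
```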

ipsuite.configuration_selection.uniform_arange module

Selecting atoms with a given step between them.

class ipsuite.configuration_selection.uniform_arange.UniformArangeSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select configurations with uniform spacing using a step size.

Parameters

data: list[ase.Atoms]

The atomic configurations to select from.

step: int

Step size for selection. Every nth configuration will be selected.

Attributes

selected_ids: list[int]

Indices of selected configurations.

frames: list[ase.Atoms]

The selected atomic configurations.

excluded_frames: list[ase.Atoms]

The atomic configurations that were not selected.

Examples

>>> with project:
...     data = ips.AddData(file="ethanol.xyz")  # contains 100 frames
...     selector = ips.UniformArangeSelection(data=data.frames, step=10)
>>> project.repro()
>>> print(f"Selected {len(selector.selected_ids)} configurations with IDs: "
...       f"{selector.selected_ids}")
Selected 10 configurations with IDs: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
data: list[ase.Atoms]
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Take every nth (step) object of a given atoms list.

Parameters

atoms_lst: typing.List[ase.Atoms]

List of atoms objects to select from.

Returns

typing.List[int]:

list containing the taken indices

step: int
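
Taking every nth configuration corresponds to a plain range over the frame indices. The helper name arange_ids is hypothetical.

```python
def arange_ids(n_frames, step):
    """Return every step-th index, starting at 0 (hypothetical helper)."""
    return list(range(0, n_frames, step))


print(arange_ids(100, 10))  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```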

ipsuite.configuration_selection.uniform_energetic module

Module for selecting atoms uniformly in energy space.

class ipsuite.configuration_selection.uniform_energetic.UniformEnergeticSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select configurations uniformly in global energy space.

data: list[ase.Atoms]
n_configurations: int
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Select Atoms uniform in energy space.
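
One plausible reading of "uniform in energy space" is to sort the configurations by energy and take evenly spaced positions in that ordering. The sketch below implements that reading as an assumption; the library may instead space the picks by energy value. The name uniform_energetic_ids is hypothetical.

```python
def uniform_energetic_ids(energies, n_configurations):
    """Pick indices evenly spaced in the energy-sorted ordering.

    Hypothetical helper; rank-based spacing is an assumption.
    """
    if not 1 <= n_configurations <= len(energies):
        raise ValueError("n_configurations must be between 1 and len(energies)")
    # Indices of the configurations, ordered from lowest to highest energy.
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    if n_configurations == 1:
        return [order[0]]
    last = len(order) - 1
    positions = [round(k * last / (n_configurations - 1)) for k in range(n_configurations)]
    return sorted(order[p] for p in positions)


energies = [5.0, 1.0, 3.0, 2.0, 4.0]
print(uniform_energetic_ids(energies, 3))  # [0, 1, 2]
```

Here the lowest (index 1), median (index 2) and highest (index 0) energies are picked, and the result is returned in index order.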

ipsuite.configuration_selection.uniform_temporal module

Module for selecting atoms uniform in time.

class ipsuite.configuration_selection.uniform_temporal.UniformTemporalSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select configurations uniformly distributed across time.

Parameters

data: list[ase.Atoms]

The atomic configurations to select from.

n_configurations: int

Number of configurations to select uniformly across the trajectory.

Attributes

selected_ids: list[int]

Indices of selected configurations.

frames: list[ase.Atoms]

The selected atomic configurations.

excluded_frames: list[ase.Atoms]

The atomic configurations that were not selected.

Examples

>>> with project:
...     data = ips.AddData(file="ethanol.xyz")  # contains 100 frames
...     selector = ips.UniformTemporalSelection(data=data.frames, n_configurations=5)
>>> project.repro()
>>> print(f"Selected {len(selector.selected_ids)} configurations with IDs: "
...       f"{selector.selected_ids}")
Selected 5 configurations with IDs: [0, 25, 50, 74, 99]
data: list[ase.Atoms]
n_configurations: int
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Select Atoms uniform in time.
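
Evenly spaced indices that include the first and last frame can be sketched as below. Rounding each position with Python's round() is an assumption; with 100 frames and 5 configurations it gives the same IDs as the example above. The name uniform_temporal_ids is hypothetical.

```python
def uniform_temporal_ids(n_frames, n_configurations):
    """Return n_configurations indices spread evenly over range(n_frames).

    Hypothetical helper; rounding to the nearest index is an assumption.
    """
    if not 1 <= n_configurations <= n_frames:
        raise ValueError("n_configurations must be between 1 and n_frames")
    if n_configurations == 1:
        return [0]
    last = n_frames - 1
    return [round(k * last / (n_configurations - 1)) for k in range(n_configurations)]


print(uniform_temporal_ids(100, 5))  # [0, 25, 50, 74, 99]
```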

Module contents

Configuration Selection Nodes.

class ipsuite.configuration_selection.ConfigurationSelection(*args, **kwargs)[source]

Bases: IPSNode

Base Node for ConfigurationSelection.

Attributes

data: list[Atoms]|list[list[Atoms]]|utils.types.SupportsAtoms

the data to select from

exclude_configurations: dict[str, list]|utils.types.SupportsSelectedConfigurations

Atoms to exclude from the selection.

exclude: list[zntrack.Node]|zntrack.Node|None

Exclude the selected configurations from these nodes.

data: list[Atoms]
property excluded_frames: list[Atoms]

Get a list of the atoms objects that were not selected.

property frames: list[Atoms]

Get a list of the selected atoms objects.

get_data() → list[Atoms][source]

Get the atoms data to process.

img_selection: Path = PosixPath('$nwd$/selection.png')
run()[source]

ZnTrack Node Run method.

select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Run the selection method.

Attributes

atoms_lst: List[ase.Atoms]

List of ase Atoms objects to select configurations from.

Returns

List[int]:

A list of the selected ids from 0 .. len(atoms_lst)

selected_ids: list[int] = NOT_AVAILABLE
class ipsuite.configuration_selection.FilterOutlier(*args, **kwargs)[source]

Bases: IPSNode

Remove outliers from the data based on a given property.

Attributes

key: str, default="energy"

The property to filter on.

threshold: float, default=3

The threshold for filtering in units of standard deviations.

direction: {"above", "below", "both"}, default="both"

The direction to filter in.

data: list[Atoms]
direction: Literal['above', 'below', 'both'] = 'both'
property excluded_frames
filtered_indices: list = NOT_AVAILABLE
property frames: list[Atoms]
histogram: str = PosixPath('$nwd$/histogram.png')
key: str = 'energy'
run()[source]
threshold: float = 3
class ipsuite.configuration_selection.IndexSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select configurations by explicit indices or slice parameters.

Parameters

data: list[ase.Atoms]

The atomic configurations to select from.

indices: list[int], optional

Explicit list of indices to select. Cannot be used with slice parameters.

start: int, optional

Start index for slice selection.

stop: int, optional

Stop index for slice selection.

step: int, optional

Step size for slice selection.

Attributes

selected_ids: list[int]

Indices of selected configurations.

frames: list[ase.Atoms]

The selected atomic configurations.

excluded_frames: list[ase.Atoms]

The atomic configurations that were not selected.

Examples

>>> with project:
...     data = ips.AddData(file="ethanol.xyz")
...     selector = ips.IndexSelection(data=data.frames, indices=[0, 5, 10, 15])
>>> project.repro()
>>> print(f"Selected {len(selector.selected_ids)} configurations with IDs: "
...       f"{selector.selected_ids}")
Selected 4 configurations with IDs: [0, 5, 10, 15]
data: list[ase.Atoms]
indices: list[int] | None = None
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Select Atoms by explicit indices or slice parameters.

start: int | None = None
step: int | None = None
stop: int | None = None
class ipsuite.configuration_selection.RandomSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select configurations randomly without replacement.

Parameters

data: list[ase.Atoms]

The atomic configurations to select from.

n_configurations: int

Number of configurations to select.

seed: int, default=1234

Random seed for reproducible selection.

Attributes

selected_ids: list[int]

Indices of selected configurations.

frames: list[ase.Atoms]

The selected atomic configurations.

excluded_frames: list[ase.Atoms]

The atomic configurations that were not selected.

Examples

>>> with project:
...     data = ips.AddData(file="ethanol.xyz")
...     selector = ips.RandomSelection(data=data.frames, n_configurations=10, seed=42)
>>> project.repro()
>>> print(f"Selected {len(selector.selected_ids)} configurations with IDs: "
...       f"{selector.selected_ids}")
Selected 10 configurations with IDs: [83, 53, 70, 45, 44, 39, 22, 80, 10, 0]
data: list[ase.Atoms]
n_configurations: int
seed: int = 1234
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Run the selection method.

Attributes

atoms_lst: List[ase.Atoms]

List of ase Atoms objects to select configurations from.

Returns

List[int]:

A list of the selected ids from 0 .. len(atoms_lst)

class ipsuite.configuration_selection.SplitSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select the first n% of configurations from the dataset.

Parameters

data: list[ase.Atoms]

The atomic configurations to select from.

split: float

Fraction of the data to select (0.0 to 1.0).

Attributes

selected_ids: list[int]

Indices of selected configurations.

frames: list[ase.Atoms]

The selected atomic configurations.

excluded_frames: list[ase.Atoms]

The atomic configurations that were not selected.

Examples

>>> with project:
...     data = ips.AddData(file="ethanol.xyz")  # contains 100 frames
...     selector = ips.SplitSelection(data=data.frames, split=0.1)
>>> project.repro()
>>> print(f"Selected {len(selector.selected_ids)} configurations with IDs: "
...       f"{selector.selected_ids}")
Selected 10 configurations with IDs: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
data: list[ase.Atoms]
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Run the selection method.

Attributes

atoms_lst: List[ase.Atoms]

List of ase Atoms objects to select configurations from.

Returns

List[int]:

A list of the selected ids from 0 .. len(atoms_lst)

split: float
class ipsuite.configuration_selection.ThresholdSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select atoms based on a given threshold.

Select atoms above a given threshold or the n_configurations with the highest / lowest value. Typically useful for uncertainty-based selection.

Attributes

key: str

The key in ‘calc.results’ to select from

threshold: float, optional

All values above (or below if negative) this threshold will be selected. If n_configurations is given, ‘self.threshold’ will be prioritized, but a maximum of n_configurations will be selected.

reference: str, optional

For visualizing the selection a reference value can be given. For ‘energy_uncertainty’ this would typically be ‘energy’.

n_configurations: int, optional

Number of configurations to select.

min_distance: int, optional

Minimum distance between selected configurations.

dim_reduction: str, optional

Reduces the dimensionality of the chosen uncertainty along the specified axis by calculating either the maximum or mean value.

Choose from [“max”, “mean”]

reduction_axis: tuple(int), optional

Specifies the axis along which the reduction occurs.

data: list[ase.Atoms]
dim_reduction: str = None
key: str = 'energy_uncertainty'
min_distance: int = 1
n_configurations: int | None = None
reduction_axis: list[int] = (1, 2)
reference: str = 'energy'
select_atoms(atoms_lst: List[Atoms], save_fig: bool = True) → List[int][source]

Select atoms based on the configured threshold.

Parameters

atoms_lst: typing.List[ase.Atoms]

List of atoms objects to select from.

Returns

typing.List[int]:

list containing the taken indices

threshold: float | None = None
class ipsuite.configuration_selection.UniformArangeSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select configurations with uniform spacing using a step size.

Parameters

data: list[ase.Atoms]

The atomic configurations to select from.

step: int

Step size for selection. Every nth configuration will be selected.

Attributes

selected_ids: list[int]

Indices of selected configurations.

frames: list[ase.Atoms]

The selected atomic configurations.

excluded_frames: list[ase.Atoms]

The atomic configurations that were not selected.

Examples

>>> with project:
...     data = ips.AddData(file="ethanol.xyz")  # contains 100 frames
...     selector = ips.UniformArangeSelection(data=data.frames, step=10)
>>> project.repro()
>>> print(f"Selected {len(selector.selected_ids)} configurations with IDs: "
...       f"{selector.selected_ids}")
Selected 10 configurations with IDs: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
data: list[ase.Atoms]
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Take every nth (step) object of a given atoms list.

Parameters

atoms_lst: typing.List[ase.Atoms]

List of atoms objects to select from.

Returns

typing.List[int]:

list containing the taken indices

step: int
class ipsuite.configuration_selection.UniformEnergeticSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select configurations uniformly in global energy space.

data: list[ase.Atoms]
n_configurations: int
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Select Atoms uniform in energy space.

class ipsuite.configuration_selection.UniformTemporalSelection(*args, **kwargs)[source]

Bases: ConfigurationSelection

Select configurations uniformly distributed across time.

Parameters

data: list[ase.Atoms]

The atomic configurations to select from.

n_configurations: int

Number of configurations to select uniformly across the trajectory.

Attributes

selected_ids: list[int]

Indices of selected configurations.

frames: list[ase.Atoms]

The selected atomic configurations.

excluded_frames: list[ase.Atoms]

The atomic configurations that were not selected.

Examples

>>> with project:
...     data = ips.AddData(file="ethanol.xyz")  # contains 100 frames
...     selector = ips.UniformTemporalSelection(data=data.frames, n_configurations=5)
>>> project.repro()
>>> print(f"Selected {len(selector.selected_ids)} configurations with IDs: "
...       f"{selector.selected_ids}")
Selected 5 configurations with IDs: [0, 25, 50, 74, 99]
data: list[ase.Atoms]
n_configurations: int
select_atoms(atoms_lst: List[Atoms]) → List[int][source]

Select Atoms uniform in time.