tape.analysis#

Classes#

AnalysisFunction

Base class for analysis functions.

FeatureExtractor

Apply light-curve package feature extractor to a light curve

LightCurve

This base class is meant to support various analysis routines and be extended as needed.

StetsonJ

Compute the StetsonJ statistic on data from one or several bands

StructureFunction2

Calculate structure function squared

Package Contents#

class AnalysisFunction[source]#

Bases: abc.ABC, Callable

Base class for analysis functions.

Analysis functions take a few arrays representing an object and return a single pandas.Series representing the result.

cols(ens) List[str][source]#

Return the columns that the analysis function takes as input.

meta(ens) pd.DataFrame[source]#

Return the metadata pandas.DataFrame that Dask requires to pre-build a computation graph. It is essentially the schema of the output of the __call__() method.

on(ens) List[str][source]#

Return the columns to group source table by. Typically, [ens._id_col].

__call__(*cols, **kwargs)[source]#

Calculate the analysis function.

abstract cols(ens: Ensemble) List[str][source]#

Return the column names that the analysis function takes as input.

Parameters:

ens (Ensemble) – The ensemble object. It may be needed to look up the names of “special” columns such as ens._time_col or ens._err_col.

Returns:

The column names to select and pass to the __call__() method. For example, [ens._time_col, ens._flux_col].

Return type:

List[str]

abstract meta(ens: Ensemble)[source]#

Return the schema of the analysis function output.

Parameters:

ens (Ensemble) – The ensemble object.

Returns:

Dask meta, for example pd.DataFrame(columns=['x', 'y'], dtype=float).

Return type:

pd.DataFrame or (str, dtype) tuple or {str: dtype} dictionary

abstract on(ens: Ensemble) List[str][source]#

Return the columns to group source table by.

Parameters:

ens (Ensemble) – The ensemble object.

Returns:

The column names to group by. Typically, [ens._id_col].

Return type:

List[str]

abstract __call__(*cols, **kwargs)[source]#

Calculate the analysis function.

Parameters:
  • *cols (array_like) – The columns to calculate the analysis function on. It must be consistent with .cols(ens) output.

  • **kwargs – Additional keyword arguments.

Returns:

The result. It must be consistent with the output of .meta().

Return type:

pd.Series or pd.DataFrame or array or value
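As a rough illustration of the contract described above, the sketch below defines a stand-in abstract base with the same four methods and one concrete function built on it. AnalysisFunctionSketch, FakeEnsemble, and MeanFlux are hypothetical names made up for this example. Real code would subclass tape.analysis.AnalysisFunction and receive a real Ensemble. The column names and the (str, dtype) tuple returned by meta() are assumptions chosen to keep the example free of dependencies.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class FakeEnsemble:
    """Hypothetical stand-in exposing only the column-name attributes used below."""

    _id_col = "object_id"
    _flux_col = "flux"


class AnalysisFunctionSketch(ABC):
    """Stand-in mirroring the documented interface (not the tape class itself)."""

    @abstractmethod
    def cols(self, ens) -> List[str]: ...

    @abstractmethod
    def meta(self, ens): ...

    @abstractmethod
    def on(self, ens) -> List[str]: ...

    @abstractmethod
    def __call__(self, *cols, **kwargs): ...


class MeanFlux(AnalysisFunctionSketch):
    """Hypothetical analysis function: mean flux per object."""

    def cols(self, ens) -> List[str]:
        # Only the flux column is selected and passed to __call__.
        return [ens._flux_col]

    def meta(self, ens) -> Tuple[str, type]:
        # Output schema in the (str, dtype) tuple form: one named float value.
        return ("mean_flux", float)

    def on(self, ens) -> List[str]:
        # Group the source table by the object identifier column.
        return [ens._id_col]

    def __call__(self, flux, **kwargs) -> float:
        # Positional arguments arrive in the order given by cols().
        values = list(flux)
        return sum(values) / len(values)


ens = FakeEnsemble()
func = MeanFlux()
mean_value = func([1.0, 2.0, 3.0])
```

The point of the sketch is the consistency requirement: the positional arguments of __call__() line up with cols(), and the returned value matches what meta() declares.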

class FeatureExtractor(feature: light_curve.light_curve_ext._FeatureEvaluator)[source]#

Bases: tape.analysis.base.AnalysisFunction

Apply light-curve package feature extractor to a light curve

Parameters:

feature (light_curve.light_curve_ext._FeatureEvaluator) – Feature extractor to apply, see “light-curve” package for more details.

feature#

Feature extractor to apply, see “light-curve” package for more details.

Type:

light_curve.light_curve_ext._FeatureEvaluator

cols(ens: Ensemble) List[str][source]#

Return the column names that the analysis function takes as input.

Parameters:

ens (Ensemble) – The ensemble object. It may be needed to look up the names of “special” columns such as ens._time_col or ens._err_col.

Returns:

The column names to select and pass to the __call__() method. For example, [ens._time_col, ens._flux_col].

Return type:

List[str]

meta(ens: Ensemble) pandas.DataFrame[source]#

Return the schema of the analysis function output.

It returns a pandas.DataFrame with the same columns as self.feature.names. The dtype is np.float64, unless all input columns are single-precision floats, in which case the output dtype is np.float32.

on(ens: Ensemble) List[str][source]#

Return the columns to group source table by.

Parameters:

ens (Ensemble) – The ensemble object.

Returns:

The column names to group by. Typically, [ens._id_col].

Return type:

List[str]

__call__(time, flux, err, band, *, band_to_calc: str, **kwargs) pandas.DataFrame[source]#

Apply a feature extractor to a light curve, concatenating the results over all bands.

Parameters:
  • time (numpy.ndarray) – Time values

  • flux (numpy.ndarray) – Brightness values, flux or magnitudes

  • err (numpy.ndarray) – Errors for “flux”

  • band (numpy.ndarray) – Passband names.

  • band_to_calc (str or int or None) – Name of the passband to calculate features for, usually a string like “g” or “r”, or an integer. If None, features are calculated for all sources and band is ignored.

  • **kwargs (dict) – Additional keyword arguments to pass to the feature extractor.

Returns:

features – Feature values for each band. The dtype is the common type of the input arrays.

Return type:

pandas.DataFrame
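The band_to_calc rule described above can be sketched in plain Python. select_band and half_range are hypothetical helpers written for this example, and half_range is only a placeholder for a real feature from the light-curve package. Neither is part of tape or light-curve.

```python
def select_band(time, flux, err, band, band_to_calc=None):
    """Keep observations taken in `band_to_calc`; keep everything if it is None."""
    if band_to_calc is None:
        # The band column is ignored entirely.
        return list(time), list(flux), list(err)
    rows = [i for i, b in enumerate(band) if b == band_to_calc]
    return (
        [time[i] for i in rows],
        [flux[i] for i in rows],
        [err[i] for i in rows],
    )


def half_range(flux):
    """Placeholder feature: half of the peak-to-peak flux range."""
    return (max(flux) - min(flux)) / 2.0


time = [0.0, 1.0, 2.0, 3.0]
flux = [10.0, 20.0, 14.0, 26.0]
err = [0.1, 0.1, 0.1, 0.1]
band = ["g", "r", "g", "r"]

# Feature from the "g" observations only, then from all observations.
_, g_flux, _ = select_band(time, flux, err, band, band_to_calc="g")
_, all_flux, _ = select_band(time, flux, err, band, band_to_calc=None)
g_feature = half_range(g_flux)
all_feature = half_range(all_flux)
```

With these inputs the "g" selection keeps two of the four observations, so the same placeholder feature gives a different value than it does on the full light curve.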

class LightCurve(times: numpy.ndarray, fluxes: numpy.ndarray, errors: numpy.ndarray, minimum_observations: int = 0)[source]#

This base class is meant to support various analysis routines and be extended as needed (hence its location in the analysis package).

The base class ensures that the data for a single lightcurve is well formed: the input arrays all have the same length, NaNs are removed, and there are enough observations to perform a given analysis.

_times#
_fluxes#
_errors#
_minimum_observations#
_process_input_data()[source]#

Cleaning and validation occur here, ideally by calling sub-methods for specific checks and cleaning tasks.

_filter_nans()[source]#

Mask out any NaN values from time, flux and error arrays

_check_input_data_size_is_equal()[source]#

Make sure that the three input np.arrays have the same size

_check_input_data_length_is_sufficient()[source]#

Make sure that we have enough data after cleaning and filtering to be able to perform Structure Function calculations.
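The cleaning and validation steps listed above could look roughly like the following. LightCurveSketch is a hypothetical stand-in that reuses the documented method names. The order of the checks, the use of ValueError, and the plain Python lists are assumptions made for this example, not a description of tape's implementation.

```python
import math


class LightCurveSketch:
    """Hypothetical stand-in for the checks described above (not tape's LightCurve)."""

    def __init__(self, times, fluxes, errors, minimum_observations=0):
        self._times = list(times)
        self._fluxes = list(fluxes)
        self._errors = list(errors)
        self._minimum_observations = minimum_observations
        self._process_input_data()

    def _process_input_data(self):
        # Assumed order: check sizes first, because the NaN filter walks
        # the three lists in lockstep.
        self._check_input_data_size_is_equal()
        self._filter_nans()
        self._check_input_data_length_is_sufficient()

    def _check_input_data_size_is_equal(self):
        if not (len(self._times) == len(self._fluxes) == len(self._errors)):
            raise ValueError("times, fluxes and errors must have the same length")

    def _filter_nans(self):
        # Drop an observation if any of its three values is NaN.
        kept = [
            (t, f, e)
            for t, f, e in zip(self._times, self._fluxes, self._errors)
            if not (math.isnan(t) or math.isnan(f) or math.isnan(e))
        ]
        self._times = [t for t, _, _ in kept]
        self._fluxes = [f for _, f, _ in kept]
        self._errors = [e for _, _, e in kept]

    def _check_input_data_length_is_sufficient(self):
        if len(self._times) < self._minimum_observations:
            raise ValueError("too few observations after cleaning")


# The middle observation has a NaN flux and is dropped.
lc = LightCurveSketch([0.0, 1.0, 2.0], [5.0, float("nan"), 7.0], [0.1, 0.1, 0.1])
```

A subclass for a specific analysis would typically pass its own minimum_observations so that the last check reflects what that analysis needs.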

calc_stetson_J[source]#
class StetsonJ[source]#

Bases: tape.analysis.base.AnalysisFunction

Compute the StetsonJ statistic on data from one or several bands

cols(ens: Ensemble) List[str][source]#

Return the column names that the analysis function takes as input.

Parameters:

ens (Ensemble) – The ensemble object. It may be needed to look up the names of “special” columns such as ens._time_col or ens._err_col.

Returns:

The column names to select and pass to the __call__() method. For example, [ens._time_col, ens._flux_col].

Return type:

List[str]

meta(ens: Ensemble)[source]#

Return the schema of the analysis function output.

Parameters:

ens (Ensemble) – The ensemble object.

Returns:

Dask meta, for example pd.DataFrame(columns=['x', 'y'], dtype=float).

Return type:

pd.DataFrame or (str, dtype) tuple or {str: dtype} dictionary

on(ens: Ensemble) List[str][source]#

Return the columns to group source table by.

Parameters:

ens (Ensemble) – The ensemble object.

Returns:

The column names to group by. Typically, [ens._id_col].

Return type:

List[str]

__call__(flux: numpy.ndarray, err: numpy.ndarray, band: numpy.ndarray, *, band_to_calc: str | Iterable[str] | None = None, check_nans: bool = False)[source]#

Compute the StetsonJ statistic on data from one or several bands

Parameters:
  • flux (numpy.ndarray (N,)) – Array of flux/magnitude measurements

  • err (numpy.ndarray (N,)) – Array of associated flux/magnitude errors

  • band (numpy.ndarray (N,)) – Array of associated band labels

  • band_to_calc (str or list of str) – Bands to calculate StetsonJ on. Single band descriptor, or list of such descriptors.

  • check_nans (bool) – Boolean to run a check for NaN values and filter them out.

Returns:

stetsonJ – StetsonJ statistic for each of the input bands.

Return type:

dict

Note

If no value is passed for band_to_calc, the function is executed on all available bands in band.
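For orientation, here is a minimal sketch of a per-band Stetson J computation. It uses the single-observation form of the J index from Stetson (1996) with unit weights and a plain inverse-variance weighted mean as the reference level. Both function names are hypothetical, and tape's own implementation may differ in how it computes the mean and handles edge cases.

```python
import math
from collections import defaultdict


def stetson_j_single(flux, err):
    """Single-observation Stetson J with unit weights (simplified sketch)."""
    n = len(flux)
    if n < 2:
        return float("nan")
    # Assumption: inverse-variance weighted mean as the reference level.
    weights = [1.0 / (e * e) for e in err]
    mean = sum(w * f for w, f in zip(weights, flux)) / sum(weights)
    scale = math.sqrt(n / (n - 1))
    total = 0.0
    for f, e in zip(flux, err):
        delta = scale * (f - mean) / e  # normalized residual
        p = delta * delta - 1.0  # single-observation product term
        total += math.copysign(math.sqrt(abs(p)), p)  # sgn(p) * sqrt(|p|)
    return total / n


def stetson_j_by_band(flux, err, band, band_to_calc=None):
    """Return {band label: J}; use every band present when band_to_calc is None."""
    groups = defaultdict(lambda: ([], []))
    for f, e, b in zip(flux, err, band):
        groups[b][0].append(f)
        groups[b][1].append(e)
    if band_to_calc is None:
        bands = list(groups)
    elif isinstance(band_to_calc, str):
        bands = [band_to_calc]
    else:
        bands = list(band_to_calc)
    return {b: stetson_j_single(*groups[b]) for b in bands}


flux = [1.0, 1.0, 1.0, 2.0, 4.0]
err = [0.1, 0.1, 0.1, 1.0, 1.0]
band = ["g", "g", "g", "r", "r"]
result = stetson_j_by_band(flux, err, band)
```

In this toy input the "g" band is perfectly constant, so every residual is zero and J comes out at -1, while the two scattered "r" points give a value close to +1.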

class StructureFunction2[source]#

Bases: tape.analysis.base.AnalysisFunction

Calculate structure function squared

cols(ens: Ensemble) List[str][source]#

Return the column names that the analysis function takes as input.

Parameters:

ens (Ensemble) – The ensemble object. It may be needed to look up the names of “special” columns such as ens._time_col or ens._err_col.

Returns:

The column names to select and pass to the __call__() method. For example, [ens._time_col, ens._flux_col].

Return type:

List[str]

meta(ens: Ensemble) Dict[str, type][source]#

Return the schema of the analysis function output.

Parameters:

ens (Ensemble) – The ensemble object.

Returns:

Dask meta, for example pd.DataFrame(columns=['x', 'y'], dtype=float).

Return type:

pd.DataFrame or (str, dtype) tuple or {str: dtype} dictionary

on(ens: Ensemble) List[str][source]#

Return the columns to group source table by.

Parameters:

ens (Ensemble) – The ensemble object.

Returns:

The column names to group by. Typically, [ens._id_col].

Return type:

List[str]

__call__(time, flux, err=None, band=None, lc_id=None, *, sf_method='basic', argument_container=None) pandas.DataFrame[source]#

Calculate structure function squared using one of a variety of structure function calculation methods defined by the input argument sf_method, or in the argument container object.

Parameters:
  • time (numpy.ndarray (N,) or None) – Array of times when measurements were taken. If all array values are None or if a scalar None is provided, then equidistant time between measurements is assumed.

  • flux (numpy.ndarray (N,)) – Array of flux/magnitude measurements.

  • err (numpy.ndarray (N,), float, or None, optional) – Array of associated flux/magnitude errors. If a scalar value is provided, that error is assumed for all measurements. If None is provided, all errors are assumed to be 0. By default None.

  • band (numpy.ndarray (N,), optional) – Array of associated band labels, by default None.

  • lc_id (numpy.ndarray (N,), optional) – Array of lightcurve ids per data point, by default None.

  • sf_method (str, optional) – The structure function calculation method to be used, by default “basic”.

  • argument_container (StructureFunctionArgumentContainer, optional) – Container object for additional configuration options, by default None.

Returns:

sf2 – Structure function squared for each of the input bands.

Return type:

pandas.DataFrame

Notes

If no value is passed for band_to_calc, the function is executed on all available bands in band.
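As background, the quantity being estimated can be sketched from the textbook definition, SF²(τ) = ⟨(f(t + τ) − f(t))²⟩, averaged over all observation pairs whose time lag falls in a given bin. The function below is a hypothetical, dependency-free illustration of that definition only. It ignores err, band, and lc_id, takes explicit bin edges as an assumed input, and is not necessarily what sf_method='basic' computes.

```python
from itertools import combinations


def sf2_basic_sketch(time, flux, bin_edges):
    """Mean squared flux difference of all observation pairs, per time-lag bin."""
    if time is None:
        # Equidistant sampling is assumed; unit spacing is an arbitrary choice here.
        time = list(range(len(flux)))
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(flux)), 2):
        lag = abs(time[j] - time[i])
        for k in range(n_bins):
            # Half-open bins: lower edge included, upper edge excluded.
            if bin_edges[k] <= lag < bin_edges[k + 1]:
                sums[k] += (flux[j] - flux[i]) ** 2
                counts[k] += 1
                break
    # Bins that received no pairs are reported as NaN.
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


time = [0.0, 1.0, 2.0]
flux = [0.0, 1.0, 3.0]
sf2 = sf2_basic_sketch(time, flux, bin_edges=[0.5, 1.5, 2.5])
```

Here the two pairs with a lag of 1 contribute squared differences of 1 and 4, and the single pair with a lag of 2 contributes 9, so the two bins hold 2.5 and 9.0.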

calc_sf2[source]#