tape.analysis

Subpackages

tape.analysis.structure_function

Submodules

Attributes

`calc_stetson_J`
`calc_sf2`

Classes

`AnalysisFunction`	Base class for analysis functions.
`FeatureExtractor`	Apply a light-curve package feature extractor to a light curve.
`LightCurve`	This base class is meant to support various analysis routines and be extended as needed.
`StetsonJ`	Compute the StetsonJ statistic on data from one or several bands.
`StructureFunction2`	Calculate the structure function squared.

Package Contents

class AnalysisFunction

Bases: abc.ABC, Callable

Base class for analysis functions.

Analysis functions take a few arrays representing an object (for example, the columns of a single light curve) and return the result, typically a single pandas.Series.

cols(ens) → List[str]: Return the columns that the analysis function takes as input.

meta(ens) → pd.DataFrame: Return the metadata pandas.DataFrame required by Dask to pre-build a computation graph. It is essentially the schema of the __call__() output.

on(ens) → List[str]: Return the columns to group the source table by. Typically, [ens._id_col].

__call__(*cols, **kwargs): Calculate the analysis function.

abstract cols(ens: Ensemble) → List[str]

Return the column names that the analysis function takes as input.

Parameters:: ens (Ensemble) – The ensemble object; it may be needed to get the names of “special” columns such as ens._time_col or ens._err_col.
Returns:: The column names to select and pass to the __call__() method. For example [ens._time_col, ens._flux_col].
Return type:: List[str]

abstract meta(ens: Ensemble)

Return the schema of the analysis function output.

Parameters:: ens (Ensemble) – The ensemble object.
Returns:: Dask meta, for example pd.DataFrame(columns=['x', 'y'], dtype=float).
Return type:: pd.DataFrame or (str, dtype) tuple or {str: dtype} dictionary

abstract on(ens: Ensemble) → List[str]

Return the columns to group source table by.

Parameters:: ens (Ensemble) – The ensemble object.
Returns:: The column names to group by. Typically, [ens._id_col].
Return type:: List[str]

abstract __call__(*cols, **kwargs)

Calculate the analysis function.

Parameters:

*cols (array_like) – The columns to calculate the analysis function on. They must be consistent with the output of .cols(ens).
**kwargs – Additional keyword arguments.

Returns:

The result; it must be consistent with the output of .meta(ens).

Return type:

pd.Series or pd.DataFrame or array or value
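A minimal sketch of the contract described above, assuming nothing beyond NumPy. Because it has to run without TAPE installed, it declares a hypothetical stand-in base class, AnalysisFunctionSketch, that mirrors the four documented methods; in real code you would subclass tape.analysis.base.AnalysisFunction instead. WeightedMeanFlux, apply_grouped, and the SimpleNamespace used as ens are illustrative assumptions, not part of TAPE, and apply_grouped is only a rough picture of how on(), cols(), and __call__() fit together.

```python
from abc import ABC, abstractmethod
from collections.abc import Callable
from types import SimpleNamespace
from typing import Dict, List

import numpy as np


class AnalysisFunctionSketch(ABC, Callable):
    """Hypothetical stand-in mirroring the documented AnalysisFunction interface."""

    @abstractmethod
    def cols(self, ens) -> List[str]:
        """Columns passed positionally to __call__."""

    @abstractmethod
    def meta(self, ens):
        """Dask meta describing the output of __call__."""

    @abstractmethod
    def on(self, ens) -> List[str]:
        """Columns to group the source table by."""

    @abstractmethod
    def __call__(self, *cols, **kwargs):
        """Compute the result for one group."""


class WeightedMeanFlux(AnalysisFunctionSketch):
    """Hypothetical analysis function: inverse-variance weighted mean flux."""

    def cols(self, ens) -> List[str]:
        # The order here fixes the order of positional arguments to __call__.
        return [ens._flux_col, ens._err_col]

    def meta(self, ens) -> Dict[str, type]:
        # A {name: dtype} dictionary is one of the documented meta forms.
        return {"weighted_mean_flux": float}

    def on(self, ens) -> List[str]:
        return [ens._id_col]

    def __call__(self, flux, err, **kwargs) -> float:
        weights = 1.0 / np.asarray(err, dtype=float) ** 2
        return float(np.sum(weights * flux) / np.sum(weights))


def apply_grouped(func, ens, table: Dict[str, np.ndarray]) -> Dict[object, float]:
    """Hypothetical driver: split rows by func.on(ens) and call func per group."""
    (id_col,) = func.on(ens)  # this sketch handles a single group-by column
    ids = table[id_col]
    results = {}
    for obj_id in np.unique(ids):
        mask = ids == obj_id
        results[obj_id.item()] = func(*(table[col][mask] for col in func.cols(ens)))
    return results


ens = SimpleNamespace(_id_col="id", _flux_col="flux", _err_col="err")
table = {
    "id": np.array([1, 1, 2, 2]),
    "flux": np.array([10.0, 12.0, 5.0, 7.0]),
    "err": np.array([1.0, 1.0, 1.0, 1.0]),
}
print(apply_grouped(WeightedMeanFlux(), ens, table))  # {1: 11.0, 2: 6.0}
```

Keeping the column selection (cols), the grouping key (on), and the output schema (meta) separate from the per-group computation is what lets the computation itself stay a plain function of arrays.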

class FeatureExtractor(feature: light_curve.light_curve_ext._FeatureEvaluator)

Bases: tape.analysis.base.AnalysisFunction

Apply a light-curve package feature extractor to a light curve.

Parameters:: feature (light_curve.light_curve_ext._FeatureEvaluator) – Feature extractor to apply; see the “light-curve” package for more details.

feature

Feature extractor to apply; see the “light-curve” package for more details.

Type:: light_curve.light_curve_ext._FeatureEvaluator

cols(ens: Ensemble) → List[str]

Return the column names that the analysis function takes as input.

Parameters:: ens (Ensemble) – The ensemble object; it may be needed to get the names of “special” columns such as ens._time_col or ens._err_col.
Returns:: The column names to select and pass to the __call__() method. For example [ens._time_col, ens._flux_col].
Return type:: List[str]

meta(ens: Ensemble) → pandas.DataFrame

Return the schema of the analysis function output.

It returns a pandas.DataFrame with the same columns as self.feature.names. The column dtype is np.float64, except when all input columns are single-precision floats, in which case it is np.float32.

on(ens: Ensemble) → List[str]
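The dtype rule above can be sketched as a small helper. feature_meta_dtypes is hypothetical: it returns a {name: dtype} dictionary rather than the empty pandas.DataFrame the real method builds, and which input columns the real check inspects is an assumption here (only the numeric arrays passed in). The feature names are made up.

```python
import numpy as np


def feature_meta_dtypes(names, *input_arrays):
    """Hypothetical sketch of the dtype rule described for FeatureExtractor.meta."""
    # float32 only if every input array is single precision, else float64.
    all_single = all(np.asarray(a).dtype == np.float32 for a in input_arrays)
    dtype = np.dtype(np.float32) if all_single else np.dtype(np.float64)
    # One output column per feature name, all sharing the same dtype.
    return {name: dtype for name in names}


t = np.array([0.0, 1.0, 2.0], dtype=np.float32)
f = np.array([1.0, 2.0, 3.0], dtype=np.float32)
print(feature_meta_dtypes(["mean", "std"], t, f))  # every column is float32
print(feature_meta_dtypes(["mean", "std"], t, f.astype(np.float64)))  # float64
```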

Return the columns to group source table by.

Parameters:: ens (Ensemble) – The ensemble object.
Returns:: The column names to group by. Typically, [ens._id_col].
Return type:: List[str]

__call__(time, flux, err, band, *, band_to_calc: str, **kwargs) → pandas.DataFrame

Apply a feature extractor to a light curve, concatenating the results over all bands.

Parameters:

time (numpy.ndarray) – Time values.
flux (numpy.ndarray) – Brightness values, flux or magnitudes.
err (numpy.ndarray) – Errors for “flux”.
band (numpy.ndarray) – Passband names.
band_to_calc (str or int or None) – Name of the passband to calculate features for, usually a string like “g” or “r”, or an integer. If None, features are calculated for all sources and band is ignored.
**kwargs (dict) – Additional keyword arguments to pass to the feature extractor.

Returns:

features – Feature values for each band; the dtype is the common type of the input arrays.

Return type:

pandas.DataFrame
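A sketch of how band_to_calc might select data before the extractor runs. MeanAndStd is a hypothetical stand-in for a light-curve feature evaluator, assumed to be callable as feature(t, m, sigma) and to expose a names list; extract_features is illustrative only and returns a plain dict instead of a pandas.DataFrame.

```python
import numpy as np


class MeanAndStd:
    """Hypothetical stand-in for a light-curve feature evaluator."""

    names = ["mean", "std"]

    def __call__(self, t, m, sigma=None):
        return np.array([np.mean(m), np.std(m)])


def extract_features(feature, time, flux, err, band, *, band_to_calc=None):
    """Sketch of band selection; returns {feature name: value}."""
    band = np.asarray(band)
    if band_to_calc is None:
        mask = np.ones(band.shape, dtype=bool)  # band is ignored
    else:
        mask = band == band_to_calc
    values = feature(time[mask], flux[mask], err[mask])
    return dict(zip(feature.names, values))


time = np.array([0.0, 1.0, 2.0, 3.0])
flux = np.array([1.0, 3.0, 10.0, 20.0])
err = np.full(4, 0.1)
band = np.array(["g", "g", "r", "r"])

print(extract_features(MeanAndStd(), time, flux, err, band, band_to_calc="g"))
print(extract_features(MeanAndStd(), time, flux, err, band))  # all sources
```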

class LightCurve(times: numpy.ndarray, fluxes: numpy.ndarray, errors: numpy.ndarray, minimum_observations: int = 0)

This base class is meant to support various analysis routines and be extended as needed, hence its location in the analysis package.

The base class ensures that the data for a single lightcurve is well formed: all input arrays have the same length, NaNs are removed, and enough observations remain to perform a given analysis.

_times

_fluxes

_errors

_minimum_observations

_process_input_data(): Cleaning and validation occur here, ideally by calling sub-methods for specific checks and cleaning tasks.

_filter_nans(): Mask out any NaN values from the time, flux and error arrays.

_check_input_data_size_is_equal(): Make sure that the three input NumPy arrays have the same size.

_check_input_data_length_is_sufficient(): Make sure that there is enough data after cleaning and filtering to perform Structure Function calculations.
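The checks listed above can be sketched as a standalone class. LightCurveSketch is hypothetical: it reuses the documented method names, but the order of the checks, the use of ValueError, and treating fewer than minimum_observations remaining points as an error are assumptions rather than TAPE's confirmed behavior.

```python
import numpy as np


class LightCurveSketch:
    """Hypothetical re-creation of the validation steps documented for LightCurve."""

    def __init__(self, times, fluxes, errors, minimum_observations=0):
        self._times = np.asarray(times, dtype=float)
        self._fluxes = np.asarray(fluxes, dtype=float)
        self._errors = np.asarray(errors, dtype=float)
        self._minimum_observations = minimum_observations
        self._process_input_data()

    def _process_input_data(self):
        # Check sizes first: the NaN mask below needs equal-length arrays.
        self._check_input_data_size_is_equal()
        self._filter_nans()
        self._check_input_data_length_is_sufficient()

    def _filter_nans(self):
        # Drop an observation if its time, flux or error is NaN.
        keep = ~(np.isnan(self._times) | np.isnan(self._fluxes) | np.isnan(self._errors))
        self._times = self._times[keep]
        self._fluxes = self._fluxes[keep]
        self._errors = self._errors[keep]

    def _check_input_data_size_is_equal(self):
        if not (self._times.size == self._fluxes.size == self._errors.size):
            raise ValueError("times, fluxes and errors must have the same size")

    def _check_input_data_length_is_sufficient(self):
        if self._times.size < self._minimum_observations:
            raise ValueError(
                f"{self._times.size} observations left after cleaning, "
                f"at least {self._minimum_observations} required"
            )


lc = LightCurveSketch([0, 1, 2, 3], [1.0, np.nan, 3.0, 4.0], [0.1] * 4, minimum_observations=3)
print(lc._times)  # the NaN-flux observation at t=1 is gone
```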

calc_stetson_J

class StetsonJ

Bases: tape.analysis.base.AnalysisFunction

Compute the StetsonJ statistic on data from one or several bands.

cols(ens: Ensemble) → List[str]

Return the column names that the analysis function takes as input.

Parameters:: ens (Ensemble) – The ensemble object; it may be needed to get the names of “special” columns such as ens._time_col or ens._err_col.
Returns:: The column names to select and pass to the __call__() method. For example [ens._time_col, ens._flux_col].
Return type:: List[str]

meta(ens: Ensemble)

Return the schema of the analysis function output.

Parameters:: ens (Ensemble) – The ensemble object.
Returns:: Dask meta, for example pd.DataFrame(columns=['x', 'y'], dtype=float).
Return type:: pd.DataFrame or (str, dtype) tuple or {str: dtype} dictionary

on(ens: Ensemble) → List[str]

Return the columns to group source table by.

Parameters:: ens (Ensemble) – The ensemble object.
Returns:: The column names to group by. Typically, [ens._id_col].
Return type:: List[str]

__call__(flux: numpy.ndarray, err: numpy.ndarray, band: numpy.ndarray, *, band_to_calc: str | Iterable[str] | None = None, check_nans: bool = False)

Compute the StetsonJ statistic on data from one or several bands.

Parameters:

flux (numpy.ndarray (N,)) – Array of flux/magnitude measurements.
err (numpy.ndarray (N,)) – Array of associated flux/magnitude errors.
band (numpy.ndarray (N,)) – Array of associated band labels.
band_to_calc (str or list of str, optional) – Bands to calculate StetsonJ on: a single band descriptor, or a list of such descriptors.
check_nans (bool, optional) – If True, check for NaN values and filter them out. By default False.

Returns:

stetsonJ – StetsonJ statistic for each of the input bands.

Return type:

dict

Note

If no value for band_to_calc is passed, the function is executed on all bands available in band.
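For reference, a sketch of the widely used single-observation form of the Stetson J statistic, computed per band with unit weights and an inverse-variance weighted mean: δ_i = sqrt(n/(n-1)) (f_i - f̄)/σ_i, P_i = δ_i² - 1, and J = (1/n) Σ sgn(P_i) sqrt(|P_i|). This form is an assumption, not a copy of TAPE's implementation; stetson_j_sketch is hypothetical, ignores check_nans, and returns NaN for bands with fewer than two points.

```python
import numpy as np


def stetson_j_sketch(flux, err, band, band_to_calc=None):
    """Hypothetical per-band Stetson J (single-observation form, unit weights)."""
    flux = np.asarray(flux, dtype=float)
    err = np.asarray(err, dtype=float)
    band = np.asarray(band)
    if band_to_calc is None:
        bands = list(np.unique(band))  # no band_to_calc: use every band
    elif isinstance(band_to_calc, str):
        bands = [band_to_calc]
    else:
        bands = list(band_to_calc)

    result = {}
    for b in bands:
        key = b.item() if isinstance(b, np.generic) else b  # plain Python key
        f, e = flux[band == b], err[band == b]
        n = f.size
        if n < 2:
            result[key] = np.nan  # sqrt(n / (n - 1)) is undefined for n < 2
            continue
        w = 1.0 / e**2
        mean = np.sum(w * f) / np.sum(w)  # inverse-variance weighted mean
        delta = np.sqrt(n / (n - 1)) * (f - mean) / e
        p = delta**2 - 1.0
        result[key] = float(np.mean(np.sign(p) * np.sqrt(np.abs(p))))
    return result


print(stetson_j_sketch([1.0, 3.0, 5.0, 5.0, 5.0], [1.0] * 5, ["g", "g", "r", "r", "r"]))
```

A constant light curve gives J = -1 (every P_i is -1), while scatter larger than the quoted errors pushes J upward.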

class StructureFunction2

Bases: tape.analysis.base.AnalysisFunction

Calculate the structure function squared.

cols(ens: Ensemble) → List[str]

Return the column names that the analysis function takes as input.

Parameters:: ens (Ensemble) – The ensemble object; it may be needed to get the names of “special” columns such as ens._time_col or ens._err_col.
Returns:: The column names to select and pass to the __call__() method. For example [ens._time_col, ens._flux_col].
Return type:: List[str]

meta(ens: Ensemble) → Dict[str, type]

Return the schema of the analysis function output.

Parameters:: ens (Ensemble) – The ensemble object.
Returns:: Dask meta, for example pd.DataFrame(columns=['x', 'y'], dtype=float).
Return type:: pd.DataFrame or (str, dtype) tuple or {str: dtype} dictionary

on(ens: Ensemble) → List[str]

Return the columns to group source table by.

Parameters:: ens (Ensemble) – The ensemble object.
Returns:: The column names to group by. Typically, [ens._id_col].
Return type:: List[str]

__call__(time, flux, err=None, band=None, lc_id=None, *, sf_method='basic', argument_container=None) → pandas.DataFrame

Calculate the structure function squared using one of several calculation methods, selected by the sf_method argument or by the argument container object.

Parameters:

time (numpy.ndarray (N,) or None) – Array of times when measurements were taken. If all array values are None, or if a scalar None is provided, measurements are assumed to be equally spaced in time.
flux (numpy.ndarray (N,)) – Array of flux/magnitude measurements.
err (numpy.ndarray (N,), float, or None, optional) – Array of associated flux/magnitude errors. If a scalar is provided, it is used as the error for all measurements. If None is provided, all errors are assumed to be 0. By default None.
band (numpy.ndarray (N,), optional) – Array of associated band labels, by default None.
lc_id (numpy.ndarray (N,), optional) – Array of lightcurve ids per data point, by default None.
sf_method (str, optional) – The structure function calculation method to use, by default “basic”.
argument_container (StructureFunctionArgumentContainer, optional) – Container object for additional configuration options, by default None.

Returns:

sf2 – Structure function squared for each of the input bands.

Return type:

pandas.DataFrame

Notes

If no value for band_to_calc is passed, the function is executed on all bands available in band.

calc_sf2
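A sketch of one common noise-corrected estimator, assumed here for illustration: for every pair of points i < j, take the lag Δt = |t_j - t_i| and average (f_j - f_i)² minus the noise term σ_i² + σ_j² over pairs sharing a lag. This is not necessarily what sf_method='basic' computes. sf2_sketch is hypothetical, uses one bin per distinct lag instead of real binning, ignores band and lc_id, and follows the documented conventions for time=None and for a scalar or missing err.

```python
import numpy as np


def sf2_sketch(time, flux, err=None):
    """Hypothetical noise-corrected SF^2, one bin per distinct time lag."""
    flux = np.asarray(flux, dtype=float)
    n = flux.size
    # time=None (or all-None values): assume equally spaced measurements.
    if time is None or all(t is None for t in time):
        time = np.arange(n, dtype=float)
    else:
        time = np.asarray(time, dtype=float)
    # err=None means zero errors; a scalar err applies to every measurement.
    if err is None:
        err = np.zeros(n)
    elif np.isscalar(err):
        err = np.full(n, float(err))
    else:
        err = np.asarray(err, dtype=float)

    i, j = np.triu_indices(n, k=1)  # all pairs with i < j
    dt = np.abs(time[j] - time[i])
    df2 = (flux[j] - flux[i]) ** 2
    noise = err[i] ** 2 + err[j] ** 2

    lags = np.unique(dt)
    sf2 = np.array([df2[dt == lag].mean() - noise[dt == lag].mean() for lag in lags])
    return lags, sf2


lags, sf2 = sf2_sketch(None, [0.0, 1.0, 2.0, 3.0])
print(lags, sf2)  # [1. 2. 3.] [1. 4. 9.]
```

For a linear trend the squared differences grow with the square of the lag, and a nonzero err lowers each value by the mean noise term.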