tape.ensemble_frame#

Module Contents#

Classes#

TapeSeries

A barebones extension of a Pandas series to be used for underlying Ensemble data.

TapeFrame

A barebones extension of a Pandas frame to be used for underlying Ensemble data.

EnsembleSeries

A barebones extension of a Dask Series for Ensemble data.

EnsembleFrame

An extension of a Dask DataFrame for data used by a lightcurve Ensemble.

TapeSourceFrame

A barebones extension of a Pandas frame to be used for underlying Ensemble source data.

TapeObjectFrame

A barebones extension of a Pandas frame to be used for underlying Ensemble object data.

SourceFrame

A subclass of EnsembleFrame for Source data.

ObjectFrame

A subclass of EnsembleFrame for Object data.

class TapeSeries(data=None, index=None, dtype: pandas._typing.Dtype | None = None, name=None, copy: bool | None = None, fastpath: bool | pandas._libs.lib.NoDefault = lib.no_default)[source]#

Bases: pandas.Series

A barebones extension of a Pandas series to be used for underlying Ensemble data.

See https://pandas.pydata.org/docs/development/extending.html#subclassing-pandas-data-structures
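
The subclassing pattern referenced above can be sketched with plain pandas. This is a minimal illustration rather than TAPE's actual implementation: the names DemoSeries and DemoFrame are hypothetical, and the choice of which constructor property each class defines follows the pandas documentation, which may differ from what TapeSeries and TapeFrame define.

```python
import pandas as pd


class DemoSeries(pd.Series):
    """Hypothetical Series subclass, used only for illustration."""

    @property
    def _constructor(self):
        # Used when a result has the same dimensions as the original.
        return DemoSeries

    @property
    def _constructor_expanddim(self):
        # Used when a result gains a dimension (Series -> DataFrame).
        return DemoFrame


class DemoFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass, used only for illustration."""

    @property
    def _constructor(self):
        # Used when a result has the same dimensions as the original.
        return DemoFrame

    @property
    def _constructor_sliced(self):
        # Used when a result loses a dimension (DataFrame -> Series).
        return DemoSeries


frame = DemoFrame({"flux": [1.0, 2.0, 3.0], "band": ["g", "r", "g"]})

# Row filtering keeps the frame type because of _constructor.
filtered = frame[frame["flux"] > 1.0]

# Column selection returns the series subclass because of _constructor_sliced.
column = frame["flux"]

print(type(filtered).__name__, type(column).__name__)
```

Without these properties, pandas operations would fall back to plain pandas.DataFrame and pandas.Series results, and the subclass type would be lost after the first manipulation.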

property _constructor[source]#

Used when a manipulation result has the same dimensions as the original.

property _constructor_sliced[source]#

class TapeFrame(data=None, index: pandas._typing.Axes | None = None, columns: pandas._typing.Axes | None = None, dtype: pandas._typing.Dtype | None = None, copy: bool | None = None)[source]#

Bases: pandas.DataFrame

A barebones extension of a Pandas frame to be used for underlying Ensemble data.

See https://pandas.pydata.org/docs/development/extending.html#subclassing-pandas-data-structures

property _constructor[source]#

Used when a manipulation result has the same dimensions as the original.

property _constructor_expanddim[source]#

class EnsembleSeries(expr, label=None, ensemble=None)[source]#

Bases: _Frame, dask.dataframe.Series

A barebones extension of a Dask Series for Ensemble data.

_partition_type[source]#

class EnsembleFrame(expr, label=None, ensemble=None)[source]#

Bases: _Frame, dask.dataframe.DataFrame

An extension of a Dask DataFrame for data used by a lightcurve Ensemble.

The underlying non-parallel data structures are TapeFrames and TapeSeries, which extend Pandas frames and series.

Examples

Instantiation:

import tape
ens = tape.Ensemble()
data = {...} # Some data you want tracked by the Ensemble
# npartitions is a required argument of from_dict
ensemble_frame = tape.EnsembleFrame.from_dict(data, npartitions=1, label="my_frame", ensemble=ens)

_partition_type[source]#

__getitem__(key)[source]#

classmethod from_tapeframe(data, npartitions=None, chunksize=None, sort=True, label=None, ensemble=None)[source]#

Returns an EnsembleFrame constructed from a TapeFrame.

Parameters:
  • data (TapeFrame) – Frame containing the underlying data for the EnsembleFrame.

  • npartitions (int, optional) – The number of partitions of the index to create. Note that depending on the size and index of the dataframe, the output may have fewer partitions than requested.

  • chunksize (int, optional) – Size of the individual chunks of data in non-parallel objects that make up Dask frames.

  • sort (bool, optional) – Whether to sort the frame by a default index.

  • label (str, optional) – The label used by the Ensemble to identify the frame.

  • ensemble (tape.Ensemble, optional) – A link to the Ensemble object that owns this frame.

Returns:

result – The constructed EnsembleFrame object.

Return type:

tape.EnsembleFrame

classmethod from_dask_dataframe(df, ensemble=None, label=None)[source]#

Returns an EnsembleFrame constructed from a Dask dataframe.

Parameters:
  • df (dask.dataframe.DataFrame or list) – A Dask dataframe to convert to an EnsembleFrame.

  • ensemble (tape.ensemble.Ensemble, optional) – A link to the Ensemble object that owns this frame.

  • label (str, optional) – The label used by the Ensemble to identify the frame.

Returns:

result – The constructed EnsembleFrame object.

Return type:

tape.EnsembleFrame

update_ensemble()[source]#

Updates the Ensemble linked by the EnsembleFrame.ensemble property to track this frame.

Returns:

result – The Ensemble object which tracks this frame, None if no such Ensemble.

Return type:

tape.Ensemble

classmethod from_dict(data, npartitions, orient='columns', dtype=None, columns=None, label=None, ensemble=None)[source]#

Constructs a Tape EnsembleFrame from a Python dictionary.

Parameters:
  • data (dict) – Of the form {field : array-like} or {field : dict}.

  • npartitions (int) – The number of partitions of the index to create. Note that depending on the size and index of the dataframe, the output may have fewer partitions than requested.

  • orient ({'columns', 'index', 'tight'}, default 'columns') – The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’. If ‘tight’, assume a dict with keys [‘index’, ‘columns’, ‘data’, ‘index_names’, ‘column_names’].

  • dtype (dtype, optional) – Data type to force, otherwise infer.

  • columns (list, optional) – Column labels to use when orient='index'. Raises a ValueError if used with orient='columns' or orient='tight'.

  • label (str, optional) – The label used by the Ensemble to identify the frame.

  • ensemble (tape.ensemble.Ensemble, optional) – A link to the Ensemble object that owns this frame.

Returns:

result – The constructed EnsembleFrame object.

Return type:

tape.EnsembleFrame
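
The orient and columns parameters described above mirror those of pandas.DataFrame.from_dict. The sketch below uses plain pandas to show how they interact, on the assumption that EnsembleFrame.from_dict interprets them the same way; the column and object names are made up for illustration.

```python
import pandas as pd

# orient="columns" (default): dict keys become column names.
by_column = pd.DataFrame.from_dict(
    {"flux": [1.0, 2.0], "flux_err": [0.1, 0.2]},
    orient="columns",
)

# orient="index": dict keys become row labels, and `columns`
# supplies the column names for the values in each row.
by_row = pd.DataFrame.from_dict(
    {"obj1": [1.0, 0.1], "obj2": [2.0, 0.2]},
    orient="index",
    columns=["flux", "flux_err"],
)

# Passing `columns` together with orient="columns" is rejected.
try:
    pd.DataFrame.from_dict({"flux": [1.0]}, orient="columns", columns=["flux"])
    raised = False
except ValueError:
    raised = True

print(by_column)
print(by_row)
print("ValueError raised:", raised)
```

Both frames hold the same values; they differ only in whether the dictionary keys end up as column names or as index labels.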

classmethod from_parquet(path, index=None, columns=None, label=None, ensemble=None, **kwargs)[source]#

Returns an EnsembleFrame constructed from loading a parquet file.

Parameters:
  • path (str or list) – Source directory for data, or path(s) to individual parquet files. Prefix with a protocol like s3:// to read from alternative filesystems. To read from multiple files you can pass a globstring or a list of paths, with the caveat that they must all have the same protocol.

  • index (str, list, False, optional) – Field name(s) to use as the output frame index. Default is None and index will be inferred from the pandas parquet file metadata, if present. Use False to read all fields as columns.

  • columns (str or list, optional) – Field name(s) to read in as columns in the output. By default all non-index fields will be read (as determined by the pandas parquet metadata, if present). Provide a single field name instead of a list to read in the data as a Series.

  • label (str, optional) – The label used by the Ensemble to identify the frame.

  • ensemble (tape.ensemble.Ensemble, optional) – A link to the Ensemble object that owns this frame.

Returns:

result – The constructed EnsembleFrame object.

Return type:

tape.EnsembleFrame

convert_flux_to_mag(flux_col, zero_point, err_col=None, zp_form='mag', out_col_name=None)[source]#

Converts this EnsembleFrame’s flux column into a magnitude column, returning a new EnsembleFrame.

Parameters:
  • flux_col (str) – The name of the EnsembleFrame flux column to convert into magnitudes.

  • zero_point (str) – The name of the EnsembleFrame column containing the zero point information for column transformation.

  • err_col (str, optional) – The name of the EnsembleFrame column containing the errors to propagate. Errors are propagated using the following approximation: err = (2.5/ln(10))*(flux_error/flux), which holds mainly when the error in flux is much smaller than the flux.

  • zp_form (str, optional) – The form of the zero point column, either “flux” or “magnitude”/“mag”. Determines how the zero point (zp) is applied in the conversion. If “flux”, the conversion is mag=-2.5*log10(flux/zp); if “magnitude” or “mag”, it is mag=-2.5*log10(flux)+zp.

  • out_col_name (str, optional) – The name of the output magnitude column. If None, the output name is the flux column name + “_mag”. The error column is also generated, named out_col_name + “_err”.

Returns:

result – A new EnsembleFrame object with a new magnitude (and error) column.

Return type:

tape.EnsembleFrame
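
The two zero point forms and the error approximation described above can be checked with a scalar sketch. The helper flux_to_mag below is hypothetical and written only to illustrate the formulas; it is not the library's implementation, which operates on whole columns.

```python
import math


def flux_to_mag(flux, zero_point, flux_err=None, zp_form="mag"):
    """Hypothetical scalar helper applying the documented formulas."""
    if zp_form == "flux":
        # Zero point given as a flux: mag = -2.5 * log10(flux / zp)
        mag = -2.5 * math.log10(flux / zero_point)
    elif zp_form in ("mag", "magnitude"):
        # Zero point given as a magnitude: mag = -2.5 * log10(flux) + zp
        mag = -2.5 * math.log10(flux) + zero_point
    else:
        raise ValueError(f"unknown zp_form: {zp_form!r}")

    mag_err = None
    if flux_err is not None:
        # First-order propagation, valid when flux_err is much smaller than flux.
        mag_err = (2.5 / math.log(10)) * (flux_err / flux)
    return mag, mag_err


# The same source expressed with a magnitude zero point of 25
# and with the equivalent flux zero point of 1e10.
mag_a, err_a = flux_to_mag(100.0, 25.0, flux_err=1.0, zp_form="mag")
mag_b, _ = flux_to_mag(100.0, 1.0e10, zp_form="flux")

print(mag_a, mag_b, err_a)
```

Both calls give a magnitude of about 20, since a magnitude zero point of 25 corresponds to a flux zero point of 10^(25/2.5) = 1e10. The 1% flux error maps to roughly 0.011 mag.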

coalesce(input_cols, output_col, drop_inputs=False)[source]#

Combines multiple input columns into a single output column, with values equal to the first non-nan value encountered in the input cols.

Parameters:
  • input_cols (list) – The list of column names to coalesce into a single column.

  • output_col (str) – The name of the coalesced output column.

  • drop_inputs (bool, optional) – Determines whether the input columns are dropped or preserved. If a mapped column is an input and dropped, the output column is automatically assigned to replace that column mapping internally.

Returns:

ensemble – An ensemble object.

Return type:

tape.ensemble.Ensemble
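
The "first non-NaN value" rule can be illustrated on a single pandas frame. This is a sketch under stated assumptions: coalesce_columns is a hypothetical helper, the column names are invented, and the real method works on Dask partitions and also updates column mappings, which is not shown here.

```python
import math

import pandas as pd


def coalesce_columns(df, input_cols, output_col, drop_inputs=False):
    """Hypothetical helper: take the first non-NaN value across input_cols."""
    result = df.copy()
    # Back-filling along the row pulls the first non-NaN value into the
    # leftmost of the selected columns, which is then used as the output.
    result[output_col] = df[input_cols].bfill(axis=1).iloc[:, 0]
    if drop_inputs:
        result = result.drop(columns=input_cols)
    return result


nan = float("nan")
frame = pd.DataFrame(
    {
        "flux_a": [1.0, nan, nan],
        "flux_b": [10.0, 20.0, nan],
    }
)

merged = coalesce_columns(frame, ["flux_a", "flux_b"], "flux", drop_inputs=True)
print(merged)
```

The order of input_cols sets the priority: the first row takes its value from flux_a, the second falls back to flux_b, and the third stays NaN because every input is NaN.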

class TapeSourceFrame(data=None, index: pandas._typing.Axes | None = None, columns: pandas._typing.Axes | None = None, dtype: pandas._typing.Dtype | None = None, copy: bool | None = None)[source]#

Bases: TapeFrame

A barebones extension of a Pandas frame to be used for underlying Ensemble source data.

See https://pandas.pydata.org/docs/development/extending.html#subclassing-pandas-data-structures

property _constructor[source]#

Used when a manipulation result has the same dimensions as the original.

property _constructor_expanddim[source]#

class TapeObjectFrame(data=None, index: pandas._typing.Axes | None = None, columns: pandas._typing.Axes | None = None, dtype: pandas._typing.Dtype | None = None, copy: bool | None = None)[source]#

Bases: TapeFrame

A barebones extension of a Pandas frame to be used for underlying Ensemble object data.

See https://pandas.pydata.org/docs/development/extending.html#subclassing-pandas-data-structures

property _constructor[source]#

Used when a manipulation result has the same dimensions as the original.

property _constructor_expanddim[source]#

class SourceFrame(expr, ensemble=None)[source]#

Bases: EnsembleFrame

A subclass of EnsembleFrame for Source data.

_partition_type[source]#

__getitem__(key)[source]#

classmethod from_parquet(path, index=None, columns=None, ensemble=None)[source]#

Returns a SourceFrame constructed from loading a parquet file.

classmethod from_dask_dataframe(df, ensemble=None)[source]#

Returns a SourceFrame constructed from a Dask dataframe.

Parameters:
  • df (dask.dataframe.DataFrame or list) – A Dask dataframe to convert to a SourceFrame.

  • ensemble (tape.ensemble.Ensemble, optional) – A link to the Ensemble object that owns this frame.

Returns:

result – The constructed SourceFrame object.

Return type:

tape.SourceFrame

class ObjectFrame(expr, ensemble=None)[source]#

Bases: EnsembleFrame

A subclass of EnsembleFrame for Object data.

_partition_type[source]#

classmethod from_parquet(path, index=None, columns=None, ensemble=None)[source]#

Returns an ObjectFrame constructed from loading a parquet file.

classmethod from_dask_dataframe(df, ensemble=None)[source]#

Returns an ObjectFrame constructed from a Dask dataframe.

Parameters:
  • df (dask.dataframe.DataFrame or list) – A Dask dataframe to convert to an ObjectFrame.

  • ensemble (tape.ensemble.Ensemble, optional) – A link to the Ensemble object that owns this frame.

Returns:

result – The constructed ObjectFrame object.

Return type:

tape.ObjectFrame