tape.analysis
=============

.. py:module:: tape.analysis


Subpackages
-----------

.. toctree::
   :maxdepth: 1

   /autoapi/tape/analysis/structure_function/index


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/tape/analysis/base/index
   /autoapi/tape/analysis/feature_extractor/index
   /autoapi/tape/analysis/light_curve/index
   /autoapi/tape/analysis/stetsonj/index
   /autoapi/tape/analysis/structurefunction2/index


Attributes
----------

.. autoapisummary::

   tape.analysis.calc_stetson_J
   tape.analysis.calc_sf2


Classes
-------

.. autoapisummary::

   tape.analysis.AnalysisFunction
   tape.analysis.FeatureExtractor
   tape.analysis.LightCurve
   tape.analysis.StetsonJ
   tape.analysis.StructureFunction2


Package Contents
----------------

.. py:class:: AnalysisFunction

   Bases: :py:obj:`abc.ABC`, :py:obj:`Callable`


   Base class for analysis functions.

   Analysis functions are functions that take a few arrays representing
   an object and return a single pandas.Series representing the result.

   .. method:: cols(ens) -> List[str]

      Return the columns that the analysis function takes as input.

   .. method:: meta(ens) -> pd.DataFrame

      Return the metadata pandas.DataFrame required by Dask to pre-build
      a computation graph. It is essentially the schema of the calculate()
      method output.

   .. method:: on(ens) -> List[str]

      Return the columns to group source table by.
      Typically, `[ens._id_col]`.

   .. method:: __call__(*cols, \*\*kwargs)

      Calculate the analysis function.


   .. py:method:: cols(ens: Ensemble) -> List[str]
      :abstractmethod:


      Return the column names that the analysis function takes as input.

      :param ens: The ensemble object. It may be needed to look up the names of
                  the "special" columns such as `ens._time_col` or `ens._err_col`.
      :type ens: Ensemble

      :returns: The column names to select and pass to the .calculate() method,
                for example `[ens._time_col, ens._flux_col]`.
      :rtype: List[str]


   .. py:method:: meta(ens: Ensemble)
      :abstractmethod:


      Return the schema of the analysis function output.

      :param ens: The ensemble object.
      :type ens: Ensemble

      :returns: Dask meta, for example
                `pd.DataFrame(columns=['x', 'y'], dtype=float)`.
      :rtype: pd.DataFrame or (str, dtype) tuple or {str: dtype} dictionary


   .. py:method:: on(ens: Ensemble) -> List[str]
      :abstractmethod:


      Return the columns to group source table by.

      :param ens: The ensemble object.
      :type ens: Ensemble

      :returns: The column names to group by. Typically, `[ens._id_col]`.
      :rtype: List[str]
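
      As a rough illustration of how `on` and `cols` fit together, the sketch
      below groups plain-Python rows by the `on` columns and collects the `cols`
      values for each group. It is not TAPE's Dask-based implementation: the
      `rows` data, the column names, the `SimpleNamespace` stand-in for an
      ensemble, and the `group_columns` helper are all hypothetical.

      .. code-block:: python

         from collections import defaultdict
         from types import SimpleNamespace

         # Hypothetical stand-in for an Ensemble: only column-name attributes are mimicked.
         ens = SimpleNamespace(_id_col="object_id", _time_col="mjd", _flux_col="flux")

         # Hypothetical source table, one dict per observation.
         rows = [
             {"object_id": 1, "mjd": 0.0, "flux": 10.0},
             {"object_id": 2, "mjd": 0.0, "flux": 5.0},
             {"object_id": 1, "mjd": 1.0, "flux": 12.0},
         ]


         def group_columns(rows, on, cols):
             """Group rows by the `on` columns and gather each `cols` column per group."""
             groups = defaultdict(lambda: [[] for _ in cols])
             for row in rows:
                 key = tuple(row[name] for name in on)
                 for values, name in zip(groups[key], cols):
                     values.append(row[name])
             return dict(groups)


         grouped = group_columns(
             rows, on=[ens._id_col], cols=[ens._time_col, ens._flux_col]
         )

      Each value in `grouped` holds the per-object columns, in the order given by
      `cols`, that would then be passed positionally to `__call__`.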


   .. py:method:: __call__(*cols, **kwargs)
      :abstractmethod:


      Calculate the analysis function.

      :param \*cols: The columns to calculate the analysis function on. It must be
                     consistent with .cols(ens) output.
      :type \*cols: array_like
      :param \*\*kwargs: Additional keyword arguments.

      :returns: The result. It must be consistent with the .meta() output.
      :rtype: pd.Series or pd.DataFrame or array or value
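
   The four abstract methods can be pictured with a minimal sketch. To keep it
   runnable without TAPE installed, it subclasses a stand-in base class that only
   mirrors the interface documented above; real code would subclass
   `tape.analysis.AnalysisFunction` instead. The `MeanFlux` class, the
   `mean_flux` output name, and the `SimpleNamespace` ensemble stand-in are
   hypothetical.

   .. code-block:: python

      from abc import ABC, abstractmethod
      from statistics import fmean
      from types import SimpleNamespace
      from typing import List


      class AnalysisFunctionSketch(ABC):
          """Stand-in mirroring the documented interface (not the TAPE class)."""

          @abstractmethod
          def cols(self, ens) -> List[str]: ...

          @abstractmethod
          def meta(self, ens): ...

          @abstractmethod
          def on(self, ens) -> List[str]: ...

          @abstractmethod
          def __call__(self, *cols, **kwargs): ...


      class MeanFlux(AnalysisFunctionSketch):
          """Hypothetical analysis function: mean flux of a single object."""

          def cols(self, ens):
              # Only the flux column is needed as input.
              return [ens._flux_col]

          def meta(self, ens):
              # Output schema given in the (str, dtype) tuple form.
              return ("mean_flux", float)

          def on(self, ens):
              # One result per object.
              return [ens._id_col]

          def __call__(self, flux, **kwargs):
              # Receives the columns named by cols(), in the same order.
              return fmean(flux)


      # Hypothetical ensemble stand-in exposing only the column-name attributes.
      fake_ens = SimpleNamespace(_id_col="object_id", _flux_col="flux")
      func = MeanFlux()
      selected = func.cols(fake_ens)
      result = func([1.0, 2.0, 6.0])

   The value returned by `__call__` is a single float here, which matches the
   schema declared in `meta`.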


.. py:class:: FeatureExtractor(feature: light_curve.light_curve_ext._FeatureEvaluator)

   Bases: :py:obj:`tape.analysis.base.AnalysisFunction`


   Apply a feature extractor from the light-curve package to a light curve.

   :param feature: Feature extractor to apply, see "light-curve" package for more details.
   :type feature: light_curve.light_curve_ext._FeatureEvaluator

   .. attribute:: feature

      Feature extractor to apply, see "light-curve" package for more details.

      :type: light_curve.light_curve_ext._FeatureEvaluator


   .. py:attribute:: feature


   .. py:method:: cols(ens: Ensemble) -> List[str]

      Return the column names that the analysis function takes as input.

      :param ens: The ensemble object. It may be needed to look up the names of
                  the "special" columns such as `ens._time_col` or `ens._err_col`.
      :type ens: Ensemble

      :returns: The column names to select and pass to the .calculate() method,
                for example `[ens._time_col, ens._flux_col]`.
      :rtype: List[str]


   .. py:method:: meta(ens: Ensemble) -> pandas.DataFrame

      Return the schema of the analysis function output.

      It always returns a pandas.DataFrame with the same columns as
      `self.feature.names` and dtype `np.float64`. However, if
      input columns are all single precision floats then the output dtype
      will be `np.float32`.


   .. py:method:: on(ens: Ensemble) -> List[str]

      Return the columns to group source table by.

      :param ens: The ensemble object.
      :type ens: Ensemble

      :returns: The column names to group by. Typically, `[ens._id_col]`.
      :rtype: List[str]


   .. py:method:: __call__(time, flux, err, band, *, band_to_calc: str, **kwargs) -> pandas.DataFrame

      Apply a feature extractor to a light curve, concatenating the results over
      all bands.

      :param time: Time values
      :type time: `numpy.ndarray`
      :param flux: Brightness values, flux or magnitudes
      :type flux: `numpy.ndarray`
      :param err: Errors for "flux"
      :type err: `numpy.ndarray`
      :param band: Passband names.
      :type band: `numpy.ndarray`
      :param band_to_calc: Name of the passband to calculate features for, usually a string
                           like "g" or "r", or an integer. If None, features are
                           calculated for all sources and `band` is ignored.
      :type band_to_calc: `str` or `int` or `None`
      :param \*\*kwargs: Additional keyword arguments to pass to the feature extractor.
      :type \*\*kwargs: `dict`

      :returns: **features** -- Feature values for each band. The dtype is the common type of the input arrays.
      :rtype: pandas.DataFrame
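
      The effect of `band_to_calc` described above can be sketched in plain
      Python. This is only an illustration of the documented behaviour, not the
      TAPE implementation: `select_band` is a hypothetical helper, and lists are
      used in place of `numpy.ndarray` inputs.

      .. code-block:: python

         def select_band(time, flux, err, band, band_to_calc=None):
             """Keep observations in `band_to_calc`; keep everything if it is None."""
             if band_to_calc is None:
                 # `band` is ignored, all observations are used.
                 return list(time), list(flux), list(err)
             keep = [label == band_to_calc for label in band]

             def pick(values):
                 return [value for value, kept in zip(values, keep) if kept]

             return pick(time), pick(flux), pick(err)


         time = [0.0, 1.0, 2.0, 3.0]
         flux = [10.0, 20.0, 11.0, 21.0]
         err = [0.1, 0.2, 0.1, 0.2]
         band = ["g", "r", "g", "r"]

         g_time, g_flux, g_err = select_band(time, flux, err, band, band_to_calc="g")
         all_time, all_flux, all_err = select_band(time, flux, err, band)

      The selected arrays are what a feature extractor would then be evaluated on.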


.. py:class:: LightCurve(times: numpy.ndarray, fluxes: numpy.ndarray, errors: numpy.ndarray, minimum_observations: int = 0)

   This base class is meant to support various analysis routines and be
   extended as needed. (Hence its location in the `analysis` package.)

   The base class ensures that the data for a single lightcurve is well formed:
   the input arrays all have the same length, NaNs are removed, and
   there are enough observations to perform a given analysis.


   .. py:attribute:: _times


   .. py:attribute:: _fluxes


   .. py:attribute:: _errors


   .. py:attribute:: _minimum_observations


   .. py:method:: _process_input_data()

      Cleaning and validation occurs here, ideally by calling
      sub-methods for specific checks and cleaning tasks.


   .. py:method:: _filter_nans()

      Mask out any NaN values from the time, flux and error arrays.


   .. py:method:: _check_input_data_size_is_equal()

      Make sure that the three input np.arrays have the same size.


   .. py:method:: _check_input_data_length_is_sufficient()

      Make sure that we have enough data after cleaning and filtering
      to be able to perform Structure Function calculations.
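
   The checks listed above can be sketched as a single function. This is a
   simplified illustration under stated assumptions, not the class's code: the
   order of the steps, the use of `ValueError`, and the `clean_light_curve` name
   are hypothetical, and plain lists stand in for `numpy.ndarray` inputs.

   .. code-block:: python

      import math


      def clean_light_curve(times, fluxes, errors, minimum_observations=0):
          """Validate and clean one light curve, returning the filtered columns."""
          # 1. All three inputs must have the same size.
          if not (len(times) == len(fluxes) == len(errors)):
              raise ValueError("times, fluxes and errors must have the same length")

          # 2. Drop any observation where time, flux or error is NaN.
          kept = [
              (t, f, e)
              for t, f, e in zip(times, fluxes, errors)
              if not (math.isnan(t) or math.isnan(f) or math.isnan(e))
          ]

          # 3. Enough observations must remain after cleaning.
          if len(kept) < minimum_observations:
              raise ValueError("too few observations after removing NaNs")

          if not kept:
              return [], [], []
          t_clean, f_clean, e_clean = (list(column) for column in zip(*kept))
          return t_clean, f_clean, e_clean


      cleaned = clean_light_curve(
          [0.0, 1.0, 2.0],
          [1.0, float("nan"), 3.0],
          [0.1, 0.1, 0.1],
          minimum_observations=2,
      )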


.. py:data:: calc_stetson_J

.. py:class:: StetsonJ

   Bases: :py:obj:`tape.analysis.base.AnalysisFunction`


   Compute the StetsonJ statistic on data from one or several bands.


   .. py:method:: cols(ens: Ensemble) -> List[str]

      Return the column names that the analysis function takes as input.

      :param ens: The ensemble object. It may be needed to look up the names of
                  the "special" columns such as `ens._time_col` or `ens._err_col`.
      :type ens: Ensemble

      :returns: The column names to select and pass to the .calculate() method,
                for example `[ens._time_col, ens._flux_col]`.
      :rtype: List[str]


   .. py:method:: meta(ens: Ensemble)

      Return the schema of the analysis function output.

      :param ens: The ensemble object.
      :type ens: Ensemble

      :returns: Dask meta, for example
                `pd.DataFrame(columns=['x', 'y'], dtype=float)`.
      :rtype: pd.DataFrame or (str, dtype) tuple or {str: dtype} dictionary


   .. py:method:: on(ens: Ensemble) -> List[str]

      Return the columns to group source table by.

      :param ens: The ensemble object.
      :type ens: Ensemble

      :returns: The column names to group by. Typically, `[ens._id_col]`.
      :rtype: List[str]


   .. py:method:: __call__(flux: numpy.ndarray, err: numpy.ndarray, band: numpy.ndarray, *, band_to_calc: Union[str, Iterable[str], None] = None, check_nans: bool = False)

      Compute the StetsonJ statistic on data from one or several bands.

      :param flux: Array of flux/magnitude measurements
      :type flux: `numpy.ndarray` (N,)
      :param err: Array of associated flux/magnitude errors
      :type err: `numpy.ndarray` (N,)
      :param band: Array of associated band labels
      :type band: `numpy.ndarray` (N,)
      :param band_to_calc: Bands to calculate StetsonJ on. Single band descriptor, or list
                           of such descriptors.
      :type band_to_calc: `str` or `list` of `str`
      :param check_nans: If True, check for NaN values and filter them out.
      :type check_nans: `bool`

      :returns: **stetsonJ** -- StetsonJ statistic for each of the input bands.
      :rtype: `dict`

      .. note::

         If no value for `band_to_calc` is passed, the function is
         executed on all available bands in `band`.
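
      For orientation, the sketch below computes a per-band statistic using the
      classical single-band Stetson J form with an inverse-variance weighted
      mean. Whether the TAPE implementation uses exactly this mean estimate and
      normalisation is an assumption; the helper names `stetson_j_single` and
      `stetson_j` are hypothetical, and lists stand in for `numpy.ndarray`
      inputs. Each band needs at least two observations.

      .. code-block:: python

         import math


         def stetson_j_single(flux, err):
             """Classical single-band Stetson J (assumed form, see lead-in)."""
             n = len(flux)
             weights = [1.0 / e**2 for e in err]
             mean = sum(w * f for w, f in zip(weights, flux)) / sum(weights)
             scale = math.sqrt(n / (n - 1))
             total = 0.0
             for f, e in zip(flux, err):
                 delta = scale * (f - mean) / e  # normalised residual
                 p = delta * delta - 1.0
                 total += math.copysign(math.sqrt(abs(p)), p)  # sgn(p) * sqrt(|p|)
             return total / n


         def stetson_j(flux, err, band, band_to_calc=None):
             """Return a {band: J} dict; all bands are used if band_to_calc is None."""
             if band_to_calc is None:
                 bands = sorted(set(band))
             elif isinstance(band_to_calc, str):
                 bands = [band_to_calc]
             else:
                 bands = list(band_to_calc)
             result = {}
             for name in bands:
                 f = [x for x, label in zip(flux, band) if label == name]
                 e = [x for x, label in zip(err, band) if label == name]
                 result[name] = stetson_j_single(f, e)
             return result


         flux = [1.0, 2.0, 3.0, 5.0, 5.0]
         err = [1.0, 1.0, 1.0, 1.0, 1.0]
         band = ["g", "g", "g", "r", "r"]

         all_bands = stetson_j(flux, err, band)
         only_g = stetson_j(flux, err, band, band_to_calc="g")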


.. py:class:: StructureFunction2

   Bases: :py:obj:`tape.analysis.base.AnalysisFunction`


   Calculate structure function squared.


   .. py:method:: cols(ens: Ensemble) -> List[str]

      Return the column names that the analysis function takes as input.

      :param ens: The ensemble object. It may be needed to look up the names of
                  the "special" columns such as `ens._time_col` or `ens._err_col`.
      :type ens: Ensemble

      :returns: The column names to select and pass to the .calculate() method,
                for example `[ens._time_col, ens._flux_col]`.
      :rtype: List[str]


   .. py:method:: meta(ens: Ensemble) -> Dict[str, type]

      Return the schema of the analysis function output.

      :param ens: The ensemble object.
      :type ens: Ensemble

      :returns: Dask meta, for example
                `pd.DataFrame(columns=['x', 'y'], dtype=float)`.
      :rtype: pd.DataFrame or (str, dtype) tuple or {str: dtype} dictionary


   .. py:method:: on(ens: Ensemble) -> List[str]

      Return the columns to group source table by.

      :param ens: The ensemble object.
      :type ens: Ensemble

      :returns: The column names to group by. Typically, `[ens._id_col]`.
      :rtype: List[str]


   .. py:method:: __call__(time, flux, err=None, band=None, lc_id=None, *, sf_method='basic', argument_container=None) -> pandas.DataFrame

      Calculate structure function squared using one of a variety of structure
      function calculation methods defined by the input argument `sf_method`, or
      in the argument container object.


      :param time: Array of times when measurements were taken. If all array values are
                   `None` or if a scalar `None` is provided, then equidistant time between
                   measurements is assumed.
      :type time: `numpy.ndarray` (N,) or `None`
      :param flux: Array of flux/magnitude measurements.
      :type flux: `numpy.ndarray` (N,)
      :param err: Array of associated flux/magnitude errors. If a scalar value is provided
                  we assume that error for all measurements. If `None` is provided, we
                  assume all errors are 0. By default None
      :type err: `numpy.ndarray` (N,), `float`, or `None`, optional
      :param band: Array of associated band labels, by default None
      :type band: `numpy.ndarray` (N,), optional
      :param lc_id: Array of lightcurve ids per data point. By default None
      :type lc_id: `numpy.ndarray` (N,), optional
      :param sf_method: The structure function calculation method to be used, by default "basic".
      :type sf_method: str, optional
      :param argument_container: Container object for additional configuration options, by default None.
      :type argument_container: StructureFunctionArgumentContainer, optional

      :returns: **sf2** -- Structure function squared for each of the input bands.
      :rtype: `pandas.DataFrame`

      .. rubric:: Notes

      If no value for `band_to_calc` is passed, the function is
      executed on all available bands in `band`.
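
      As a rough picture of what a structure function squared measures, the
      sketch below averages squared flux differences over all pairs of
      observations, binned by time lag. It is a simplified illustration and not
      the documented "basic" method: it ignores `err`, `band` and `lc_id`, and
      the `sf2_sketch` name and its `bin_edges` parameter are hypothetical. A
      `time` of `None` is treated as equidistant sampling, as described above.

      .. code-block:: python

         from itertools import combinations


         def sf2_sketch(time, flux, bin_edges):
             """Mean squared flux difference per time-lag bin (NaN for empty bins)."""
             if time is None:
                 # Equidistant time between measurements is assumed.
                 time = list(range(len(flux)))
             n_bins = len(bin_edges) - 1
             sums = [0.0] * n_bins
             counts = [0] * n_bins
             for (t1, f1), (t2, f2) in combinations(zip(time, flux), 2):
                 lag = abs(t2 - t1)
                 for i in range(n_bins):
                     # Half-open bins: bin_edges[i] <= lag < bin_edges[i + 1].
                     if bin_edges[i] <= lag < bin_edges[i + 1]:
                         sums[i] += (f2 - f1) ** 2
                         counts[i] += 1
                         break
             return [s / c if c else float("nan") for s, c in zip(sums, counts)]


         sf2_values = sf2_sketch(None, [1.0, 2.0, 4.0], bin_edges=[0.5, 1.5, 2.5])

      With three equally spaced points, the first bin averages the two lag-1
      pairs and the second bin holds the single lag-2 pair.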


.. py:data:: calc_sf2