.. _dev_selection:
The selection/subset framework
==============================
One of the central concepts in Glue is that of subsets, which are typically
created as a result of the user selecting data in a viewer or creating the
subset from the command-line. In order to go from a selection on the screen to
defining a subset from a dataset, Glue includes the following concepts:
* **Regions of interest** (ROIs), which are an abstract representation of a
geometrical region or selection.
* **Subset states**, which are descriptions of the subset selection.
* Data **Subsets**, which are the result of applying a subset state/selection
to a specific dataset.
When a user makes a selection in a data viewer in the Glue application, the
selection is first translated into a ROI. The ROI is then converted to a subset
state, which is applied to the data collection to produce a subset in each
dataset. These three concepts are described in more detail below.
Regions of interest
-------------------
The easiest way to think of regions of interest is as geometrical regions.
Basic classes for common types of ROIs are included in the :mod:`glue.core.roi`
sub-module. For example, the :class:`~glue.core.roi.RectangularROI` class
describes a rectangular region using the lower and upper values in two
dimensions::
>>> from glue.core.roi import RectangularROI
>>> roi = RectangularROI(xmin=1, xmax=3, ymin=2, ymax=5)
Note that this is not related to any particular dataset -- it is an abstract
representation of a rectangular region. It also doesn't specify which
components the rectangle is drawn in. All ROIs have a ``contains`` method (for
example :meth:`~glue.core.roi.RectangularROI.contains`) that can be used to
check whether a point or a set of points lies inside the region::
>>> roi.contains(0, 3)
False
>>> roi.contains(2, 3)
True
>>> import numpy as np
>>> x = np.array([0, 2, 4])
>>> y = np.array([3, 3, 2])
>>> roi.contains(x, y)
array([False,  True, False])
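
To make the behaviour of ``contains`` more concrete, the following is a minimal
sketch of how a rectangular containment test could be written with NumPy. The
``rect_contains`` function is hypothetical and is not Glue's implementation; in
particular, treating points exactly on the boundary as outside is an
assumption::

    import numpy as np

    def rect_contains(x, y, xmin, xmax, ymin, ymax):
        """Hypothetical helper: element-wise test for points inside a rectangle."""
        x = np.asarray(x)
        y = np.asarray(y)
        # Assumption: points exactly on the boundary count as outside.
        return (x > xmin) & (x < xmax) & (y > ymin) & (y < ymax)

    # Same rectangle and points as in the example above.
    inside = rect_contains([0, 2, 4], [3, 3, 2], xmin=1, xmax=3, ymin=2, ymax=5)
    print(inside.tolist())  # [False, True, False]

Because the comparisons are element-wise, the same function handles a single
point or whole arrays of coordinates.
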
Subset states
-------------
While regions of interest define geometrical regions, subset states, which are
sub-classes of :class:`~glue.core.subset.SubsetState`, describe a selection as
a function of Glue :class:`~glue.core.component_id.ComponentID` objects. Note
that this is different from :class:`~glue.core.subset.Subset` instances, which
describe the subset *resulting* from the selection (see `Subsets`_). The
following simple example shows how to easily create a
:class:`~glue.core.subset.SubsetState`::
>>> from glue.core import Data
>>> data = Data(x=[1,2,3], y=[2,3,4])
>>> state = data.id['x'] > 1.5
>>> state
<InequalitySubsetState: (x > 1.5)>
Note that ``state`` is not the subset of values in ``data`` that are greater
than 1.5 -- instead, it is a representation of the inequality, the *concept* of
selecting all values of x greater than 1.5. This distinction is important,
because if another dataset defines a link between one of its components and the
``x`` component of ``data``, then the inequality can be used for that other
component too.
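
The distinction between a rule and its result can be illustrated without Glue.
In this sketch the ``GreaterThan`` class is a hypothetical stand-in for a subset
state, and plain dictionaries stand in for datasets. It assumes both datasets
expose a component called ``x``; in Glue it is the links between components
that play this role::

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class GreaterThan:
        """Hypothetical stand-in for a subset state: stores the rule, not the result."""
        component: str
        threshold: float

        def to_mask(self, dataset):
            # Evaluate the stored rule against whichever dataset is passed in.
            return [value > self.threshold for value in dataset[self.component]]

    state = GreaterThan("x", 1.5)

    data_a = {"x": [1, 2, 3]}
    data_b = {"x": [0.5, 1.7]}

    mask_a = state.to_mask(data_a)  # [False, True, True]
    mask_b = state.to_mask(data_b)  # [False, True]

The ``state`` object holds no values from either dataset, so the same rule can
be evaluated against both.
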
While the above syntax is convenient when using Glue from the command-line, data
viewers instead need to translate ROIs into subset states. To do this, we can use
the :func:`~glue.core.subset.roi_to_subset_state` function, which takes a ROI and
returns a subset state. At the moment this function works for 1-d and 2-d ROIs.
In more complex cases, you can also define your own logic for converting ROIs
into subset states. See the documentation of
:func:`~glue.core.subset.roi_to_subset_state` for more details.
Subset states can be combined using logical operations::
>>> state1 = data.id['x'] > 1.5
>>> state2 = data.id['y'] < 4
>>> state1 & state2
>>> state1 | state2
>>> ~state1
Note that you should use ``&``, ``|``, and ``~`` as opposed to ``and``, ``or``,
and ``not``, because Python does not allow the behaviour of the latter to be
overridden by a class.
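
The reason for preferring the operators can be seen in a small, Glue-independent
sketch. The ``Condition`` class below is hypothetical and only mimics the idea
of a deferred condition. It overloads ``&``, ``|``, and ``~`` through
``__and__``, ``__or__``, and ``__invert__``, whereas ``and`` only inspects the
truthiness of its operands::

    class Condition:
        """Hypothetical deferred condition wrapping a function of a single value."""

        def __init__(self, func):
            self.func = func

        def __call__(self, value):
            return self.func(value)

        def __and__(self, other):
            return Condition(lambda v: self(v) and other(v))

        def __or__(self, other):
            return Condition(lambda v: self(v) or other(v))

        def __invert__(self):
            return Condition(lambda v: not self(v))

    above = Condition(lambda v: v > 1.5)
    below = Condition(lambda v: v < 4)

    both = above & below    # new Condition built through __and__
    either = above | below  # new Condition built through __or__
    negated = ~above        # new Condition built through __invert__

    # ``above and below`` does not combine anything: ``above`` is truthy, so
    # the expression simply evaluates to ``below``.
    not_combined = above and below

Here ``both(2)`` is true while ``both(5)`` is false, but ``not_combined`` is
just ``below`` and silently ignores the first condition.
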
Subsets
-------
A subset is what we normally think of as a sub-part of a dataset. Subsets are
typically created by first defining a subset state (see `Subset states`_). There
are then different ways of applying this subset state to a
:class:`~glue.core.data.Data` object to actually create a subset. The easiest
is to call the :meth:`~glue.core.data.BaseData.new_subset` method with the
:class:`~glue.core.subset.SubsetState` and optionally a label describing that
subset::
>>> subset = data.new_subset(state, label='x > 1.5')
>>> subset
Subset: x > 1.5 (data: )
The resulting subset can then be used in a similar way to a
:class:`~glue.core.data.Data` object, but it will return only the values in the
subset::
>>> subset['x']
array([2, 3])
>>> subset['y']
array([3, 4])
Finally, you can also get the mask from a subset::
>>> subset.to_mask()
array([False,  True,  True])
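
The relationship between the mask and the values returned by the subset can be
reproduced with plain NumPy boolean indexing. This is only an illustration of
the idea, using the same numbers as above, and makes no claim about how Glue
evaluates subsets internally::

    import numpy as np

    x = np.array([1, 2, 3])
    y = np.array([2, 3, 4])

    # Same selection as above, written directly as a NumPy comparison.
    mask = x > 1.5

    # Indexing with the mask keeps only the elements where it is True.
    x_selected = x[mask]
    y_selected = y[mask]
    print(x_selected.tolist(), y_selected.tolist())  # [2, 3] [3, 4]
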
One of the benefits of subset states is that they can be applied to multiple
data objects. If the different data objects have linked components (as described
in :doc:`linking`), this may produce valid subsets in several datasets. We can
apply a :class:`~glue.core.subset.SubsetState` to all datasets in a data
collection by calling the
:meth:`~glue.core.data_collection.DataCollection.new_subset_group` method with a
label describing the subset and the :class:`~glue.core.subset.SubsetState`,
similarly to :meth:`~glue.core.data.BaseData.new_subset`::
>>> from glue.core import DataCollection
>>> data_collection = DataCollection([data])
>>> subset_group = data_collection.new_subset_group('x > 1.5', state)
This creates a :class:`~glue.core.subset_group.SubsetGroup` which represents a group of subsets, with the individual subsets accessible via the ``subsets`` attribute::
>>> subset = subset_group.subsets[0]
>>> subset
Subset: x > 1.5 (data: )