The selection/subset framework

One of the central concepts in Glue is that of subsets, which are typically created when the user selects data in a viewer or defines the subset from the command line. In order to go from a selection on the screen to a subset of a dataset, Glue includes the following concepts:

  • Regions of interest (ROIs), which are an abstract representation of a geometrical region or selection.

  • Subset states, which are descriptions of the subset selection.

  • Data Subsets, which are the result of applying a subset state/selection to a specific dataset.

When a user makes a selection in a data viewer in the Glue application, the selection is first translated into an ROI. The ROI is then converted to a subset state, which is applied to the data collection to produce a subset in each dataset. These three concepts are described in more detail below.

Regions of interest

The easiest way to think of regions of interest is as geometrical regions. Basic classes for common types of ROIs are included in the glue.core.roi sub-module. For example, the RectangularROI class describes a rectangular region using the lower and upper values in two dimensions:

>>> from glue.core.roi import RectangularROI
>>> roi = RectangularROI(xmin=1, xmax=3, ymin=2, ymax=5)

Note that this is not related to any particular dataset: it is an abstract representation of a rectangular region. It also doesn’t specify which components the rectangle is drawn in. All ROIs have a contains() method (for example glue.core.roi.RectangularROI.contains()) that can be used to check whether a point or a set of points lies inside the region:

>>> roi.contains(0, 3)
False
>>> roi.contains(2, 3)
True
>>> import numpy as np
>>> x = np.array([0, 2, 4])
>>> y = np.array([3, 3, 2])
>>> roi.contains(x, y)
array([False,  True, False])
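To make the idea of an abstract region concrete, here is a minimal pure-Python sketch of a rectangle with a containment test. RectSketch and contains_many are hypothetical names used only for illustration; this is not Glue's implementation, and the treatment of points lying exactly on the boundary is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RectSketch:
    """Hypothetical stand-in for a rectangular ROI (not Glue's class)."""
    xmin: float
    xmax: float
    ymin: float
    ymax: float

    def contains(self, x, y):
        # Assumption: points exactly on the boundary count as outside.
        return self.xmin < x < self.xmax and self.ymin < y < self.ymax

    def contains_many(self, xs, ys):
        # Element-wise test over paired coordinates.
        return [self.contains(x, y) for x, y in zip(xs, ys)]


rect = RectSketch(xmin=1, xmax=3, ymin=2, ymax=5)
single = rect.contains(2, 3)
many = rect.contains_many([0, 2, 4], [3, 3, 2])
```

The rectangle holds only four numbers: nothing in it refers to a dataset or to the names of the axes it was drawn on.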

Subset states

While regions of interest define geometrical regions, subset states, which are sub-classes of SubsetState, describe a selection as a function of Glue ComponentID objects. Note that this is different from Subset instances, which describe the subset resulting from the selection (see Subsets). The following simple example shows how to easily create a SubsetState:

>>> from glue.core import Data
>>> data = Data(x=[1,2,3], y=[2,3,4])
>>> state = data.id['x'] > 1.5
>>> state
<InequalitySubsetState: (x > 1.5)>

Note that state is not the subset of values in data that are greater than 1.5 – instead, it is a representation of the inequality, the concept of selecting all values of x greater than 1.5. This distinction is important, because if another dataset defines a link between one of its components and the x component of data, then the inequality can be used for that other component too.
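The difference between a stored rule and its result can be sketched without Glue at all. In the example below, GreaterThanSketch and its links argument are hypothetical; the links dictionary is only a stand-in for Glue's linking framework, which works differently in detail.

```python
class GreaterThanSketch:
    """Hypothetical deferred inequality: stores the rule, not the result."""

    def __init__(self, attribute, threshold):
        self.attribute = attribute
        self.threshold = threshold

    def to_mask(self, table, links=None):
        # `links` maps this state's attribute name to the equivalent column
        # name in `table`; an assumption standing in for Glue's linking.
        column = (links or {}).get(self.attribute, self.attribute)
        return [value > self.threshold for value in table[column]]


state = GreaterThanSketch("x", 1.5)

first = {"x": [1, 2, 3], "y": [2, 3, 4]}
second = {"position": [0.5, 1.6, 9.0]}

# The same rule is evaluated against two different tables.
mask_first = state.to_mask(first)
mask_second = state.to_mask(second, links={"x": "position"})
```

Because the rule is kept separate from any one table, it can be evaluated later, and against data it was not originally written for.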

While the above syntax is convenient when using Glue from the command line, data viewers need to translate ROIs into subset states. To do this, we can use the roi_to_subset_state() function, which takes an ROI and returns a subset state. At the moment this function works for 1- and 2-d ROIs. In more complex cases, you can also define your own logic for converting ROIs into subset states. See the documentation of roi_to_subset_state() for more details.
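Conceptually, converting an ROI into a subset state means binding a purely geometric region to named attributes. The sketch below illustrates that binding in plain Python; rect_to_predicate is a hypothetical helper, not the roi_to_subset_state() function, and the strict inequalities at the boundary are an assumption.

```python
def rect_to_predicate(xmin, xmax, ymin, ymax, x_att, y_att):
    """Bind a rectangle to two named attributes (hypothetical helper)."""

    def predicate(table):
        # Evaluate the rectangle against whichever columns were named.
        return [
            xmin < x < xmax and ymin < y < ymax
            for x, y in zip(table[x_att], table[y_att])
        ]

    return predicate


table = {"x": [0, 2, 4], "y": [3, 3, 2], "z": [3, 9, 4]}

# The same rectangle, bound to two different pairs of attributes.
in_xy = rect_to_predicate(1, 3, 2, 5, x_att="x", y_att="y")
in_xz = rect_to_predicate(1, 3, 2, 5, x_att="x", y_att="z")

mask_xy = in_xy(table)
mask_xz = in_xz(table)
```

The two masks differ even though the rectangle is identical, which is why the region alone is not enough to define a selection.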

Subset states can be combined using logical operations:

>>> state1 = data.id['x'] > 1.5
>>> state2 = data.id['y'] < 4
>>> state1 & state2
<glue.core.subset.AndState at 0x10ebd0160>
>>> state1 | state2
<glue.core.subset.OrState at 0x10ebd00f0>
>>> ~state1
<glue.core.subset.InvertState at 0x10ebd03c8>

Note that you should use &, |, and ~ as opposed to and, or, and not.
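The reason is a general Python one: classes can overload &, | and ~ through __and__, __or__ and __invert__, but and, or and not cannot be overloaded. The sketch below uses a hypothetical StateSketch class (not Glue's SubsetState) to show what goes wrong with an ordinary object.

```python
class StateSketch:
    """Hypothetical composable predicate (not Glue's SubsetState)."""

    def __init__(self, test, description):
        self.test = test
        self.description = description

    def __and__(self, other):
        return StateSketch(lambda v: self.test(v) and other.test(v),
                           f"({self.description} & {other.description})")

    def __or__(self, other):
        return StateSketch(lambda v: self.test(v) or other.test(v),
                           f"({self.description} | {other.description})")

    def __invert__(self):
        return StateSketch(lambda v: not self.test(v),
                           f"~{self.description}")


big = StateSketch(lambda v: v > 1.5, "v > 1.5")
small = StateSketch(lambda v: v < 4, "v < 4")

combined = big & small      # builds a new composite state
wrong = big and small       # just returns `small`; `big` is silently dropped
```

In this sketch `big and small` raises no error, so the mistake is easy to miss: the first condition simply disappears from the result.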

Subsets

A subset is what we normally think of as a sub-part of a dataset. Subsets are typically created by defining a subset state first. There are then different ways of applying this subset state to a Data object to actually create a subset. The easiest way of doing this is to call the new_subset() method with the SubsetState and, optionally, a label describing that subset:

>>> subset = data.new_subset(state, label='x > 1.5')
>>> subset
Subset: x > 1.5 (data: )

The resulting subset can then be used in a similar way to a Data object, but it will return only the values in the subset:

>>> subset['x']
array([2, 3])

>>> subset['y']
array([3, 4])

Finally, you can also get the mask from a subset:

>>> subset.to_mask()
array([False,  True,  True])
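The mask has one boolean per row of the dataset, and indexing a subset is equivalent to keeping the rows where the mask is True. A minimal standard-library sketch of that relationship, using the same values as the example above (with NumPy arrays the equivalent would be boolean indexing):

```python
from itertools import compress

x = [1, 2, 3]
y = [2, 3, 4]
mask = [False, True, True]   # same values as the to_mask() output above

# Keep only the entries whose mask value is True.
x_selected = list(compress(x, mask))
y_selected = list(compress(y, mask))
```

Both columns are filtered with the same mask, so the selected values stay aligned row by row.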

One of the benefits of subset states is that they can be applied to multiple data objects. If the different data objects have linked components (as described in The linking framework), this may produce valid subsets in several datasets. We can apply a SubsetState to all datasets in a data collection by calling the new_subset_group() method with a label describing the subset and the SubsetState. This is similar to new_subset(), but note that here the label is given first:

>>> from glue.core import DataCollection
>>> data_collection = DataCollection([data])
>>> subset_group = data_collection.new_subset_group('x > 1.5', state)

This creates a SubsetGroup which represents a group of subsets, with the individual subsets accessible via the subsets attribute:

>>> subset = subset_group.subsets[0]
>>> subset
Subset: x > 1.5 (data: )
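One way to picture a subset group is as a single shared rule with one result per dataset. The sketch below is a hypothetical illustration in plain Python; GroupSketch is not Glue's SubsetGroup, and recomputing the masks on every access is an assumption made to keep the example short.

```python
class GroupSketch:
    """Hypothetical subset group: one shared rule, one mask per dataset."""

    def __init__(self, label, threshold, datasets):
        self.label = label
        self.threshold = threshold
        self.datasets = datasets

    @property
    def masks(self):
        # Recomputed on access, so every member reflects the current rule.
        return [[v > self.threshold for v in d["x"]] for d in self.datasets]


group = GroupSketch("selection", 1.5, [{"x": [1, 2, 3]}, {"x": [0, 5]}])
before = group.masks
group.threshold = 2.5          # change the shared rule once
after = group.masks
```

Changing the rule in one place changes the result for every dataset in the group, which is the practical appeal of grouping subsets.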