In [1]:
import numpy as np

Tutorial on how to use MadArray objects

A MadArray is a NumPy array with missing elements. It is built from three kinds of parameters:

  • data as an array of entries, either int, float or complex;
  • a mask indicating the missing entries;
  • options to define the behaviour of the object.

A basic initialisation requires only a data matrix. Without a mask, all elements are considered non-missing.
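
Every printed MadArray starts with a summary of how many entries are missing. The following is a minimal NumPy sketch of that bookkeeping. It assumes that omitting the mask is equivalent to an all-False mask, as stated above. `summarize_mask` is a hypothetical helper for illustration, not part of madarrays.

```python
import numpy as np


def summarize_mask(mask):
    """Return a summary in the style of the MadArray header line.

    Hypothetical helper: `mask` is a boolean array where True marks a
    missing entry.
    """
    mask = np.asarray(mask, dtype=bool)
    n_missing = int(np.count_nonzero(mask))
    ratio = n_missing / mask.size
    return "{} missing entries ({:.1%})".format(n_missing, ratio)


data = np.random.rand(4, 6)
# Without a mask, every entry is treated as known: an all-False mask.
default_mask = np.zeros(data.shape, dtype=bool)
print(summarize_mask(default_mask))  # 0 missing entries (0.0%)
```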

In [2]:
from madarrays import MadArray

# initialisation without mask
data = np.random.rand(4, 6)

A = MadArray(data)
print(A)
MadArray, dtype=float64, 0 missing entries (0.0%)
[[ 0.88852592  0.01194842  0.61658832  0.6555601   0.23662042  0.31518837]
 [ 0.08438302  0.07847182  0.9321147   0.96558448  0.96066855  0.03913056]
 [ 0.18174032  0.91104309  0.44158461  0.41123823  0.81190651  0.5251598 ]
 [ 0.75796825  0.74749213  0.77437337  0.75607608  0.86733387  0.22696674]]

Masking

The masking of data differs according to the type of entries:

  • if the data entries are not complex (e.g., int or float), the mask argument must be a boolean array with the same shape as the data array, each entry indicating whether the corresponding data entry is missing (True) or not (False);
  • if the data entries are complex, the masking can be done in the same way, or by giving two boolean arrays, mask_magnitude and mask_phase, with the same shape as the data array, indicating respectively whether the magnitude and the phase of each entry are missing.
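
The two masking rules amount to dtype and shape checks. The sketch below illustrates the rules in plain NumPy. `check_masks` is a hypothetical name. Requiring both magnitude and phase masks together, and rejecting a mix of `mask` with them, are choices made for this sketch. The checks madarrays actually performs, and the exceptions it raises, may differ.

```python
import numpy as np


def check_masks(data, mask=None, mask_magnitude=None, mask_phase=None):
    """Check masks against the rules above (illustrative sketch only).

    Returns a short label for the masking mode that applies.
    """
    data = np.asarray(data)

    def check_one(m, name):
        m = np.asarray(m)
        if m.dtype != np.bool_:
            raise TypeError("{} must be a boolean array".format(name))
        if m.shape != data.shape:
            raise ValueError("{} must have shape {}".format(name, data.shape))

    if mask_magnitude is not None or mask_phase is not None:
        # Separate magnitude/phase masks are only meaningful for complex data.
        if not np.iscomplexobj(data):
            raise TypeError("mask_magnitude/mask_phase require complex data")
        if mask is not None:
            raise ValueError("give either mask or mask_magnitude/mask_phase")
        if mask_magnitude is None or mask_phase is None:
            raise ValueError("mask_magnitude and mask_phase go together")
        check_one(mask_magnitude, "mask_magnitude")
        check_one(mask_phase, "mask_phase")
        return "magnitude/phase"
    if mask is None:
        return "no mask"  # every entry is known
    check_one(mask, "mask")  # valid for real and complex data alike
    return "single mask"


data = np.random.rand(4, 6)
print(check_masks(data, data < 0.5))  # single mask
```
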
In [3]:
# initialisation with a mask
mask = np.random.random(data.shape) < 0.5

Am = MadArray(data, mask)
print(mask)
print(Am)
[[False  True  True False False  True]
 [ True  True  True False False  True]
 [False False False  True False False]
 [False False  True False  True False]]
MadArray, dtype=float64, 10 missing entries (41.7%)
[[ 0.88852592           x           x  0.6555601   0.23662042           x]
 [          x           x           x  0.96558448  0.96066855           x]
 [ 0.18174032  0.91104309  0.44158461           x  0.81190651  0.5251598 ]
 [ 0.75796825  0.74749213           x  0.75607608           x  0.22696674]]

A MadArray can also be defined from another MadArray, for example to copy the object:

In [4]:
Am2 = MadArray(Am)
print('{} - {}'.format(str(Am), repr(Am)))
print('{} - {}'.format(str(Am2), repr(Am2)))
MadArray, dtype=float64, 10 missing entries (41.7%)
[[ 0.88852592           x           x  0.6555601   0.23662042           x]
 [          x           x           x  0.96558448  0.96066855           x]
 [ 0.18174032  0.91104309  0.44158461           x  0.81190651  0.5251598 ]
 [ 0.75796825  0.74749213           x  0.75607608           x  0.22696674]] - <MadArray at 0x7f4693c25198>
MadArray, dtype=float64, 10 missing entries (41.7%)
[[ 0.88852592           x           x  0.6555601   0.23662042           x]
 [          x           x           x  0.96558448  0.96066855           x]
 [ 0.18174032  0.91104309  0.44158461           x  0.81190651  0.5251598 ]
 [ 0.75796825  0.74749213           x  0.75607608           x  0.22696674]] - <MadArray at 0x7f4684f66048>

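A different mask can also be given. It replaces the mask of the source MadArray rather than being combined with it (for instance, the last entry of the first row is missing in Am but known in Am3):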

In [5]:
mask2 = np.random.random(data.shape) < 0.9
Am3 = MadArray(Am, mask2)
print('{} - {}'.format(str(Am), repr(Am)))
print('{} - {}'.format(str(Am3), repr(Am3)))
MadArray, dtype=float64, 10 missing entries (41.7%)
[[ 0.88852592           x           x  0.6555601   0.23662042           x]
 [          x           x           x  0.96558448  0.96066855           x]
 [ 0.18174032  0.91104309  0.44158461           x  0.81190651  0.5251598 ]
 [ 0.75796825  0.74749213           x  0.75607608           x  0.22696674]] - <MadArray at 0x7f4693c25198>
MadArray, dtype=float64, 22 missing entries (91.7%)
[[          x           x           x           x           x  0.31518837]
 [          x           x           x           x           x           x]
 [          x           x           x           x           x           x]
 [          x           x           x           x  0.86733387           x]] - <MadArray at 0x7f4684f660b8>

For complex data, either a single mask or separate magnitude and phase masks can be given:

In [6]:
complex_data = np.random.rand(4, 6) + 1j * np.random.rand(4, 6)
mask_mag = np.random.random(complex_data.shape) < 0.5
mask_pha = np.random.random(complex_data.shape) < 0.5
# single mask, as for real-valued data
A_cpx1 = MadArray(complex_data, mask)
# separate masks for magnitudes and phases
A_cpx2 = MadArray(complex_data, mask_magnitude=mask_mag, mask_phase=mask_pha)
print('{} - {}'.format(str(A_cpx1), repr(A_cpx1)))
print('{} - {}'.format(str(A_cpx2), repr(A_cpx2)))
print('Magnitude mask', mask_mag)
print('Phase mask', mask_pha)
MadArray, dtype=complex128, 10 missing entries (41.7%)
[[ 0.92320960+0.88661785j           x                       x
   0.38311296+0.70678738j  0.96491806+0.30121117j           x            ]
 [          x                       x                       x
   0.50314827+0.11344015j  0.48763414+0.61569745j           x            ]
 [ 0.24238896+0.51915684j  0.87787040+0.03319473j  0.77742883+0.90048085j
            x              0.95141413+0.19915758j  0.44219935+0.95957203j]
 [ 0.86931370+0.97679168j  0.02726614+0.1547769j            x
   0.53347458+0.92113509j           x              0.01187069+0.25350673j]] - <MadArray at 0x7f4684f66198>
MadArray, dtype=complex128, 11 missing magnitudes (45.8%) and 11 missing phases (45.8%),  including 7 missing magnitudes and phases jointly (29.2%)
[[          x                       x              0.85485470+0.92124838j
            x                       x                       x            ]
 [          x              0.07092382+0.22792513j           x
            x              0.48763414+0.61569745j           x            ]
 [          x                       x                       x
            x              0.95141413+0.19915758j  0.44219935+0.95957203j]
 [ 0.86931370+0.97679168j  0.02726614+0.1547769j   0.06635526+0.6628895j
            x                       x              0.01187069+0.25350673j]] - <MadArray at 0x7f4684f66278>
Magnitude mask [[False False False  True False  True]
 [ True False False  True False  True]
 [ True  True  True  True False False]
 [False False False  True  True False]]
Phase mask [[ True  True False  True  True  True]
 [ True False  True False False  True]
 [False  True  True False False False]
 [False False False  True False False]]
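
The header line of A_cpx2 reports the number of missing magnitudes, the number of missing phases, and the number of entries where both are missing. Assuming "jointly" means the element-wise AND of the two masks, these counts can be recomputed in plain NumPy from the masks printed above. The 15 entries with at least one missing part (element-wise OR) match the number of x in the printed array.

```python
import numpy as np

# Masks copied from the output above (True = missing).
mask_mag = np.array([[0, 0, 0, 1, 0, 1],
                     [1, 0, 0, 1, 0, 1],
                     [1, 1, 1, 1, 0, 0],
                     [0, 0, 0, 1, 1, 0]], dtype=bool)
mask_pha = np.array([[1, 1, 0, 1, 1, 1],
                     [1, 0, 1, 0, 0, 1],
                     [0, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 0, 0]], dtype=bool)

n = mask_mag.size
n_mag = np.count_nonzero(mask_mag)
n_pha = np.count_nonzero(mask_pha)
n_joint = np.count_nonzero(mask_mag & mask_pha)  # both parts missing
n_any = np.count_nonzero(mask_mag | mask_pha)    # at least one part missing

print("{} missing magnitudes ({:.1%})".format(n_mag, n_mag / n))
print("{} missing phases ({:.1%})".format(n_pha, n_pha / n))
print("{} missing jointly ({:.1%})".format(n_joint, n_joint / n))
print("{} entries with a missing part".format(n_any))
```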

Methods and properties

A MadArray has methods and properties that give information about the masking.

In [7]:
# mask of non-missing elements
print(Am.get_known_mask())
[[ True False False  True  True False]
 [False False False  True  True False]
 [ True  True  True False  True  True]
 [ True  True False  True False  True]]
In [8]:
# mask of missing elements
print(Am.get_unknown_mask())
[[False  True  True False False  True]
 [ True  True  True False False  True]
 [False False False  True False False]
 [False False  True False  True False]]
In [9]:
print('Is masked: {}'.format(Am.is_masked()))
print('Ratio missing data: {}'.format(Am.ratio_missing_data))
Is masked: True
Ratio missing data: 0.4166666666666667
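
In terms of a plain boolean mask (True = missing), the known mask is the element-wise negation of the unknown mask, and the ratio of missing data is the fraction of True entries. The following NumPy sketch uses the mask printed in In [3] and reproduces the outputs above. It assumes that is_masked reports whether at least one entry is missing. It is not the library's implementation.

```python
import numpy as np

# Unknown (missing) mask copied from the output of In [3].
unknown = np.array([[0, 1, 1, 0, 0, 1],
                    [1, 1, 1, 0, 0, 1],
                    [0, 0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 1, 0]], dtype=bool)

known = ~unknown                 # corresponds to get_known_mask()
is_masked = bool(unknown.any())  # at least one missing entry
ratio_missing = unknown.mean()   # fraction of missing entries

print(is_masked)      # True
print(ratio_missing)  # 0.4166666666666667
```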

Indexing

There are two different and incompatible ways to index a MadArray. By default (masked_indexing=False), indexing works as for a NumPy ndarray: both the data array and the mask are indexed, and the returned MadArray has the shape defined by the indices:

In [10]:
print(A[0:3, 1:3])
print(Am[0:3, 1:3])
MadArray, dtype=float64, 0 missing entries (0.0%)
[[ 0.01194842  0.61658832]
 [ 0.07847182  0.9321147 ]
 [ 0.91104309  0.44158461]]
MadArray, dtype=float64, 4 missing entries (66.7%)
[[          x           x]
 [          x           x]
 [ 0.91104309  0.44158461]]
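
Default indexing applies the same index to the data and to the mask. With plain NumPy arrays, this amounts to the following sketch, which uses the mask printed in In [3] and random data in place of the tutorial's values:

```python
import numpy as np

# Unknown (missing) mask copied from the output of In [3].
unknown = np.array([[0, 1, 1, 0, 0, 1],
                    [1, 1, 1, 0, 0, 1],
                    [0, 0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 1, 0]], dtype=bool)
data = np.random.rand(4, 6)

# The same slices are applied to both arrays.
sub_data = data[0:3, 1:3]
sub_mask = unknown[0:3, 1:3]

print(sub_data.shape)              # (3, 2)
print(np.count_nonzero(sub_mask))  # 4
```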

With the other way (masked_indexing=True), the returned MadArray keeps the original shape, and all entries outside the index are treated as missing:

In [11]:
Am4 = MadArray(data, mask, masked_indexing=True)
print(Am4[0:3, 1:3])
MadArray, dtype=float64, 22 missing entries (91.7%)
[[          x           x           x           x           x           x]
 [          x           x           x           x           x           x]
 [          x  0.91104309  0.44158461           x           x           x]
 [          x           x           x           x           x           x]]

This latter approach is suited to use with scikit-learn procedures.
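
With masked_indexing=True, the result has the full shape and every entry outside the index is marked as missing. The following plain NumPy sketch implements that rule. It assumes the resulting mask equals the original mask on the indexed region and True elsewhere, which matches the 22 missing entries shown above.

```python
import numpy as np

# Unknown (missing) mask copied from the output of In [3].
unknown = np.array([[0, 1, 1, 0, 0, 1],
                    [1, 1, 1, 0, 0, 1],
                    [0, 0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 1, 0]], dtype=bool)

new_mask = np.ones_like(unknown)           # everything missing by default
new_mask[0:3, 1:3] = unknown[0:3, 1:3]     # keep the original mask inside the index

print(new_mask.shape)              # (4, 6)
print(np.count_nonzero(new_mask))  # 22
```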

Numerical operations

NumPy functions can be applied to a MadArray, but they ignore the mask: missing entries are included in the computation, which is why both means below are identical.

In [12]:
print(np.mean(A))
print(np.mean(Am))
0.5499028203648367
0.5499028203648367
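
To restrict a statistic to the known entries, one option in plain NumPy is to select them with the inverted mask. Another is to build a `numpy.ma` masked array, which uses the same convention (True = excluded). The following sketch uses a small hand-made array rather than the random data above. It does not describe any mask-aware behaviour of MadArray itself.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
missing = np.array([[False, True, False],
                    [False, False, True]])

print(np.mean(data))                                  # 3.5: mask plays no role
print(np.mean(data[~missing]))                        # 3.25: mean of 1, 3, 4, 5
print(np.ma.masked_array(data, mask=missing).mean())  # 3.25
```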