In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
Tutorial on how to use MadArray objects
A MadArray is a NumPy array with missing elements. It is built from three types of parameters:
- data as an array of entries, either int, float or complex;
- a mask indicating the missing entries;
- options to define the behaviour of the object.
A basic initialisation requires only a data matrix. Without a mask, all elements are considered non-missing:
In [2]:
from madarrays import MadArray
# initialisation without mask
data = np.random.rand(4, 6)
A = MadArray(data)
print(A)
MadArray, dtype=float64, 0 missing entries (0.0%)
[[ 0.88852592 0.01194842 0.61658832 0.6555601 0.23662042 0.31518837]
[ 0.08438302 0.07847182 0.9321147 0.96558448 0.96066855 0.03913056]
[ 0.18174032 0.91104309 0.44158461 0.41123823 0.81190651 0.5251598 ]
[ 0.75796825 0.74749213 0.77437337 0.75607608 0.86733387 0.22696674]]
Masking
The masking of data differs according to the type of entries:
- if the data entries are not complex (e.g., int or float), the argument mask must be a boolean array with the same shape as the data array, each entry indicating whether the corresponding entry in the data array is missing;
- if the data entries are complex, the masking can be done as above, or by giving two boolean arrays mask_magnitude and mask_phase with the same shape as the data array, each entry indicating respectively whether the magnitude and the phase of the corresponding entry are missing.
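Separate magnitude and phase masks make sense because a complex entry z is fully determined by its magnitude |z| and its phase arg(z), with z = |z| exp(i arg(z)). The plain-NumPy sketch below (independent of madarrays; the values and masks are illustrative assumptions) shows this decomposition and which entries are fully known when each component has its own mask.

```python
import numpy as np

# Illustrative complex entries
z = np.array([3 + 4j, -1 + 0j, 1j])
magnitude = np.abs(z)   # |z|
phase = np.angle(z)     # arg(z), in radians

# Each entry is recovered from its magnitude and phase: z = |z| * exp(i * arg(z))
reconstructed = magnitude * np.exp(1j * phase)
print(np.allclose(reconstructed, z))  # True

# Hypothetical masks, one per component (True = missing)
mask_magnitude = np.array([False, True, False])
mask_phase = np.array([False, False, True])

# An entry is fully known only when neither its magnitude nor its phase is missing
fully_known = ~mask_magnitude & ~mask_phase
print(fully_known)  # only the first entry is fully known
```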
In [3]:
# initialisation with a mask (True = missing entry)
mask = np.random.random(data.shape) < 0.5
Am = MadArray(data, mask)
print(mask)
print(Am)
[[False True True False False True]
[ True True True False False True]
[False False False True False False]
[False False True False True False]]
MadArray, dtype=float64, 10 missing entries (41.7%)
[[ 0.88852592 x x 0.6555601 0.23662042 x]
[ x x x 0.96558448 0.96066855 x]
[ 0.18174032 0.91104309 0.44158461 x 0.81190651 0.5251598 ]
[ 0.75796825 0.74749213 x 0.75607608 x 0.22696674]]
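The header line counts the True entries of the mask and expresses them as a percentage of the array size. The sketch below recomputes that summary with plain NumPy from the mask printed above; missing_stats is a hypothetical helper written for illustration, not part of madarrays.

```python
import numpy as np

# Mask printed in In [3] (True = missing entry)
mask = np.array([
    [False, True, True, False, False, True],
    [True, True, True, False, False, True],
    [False, False, False, True, False, False],
    [False, False, True, False, True, False],
])


def missing_stats(mask):
    """Hypothetical helper: number and ratio of missing entries in a boolean mask."""
    n_missing = int(np.count_nonzero(mask))
    return n_missing, n_missing / mask.size


n_missing, ratio = missing_stats(mask)
print('{} missing entries ({:.1f}%)'.format(n_missing, 100 * ratio))
# prints: 10 missing entries (41.7%)

# Without a mask, every entry is known, as with an all-False mask
print(missing_stats(np.zeros(mask.shape, dtype=bool)))  # (0, 0.0)
```

The ratio, 10/24, is the same value reported by ratio_missing_data for Am further below.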
A MadArray can also be defined from another MadArray, for example to copy the object:
In [4]:
Am2 = MadArray(Am)
print('{} - {}'.format(str(Am), repr(Am)))
print('{} - {}'.format(str(Am2), repr(Am2)))
MadArray, dtype=float64, 10 missing entries (41.7%)
[[ 0.88852592 x x 0.6555601 0.23662042 x]
[ x x x 0.96558448 0.96066855 x]
[ 0.18174032 0.91104309 0.44158461 x 0.81190651 0.5251598 ]
[ 0.75796825 0.74749213 x 0.75607608 x 0.22696674]] - <MadArray at 0x7f4693c25198>
MadArray, dtype=float64, 10 missing entries (41.7%)
[[ 0.88852592 x x 0.6555601 0.23662042 x]
[ x x x 0.96558448 0.96066855 x]
[ 0.18174032 0.91104309 0.44158461 x 0.81190651 0.5251598 ]
[ 0.75796825 0.74749213 x 0.75607608 x 0.22696674]] - <MadArray at 0x7f4684f66048>
A different mask can also be used:
In [5]:
mask2 = np.random.random(data.shape) < 0.9
Am3 = MadArray(Am, mask2)
print('{} - {}'.format(str(Am), repr(Am)))
print('{} - {}'.format(str(Am3), repr(Am3)))
MadArray, dtype=float64, 10 missing entries (41.7%)
[[ 0.88852592 x x 0.6555601 0.23662042 x]
[ x x x 0.96558448 0.96066855 x]
[ 0.18174032 0.91104309 0.44158461 x 0.81190651 0.5251598 ]
[ 0.75796825 0.74749213 x 0.75607608 x 0.22696674]] - <MadArray at 0x7f4693c25198>
MadArray, dtype=float64, 22 missing entries (91.7%)
[[ x x x x x 0.31518837]
[ x x x x x x]
[ x x x x x x]
[ x x x x 0.86733387 x]] - <MadArray at 0x7f4684f660b8>
For complex data:
In [6]:
complex_data = np.random.rand(4, 6) + 1j * np.random.rand(4, 6)
# separate masks for the magnitude and the phase (True = missing)
mask_mag = np.random.random(complex_data.shape) < 0.5
mask_pha = np.random.random(complex_data.shape) < 0.5
A_cpx1 = MadArray(complex_data, mask)
A_cpx2 = MadArray(complex_data, mask_magnitude=mask_mag, mask_phase=mask_pha)
print('{} - {}'.format(str(A_cpx1), repr(A_cpx1)))
print('{} - {}'.format(str(A_cpx2), repr(A_cpx2)))
print('Magnitude mask', mask_mag)
print('Phase mask', mask_pha)
MadArray, dtype=complex128, 10 missing entries (41.7%)
[[ 0.92320960+0.88661785j x x
0.38311296+0.70678738j 0.96491806+0.30121117j x ]
[ x x x
0.50314827+0.11344015j 0.48763414+0.61569745j x ]
[ 0.24238896+0.51915684j 0.87787040+0.03319473j 0.77742883+0.90048085j
x 0.95141413+0.19915758j 0.44219935+0.95957203j]
[ 0.86931370+0.97679168j 0.02726614+0.1547769j x
0.53347458+0.92113509j x 0.01187069+0.25350673j]] - <MadArray at 0x7f4684f66198>
MadArray, dtype=complex128, 11 missing magnitudes (45.8%) and 11 missing phases (45.8%), including 7 missing magnitudes and phases jointly (29.2%)
[[ x x 0.85485470+0.92124838j
x x x ]
[ x 0.07092382+0.22792513j x
x 0.48763414+0.61569745j x ]
[ x x x
x 0.95141413+0.19915758j 0.44219935+0.95957203j]
[ 0.86931370+0.97679168j 0.02726614+0.1547769j 0.06635526+0.6628895j
x x 0.01187069+0.25350673j]] - <MadArray at 0x7f4684f66278>
Magnitude mask [[False False False True False True]
[ True False False True False True]
[ True True True True False False]
[False False False True True False]]
Phase mask [[ True True False True True True]
[ True False True False False True]
[False True True False False False]
[False False False True False False]]
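The summary line of A_cpx2 can be checked against the two masks: a magnitude (or phase) is missing where its mask is True, and "jointly" refers to entries where both masks are True. The plain-NumPy sketch below copies the printed masks and recomputes the counts. In this output, a value is displayed only where both the magnitude and the phase are known, which happens for 9 entries.

```python
import numpy as np

# Masks printed above (True = missing)
mask_mag = np.array([
    [False, False, False, True, False, True],
    [True, False, False, True, False, True],
    [True, True, True, True, False, False],
    [False, False, False, True, True, False],
])
mask_pha = np.array([
    [True, True, False, True, True, True],
    [True, False, True, False, False, True],
    [False, True, True, False, False, False],
    [False, False, False, True, False, False],
])

n = mask_mag.size
n_mag = np.count_nonzero(mask_mag)              # missing magnitudes
n_pha = np.count_nonzero(mask_pha)              # missing phases
n_both = np.count_nonzero(mask_mag & mask_pha)  # missing magnitudes and phases jointly
# entries whose magnitude and phase are both known
n_fully_known = np.count_nonzero(~mask_mag & ~mask_pha)

print(n_mag, n_pha, n_both, n_fully_known)  # 11 11 7 9
print('{:.1f}% {:.1f}% {:.1f}%'.format(100 * n_mag / n, 100 * n_pha / n, 100 * n_both / n))
# prints: 45.8% 45.8% 29.2%
```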
Methods and properties
A MadArray has methods and properties that give information about the masking.
In [7]:
# mask of non-missing elements
print(Am.get_known_mask())
[[ True False False True True False]
[False False False True True False]
[ True True True False True True]
[ True True False True False True]]
In [8]:
# mask of missing elements
print(Am.get_unknown_mask())
[[False True True False False True]
[ True True True False False True]
[False False False True False False]
[False False True False True False]]
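In this example, get_unknown_mask() returns the mask given at initialisation, and get_known_mask() returns its element-wise negation. The NumPy sketch below checks this relationship on the printed arrays; it only reflects the outputs shown here, not a documented guarantee of madarrays.

```python
import numpy as np

# Arrays copied from the outputs above
mask = np.array([
    [False, True, True, False, False, True],
    [True, True, True, False, False, True],
    [False, False, False, True, False, False],
    [False, False, True, False, True, False],
])
known_printed = np.array([
    [True, False, False, True, True, False],
    [False, False, False, True, True, False],
    [True, True, True, False, True, True],
    [True, True, False, True, False, True],
])
unknown_printed = np.array([
    [False, True, True, False, False, True],
    [True, True, True, False, False, True],
    [False, False, False, True, False, False],
    [False, False, True, False, True, False],
])

# The unknown mask matches the mask given at initialisation...
same_as_input = np.array_equal(unknown_printed, mask)
# ...and the known mask is its element-wise negation
is_complement = np.array_equal(known_printed, ~unknown_printed)
print(same_as_input, is_complement)  # True True
```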
In [9]:
# is_masked is a method and must be called
print('Is masked: {}'.format(Am.is_masked()))
print('Ratio missing data: {}'.format(Am.ratio_missing_data))
Is masked: True
Ratio missing data: 0.4166666666666667
Indexing
There are two different and incompatible ways to index a MadArray. By default (masked_indexing=False), indexing works as for a NumPy ndarray: both the data matrix and the mask are indexed, and a MadArray with the shape defined by the indices is returned:
In [10]:
print(A[0:3, 1:3])
print(Am[0:3, 1:3])
MadArray, dtype=float64, 0 missing entries (0.0%)
[[ 0.01194842 0.61658832]
[ 0.07847182 0.9321147 ]
[ 0.91104309 0.44158461]]
MadArray, dtype=float64, 4 missing entries (66.7%)
[[ x x]
[ x x]
[ 0.91104309 0.44158461]]
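Conceptually, the default indexing amounts to slicing the data and the mask with the same index. The NumPy sketch below reproduces the mask part of the example above (4 missing entries out of 6); the data values are placeholders, and this illustrates the observed behaviour rather than the madarrays implementation.

```python
import numpy as np

# Placeholder data and the mask from In [3] (True = missing)
data = np.arange(24, dtype=float).reshape(4, 6)
mask = np.array([
    [False, True, True, False, False, True],
    [True, True, True, False, False, True],
    [False, False, False, True, False, False],
    [False, False, True, False, True, False],
])

# Default indexing: the same index is applied to the data and to the mask
sub_data = data[0:3, 1:3]
sub_mask = mask[0:3, 1:3]
print(sub_data.shape)                   # (3, 2)
print(int(np.count_nonzero(sub_mask)))  # 4
```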
With the other way (masked_indexing=True), a MadArray with unchanged shape is returned, in which non-indexed entries are considered masked.
In [11]:
Am4 = MadArray(data, mask, masked_indexing=True)
print(Am4[0:3, 1:3])
MadArray, dtype=float64, 22 missing entries (91.7%)
[[ x x x x x x]
[ x x x x x x]
[ x 0.91104309 0.44158461 x x x]
[ x x x x x x]]
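The resulting mask can be read as the original mask combined (logical OR) with the complement of the selected region. The sketch below checks this reading against the output above (22 missing entries, only positions (2, 1) and (2, 2) known); it is a NumPy illustration under that assumption, not the madarrays code.

```python
import numpy as np

# Mask from In [3] (True = missing)
mask = np.array([
    [False, True, True, False, False, True],
    [True, True, True, False, False, True],
    [False, False, False, True, False, False],
    [False, False, True, False, True, False],
])

# Boolean array marking the entries selected by the index [0:3, 1:3]
selected = np.zeros(mask.shape, dtype=bool)
selected[0:3, 1:3] = True

# Shape is kept; entries outside the selection become missing
masked_index_mask = mask | ~selected
print(masked_index_mask.shape)                   # (4, 6)
print(int(np.count_nonzero(masked_index_mask)))  # 22
print(np.argwhere(~masked_index_mask).tolist())  # [[2, 1], [2, 2]]
```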
This latter approach is designed to be used with scikit-learn procedures.
Numerical operations
NumPy functions can be applied to a MadArray, but they do not take the mask into account: below, A and Am share the same data and give the same mean.
In [12]:
print(np.mean(A))
print(np.mean(Am))
0.5499028203648367
0.5499028203648367
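When a statistic over the known entries only is needed, one option is to compute it explicitly from the data and the mask. The sketch below uses plain NumPy on small illustrative arrays (not the data above), either with boolean indexing or with numpy.ma, whose mask convention (True = invalid entry) matches the convention used for MadArray masks in this tutorial.

```python
import numpy as np

# Illustrative data and mask (True = missing)
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
mask = np.array([[False, False, False],
                 [False, True, True]])

# Plain NumPy ignores the mask: mean over all 6 entries
print(np.mean(data))  # 3.5

# Mean over the non-missing entries only, with boolean indexing...
print(np.mean(data[~mask]))  # 2.5

# ...or with numpy.ma, which skips masked entries in reductions
print(np.ma.masked_array(data, mask=mask).mean())  # 2.5
```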