In [None]:
%pylab inline
import numpy as np  # explicit import so the `np` calls below do not rely on %pylab

# Tutorial on how to use `MadArray` objects

A `MadArray` is a numpy array with missing elements. It is built from three types of parameters:

* **data** as an array of entries, either *int*, *float* or *complex*;
* a **mask** indicating the missing entries;
* **options** to define the behaviour of the object.

A basic initialisation requires only a data matrix. Without a mask, all elements are considered non-missing.

In [None]:
from madarrays import MadArray

# initialisation without mask
data = np.random.rand(4, 6)

A = MadArray(data)
print(A)

## Masking

The masking of data differs according to the type of entries:

* if the data entries are not *complex* (e.g., *int* or *float*), the argument `mask` must be a boolean array with the same shape as the data array, each entry indicating whether the corresponding entry in the data array is missing;
* if the data entries are *complex*, the masking can be done as above, or by giving two boolean arrays `mask_magnitude` and `mask_phase`, each with the same shape as the data array, whose entries indicate respectively whether the magnitude and the phase of the corresponding data entry are missing.
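To make the two masking schemes above concrete, here is a minimal sketch in plain numpy. It does not use `madarrays` and is not the library's implementation. It assumes that `True` marks a missing entry, and the names `real_mask`, `fully_known`, `phase_only_missing` and `magnitude_only_missing` are hypothetical, introduced only for illustration.

```python
import numpy as np

# Assumption for this sketch: True marks a missing entry.

# Real-valued data: a single boolean mask with the same shape as the data.
real_data = np.array([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]])
real_mask = np.array([[False, True, False],
                      [True, False, False]])
assert real_mask.shape == real_data.shape and real_mask.dtype == bool

# Complex data: magnitude and phase can be missing independently.
complex_data = np.array([[1 + 1j, 2j, -3.0],
                         [4 - 4j, -5j, 6.0]])
mask_magnitude = np.array([[True, False, False],
                           [False, False, True]])
mask_phase = np.array([[True, True, False],
                       [False, False, False]])

# An entry is fully known only if neither its magnitude nor its phase is missing.
fully_known = ~mask_magnitude & ~mask_phase
# Entries for which only one of the two components is missing.
phase_only_missing = ~mask_magnitude & mask_phase
magnitude_only_missing = mask_magnitude & ~mask_phase

print(fully_known)
# Magnitudes that are available, regardless of the phase mask.
print(np.abs(complex_data)[~mask_magnitude])
```

With separate masks, an entry can keep a usable magnitude while its phase is missing (or the reverse), which a single boolean mask cannot express.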

In [None]:
# initialisation with a mask
mask = np.random.random(data.shape) < 0.5

Am = MadArray(data, mask)
print(mask)
print(Am)

A *MadArray* can also be defined from another *MadArray*, for example to copy the object:

In [None]:
Am2 = MadArray(Am)
print('{} - {}'.format(str(Am), repr(Am)))
print('{} - {}'.format(str(Am2), repr(Am2)))

A different mask can also be used:

In [None]:
mask2 = np.random.random(data.shape) < 0.9
Am3 = MadArray(Am, mask2)
print('{} - {}'.format(str(Am), repr(Am)))
print('{} - {}'.format(str(Am3), repr(Am3)))

For complex data:

In [None]:
complex_data = np.random.rand(4, 6) + 1j * np.random.rand(4, 6)
# separate masks for the magnitude and the phase of each entry
mask_mag = np.random.random(complex_data.shape) < 0.5
mask_pha = np.random.random(complex_data.shape) < 0.5
A_cpx1 = MadArray(complex_data, mask)
A_cpx2 = MadArray(complex_data, mask_magnitude=mask_mag, mask_phase=mask_pha)
print('{} - {}'.format(str(A_cpx1), repr(A_cpx1)))
print('{} - {}'.format(str(A_cpx2), repr(A_cpx2)))
print('Magnitude mask', mask_mag)
print('Phase mask', mask_pha)

## Methods and properties

A *MadArray* has methods and properties that give information about the masking.

In [None]:
# mask of non-missing elements
print(Am.get_known_mask())

In [None]:
# mask of missing elements
print(Am.get_unknown_mask())

In [None]:
print('Is masked: {}'.format(Am.is_masked))
print('Ratio missing data: {}'.format(Am.ratio_missing_data))
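The quantities printed above can be related to the underlying boolean mask. The following plain-numpy sketch shows one plausible way to derive them. It assumes that `True` marks a missing entry and is not taken from the `madarrays` source; the variable names are hypothetical.

```python
import numpy as np

# Assumption for this sketch: True marks a missing entry.
mask = np.array([[True, False, False, True],
                 [False, False, True, False]])

known_mask = ~mask                   # True where the entry is available
unknown_mask = mask.copy()           # True where the entry is missing
is_masked = bool(mask.any())         # at least one entry is missing
ratio_missing = float(mask.mean())   # fraction of missing entries (3 out of 8)

print(known_mask)
print('Is masked: {}'.format(is_masked))
print('Ratio missing data: {}'.format(ratio_missing))
```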

## Indexing

There are two different and incompatible ways to index a *MadArray*. By default (`masked_indexing=False`), indexing behaves like that of a numpy *ndarray*: both the data matrix and the mask are indexed, and a *MadArray* with the shape defined by the indices is returned:

In [None]:
print(A[0:3, 1:3])
print(Am[0:3, 1:3])

In the other mode (`masked_indexing=True`), a *MadArray* with the original shape is returned, in which the non-indexed entries are considered masked.

In [None]:
Am4 = MadArray(data, mask, masked_indexing=True)
print(Am4[0:3, 1:3])

This latter approach is suited to use with *scikit-learn* procedures.
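The difference between the two indexing modes can be emulated on a plain data array and a boolean mask. The sketch below only illustrates the behaviour described above and is not the `madarrays` implementation. It assumes that `True` marks a missing entry, and `selected` and `full_mask` are hypothetical names.

```python
import numpy as np

data = np.arange(24, dtype=float).reshape(4, 6)
mask = np.zeros(data.shape, dtype=bool)
mask[1, 2] = True  # one entry is already missing (assumption: True = missing)

index = (slice(0, 3), slice(1, 3))  # same selection as [0:3, 1:3]

# Standard indexing: data and mask are both sliced, so the shape shrinks.
sub_data, sub_mask = data[index], mask[index]
print(sub_data.shape)  # (3, 2)

# Masked-indexing style: the shape is kept, and every entry outside the
# selection is marked as missing in addition to the entries already masked.
selected = np.zeros(data.shape, dtype=bool)
selected[index] = True
full_mask = mask | ~selected
print(full_mask.shape)  # (4, 6)
print(full_mask)
```

Keeping the shape fixed means that every subset of the data has the same dimensions, which is convenient when a procedure expects arrays of constant shape.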

## Numerical operations

Numpy functions can be applied to a *MadArray*, but they do **not** take the mask into account.

In [None]:
print(np.mean(A))
print(np.mean(Am))
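Because the mask is ignored, a statistic restricted to the known entries has to be computed explicitly. The following plain-numpy sketch shows two common ways to do so on a data array and a boolean mask. It assumes that `True` marks a missing entry and does not rely on any `madarrays` call.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
mask = np.array([[False, True, False],
                 [True, False, False]])  # assumption: True = missing

# Mean over all entries, ignoring the mask.
mean_all = data.mean()

# Mean over the known entries only, using boolean indexing.
mean_known = data[~mask].mean()

# Equivalent result: replace missing entries by NaN and use a NaN-aware reduction.
mean_nan = np.nanmean(np.where(mask, np.nan, data))

print(mean_all, mean_known, mean_nan)
```

Here the mean over all entries is 3.5, while the mean restricted to the four known entries (1, 3, 5 and 6) is 3.75.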