{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "# Tutorial on how to use `MadArray` objects\n", "\n", "A `MadArray` is a NumPy array with missing elements. It is built from three kinds of parameters:\n", "\n", "* **data** as an array of entries, either *int*, *float* or *complex*;\n", "* a **mask** indicating the missing entries;\n", "* **options** to define the behaviour of the object.\n", "\n", "A basic initialisation requires only a data matrix. Without a mask, all elements are considered non-missing." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "from madarrays import MadArray\n", "\n", "# initialisation without mask\n", "data = np.random.rand(4, 6)\n", "\n", "A = MadArray(data)\n", "print(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Masking\n", "\n", "How the data are masked depends on the type of the entries:\n", "\n", "* if the data entries are not *complex* (e.g., *int* or *float*), the `mask` argument must be a boolean array with the same shape as the data array, each entry indicating whether the corresponding entry in the data array is missing;\n", "* if the data entries are *complex*, the masking can be done as above, or by giving two boolean arrays `mask_magnitude` and `mask_phase` with the same shape as the data array, whose entries indicate respectively whether the magnitude and the phase of the corresponding entry are missing." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# initialisation with a mask\n", "mask = np.random.random(data.shape) < 0.5\n", "\n", "Am = MadArray(data, mask)\n", "print(mask)\n", "print(Am)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "A *MadArray* can also be defined from another *MadArray*, for example to copy the object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "Am2 = MadArray(Am)\n", "print('{} - {}'.format(str(Am), repr(Am)))\n", "print('{} - {}'.format(str(Am2), repr(Am2)))" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "A different mask can also be used:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "mask2 = np.random.random(data.shape) < 0.9\n", "Am3 = MadArray(Am, mask2)\n", "print('{} - {}'.format(str(Am), repr(Am)))\n", "print('{} - {}'.format(str(Am3), repr(Am3)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For complex data, either a single mask or separate magnitude and phase masks can be given:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import madarrays\n", "complex_data = np.random.rand(4, 6) + 1j * np.random.rand(4, 6)\n", "mask_mag = np.random.random(complex_data.shape) < 0.5\n", "mask_pha = np.random.random(complex_data.shape) < 0.5\n", "A_cpx1 = MadArray(complex_data, mask)\n", "A_cpx2 = MadArray(complex_data, mask_magnitude=mask_mag, mask_phase=mask_pha)\n", "print('{} - {}'.format(str(A_cpx1), repr(A_cpx1)))\n", "print('{} - {}'.format(str(A_cpx2), 
repr(A_cpx2)))\n", "print('Magnitude mask', mask_mag)\n", "print('Phase mask', mask_pha)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "## Methods and properties\n", "\n", "A *MadArray* provides methods and properties that describe its masking." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# mask of non-missing elements\n", "print(Am.get_known_mask())" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# mask of missing elements\n", "print(Am.get_unknown_mask())" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "print('Is masked: {}'.format(Am.is_masked))\n", "print('Ratio of missing data: {}'.format(Am.ratio_missing_data))" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "## Indexing\n", "\n", "There are two different and incompatible ways to index a *MadArray*. 
By default (`masked_indexing=False`), indexing behaves like that of a NumPy *ndarray*: both the data matrix and the mask are indexed, and a *MadArray* with the shape defined by the indices is returned:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "print(A[0:3, 1:3])\n", "print(Am[0:3, 1:3])" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "In the other mode (`masked_indexing=True`), a *MadArray* with an unchanged shape is returned, in which the non-indexed entries are treated as masked:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "Am4 = MadArray(data, mask, masked_indexing=True)\n", "print(Am4[0:3, 1:3])" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "This latter mode is suited to use with *scikit-learn* procedures." 
] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "## Numerical operations\n", "\n", "NumPy functions can be applied to a *MadArray*, but they do **not** take the mask into account:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "print(np.mean(A))\n", "print(np.mean(Am))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" }, "name": "data_structures.ipynb" }, "nbformat": 4, "nbformat_minor": 1 }