{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "ein.tags": [
     "worksheet-0"
    ],
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "# Tutorial for package `yafe`\n",
    "\n",
    "This tutorial shows the basics for:\n",
    "\n",
    "* designing an experiment by specifying some data, a problem, a solver, performance measures and related parameters;\n",
    "* running an experiment, i.e. running all related tasks;\n",
    "* collecting and analyzing the results of an experiments;\n",
    "* updating an experiment by adding tasks and running them;\n",
    "* dealing with multiple instances of an experiment, e.g. to compare solvers;\n",
    "* understanding `yafe` in more details for debugging or developing more in depth: looking at one specific task, using functions instead of classes, understanding `yafe`'s internal mechanisms.\n",
    "\n",
    "In order to illustrate how to use `yafe`, a simple experiment is implemented: from signals synthesized from a sinusoidal model with several possible frequencies, generate denoising problem by adding Gaussian noise with several signal-to-noise ratio (SNR) levels, solve each problem using a low-pass filter with a filter length parameter and compute the performance in terms of signal-to-distortion ratio (SDR)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "autoscroll": "json-false",
    "collapsed": true,
    "ein.tags": [
     "worksheet-0"
    ],
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "import yafe\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "try:\n",
    "    # The xarray package is optional, it may enhance how to handle results\n",
    "    import xarray\n",
    "except:\n",
    "    xarray = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import tempfile\n",
    "# For this tutorial, we will store the data of our experiments in a temporary directory\n",
    "temp_data_path = tempfile.mkdtemp(prefix='yafe_')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Design an experiment\n",
    "An experiment is based on:\n",
    "\n",
    "* a workflow composed of four blocks: data access, problem generation, solver, performance measures\n",
    "* a set of parameters for each block, whom cartesian product will define the set of all tasks\n",
    "\n",
    "Designing an experiment consists in defining each of those blocks and the set of related parameters.\n",
    "\n",
    "Let us examine a simple signal denoising example."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define access to data\n",
    "Data access should be performed by a *function* that ouputs the data in a dictionary, depending on some parameters passed as arguments.\n",
    "Data parameters must be given in a dictionary whose keys match the parameter names and whose values are the ranges of each parameter, given as lists or 1D ndarrays.\n",
    "\n",
    "In this example, the generated data is a sinusoid with two parameters, length and frequency. The length takes only one value while the frequency ranges from 0.01 to 0.1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def get_sine_data(f0, signal_len=1000):\n",
    "    return {'signal': np.sin(2*np.pi*f0*np.arange(signal_len))}\n",
    "data_params = {'f0': np.arange(0.01, 0.1, 0.01), 'signal_len': [1000]}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define problem generation (using classes)\n",
    "A simple way to define problem generation is to create a class and to follow four rules:\n",
    "\n",
    "* the inputs of the `__init__` method are the parameters of the problem\n",
    "* the parameters of the `__call__` method match the keys of the dictionary obtained from the data access function ``get_data``\n",
    "* the output of the `__call__` method is a dictionary containing the problem data for the solver\n",
    "\n",
    "In this example, noise is added to the input `signal`, with a signal-to-noise ratio given as a problem parameter. An optional `__str__ ` is defined here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class SimpleDenoisingProblem:\n",
    "    def __init__(self, snr_db):\n",
    "        self.snr_db = snr_db\n",
    "\n",
    "    def __call__(self, signal):\n",
    "        random_state = np.random.RandomState(0)\n",
    "        noise = random_state.randn(*signal.shape)\n",
    "        observation = signal + 10 ** (-self.snr_db / 20) * noise / np.linalg.norm(noise) * np.linalg.norm(signal)\n",
    "        problem_data = {'observation': observation}\n",
    "        solution_data = {'signal': signal}\n",
    "        return (problem_data, solution_data)\n",
    "    \n",
    "    def __str__(self):\n",
    "        return 'SimpleDenoisingProblem(snr_db={})'.format(self.snr_db)\n",
    "    \n",
    "problem_params = {'snr_db': [-10, 0, 30]}"
   ]
  },
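  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sanity check of the noise scaling used in `SimpleDenoisingProblem.__call__`, the minimal sketch below recomputes the SNR of an observation from its definition, `20 * log10(||signal|| / ||noise||)`. It is self-contained and does not use `yafe`: the helper `check_snr_scaling`, its default values and the seed are hypothetical choices made for illustration only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def check_snr_scaling(snr_db, signal_len=1000, f0=0.05, seed=0):\n",
    "    # Hypothetical helper: build a noisy observation and measure its actual SNR\n",
    "    signal = np.sin(2 * np.pi * f0 * np.arange(signal_len))\n",
    "    noise = np.random.RandomState(seed).randn(signal_len)\n",
    "    # Same scaling as in SimpleDenoisingProblem.__call__\n",
    "    scaled_noise = 10 ** (-snr_db / 20) * noise / np.linalg.norm(noise) * np.linalg.norm(signal)\n",
    "    observation = signal + scaled_noise\n",
    "    # SNR measured from the signal and the noise actually added\n",
    "    return 20 * np.log10(np.linalg.norm(signal) / np.linalg.norm(observation - signal))\n",
    "\n",
    "for snr_db in [-10, 0, 30]:\n",
    "    print('Requested SNR: {} dB, measured SNR: {:.2f} dB'.format(snr_db, check_snr_scaling(snr_db)))"
   ]
  },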
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define solvers (using classes)\n",
    "Solvers can be defined in a similar way as problems, by creating a class and by following four rules:\n",
    "\n",
    "* the parameters of the solver are passed to the ``__init__`` method\n",
    "* the parameters of the ``__call__`` method match the keys of the dictionary obtained from the problem generation method ``__call__``\n",
    "* the output of the ``__call__`` method is a dictionary containing the solution estimated by the solver.\n",
    "\n",
    "In this example, the problem is solved by low-pass filtering the noisy observation, the filter length being the unique solver parameter. An optional ``__str__`` is defined here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class SmoothingSolver:\n",
    "    def __init__(self, filter_len):\n",
    "        self.filter_len = filter_len\n",
    "        \n",
    "    def __call__(self, observation):\n",
    "        smoothing_filter = np.hamming(self.filter_len)\n",
    "        smoothing_filter /= np.sum(smoothing_filter)\n",
    "        return {'reconstruction': np.convolve(observation, smoothing_filter, mode='same')}\n",
    "        \n",
    "    def __str__(self):\n",
    "        return 'SmoothingSolver(filter_len={})'.format(self.filter_len)\n",
    "\n",
    "solver_params = {'filter_len': 2**np.arange(6, step=2)}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that one may also define problem and solvers using functions instead of classes, as described in section *Alternate way using functions*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define measures\n",
    "Several performance measures may be calculated from the estimated solution, using other data like the original data, the problem data, or parameters of the data, problem or solver.\n",
    "\n",
    "These performance measures must be computed within a *function* as follows:\n",
    "\n",
    "* its arguments should be dictionaries ``source_data``, ``problem_data``, ``solution_data`` and ``solved_data``, as returned by the data access function, the ``__call__`` methods of problem generation class and the ``__call__`` methods of solver class, respectively; an additional argument ``task`` is a dictionary that contains the data, problem and solver parameters;\n",
    "* its output is a dictionary whose keys and values are the names and values of the various performance measures.\n",
    "\n",
    "In this example, the signal-to-distortion ratio, the euclidian distance and the infinite distance are computed between the estimated solution and the noiseless reference signal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def measure(solution_data, solved_data, task_params=None, source_data=None, problem_data=None):\n",
    "    euclidian_distance = np.linalg.norm(solution_data['signal']-solved_data['reconstruction'])\n",
    "    sdr = 20 * np.log10(np.linalg.norm(solution_data['signal']) / euclidian_distance)\n",
    "    inf_distance = np.linalg.norm(solution_data['signal']-solved_data['reconstruction'], ord=np.inf)\n",
    "    return {'sdr': sdr, 'euclidian_distance': euclidian_distance, 'inf_distance': inf_distance}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the experiment\n",
    "All the components being defined, one can simply create an experiment as an instance of the `Experiment` class, by passing a name and each component to the constructor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp = yafe.Experiment(name='My first experiment',\n",
    "                               get_data=get_sine_data,\n",
    "                               get_problem=SimpleDenoisingProblem,\n",
    "                               get_solver=SmoothingSolver,\n",
    "                               measure=measure,\n",
    "                               force_reset=True,\n",
    "                               data_path=temp_data_path,\n",
    "                               log_to_file=False,\n",
    "                               log_to_console=False)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add all parameters as tasks\n",
    "Then all the parameters' ranges are passed to the `add_tasks` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp.add_tasks(data_params=data_params, problem_params=problem_params, solver_params=solver_params)\n",
    "print(my_first_exp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run an experiment\n",
    "Running an experiment consists in:\n",
    "\n",
    "* generating tasks from the experiment parameters\n",
    "* executing all pending tasks\n",
    "* collecting all the task results\n",
    "\n",
    "At this point, no task appear:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp.display_status()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One must call the `generate_tasks` method, which will set up the internal mechanisms to create tasks, the set of tasks being the cartesian product between all parameter ranges in the experiment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp.generate_tasks()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp.display_status()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One can see the total number of tasks, which equals the product between the size of all parameter ranges in the experiment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "print(np.product([np.array(v).size\n",
    "                  for params in (data_params, problem_params, solver_params) \n",
    "                  for v in params.values()]))"
   ]
  },
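  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The count above can be illustrated by enumerating the combinations explicitly. The sketch below builds the Cartesian product of a few small parameter ranges with `itertools.product`. It only illustrates the principle: the way `yafe` stores and orders tasks internally may differ, and the helper `enumerate_param_combinations` and the toy ranges are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import itertools\n",
    "\n",
    "def enumerate_param_combinations(**param_ranges):\n",
    "    # Hypothetical helper: yield one dict per element of the Cartesian product\n",
    "    names = list(param_ranges)\n",
    "    for values in itertools.product(*(param_ranges[name] for name in names)):\n",
    "        yield dict(zip(names, values))\n",
    "\n",
    "# Toy ranges, not those of the experiment above\n",
    "toy_combinations = list(enumerate_param_combinations(f0=[0.01, 0.02],\n",
    "                                                     snr_db=[0, 30],\n",
    "                                                     filter_len=[1, 4, 16]))\n",
    "print(len(toy_combinations))  # 2 * 2 * 3 = 12 combinations\n",
    "print(toy_combinations[0])"
   ]
  },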
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One may run a specific task from its id:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp.run_task_by_id(idt=5)\n",
    "my_first_exp.display_status()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This may usefull to check whether a task is running successfully, to run a specific task within a job on a computer cluster."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One may also run all pending tasks:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp.launch_experiment()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp.display_status()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Results from all tasks needs to be collected and gathered in a unique structure before being analyzed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp.collect_results()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploit results\n",
    "Results are stored in a hypercube whose axes are the parameters of the experiment, with an additional axis containing the performance measures. After collecting results, one may load them in an ndarray together with the labels and values of the related axes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "results, axes_labels, axes_values = my_first_exp.load_results()\n",
    "print(results.shape, results.dtype)\n",
    "print(axes_labels)\n",
    "print(axes_values)\n",
    "# Let us display one value:\n",
    "print('Euclidian distance value for paramenters f0=0.04, signal length=1000, SNR=30 and filter length=1:',\n",
    "      results[3, 0, 2, 0, 1])"
   ]
  },
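  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hard-coded indices such as `results[3, 0, 2, 0, 1]` are easy to get wrong. One possible safeguard, sketched below on a synthetic array, is to look up each index from the values of the corresponding axis. The helper `index_of` and the toy arrays are hypothetical and only rely on `numpy`; the exact types returned by `load_results` are not assumed here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def index_of(axis_values, value):\n",
    "    # Hypothetical helper: index of `value` along an axis (tolerant comparison for floats)\n",
    "    axis_values = np.asarray(axis_values)\n",
    "    if axis_values.dtype.kind == 'f':\n",
    "        matches = np.flatnonzero(np.isclose(axis_values, value))\n",
    "    else:\n",
    "        matches = np.flatnonzero(axis_values == value)\n",
    "    if matches.size != 1:\n",
    "        raise ValueError('{!r} not found exactly once in {}'.format(value, axis_values))\n",
    "    return matches[0]\n",
    "\n",
    "# Synthetic stand-ins for the outputs of load_results (toy values)\n",
    "toy_axes_values = [np.array([0.01, 0.02, 0.03]),\n",
    "                   np.array([-10, 0, 30]),\n",
    "                   np.array(['sdr', 'euclidian_distance'])]\n",
    "toy_results = np.arange(3 * 3 * 2).reshape(3, 3, 2)\n",
    "\n",
    "i_f0 = index_of(toy_axes_values[0], 0.02)\n",
    "i_snr = index_of(toy_axes_values[1], 30)\n",
    "i_meas = index_of(toy_axes_values[2], 'euclidian_distance')\n",
    "print(toy_results[i_f0, i_snr, i_meas])"
   ]
  },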
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One may then analyze and display the results.\n",
    "\n",
    "Here, performance measures are average along the data axes, and the resulting averaged SDR measure is displayed as a function of the problem SNR for each filter length:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Compute average results w.r.t. all data (axis 0 and 1)\n",
    "mean_results = np.mean(results, axis=(0, 1))\n",
    "\n",
    "# For each solver parameter (axis 3), plot SDR (axis 4, first element) as a function of the problem SNR (axis 2)\n",
    "for i_solver_param, solver_param in enumerate(axes_values[3]):\n",
    "    plt.plot(axes_values[2],\n",
    "             mean_results[:, i_solver_param, 0], \n",
    "             label='{}: {}'.format('solver_filter_len', solver_param))\n",
    "plt.xlabel('Problem SNR (dB)')\n",
    "plt.ylabel('SDR (dB)')\n",
    "plt.legend()\n",
    "pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One can note that handling such multidimensional arrays and their multiple axes is likely to be confusing and is prone to dissimulate errors. A more convenient and reliable way is to handle an `DataArray` object, from the optional package `xarray`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "if xarray:\n",
    "    xresults = my_first_exp.load_results(array_type='xarray')\n",
    "    print(xresults.shape)\n",
    "    print(xresults.coords)\n",
    "    # Let us display one value:\n",
    "    print('Euclidian distance value for paramenters f0=0.04, signal length=1000, SNR=30 and filter length=1:')\n",
    "    print(xresults.sel(data_f0=0.04, \n",
    "                       data_signal_len=1000, \n",
    "                       problem_snr_db=30, \n",
    "                       solver_filter_len=1, \n",
    "                       measure='euclidian_distance'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "if xarray:\n",
    "    # Compute average results w.r.t. all data\n",
    "    mean_results = xresults.mean(['data_f0', 'data_signal_len'])\n",
    "    \n",
    "    # For each solver parameter, plot SDR as a function of the problem SNR\n",
    "    for solver_param in mean_results['solver_filter_len'].values:\n",
    "        plt.plot(mean_results['problem_snr_db'].values,\n",
    "                 mean_results.sel(solver_filter_len=solver_param, measure='sdr'), \n",
    "                 label='{}: {}'.format('solver_filter_len', solver_param))\n",
    "        plt.legend()\n",
    "        plt.xlabel('Problem SNR (dB)')\n",
    "        plt.ylabel('SDR (dB)')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is an extended plot of the results, which is provided equivalently in the `numpy` and `xarray` result format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def plot_results(results, axes_labels, axes_values):\n",
    "    fig = plt.gcf()\n",
    "    axes = fig.subplots(1, results.shape[4])\n",
    "    mean_results = np.mean(results, axis=(0, 1))  # Average w.r.t. input data\n",
    "    for i_meas in range(results.shape[4]):\n",
    "        for i_solver_param, solver_param in enumerate(axes_values[3]):\n",
    "            axes[i_meas].plot(axes_values[2],\n",
    "                              mean_results[:, i_solver_param, i_meas], \n",
    "                              label='{}: {}'.format(axes_labels[3], solver_param))\n",
    "        axes[i_meas].set_xlabel('problem_snr_db')\n",
    "        axes[i_meas].set_title(axes_values[4][i_meas])\n",
    "        axes[i_meas].legend()\n",
    "\n",
    "if xarray:\n",
    "    def plot_xresults(xresults):\n",
    "        fig = plt.gcf()\n",
    "        abscissa = 'problem_snr_db'\n",
    "        k_solver_param = [s for s in xresults.dims if s.startswith('solver_')][0]  # Find solver parameter key\n",
    "        fig = plt.gcf()\n",
    "        axes = fig.subplots(1, xresults['measure'].values.size)\n",
    "        mean_results = xresults.mean(['data_f0', 'data_signal_len'])  # Average w.r.t. input data\n",
    "        for i_meas, k_meas in enumerate(mean_results['measure'].values):\n",
    "            for solver_param in mean_results[k_solver_param].values:\n",
    "                sel_dict = {'measure': k_meas, k_solver_param: solver_param}  # Use a dict to select results\n",
    "                axes[i_meas].plot(mean_results[abscissa].values,\n",
    "                                  mean_results.sel(**sel_dict), \n",
    "                                  label='{}: {}'.format(k_solver_param, solver_param))\n",
    "            axes[i_meas].set_xlabel(abscissa)\n",
    "            axes[i_meas].set_title(k_meas)\n",
    "            axes[i_meas].legend()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(15, 5))\n",
    "plot_results(results, axes_labels, axes_values)\n",
    "\n",
    "if xarray:\n",
    "    plt.figure(figsize=(15, 5))\n",
    "    plot_xresults(xresults)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Add news tasks\n",
    "Adding new task is perfomed by calling `add_tasks` again with additional parameters. It will cause the extension of the cartesian product as if all the parameters had been given together. Tasks that were previously completed will not be executed again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "additional_snr_db = [10, 20]\n",
    "additional_filter_len = 2**np.arange(6)\n",
    "my_first_exp.add_tasks(problem_params={'snr_db': additional_snr_db}, data_params=dict(), \n",
    "                 solver_params={'filter_len': additional_filter_len})\n",
    "my_first_exp.generate_tasks()\n",
    "my_first_exp.display_status()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp.launch_experiment()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp.collect_results()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "results, axes_labels, axes_values = my_first_exp.load_results()\n",
    "plt.figure(figsize=(15, 5))\n",
    "plot_results(results, axes_labels, axes_values)\n",
    "\n",
    "if xarray:\n",
    "    xresults = my_first_exp.load_results(array_type='xarray')\n",
    "    plt.figure(figsize=(15, 5))\n",
    "    plot_xresults(xresults)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use several solvers\n",
    "One may define several solvers to address the same problem and compare various approaches.\n",
    "\n",
    "For instance, let us create a solver that uses median filtering, with the radius of the local filter as parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class MedianSolver():\n",
    "    def __init__(self, radius):\n",
    "        self.radius = radius\n",
    "        \n",
    "    def __call__(self, observation):\n",
    "        reconstruction = np.zeros_like(observation)\n",
    "        for i in range(observation.size):\n",
    "            i_start = max(0, i - self.radius)\n",
    "            i_end = min(observation.size, i + self.radius + 1)\n",
    "            reconstruction[i] = np.median(observation[i_start:i_end])\n",
    "        return {'reconstruction': reconstruction}\n",
    "\n",
    "    def __str__(self):\n",
    "        return 'MedianSolver(radius={})'.format(self.radius)\n",
    "\n",
    "median_solver_params = {'radius': [0, 1, 5, 10]}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One must create a new instance of the Experiment for each solver, using the same parameters, functions and classes for the data, problem and performance measures (creating a new instance is needed since the solvers are not sharing the same parameter space):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "median_solver_exp = yafe.Experiment(name='Median solver experiment',\n",
    "                                    get_data=get_sine_data,\n",
    "                                    get_problem=SimpleDenoisingProblem,\n",
    "                                    get_solver=MedianSolver,\n",
    "                                    measure=measure,\n",
    "                                    force_reset=True,\n",
    "                                    data_path=temp_data_path,\n",
    "                                    log_to_file=False,\n",
    "                                    log_to_console=False)\n",
    "median_solver_exp.add_tasks(data_params=data_params, problem_params=problem_params, solver_params=median_solver_params)\n",
    "median_solver_exp.add_tasks(problem_params={'snr_db': additional_snr_db}, data_params=dict(), solver_params=dict())\n",
    "median_solver_exp.generate_tasks()\n",
    "median_solver_exp.display_status()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "median_solver_exp.launch_experiment()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "median_solver_exp.collect_results()\n",
    "\n",
    "if xarray:\n",
    "    xresults = median_solver_exp.load_results(array_type='xarray')\n",
    "    plt.figure(figsize=(15, 5))\n",
    "    plot_xresults(xresults)\n",
    "else:\n",
    "    results, axes_labels, axes_values = median_solver_exp.load_results()\n",
    "    plt.figure(figsize=(15, 5))\n",
    "    plot_results(results, axes_labels, axes_values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One can then compare the solvers, with results in either `ndarray` or `xarray` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def compare_solvers_and_plot_results(smooth_results, smooth_axes_labels, smooth_axes_values,\n",
    "                                     median_results, median_axes_labels, median_axes_values):\n",
    "    \"\"\" Compare ndarray-format results between two solvers, \n",
    "\n",
    "    The results are assumed to be structured as follows:\n",
    "    * both smooth_results and median_results should have similar structures, \n",
    "      including order of axes labels and axes values\n",
    "    * data parameters are on axes 0 and 1\n",
    "    * problem parameter SNR is on axis 2\n",
    "    * solver parameter is on axis 3\n",
    "    * performance measure is on axis 4\n",
    "    \"\"\"\n",
    "    fig = plt.gcf()\n",
    "    axes = fig.subplots(1, smooth_results.shape[-1])\n",
    "    for results, axes_labels, axes_values, name in [\n",
    "        (smooth_results, smooth_axes_labels, smooth_axes_values, 'smoothing'),\n",
    "        (median_results, median_axes_labels, median_axes_values, 'median')]:\n",
    "        mean_results = np.mean(results, axis=(0, 1))  # Average w.r.t. input data\n",
    "        for i_meas in range(results.shape[-1]):  # One measure per subplot\n",
    "            # Fill an area between min and max values w.r.t. solver parameters\n",
    "            axes[i_meas].fill_between(axes_values[2],\n",
    "                                      np.min(mean_results[:, :, i_meas], axis=1), \n",
    "                                      np.max(mean_results[:, :, i_meas], axis=1), \n",
    "                                      alpha=0.5,\n",
    "                                      label=name)\n",
    "            axes[i_meas].set_xlabel(axes_labels[2])\n",
    "            axes[i_meas].set_title(axes_values[-1][i_meas])\n",
    "            axes[i_meas].legend()\n",
    "\n",
    "if xarray:\n",
    "    def compare_solvers_and_plot_xresults(smooth_xresults, median_xresults):\n",
    "        \"\"\" Compare xarray-format results between two solvers \"\"\"\n",
    "        fig = plt.gcf()\n",
    "        abscissa = 'problem_snr_db'\n",
    "        axes = fig.subplots(1, smooth_xresults['measure'].values.size)\n",
    "        for xresults, name in [(smooth_xresults, 'smoothing'), (median_xresults, 'median')]:\n",
    "            k_solver_param = [s for s in xresults.dims if s.startswith('solver_')][0]  # Find solver parameter key\n",
    "            mean_results = xresults.mean(['data_f0', 'data_signal_len'])  # Average w.r.t. input data\n",
    "            for i_meas, k_meas in enumerate(mean_results['measure'].values):  # One measure per subplot\n",
    "                # Fill an area between min and max values w.r.t. solver parameters\n",
    "                axes[i_meas].fill_between(mean_results[abscissa].values,\n",
    "                                          mean_results.sel(measure=k_meas).min(k_solver_param), \n",
    "                                          mean_results.sel(measure=k_meas).max(k_solver_param), \n",
    "                                          alpha=0.5,\n",
    "                                          label=name)\n",
    "                axes[i_meas].set_xlabel(abscissa)\n",
    "                axes[i_meas].set_title(k_meas)\n",
    "                axes[i_meas].legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that structures getting more complex, one may prefer to handle `xarray` objects for clarity and error-free purposes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "smooth_results, smooth_axes_labels, smooth_axes_values = my_first_exp.load_results()\n",
    "median_results, median_axes_labels, median_axes_values = median_solver_exp.load_results()\n",
    "fig = plt.figure(figsize=(15, 5))\n",
    "compare_solvers_and_plot_results(\n",
    "    smooth_results, smooth_axes_labels, smooth_axes_values,\n",
    "    median_results, median_axes_labels, median_axes_values)\n",
    "\n",
    "if xarray:\n",
    "    smooth_xresults = my_first_exp.load_results(array_type='xarray')\n",
    "    median_xresults = median_solver_exp.load_results(array_type='xarray')\n",
    "    fig = plt.figure(figsize=(15, 5))\n",
    "    compare_solvers_and_plot_xresults(smooth_xresults, median_xresults)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Look at one specific task\n",
    "Let us detail how to handle one specific task."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### From a task id\n",
    "In order to look at a particular task from its id, get all available data using method `get_task_data_by_id`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "idt = 27\n",
    "task_data = my_first_exp.get_task_data_by_id(idt=idt)\n",
    "print(task_data.keys())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One may then recompute easily some part of the process:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Get source data\n",
    "source_data = my_first_exp.get_data(**task_data['task_params']['data_params'])\n",
    "print(task_data['task_params']['data_params'])\n",
    "\n",
    "# Get problem, problem data and that it equals what was computed previously\n",
    "problem = my_first_exp.get_problem(**task_data['task_params']['problem_params'])\n",
    "print(problem)\n",
    "problem_data, solution_data = problem(**source_data)\n",
    "\n",
    "# Get solver, compute solved data\n",
    "solver = my_first_exp.get_solver(**task_data['task_params']['solver_params'])\n",
    "print(solver)\n",
    "solved_data = solver(**problem_data)\n",
    "\n",
    "# Compute performane measures\n",
    "results = my_first_exp.measure(solution_data=solution_data, solved_data=solved_data)\n",
    "print('Performance measures:', results)\n",
    "\n",
    "# Compare all generated data to what was computed previously\n",
    "print('Source data match:', np.all(source_data['signal'] == task_data['source_data']['signal']))\n",
    "print('Problem data match:', np.all(problem_data['observation'] == task_data['problem_data']['observation']))\n",
    "print('Solved data match:', np.all(solved_data['reconstruction'] == task_data['solved_data']['reconstruction']))\n",
    "print('Performance measure match:', \n",
    "      np.all([np.all(results[k_measure] == task_data['result'][k_measure])\n",
    "             for k_measure in results.keys()]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### From task parameters\n",
    "In order to look at a particular task from its parameters, get all available data using method `get_task_data_by_id`, using `data_*`, `problem_* `, `solver_*` to denote parameters of the data, problem and solver providers respectively, replacing `*` by the name of the parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "task_data = my_first_exp.get_task_data_by_params(data_params={'f0': 0.05, 'signal_len': 1000},\n",
    "                                                 problem_params={'snr_db': 0},\n",
    "                                                 solver_params={'filter_len': 4})\n",
    "print('Task ID:', task_data['id_task'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Specifying arbitrary parameter values\n",
    "Here, parameter values are not in the parameter ranges of the experiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_params={'f0': 0.015, 'signal_len': 200}\n",
    "problem_params={'snr_db': 20}\n",
    "solver_params={'filter_len': 9}\n",
    "\n",
    "# Get source data\n",
    "source_data = my_first_exp.get_data(**data_params)\n",
    "\n",
    "# Get problem, problem data and that it equals what was computed previously\n",
    "problem = my_first_exp.get_problem(**problem_params)\n",
    "print(problem)\n",
    "problem_data, solution_data = problem(**source_data)\n",
    "\n",
    "# Get solver, compute solved data\n",
    "solver = my_first_exp.get_solver(**solver_params)\n",
    "print(solver)\n",
    "solved_data = solver(**problem_data)\n",
    "\n",
    "# Compute performane measures\n",
    "results = my_first_exp.measure(solution_data= solution_data, solved_data=solved_data)\n",
    "print('Performance measures:', results)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=(15,10))\n",
    "axes = fig.subplots(2, 1)\n",
    "axes[0].plot(source_data['signal'], label='signal')\n",
    "axes[0].plot(problem_data['observation'], label='observation')\n",
    "axes[0].plot(solved_data['reconstruction'], label='reconstruction')\n",
    "axes[0].legend()\n",
    "axes[1].plot(source_data['signal']-problem_data['observation'], label='noise')\n",
    "axes[1].plot(source_data['signal']-solved_data['reconstruction'], label='residue')\n",
    "axes[1].legend()\n",
    "pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## An alternate way using functions\n",
    "When designing a problem and a solver, one may want to use functions only instead of classes. Here is the variant of the first experiment using functions only"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data_params = {'f0': np.arange(0.01, 0.1, 0.01), 'signal_len': [1000]}\n",
    "def add_noise_to_signal(signal, snr_db):\n",
    "    noise = np.random.randn(*signal.shape)\n",
    "    observation = signal + 10 ** (-snr_db / 20) * noise / np.linalg.norm(noise) * np.linalg.norm(signal)\n",
    "    return observation\n",
    "\n",
    "def get_problem(snr_db):\n",
    "    def generate_problem(signal):\n",
    "        problem_data = {'observation': add_noise_to_signal(signal, snr_db)}\n",
    "        solution_data = {'signal': signal}\n",
    "        return (problem_data, solution_data)\n",
    "    return generate_problem\n",
    "\n",
    "problem_params = {'snr_db': [-10, 0, 30]}\n",
    "\n",
    "def denoise_with_smooth_filter(observation, filter_len):\n",
    "    filter = np.hamming(filter_len)\n",
    "    filter /= np.sum(filter)\n",
    "    return np.convolve(observation, filter, mode='same')\n",
    "\n",
    "def get_solver(filter_len):\n",
    "    def solve_problem(observation):\n",
    "        return {'reconstruction': denoise_with_smooth_filter(observation, filter_len)}\n",
    "    return solve_problem\n",
    "\n",
    "solver_params = {'filter_len': 2**np.arange(6, step=2)}"
   ]
  },
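  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The scaling used in `add_noise_to_signal` normalizes the noise so that the ratio between the norm of the signal and the norm of the added noise corresponds to the requested SNR. As a quick sanity check, the following sketch reproduces the same scaling on an illustrative sine signal and measures the resulting SNR, which should match the target up to rounding errors. The names `empirical_snr_db`, `check_signal`, `check_noise`, `noise_gain` and `check_observation` are introduced here for illustration only and are not part of `yafe`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def empirical_snr_db(signal, observation):\n",
    "    # SNR in dB of the observation with respect to the clean signal\n",
    "    noise = observation - signal\n",
    "    return 20 * np.log10(np.linalg.norm(signal) / np.linalg.norm(noise))\n",
    "\n",
    "# Illustrative sine signal and seeded generator (both are hypothetical choices)\n",
    "rng = np.random.RandomState(0)\n",
    "check_signal = np.sin(2 * np.pi * 0.05 * np.arange(1000))\n",
    "for target_snr_db in [-10, 0, 30]:\n",
    "    # Same noise scaling as in add_noise_to_signal\n",
    "    check_noise = rng.randn(*check_signal.shape)\n",
    "    noise_gain = 10 ** (-target_snr_db / 20) * np.linalg.norm(check_signal) / np.linalg.norm(check_noise)\n",
    "    check_observation = check_signal + noise_gain * check_noise\n",
    "    print('Target SNR (dB):', target_snr_db,\n",
    "          '- measured SNR (dB):', empirical_snr_db(check_signal, check_observation))"
   ]
  },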
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "variant_exp = yafe.Experiment(name='Variant of first experiment',\n",
    "                              get_data=get_sine_data,\n",
    "                              get_problem=get_problem,\n",
    "                              get_solver=get_solver,\n",
    "                              measure=measure,\n",
    "                              force_reset=True,\n",
    "                              data_path=temp_data_path,\n",
    "                              log_to_file=False,\n",
    "                              log_to_console=False)\n",
    "variant_exp.add_tasks(data_params=data_params, problem_params=problem_params, solver_params=solver_params)\n",
    "print(variant_exp._schema)\n",
    "variant_exp.generate_tasks()\n",
    "variant_exp.launch_experiment()\n",
    "variant_exp.collect_results()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(15, 5))\n",
    "if xarray:\n",
    "    xresults = variant_exp.load_results(array_type='xarray')\n",
    "    plt.figure(figsize=(15, 5))\n",
    "    plot_xresults(xresults)\n",
    "else:\n",
    "    results, axes_labels, axes_values = variant_exp.load_results()\n",
    "    plot_results(results, axes_labels, axes_values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Internal mechanisms\n",
    "Here are some details about how an experiment is handled internally. This should not be needed for the general user but may help in some cases (debugging, extending `yafe`, and so on)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `Experiment._schema`\n",
    "Attribute `_schema` is a dictionary where all the experiment parameters are stored. Keys are tuples with two elements: the first one denotes the block related to the parameter (`'data'`, `'problem'` or `'solver'`); the second one is the parameter name defined by the user. Attribute `_schema` should not be modified by the user in order to preserve the integrity of the `Experiment` object and related data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_first_exp._schema"
   ]
  },
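  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the key structure concrete, here is a minimal sketch based on a hand-written dictionary. The name `toy_schema` and its values are placeholders chosen for illustration: the dictionary is not read from an `Experiment`, and the exact type of the values stored by `yafe` is not assumed here. The sketch only shows how keys of the form `(block, parameter name)` can be grouped by block without modifying anything."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Hand-written toy dictionary mimicking the key structure described above.\n",
    "# It is NOT read from an Experiment and its values are placeholders.\n",
    "toy_schema = {('data', 'f0'): [0.01, 0.02],\n",
    "              ('data', 'signal_len'): [1000],\n",
    "              ('problem', 'snr_db'): [-10, 0, 30],\n",
    "              ('solver', 'filter_len'): [1, 4, 16]}\n",
    "\n",
    "# Group the parameters by block using the first element of each key\n",
    "params_by_block = {}\n",
    "for (block, param_name), values in toy_schema.items():\n",
    "    params_by_block.setdefault(block, {})[param_name] = values\n",
    "\n",
    "for block, params in params_by_block.items():\n",
    "    print(block, '->', params)"
   ]
  },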
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Task numbering\n",
    "Task IDs are assigned to new tasks once for all when generating tasks (`Experiment.generate_tasks`). The method looks at the IDs already assigned to existing tasks and assign available IDs to new tasks. As a consequence, task IDs are dependent on the sequence of task creations and does not only depends on the combination of parameters related to the task.\n",
    "This make it easier the management of task IDs when adding new tasks. This also implies that finding the one-to-one matching between task IDs and parameters requires to parse all task data from their ID and to check the parameters of each task. This is time consuming but only happens when generating tasks, collecting results and checking the status of the Experiment."
   ]
  },
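  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dependence of task IDs on the order of task creation can be illustrated with a toy model. This is only a sketch and not the code used by `yafe`: the functions `assign_task_ids` and `build_id_map` are hypothetical, and the rule \"give each new task the smallest ID not already in use\" is an assumption made for the sake of the example. Adding the same two batches of parameters in a different order leads to a different matching between IDs and parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def assign_task_ids(existing_ids, n_new_tasks):\n",
    "    # Toy rule (assumption): use the smallest IDs that are not taken yet\n",
    "    used = set(existing_ids)\n",
    "    new_ids = []\n",
    "    candidate = 0\n",
    "    while len(new_ids) < n_new_tasks:\n",
    "        if candidate not in used:\n",
    "            new_ids.append(candidate)\n",
    "            used.add(candidate)\n",
    "        candidate += 1\n",
    "    return new_ids\n",
    "\n",
    "def build_id_map(batches):\n",
    "    # Add batches of task parameters one after the other, skipping known tasks\n",
    "    id_to_params = {}\n",
    "    for batch in batches:\n",
    "        new_params = [p for p in batch if p not in id_to_params.values()]\n",
    "        new_ids = assign_task_ids(id_to_params.keys(), len(new_params))\n",
    "        id_to_params.update(zip(new_ids, new_params))\n",
    "    return id_to_params\n",
    "\n",
    "batch_a = [{'filter_len': 1}, {'filter_len': 4}]\n",
    "batch_b = [{'filter_len': 4}, {'filter_len': 16}]\n",
    "map_ab = build_id_map([batch_a, batch_b])\n",
    "map_ba = build_id_map([batch_b, batch_a])\n",
    "print('A then B:', map_ab)\n",
    "print('B then A:', map_ba)"
   ]
  },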
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Internal data files\n",
    "Experiments and related tasks are relying on files where parameters, intermediate data and final results are stored:\n",
    "\n",
    "* All files are stored in a folder whose name matches that of the experiment, located either in the path passed using the parameter `data_path` when creating the Experiment, or in the data path defined in the user-defined `yafe.conf` file.\n",
    "* File `_schema.pickle` contains the schema of the parameters for each section of the experiment, which can be read and written using the property `Experiment._schema`.\n",
    "* File `results.npz` contains the results gathered by the method `collect_results()`.\n",
    "* For each task, a subfolder named by the task id is created by method `generate_tasks()`, and the following files are added when the task is executed by methods `run_task_by_id()` and `launch_experiment()`:\n",
    "    * `task_params.pickle`: parameters of the task,\n",
    "    * `source_data.pickle`: data returned by the data provider,\n",
    "    * `problem_data.pickle`: problem data returned by the problem provider,\n",
    "    * `solution_data.pickle`: solution data returned by the problem provider,\n",
    "    * `solved_data.pickle`: data returned by the solver,\n",
    "    * `result.pickle`: results returned by the performance measure function,\n",
    "    * `error_log`: error log if an error occurs during any processing step when the task is run with `launch_experiment()`.\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  },
  "name": "data_structures.ipynb"
 },
 "nbformat": 4,
 "nbformat_minor": 1
}