{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "%pylab inline" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "# Tutorial on how to use `Waveform` objects" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "A *Waveform* is a *MadArray* dedicated to handle audio signals. As such, it has a mandatory attribute *fs*, giving the sampling frequency of the signal.\n", "\n", "## Initialization\n", "\n", "As for *MadArray*, *Waveform* can be initialized from a 1D nd-array with or without mask. The parameter *fs* should be explicitly given. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "from madarrays import Waveform\n", "\n", "fs = 8000\n", "f0 = 200\n", "f1 = 220\n", "x_len = fs // 4\n", "x = np.cos(2*np.pi*f0*np.arange(x_len)/fs) + np.cos(2*np.pi*f1*np.arange(x_len)/fs)\n", "x *= np.hanning(x_len)\n", "x /= np.max(np.abs(x))\n", "mask = np.zeros_like(x, dtype=np.bool) \n", "mask[int(0.4*x_len):int(0.6*x_len)] = 1\n", "\n", "# initialization without missing samples\n", "w = Waveform(x, fs=fs)\n", "w.plot()\n", "print(w)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# initialization with missing samples\n", "wm = Waveform(x, fs=fs, mask=mask)\n", "wm.plot()\n", "print(wm)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "A *Waveform* can also be initialized from another *Waveform*. 
In this case, the parameter *fs* is optional." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "wm2 = Waveform(wm)\n", "wm2.plot()\n", "print(wm2)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "If *fs* is provided, the audio signal is **not** resampled:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "wm3 = Waveform(wm, fs=22050)\n", "wm3.plot()\n", "print(wm3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Stereo signals are handled as $N \\times 2$ arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_stereo = np.array([np.cos(2*np.pi*0.001*np.arange(2000)),\n", " np.sin(2*np.pi*0.001*np.arange(2000))]). 
T\n", "mask_stereo = np.zeros_like(x_stereo, dtype=bool)\n", "mask_stereo[250:500, 0] = 1\n", "mask_stereo[1000:1500, 1] = 1\n", "\n", "w_stereo = Waveform(x_stereo, mask=mask_stereo, fs=1)\n", "\n", "w_plot = w_stereo.plot()\n", "legend(w_plot, ('left', 'right'))\n", "print(w_stereo)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extracting left and right channels as mono *Waveform* objects is easy:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "w_left = w_stereo[:, 0]\n", "w_right = w_stereo[:, 1]\n", "\n", "w_left.plot(label='left mono')\n", "w_right.plot(label='right mono')\n", "legend()\n", "\n", "print('Is w_left stereo?', w_left.is_stereo())\n", "print('Is w_right stereo?', w_right.is_stereo())\n", "print(w_left)\n", "print(w_right)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "## Special audio abilities\n", "\n", "### Resampling\n", "A *Waveform* can be resampled using the *resample* method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "wr = Waveform(w)\n", "wr.resample(22050)\n", "plt.subplot(211)\n", "w.plot()\n", "plt.subplot(212)\n", "wr.plot()\n", "print(w)\n", "print(wr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Changing the sampling frequency without resampling the waveform" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "w_fs = Waveform(w)\n", "w_fs.fs = 22050\n", "plt.subplot(211)\n", "w.plot()\n", "plt.subplot(212)\n", "w_fs.plot()\n", "print(w)\n", "print(w_fs)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "### Intensity\n", "A *Waveform* has an attribute *rms* giving the root mean square of the audio 
signal (where missing samples equal zero). It can be changed using the *set_rms* method." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "w_rms = Waveform(w)\n", "\n", "plt.subplot(211)\n", "w_rms.plot()\n", "print('RMS before modification: ', w_rms.rms)\n", "\n", "w_rms.set_rms(1)\n", "\n", "plt.subplot(212)\n", "w_rms.plot()\n", "print('RMS after modification: ', w_rms.rms)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "### Properties\n", "A *Waveform* has several attributes that give information about the audio signal:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "print('Length: {} samples'.format(w.length))\n", "print('Duration: {} s'.format(w.duration))\n", "print('Time axis: {}'.format(w.time_axis))" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "### Plotting\n", "A *Waveform* can be plotted, as well as the associated mask." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "plt.figure()\n", "wm.plot()\n", "plt.title('Audio signal')\n", "\n", "plt.figure()\n", "wm.plot_mask()\n", "plt.title('Mask')\n", "pass" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "### Playing sound\n", "The sound can be played using *show_player* in a notebook or *play* in a console." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "json-false", "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "w.show_player()" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": [ "worksheet-0" ], "slideshow": { "slide_type": "-" } }, "source": [ "### I/O\n", "A *Waveform* can be exported as a .wav file using *to_wavfile*:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f0_io = 10\n", "fs_io = 8000\n", "x_io_len = fs_io\n", "x_io = np.array([np.cos(2*np.pi*f0_io/fs_io*np.arange(x_io_len)),\n", " np.sin(2*np.pi*f0_io/fs_io*np.arange(x_io_len))]).T\n", "mask_io = np.zeros_like(x_io, dtype=bool)\n", "# x_io has shape (N, 2): mask the last samples of each channel\n", "mask_io[-1000:, 0] = mask_io[-500:, 1] = True\n", "w_io = Waveform(x_io, mask=mask_io, fs=fs_io)\n", "w_io.plot()\n", "print(w_io)\n", "\n", "w_io.to_wavfile('my_sound.wav')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A .wav file can be read using the static method *from_wavfile*, which returns a *Waveform*:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "w_load = Waveform.from_wavfile('my_sound.wav')\n", "w_load.plot()\n", "print(w_load)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Stereo files may be converted to mono\n", "for mode in ('left', 'right', 'mean'):\n", "    w_load = Waveform.from_wavfile('my_sound.wav', conversion_to_mono=mode)\n", "    w_load.plot(label=mode)\n", "legend()\n", "pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that:\n", "\n", "* *dtype*: float/int data types are preserved when exporting a *Waveform*, since the .wav format allows many data types. However, many audio players only read .wav files coded with int16 values, so you may not be able to listen to your exported sound with your favorite player. 
In that case, you may convert the data type of your *Waveform* using the optional *dtype* argument of the *to_wavfile* method.\n", "* mask: the mask is lost when exporting to a .wav file.\n", "* sampling frequency: sampling frequencies may be arbitrary ``float`` or ``int`` values; however, only a restricted set of sampling frequencies is allowed for input/output (see the set of supported frequencies ``madarrays.waveform.VALID_IO_FS`` below).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from madarrays.waveform import VALID_IO_FS\n", "print(VALID_IO_FS)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clipping\n", "\n", "A *Waveform* is clipped using the `clip` method, which takes the minimum and maximum values as arguments. Warnings are displayed to inform the user if any value has been clipped." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "wm_clipped = wm.copy()\n", "\n", "wm_clipped.clip(min_value=-0.75, max_value=0.25)\n", "\n", "# Plot signals\n", "plt.figure()\n", "wm.plot('b', label='x')\n", "wm_clipped.plot('y', label='y')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Type of entries in Waveform\n", "\n", "This section is for advanced usage.\n", "\n", "Audio data can have different types, each associated with specific constraints on the values:\n", "\n", "* *float* (np.float16, np.float32, np.float64): the values are floats between -1 and 1;\n", "* *int* (np.uint8, np.int16, np.int32): the values are integers within a range that depends on the precision;\n", "* *complex* (np.complex64, np.complex128): the real and imaginary parts are floats between -1 and 1.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Integer-valued waveforms\n", "Method *Waveform.astype* not only converts data types but also scales values to the range of the target type. 
The choice among the available integer types will result in different ranges. The following figures show integer-valued waveforms with different types: on the first row, waveforms created without conversion, from integer-valued data arrays where the full `dtype` range is used; on the second row, similar waveforms are created with a conversion from a float-valued array with entries in [-1, 1]." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fs = 1000\n", "f0 = 10\n", "duration = 1\n", "t = np.linspace(0, duration, int(duration*fs))\n", "x_cos = 0.5 * np.cos(2*np.pi*f0*t)\n", "\n", "w_uint8 = Waveform((2**7*x_cos + 128).astype(np.uint8), fs=fs)\n", "w_int16 = Waveform((2**15*x_cos).astype(np.int16), fs=fs)\n", "w_int32 = Waveform((2**31*x_cos).astype(np.int32), fs=fs)\n", "\n", "plt.figure(figsize=(20, 5))\n", "plt.subplot(131)\n", "plt.title('uint8')\n", "w_uint8.plot()\n", "plt.subplot(132)\n", "plt.title('int16')\n", "w_int16.plot()\n", "plt.subplot(133)\n", "plt.title('int32')\n", "w_int32.plot()\n", "\n", "w_uint8 = Waveform(x_cos, fs=fs).astype(np.uint8)\n", "w_int16 = Waveform(x_cos, fs=fs).astype(np.int16)\n", "w_int32 = Waveform(x_cos, fs=fs).astype(np.int32)\n", "\n", "plt.figure(figsize=(20, 5))\n", "plt.subplot(131)\n", "plt.title('uint8')\n", "w_uint8.plot()\n", "plt.subplot(132)\n", "plt.title('int16')\n", "w_int16.plot()\n", "plt.subplot(133)\n", "plt.title('int32')\n", "w_int32.plot()\n", "pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Real-valued waveforms\n", "The choice among the available float types does not affect the range of the values, only the precision. In the following example, one may observe how the floating-point precision varies, depending on the float type, when the fractional part is very small compared to the exponent part, which equals 1 here (see right column)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fs = 1000\n", "f0 = 10\n", "duration = 1\n", "t = np.linspace(0, duration, int(duration*fs))\n", "\n", "w_float16 = Waveform(np.cos(2*np.pi*f0*t).astype(np.float16) + 1, fs=fs)\n", "w_float32 = Waveform(np.cos(2*np.pi*f0*t).astype(np.float32) + 1, fs=fs)\n", "w_float64 = Waveform(np.cos(2*np.pi*f0*t).astype(np.float64) + 1, fs=fs)\n", "\n", "plt.figure(figsize=(20, 15))\n", "plt.subplot(321)\n", "plt.title('float16')\n", "w_float16.plot()\n", "plt.subplot(323)\n", "plt.title('float32')\n", "w_float32.plot()\n", "plt.subplot(325)\n", "plt.title('float64')\n", "w_float64.plot()\n", "\n", "eps16=np.finfo(np.float16).eps * 4\n", "eps32=np.finfo(np.float32).eps * 4\n", "eps64=np.finfo(np.float64).eps * 4\n", "print(eps16, eps32, eps64)\n", "\n", "w_float16 = Waveform(np.cos(2*np.pi*f0*t).astype(np.float16) * eps16 + 1, fs=fs)\n", "w_float32 = Waveform(np.cos(2*np.pi*f0*t).astype(np.float32) * eps32 + 1, fs=fs)\n", "w_float64 = Waveform(np.cos(2*np.pi*f0*t).astype(np.float64) * eps64 + 1, fs=fs)\n", "\n", "plt.subplot(322)\n", "plt.title('float16')\n", "w_float16.plot()\n", "plt.subplot(324)\n", "plt.title('float32')\n", "w_float32.plot()\n", "plt.subplot(326)\n", "plt.title('float64')\n", "w_float64.plot()\n", "plt.ylim(1 - 1.2 * eps64, 1 + 1.2 * eps64)\n", "\n", "pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Complex-valued waveforms" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fs = 1000\n", "f0 = 10\n", "duration = 1\n", "t = np.linspace(0, duration, int(duration*fs))\n", "w_complex128 = Waveform((np.cos(2*np.pi*f0*t) + 1j*np.sin(2*np.pi*f0*t)).astype(np.complex128), fs=fs)\n", "w_complex256 = Waveform((np.cos(2*np.pi*f0*t) + 1j*np.sin(2*np.pi*f0*t)).astype(np.complex256), fs=fs)\n", "\n", "plt.figure(figsize=(20, 5))\n", "plt.subplot(121)\n", "plt.title('complex128')\n", 
"w_complex128.plot(cpx_mode='both')\n", "plt.subplot(122)\n", "plt.title('complex256')\n", "w_complex256.plot(cpx_mode='both')\n", "pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Casting into another dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The casting of a waveform in a different dtype depends on the current dtype and the desired dtype:\n", "\n", "* *Integer-to-real* casting is performed by applying on each entry $x$ the function $f(x)=\\frac{x - z}{2^{n-1}}$, where the source integral type is coded with $n$ bits, and $z$ is the integer associated with zero, i.e., $z=0$ for a signed type (`int`) and $z=2^{n-1}$ for an unsigned type (`uint`).\n", "* *Real-to-integer* casting is performed by applying on each entry $x$ the function $f(x)=\\lfloor\\left(x + 1\\right) 2^{n-1} + m\\rfloor$, where the target integral type is coded with $n$ bits, and $m$ is the minimum integer value, i.e., $m=-2^{n-1}$ for a signed type (`int`) and $z=0$ for an unsigned type (`uint`);\n", "* *Real-to-real* casting is obtained by a basic rounding operation;\n", "* *Integer-to-integer* casting is obtained by chaining an integer-to-float64 casting and a float64-to-integer casting.\n", "\n", "These constraints are only applied when calling explicitely the method `astype`.\n", "\n", "Clipping is performed for unexpected values:\n", "\n", "* When casting to `float`, values outside $[-1, 1]$ are clipped;\n", "* When casting to `int`, values outside the minimum and maximum values allowed by the integral type are clipped:\n", " * $\\left[-2^{n-1}, 2^{n-1}-1\\right]$ for $n$-bits signed integers;\n", " * $\\left[0, 2^{n}-1\\right]$ for $n$-bits unsigned integers." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "w_float32 = Waveform(np.cos(2*np.pi*f0*t).astype(np.float32), fs=fs)\n", "plt.figure(figsize=(20, 5))\n", "plt.subplot(121)\n", "plt.title('float32')\n", "w_float32.plot()\n", "plt.subplot(122)\n", "plt.title('uint8')\n", "w_float32.astype('uint8').plot()\n", "pass" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": true, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "name": "data_structures.ipynb" }, "nbformat": 4, "nbformat_minor": 1 }