{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NumPy introduction and statistics in Python with SciPy\n", "\n", "\n", "**www.numpy.org**\n", "\n", "NumPy is the fundamental package for scientific computing with Python.\n", "\n", "#### Highlights\n", "\n", "* a powerful N-dimensional array object\n", "* efficient broadcasting functions\n", "* tools for integrating C/C++ and Fortran code\n", "* useful linear algebra, Fourier transform, and random number capabilities\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Why NumPy?\n", "\n", "* NumPy arrays are faster than standard Python lists\n", "* NumPy functions let you write many operations with much less code\n", "* NumPy functions are faster than a naive Python implementation\n", "* A great collection of mathematical functions is available (numpy, scipy, sympy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to start?" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# loading numpy module\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The heart of NumPy: array\n", "\n", "NumPy's main object is the **homogeneous** multidimensional array.\n", "\n", "Arrays are very efficient for operations on large numerical data and in general outperform standard Python lists." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "my_array:\n", " [[1. 2. 3.]\n", " [4. 5. 
6.]]\n", "my_array dimensions: (2, 3)\n", "my_array number of elements: 6\n", "my_array type of elements float64\n" ] } ], "source": [ "my_array = np.array([[1.,2.,3.],[4.,5.,6.]])\n", "print(\"my_array:\\n\", my_array)\n", "print(\"my_array dimensions:\", my_array.shape)\n", "print(\"my_array number of elements:\", my_array.size)\n", "print(\"my_array type of elements\", my_array.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One great feature of NumPy arrays is that it is very easy to apply an operation to every element of the array:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "my_result = my_array * 3 # multiply all elements in the array by 3\n", "\n", "# compare with a native Python equivalent:\n", "my_list=[[1.,2.,3.],[4.,5.,6.]]\n", "my_result2 = [[0,0,0],[0,0,0]]\n", "for i in range(len(my_list)):\n", "    for j in range(len(my_list[i])):\n", "        my_result2[i][j] = my_list[i][j] * 3\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy also exploits the homogeneity of the array data to provide large speed-ups:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "native timing : 0.514258861541748\n", "numpy timing : 0.00655055046081543\n", "numpy acceleration factor : 78.50620564149227\n" ] } ], "source": [ "from time import time\n", "native_data = [x for x in range(10**7)]\n", "numpy_data = np.array(native_data)\n", "\n", "t0 = time()\n", "numpy_data *= 3\n", "t1 = time()\n", "numpyTime = t1-t0\n", "\n", "t0 = time()\n", "native_data = [ x*3 for x in native_data ]\n", "t1 = time()\n", "nativeTime = t1-t0\n", "\n", "print(\"native timing :\",nativeTime)\n", "print(\"numpy timing :\",numpyTime)\n", "print(\"numpy acceleration factor :\" , nativeTime / numpyTime )" ] }, { "cell_type": "markdown", "metadata": {}, "source": 
[ "So, let's familiarize ourselves with the NumPy array." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to create a NumPy array?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating arrays from lists" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "my_array: <class 'numpy.ndarray'> [1 2 3]\n" ] } ], "source": [ "# one-dimensional array\n", "my_array = np.array([1,2,3])\n", "print(\"my_array:\", type(my_array), my_array)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "my_array: <class 'numpy.ndarray'>\n", "[[1 2 3]\n", " [4 5 6]]\n" ] } ], "source": [ "# two-dimensional array\n", "my_array = np.array([[1,2,3],[4,5,6]])\n", "print(\"my_array:\", type(my_array))\n", "print(my_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating arrays with functions" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0.]\n", " [ 0. 0. 0. 0. 0.]]\n" ] } ], "source": [ "# array filled with zeros\n", "my_array = np.zeros((3,5)) # ( number of rows , number of columns )\n", "print(my_array)" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 1.]\n", " [ 1. 1.]\n", " [ 1. 1.]\n", " [ 1. 
1.]]\n" ] } ], "source": [ "# array filled with ones\n", "my_array = np.ones((4,2))\n", "print(my_array)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[42 42 42]\n", " [42 42 42]]\n" ] } ], "source": [ "# array filled with a chosen value\n", "my_array = np.full((2,3), 42)\n", "print(my_array)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 0. 0. 0.]\n", " [0. 1. 0. 0.]\n", " [0. 0. 1. 0.]\n", " [0. 0. 0. 1.]]\n" ] } ], "source": [ "## identity matrix\n", "my_array = np.eye(4,4,0) \n", "# the first 2 arguments give the matrix dimensions; the 3rd argument is the diagonal offset (0 is the main diagonal)\n", "print(my_array)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.2066392 0.5245414 ]\n", " [0.28290859 0.2211281 ]]\n" ] } ], "source": [ "my_array = np.random.rand(2,2) # random numbers from a uniform distribution between 0.0 and 1.0\n", "print(my_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arange function\n", "\n", "Generates a one-dimensional array of evenly spaced numbers." 
] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4 5]\n" ] } ], "source": [ "my_array = np.arange(6)\n", "print(my_array)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.1 2.2 3.3 4.4 5.5]\n" ] } ], "source": [ "# support float start/end points, as well as float steps\n", "my_array = np.arange(1.1, 6 , 1.1) #start , stop , step\n", "print(my_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Array reshaping\n", "\n", "**numpy.reshape** changes the shape of an array without changing its data" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.1 1.8 2.5 3.2 3.9 4.6 5.3 6. ]\n" ] } ], "source": [ "# 1D array\n", "my_array = np.arange(1.1, 6, 0.7)\n", "print(my_array)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1.1 1.8 2.5 3.2]\n", " [3.9 4.6 5.3 6. ]]\n" ] } ], "source": [ "# 2D array\n", "print(np.reshape(my_array, (2,4)))" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[[1.1 1.8]\n", " [2.5 3.2]]\n", "\n", " [[3.9 4.6]\n", " [5.3 6. 
]]]\n" ] } ], "source": [ "# 3D array\n", "print(np.reshape(my_array, (2,2,2), order=\"C\"))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reading arrays from files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**numpy.loadtxt**(fname, dtype=<type float>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a b c\r\n", "0.286368595465 0.0284349169133 0.561298539899\r\n", "0.662679670119 0.718228561506 0.79446312338" ] } ], "source": [ "!cat test.tab" ] }, { "cell_type": "code", "execution_count": 140, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[['a' 'b' 'c']\n", " ['0.286368595465' '0.0284349169133' '0.561298539899']\n", " ['0.662679670119' '0.718228561506' '0.79446312338']]\n" ] }, { "data": { "text/plain": [ "array([['a', 'b', 'c'],\n", " ['0.286368595465', '0.0284349169133', '0.561298539899'],\n", " ['0.662679670119', '0.718228561506', '0.79446312338']],\n", " dtype='start:stop:step notation can also be used when accessing array elements" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 2. 3. 4. 5.]\n", " [ 6. 7. 8. 9. 10.]\n", " [ 11. 12. 13. 14. 15.]\n", " [ 16. 17. 18. 19. 20.]]\n", "subset:\n", " [[ 1. 3. 5.]\n", " [ 6. 8. 10.]\n", " [ 11. 13. 15.]\n", " [ 16. 18. 20.]]\n" ] } ], "source": [ "my_array = np.arange(1., 21.).reshape((4,5))\n", "print(my_array)\n", "# accessing a subset with step argument\n", "print(\"subset:\\n\", my_array[:,0::2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparison operations" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "my_array:\n", " [[1. 2. 
3.]\n", " [4. 5. 6.]\n", " [7. 8. 9.]]\n", "my_array > 5:\n", " [[False False False]\n", " [False False True]\n", " [ True True True]]\n" ] } ], "source": [ "# comparison operators return an array of boolean values\n", "my_array = np.arange(1., 10.).reshape((3,3))\n", "print(\"my_array:\\n\", my_array)\n", "print(\"my_array > 5:\\n\", my_array > 5)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All values in my_array are greater than 5: False\n", "There is at least one value in my_array greater than 5: True\n" ] } ], "source": [ "# evaluation of boolean arrays\n", "results = my_array > 5\n", "print(\"All values in my_array are greater than 5:\", results.all()) \n", "print(\"There is at least one value in my_array greater than 5:\",\n", " results.any()) " ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Values which are greater than 5: [6. 7. 8. 9.]\n" ] } ], "source": [ "# extracting values with boolean arrays\n", "print(\"Values which are greater than 5:\", my_array[my_array > 5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Iterating through NumPy arrays" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 2. 3.]\n", "[4. 5. 6.]\n", "[7. 8. 
9.]\n" ] } ], "source": [ "# standard for loop iterates over rows\n", "for x in my_array:\n", "    print(x)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.0 2.0 3.0 \n", "4.0 5.0 6.0 \n", "7.0 8.0 9.0 \n" ] } ], "source": [ "# iterating over all elements the standard way\n", "for x in my_array:\n", "    for y in x:\n", "        print(y, \" \",end=\"\")\n", "    print()" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 " ] } ], "source": [ "# iterating over all elements the NumPy way\n", "for x in np.nditer(my_array):\n", "    print(x, \" \", end=\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to change array values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Assignment" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2]\n", " [3 4]]\n", "After changing one element:\n", " [[1 5]\n", " [3 4]]\n" ] } ], "source": [ "# changing one element\n", "my_array = np.arange(1,5).reshape((2,2))\n", "print(my_array)\n", "my_array[0,1] = 5\n", "print(\"After changing one element:\\n\", my_array)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2]\n", " [3 4]]\n", "After changing one row:\n", " [[5 6]\n", " [3 4]]\n", "After changing one column:\n", " [[5 7]\n", " [3 8]]\n" ] } ], "source": [ "# changing rows and columns\n", "my_array = np.arange(1,5).reshape((2,2))\n", "print(my_array)\n", "my_array[0,:] = np.array([5,6])\n", "print(\"After changing one row:\\n\", my_array)\n", "my_array[:,1] = np.array([7,8])\n", "print(\"After changing one column:\\n\", my_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding rows 
and columns to an existing array" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1 2]\n", " [3 4 5]]\n", "after column adding\n", " [[0 1 2 6]\n", " [3 4 5 7]]\n" ] } ], "source": [ "# appending\n", "my_array = np.arange(0,6).reshape((2,3))\n", "print(my_array)\n", "# append column\n", "my_array = np.append(my_array, np.array([6, 7]).reshape(2,1), axis=1) # 0 : row , 1: column\n", "print(\"after column adding\\n\", my_array)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1 2]\n", " [3 4 5]]\n", "after row insertion\n", " [[0 1 2]\n", " [6 7 8]\n", " [3 4 5]]\n" ] } ], "source": [ "# insertion\n", "my_array = np.arange(0,6).reshape((2,3))\n", "print(my_array)\n", "# insert row\n", "my_array = np.insert(my_array, 1, [[6, 7, 8]], axis=0)\n", "print(\"after row insertion\\n\", my_array)\n" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "row concatenation:\n", " [[ 0 1 2]\n", " [ 3 4 5]\n", " [ 6 7 8]\n", " [ 9 10 11]]\n", "column concatenation:\n", " [[ 0 1 2 6 7 8]\n", " [ 3 4 5 9 10 11]]\n" ] } ], "source": [ "# concatenation\n", "my_array = np.arange(0,6).reshape((2,3))\n", "my_array2 = np.arange(6,12).reshape((2,3))\n", "print(\"row concatenation:\\n\", np.concatenate((my_array, my_array2),\n", " axis=0))\n", "print(\"column concatenation:\\n\", np.concatenate((my_array, my_array2),\n", " axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## To copy or not to copy?\n", "When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners." ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 2. 3. 4. 
5.]\n", " [ 6. 7. 8. 9. 10.]\n", " [11. 12. 13. 14. 15.]\n", " [16. 17. 18. 19. 20.]]\n", "tmp is my_array True\n", "my array after we changed tmp:\n", " [[ 1. 2. 3. 4. 5.]\n", " [ 6. 999. 8. 9. 10.]\n", " [ 11. 12. 13. 14. 15.]\n", " [ 16. 17. 18. 19. 20.]]\n" ] } ], "source": [ "my_array = np.arange(1., 21.).reshape((4,5))\n", "print(my_array)\n", "tmp = my_array\n", "print(\"tmp is my_array\", tmp is my_array)\n", "tmp[1,1] = 999\n", "print(\"my array after we changed tmp:\\n\", my_array) \n", "# the change in tmp is present in my_array, because they are the same object" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 2. 3. 4. 5.]\n", " [ 6. 7. 8. 9. 10.]\n", " [11. 12. 13. 14. 15.]\n", " [16. 17. 18. 19. 20.]]\n", "tmp is my_array False\n", "tmp:\n", " [[ 7. 8.]\n", " [12. 13.]]\n", "my_array after we changed tmp:\n", " [[ 1. 2. 3. 4. 5.]\n", " [ 6. 7. 8. 9. 10.]\n", " [ 11. 12. 999. 14. 15.]\n", " [ 16. 17. 18. 19. 20.]]\n" ] } ], "source": [ "my_array = np.arange(1., 21.).reshape((4,5))\n", "print(my_array)\n", "tmp = my_array[1:3,1:3]\n", "print(\"tmp is my_array\", tmp is my_array)\n", "print(\"tmp:\\n\", tmp)\n", "tmp[1,1] = 999\n", "print(\"my_array after we changed tmp:\\n\", my_array)\n", "# tmp is not my_array, but a change in one shows up in the other because tmp is a view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> This behavior may appear strange, but it arises because NumPy arrays access their data by reference. Basic slicing returns a view, so several arrays may share the same memory, or a subset of it \n", "> See " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### If you plan to change array values but want to keep the old array untouched, then make a copy of it!" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 2. 3. 4. 5.]\n", " [ 6. 7. 8. 9. 
10.]\n", " [ 11. 12. 13. 14. 15.]\n", " [ 16. 17. 18. 19. 20.]]\n", "tmp is my_array False\n", "my array after we changed tmp:\n", " [[ 1. 2. 3. 4. 5.]\n", " [ 6. 7. 8. 9. 10.]\n", " [ 11. 12. 13. 14. 15.]\n", " [ 16. 17. 18. 19. 20.]]\n", "tmp array after change:\n", " [[ 1. 2. 3. 4. 5.]\n", " [ 6. 999. 8. 9. 10.]\n", " [ 11. 12. 13. 14. 15.]\n", " [ 16. 17. 18. 19. 20.]]\n" ] } ], "source": [ "my_array = np.arange(1., 21.).reshape((4,5))\n", "print(my_array)\n", "tmp = my_array.copy()\n", "print(\"tmp is my_array\", tmp is my_array)\n", "tmp[1,1] = 999\n", "print(\"my array after we changed tmp:\\n\", my_array)\n", "print(\"tmp array after change:\\n\", tmp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions\n", "\n", "NumPy provides a great number of functions. Their strengths are speed and the ability to apply them to arrays element-wise, column-wise or row-wise." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Element-wise functions\n", "This includes all basic math operators like +, -, /, \\*, //, \\*\\*" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 2. 3.]\n", " [ 4. 5. 6.]]\n", "division\n", "[[ 0.5 1. 1.5]\n", " [ 2. 2.5 3. ]]\n", "sum\n", "[[ 11. 12. 13.]\n", " [ 14. 15. 16.]]\n", "power of 2\n", "[[ 1. 4. 9.]\n", " [ 16. 25. 36.]]\n", "log2\n", "[[ 0. 1. 1.5849625 ]\n", " [ 2. 2.32192809 2.5849625 ]]\n" ] } ], "source": [ "# element-wise operation with scalars\n", "my_array = np.arange(1., 7.).reshape((2,3))\n", "print(my_array)\n", "print(\"division\")\n", "print(my_array/2)\n", "print(\"sum\")\n", "print(my_array + 10)\n", "# element-wise functions\n", "print(\"power of 2\")\n", "print(my_array**2)\n", "print(\"log2\")\n", "print(np.log2(my_array))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sum, division ... 
of arrays" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 2. 3.]\n", " [ 4. 5. 6.]]\n", "division\n", "[[ 1. 1. 1.]\n", " [ 1. 1. 1.]]\n", "sum\n", "[[ 2. 4. 6.]\n", " [ 8. 10. 12.]]\n" ] } ], "source": [ "# element-wise operation with arrays\n", "my_array = np.arange(1., 7.).reshape((2,3))\n", "print(my_array)\n", "print(\"division\")\n", "print(my_array / my_array)\n", "print(\"sum\")\n", "print(my_array + my_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Element-wise, row-wise, column-wise functions" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "element-wise sum: 21.0\n", "sum of columns: [ 5. 7. 9.]\n", "sum of rows: [ 6. 15.]\n" ] } ], "source": [ "# sum\n", "print(\"element-wise sum:\", np.sum(my_array))\n", "print(\"sum of columns:\", np.sum(my_array, axis=0))\n", "print(\"sum of rows:\", np.sum(my_array, axis=1))" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "std of the whole array: 1.70782512766\n", "std of columns: [ 1.5 1.5 1.5]\n", "std of rows: [ 0.81649658 0.81649658]\n" ] } ], "source": [ "# standard deviation\n", "print(\"std of the whole array:\", np.std(my_array))\n", "print(\"std of columns:\", np.std(my_array, axis=0))\n", "print(\"std of rows:\", np.std(my_array, axis=1))" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mean of the whole array: 3.5\n", "mean of columns: [ 2.5 3.5 4.5]\n", "mean of rows: [ 2. 
5.]\n" ] } ], "source": [ "# mean function\n", "print(\"mean of the whole array:\", np.mean(my_array))\n", "print(\"mean of columns:\", np.mean(my_array, axis=0))\n", "print(\"mean of rows:\", np.mean(my_array, axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Random numbers in NumPy\n", "\n", "The numpy.random module provides a large collection of distributions (uniform, normal, beta, binomial, gamma, poisson ...) to draw from." ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "single number: 0.8840433667904156\n", "array of random numbers:\n", " [[ 0.22295178 0.52276298 0.54877142]\n", " [ 0.49866561 0.8327065 0.85795209]]\n" ] } ], "source": [ "# random numbers from uniform distribution [0,1)\n", "print(\"single number:\", np.random.rand())\n", "print(\"array of random numbers:\\n\", np.random.rand(2,3))" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "array of random numbers.\n", " mean : -0.0011793176474030839 \tstandard deviation : 1.000647567911789\n" ] } ], "source": [ "# random numbers from normal distribution\n", "my_array = np.random.randn(100000)\n", "print(\"array of random numbers.\\n\", \"mean :\", np.mean(my_array) , \"\\tstandard deviation :\" , np.std(my_array) )" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "array permutation:\n", " [0 6 4 5 3 2 1]\n", "sample from the array:\n", " [6 0 1]\n" ] } ], "source": [ "# permutation and sampling\n", "my_array = np.arange(7)\n", "print(\"array permutation:\\n\", np.random.permutation(my_array))\n", "print(\"sample from the array:\\n\", np.random.choice(my_array,\n", " size=3,\n", " replace=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear algebra built-in capabilities\n", 
"NumPy arrays can be used as matrices without any special conversion.\n", "For advanced linear algebra operations there is a dedicated package (**numpy.linalg**)" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 2. 3.]\n", " [4. 5. 6.]]\n" ] } ], "source": [ "my_array = np.arange(1., 7.).reshape((2,3))\n", "print(my_array)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "transposed matrix:\n", " [[ 1. 4.]\n", " [ 2. 5.]\n", " [ 3. 6.]]\n" ] } ], "source": [ "# Transpose of a matrix\n", "print(\"transposed matrix:\\n\", my_array.T)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### There are different types of matrix multiplication!" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 4. 9.]\n", " [ 16. 25. 36.]]\n" ] } ], "source": [ "# element-wise multiplication (not the matrix product)\n", "print(my_array * my_array)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[14. 32.]\n", " [32. 77.]]\n", "[[14. 32.]\n", " [32. 
77.]\n" ] } ], "source": [ "# matrix product\n", "print(my_array.dot(my_array.T)) \n", "# one can also use the @ operator:\n", "print(my_array @ my_array.T) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# SciPy.stats and statistics in Python\n", "\n", "\n", "SciPy refers both to a comprehensive [project for scientific Python programming](https://scipy.org) and to a [library](https://docs.scipy.org/doc/scipy/reference/) (part of that project) implementing various tools and algorithms for scientific software.\n", "\n", "Here we will give a few pointers on the `scipy.stats` library, which provides ways to interact with various probability distributions and implements numerous statistical tests.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Manipulation of random distributions\n", "\n", "`scipy.stats` implements utilities for a large number of continuous and discrete distributions:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of continuous distributions: 98\n", "number of discrete distributions: 14\n", "['alpha', 'anglit', 'arcsine', 'argus', 'beta', 'betaprime', 'bradford', 'burr', 'burr12', 'cauchy', 'chi', 'chi2', 'cosine', 'crystalball', 'dgamma', 'dweibull', 'erlang', 'expon', 'exponnorm', 'exponpow', 'exponweib', 'f', 'fatiguelife', 'fisk', 'foldcauchy', 'foldnorm', 'frechet_l', 'frechet_r', 'gamma', 'gausshyper', 'genexpon', 'genextreme', 'gengamma', 'genhalflogistic', 'genlogistic', 'gennorm', 'genpareto', 'gilbrat', 'gompertz', 'gumbel_l', 'gumbel_r', 'halfcauchy', 'halfgennorm', 'halflogistic', 'halfnorm', 'hypsecant', 'invgamma', 'invgauss', 'invweibull', 'johnsonsb', 'johnsonsu', 'kappa3', 'kappa4', 'ksone', 'kstwobign', 'laplace', 'levy', 'levy_l', 'levy_stable', 'loggamma', 'logistic', 'loglaplace', 'lognorm', 'lomax', 'maxwell', 'mielke', 'moyal', 'nakagami', 'ncf', 'nct', 'ncx2', 
'norm', 'norminvgauss', 'pareto', 'pearson3', 'powerlaw', 'powerlognorm', 'powernorm', 'rayleigh', 'rdist', 'recipinvgauss', 'reciprocal', 'rice', 'semicircular', 'skewnorm', 't', 'trapz', 'triang', 'truncexpon', 'truncnorm', 'tukeylambda', 'uniform', 'vonmises', 'vonmises_line', 'wald', 'weibull_max', 'weibull_min', 'wrapcauchy']\n" ] } ], "source": [ "from scipy import stats\n", "\n", "dist_continu = [d for d in dir(stats) if isinstance(getattr(stats, d), stats.rv_continuous)]\n", "dist_discrete = [d for d in dir(stats) if isinstance(getattr(stats, d), stats.rv_discrete)]\n", "print('number of continuous distributions: %d' % len(dist_continu))\n", "print('number of discrete distributions: %d' % len(dist_discrete))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "let's experiment with the normal distribution, or `norm` in `scipy.stats`\n", "\n", "\n", "A look at `help(stats.norm)` tells us that \n", "```\n", " | The location (``loc``) keyword specifies the mean.\n", " | The scale (``scale``) keyword specifies the standard deviation.\n", "```\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array(10.), array(4.))" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## we can generate a specific normal distribution :\n", "N = stats.norm(loc = 10 , scale = 2)\n", "\n", "# the mean and variance of a distribution can be retrieved using the .stats method :\n", "print(N.stats())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That object can then be used to interact with the distribution in many ways.\n", "\n", "### drawing some random numbers : rvs" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 7.45179805, 9.87769114, 10.12902769, 10.8202259 , 8.85423502],\n", " [ 8.39733275, 12.62407038, 12.54939775, 7.57128479, 10.62743881],\n", " [ 7.11035717, 9.2620774 , 
8.46154685, 10.78523221, 10.11458767],\n", " [14.17995768, 10.08394262, 9.90331856, 8.97369216, 9.83082144],\n", " [ 7.56909984, 7.17413853, 7.0261789 , 10.76444972, 11.875346 ]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# draw random numbers from this distribution: rvs\n", "# the size argument is one or several integers and defines the shape of the returned array of random numbers\n", "N.rvs(size = [5,5]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with any drawing of random variables on a computer, [one merely emulates randomness](https://en.wikipedia.org/wiki/Pseudorandom_number_generator). This also means that one can make random operations reproducible by setting the random seed.\n", "\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Are the random draws equal? [ True True True True True]\n" ] } ], "source": [ "import numpy as np\n", "np.random.seed(2020) # we set the random seed\n", "draw1 = N.rvs(size=5)\n", "np.random.seed(2020) # we set the random seed back to 2020\n", "draw2 = N.rvs(size=5)\n", "print(\"Are the random draws equal?\",draw1 == draw2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Looking up the quantiles and probability density functions\n" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "what is the probability of drawing a number <=15.0 ? 0.9937903346742238\n", "quantiles: [0.025, 0.5, 0.975] -> [ 6.08007203 10. 13.91992797]\n" ] } ], "source": [ "# pdf: Probability Density Function\n", "# I know this is not the plotting lesson, but here is a small recipe to plot the distribution\n", "import matplotlib.pyplot as plt \n", "X = np.arange(0,20,0.1)\n", "plt.plot( X , N.pdf(X) )\n", "plt.show()\n", "\n", "# cdf: Cumulative Distribution Function\n", "print('what is the probability of drawing a number <=15.0 ?' , N.cdf(15.0)) \n", "\n", "\n", "# ppf: Percent Point Function (Inverse of CDF) , gives the quantiles of the distribution\n", "P = [0.025,0.5,0.975]\n", "Q = N.ppf(P)\n", "print( 'quantiles:', P , '->' , Q )\n" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAARwElEQVR4nO3df2xd533f8fdnlJywDVq5sf6RZEXKpmoxli4abpVuxlJg+SEFGyytSBBlyOAOAYxu9dYtmwZrBRrA/aNZNQzdH8YWo8lQrD9c1xEEYXPHZbW7f4ZkosIkmuxxVdTUJpkhKhxlw8LFkvLdH7yyrxjKPIxIHum57xdA6J7nec7ll0f3fHj4nIeXqSokSe36M30XIEnaWAa9JDXOoJekxhn0ktQ4g16SGrel7wKWu++++2rPnj19lyFJd5Vz5879aVVtX6nvjgv6PXv2MD093XcZknRXSfInt+pz6kaSGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjesU9EkOJ5lNcjHJYyv0fyLJC0m+muQPkrxtpO96ki8PP86sZ/GSpNWt+u6VSSaAJ4D3A3PA2SRnquqFkWEzwKCqvpPk7wK/Cnxk2LdYVe9a57olSR11uaI/CFysqktV9SrwFHBkdEBVPV9V3xlufgHYtb5lSpJ+UF3ej34n8PLI9hzw7jcY/3Hg90e235xkGrgGfKqqTi/fIckjwCMAu3fv7lCSdPc5PTPPyalZFq4ssmPbJMcP7efogZ19l6Ux0CXos0JbrTgw+RgwAH56pHl3VS0keTvwXJLzVfW1m56s6kngSYDBYLDic0t3s9Mz85w4dZ7Fq9cBmL+yyIlT5wEMe224LlM3c8D9I9u7gIXlg5K8D/hF4KGq+u6N9qpaGP57CfhD4MBt1CvdlU5Ozb4W8jcsXr3OyanZnirSOOkS9GeBfUn2JrkHOAbctHomyQHg0
yyF/DdH2u9N8qbh4/uAB4HRm7jSWFi4srimdmk9rRr0VXUNeBSYAl4Enq6qC0keT/LQcNhJ4C3A7y1bRvkOYDrJV4DnWZqjN+g1dnZsm1xTu7SeOv1x8Kp6Fnh2WdsvjTx+3y32+6/AO2+nQKkFxw/tv2mOHmBy6wTHD+3vsSqNi05BL+n23Ljh6qob9cGglzbJ0QM7DXb1wve6kaTGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjdvSdwHSRjs9M8/JqVkWriyyY9skxw/t5+iBnX2X1QuPxXgy6NW00zPznDh1nsWr1wGYv7LIiVPnAcYu4DwW48upGzXt5NTsa8F2w+LV65ycmu2pov54LMZXp6BPcjjJbJKLSR5bof8TSV5I8tUkf5DkbSN9Dyf5o+HHw+tZvLSahSuLa2pvmcdifK0a9EkmgCeADwIPAB9N8sCyYTPAoKp+AngG+NXhvj8GfBJ4N3AQ+GSSe9evfOmN7dg2uab2lnksxleXK/qDwMWqulRVrwJPAUdGB1TV81X1neHmF4Bdw8eHgM9X1StV9S3g88Dh9SldWt3xQ/uZ3DpxU9vk1gmOH9rfU0X98ViMry43Y3cCL49sz7F0hX4rHwd+/w329a6PNs2Nm4yuNPFYjLMuQZ8V2mrFgcnHgAHw02vZN8kjwCMAu3fv7lCS1N3RAzsNsyGPxXjqMnUzB9w/sr0LWFg+KMn7gF8EHqqq765l36p6sqoGVTXYvn1719olSR10CfqzwL4ke5PcAxwDzowOSHIA+DRLIf/Nka4p4ANJ7h3ehP3AsE2StElWnbqpqmtJHmUpoCeAz1bVhSSPA9NVdQY4CbwF+L0kAC9V1UNV9UqSX2bpmwXA41X1yoZ8JZKkFaVqxen23gwGg5qenu67DEm6qyQ5V1WDlfr8zVhJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1rlPQJzmcZDbJxSSPrdD/niRfSnItyYeW9V1P8uXhx5n1KlyS1M2W1QYkmQCeAN4PzAFnk5ypqhdGhr0E/CzwT1Z4isWqetc61CpJ+gGsGvTAQeBiVV0CSPIUcAR4Leir6uvDvu9tQI2SpNvQZepmJ/DyyPbcsK2rNyeZTvKFJEdXGpDkkeGY6cuXL6/hqSVJq+kS9FmhrdbwOXZX1QD4W8CvJfmz3/dkVU9W1aCqBtu3b1/DU0uSVtMl6OeA+0e2dwELXT9BVS0M/70E/CFwYA31SZJuU5egPwvsS7I3yT3AMaDT6pkk9yZ50/DxfcCDjMztS5I23qpBX1XXgEeBKeBF4OmqupDk8SQPAST5ySRzwIeBTye5MNz9HcB0kq8AzwOfWrZaR5K0wVK1lun2jTcYDGp6errvMiTprpLk3PB+6PfxN2MlqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklq3Ja+C1C7Ts/Mc3JqloUri+zYNsnxQ/s5emAtf1derfK1sbkMem2I0zPznDh1nsWr1wGYv7LIiVPnATyhx5yvjc3n1I02xMmp2ddO5BsWr17n5NRsTxXpTuFrY/MZ9NoQC1cW19Su8eFrY/MZ9NoQO7ZNrqld48PXxuYz6LUhjh/az+TWiZvaJrdOcPzQ/p4q0p3C18bm82asNsSNm2qurNByvjY2X6qq7xpuMhgManp6uu8yJOmukuRcVQ1W6nPqRpIaZ9BLUuMMeklqnEEvSY3rFPRJDieZT
XIxyWMr9L8nyZeSXEvyoWV9Dyf5o+HHw+tVuCSpm1WDPskE8ATwQeAB4KNJHlg27CXgZ4HfXrbvjwGfBN4NHAQ+meTe2y9bktRVlyv6g8DFqrpUVa8CTwFHRgdU1der6qvA95btewj4fFW9UlXfAj4PHF6HuiVJHXUJ+p3AyyPbc8O2Ljrtm+SRJNNJpi9fvtzxqSVJXXQJ+qzQ1vW3rDrtW1VPVtWgqgbbt2/v+NSSpC66BP0ccP/I9i5goePz386+kqR10CXozwL7kuxNcg9wDDjT8fmngA8kuXd4E/YDwzZJ0iZZNeir6hrwKEsB/SLwdFVdSPJ4kocAkvxkkjngw8Cnk1wY7vsK8MssfbM4Czw+bJMkbRLf1EySGuCbmknSGDPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNa5T0Cc5nGQ2ycUkj63Q/6Ykvzvs/2KSPcP2PUkWk3x5+PFv1rd8SdJqtqw2IMkE8ATwfmAOOJvkTFW9MDLs48C3qurPJTkG/HPgI8O+r1XVu9a5bklSR12u6A8CF6vqUlW9CjwFHFk25gjwG8PHzwDvTZL1K1OS9IPqEvQ7gZdHtueGbSuOqaprwLeBtw779iaZSfJfkvzVlT5BkkeSTCeZvnz58pq+AEnSG+sS9CtdmVfHMd8AdlfVAeATwG8n+ZHvG1j1ZFUNqmqwffv2DiVJkrpadY6epSv4+0e2dwELtxgzl2QL8KPAK1VVwHcBqupckq8BPw5M327hurXTM/OcnJpl4coiO7ZNcvzQfo4eWP5DmKRxOVe6XNGfBfYl2ZvkHuAYcGbZmDPAw8PHHwKeq6pKsn14M5ckbwf2AZfWp3St5PTMPCdOnWf+yiIFzF9Z5MSp85yeme+7NOmOMk7nyqpBP5xzfxSYAl4Enq6qC0keT/LQcNhngLcmucjSFM2NJZjvAb6a5Css3aT9uap6Zb2/CL3u5NQsi1ev39S2ePU6J6dme6pIujON07nSZeqGqnoWeHZZ2y+NPP5/wIdX2O9zwOdus0atwcKVxTW1S+NqnM4VfzO2MTu2Ta6pXRpX43SuGPSNOX5oP5NbJ25qm9w6wfFD+3uqSLozjdO50mnqRnePGysGxmElgXQ7xulcydIKyDvHYDCo6WlXX0rSWiQ5V1WDlfqcupGkxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TG+acE19npmfmx+NNkktbPRueGQb+OTs/Mc+LUeRavXgdg/soiJ06dBzDsJa1oM3LDqZt1dHJq9rX/rBsWr17n5NRsTxVJutNtRm4Y9Oto4crimtolaTNyw6BfRzu2Ta6pXZI2IzcM+nV0/NB+JrdO3NQ2uXWC44f291SRpDvdZuRGp6BPcjjJbJKLSR5bof9NSX532P/FJHtG+k4M22eTHFq3ypc5PTPPg596jr2P/Qce/NRznJ6Z36hPdUtHD+zkV37mnezcNkmAndsm+ZWfeac3YiXd0mbkRqrqjQckE8D/BN4PzAFngY9W1QsjY/4e8BNV9XNJjgF/s6o+kuQB4HeAg8AO4D8DP15V15d/nhsGg0FNT0+v6YtYftcalr4jGrKSxkWSc1U1WKmvyxX9QeBiVV2qqleBp4Ajy8YcAX5j+PgZ4L1JMmx/qqq+W1V/DFwcPt+6crWLJN1al6DfCbw8sj03bFtxTFVdA74NvLXjvrfN1S6SdGtdgj4rtC2f77nVmC77kuSRJNNJpi9fvtyhpJu52kWSbq1L0M8B949s7wIWbjUmyRbgR4FXOu5LVT1ZVYOqGmzfvr179UOudpGkW+sS9GeBf
Un2JrkHOAacWTbmDPDw8PGHgOdq6S7vGeDYcFXOXmAf8N/Wp/TXudpFkm5t1fe6qaprSR4FpoAJ4LNVdSHJ48B0VZ0BPgP8uyQXWbqSPzbc90KSp4EXgGvAz7/RipvbcfTAToNdklaw6vLKzfaDLK+UpHF3u8srJUl3MYNekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcXfc2xQnuQz8yW08xX3An65TOXc7j8XNPB6v81jcrIXj8baqWvFP9N1xQX+7kkzf6j2Zx43H4mYej9d5LG7W+vFw6kaSGmfQS1LjWgz6J/su4A7isbiZx+N1HoubNX08mpujlyTdrMUreknSCINekhrXTNAnOZxkNsnFJI/1XU+fktyf5PkkLya5kOQX+q6pb0kmkswk+fd919K3JNuSPJPkfwxfI3+575r6kuQfDc+R/57kd5K8ue+aNkITQZ9kAngC+CDwAPDRJA/0W1WvrgH/uKreAfwU8PNjfjwAfgF4se8i7hD/CviPVfXngb/ImB6XJDuBfwAMquovABPAsX6r2hhNBD1wELhYVZeq6lXgKeBIzzX1pqq+UVVfGj7+PyydyDv7rao/SXYBfx349b5r6VuSHwHeA3wGoKperaor/VbVqy3AZJItwA8BCz3XsyFaCfqdwMsj23OMcbCNSrIHOAB8sd9KevVrwD8Fvtd3IXeAtwOXgX87nMr69SQ/3HdRfaiqeeBfAC8B3wC+XVX/qd+qNkYrQZ8V2sZ+3WiStwCfA/5hVf3vvuvpQ5K/AXyzqs71XcsdYgvwl4B/XVUHgP8LjOU9rST3svST/15gB/DDST7Wb1Ubo5WgnwPuH9neRaM/gnWVZCtLIf9bVXWq73p69CDwUJKvszSl99eS/Ga/JfVqDpirqhs/4T3DUvCPo/cBf1xVl6vqKnAK+Cs917QhWgn6s8C+JHuT3MPSDZUzPdfUmyRhaQ72xar6l33X06eqOlFVu6pqD0uvi+eqqsmrti6q6n8BLyfZP2x6L/BCjyX16SXgp5L80PCceS+N3pje0ncB66GqriV5FJhi6c75Z6vqQs9l9elB4G8D55N8edj2z6rq2R5r0p3j7wO/NbwougT8nZ7r6UVVfTHJM8CXWFqpNkOjb4XgWyBIUuNambqRJN2CQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIa9/8BRCDF1LgPBoMAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# For discrete distribution these rules change a bit , the pdf function is replaced by pmf:\n", "X = np.arange(0,10)\n", "plt.scatter( X , stats.binom.pmf( X , n = 10 , p = 0.5 ) ) # binomial distribution with 10 draws and a 0.5 probability of success\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## statistical tests\n", "\n", "`scipy.stats` implements a number of statistical tests as functions.\n", "\n", "Most return two values : the computed test statistic and the p-value.\n", "\n", "We will only demonstrate a couple tests here.\n", "You can get a more in-depth explaination and demonstration of scipy.stats tests [there](https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/)\n", "\n" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "We reject the hypothesis of equality of means(H0). p-value : 0.0011955595116142342\n" ] } ], "source": [ "#Imagine we have two samples of measurement drawn from 2 sub-population :\n", "sample1 = stats.norm.rvs(size = 93 , loc = 173 , scale = 20)\n", "sample2 = stats.norm.rvs(size = 132 , loc = 181 , scale = 20)\n", "\n", "# we perform a t-test to test the equality of the means\n", "statistic , pValue = stats.ttest_ind(sample1 , sample2) \n", "significanceThreshold = 0.05\n", "if pValue < significanceThreshold:\n", " print( \"We reject the hypothesis of equality of means(H0). p-value :\" , pValue )\n", "else:\n", " print( \"We do not reject the hypothesis of equality of means(H0). 
p-value :\" , pValue )\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`stats.ttest_ind` has an `equal_var` parameter that can be set to `False` to perform Welch's t-test, which is warranted when the variances of the two sub-populations cannot be assumed equal.\n", "\n", "> In general, these functions are very well documented, with details on each test and usage examples. We heartily recommend that any would-be user read their `help()`." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Chi-square test of independence of variables\n", "stat=14.555, degree of freedom=3 , p-value=0.002\n" ] } ], "source": [ "# Example of the Chi-Squared Test\n", "# imagine you count different cell types in two biopsies and report the counts in two arrays:\n", "biopsy1 = np.array([135 , 423 , 24 , 72])\n", "biopsy2 = np.array([184 , 552 , 77 , 101])\n", "\n", "table = [biopsy1 , biopsy2]\n", "\n", "stat, pValue, degreeOfFreedom, expectedValues = stats.chi2_contingency(table)\n", "print('Chi-square test of independence of variables')\n", "print('stat=%.3f, degree of freedom=%i , p-value=%.3f' % (stat, degreeOfFreedom, pValue))\n", "# here the two biopsies seem to differ significantly in their composition\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Statistical modelling and regression\n", "\n", "`scipy` implements methods to fit a model to some data. \n", "\n", "`scipy.stats` provides a simple linear regression function between two variables,\n", "while `scipy.optimize` implements functions to fit (non-linear) models to data." 
] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "slope: 1.618178 intercept: -13.018001\n", "R-squared: 0.526027\n", "p-value for the slope: 0.000006\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXkAAAD4CAYAAAAJmJb0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3df3QU9b3/8ee7GCH+KAFFCwkC3uuNIj8SjFSlV6RcRJEi0t4Wj9Zo24ui1tp7L4rtbbW1HrCgUnvVFq1gW9vqxRipilSQHmtVNIgNCPIVFWsCBUSDKEFD/Hz/2An54e4mm93ZnZl9Pc7JSXZmsvPZzPLis+/PzGfMOYeIiETTZ3LdABER8Y9CXkQkwhTyIiIRppAXEYkwhbyISIQdlOsGtHXkkUe6wYMH57oZIiKhsmbNmnecc/3irQtUyA8ePJiamppcN0NEJFTM7K1E61SuERGJMIW8iEiEKeRFRCIsUDX5eJqamqirq2Pfvn25bkpe69WrFyUlJRQUFOS6KSKSgsCHfF1dHYcffjiDBw/GzHLdnLzknGPXrl3U1dUxZMiQXDdHRFIQ+JDft2+fAj7HzIwjjjiCnTt35ropIpFTvbaeecs3sbWhkQFFhcyaWMrU8uKMPX/gQx5QwAeAjoFI5lWvree6qnU0NjUDUN/QyHVV6wAyFvQaeBURyZF5yzcdCPgWjU3NzFu+KWP7UMhn0KRJk2hoaEi6zQ9/+ENWrFjRref/85//zOTJkzvd7owzzuj0orIFCxawd+/ebrVDRDJja0NjSsu7IxTlmlT4Xd+KxzmHc47HH3+8021//OMf+9qWrlqwYAEXXnghhxxySK6bIpK3BhQVUh8n0AcUFWZsH5HqybfUt+obGnG01req19an9by33norw4YNY9iwYSxYsACALVu2cMIJJ3D55ZczatQo3n77bQYPHsw777wDwI033sjxxx/PhAkTOP/885k/fz4AF198MUuWLAFi0zhcf/31jBo1iuHDh/Pqq68C8MILL3DaaadRXl7OaaedxqZNyT+6NTY2Mn36dEaMGMHXvvY1Ghtb3zQzZ86koqKCE088keuvvx6A22+/na1btzJu3DjGjRuXcDsR8desiaUUFvRot6ywoAezJpZmbB+R6sknq291tze/Zs0aFi1axOrVq3HO8fnPf56xY8fSp08fNm3axKJFi7jzzjvb/U5NTQ0PPfQQa9euZf/+/YwaNYqTTjop7vMfeeSRvPTSS9x5553Mnz+fe+65h+OPP56nn36agw46iBUrVvC9732Phx56KGEb77rrLg455BBqa2upra1l1KhRB9bddNNN9O3bl+bmZsaPH09tbS1XXXUVt956K6tWreLII49MuN2IESO69TcTka5pyaW8P7umq/yobz3zzDOcd955HHrooQBMmzaNv/zlL0yZMoVBgwZxyimnxP2dc889l8LC2EeuL33pSwmff9q0aQCcdNJJVFVVAbB7924qKyt57bXXMDOampqStvHpp5/mqquuAmDEiBHtwvnBBx9k4cKF7N+/n23btrFhw4a44d3V7UQks6aWF/taUo5UuSZRHSud+layG523BH8qv9NRz549AejRowf79+8H4Ac/+AHjxo1j/fr1/PGPf+zS1b7xTnF88803mT9/PitXrqS2tpZzzjkn7nN1dTsRCZ8uh7yZDTSzVWa20cxeMbPveMv7mtmTZv
aa972Pt9zM7HYz22xmtWY2Kvke0udHfev000+nurqavXv38uGHH/Lwww/zr//6r0l/5wtf+MKBcP7ggw947LHHUtrn7t27KS6O/c++ePHiLrXx/vvvB2D9+vXU1tYC8P7773PooYfSu3dvtm/fzrJlyw78zuGHH86ePXs63U5Ewi2Vcs1+4L+ccy+Z2eHAGjN7ErgYWOmcm2tms4HZwLXA2cBx3tfngbu8777xo741atQoLr74YkaPHg3At771LcrLy9myZUvC3zn55JOZMmUKI0eOZNCgQVRUVNC7d+8u7/Oaa66hsrKSW2+9lS9+8Yudbj9z5kwuueQSRowYQVlZ2YG2jhw5kvLyck488USOPfZYxowZc+B3ZsyYwdlnn03//v1ZtWpVwu1EJNwsldJCu180ewT4X+/rDOfcNjPrD/zZOVdqZr/0fv69t/2mlu0SPWdFRYXreH73xo0bOeGEE7rVxlz64IMPOOyww9i7dy+nn346CxcubDcgGkZhPRYiUWdma5xzFfHWdWvg1cwGA+XAauDoluD2gv4ob7Ni4O02v1bnLWsX8mY2A5gBcMwxx3SnOYE0Y8YMNmzYwL59+6isrAx9wItIOKUc8mZ2GPAQcLVz7v0kc5rEW/Gpjw3OuYXAQoj15FNtT1D97ne/y3UTRERSO7vGzAqIBfz9zrkqb/F2r0yD932Ht7wOGNjm10uArek1V0REUpHK2TUG/ArY6Jy7tc2qpUCl93Ml8Eib5Rd5Z9mcAuxOVo8XEZHMS6VcMwb4OrDOzF72ln0PmAs8aGbfBP4O/Lu37nFgErAZ2AtckpEWi4hIl3U55J1zzxC/zg4wPs72Driim+0SEZEMiNQVr365/fbbOeGEE7jgggtYunQpc+fOBaC6upoNGzYc2G7x4sVs3ZrasMOWLVsYNmxY0uU1NTUHpi0QEUlFpOau8cudd97JsmXLDtzfdMqUKUAs5CdPnszQoUOBWMgPGzaMAQMGZHT/FRUVVFTEPQVWRCQp9eQ7cdlll/HGG28wZcoUbrvtNhYvXsyVV17Js88+y9KlS5k1axZlZWXcfPPN1NTUcMEFF1BWVkZjYyNr1qxh7NixnHTSSUycOJFt22LjzmvWrGHkyJGceuqp3HHHHZ22oe3NQm644Qa+8Y1vcMYZZ3Dsscdy++23H9jut7/9LaNHj6asrIxLL72U5ubmRE8pInkiXD35q6+Gl1/ufLtUlJWBN0d8PL/4xS944oknDkzL2zKXzGmnncaUKVOYPHkyX/nKVwBYtmwZ8+fPp6KigqamJr797W/zyCOP0K9fPx544AG+//3vc++993LJJZfw85//nLFjxzJr1qyUm/zqq6+yatUq9uzZQ2lpKTNnzmTz5s088MAD/PWvf6WgoIDLL7+c+++/n4suuqhbfxYRiYZwhXyIbNq0ifXr1zNhwgQAmpub6d+/P7t376ahoYGxY8cC8PWvfz3lCcHOOeccevbsSc+ePTnqqKPYvn07K1euZM2aNZx88slA7EYiRx11VCfPJCJRF66QT9LjDhrnHCeeeCLPPfdcu+UNDQ1xpwVORcv0xNA6RbFzjsrKSubMmZPWc4tItKgmn4a20/V2fFxaWsrOnTsPhHxTUxOvvPIKRUVF9O7dm2eeeQbgwBTB6Ro/fjxLlixhx47YBcfvvvsub731VkaeW0TCSyGfhunTpzNv3jzKy8t5/fXXufjii7nssssoKyujubmZJUuWcO211zJy5EjKysp49tlnAVi0aBFXXHEFp5566oG7R6Vr6NCh/OQnP+HMM89kxIgRTJgw4cBAr4jkr25PNeyHKE01HEU6FiLBlGyqYfXkRUQiTCEvIhJhoQj5IJWU8pWOgUg4Bf4Uyl69erFr1y6OOOKItE89lO5xzrFr1y569eqV66aIhFL12vqM3ns6FYEP+ZKSEurq6ti5c2eum5LXevXqRUlJSa6bIRI61Wvrua5qHY1NsWlG6hsaua5qHUBWgj7wIV
9QUHBgYjARkbCZt3zTgYBv0djUzLzlm7IS8qGoyYuIhNXWhsaUlmeaQl5ExEcDiuJf8JhoeaYp5EVEfDRrYimFBT3aLSss6MGsiaVZ2X/ga/IiImHWUnfX2TUiIhE1tbw4a6Hekco1IiIRppAXEYkwlWtERHyUy6tdQSEvIuKbXF/tCirXiIj4JtnVrtmikBcR8Umur3YFhbyIiG9yfbUrKORFRHyT66tdIYWQN7N7zWyHma1vs+wGM6s3s5e9r0lt1l1nZpvNbJOZTcx0w0VEgm5qeTFzpg2nuKgQA4qLCpkzbXhgz65ZDPwv8OsOy29zzs1vu8DMhgLTgROBAcAKM/sX51wzIiJ5JJdXu0IKPXnn3NPAu13c/FzgD865j5xzbwKbgdHdaJ+IiKQhEzX5K82s1ivn9PGWFQNvt9mmzlv2KWY2w8xqzKxGd38S+bTqtfWMmfsUQ2Y/xpi5T1G9tj7XTYqMQPxt//EPuOgiWLfOl6dPN+TvAv4JKAO2Abd4y+PdjDXunaCdcwudcxXOuYp+/fql2RyRaGm5mKa+oRFH68U0Cvr05fRv+8knMHcumEH//vCb38A11/iyq7SueHXObW/52czuBh71HtYBA9tsWgJsTWdfIvko17eOi7Lu/m07m6Yg6foXX4Tx42HPnvZPesst8N3vZuy1tZVWT97M+rd5eB7QcubNUmC6mfU0syHAccAL6exLJB8F4WKaqOrO37az3n+89b+8e1msx24Go0e3Bvz48bB9OzgH//mfsfU+6HJP3sx+D5wBHGlmdcD1wBlmVkasFLMFuBTAOfeKmT0IbAD2A1fozBqR1A0oKqQ+Tuhk82KaTMn1RF0ddedv21nvv+36LTdPjv8ky5fDmWd2v+Ep6nLIO+fOj7P4V0m2vwm4qTuNEpGYWRNL201wBdm/mCYTgjBRV0fd+dt21vu/4VffY8Lm1fF/+YMP4NBDu9/gbtIVryIBFoSLaTIhCBN1ddSdv228Xn6/D97lzZsng9mnAv7ek6Yw+NpHGTNnZU4CHjTVsEjg5fpimkwI6thCqn/btr3/hOUYYPC1jx74OdefvBTykhNBq8+Kv6IytjB1xe+Y+pMEpzo+9RSMG0f12nqKA/TeVshL1gWxPiv+CvXYQlMTHHxw4vWu/SVAQfvkpZq8ZF0Q67Pir1COLbSc9hgv4BsbY+Hu4l7jGSjqyUvWBbU+K/4KWg83rj/9CSYmmDR39myYMye77ckAhbxkXVTqsxIhSS5EGnLto6219Sw2KVMU8pJ1oa7PSnT06QMNDXFXrfy/lVz5t48jMW6kmrxkXSjrsxIN9fWttfZ4Ae/V2X+4mciMG6knLzkRivqsREeyeWE++eRT66M0bqSevERGIOYGl+CYNau1197RLbe0nh0TZ30QbsCdKerJSyRk6tx7XaQVcs3NcFCSWOviKY9RGjdST14iIRPn3usGHSHW0mOPF/C7dqV8TnuUxo3Uk5dIyEQNVTfoCJnnn4dTT42/bsQI+Nvf0nr6qIwbKeQlJUEtZ2Ti3PsoDbZFWrJB1BBcgZptKtdIlwW5nDFrYimFBT3aLUu1hhqlwbbIOeusxIOoq1eHZoqBXFDIS5cFec6ZTNRQM/EfhWTQe++1Bvvy5Z9e3xLso0dnv20honKNdFnQyxnp1lBbfjeI5ai8kqwc09wMn1HfNBUKeemyfJhzJiqDbaGzcCFcemn8dfPmwX//d3bbEyEKeemyKJ07LAHgXPJeuWrsGaGQly5TOUMyIlk5Zvt2OOqo7LUlDyjkJSUqZ0i3rFsXO3c9nooKePHF7LYnjyjkRcQ/Oqc95zRMLSKZdeGFic9pX7FC57RnmXryIpK+Dz+Eww5LvF6hnjMKeRHpvmTlmI8/hoKC7LVF4lK5RkRS88ADicsx117bWo5RwAdCl3vyZnYvMBnY4Zwb5i3rCzwADAa2AF91zr1nZgb8DJgE7AUuds69lNmmi0hWaRA1lFLpyS8GzuqwbDaw0j
l3HLDSewxwNnCc9zUDuCu9ZopITvTtm7jXvmWLBlFDoMsh75x7Gni3w+Jzgfu8n+8DprZZ/msX8zxQZGb9022siGTB66+3Bvt777VfN3Bga7APGpSb9klK0h14Pdo5tw3AObfNzFouVSsG3m6zXZ23bFvHJzCzGcR6+xxzzDFpNkdEuk3lmEjya+A13rsl7rvEObfQOVfhnKvo16+fT80RkbiuvjpxOaaqSuWYCEi3J7/dzPp7vfj+wA5veR0wsM12JcDWNPclIpnw8cfQs2fi9Qr1SEm3J78UqPR+rgQeabP8Ios5BdjdUtYRkRxp6bHHC/i9e9Vrj6guh7yZ/R54Dig1szoz+yYwF5hgZq8BE7zHAI8DbwCbgbuByzPaahHpmmXLEpdjvvWt1mAvjM49AaS9LpdrnHPnJ1g1Ps62Driiu40SkTRpEFU8uuJVJCqGDk3ca9+4UeWYPKW5a0TCbOtWKE4wv/9BB0FTU3bbI4GjkBcJo2TlmE8+Sb5e8orKNSJhceONicsxixa1lmMU8NKGevIiQdbcHCu7JKIau3RCPXmRIGrpsccL+N27NYgqXaaQFwmKJ59MXI45+eTWYP/sZ7PfNgktlWtEck3ntIuP1JMXyYV//ufEvfZnnlE5RjJGPXmRbNmxA44+OvF6hbr4QCEv4rdk5ZjmZviMPlCLf/TuEvHDTTclLsf84Aet5RgFvPhMPXmRTOkstFWOkRxQN0IkXS099ngBv22bBlElpxTyIt2xenXickxJSWuwf+5z2W+bSBsq14ikQue0S8ioJ++D6rX1jJn7FENmP8aYuU9RvbY+102SdIwdm7jX/vjjKsdIoKknn2HVa+u5rmodjU3NANQ3NHJd1ToAppYnmPdbguf996F378TrFeoSEurJZ9i85ZsOBHyLxqZm5i3flKMWSUpaeuzxAr6pSb12CR2FfIZtbWhMabkEwIIFicsxM2e2BnuyKX9DQGXE/BTud20ADSgqpD5OoA8oKsxBa/xVvbaeecs3sbWhkQFFhcyaWBquklQeDaKqjJi/1JPPsFkTSyks6NFuWWFBD2ZNLM1Ri/zREhr1DY04WkMj8L3Dlh57vIB//fXIlmNURsy+oHxyUshn2NTyYuZMG05xUSEGFBcVMmfa8Mj1lkIVGrW1iYMdWoP92GOz264sUhkxu4LUCVK5xgdTy4sjF+odhSI08qgc05l8KiMGQbJOULazQT156ZZE4ZDz0PjSlxL32u+7L7LlmM7kSxkxKILUCVJPXrpl1sTSdgN5kMPQ2LcPCpP855KHod5RS+8x1APlIRKkT04KeemWQIRGsnLM3r3Jgz8P5UMZMSiC1AlSyEu35SQ0qqrgy1+Ov66yEhYvzmpzROIJRCfIk5GQN7MtwB6gGdjvnKsws77AA8BgYAvwVefce5nYn+QhDaJKyATlk1MmB17HOefKnHMV3uPZwErn3HHASu+xSNf17Zt4EHXjxrwdRBVJhZ9n15wL3Of9fB8w1cd9SVTU17cG+3txPvi1BPvxx2e/bSIhlKmQd8CfzGyNmc3wlh3tnNsG4H0/Kt4vmtkMM6sxs5qdO3dmqDkSOi3BXlLy6XUtwa5eu0jKMhXyY5xzo4CzgSvM7PSu/qJzbqFzrsI5V9GvX78MNUdC4X/+J3E55p57FOwiGZCRgVfn3Fbv+w4zexgYDWw3s/7OuW1m1h/YkYl9Scjt3w8FBYnXK9RFMirtnryZHWpmh7f8DJwJrAeWApXeZpXAI+nuS0KspcceL+A/+CCveu1BmbhK8kMmevJHAw9b7CP3QcDvnHNPmNmLwINm9k3g78C/Z2BfgRT6KXf9smIFTJgQf92kSfDYY9ltTwBoyl/JtrRD3jn3BjAyzvJdwPh0nz/o9I82Dp3TnlCQJq6S/KAJytIUqil3/TR8eOJB1JqavCrHJBOkiaskP2hagzTl9T/aHTvg6KMTr1eof0qQJq6S/KCefJoCO+Wun1p67PEC/pNP1GtPQl
P+SrYp5NOUN/9of/nLxOWY+fNbgz1ZPV4S3jkM0Bk34guVa9IUpNnmMs45+EySfoB6693SceIqDd6LnxTyGRCU2eYyJllv/N13oU+f7LUlD+iMG/GTyjUS87e/JS7HTJ3aWo5RwGdcXg/ei+/Uk893Oqc953TGjfhJPfl8VFmZuNe+erXOjsmyvBm8l5xQTz5f7NkDn/1s4vUK9ZyJ9OC95JxCPuqSlWP274cePRKvl6yJ3OC9BIbKNVH06KNdO6ddAS8SeerJR4XOaReRONSTD7vTT4/12OMF/LZtGkQVyXPqyYfRW2/B4MHx1114IfzmN1ltjkgu6D4OXZPXIR+6N4nOaRcBNBVEKvK2XNPyJqlvaMTR+iYJ3MRQP/1p4kHU558PdTlGt8GT7tJ9HLoub3vygZ4v5KOPoFev+OsOPji2PuTUE5N0aCqIrsvbnnwg3yQtPfZ4Af/xx7EeewQCHtQTk/Tk5X0cuilvQz4wb5K//CVxOeZnP2stxxQUZLddPgvkf7ISGpoKouvytlwza2Jpu3IBZPlNkueDqJqUS9KhqSC6Lm9DPidvki9/Gaqq4q+rr4cBA/zbd8Dk/D9ZCT1NBdE1eRvykKU3yfbt8LnPxV83aRI89pi/+w8o9cREsiOvQ95XeV6O6Qr1xET8l7cDr76oqko8iLpyZajPaReRcFJPPl2ffJJ8NkeFuojkkHry3fXd78Z67PECft8+9dpFJBB8D3kzO8vMNpnZZjOb7ff+fPXOO63lmAUL2q979NHWYO/ZMzftExHpwNdyjZn1AO4AJgB1wItmttQ5t8HP/XaU9kRkiU59HDAgduqjiEhA+V2THw1sds69AWBmfwDOBbIW8t2eI+Xpp2Hs2Pjrdu9Ofr9UEZGA8LtcUwy83eZxnbfsADObYWY1Zlazc+fOjDcgpTlSPvqotRzTMeCrqlrLMQp4EQkJv0M+3sni7UYjnXMLnXMVzrmKfv36ZbwBXZoj5Uc/ij8x2Mknx86ecQ7OOy/jbRMR8Zvf5Zo6YGCbxyXAVp/32U6iOVJO3v9u4guWtmyBQYP8bZiISBb43ZN/ETjOzIaY2cHAdGCpz/tsp91sdc7x5D0z2XLzZB685aL2G958c2s5RgEvIhHha0/eObffzK4ElgM9gHudc6/4uc+OppYXc/jmVxn/1bPjb/Dxx5GbxldEpIXvV7w65x4HHvd7P5+ybx9cdRXcfTfjO6577jk45ZSsNylVobsHrYgETvSmNXj4YZg27dPLH3oo/vKA0u3xRCQTojGtgXNw5pmxgdS2Qf4f/wF798bWhyjgQbfHE5HMiEZPfu1aePLJ2M/FxfDEEzBsWG7blCbdHk9EMiEaIV9eDn//O5SUJJ/HPUTy8fZ4GoMQybxolGvMYODAyAQ85N+NilvGIOobGnG0jkFUr9XcQCLpiEZPPoK6e3u8sPaGk41BhKH9IkGlkA+wVG+PF+YzcjQGIeKPaJRrBAj3GTmJxhqiPAYhkg0K+QgJc28438YgRLJFIR8hYe4NTy0vZs604RQXFWJAcVEhc6YND3yZSSToVJOPkFkTS9vV5CFcveFUxyBEpHMK+Qjp7hk5IhJdCvmIUW9YRNpSTV5EJMIU8iIiEaaQFxGJMIW8iEiEKeRFRCJMIS8iEmEKeRGRCFPIi4hEmEJeRCTCFPIiIhGmaQ0iIqx3hBIRfynkIyDMd4QSEX+pXBMBYb4jlIj4SyEfAWG+I5SI+EshHwFhviOUiPgrrZA3sxvMrN7MXva+JrVZd52ZbTazTWY2Mf2mSiK6P6qIJJKJgdfbnHPz2y4ws6HAdOBEYACwwsz+xTnXHO8JJD26I5SIJOLX2TXnAn9wzn0EvGlmm4HRwHM+7S/v6Y5QIhJPJmryV5pZrZnda2Z9vGXFwNtttqnzln2Kmc0wsxozq9m5c2cGmiMiIi067cmb2Qrgc3FWfR+4C7gRcN73W4BvABZnexfv+Z1zC4GFABUVFXG3SUYXAYmIJN
ZpyDvn/q0rT2RmdwOPeg/rgIFtVpcAW1NuXSd0EZCISHLpnl3Tv83D84D13s9Lgelm1tPMhgDHAS+ks694cnERUPXaesbMfYohsx9jzNynqF5b79u+RETSle7A60/NrIxYKWYLcCmAc+4VM3sQ2ADsB67w48yabF8EpE8OIhI2aYW8c+7rSdbdBNyUzvN3ZkBRIfVxAt2vi4CSfXJQyItIEIX6itdsXwSk6QNEJGxCHfJTy4uZM204xUWFGFBcVMicacN961Vr+gARCZvQTzWczYuAZk0sbVeTB00fICLBFvqQzyZNHyAiYaOQT5GmDxCRMAl1TV5ERJJTyIuIRJhCXkQkwhTyIiIRppAXEYkwcy7l2X19Y2Y7gbe6uPmRwDs+Nieo9Lrzi153/unOax/knOsXb0WgQj4VZlbjnKvIdTuyTa87v+h1559Mv3aVa0REIkwhLyISYWEO+YW5bkCO6HXnF73u/JPR1x7amryIiHQuzD15ERHphEJeRCTCQhnyZnaWmW0ys81mNjvX7fGLmQ00s1VmttHMXjGz73jL+5rZk2b2mve9T67b6gcz62Fma83sUe/xEDNb7b3uB8zs4Fy3MdPMrMjMlpjZq95xPzUfjreZfdd7j683s9+bWa8oHm8zu9fMdpjZ+jbL4h5fi7ndy7laMxvVnX2GLuTNrAdwB3A2MBQ438yG5rZVvtkP/Jdz7gTgFOAK77XOBlY6544DVnqPo+g7wMY2j28GbvNe93vAN3PSKn/9DHjCOXc8MJLY64/08TazYuAqoMI5NwzoAUwnmsd7MXBWh2WJju/ZwHHe1wzgru7sMHQhD4wGNjvn3nDOfQz8ATg3x23yhXNum3PuJe/nPcT+wRcTe733eZvdB0zNTQv9Y2YlwDnAPd5jA74ILPE2idzrNrPPAqcDvwJwzn3snGsgD443sXtbFJrZQcAhwDYieLydc08D73ZYnOj4ngv82sU8DxSZWf9U9xnGkC8G3m7zuM5bFmlmNhgoB1YDRzvntkHsPwLgqNy1zDcLgGuAT7zHRwANzrn93uMoHvdjgZ3AIq9MdY+ZHUrEj7dzrh6YD/ydWLjvBtYQ/ePdItHxzUjWhTHkLc6ySJ8HamaHAQ8BVzvn3s91e/xmZpOBHc65NW0Xx9k0asf9IGAUcJdzrhz4kIiVZuLxatDnAkOAAcChxEoVHUXteHcmI+/5MIZ8HTCwzeMSYGuO2uI7MysgFvD3O+eqvMXbWz62ed935Kp9PhkDTDGzLcTKcV8k1rMv8j7OQzSPex1Q55xb7T1eQiz0o368/w140zm30znXBFQBpxH9490i0fHNSNaFMeRfBI7zRt4PJjZAszTHbfKFV4f+FbDROXdrm1VLgUrv50rgkWy3ze8cM74AAADwSURBVE/OueuccyXOucHEju9TzrkLgFXAV7zNovi6/wG8bWal3qLxwAYifryJlWlOMbNDvPd8y+uO9PFuI9HxXQpc5J1lcwqwu6WskxLnXOi+gEnA/wNeB76f6/b4+Dq/QOzjWS3wsvc1iVh9eiXwmve9b67b6uPf4AzgUe/nY4EXgM3A/wE9c90+H15vGVDjHfNqoE8+HG/gR8CrwHrgN0DPKB5v4PfExh2aiPXUv5no+BIr19zh5dw6YmcfpbxPTWsgIhJhYSzXiIhIFynkRUQiTCEvIhJhCnkRkQhTyIuIRJhCXkQkwhTyIiIR9v8BbPFPehbGb/gAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "\n", "x = stats.uniform.rvs(size=30 , loc=0 , scale=100) # generating X\n", "y = 1.6*x + stats.norm.rvs(size=30 , loc=0 , scale=50) # Y = 1.6 * X + some noise\n", " \n", "# Perform the linear regression:\n", " \n", "slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)\n", "print(\"slope: %f intercept: %f\" % (slope, intercept))\n", "print(\"R-squared: %f\" % r_value**2)\n", "print(\"p-value for the slope: %f\" % p_value)\n", "\n", "# Plot the data along with the fitted line:\n", " \n", "plt.plot(x, y, 'o', label='original data')\n", "plt.plot(x, intercept + slope*x, 'r', label='fitted line')\n", "plt.legend()\n", "plt.show()\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "parameter estimates : [5.33171291 0.18878757 0.21927505 0.08021771]\n", "parameter standard deviation : [0.26920328 0.02583608 0.29269083 0.02316083]\n", "\n", "relative estimation error : [0.06634258 0.05606215 0.56144989 0.1978229 ]\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "from scipy.optimize import curve_fit\n", "\n", "\n", "## example with 2 explanatory variables\n", "\n", "\n", "# we define the model as a function.\n", "# here it is a form of exponential decay model \n", "# where variable1 is the time and the rate of decay changes with variable2\n", "# the model function takes as argument the 2 explanatory variable (grouped in a single tuple), and the 4 parameters\n", "def func(X , a, b, c , d):\n", " x0 , x1 = X\n", " return a * np.exp( -(b + d*x1 ) * x0 ) + c \n", "\n", "#realParameter values\n", "realParams = [ 5, 0.2, 0.5, 0.1 ]\n", "\n", "# we simulate some data, with some noise\n", "n=50 # number of points\n", "variable1 = stats.uniform.rvs(size =n , loc = 0 , scale = 10 ) # explanatory variable number 1 : some uniform variable\n", "variable2 = stats.bernoulli.rvs(size=n , p=0.5) # explanatory variable number 2 : can be 0 or 1\n", "\n", "y = func( (variable1 , variable2) , realParams[0], realParams[1], realParams[2], realParams[3])\n", "y_noise = stats.norm.rvs(size=n , scale = 0.4) \n", "ydata = y+y_noise\n", "\n", "\n", "popt, pcov = curve_fit(func, (variable1 , variable2), ydata)\n", "perr = np.sqrt(np.diag(pcov))\n", "print('parameter estimates :',popt)\n", "print('parameter standard deviation :',perr)\n", "print('\\nrelative estimation error :',np.abs(popt - realParams)/realParams )\n", "\n", "\n", "plt.scatter(variable1, ydata, c=variable2 , label='data')\n", "\n", "x = np.linspace(min(variable1) , max(variable1) , 100)\n", "plt.plot(x, func( ( x , np.zeros(100) ) , *popt), '--' , label='fit for variable2==0')\n", "plt.plot(x, func( ( x , np.ones(100) ) , *popt), '--' , label='fit for variable2==1')\n", "\n", "plt.xlabel('x')\n", "plt.ylabel('y')\n", "plt.legend()\n", "plt.show()\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You can find most of what was 
discussed here and more in the [official tutorial](https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Load the data in the file \"sample_data.tsv\" as a NumPy array\n", "2. Log-transform the data\n", "3. Find the row-wise means for the replicates of Sample1 and Sample2\n", "4. Find the row-wise standard deviations in the same way as the means\n", "5. Use the function *scipy.stats.ttest_ind* to calculate a p-value for every row\n", "6. Select the p-values which are smaller than $10^{-2}$\n", "7. Print how many p-values below $10^{-2}$ are found" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1.19127275e-02 7.65848100e-04 2.37491470e-05]\n" ] } ], "source": [ "# use of ttest_ind function\n", "import scipy.stats as sps\n", "\n", "# two arrays of random numbers\n", "a = np.random.randn(3,5) * 3 + 15\n", "b = np.random.randn(3,8) * 2 + 5\n", "\n", "print(sps.ttest_ind(a, b, axis=1, equal_var=False).pvalue)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sample1.1\tsample1.2\tsample1.3\tsample1.4\tsample2.1\tsample2.2\tsample2.3\tsample2.4\r\n", "6.801411216875708305e+02\t5.565263845884210241e+02\t2.525828159262368047e+02\t2.956703139364956314e+02\t9.604893343325777550e+02\t4.193181331491081210e+02\t3.126077132320110081e+02\t8.246976543530363415e+02\r\n", "5.874671746889415545e+02\t4.128408625270991479e+02\t4.378621470916139060e+02\t6.736096888155080933e+02\t7.240856598618292992e+02\t4.807175238900777003e+02\t4.731265298940686534e+02\t5.216420065104225614e+02\r\n" ] } ], "source": [ "!head -3 test_data.tab" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { 
"anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 1 }