{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercises with pandas\n", "\n", "### Exercise 01\n", "\n", "Create a pandas dataframe where the index consists of the integers from 1-100. \n", "Remove a few rows from the dataframe. Slice the first rows using `.iloc` and `.loc` and make sure that you get the expected behavior.\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
index
index
11
22
33
44
55
\n", "
" ], "text/plain": [ " index\n", "index \n", "1 1\n", "2 2\n", "3 3\n", "4 4\n", "5 5" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.DataFrame( {'index': range(1,100)} )\n", "df.index= df['index']\n", "\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
index
index
2020
2121
2222
2323
2424
2525
2626
2727
2828
2929
3030
3131
3232
\n", "
" ], "text/plain": [ " index\n", "index \n", "20 20\n", "21 21\n", "22 22\n", "23 23\n", "24 24\n", "25 25\n", "26 26\n", "27 27\n", "28 28\n", "29 29\n", "30 30\n", "31 31\n", "32 32" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[20:32,] #the index will go from 20 to 32 because we select by index" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
index
index
2121
2222
2323
2424
2525
2626
2727
2828
2929
3030
3131
3232
\n", "
" ], "text/plain": [ " index\n", "index \n", "21 21\n", "22 22\n", "23 23\n", "24 24\n", "25 25\n", "26 26\n", "27 27\n", "28 28\n", "29 29\n", "30 30\n", "31 31\n", "32 32" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[20:32,] # the index will go from 21 to 32 because we select by position" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 02\n", "\n", "Grab the most recent data on:\n", " * [number of hospitalised person](https://raw.githubusercontent.com/daenuprobst/covid19-cases-switzerland/master/covid19_hospitalized_switzerland_openzh.csv)\n", " * [canton demographics](https://raw.githubusercontent.com/daenuprobst/covid19-cases-switzerland/master/demographics.csv)\n", "\n", "1. download and read these two data files as pandas `DataFrame`" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateAGAIARBEBLBSFRGEGL...VS_diff_pcZG_diffZG_pcZG_diff_pcZH_diffZH_pcZH_diff_pcCH_diffCH_pcCH_diff_pc
02020-02-25NaNNaNNaNNaNNaNNaNNaN0.0NaN...NaNNaNNaNNaNNaNNaNNaNNaN0.000000e+00NaN
12020-02-26NaNNaNNaNNaNNaNNaNNaN1.0NaN...NaNNaNNaNNaNNaNNaNNaN1.01.170767e-071.170767e-07
22020-02-27NaNNaNNaNNaNNaN3.0NaN1.0NaN...NaNNaNNaNNaNNaNNaNNaN4.05.853834e-074.683068e-07
32020-02-28NaNNaNNaNNaNNaN3.0NaN2.0NaN...NaNNaNNaNNaNNaNNaNNaN12.01.990304e-061.404920e-06
42020-02-29NaNNaNNaNNaN1.04.00.03.0NaN...0.0NaN0.0NaNNaNNaNNaN3.02.341534e-063.512301e-07
\n", "

5 rows × 109 columns

\n", "
" ], "text/plain": [ " Date AG AI AR BE BL BS FR GE GL ... VS_diff_pc \\\n", "0 2020-02-25 NaN NaN NaN NaN NaN NaN NaN 0.0 NaN ... NaN \n", "1 2020-02-26 NaN NaN NaN NaN NaN NaN NaN 1.0 NaN ... NaN \n", "2 2020-02-27 NaN NaN NaN NaN NaN 3.0 NaN 1.0 NaN ... NaN \n", "3 2020-02-28 NaN NaN NaN NaN NaN 3.0 NaN 2.0 NaN ... NaN \n", "4 2020-02-29 NaN NaN NaN NaN 1.0 4.0 0.0 3.0 NaN ... 0.0 \n", "\n", " ZG_diff ZG_pc ZG_diff_pc ZH_diff ZH_pc ZH_diff_pc CH_diff \\\n", "0 NaN NaN NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN NaN 1.0 \n", "2 NaN NaN NaN NaN NaN NaN 4.0 \n", "3 NaN NaN NaN NaN NaN NaN 12.0 \n", "4 NaN 0.0 NaN NaN NaN NaN 3.0 \n", "\n", " CH_pc CH_diff_pc \n", "0 0.000000e+00 NaN \n", "1 1.170767e-07 1.170767e-07 \n", "2 5.853834e-07 4.683068e-07 \n", "3 1.990304e-06 1.404920e-06 \n", "4 2.341534e-06 3.512301e-07 \n", "\n", "[5 rows x 109 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "datHospit = pd.read_csv(\"covid19_hospitalized_switzerland_openzh.csv\")\n", "datHospit.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Canton Population SettlementAreaHa SettlementAreaKm2 Density \\\n", "Canton \n", "ZH ZH 1520968 37796.0 377.96 4024.150704 \n", "BE BE 1034977 41197.0 411.97 2512.263029 \n", "VD VD 799145 29940.0 299.40 2669.154977 \n", "AG AG 678207 23854.0 238.54 2843.158380 \n", "SG SG 507697 19408.0 194.08 2615.916117 \n", "\n", " O65 O65P Beds BedsPerCapita \n", "Canton \n", "ZH 0.170 258565 4472 0.002940 \n", "BE 0.208 215275 3053 0.002950 \n", "VD 0.164 131060 2268 0.002838 \n", "AG 0.177 120043 1450 0.002138 \n", "SG 0.183 92909 1565 0.003083 \n", "Canton ZH BE VD AG SG GE LU TI VS FR ... ZG SH JU \\\n", "Date ... \n", "2020-02-25 NaN NaN NaN NaN NaN 0.0 NaN 0.0 NaN NaN ... NaN NaN NaN \n", "2020-02-26 NaN NaN NaN NaN NaN 1.0 NaN 0.0 NaN NaN ... NaN NaN NaN \n", "2020-02-27 NaN NaN NaN NaN NaN 1.0 NaN 0.0 NaN NaN ... NaN NaN 1.0 \n", "2020-02-28 NaN NaN 4.0 NaN NaN 2.0 NaN 0.0 7.0 NaN ... NaN NaN 1.0 \n", "2020-02-29 NaN NaN 4.0 NaN NaN 3.0 NaN 0.0 7.0 0.0 ... 0.0 NaN 1.0 \n", "\n", "Canton AR NW GL OW UR AI CH \n", "Date \n", "2020-02-25 NaN NaN NaN NaN NaN NaN 0.0 \n", "2020-02-26 NaN NaN NaN NaN NaN NaN 1.0 \n", "2020-02-27 NaN NaN NaN NaN NaN NaN 5.0 \n", "2020-02-28 NaN NaN NaN NaN NaN NaN 17.0 \n", "2020-02-29 NaN NaN NaN NaN NaN NaN 20.0 \n", "\n", "[5 rows x 27 columns]\n" ] } ], "source": [ "import pandas as pd\n", "datHospit = pd.read_csv(\"covid19_hospitalized_switzerland_openzh.csv\")\n", "datHospit.index = datHospit['Date']\n", "datDemographics = pd.read_csv(\"swiss_demographics.csv\" )\n", "datDemographics.index = datDemographics['Canton']\n", "\n", "datHospit = datHospit.loc[ : , datDemographics.index]\n", "\n", "print(datDemographics.head())\n", "print(datHospit.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Modify the numbers from \"number of people hospitalized\" to \"number of people hospitalized per 10 000 habitants\" for each canton" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CantonZHBEVDAGSGGELUTIVSFR...ZGSHJUARNWGLOWURAICH
Date
2020-02-25NaNNaNNaNNaNNaN0.000000NaN0.0NaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaN0.000000
2020-02-26NaNNaNNaNNaNNaN0.020192NaN0.0NaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaN0.001171
2020-02-27NaNNaNNaNNaNNaN0.020192NaN0.0NaNNaN...NaNNaN0.136205NaNNaNNaNNaNNaNNaN0.005854
2020-02-28NaNNaN0.050053NaNNaN0.040384NaN0.00.203515NaN...NaNNaN0.136205NaNNaNNaNNaNNaNNaN0.019903
2020-02-29NaNNaN0.050053NaNNaN0.060576NaN0.00.2035150.0...0.0NaN0.136205NaNNaNNaNNaNNaNNaN0.023415
\n", "

5 rows × 27 columns

\n", "
" ], "text/plain": [ "Canton ZH BE VD AG SG GE LU TI VS FR ... \\\n", "Date ... \n", "2020-02-25 NaN NaN NaN NaN NaN 0.000000 NaN 0.0 NaN NaN ... \n", "2020-02-26 NaN NaN NaN NaN NaN 0.020192 NaN 0.0 NaN NaN ... \n", "2020-02-27 NaN NaN NaN NaN NaN 0.020192 NaN 0.0 NaN NaN ... \n", "2020-02-28 NaN NaN 0.050053 NaN NaN 0.040384 NaN 0.0 0.203515 NaN ... \n", "2020-02-29 NaN NaN 0.050053 NaN NaN 0.060576 NaN 0.0 0.203515 0.0 ... \n", "\n", "Canton ZG SH JU AR NW GL OW UR AI CH \n", "Date \n", "2020-02-25 NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.000000 \n", "2020-02-26 NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.001171 \n", "2020-02-27 NaN NaN 0.136205 NaN NaN NaN NaN NaN NaN 0.005854 \n", "2020-02-28 NaN NaN 0.136205 NaN NaN NaN NaN NaN NaN 0.019903 \n", "2020-02-29 0.0 NaN 0.136205 NaN NaN NaN NaN NaN NaN 0.023415 \n", "\n", "[5 rows x 27 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Pop = datDemographics.loc[ datHospit.columns , 'Population' ]\n", "\n", "datHospit.loc[:, datHospit.columns ] /= (Pop/10000)\n", "\n", "datHospit.loc[:,datHospit.columns ].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. get, for each canton, the date at which the hosptitalization rate was maximal" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Canton\n", "ZH 2020-11-23\n", "BE 2020-11-08\n", "VD 2020-11-16\n", "AG 2020-11-30\n", "SG 2020-11-08\n", "GE 2020-11-16\n", "LU 2020-11-30\n", "TI 2020-11-23\n", "VS 2020-11-06\n", "FR 2020-11-11\n", "BL 2020-03-29\n", "TG 2020-11-15\n", "SO 2020-11-24\n", "GR 2020-11-16\n", "BS 2020-12-02\n", "NE 2020-11-15\n", "SZ 2020-10-15\n", "ZG 2020-11-03\n", "SH 2020-12-01\n", "JU 2020-11-02\n", "AR 2020-11-11\n", "NW 2020-11-08\n", "GL 2020-11-09\n", "OW 2020-11-16\n", "UR 2020-04-01\n", "AI 2020-11-02\n", "CH 2020-11-16\n", "dtype: object\n", "Canton\n", "ZH 2.182820\n", "BE 3.632931\n", "VD 6.932409\n", "AG 2.255948\n", "SG 3.486331\n", "GE 12.337228\n", "LU 2.636996\n", "TI 9.820486\n", "VS 8.954660\n", "FR 5.930082\n", "BL 2.915330\n", "TG 2.314882\n", "SO 2.525678\n", "GR 3.831051\n", "BS 6.880051\n", "NE 8.260098\n", "SZ 1.570697\n", "ZG 2.522923\n", "SH 3.049115\n", "JU 12.258407\n", "AR 4.345150\n", "NW 2.313583\n", "GL 4.455115\n", "OW 3.963954\n", "UR 2.744764\n", "AI 6.193868\n", "CH 4.198370\n", "dtype: float64\n" ] } ], "source": [ "# one liner answer\n", "maxIdx = datHospit.idxmax(axis=0)\n", "maxVal = datHospit.max(axis=0)\n", "print(maxIdx)\n", "print(maxVal)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 03\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Heart_WT_1Heart_WT_2Heart_WT_3Heart_WT_4Heart_KO_1Heart_KO_2Heart_KO_3Heart_KO_4
1415670_at1214.44701182.46401206.22601196.03001174.61801184.4580985.55031214.5400
1415671_at3490.09802882.78402650.03302934.86102723.97602823.26002721.88402790.8340
1415672_at4510.36904292.05704071.05704275.27604127.94104045.99004553.73604358.6350
1415673_at598.8334385.0178458.4872514.2919584.0671544.3807569.1154323.8668
1415674_a_at1400.32501328.29501416.92301388.41801459.95601462.96401237.24401797.9060
\n", "
" ], "text/plain": [ " Heart_WT_1 Heart_WT_2 Heart_WT_3 Heart_WT_4 Heart_KO_1 \\\n", "1415670_at 1214.4470 1182.4640 1206.2260 1196.0300 1174.6180 \n", "1415671_at 3490.0980 2882.7840 2650.0330 2934.8610 2723.9760 \n", "1415672_at 4510.3690 4292.0570 4071.0570 4275.2760 4127.9410 \n", "1415673_at 598.8334 385.0178 458.4872 514.2919 584.0671 \n", "1415674_a_at 1400.3250 1328.2950 1416.9230 1388.4180 1459.9560 \n", "\n", " Heart_KO_2 Heart_KO_3 Heart_KO_4 \n", "1415670_at 1184.4580 985.5503 1214.5400 \n", "1415671_at 2823.2600 2721.8840 2790.8340 \n", "1415672_at 4045.9900 4553.7360 4358.6350 \n", "1415673_at 544.3807 569.1154 323.8668 \n", "1415674_a_at 1462.9640 1237.2440 1797.9060 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "column_names = [\"gene\", \"Heart_WT_1\", \"Heart_WT_2\", \"Heart_WT_3\", \"Heart_WT_4\", \"Heart_KO_1\", \"Heart_KO_2\", \"Heart_KO_3\", \"Heart_KO_4\"]\n", "df = pd.read_csv(\"../exercises/GSE41558_series_matrix.tsv\" , sep='\\t')\n", "#df.columns=column_names\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Center each column : substract their mean from their values" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Heart_WT_1 970.611017\n", "Heart_WT_2 975.626698\n", "Heart_WT_3 941.090693\n", "Heart_WT_4 908.006058\n", "Heart_KO_1 1037.988028\n", "Heart_KO_2 966.555207\n", "Heart_KO_3 1008.138146\n", "Heart_KO_4 952.867198\n", "dtype: float64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M = df.mean(axis=0)\n", "M" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Heart_WT_1Heart_WT_2Heart_WT_3Heart_WT_4Heart_KO_1Heart_KO_2Heart_KO_3Heart_KO_4
1415670_at243.835983206.837302265.135307288.023942136.629972217.902793-22.587846261.672802
1415671_at2519.4869831907.1573021708.9423072026.8549421685.9879721856.7047931713.7458541837.966802
1415672_at3539.7579833316.4303023129.9663073367.2699423089.9529723079.4347933545.5978543405.767802
1415673_at-371.777617-590.608898-482.603493-393.714158-453.920928-422.174507-439.022746-629.000398
1415674_a_at429.713983352.668302475.832307480.411942421.967972496.408793229.105854845.038802
\n", "
" ], "text/plain": [ " Heart_WT_1 Heart_WT_2 Heart_WT_3 Heart_WT_4 Heart_KO_1 \\\n", "1415670_at 243.835983 206.837302 265.135307 288.023942 136.629972 \n", "1415671_at 2519.486983 1907.157302 1708.942307 2026.854942 1685.987972 \n", "1415672_at 3539.757983 3316.430302 3129.966307 3367.269942 3089.952972 \n", "1415673_at -371.777617 -590.608898 -482.603493 -393.714158 -453.920928 \n", "1415674_a_at 429.713983 352.668302 475.832307 480.411942 421.967972 \n", "\n", " Heart_KO_2 Heart_KO_3 Heart_KO_4 \n", "1415670_at 217.902793 -22.587846 261.672802 \n", "1415671_at 1856.704793 1713.745854 1837.966802 \n", "1415672_at 3079.434793 3545.597854 3405.767802 \n", "1415673_at -422.174507 -439.022746 -629.000398 \n", "1415674_a_at 496.408793 229.105854 845.038802 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df -= M\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. select from the data-frame only the genes whose expression is above the column-wise average in all the WT samples" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
geneHeart_WT_1Heart_WT_2Heart_WT_3Heart_WT_4Heart_KO_1Heart_KO_2Heart_KO_3Heart_KO_4
01415670_at243.835983206.837302265.135307288.023942136.629972217.902793-22.587846261.672802
11415671_at2519.4869831907.1573021708.9423072026.8549421685.9879721856.7047931713.7458541837.966802
21415672_at3539.7579833316.4303023129.9663073367.2699423089.9529723079.4347933545.5978543405.767802
41415674_a_at429.713983352.668302475.832307480.411942421.967972496.408793229.105854845.038802
61415676_a_at6957.2859835729.6513026301.6153075951.5629427330.4459727564.7737937539.1088546886.417802
..............................
45087AFFX-r2-Ec-bioC-5_at3629.9439833302.1253023419.7283073660.1259424229.1019723438.6157933131.5228542968.824802
45088AFFX-r2-Ec-bioD-3_at23467.01898321859.22330222176.19930721195.40394225472.63197221378.72479322478.74185421989.652802
45089AFFX-r2-Ec-bioD-5_at21102.61898320470.05330219450.12930718338.59394224137.56197218814.15479319755.93185419712.012802
45090AFFX-r2-P1-cre-3_at69878.96898376552.32330265187.63930755401.31394284234.31197266379.58479374694.23185465889.582802
45091AFFX-r2-P1-cre-5_at52047.81898357347.42330248870.35930743529.15394265602.53197249490.50479355143.20185448179.362802
\n", "

6184 rows × 9 columns

\n", "
" ], "text/plain": [ " gene Heart_WT_1 Heart_WT_2 Heart_WT_3 \\\n", "0 1415670_at 243.835983 206.837302 265.135307 \n", "1 1415671_at 2519.486983 1907.157302 1708.942307 \n", "2 1415672_at 3539.757983 3316.430302 3129.966307 \n", "4 1415674_a_at 429.713983 352.668302 475.832307 \n", "6 1415676_a_at 6957.285983 5729.651302 6301.615307 \n", "... ... ... ... ... \n", "45087 AFFX-r2-Ec-bioC-5_at 3629.943983 3302.125302 3419.728307 \n", "45088 AFFX-r2-Ec-bioD-3_at 23467.018983 21859.223302 22176.199307 \n", "45089 AFFX-r2-Ec-bioD-5_at 21102.618983 20470.053302 19450.129307 \n", "45090 AFFX-r2-P1-cre-3_at 69878.968983 76552.323302 65187.639307 \n", "45091 AFFX-r2-P1-cre-5_at 52047.818983 57347.423302 48870.359307 \n", "\n", " Heart_WT_4 Heart_KO_1 Heart_KO_2 Heart_KO_3 Heart_KO_4 \n", "0 288.023942 136.629972 217.902793 -22.587846 261.672802 \n", "1 2026.854942 1685.987972 1856.704793 1713.745854 1837.966802 \n", "2 3367.269942 3089.952972 3079.434793 3545.597854 3405.767802 \n", "4 480.411942 421.967972 496.408793 229.105854 845.038802 \n", "6 5951.562942 7330.445972 7564.773793 7539.108854 6886.417802 \n", "... ... ... ... ... ... \n", "45087 3660.125942 4229.101972 3438.615793 3131.522854 2968.824802 \n", "45088 21195.403942 25472.631972 21378.724793 22478.741854 21989.652802 \n", "45089 18338.593942 24137.561972 18814.154793 19755.931854 19712.012802 \n", "45090 55401.313942 84234.311972 66379.584793 74694.231854 65889.582802 \n", "45091 43529.153942 65602.531972 49490.504793 55143.201854 48179.362802 \n", "\n", "[6184 rows x 9 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#here, we can take advantage from the fact that we have already removed the column averages, so we are interested in positive values\n", "df[(df['Heart_WT_1']>0) &\n", " (df['Heart_WT_2']>0) &\n", " (df['Heart_WT_3']>0) &\n", " (df['Heart_WT_4']>0) ]\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
geneHeart_WT_1Heart_WT_2Heart_WT_3Heart_WT_4Heart_KO_1Heart_KO_2Heart_KO_3Heart_KO_4
01415670_at1214.4471182.4641206.2261196.0301174.6181184.458985.55031214.540
11415671_at3490.0982882.7842650.0332934.8612723.9762823.2602721.88402790.834
21415672_at4510.3694292.0574071.0574275.2764127.9414045.9904553.73604358.635
41415674_a_at1400.3251328.2951416.9231388.4181459.9561462.9641237.24401797.906
61415676_a_at7927.8976705.2787242.7066859.5698368.4348531.3298547.24707839.285
..............................
45087AFFX-r2-Ec-bioC-5_at4600.5554277.7524360.8194568.1325267.0904405.1714139.66103921.692
45088AFFX-r2-Ec-bioD-3_at24437.63022834.85023117.29022103.41026510.62022345.28023486.880022942.520
45089AFFX-r2-Ec-bioD-5_at22073.23021445.68020391.22019246.60025175.55019780.71020764.070020664.880
45090AFFX-r2-P1-cre-3_at70849.58077527.95066128.73056309.32085272.30067346.14075702.370066842.450
45091AFFX-r2-P1-cre-5_at53018.43058323.05049811.45044437.16066640.52050457.06056151.340049132.230
\n", "

6184 rows × 9 columns

\n", "
" ], "text/plain": [ " gene Heart_WT_1 Heart_WT_2 Heart_WT_3 Heart_WT_4 \\\n", "0 1415670_at 1214.447 1182.464 1206.226 1196.030 \n", "1 1415671_at 3490.098 2882.784 2650.033 2934.861 \n", "2 1415672_at 4510.369 4292.057 4071.057 4275.276 \n", "4 1415674_a_at 1400.325 1328.295 1416.923 1388.418 \n", "6 1415676_a_at 7927.897 6705.278 7242.706 6859.569 \n", "... ... ... ... ... ... \n", "45087 AFFX-r2-Ec-bioC-5_at 4600.555 4277.752 4360.819 4568.132 \n", "45088 AFFX-r2-Ec-bioD-3_at 24437.630 22834.850 23117.290 22103.410 \n", "45089 AFFX-r2-Ec-bioD-5_at 22073.230 21445.680 20391.220 19246.600 \n", "45090 AFFX-r2-P1-cre-3_at 70849.580 77527.950 66128.730 56309.320 \n", "45091 AFFX-r2-P1-cre-5_at 53018.430 58323.050 49811.450 44437.160 \n", "\n", " Heart_KO_1 Heart_KO_2 Heart_KO_3 Heart_KO_4 \n", "0 1174.618 1184.458 985.5503 1214.540 \n", "1 2723.976 2823.260 2721.8840 2790.834 \n", "2 4127.941 4045.990 4553.7360 4358.635 \n", "4 1459.956 1462.964 1237.2440 1797.906 \n", "6 8368.434 8531.329 8547.2470 7839.285 \n", "... ... ... ... ... \n", "45087 5267.090 4405.171 4139.6610 3921.692 \n", "45088 26510.620 22345.280 23486.8800 22942.520 \n", "45089 25175.550 19780.710 20764.0700 20664.880 \n", "45090 85272.300 67346.140 75702.3700 66842.450 \n", "45091 66640.520 50457.060 56151.3400 49132.230 \n", "\n", "[6184 rows x 9 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#if we hadn't done the substraction, here's how we could do it :\n", "\n", "column_names = [\"gene\", \"Heart_WT_1\", \"Heart_WT_2\", \"Heart_WT_3\", \"Heart_WT_4\", \"Heart_KO_1\", \"Heart_KO_2\", \"Heart_KO_3\", \"Heart_KO_4\"]\n", "df = pd.read_csv(\"../exercises/GSE41558_series_matrix_no_header.tsv\", sep='\\t', header=None)\n", "df.columns=column_names\n", "\n", "M = df.mean()\n", "\n", "df[(df['Heart_WT_1']>M['Heart_WT_1']) &\n", " (df['Heart_WT_2']>M['Heart_WT_2']) &\n", " (df['Heart_WT_3']>M['Heart_WT_3']) &\n", " (df['Heart_WT_4']>M['Heart_WT_4']) ]\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }