{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Working with modules\n", "Almost everything you'll want to do with Python has already been implemented by someone else. \n", "Many workflows have been packaged into **modules** which can be **imported** into your Python session.\n", "\n", "Quite a few modules come bundled with the basic Python installation, and even more if you installed Python via the Anaconda distribution (which you should have done for this course).\n", "\n", "Additional modules can be installed into your (environment-specific) library using the `conda` package manager or `pip`, both of which are shipped with Anaconda. \n", "\n", "> **It is not advisable to mix installations via `conda` and via `pip` within a Conda environment.** \n", " So it's best if you stick to using conda for the time being.\n", "\n", "
\n", "\n", "## Importing modules\n", "There are a number of ways to **import modules** into your code. Modules can be imported entirely, or partially.\n", "Here are 3 different ways of importing a module (exemplified here with the `os` module):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1. Import the module name, without adding all of its content to your namespace.\n", "* This is the simplest, and most frequently used, way to import a module.\n", "* Any object of the module (e.g. a function) must be accessed using the syntax: `modulename.object`\n", "  as the name of the object is not directly available in the namespace - only the name of the module is.\n", "  This is actually a good thing because:\n", "    * It avoids adding a lot of names to the namespace (most of which we probably don't use).\n", "    * It gives an indication of where the function/class/object was taken from." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import os \n", "print(os.name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import statistics\n", "statistics.mean(range(101))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calling the 'mean()' function directly, without its module name, raises a NameError.\n", "import statistics\n", "mean(range(101))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. Import the module name as an alias\n", "This is essentially the same as the first method above, the only difference being that the module name is given an alias. This is typically used for modules with a long name, or modules that are used very frequently." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "# Compute the x values once and let numpy apply sin() to the whole array.\n", "x = np.linspace(0, 10, 100)\n", "plt.plot(x, np.sin(x), color='darkorange')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. Import specific objects from a module\n", "This is useful if you only need a limited number of objects from a module.\n", "\n", "In this example, we only import the function `getcwd()` and the variable `name` from the `os` module:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from os import getcwd, name\n", "print(\"The type of the operating system running this Jupyter instance is:\\n ->\", name)\n", "print(\"The current working directory is:\\n ->\", getcwd())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At first, the third method may appear nicer as it leads to shorter code. However, it often **hampers code readability**: you now have a variable called `name` but it is not directly obvious that it contains the name of the type of operating system you are running on!\n", "\n", "Therefore **the third method should be used sparingly**: only in specific cases, e.g. when you need just one function (with a distinctive name) from a very large module.\n", "\n", "Finally, it is also possible to import all the objects from a module at once, with something like \n", "`from os import *`. While it might again look convenient, it is in reality **bad practice**, and we only show it here so you know to **avoid it** when you see it! This is because it:\n", " 1. Unnecessarily pollutes your namespace (i.e. creates many new names that you will not use)\n", " 2. 
Can lead to unpredictable results, since the content of a module might change over time and\n", " you are simply importing it all without any check of what it actually is.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Importing your own module\n", "Often it can be useful to import your **own module**, typically so you can:\n", "* **Re-use elements - typically a function -** that you wrote earlier.\n", "* **Organise your code** into multiple files, e.g. your main workflow in one file, and functions \n", "  grouped by category in different files.\n", "\n", "This is done exactly as with built-in and external modules:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import my_own_module\n", "help(my_own_module)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import my_own_module\n", "my_own_module.greeting()\n", "my_own_module.greeting(my_own_module.DEFAULT_USER)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from my_own_module import greeting, DEFAULT_USER\n", "greeting(name=\"Bob\")\n", "greeting(DEFAULT_USER)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import my_own_module as mom\n", "mom.greeting(name=\"James\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
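The file `my_own_module.py` imported in the cells above is not shown in this notebook. As a purely hypothetical sketch, assuming the module only needs to provide a `greeting()` function with an optional `name` argument and a `DEFAULT_USER` constant (the two names used above), it could look like the following. The greeting text, the default value and the return value are made up for illustration, and the real course file may differ:

```python
'''A small example module.

Hypothetical content for my_own_module.py: the real file used in this
course may differ.
'''

# Module-level constant (assumed value, for illustration only).
DEFAULT_USER = 'Alice'


def greeting(name=DEFAULT_USER):
    '''Prints a greeting for `name` and returns the printed text.'''
    message = 'Hello ' + name + '!'
    print(message)
    return message


if __name__ == '__main__':
    # This part only runs when the file is executed as a script,
    # not when it is imported as a module.
    greeting()
```

With such a file saved next to the notebook, `help(my_own_module)` would display the docstrings written at the top of the module and inside the function.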
\n", "\n", "## Frequently used native modules: `os`\n", "The `os` module is a native module (meaning it comes installed with base Python) designed to manage interactions with the operating system. \n", "It greatly enhances code portability, as it allows you to run the same code on different platforms (Linux, Windows, macOS).\n", "\n", "Here we will give you an overview of a few useful functions from `os`, but there are plenty more that are not covered here.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Get** and **set working directory** with:\n", "* `os.getcwd()` - returns the current working directory.\n", "* `os.chdir(path)` - sets the working directory to `path`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "current_wd = os.getcwd()\n", "print('Current working dir:', current_wd, '\\n')\n", "\n", "os.chdir('../solutions')\n", "print('Working dir changed to:', os.getcwd(), '\\n')\n", "\n", "os.chdir(current_wd)\n", "print('Working dir is now again:', os.getcwd(), '\\n')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Manipulate files and directories:\n", "* `os.mkdir(path)` - creates a single new directory (its parent directory must already exist). To also create any missing parent directories, use `os.makedirs(path)`.\n", "* `os.rmdir(path)` - deletes `path` if it is an empty directory.\n", "* `os.remove(path)` - deletes the file `path` (does not delete directories, even if empty).\n", "* `os.listdir(path)` - lists the content (files and directories) of `path`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Manipulate paths:\n", "* `os.path.basename(path)` - returns the **basename** of a path, i.e. the last element (file or dir) of a path.\n", "* `os.path.dirname(path)` - returns the parent directory of the last element of a path.\n", "* `os.path.isfile(path)` - returns `True` if `path` is an existing regular file.\n", "* `os.path.isdir(path)` - returns `True` if `path` is an existing directory.\n", "* `os.path.join(path1, path2, ...)` - returns a new path by joining all paths passed as arguments one after the other, using the path separator of the operating system." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "def list_files_from_dir(path, show_hidden=False):\n", "    \"\"\"Prints files and directories found at a given path.\n", "    Hidden files (names starting with '.') are skipped unless show_hidden is True.\n", "    \"\"\"\n", "    # Verify the input path is a directory.\n", "    if not os.path.isdir(path):\n", "        raise ValueError(\"argument 'path' is not a valid directory.\")\n", "\n", "    # Print files in the directory.\n", "    print(\"Content of directory:\", os.path.basename(path), \"(including hidden files)\" if show_hidden else '')\n", "    for f in os.listdir(path=path):\n", "        if not f.startswith('.') or show_hidden:\n", "            print(\"-\", f)\n", "\n", "    print('\\n', end='')\n", "\n", "\n", "# Show files in the parent of the current working directory.\n", "parent_dir = os.path.dirname(os.getcwd())\n", "list_files_from_dir(parent_dir)\n", "list_files_from_dir(parent_dir, show_hidden=True)\n", "\n", "files_orig = os.listdir(path='.')\n", "\n", "# 
Create a new directory:\n", "new_dir = os.path.join(parent_dir, 'tmp_dir')\n", "os.mkdir(new_dir)\n", "list_files_from_dir(parent_dir)\n", "os.rmdir(new_dir)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises: 4.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "## Frequently used native modules: `time`\n", "The `time` module is designed to measure and format time. It is very useful to monitor code execution times, e.g. when doing optimization.\n", "\n", "Here are a few interesting functions from the `time` module:\n", "* `time.time()` - returns the **time in seconds since the epoch** as a floating point number.\n", "  The epoch is the point from where the time starts (for your computer!), and is platform dependent.\n", "  For Unix, the epoch is January 1, 1970, 00:00:00 (UTC - Coordinated Universal Time - in practice the same as GMT).\n", "* `time.gmtime(seconds)` - converts a number of seconds since the epoch (such as the value returned by `time.time()`) into a **struct_time** object expressed in UTC. Without argument, the current time is used.\n", "* `time.localtime(seconds)` - same as `time.gmtime()`, but converts to local time.\n", "* `time.asctime(struct_time)` - formats a **struct_time** object as a human-readable string." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "current_time = time.time()\n", "print(\"The current time is:\", current_time)\n", "print(\"Oh, sorry, I forgot you are a mere human... \\nLet me convert that for you:\", \n", "      time.asctime(time.localtime(current_time)), '\\n')\n", "\n", "# Let's have a look at a \"struct_time\" object.\n", "current_time_struct = time.localtime(current_time) \n", "print(\"This is the structure returned by 'localtime()' and 'gmtime()':\\n\", current_time_struct, \"\\n\")\n", "\n", "# Let's look at what the epoch is for your system:\n", "print(\"The current Epoch is:\", time.asctime(time.gmtime(0)))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Let's now use the time module to measure the execution time of some code:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time \n", "\n", "# Implementation version 1 of the reverse complement function: uses if... else...\n", "def reverse_complement_v1(dna_sequence):\n", "    \"\"\"Returns the reverse complement of a DNA sequence\n", "    given as argument.\n", "    \"\"\"\n", "    reversed_seq = \"\" \n", "    for nucleotide in dna_sequence:\n", "        if nucleotide == 'A':\n", "            reversed_seq += 'T'\n", "        elif nucleotide == 'T':\n", "            reversed_seq += 'A'\n", "        elif nucleotide == 'G':\n", "            reversed_seq += 'C'\n", "        elif nucleotide == 'C':\n", "            reversed_seq += 'G'\n", "        else:\n", "            pass\n", "\n", "    return reversed_seq[::-1]\n", "\n", "# Implementation version 2 of the reverse complement function: uses a dictionary.\n", "def reverse_complement_v2(dna_sequence):\n", "    \"\"\"Returns the reverse complement of a DNA sequence\n", "    given as argument.\n", "    \"\"\"\n", "    complement_dict = {'A':'T',\n", "                       'T':'A',\n", "                       'C':'G',\n", "                       'G':'C'}\n", "    return ''.join([complement_dict.get(nucleotide, '') for nucleotide in dna_sequence])[::-1]\n", "\n", "\n", "# Let's benchmark our 2 implementations.\n", "test_sequence_patterns = [\"ATAGAGCGATCGATCCCTAG\",\n", "                          \"AAAAAAAAAAAAAAAAAAAA\",\n", "                          \"CCCCCCCCCCCCCCCCCCCC\"]\n", "\n", "for dna_sequence_pattern in test_sequence_patterns:\n", "    print(\"Starting benchmark for pattern:\", dna_sequence_pattern)\n", "    for sequence_length in (1e3, 1e6, 1e8):\n", "        dna_sequence = dna_sequence_pattern * int(sequence_length / 20)\n", "\n", "        start_time = time.time()\n", "        revcomp_v1 = reverse_complement_v1(dna_sequence)\n", "        time_v1 = time.time() - start_time\n", "\n", "        start_time = time.time()\n", "        revcomp_v2 = reverse_complement_v2(dna_sequence)\n", "        time_v2 = time.time() - start_time\n", "        if revcomp_v1 != revcomp_v2:\n", "            print(\"ERROR: both outputs do not match!\")\n", "\n", "        print(\"Benchmark sequence length:\", len(dna_sequence))\n", "        print(\"Time method 1 (uses if else):\", round(time_v1, 4))\n", "        print(\"Time method 2 (uses dict)   :\", round(time_v2, 4))\n", "        print(\"Time ratio: dict method is\", round(time_v1/time_v2, 2), \"times faster.\\n\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are [many more modules](https://docs.python.org/3/py-modindex.html) included in the standard Python distribution, including:\n", "* os : interaction with the operating system \n", "* argparse : to manage Unix-style command-line options for your scripts\n", "* random : to generate random numbers with various common distributions\n", "* collections : contains some useful container classes\n", "* itertools : useful iterators. A must for combinatorics (e.g. permutations, combinations, ...)\n", "* ..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
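As a brief illustration of two modules from the list above, the sketch below uses `itertools` to enumerate combinations, permutations and cartesian products, and `collections.Counter` to count items. The nucleotide data is made up for the example:

```python
import itertools
from collections import Counter

nucleotides = ['A', 'C', 'G', 'T']

# All unordered pairs of distinct nucleotides: 4 choose 2 = 6 pairs.
pairs = list(itertools.combinations(nucleotides, 2))
print(len(pairs), 'combinations, the first one is', pairs[0])

# All ordered arrangements of 3 distinct nucleotides: 4 * 3 * 2 = 24.
triplets = [''.join(p) for p in itertools.permutations(nucleotides, 3)]
print(len(triplets), 'permutations, the first one is', triplets[0])

# All 64 codons: cartesian product of the alphabet with itself, 3 times.
codons = [''.join(p) for p in itertools.product(nucleotides, repeat=3)]
print(len(codons), 'codons')

# Counter is a dict subclass that counts hashable items.
counts = Counter('ATAGAGCGATCGATCCCTAG')
print(counts.most_common(1))
```

Note that `itertools` functions return iterators, which is why the results are wrapped in `list()` or consumed by a list comprehension before calling `len()`.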
\n", "\n", "## Building your own modules\n", "Building your own module in Python is fairly easy.\n", "\n", "### From a regular script\n", "Any Python script - i.e. a plain text file with `.py` extension and some Python code in it - can be imported as a module. The only restriction is that the imported module must either:\n", " * Be in the same directory as the code that imports it.\n", " * Have been installed into your conda environment: [here's an idea on how to do this](https://stackoverflow.com/questions/49474575/how-to-install-my-own-python-module-package-via-conda-and-watch-its-changes)\n", " * Be in a directory listed in the environment variable `PYTHONPATH` : [windows](https://docs.python.org/3/using/windows.html#excursus-setting-environment-variables) , [UNIX-like](https://stackoverflow.com/a/3402176)\n", " \n", "You can learn more about creating modules in this [python3 module online tutorial](https://docs.python.org/3/tutorial/modules.html).\n", "\n", "### From a Jupyter notebook\n", "Although it is a bit tricky, you can also re-use the functions you have coded in another Jupyter notebook.\n", "\n", "E.g., to load a Jupyter Notebook named `MyOtherNotebook.ipynb`, you can use the `%run` \"magic\" command, which executes all of its cells in the current session:\n", "* `%run MyOtherNotebook.ipynb`\n", "\n", "If you want to import a Notebook into a classical script, the [import-ipynb](https://pypi.org/project/import-ipynb/) module is what you are looking for." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%run 01_python_basics.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
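The locations Python searches when importing a module, as described in the section on regular scripts above, can be inspected through `sys.path`. A minimal sketch, using only the standard library:

```python
import os
import sys

# sys.path is the list of locations searched, in order, when importing.
# The directory of the running script (or an empty string in an
# interactive session) normally comes first; PYTHONPATH entries follow.
for location in sys.path:
    print(location or '(current directory)')

# Directories listed in the PYTHONPATH environment variable, if it is set.
pythonpath = os.environ.get('PYTHONPATH', '')
pythonpath_entries = [p for p in pythonpath.split(os.pathsep) if p]
print('PYTHONPATH entries:', pythonpath_entries)
```

If an import of your own module fails, checking that its directory appears in this list is usually a good first step.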
\n", "\n", "## Exercises: 4.2 and 4.3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
\n", "\n", "# Install modules needed for the upcoming notebooks\n", "\n", "In the coming lessons, we will introduce you to several well-known Python libraries that are particularly useful when doing bioinformatics or (biological) data analysis.\n", "\n", "Use the following code to ensure every library is properly installed:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Note: you may comment-out any library you are not interested in.\n", "import Bio # biopython : bioinformatics in python.\n", "import matplotlib # create high-quality plots.\n", "import numpy # powerful array structure for fast numerical computation.\n", "import scipy # scientific computing package, with linear algebra and statistical tests.\n", "import pandas # powerful DataFrame structure that mimics R data frames. A must for data analysis.\n", "print('All libraries imported successfully')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If any of these imports fail, install the missing library from your Linux/macOS terminal, or from the conda console for Windows users.\n", "* biopython : type `conda install -c anaconda biopython`\n", "* matplotlib : follow instructions from [here](https://github.com/conda-forge/matplotlib-feedstock#installing-matplotlib-suite)\n", "* pandas : type `conda install pandas`\n", "* scipy : type `conda install scipy` \n", "* numpy : type `conda install numpy`" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 4 }