{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Working with modules\n", "Almost everything you'll want to do with Python has already been implemented by someone else. \n", "Many of these implementations are packaged as **modules** which can be **imported** into your Python session.\n", "\n", "There are quite a few modules which come bundled with the basic Python installation, and even more if you installed the Anaconda distribution. \n", "Additional modules can be installed to your (environment-specific) library using the `conda` package manager or `pip`, both of which are shipped with Anaconda. \n", "\n", "> **It is not advisable to mix `conda` and `pip` within one Python environment.**\n", "\n", "
\n", "\n", "## Importing modules\n", "There are a number of ways to **import modules** into your code. Modules can be imported entirely, or partially.\n", "Here are 3 different ways of importing a module (exemplified with the `os` module):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "posix\n", "No Python documentation found for '/home/rengler/projects/training_SIB/python_course/first_steps_with_python.git/exercises'.\n", "Use help() to get the interactive help utility.\n", "Use help(str) for help on the str class.\n", "\n", "posix\n", "posix\n", "/home/rengler/projects/training_SIB/python_course/first_steps_with_python.git/exercises\n" ] } ], "source": [ "# 1. This is the simplest way to import a module.\n", "# Any object (e.g. a function) of the module must then be accessed using the syntax: modulename.object\n", "import os \n", "print(os.name) \n", "\n", "\n", "# 2. Import the module and give it an alias. This is very useful for modules with a long name, \n", "# e.g. \"matplotlib.pyplot\" is commonly aliased to \"plt\".\n", "import os as OSmodule\n", "print(OSmodule.name) # Accessing the variable using the alias. Not so useful in this example.\n", "\n", "\n", "# 3. Import specific objects from a module. This is useful if you only need a limited number of objects from a module.\n", "# In this example, we only import the function getcwd() and the variable \"name\".\n", "from os import getcwd, name\n", "print(name)\n", "print(getcwd())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "At first, the third method may appear nicer as it leads to shorter code. However, it often hampers **code readability**: you now have a variable called `name`, but it is not obvious that it holds the name of the operating system type you are running on! \n", "Therefore the third method should be used sparingly: only in specific cases, e.g. when you need a single, distinctively named function from a very large module.\n", "\n", "> It is also possible to import all the objects from a module at once with `from os import *`, but that is generally considered bad practice.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
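To make the readability concern above concrete, here is a minimal sketch showing how `from os import name` can silently overwrite a variable you already defined. The value 'Alice' is a made-up example and not part of the course material.

```python
import os

# A variable defined earlier in a (hypothetical) script.
name = 'Alice'

# Importing an object with the same name silently replaces that variable.
from os import name

print(name)      # this is now the operating system type, not 'Alice'
print(os.name)   # the prefixed form makes the origin of the value obvious
```

With `import os`, the prefix `os.` keeps the two names apart, which is why the first import style is usually the safer default.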
\n", "\n", "## Frequently used native modules: `os`\n", "The `os` module is a native module (meaning it is already installed with base Python) designed to manage interactions with the operating system. \n", "It greatly enhances code portability, as it allows you to run the same code on different platforms (Linux, Windows, macOS). \n", "Here we will give you an overview of a few useful functions from `os`, but there are plenty more that are not covered here.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Get and set working directory with:\n", "* `os.getcwd()` - returns the current working directory.\n", "* `os.chdir(path)` - sets the working directory to `path`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "current_wd = os.getcwd()\n", "print('The current working dir is:', current_wd)\n", "\n", "os.chdir('../solutions')\n", "print('The working dir has been changed to:', os.getcwd())\n", "\n", "os.chdir(current_wd)\n", "print('The working dir is now again:', os.getcwd())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
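If the code that runs between two `os.chdir()` calls fails, the working directory stays changed. One possible pattern, shown here only as a sketch, is to restore the original directory in a `finally` clause. The temporary directory from the standard `tempfile` module is an assumption made to keep the example self-contained; it stands in for a directory such as `../solutions`.

```python
import os
import tempfile

original_wd = os.getcwd()

with tempfile.TemporaryDirectory() as tmp_dir:
    try:
        os.chdir(tmp_dir)
        print('Working in:', os.getcwd())
        # ... code that needs to run inside tmp_dir would go here ...
    finally:
        # Runs even if the code above raises an error.
        os.chdir(original_wd)

print('Back in:', os.getcwd())
```

Changing back before the `with` block ends also matters on Windows, where a directory that is still the current working directory cannot be deleted.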
\n", "\n", "Manipulate files and directories:\n", "* `os.mkdir(path)` - creates a new directory non-recursively. To create directories recursively use `os.makedirs(path)`.\n", "* `os.rmdir(path)` - deletes `path` if it is an empty directory.\n", "* `os.remove(path)` - deletes the file `path` (does not delete directories, even if empty).\n", "* `os.listdir(path)` - lists the content (files and directories) of `path`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Manipulate paths:\n", "* `os.path.basename(path)` - returns the **basename** of a path, i.e. the last element (file or dir) of a path.\n", "* `os.path.dirname(path)` - returns the parent directory of the last element of a path.\n", "* `os.path.isfile(path)` - returns `True` if `path` is an existing regular file.\n", "* `os.path.isdir(path)` - returns `True` if `path` is an existing directory.\n", "* `os.path.join(path1, path2, ...)` - returns a new path by joining all paths passed as arguments, in order, with the platform's path separator." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "def list_files_from_dir(path, show_hidden=False):\n", "    \"\"\"Prints files and directories found at a given path.\n", "    Hidden files (names starting with '.') are skipped unless show_hidden is True.\n", "    \"\"\"\n", "    # Verify the input path is a directory.\n", "    if not os.path.isdir(path):\n", "        raise ValueError(\"argument 'path' is not a valid directory.\")\n", "\n", "    # Print files in the directory.\n", "    print(\"Content of directory:\", os.path.basename(path))\n", "    for f in os.listdir(path=path):\n", "        if not f.startswith('.') or show_hidden:\n", "            print(\"-\", f)\n", "\n", "    print('\\n', end='')\n", "\n", "\n", "# Show files in the parent of the current working directory.\n", "# parent_dir must be defined before it is printed.\n", "parent_dir = os.path.dirname(os.getcwd())\n", "print(os.getcwd())\n", "print(parent_dir, '\\n')\n", "\n", "list_files_from_dir(parent_dir)\n", "list_files_from_dir(parent_dir, show_hidden=True)\n", "\n", "files_orig = os.listdir(path='.')\n", 
"\n", "# Create a new directory:\n", "new_dir = os.path.join(parent_dir, 'tmp_dir')\n", "os.mkdir(new_dir)\n", "list_files_from_dir(parent_dir)\n", "os.rmdir(new_dir)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises: 4.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "## Frequently used native modules: `time`\n", "The `time` module is designed to measure and format time. It is very useful to monitor code execution times, e.g. when doing optimization.\n", "\n", "Here are a few interesting functions from the `time` module:\n", "* `time.time()` - returns the time in seconds since the epoch as a floating point number. The epoch is the point from where the time starts, and is platform dependent. For Unix, the epoch is January 1, 1970, 00:00:00 (UTC - Coordinated Universal Time - the same as GMT).\n", "* `time.gmtime()` - converts a number of seconds since the epoch (the current time if no argument is given) into a UTC **struct_time** object, whose fields (year, month, hour, ...) are human readable.\n", "* `time.localtime()` - same as `.gmtime()` but converts to local time.\n", "* `time.asctime(struct_time)` - formats a **struct_time** object as a readable string." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The current time is: 1599124678.7972088\n", "Oh, sorry, I forgot you are a mere human. Let me convert that for you: Thu Sep 3 11:17:58 2020 \n", "\n" ] } ], "source": [ "import time\n", "\n", "current_time = time.time()\n", "print(\"The current time is:\", current_time)\n", "print(\"Oh, sorry, I forgot you are a mere human. Let me convert that for you:\", \n", "      time.asctime(time.localtime(current_time)), '\\n')\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The current time is: 1599123655.6045876\n", "Oh, sorry, I forgot you are a mere human. 
Let me convert that for you: Thu Sep 3 11:00:55 2020 \n", "\n", "This is the structure returned by 'localtime()' and 'gmtime()':\n", " time.struct_time(tm_year=2020, tm_mon=9, tm_mday=3, tm_hour=11, tm_min=0, tm_sec=55, tm_wday=3, tm_yday=247, tm_isdst=1) \n", "\n", "The current Epoch is: Thu Jan 1 00:00:00 1970\n" ] } ], "source": [ "import time\n", "\n", "current_time = time.time()\n", "print(\"The current time is:\", current_time)\n", "print(\"Oh, sorry, I forgot you are a mere human. Let me convert that for you:\", \n", "      time.asctime(time.localtime(current_time)), '\\n')\n", "\n", "# Let's have a look at the \"struct_time\" object.\n", "current_time_struct = time.localtime(current_time) \n", "print(\"This is the structure returned by 'localtime()' and 'gmtime()':\\n\", current_time_struct, \"\\n\")\n", "\n", "# Let's look at what the epoch is for your system:\n", "print(\"The current Epoch is:\", time.asctime(time.gmtime(0)))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
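The **struct_time** object printed above is more than a display format: its fields can be read by name or by position. The sketch below uses `time.gmtime(0)`, i.e. the epoch itself, so that the result does not depend on when you run it. `time.strftime()` is not covered in the list above; it is a standard function of the `time` module that formats a **struct_time** with a pattern of your choice.

```python
import time

# Time zero, expressed in UTC: this is the epoch of your system.
epoch_struct = time.gmtime(0)

# Fields can be accessed by name...
print('Year :', epoch_struct.tm_year)
print('Month:', epoch_struct.tm_mon)
print('Day  :', epoch_struct.tm_mday)

# ...or by position, like a tuple (index 0 is the year).
print('Year again:', epoch_struct[0])

# Custom formatting, as an alternative to the fixed layout of asctime().
formatted_epoch = time.strftime('%Y-%m-%d %H:%M:%S', epoch_struct)
print('Formatted:', formatted_epoch)
```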
\n", "\n", "Let's now use the time module to measure the execution time of some code:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'foofoofoo'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"foo\" * 3" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Benchmark sequence length: 10000\n", "Time method 1 (uses if else): 0.006232023239135742\n", "Time method 1 (uses dict) : 0.006100177764892578\n", "Time ratio: dict method is 1.02 times faster.\n", "\n", "Benchmark sequence length: 100000\n", "Time method 1 (uses if else): 0.02112269401550293\n", "Time method 1 (uses dict) : 0.0071985721588134766\n", "Time ratio: dict method is 2.93 times faster.\n", "\n", "Benchmark sequence length: 1000000\n", "Time method 1 (uses if else): 0.132399320602417\n", "Time method 1 (uses dict) : 0.05555868148803711\n", "Time ratio: dict method is 2.38 times faster.\n", "\n", "Benchmark sequence length: 10000000\n", "Time method 1 (uses if else): 1.3234424591064453\n", "Time method 1 (uses dict) : 0.660947322845459\n", "Time ratio: dict method is 2.0 times faster.\n", "\n", "Benchmark sequence length: 100000000\n", "Time method 1 (uses if else): 12.370341539382935\n", "Time method 1 (uses dict) : 6.035518646240234\n", "Time ratio: dict method is 2.05 times faster.\n", "\n" ] } ], "source": [ "import time \n", "\n", "# Implementation version 1 of the reverse complement function: uses if... 
else...\n", "def reverse_complement_v1(dna_sequence):\n", "    \"\"\"Returns the reverse complement of a DNA sequence given as argument.\"\"\"\n", "    # Build the complement first; the string is reversed when it is returned.\n", "    reversed_seq = \"\"\n", "    for nucleotide in dna_sequence:\n", "        if nucleotide == 'A':\n", "            reversed_seq += 'T'\n", "        elif nucleotide == 'T':\n", "            reversed_seq += 'A'\n", "        elif nucleotide == 'G':\n", "            reversed_seq += 'C'\n", "        elif nucleotide == 'C':\n", "            reversed_seq += 'G'\n", "        else:\n", "            # Characters other than A, T, G and C are ignored.\n", "            pass\n", "\n", "    return reversed_seq[::-1]\n", "\n", "# Implementation version 2 of the reverse complement function: uses a dictionary.\n", "def reverse_complement_v2(dna_sequence):\n", "    \"\"\"Returns the reverse complement of a DNA sequence given as argument.\"\"\"\n", "    complement_dict = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}\n", "    # Unknown characters map to an empty string, i.e. they are ignored as in version 1.\n", "    return ''.join([complement_dict.get(nucleotide, '') for nucleotide in dna_sequence])[::-1]\n", "\n", "\n", "# Let's benchmark our 2 implementations.\n", "#dna_sequence_pattern = \"ATAGAGCGATCGATCCCTAG\"\n", "dna_sequence_pattern = \"CCCCCCCCCCCCCCCCCCCC\"\n", "#dna_sequence_pattern = \"AAAAAAAAAAAAAAAAAAAA\"\n", "\n", "for sequence_length in (1e4, 1e5, 1e6, 1e7, 1e8):\n", "    # The pattern is 20 nucleotides long, so repeat it sequence_length / 20 times.\n", "    dna_sequence = dna_sequence_pattern * int(sequence_length / 20)\n", "\n", "    start_time = time.time()\n", "    revcomp_v1 = reverse_complement_v1(dna_sequence)\n", "    time_v1 = time.time() - start_time\n", "\n", "    start_time = time.time()\n", "    revcomp_v2 = reverse_complement_v2(dna_sequence)\n", "    time_v2 = time.time() - start_time\n", "    if revcomp_v1 != revcomp_v2:\n", "        print(\"ERROR: both outputs do not match!\")\n", "\n", "    print(\"Benchmark sequence length:\", len(dna_sequence))\n", "    print(\"Time method 1 (uses if else):\", time_v1)\n", "    print(\"Time method 2 (uses dict)   :\", time_v2)\n", "    print(\"Time ratio: dict method is\", round(time_v1/time_v2, 2), \"times faster.\\n\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are [many more 
modules](https://docs.python.org/3/py-modindex.html) bundled with the basic Python distribution, including:\n", " * os: interaction with the operating system\n", " * argparse: to manage Unix-style command-line options for your scripts\n", " * random: to generate random numbers with various common distributions\n", " * collections: contains some useful container classes\n", " * itertools: useful iterators. A must-have for combinatorics (e.g. permutations, combinations, ...)\n", " * ..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
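As a quick taste of two of the modules listed above, the sketch below counts nucleotides with `collections.Counter` and enumerates pairs with `itertools`. The DNA string is borrowed from the benchmark cell; the variable names are made up for this illustration.

```python
import collections
import itertools

dna_sequence = 'ATAGAGCGATCGATCCCTAG'

# Counter is a container class that tallies how often each item occurs.
nucleotide_counts = collections.Counter(dna_sequence)
print('Nucleotide counts:', nucleotide_counts)
print('Most common nucleotide:', nucleotide_counts.most_common(1))

# combinations: unordered pairs, each nucleotide used at most once per pair.
unordered_pairs = list(itertools.combinations('ACGT', 2))
print(len(unordered_pairs), 'combinations:', unordered_pairs)

# permutations: ordered pairs, so ('A', 'C') and ('C', 'A') are both listed.
ordered_pairs = list(itertools.permutations('ACGT', 2))
print(len(ordered_pairs), 'permutations:', ordered_pairs)
```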
\n", "\n", "## Building your own modules\n", "Building your own module in Python is fairly easy.\n", "\n", "### From a regular script\n", "Any Python script - i.e. a plain text file with a `.py` extension and some Python code in it - can be imported as a module into Python code.\n", "\n", "The only restriction is that the imported module must satisfy one of the following:\n", " * Be in the same directory as the code that imports it.\n", " * Have been installed with Anaconda: [here's an idea on how to do this](https://stackoverflow.com/questions/49474575/how-to-install-my-own-python-module-package-via-conda-and-watch-its-changes)\n", " * Be in a directory listed in the environment variable `PYTHONPATH`: [Windows](https://docs.python.org/3/using/windows.html#excursus-setting-environment-variables), [UNIX-like](https://stackoverflow.com/a/3402176)\n", "\n", "You can learn more about creating modules in this [Python 3 module online tutorial](https://docs.python.org/3/tutorial/modules.html).\n", "\n", "### From a Jupyter notebook\n", "Although it is a bit tricky, you can import a Jupyter notebook as a module, so that you may re-use the functions you have coded in it. \n", "For instance, if you want to import them into another notebook, use the `%run` magic command:\n", "```\n", "%run MyOtherNotebook.ipynb\n", "```\n", "\n", "If you want to import them into a classical script, the [import-ipynb](https://pypi.org/project/import-ipynb/) module is what you need." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%run 01_python_basics.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
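To illustrate the *regular script* route described above, the sketch below writes a tiny module to a temporary directory and imports it. The module name `my_dna_tools` and its function `gc_content` are hypothetical, and adding the directory to `sys.path` is used here as a stand-in for what `PYTHONPATH` does; in practice you would simply save the `.py` file next to your code.

```python
import os
import sys
import tempfile

# Source code of a tiny, hypothetical module (name and function are made up).
module_source = '''
def gc_content(dna_sequence):
    # Fraction of G and C nucleotides in the sequence.
    gc_count = dna_sequence.count('G') + dna_sequence.count('C')
    return gc_count / len(dna_sequence)
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    # A module is just a plain text file with a .py extension.
    with open(os.path.join(tmp_dir, 'my_dna_tools.py'), 'w') as handle:
        handle.write(module_source)

    # Make the directory visible to the import system.
    sys.path.insert(0, tmp_dir)
    try:
        import my_dna_tools
        gc_fraction = my_dna_tools.gc_content('ATGC')
    finally:
        # Leave sys.path as it was found.
        sys.path.remove(tmp_dir)

print('GC fraction of ATGC:', gc_fraction)
```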
\n", "\n", "## Exercises: 4.2 and 4.3" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }