# Python basics

## Variables
In Python, as in many programming languages, objects are stored in variables.
* A value is assigned to a variable using the `=` sign. 
* **Warning:** unlike in mathematics, the `=` sign in Python is directional: the variable name must always be on the left of the `=`, and the value to assign to the variable on the right.  
  Example:
  ```python
  a = 23    # is a valid assignment.
  8 = b     # is NOT a valid assignment.
  ```

In Python, variable names must adhere to these restrictions:
* Variable names must be composed solely of uppercase and lowercase letters (`A-Z`, `a-z`), 
  digits (`0-9`), and the underscore character `_`.
* the first character of a variable name cannot be a digit.
* by convention, variable names starting with a single or double underscore `_`/`__` are reserved 
  for "special" variables (class private attributes, "magic" variables).
* Examples:
    * `var_1` is a valid variable name.
    * `1_var` is **not** a valid name (starts with a digit)
    * `var-1` is **not** a valid name (contains the non-authorized character `-`)
    * `__var_1__` is valid, but **should not be used**, with the exception of very specific situations.

> **Pro tip**: using explicit variable names makes your code easier to read for others, and possibly 
  yourself in a not-so-distant future.  
  For instance `input_file` is better than `iptf`, even if it is a bit longer.
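
The naming rules above can be checked from Python itself. The sketch below relies on the built-in string method `isidentifier()`, applied to the example names from the list above.

```python
# isidentifier() returns True when a string is a syntactically valid name.
print("var_1".isidentifier())      # True
print("1_var".isidentifier())      # False: starts with a digit
print("var-1".isidentifier())      # False: contains "-"
print("__var_1__".isidentifier())  # True: valid, but reserved by convention
```

Note that this only checks the syntax: reserved words such as `for` also pass the test, although they cannot be used as variable names.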

In [None]:
myVariable = 35     # assign the value 35 to variable "myVariable".
var_a = 2.3         # assign the value 2.3 to variable "var_a".
var_b = var_a       # assign the value of "var_a" to "var_b".

# By the way, text located after a "#" character - just like this line - is a comment. 
# Comments are text that will not be executed, but are useful for code documentation.
print(myVariable)
print(var_a)
print(var_b)

### Code indentation - the importance of white spaces in Python
**Indentation** is the number of white spaces on a given line before the first text element.
```
    |var_1 = 2
    | var_1 = 2
     ^
     The line above is indented by 1 space.
    |  var_1 = 2
     ^^
     The line above is indented by 2 spaces.
```

* Indentation has a very important meaning in Python, as it is used to define "code blocks" 
  (more on that later in the course). 
* When outside of a "code block", there should be no indentation on the line.
* A wrong level of indentation will trigger an `IndentationError`.

In [None]:
var_1 = 'abc'    # No indentation -> valid syntax.
 var_1 = 'abc'   # unexpected indentation (i.e. outside of a code block) -> IndentationError
 
     # Comment lines, however, can be indented as you wish.
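
As a preview of what a "code block" looks like, here is a minimal sketch using an `if` statement (a construct covered later in the course). The variable names are made up for the illustration.

```python
temperature = 25
label = "cold"

if temperature > 20:
    # These lines are indented by 4 spaces: they belong to the "if" code block
    # and only run when the condition is True.
    label = "warm"

# Back to no indentation: this line is outside of the code block.
print(label)
```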

When assigning a variable, white spaces after the variable name do not matter. However, the [Python style convention](https://www.python.org/dev/peps/pep-0008/#whitespace-in-expressions-and-statements) is to have **exactly 1 space** on each side of the `=` operator.


In [None]:
var_1 = 'abc'                   # Valid syntax and good style.
var_1           =        'abc'  # Valid syntax, bad style -> please avoid.
print(var_1)

<br>

## Functions
Another very important concept in Python - as in most programming languages - is that of **functions**.
* Functions are **re-usable blocks of code** that have been given a name and are ready to perform an action.
  How to define your own functions will be detailed later in this course.
* Functions can be written to perform anything, from the simplest task to the most complex.
* To call a function, one uses its name followed by parentheses `()`, which contain an optional set of 
  arguments.
* **Arguments** are the variables/values that the function uses as input to do its job. 
  In Python, we differentiate between two types of arguments:
    * **positional** arguments:
        * are mandatory.
        * their position in the call to the function is important.
    * **keyword** arguments:
        * are optional and have a **default value**.
        * are passed to the function with the syntax `argument_name=value`.
        * can be passed in any order (as long as they are passed after positional arguments).
        * depending on the case, keyword arguments can also be passed without their name being specified.
    * positional arguments must always be passed **before** keyword arguments.
    
Example of the `print()` function

    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)


In [None]:
print("This", "will", "be", "printed")
print("This", "will", "be", "printed", sep="--")
print("This", "will", "be", "printed", "--")
#print(sep="--", "This", "will", "be", "printed")  # -> raises a SyntaxError
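
The same rules apply to other built-in functions. As a small illustration, the sketch below uses `round()` and `int()`, two built-ins that each accept an optional argument (`ndigits` and `base`, respectively). The values are arbitrary.

```python
# 3.14159 is passed as a positional argument, "ndigits" as a keyword argument.
rounded = round(3.14159, ndigits=2)
print(rounded)

# Here the optional argument is passed by position, without its name.
rounded_again = round(3.14159, 2)
print(rounded_again)

# Keyword arguments are optional: when omitted, their default value is used.
decimal_value = int("101")          # default base is 10
binary_value = int("101", base=2)   # the same text read as a binary number
print(decimal_value, binary_value)
```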

### The help function - your best friend in Python
In Python, almost any object or function is extensively documented: what it is, what it does, how to use it, ...  
This information is accessed using the `help()` function, which takes as argument the object we want to get help with.

In [None]:
# Let's try to look up the help page of the print function that we encountered moments ago.
help(print)

It tells us that:
 * `print` is a function.
 * It "Prints the values to a stream, or to sys.stdout by default.". So the function prints the 
   values that are passed to it to the console (or possibly to a file).
 * Its arguments are the things that will be printed.
 * It has 4 optional arguments that refine its use (e.g. `sep` and `end`).

Let's try to apply our new knowledge of the `print` function:

In [None]:
print('test')                # simple usage.
print('test' , 42)           # we can make it print several values. by default, they are separated by spaces.
print('test' , 42 , sep='/') # the "sep" argument can be used to change the separator between values.
print('first line')
print('second line')
print('first line', end='')  # the 'end' argument can be used to modify the character printed at the end of
print('second line')         # each line. It defaults to \n (new line character).

**Don't hesitate to use the `help` function on any object or function to understand how they work.**
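
`help()` is not limited to functions: it also accepts types and methods. The text it displays comes from the object's documentation string which, in standard Python, can also be read directly through the `__doc__` attribute. A short sketch:

```python
# Help on a method of the "str" type.
help(str.upper)

# The documentation text itself is stored in the __doc__ attribute.
doc_text = len.__doc__
print(doc_text)
```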

<br>

## Reading and understanding errors

Unless you are a perfect human being, your code will contain errors at some point.  
Errors ~~can sometimes be~~ are frustrating, but they are unavoidable, and the best way to correct them is to actually read and try to understand them.

Here is an error example:

In [None]:
var_a = 42
var_b = var_a + 3
print(var_c)

The Python error message gives us several useful pieces of information:
* The first line indicates the **type** of the error. In our example we got a `NameError`, 
  meaning that a name (of an object) has not been found.  
  If you want to know more about a certain error type, you can use the help function on it: `help(NameError)`.

* The following lines point out the line where the error occurred, which is very useful when there are 
  hundreds of lines. Here the error occurred on line `3` (pointed to by the arrow), the line of the 
  `print` call.

* Finally, we have `NameError: name 'var_c' is not defined`, which points out that we tried to print 
  the variable `var_c` when that variable does not exist (i.e., that name is not defined).

> <span style="color:blue">Arguably, being able to **read and understand errors** and being able to **read the help** accounts for ~50% of "coding skills"...</span>.
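
The two key pieces of an error message, its type and its text, can also be inspected from code. The sketch below uses a `try`/`except` block, a construct not covered in this section, purely to illustrate what the message is made of. `undefined_name` is a placeholder for any name that has never been assigned.

```python
try:
    print(undefined_name)   # this name was never assigned -> NameError
except NameError as error:
    error_type = type(error).__name__
    error_text = str(error)

print(error_type)  # NameError
print(error_text)  # name 'undefined_name' is not defined
```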



#### Micro Exercise:
* look at the error given by the following code. Try to understand it and modify the code accordingly.

In [None]:
42 + "a"

<br>

## Object types: simple types
Everything in Python is an object, and Python divides objects into several categories called **types**.  
There exist plenty of types (it is even common to define your own new types), but there are a few very common ones - known as **built-in** types - that you ought to know.
* `bool`: boolean/logical values, either `True` or `False`, like 0 or 1.
* `int` : integer number.
* `float`: floating point number (numbers with a decimal fraction).

To know the type of an object, we can make use of the `type()` function.  

A few comments about types in python:
* Python is (by default\*\*) a **dynamically typed** language (as opposed to **statically typed** 
  languages such as C or C++). This means that variables are declared without a specific type, 
  and the type is determined by the object that is assigned to the variable.  
  This has its advantages (easier and faster to write code) and downsides (e.g. type error bugs can 
  remain hidden for a long time until they are triggered by some unusual input data).
  
* A corollary is that variables in Python are not restricted to a single type and can be reassigned 
  another type of value at any time.
  
 \*\* Starting with Python 3.6, variables can optionally be annotated with type hints. These annotations are not enforced when the code runs, but they can be checked by external tools.

In [None]:
# In this example we successively assign different values and types to the variable "a".
# boolean
a = True
print("type of a is:", type(a))

# float
a = 4.2
print("type of a is:", type(a))

# integer
a = 42
print("type of a is:", type(a))
print("type of 42 is:", type(42))
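
Regarding the optional type annotations mentioned in the footnote above: they document the intended type but, in standard Python, they are not enforced at run time. A minimal sketch (the variable name `n_samples` is made up for the illustration):

```python
# The annotation ": int" states the intended type of the variable...
n_samples: int = 10
print(type(n_samples))

# ...but Python itself does not enforce it: this reassignment runs without error.
# Only an external type checker would report it.
n_samples = "ten"
print(type(n_samples))
```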

**Type conversion** is (often) fairly easy: just use the type name as a function.

In [None]:
# Convert an integer to a float:
a = 42
print("type of a before conversion:", type(a))
a = float(a)
print("type of a after conversion:", type(a))
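
Conversion also works between text and numbers, but only when the value makes sense for the target type. A short sketch with made-up values:

```python
# Text to number.
n = int("42")
x = float("3.5")
print(n + 1, x * 2)

# Number to text.
label = str(42) + " is the answer"
print(label)

# Not every conversion is possible: the line below would raise a ValueError.
# int("forty-two")
```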

#### Micro Exercise:
* Convert `a` back to an integer (`int`). Look up the `help` for integers.

<br>

## Operators
Now that we have variables containing objects of a certain **type**, we can begin to play with them using operators.

### Arithmetic operators 
You know most of these already:

In [None]:
print( 3 + 7 )            # + : addition
print( 1.1 - 5 )          # - : subtraction
print( 5 / 2 )            # / : division
print( 5 // 2 )           # //: floor division (the result is rounded down: 2.5 -> 2)
print( 5 * 2 )            # * : multiplication
print( 2 ** 4 )           # **: power
print( 5 % 2 )            # % : modulus (remainder of the division)

# Variables can be used there as well:
x = 4
y = 16 * x**2 - 2 * x + 0.5 
print(y)

Now you can use Python as a fancy calculator!
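
Two details are worth knowing. The `/` operator always returns a `float`, and `//` rounds down (towards minus infinity), which matters for negative numbers. The built-in `divmod()` function returns the quotient and the remainder in one go. A short sketch with arbitrary values:

```python
print(4 / 2)          # 2.0 : "/" returns a float, even for an exact division
print(7 // 2)         # 3
print(-7 // 2)        # -4 : rounded down, not towards zero
print(-7 % 2)         # 1  : so that (-7 // 2) * 2 + (-7 % 2) == -7
print(divmod(17, 5))  # (3, 2) : quotient and remainder
```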

**Bonus:** when modifying the value of a variable, you can use the following shortcut operators:  
(e.g. useful to increment the value of a variable in a loop)

In [None]:
a = 0
print("The start value of 'a' is", a)

# Same as a = a + 3
a += 3
print("The value of 'a' is now:", a)

# Same as a = a - 1
a -= 1                                 
print("The value of 'a' is now:", a)

# Same as a = a * 3
a *= 3
print("The value of 'a' is now:", a)

# Same as a = a / 2
a /= 2
print("The value of 'a' is now:", a)
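
The same shortcut exists for the other arithmetic operators. Note also that `/=` always produces a `float`, just like `/`. A small sketch with an arbitrary start value:

```python
b = 7
b //= 2       # same as b = b // 2
print(b)      # 3

b **= 2       # same as b = b ** 2
print(b)      # 9

b %= 4        # same as b = b % 4
print(b)      # 1

b /= 1        # same as b = b / 1 : the result is a float
print(b, type(b))
```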

<br>

### Comparison operators

These operators return a `bool` value (`True` or `False`).


In [None]:
a = 5
print("is a equal to 1?:", a == 1)                       # == : equality
print("is a different from 13.37?:", a != 13.37)         # != : inequality
print("is a greater than 5?:", a > 5 )                   # >  : greater than
print("is a lower than 10?:", a < 10 )                   # <  : lower than
print("is a greater than or equal to 5?:", a >= 5 )      # >= : greater or equal
print("is a lower than or equal to 10?:", a <= 10 )      # <= : lower or equal

**Warning:** comparisons are type-sensitive, so the following expression evaluates to **False**:

In [None]:
a = 5
print("is a equal to '5'?:", a == "5")
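
Numbers of different numeric types are compared by value, while a number and a string are never equal. Ordering comparisons (`<`, `>`, ...) between a number and a string are not allowed at all. A short sketch:

```python
a = 5
same_as_float = (a == 5.0)          # True : int and float are compared by value
same_as_text = (a == "5")           # False: a number is never equal to a string
same_after_conversion = (a == int("5"))  # True : convert first, then compare
print(same_as_float, same_as_text, same_after_conversion)

# Ordering comparisons between incompatible types raise a TypeError:
# a < "5"
```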

Boolean values (the result from a comparison) can be:
* combined using `and` or `or`.
* inverted using `not` (True becomes False and False becomes True).

In [None]:
print("'and' requires both elements to be True:" , True and ( 1 + 1 != 2 ) )
print("'or' requires at least one element to be True:" , ( a * 2 > 10 ) or ( a > 0 ) )
print("'not' inverts a boolean value! This is simply", not False)
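
When several boolean operators are mixed, `not` is applied first, then `and`, then `or`; parentheses make the intent explicit. Comparisons can also be chained. A short sketch with an arbitrary value:

```python
a = 5
print(not a > 10 and a > 0)       # True : read as (not (a > 10)) and (a > 0)
print(True or False and False)    # True : read as True or (False and False)
print((True or False) and False)  # False: parentheses change the grouping

# "0 < a < 10" is a shortcut for "(0 < a) and (a < 10)".
in_range = 0 < a < 10
print(in_range)                   # True
```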

#### Micro Exercise:
* Compute the product of 348 and 157.2.
* Use a comparison operator to check if the result is larger than 230 squared (`230**2`).

<br>

## Object types: container types
These types are objects that contain other objects:
* `str`: string - text
* `list`: "mutable" list of Python objects
* `tuple`: "immutable" list of Python objects
* `dict`: dictionary associating a 'key' to a 'value'

They all have a dedicated `[]` operator that lets the user access one - or several - of the objects they contain.  
In addition, the number of objects a container has (its length) can be accessed using the `len()` function.

**Important:** in Python (unlike e.g. in R), **indexing is zero-based**. This means that the first element of a container type object is accessed with `object[0]`, and not `object[1]`.

### Strings
* In Python, the string type (`str`) is a **sequence of characters** that can be used to represent text of any length.
* Strings are written surrounded by single `'` or double `"` quotes. One can also use triple 
  quotes `"""` to make a multi-line string.

In [None]:
# Both single and double quotes can be used to define a string.
gene_seq = "ATGCGACTGATCGATCGATCGATCGATGATCGATCGATCGATGCTAGCTAC"
name = 'Sir Lancelot of Camelot'

# Triple quotes can be used to define multi-line strings.
long_string = """Let me tell you something, my lad. 
When you’re walking home tonight and some great 
homicidal maniac comes after you with a bunch 
of loganberries, don’t come crying to me!\n"""
print(long_string)

# Special characters are possible in strings.
my_quote = """Gracieux : « aimez-vous à ce point les oiseaux
que paternellement vous vous préoccupâtes
de tendre ce perchoir à leurs petites pattes ? »"""

# We also commonly use special characters, such as:
print('a\tb')  # \t : tabulation
print('a\nb')  # \n : newline

# We can use the len() function to know the length of a string:
print("The length of the string in the 'name' variable is:", len(name))

# NB: strings can be added together and multiplied by an integer:
print( 'dead' + 'parrot' ) 
print( 'spam' * 5 ) 

Because strings are a type of sequence, the different letters of a string can be accessed using the **`[]` operator**, with the index of the desired element.  
Remember that in Python, the index of the first element is `[0]`.

In [None]:
my_string = "And now, something completely different."
print("The first element of this string is:", my_string[0] )  # 0 is the index of the 1st element of the string.
print("The 5th element of this string is:", my_string[4] )    # 5th element of the string.
print("The last element of this string is:", my_string[-1] )  # -1 is the index of the last element of the string.

Indices can also be used to retrieve several elements at once: this is called a **slice operation** or **slicing**:
* The general syntax of slicing is `[start:end:step]`, where `start` and `end` are indices.
* The end index position is **excluded from the slice**.
* The **default step value is 1**, and it is therefore very often omitted.

In [None]:
print(my_string[0:5])   # slice operation: get all elements from index 0 (included) to index 5 (excluded)
print(my_string[:5])    # implicitly slices from the beginning of the string up to (but not including) index 5.
print(my_string[5:])    # implicitly slices until the end of the string.
print(my_string[5::2])  # keep every second letter, starting from index 5 to the end of the string.
print(my_string[::-1])  # goes through the string from end to start -> reverses the string !
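
One practical difference between indexing and slicing: an index outside of the string raises an `IndexError`, whereas a slice that runs past the end is silently truncated. A sketch on a short, made-up string:

```python
word = "Python"
print(word[1:4])    # yth  : indices 1, 2 and 3
print(word[::2])    # Pto  : every second letter
print(word[2:100])  # thon : the slice stops at the end of the string
print(word[10:])    # empty string : nothing left to slice

# Indexing, on the other hand, must stay within bounds:
# word[10]  -> IndexError
```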

#### Micro exercise: 
* create a `str` variable containing your name.
* Extract the last 3 letters from it using slicing.

<br>

### Lists and tuples
Lists and tuples are **sequence type** objects that can contain any type of elements (other objects).  
* Lists are declared by surrounding a comma-separated list of objects with `[]`.  
* Tuples are declared similarly, but using `()`.

In [None]:
# Declaring a list.
my_list = [1 , 2 , 3 , 5 , 5.2 , 6.99]
print("my list is:", my_list)

# Declaring a tuple.
my_tuple = ('a' , 4.2 , 5)  # Lists/tuples can contain a mix of different types.
print("my tuple is:", my_tuple)

# Creating a list from a tuple, or a tuple from a list.
another_list = list((1, 2, 3))
another_tuple = tuple([1, 2, 3])
print("my other list is:", another_list)
print("my other tuple is:", another_tuple)

The **`[]` operator** works in much the same way as with strings, and allows **accessing individual objects** from a list/tuple, or **slicing** it:
* as with strings, remember that the end position index is **excluded** from the slicing.

In [None]:
print(my_tuple[0])     # get the 1st item of the tuple.
print(my_list[2:])     # get all elements from index 2 (i.e. the 3rd element) to the end of the list.

#### Mutability - an important difference between lists and tuples
* A `tuple` is **immutable**: its length is fixed and its elements cannot be changed.
* A `list` is **mutable**: it can be extended, reduced, and its elements can be changed. 

In [None]:
# Changing an element in a list
my_list = [1 , 2 , 3 , 5 , 5.2 , 6.99]
my_list[3] = "Spam"
print(my_list[3])
print(my_list)

In [None]:
# Trying the same with a tuple raises a TypeError:
my_tuple = (1 , 2 , 3 , 5) 
my_tuple[3] = "Spam"

What can be done however, is to assign a new tuple to the same variable - this will *look* like we have modified a tuple, but in fact we have created a new tuple object and assigned it to our variable.

In [None]:
my_tuple = (1 , 2 , 3 , "spam")    # We do not modify an existing tuple: we create a new one.
print(my_tuple)
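
If the content of a tuple really needs to change, one option is to go through a list, using the `list()` and `tuple()` conversions seen earlier. A minimal sketch (the names `as_list` and `new_tuple` are made up for the illustration):

```python
my_tuple = (1, 2, 3, 5)

as_list = list(my_tuple)     # mutable copy of the tuple's content
as_list[3] = "Spam"
new_tuple = tuple(as_list)   # a brand new tuple object

print(my_tuple)    # (1, 2, 3, 5) : the original tuple is unchanged
print(new_tuple)   # (1, 2, 3, 'Spam')
```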

Remember the `help()` function? Let's use it to gain a better understanding of lists:

In [None]:
help(list)

That's a lot of information... let's go through it!  
* First we learn that `list` is a class (i.e. a function that can generate objects of a certain type).
  It can thus create objects of type `list`.
* The help page then tells us that lists are `Built-in mutable sequence.`, and describes the behaviour 
  of `list()` if no argument is given (creates an empty list). 
* Then, it says `Methods defined here:`. **Methods** are functions that can be called on objects of the
  class they belong to. They often enable some basic manipulation of objects of that type.  
    * Methods are called using the syntax `object.method(...)`

Let's focus on two methods of the `list` class:
 * `append(self, object, /)`: this method adds an object - given as argument - at the end of the list.
 * `insert(self, index, object, /)`: this method inserts an object - given as the 2nd argument - before 
   the index given as the 1st argument.
 
Let's try out these methods:

In [None]:
my_list = [1 , 2 , 3 , 5]
print("Initially, my list is:", my_list)

# Calling the "append()" method of the my_list list to add an element at the end of it.
my_list.append("ham") 
print("The list, after appending ham is now:", my_list)

# Calling the method insert of my_list to add an element in second position. 
# Remember that python indices start with 0, so inserting before position 1 puts 
# the new object in second position in my_list (and not in the first).
my_list.insert(1 , "beans") 
print("list after insert:", my_list)

Methods are a very important part of python, and provide tons of functionalities to objects. Before you start writing your own code to manipulate an object, **always check** if the object already has a method that does exactly (or nearly) what you want.  
This will save you a lot of time and grief.
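
As an illustration, here are a few more methods from the `help(list)` page, applied to a made-up list. Note that most of them modify the list in place.

```python
menu = ["spam", "eggs", "ham", "spam"]

n_spam = menu.count("spam")    # number of occurrences
egg_pos = menu.index("eggs")   # position of the first occurrence
print(n_spam, egg_pos)         # 2 1

menu.remove("spam")            # removes the first occurrence only
print(menu)                    # ['eggs', 'ham', 'spam']

last = menu.pop()              # removes and returns the last element
print(last, menu)              # spam ['eggs', 'ham']

menu.sort(reverse=True)        # sorts in place; "reverse" is a keyword argument
print(menu)                    # ['ham', 'eggs']
```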

### From list to string, and back again ...
Since strings are sequences (and therefore iterable), they can be converted to lists using the `list()` function:

In [None]:
my_string = "Drop your panties Sir William, I cannot wait till lunchtime."
list_from_string = list(my_string)
print(list_from_string)

As can be seen above, the default behavior is that each letter of the string becomes an element in the list.

However, we often prefer to create a list that contains each word of the string. For this we use the `split()` method of strings:
* The `split()` method is very useful when reading formatted text files.
* By default, it splits on white space (i.e. spaces, tabs, newlines).
* It accepts an optional `sep` argument that allows separation of fields using the specified character (look up `help(str.split)` for details).

In [None]:
my_string = "Drop your panties Sir William, I cannot wait till lunchtime."
my_words = my_string.split()
print(my_words)
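
With an explicit separator, `split()` behaves slightly differently: consecutive separators are no longer merged, so empty fields are preserved. A sketch on a made-up tab-separated line, as could be found in a formatted text file:

```python
line = "chr1\t1200\t\tBRCA2"

# Splitting on tabs keeps the empty 3rd field.
fields = line.split("\t")
print(fields)             # ['chr1', '1200', '', 'BRCA2']

# Splitting on any white space (the default) merges consecutive separators.
words = line.split()
print(words)              # ['chr1', '1200', 'BRCA2']
```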

To convert a list to a string, the `join()` method can be used (which may be seen as the inverse of `split()`).
Somewhat counter-intuitively, the `join()` method applies to strings, and takes a list as argument:

In [None]:
# Here, the separator calls the join method which accepts the list "my_words" as argument.
my_string = " ".join(my_words) 
print(my_string)

# One can use a more exotic separator - in fact, any string can be used as separator.
my_string = "_SEP_".join(my_words) 
print(my_string)

# TIP: use an empty separator to just join letters.
my_string = "".join(['to','ba','c','co','ni','st']) 
print(my_string)
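
Two things to keep in mind with `join()`: all elements of the list must be strings, and combining `split()` with `join()` is a handy way to clean up irregular spacing. A short sketch with made-up values:

```python
messy = "  too    many   spaces "
clean = " ".join(messy.split())
print(clean)              # too many spaces

# join() only accepts strings: numbers must be converted first.
values = [str(1), str(2), str(3)]
joined = "-".join(values)
print(joined)             # 1-2-3
# "-".join([1, 2, 3])     # -> TypeError
```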

**Bonus**: lists can be concatenated with the `+` operator, extended with `+=` (addition assignment) and "multiplied" with `*`:

In [None]:
# Create a new list by concatenating two lists.
list_one = [ ',' , 1159 ]
list_two = list_one + [10.1, '45', 7] 
print(list_two)

# Extend an existing list in place with the += operator.
list_one += [10.1, '45', 7] 
print(list_one)

# As well as multiplication
menu = ['spam', 'eggs'] * 3  
print(menu)

#### Micro Exercise:
* create a list with all integers from 0 to 3 in it.
* Add two numbers at the end of the list.
* Use a slicing operation to select the fourth element in the list.

* **If you have the time:**
    * What is the difference between `list.append()` and `list.extend()`? Try to figure it out empirically
      by trying to append a list to another list.
    * Why does `print(my_list.append("something"))` print "None"?

<br><br>

### Dictionaries
Dictionaries, or `dict`, are containers that associate a **key** to a **value**, just like a real world dictionary associates a word to its definition.
* Dictionaries are instantiated with the `{key:value}` or `dict()` syntax.
* **keys** must be unique in the dictionary.
* **values** can appear as many times as desired in the dictionary.
* the `[]` operator is used to select objects from the dictionary, but using their key instead of their index.
  ```python
  color_code = {'blue': 23, 'green': 45, 'red': 8}
  color_code['blue']   # returns 23
  color_code['red']    # returns 8
  ```
* Unlike **Lists** or **Tuples**, **Dicts** are not sequences: their elements are identified by key, 
  not by position. Therefore values cannot be retrieved by index position (even though, since 
  Python 3.7, dicts do preserve the order in which keys were inserted).  
  E.g. `color_code[0]` will raise a `KeyError`, unless there is a key `0` in the dict.
* Dictionaries are **mutable** objects: key:value pairs can be added and removed, values can be modified. 

In [None]:
# Create an empty dictionary.
student_age = dict()
student_age = {}

# Create a dictionary and directly add values to it.
student_age = {'Anne': 26 , 
               'Viktor': 31 }
student_age = dict(Anne=26 , Viktor=31)

# Adding key:value pairs to an existing dictionary is as easy as:
student_age['Eleonore'] = 5
print('dictionary:', student_age)

# Modifying the value associated to a key is equally easy:
student_age['Eleonore'] = 25
print('dictionary:', student_age)

# We are not restricted to a particular type for keys, nor for values. 
# We can e.g. make dict of lists or dict of dict.
student_age[0] = 'zero' 
student_age['group_1'] = [23, 25, 28] 
student_age['group_2'] = {'bob':26, 'alice':27}
print('dictionary:', student_age)

# Removing objects from the dictionary is done with the pop() method, look at the help for more details.
student_age.pop('Anne') 
print('dictionary:', student_age)
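
A few other common operations on dictionaries, shown on the `color_code` example used earlier: testing whether a key exists with `in`, reading a value with a fallback using the `get()` method, and listing keys and values.

```python
color_code = {'blue': 23, 'green': 45, 'red': 8}

print('blue' in color_code)          # True : "in" tests for keys, not values
print(color_code.get('purple'))      # None : no KeyError when the key is missing
print(color_code.get('purple', 0))   # 0    : optional default value
print(len(color_code))               # 3    : number of key:value pairs

print(list(color_code.keys()))       # ['blue', 'green', 'red']
print(list(color_code.values()))     # [23, 45, 8]
```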


<br><br>

## Exercises: 1.1 - 1.4

You can do the additional exercises if you have the time.

We recommend you have a look at the `Mutable_vs_immutable.ipynb` notebook to gain a better understanding of the difference between some of the objects presented here.
This is an important notion that newcomers to Python need to be aware of, as ignoring it can lead to serious bugs in our code.