# Python Crash Course
This is a small crash course in writing Python for students who have already learned another programming language. A lot of
this material is pulled from the excellent *A Whirlwind Tour of Python* by Jake VanderPlas, [available at this link](https://jakevdp.github.io/WhirlwindTourOfPython/).
This crash course was developed for *Mathematics for Machine Learning*, a course taught by Samuel Deng at Columbia University
in Summer 2024 and Summer 2025.


# Jupyter Notebooks
Throughout this course, we will be using *Jupyter Notebooks*, also known as IPython notebooks. These formatted notebooks allow
us to integrate text, images, and code into a single document while allowing users to interactively execute and write
code into *code cells*. This document is itself a Jupyter notebook, distinguished by the extension `.ipynb`. Online, there
are various platforms to work with `.ipynb` files; a popular choice is [Google Colab](https://colab.research.google.com/).

This tutorial will be focused on details of the Python programming language itself, which can be run interactively within
Jupyter notebooks. To learn more about Jupyter notebooks, visit the page for the [Jupyter Project](https://jupyter.org/), which maintains this document format.


In Python notebooks, you can run a block of code (these are the cells that are colored grey) by pressing the Play button on the right-hand side, or by clicking into the cell and pressing *Shift + Enter* on your keyboard. A Jupyter notebook instance keeps a persistent state from cell to cell, so what matters is the order in which you *run* the cells, not the order in which they appear on the page. In the cells below, the number in `In [ ]` records the execution order: `print(x)` was run before the second assignment, so it prints `5`.

In [40]:
# assign 5 to x
x = 5

In [42]:
# assign 10 to x
x = 10

In [41]:
print(x)

5


# Basic Python Syntax
In the following code snippet, I'd like to draw your attention to a few features
of the Python syntax. We'll focus on the actual *content* of these statements (their *semantics*) in a bit.

- **Assignment.** Variables are assigned without needing to specify a type. Notice that `5`, an integer, is assigned to the variable `midpoint` without specifying its datatype.
- **Comments.** Notice that comments are marked by a `#` symbol. Anything on the line following a `#` is ignored by the Python interpreter.
- **End of line terminates statements.** Notice that the line `midpoint = 5` has nothing terminating it, unlike in languages such as Java or C, which require a semicolon `;`. In Python, the end of the line ends the statement. If you need to put multiple statements on a single line, you can separate them with a semicolon `;`.

In [None]:
# Set the variable `midpoint` to 5
midpoint = 5 # this is an inline comment

# make two empty lists
lower = []; upper = []

# the above is functionally equivalent to
lower = []
upper = []

# split the numbers into lower and upper
for i in range(10):
    if (i < midpoint):
        lower.append(i)
    else:
        upper.append(i)

print("lower:", lower)
print("upper:", upper)

lower: [0, 1, 2, 3, 4]
upper: [5, 6, 7, 8, 9]


One important thing to note in the main loop above is that *indentation and whitespace matter*! The following two snippets
produce different results. This first snippet will only execute the indented block, which includes `print(x)`, if `x < 4` is True.

In [5]:
x = 2
if x < 4:
    y = x * 2
    print(x)

2


In this block, however, `x` will be printed regardless of its value because the `print` statement is not indented, which places it *outside* of the
`if` block!

In [4]:
x = 5
if x < 4:
    y = x * 2
print(x)

5


Whitespace does not matter *within* a line, however.

In [8]:
x = 10     **     2
print(x)

x = 10**2
print(x)

100
100


# Basic Python Semantics: Variables and Objects
## Variables
To assign a variable in Python, just put the variable name to the left of an equals `=` sign. In Python, variable
assignment defines *pointers* to data.

In the C programming language, our mental model might be that variables are "buckets" we put data in. For example, writing `int x = 4` in
C means that we create a "bucket" named `x` that can store integers, and we put the value `4` in that bucket. 

In [9]:
# assign 4 to the variable x
x = 4

However, in Python, we define *pointers* with variables. The variable `x` in Python points to some other bucket that
contains the value `4`. Because of this, Python variables just point to various objects, so there is no need to "declare"
the variable or always require the variable to point to data of the same type. We say that Python is *dynamically-typed* 
because of this property. Variable names can point to objects of any type.

In [10]:
x = 1               # x is an integer
x = 'hello'         # x is a string now
x = [1, 2, 3]       # x is a list now

A consequence of this property is that if two variable names point to the same
*mutable* object, then changing one will change the other as well. For example, 
create and modify a list:

In [11]:
x = [1, 2, 3]
y = x

The above snippet creates two variables `x` and `y` that both point to the
same object. Because of this, if we modify the list via either of its names, the change is visible
through the other name as well.

In [13]:
x = [1, 2, 3]
y = x

print(y)

x.append(4)
print(y)

[1, 2, 3]
[1, 2, 3, 4]


Note that if we use the `=` operator to assign another value to `x`, this will
not affect the value of `y`. Assignment just changes the object that the variable
points to.

In [14]:
x = 'something else'
print(y)    # y is unchanged

[1, 2, 3, 4]
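
When an independent list is wanted rather than a second name for the same one, the list can be copied. The following is a minimal sketch of that idea; the variable names `original`, `alias`, and `independent` are made up for illustration, and `list.copy()` makes only a shallow copy.

```python
# Two names for one list: a change made through either name shows up in both.
original = [1, 2, 3]
alias = original

# list.copy() builds a new list object holding the same elements (a shallow copy).
independent = original.copy()

original.append(4)

print(alias)        # [1, 2, 3, 4]
print(independent)  # [1, 2, 3]
```

Because `independent` points to a different object, appending through `original` leaves it untouched.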


## Objects
In Python, *everything* is an object, in the "object-oriented programming" sense.

Although variables in Python are pointers to data, the data themselves are typed.

In [15]:
x = 4
type(x)

int

In [16]:
x = 'hello'
type(x)

str

In [17]:
x = 3.14159
type(x)

float

Recall that, in object-oriented programming, an *object* is an entity that contains
data along with associated metadata or functionality. Everything is an object in 
Python, so every entity has some metadata (called its *attributes*) and associated
functionality (called its *methods*). These are both accessed through the dot `.` syntax.

For example, lists have an `append` method, which adds an item to the list. Methods are called with
parentheses, within which we put the method's arguments.

In [18]:
L = [1, 2, 3]
L.append(100)
print(L)

[1, 2, 3, 100]


Even numerical types have `real` and `imag` attributes that return the real and imaginary
parts of the value, if viewed as a complex number. Attributes are accessed without parentheses.

In [24]:
x = 4.5
print(x.real)
print(x.imag)

4.5
0.0


# Basic Python Semantics: Operators
There are several basic binary arithmetic operators in Python, summarized in this table:


| Operator     | Name           | Description                                            |
|--------------|----------------|--------------------------------------------------------|
| ``a + b``    | Addition       | Sum of ``a`` and ``b``                                 |
| ``a - b``    | Subtraction    | Difference of ``a`` and ``b``                          |
| ``a * b``    | Multiplication | Product of ``a`` and ``b``                             |
| ``a / b``    | True division  | Quotient of ``a`` and ``b``                            |
| ``a // b``   | Floor division | Quotient of ``a`` and ``b``, removing fractional parts |
| ``a % b``    | Modulus        | Integer remainder after division of ``a`` by ``b``     |
| ``a ** b``   | Exponentiation | ``a`` raised to the power of ``b``                     |
| ``-a``       | Negation       | The negative of ``a``                                  |
| ``+a``       | Unary plus     | ``a`` unchanged (rarely used)                          |

We can combine and use them in the typical ways you might expect, including parentheses to group operations.

In [25]:
# addition, subtraction, multiplication
(4 + 8) * (6.5 - 3)

42.0

Bitwise operations also exist, but we won't cover them here. Take a look at [The Whirlwind Tour of Python: Operators](https://jakevdp.github.io/WhirlwindTourOfPython/04-semantics-operators.html) for the full list of bitwise operations.

We have already seen assignment operations, which assign data to variables to store for later use. The `=` operator assigns
data to variables.

In [26]:
a = 24
print(a)

24


Updating and assigning a variable with a new value all in one line can be written with
built-in update operators.

In [27]:
a += 2 # Equivalent to a = a + 2
print(a)

26


There is a corresponding "update" operator for each of the above arithmetic operators.

In [28]:
# Some more update operators
a -= 5
print(a)
a *= 2
print(a)
a /= 2
print(a)

21
42
21.0


We often need to compare different values in programming. For this, Python
implements standard comparison operators that return Boolean values `True` or `False`.
These are listed in the following table:

| Operation     | Description                          |
|---------------|--------------------------------------|
| ``a == b``    | ``a`` equal to ``b``                 |
| ``a < b``     | ``a`` less than ``b``                |
| ``a <= b``    | ``a`` less than or equal to ``b``    |
| ``a > b``     | ``a`` greater than ``b``             |
| ``a >= b``    | ``a`` greater than or equal to ``b`` |
| ``a != b``    | ``a`` not equal to ``b``             |

Here are some examples of these operators.

In [31]:
# check if a is less than b
a = 30
b = 10
a < b

False

In [None]:
# 25 is odd
25 % 2 == 1

True

In [30]:
# Check if a is between 15 and 30
a = 25
15 < a < 30

True

When working with Boolean values, Python provides operators to combine these
values using standard concepts of "and", "or", and "not." To use these, the 
operators are the actual words `and`, `or`, and `not`.

In [32]:
x = 4
(x < 6) and (x > 2)

True

In [34]:
x = 2
not (x < 6)

False

In [35]:
x = 8
(x > 10) or (x % 2 == 0)

True

Python also provides **identity operators** to check for object identity. Object
identity is different from *equality.*

In [36]:
a = [1, 2, 3]
b = [1, 2, 3]

a == b

True

In [37]:
a is b

False

In [38]:
a is not b

True

Notice that, because variables are pointers in Python, the above three snippets make sense
because variables `a` and `b` point to *different objects*. Therefore the `is` operator checks
if two variables are pointing at the same object; the `==` operator checks for equality of (potentially different)
objects. The following code shows what the `is` operator actually sees as identical objects.

In [39]:
a = [1, 2, 3]
b = a
a is b

True

For compound objects such as lists in Python, the **membership operators** `in` and `not in` return a Boolean value
indicating whether a value is contained in the collection.

In [43]:
1 in [1, 2, 3]

True

In [44]:
2 not in [1, 2, 3]

False

There is no need to write a loop over the list to check whether it contains some object; the `in` and `not in` operators
do this directly.

# Simple Types in Python
When discussing Python variables and objects, we mentioned that all Python objects have some `type` information attached.
We summarize these in the following table:

| Type        | Example        | Description                                                  |
|-------------|----------------|--------------------------------------------------------------|
| ``int``     | ``x = 1``      | Integers (i.e., whole numbers)                               |
| ``float``   | ``x = 1.0``    | Floating-point numbers (i.e., real numbers)                  |
| ``complex`` | ``x = 1 + 2j`` | Complex numbers (i.e., numbers with real and imaginary part) |
| ``bool``    | ``x = True``   | Boolean: True/False values                                   |
| ``str``     | ``x = 'abc'``  | String: characters or text                                   |
| ``NoneType``| ``x = None``   | Special object indicating nulls                              |

## Integer
An integer is the most basic numerical type. Any number without a decimal point
is automatically an `int`.

In [45]:
x = 1
type(x)

int

A convenient feature of Python is that division up-casts integers to floating-point types.

In [46]:
5 / 2

2.5
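
The operator table earlier also lists floor division `//` and modulus `%`, which keep the result an integer when both operands are integers. Here is a small sketch contrasting them with true division; the variable names are only illustrative.

```python
true_quotient = 7 / 2      # true division always gives a float: 3.5
quotient = 7 // 2          # floor division drops the fractional part: 3
remainder = 7 % 2          # modulus gives the remainder: 1

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = -7 // 2   # -4

# The built-in divmod() returns the quotient and remainder together as a tuple.
both = divmod(7, 2)        # (3, 1)

print(true_quotient, quotient, remainder, negative_floor, both)
```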

## Floating-Point Numbers
Floating-point type numbers can store fractional numbers. These can be defined either
in standard decimal notation or in exponential notation. The exponential notation `e` can be
read as "...times ten to the..." so that `1.4e6` is interpreted as $1.4 \times 10^6$.

In [47]:
x = 0.000005
y = 5e-6
print(x == y)

True


In [48]:
x = 1400000.00
y = 1.4e6
print(x == y)

True


Any integer can be converted into a floating-point number using the `float()` constructor.

In [49]:
float(1)

1.0
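
One thing to keep in mind is that floating-point numbers have limited precision, so arithmetic that is exact on paper may be slightly off in code. The sketch below shows a common case and one way to compare floats with a tolerance, using `math.isclose` from the standard library; the variable names are only illustrative.

```python
import math

total = 0.1 + 0.2

# Neither 0.1 nor 0.2 can be stored exactly in binary, so the sum is slightly off.
exact_match = (total == 0.3)             # False

# math.isclose compares within a small relative tolerance instead.
close_match = math.isclose(total, 0.3)   # True

print(total)
print(exact_match, close_match)
```

This matters in numerical code: testing floats with `==` is usually a mistake unless the values are known to be exactly representable.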

## String
Strings in Python are created with single or double quotes.

In [50]:
message = "what do you like?"
response = 'spam'

Python has many very useful string functions and methods. Here are a few.

In [51]:
# length of string
len(response)

4

In [53]:
# make upper case (see also: str.lower())
response.upper()

'SPAM'

In [54]:
# Capitalize a string
message.capitalize()

'What do you like?'

In [55]:
# Concatenation with + operator
message + response

'what do you like?spam'

In [56]:
# multiplication is multiple concatenation
5 * response

'spamspamspamspamspam'

In [57]:
# Access individual characters (zero-based indexing)
message[0]

'w'

## None Type
Python includes a special type, the `NoneType`, which has only a single possible value, `None`.

In [58]:
type(None)

NoneType

You'll see that `None` is used in many places, but, perhaps most commonly, it is used as the default return value
of a function. For example, the `print()` function in Python does not return anything, but we can still catch its return
value.


In [60]:
return_value = print("abc")
print(return_value)

abc
None


So any function in Python that doesn't return anything really returns a `None` object.

## Boolean
The Boolean type is a simple type with two possible values: `True` or `False`. Note the capitalization.
It is returned by comparison operators.

In [61]:
result = (4 < 5)
result

True

In [62]:
type(result)

bool

The Boolean values are case-sensitive: unlike in some other languages, `True` and `False` must be capitalized!

In Python, Booleans can also be constructed with a `bool()` object constructor. Values of any other type can be 
converted to Boolean via predictable rules. For example, any numeric type is `False` if it is equal to zero; otherwise, it is `True`.

In [63]:
bool(2014)

True

In [64]:
bool(0)

False

For strings, `bool(s)` is `False` for empty strings and `True` otherwise.

In [65]:
bool("")

False

In [66]:
bool("abc")

True
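
The same idea extends beyond numbers and strings. As a short sketch (variable names are only illustrative), empty containers and `None` convert to `False`, while any non-empty container converts to `True`, even if its contents are themselves "falsy":

```python
empty_list = bool([])        # False: an empty list
list_with_zero = bool([0])   # True: the list has one element, even though it is 0
empty_dict = bool({})        # False: an empty dictionary
none_value = bool(None)      # False
zero_float = bool(0.0)       # False: zero of any numeric type

print(empty_list, list_with_zero, empty_dict, none_value, zero_float)
```

This is why a condition like `if L:` is often used to check that a list is non-empty.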

# Compound Types in Python
We have seen Python's simple types: ``int``, ``float``, ``complex``, ``bool``, ``str``, and so on.
Python also has several built-in compound types, which act as containers for other types.
These compound types are:

| Type Name | Example                   | Description                                           |
|-----------|---------------------------|-------------------------------------------------------|
| ``list``  | ``[1, 2, 3]``             | Ordered collection                                    |
| ``tuple`` | ``(1, 2, 3)``             | Immutable ordered collection                          |
| ``dict``  | ``{'a':1, 'b':2, 'c':3}`` | (key,value) mapping, looked up by key, not position   |
| ``set``   | ``{1, 2, 3}``             | Unordered collection of unique values                 |

As you can see, round, square, and curly brackets have distinct meanings when it comes to the type of collection produced.
We'll take a quick tour of these data structures here.

## Lists
Lists are the basic *ordered* and *mutable* data collection type in Python.
They can be defined with comma-separated values between square brackets; for example, here is a list of the first several prime numbers:

In [67]:
L = [2, 3, 5, 7]

Lists have a number of useful properties and methods available to them.
Here we'll take a quick look at some of the more common and useful ones:

In [68]:
# Length of a list
len(L)

4

In [69]:
# Append a value to the end
L.append(11)
L

[2, 3, 5, 7, 11]

In [70]:
# Addition concatenates lists
L + [13, 17, 19]

[2, 3, 5, 7, 11, 13, 17, 19]

In [71]:
# sort() method sorts in-place
L = [2, 5, 1, 6, 3, 4]
L.sort()
L

[1, 2, 3, 4, 5, 6]

In addition, there are many more built-in list methods; they are well-covered in Python's [online documentation](https://docs.python.org/3/tutorial/datastructures.html).

While we've been demonstrating lists containing values of a single type, one of the powerful features of Python's compound objects is that they can contain objects of *any* type, or even a mix of types. For example:

In [72]:
L = [1, 'two', 3.14, [0, 3, 5]]

This flexibility is a consequence of Python's dynamic type system.
Creating such a mixed sequence in a statically-typed language like C can be much more of a headache!
We see that lists can even contain other lists as elements.
Such type flexibility is an essential piece of what makes Python code relatively quick and easy to write.

So far we've been considering manipulations of lists as a whole; another essential piece is the accessing of individual elements.
This is done in Python via *indexing* and *slicing*, which we'll explore next.

### List indexing and slicing
Python provides access to elements in compound types through *indexing* for single elements, and *slicing* for multiple elements.
As we'll see, both are indicated by a square-bracket syntax.
Suppose we return to our list of the first several primes:

In [73]:
L = [2, 3, 5, 7, 11]

Python uses *zero-based* indexing, so we can access the first and second elements using the following syntax:

In [74]:
L[0]

2

In [75]:
L[1]

3

Elements at the end of the list can be accessed with negative numbers, starting from -1:

In [76]:
L[-1]

11

In [77]:
L[-2]

7

Where *indexing* is a means of fetching a single value from the list, *slicing* is a means of accessing multiple values in sub-lists.
It uses a colon to indicate the start point (inclusive) and end point (non-inclusive) of the sub-list.
For example, to get the first three elements of the list, we can write:

In [78]:
L[0:3]

[2, 3, 5]

Leaving out the first index means that `0` is assumed, so we can just write:

In [79]:
L[:3]

[2, 3, 5]

Similarly, if we leave out the last index, it defaults to the length of the list.
Thus, the last three elements can be accessed as follows:

In [80]:
L[-3:]

[5, 7, 11]

Finally, it is possible to specify a third integer that represents the step size; for example, to select every second element of the list, we can write:

In [81]:
L[::2]  # equivalent to L[0:len(L):2]

[2, 5, 11]
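
The start, stop, and step can be combined, and the step can be negative. Here is a brief sketch on a fresh copy of the primes list; the names `primes`, `middle`, `every_other_from_1`, and `reversed_primes` are only illustrative.

```python
primes = [2, 3, 5, 7, 11]

middle = primes[1:4]               # indices 1, 2, 3 -> [3, 5, 7]
every_other_from_1 = primes[1::2]  # start at index 1, step 2 -> [3, 7]

# A negative step walks backwards; with no start or stop this reverses the list.
reversed_primes = primes[::-1]     # [11, 7, 5, 3, 2]

print(middle, every_other_from_1, reversed_primes)
```

Slicing returns a new list, so `primes` itself is unchanged by any of these.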

Both indexing and slicing can be used to set elements as well as access them.
The syntax is as you would expect:

In [82]:
L[0] = 100
print(L)

[100, 3, 5, 7, 11]


In [83]:
L[1:3] = [55, 56]
print(L)

[100, 55, 56, 7, 11]


## Tuples
Tuples are in many ways similar to lists, but they are defined with parentheses rather than square brackets:

In [84]:
t = (1, 2, 3)

Like the lists discussed before, tuples have a length, and individual elements can be extracted using square-bracket indexing:

In [85]:
len(t)

3

In [86]:
t[0]

1

The main distinguishing feature of tuples is that they are *immutable*: this means that once they are created, their size and contents cannot be changed:

In [87]:
t[1] = 4

TypeError: 'tuple' object does not support item assignment

In [88]:
t.append(4)

AttributeError: 'tuple' object has no attribute 'append'

Tuples are often used in a Python program; a particularly common case is in functions that have multiple return values.
For example, the ``as_integer_ratio()`` method of floating-point objects returns a numerator and a denominator; this dual return value comes in the form of a tuple:

In [89]:
x = 0.125
x.as_integer_ratio()

(1, 8)

These multiple return values can be individually assigned as follows:

In [90]:
numerator, denominator = x.as_integer_ratio()
print(numerator / denominator)

0.125


## Dictionaries
Dictionaries are extremely flexible mappings of keys to values, and form the basis of much of Python's internal implementation.
They can be created via a comma-separated list of ``key:value`` pairs within curly braces:

In [91]:
numbers = {'one':1, 'two':2, 'three':3}

Items are accessed and set via the indexing syntax used for lists and tuples, except here the index is not a zero-based position but a valid key in the dictionary:

In [92]:
# Access a value via the key
numbers['two']

2

New items can be added to the dictionary using indexing as well:

In [93]:
# Set a new key:value pair
numbers['ninety'] = 90
print(numbers)

{'one': 1, 'two': 2, 'three': 3, 'ninety': 90}


Keep in mind that dictionaries are not indexed by position: items are looked up by key. Since Python 3.7, a dictionary does remember the order in which its keys were inserted, but there is no "first" or "second" item to index into.
Dictionaries are implemented so that looking up a key is very fast, regardless of the size of the dictionary (if you're curious how this works, read about the concept of a *hash table*).
The [Python documentation](https://docs.python.org/3/library/stdtypes.html) has a complete list of the methods available for dictionaries.
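
A few of those dictionary operations come up constantly. The sketch below reuses the `numbers` dictionary from above and shows key membership with `in`, a lookup with a fallback via `get`, and iteration over key-value pairs with `items`; the other variable names are only illustrative.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# `in` checks the keys of a dictionary, not its values.
has_key = 'two' in numbers        # True
has_value = 2 in numbers          # False

# numbers['four'] would raise a KeyError; get() returns a fallback instead.
missing = numbers.get('four', 0)  # 0

# items() yields (key, value) pairs, which a for loop can unpack.
total = 0
for name, value in numbers.items():
    total += value

print(has_key, has_value, missing, total)
```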

## Sets

The fourth basic collection is the set, which contains unordered collections of unique items.
They are defined much like lists and tuples, except they use the curly brackets of dictionaries:

In [94]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

If you're familiar with the mathematics of sets, you'll be familiar with operations like the union, intersection, difference, symmetric difference, and others.
Python's sets have all of these operations built-in, via methods or operators.
For each, we'll show the two equivalent methods:

In [96]:
# union: items appearing in either
primes | odds      # with an operator
primes.union(odds) # equivalently with a method

{1, 2, 3, 5, 7, 9}

In [97]:
# intersection: items appearing in both
primes & odds             # with an operator
primes.intersection(odds) # equivalently with a method

{3, 5, 7}

In [98]:
# difference: items in primes but not in odds
primes - odds           # with an operator
primes.difference(odds) # equivalently with a method

{2}

In [99]:
# symmetric difference: items appearing in only one set
primes ^ odds                     # with an operator
primes.symmetric_difference(odds) # equivalently with a method

{1, 2, 9}

# Control Flow in Python
Here we'll cover *conditional statements* (including "``if``", "``elif``", and "``else``"), *loop statements* (including "``for``" and "``while``" and the accompanying "``break``", "``continue``", and "``pass``").

## Conditional Statements: ``if``-``elif``-``else``
Conditional statements, often referred to as *if-then* statements, allow the programmer to execute certain pieces of code depending on some Boolean condition.
A basic example of a Python conditional statement is this:

In [100]:
x = -15

if x == 0:
    print(x, "is zero")
elif x > 0:
    print(x, "is positive")
elif x < 0:
    print(x, "is negative")
else:
    print(x, "is unlike anything I've ever seen...")

-15 is negative


Note especially the use of colons (``:``) and whitespace to denote separate blocks of code.

Python adopts the ``if`` and ``else`` often used in other languages; its more unique keyword is ``elif``, a contraction of "else if".
In these conditional clauses, ``elif`` and ``else`` blocks are optional; additionally, you can include as few or as many ``elif`` statements as you would like.

## ``for`` loops
Loops in Python are a way to repeatedly execute some code statement.
So, for example, if we'd like to print each of the items in a list, we can use a ``for`` loop:

In [101]:
for N in [2, 3, 5, 7]:
    print(N, end=' ') # print all on same line

2 3 5 7 

Notice the simplicity of the ``for`` loop: we specify the variable we want to use, the sequence we want to loop over, and use the "``in``" operator to link them together in an intuitive and readable way.
More precisely, the object to the right of the "``in``" can be any Python *iterable*.
An iterable can be thought of as a generalized sequence; the underlying iterator machinery is covered in the *Iterators* chapter of *A Whirlwind Tour of Python*.

For example, one of the most commonly-used iterables in Python is the ``range`` object, which generates a sequence of numbers:

In [102]:
for i in range(10):
    print(i, end=' ')

0 1 2 3 4 5 6 7 8 9 

Note that the range starts at zero by default, and that by convention the top of the range is not included in the output.
Range objects can also be given an explicit start and stop value:

In [103]:
# range from 5 to 10
list(range(5, 10))

[5, 6, 7, 8, 9]
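
Like slices, `range` also accepts an optional third argument for the step size. A short sketch follows, with illustrative variable names:

```python
# range(start, stop, step): count from 0 up to (but not including) 10 in steps of 2
evens = list(range(0, 10, 2))        # [0, 2, 4, 6, 8]

# A negative step counts down; the stop value is still excluded.
countdown = list(range(10, 0, -3))   # [10, 7, 4, 1]

print(evens)
print(countdown)
```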

## ``while`` loops
The other type of loop in Python is a ``while`` loop, which keeps iterating as long as some condition holds:

In [104]:
i = 0
while i < 10:
    print(i, end=' ')
    i += 1

0 1 2 3 4 5 6 7 8 9 

The argument of the ``while`` loop is evaluated as a boolean statement, and the loop is executed until the statement evaluates to False.

## ``break`` and ``continue``: Fine-Tuning Your Loops
There are two useful statements that can be used within loops to fine-tune how they are executed:

- The ``break`` statement breaks-out of the loop entirely
- The ``continue`` statement skips the remainder of the current iteration and moves on to the next one

These can be used in both ``for`` and ``while`` loops.

Here is an example of using ``continue`` to print a string of odd numbers.
In this case, the result could be accomplished just as well with an ``if-else`` statement, but sometimes the ``continue`` statement can be a more convenient way to express the idea you have in mind:

In [105]:
for n in range(20):
    # if the remainder of n / 2 is 0, skip the rest of the loop
    if n % 2 == 0:
        continue
    print(n, end=' ')

1 3 5 7 9 11 13 15 17 19 

Here is an example of a ``break`` statement used for a less trivial task.
This loop will fill a list with all Fibonacci numbers up to a certain value:

In [107]:
a, b = 0, 1
amax = 100
L = []

while True:
    (a, b) = (b, a + b)
    if a > amax:
        break
    L.append(a)

print(L)

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]


Notice that we use a ``while True`` loop, which will loop forever unless we have a break statement!

# Defining and Using Functions

So far, our scripts have been simple, single-use code blocks.
One way to organize our Python code and to make it more readable and reusable is to factor-out useful pieces into reusable *functions*.
Here we'll cover two ways of creating functions: the ``def`` statement, useful for any type of function, and the ``lambda`` statement, useful for creating short anonymous functions.

## Using Functions

Functions are groups of code that have a name, and can be called using parentheses.
We've seen functions before. For example, ``print`` is a function:

In [109]:
print('abc')

abc


Here ``print`` is the function name, and ``'abc'`` is the function's *argument*.

In addition to arguments, there are *keyword arguments* that are specified by name.
One available keyword argument for the ``print()`` function (in Python 3) is ``sep``, which tells what character or characters should be used to separate multiple items:

In [110]:
print(1, 2, 3)

1 2 3


In [111]:
print(1, 2, 3, sep='--')

1--2--3


When non-keyword arguments are used together with keyword arguments, the keyword arguments must come at the end.
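
To make the ordering rule concrete, here is a small sketch. It writes into an in-memory buffer through `print`'s `file` keyword argument so the result can be inspected, and it uses the built-in `compile` to show that putting a keyword argument before positional ones is rejected as a syntax error before the code ever runs. The names `buffer` and `rejected` are only illustrative.

```python
import io

buffer = io.StringIO()

# Positional arguments come first, followed by any keyword arguments.
print(1, 2, 3, sep='--', end='!\n', file=buffer)
print(buffer.getvalue(), end='')   # 1--2--3!

# The reverse order is not valid Python, so compiling it fails.
try:
    compile("print(sep='--', 1, 2, 3)", "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True

print(rejected)   # True
```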

## Defining Functions
Functions become even more useful when we begin to define our own, organizing functionality to be used in multiple places.
In Python, functions are defined with the ``def`` statement.
The following function returns a list of the first `N` Fibonacci numbers.

In [112]:
def fibonacci(N):
    L = []
    a, b = 0, 1
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L

In [114]:
fibonacci(10)

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

Functions can return multiple values in a tuple, which is indicated by commas:

In [116]:
def empty_and_length(L):
    return (len(L) == 0), len(L)

L = [1, 2, 3]
empty_and_length(L)

(False, 3)

## Default Argument Values

Often when defining a function, there are certain values that we want the function to use *most* of the time, but we'd also like to give the user some flexibility.
In this case, we can use *default values* for arguments.
Consider the ``fibonacci`` function from before.
What if we would like the user to be able to play with the starting values?
We could do that as follows:

In [117]:
def fibonacci(N, a=0, b=1):
    L = []
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L

With a single argument, the result of the function call is identical to before:

In [118]:
fibonacci(10)

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

But now we can use the function to explore new things, such as the effect of new starting values:

In [119]:
fibonacci(10, 0, 2)

[2, 2, 4, 6, 10, 16, 26, 42, 68, 110]

## ``*args`` and ``**kwargs``: Flexible Arguments
Sometimes you might wish to write a function in which you don't initially know how many arguments the user will pass.
In this case, you can use the special form ``*args`` and ``**kwargs`` to catch all arguments that are passed.
Here is an example:

In [120]:
def catch_all(*args, **kwargs):
    print("args =", args)
    print("kwargs = ", kwargs)

In [121]:
catch_all(1, 2, 3, a=4, b=5)

args = (1, 2, 3)
kwargs =  {'a': 4, 'b': 5}


In [122]:
catch_all('a', keyword=2)

args = ('a',)
kwargs =  {'keyword': 2}


Here it is not the names ``args`` and ``kwargs`` that are important, but the ``*`` characters preceding them.
``args`` and ``kwargs`` are just the variable names often used by convention, short for "arguments" and "keyword arguments".
The operative difference is the asterisk characters: a single ``*`` before a variable means "expand this as a sequence", while a double ``**`` before a variable means "expand this as a dictionary".
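
The same asterisks can be used when *calling* a function, to unpack an existing sequence or dictionary into separate arguments. The sketch below uses a variant of `catch_all` that returns its arguments instead of printing them (a change made here only so the results are easy to inspect); `inputs` and `keywords` are illustrative names.

```python
def catch_all(*args, **kwargs):
    # Return what was collected, rather than printing it as in the version above.
    return args, kwargs

inputs = (1, 2, 3)
keywords = {'pi': 3.14}

# With * and **, the tuple and dictionary are expanded into separate arguments.
args, kwargs = catch_all(*inputs, **keywords)
print(args, kwargs)       # (1, 2, 3) {'pi': 3.14}

# Without them, each object is passed as a single positional argument.
args2, kwargs2 = catch_all(inputs, keywords)
print(args2, kwargs2)     # ((1, 2, 3), {'pi': 3.14}) {}
```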

# Errors and Exceptions
No matter your skill as a programmer, you will eventually make a coding mistake.
Such mistakes come in three basic flavors:

- *Syntax errors:* Errors where the code is not valid Python (generally easy to fix)
- *Runtime errors:* Errors where syntactically valid code fails to execute, perhaps due to invalid user input (sometimes easy to fix)
- *Semantic errors:* Errors in logic: code executes without a problem, but the result is not what you expect (often very difficult to track-down and fix)

Here we're going to focus on how to deal cleanly with *runtime errors*.
As we'll see, Python handles runtime errors via its *exception handling* framework.

## Runtime Errors

If you've done any coding in Python, you've likely come across runtime errors.
They can happen in a lot of ways.

For example, if you try to reference an undefined variable:

In [123]:
print(Q)

NameError: name 'Q' is not defined

Or maybe you're trying to access a sequence element that doesn't exist:

In [124]:
L = [1, 2, 3]
L[1000]

IndexError: list index out of range

Note that in each case, Python is kind enough to not simply indicate that an error happened, but to spit out a *meaningful* exception that includes information about what exactly went wrong, along with the exact line of code where the error happened.
Having access to meaningful errors like this is immensely useful when trying to trace the root of problems in your code.

## Catching Exceptions: ``try`` and ``except``
The main tool Python gives you for handling runtime exceptions is the ``try``...``except`` clause.
Its basic structure is this:

In [125]:
try:
    print("this gets executed first")
except:
    print("this gets executed only if there is an error")

this gets executed first


Note that the second block here did not get executed: this is because the first block did not raise an error.
Let's put a problematic statement in the ``try`` block and see what happens:

In [126]:
try:
    print("let's try something:")
    x = 1 / 0 # ZeroDivisionError
except:
    print("something bad happened!")

let's try something:
something bad happened!


Here we see that when the error was raised in the ``try`` block (in this case, a ``ZeroDivisionError``), the error was caught, and the ``except`` block was executed.

One way this is often used is to check user input within a function or another piece of code.
For example, we might wish to have a function that catches zero-division and returns some other value, perhaps a suitably large number like $10^{100}$:

In [127]:
def safe_divide(a, b):
    try:
        return a / b
    except:
        return 1E100

In [128]:
safe_divide(1, 2)

0.5

In [129]:
safe_divide(2, 0)

1e+100

There is a subtle problem with this code, though: what happens when another type of exception comes up? For example, this is probably not what we intended:

In [130]:
safe_divide(1, '2')

1e+100

Dividing an integer by a string raises a ``TypeError``, which our overzealous code caught and treated as if it were a ``ZeroDivisionError``!
For this reason, it's nearly always a better idea to catch exceptions *explicitly*:

In [131]:
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return 1E100

In [132]:
safe_divide(1, 0)

1e+100

In [133]:
safe_divide(1, '2')

TypeError: unsupported operand type(s) for /: 'int' and 'str'
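
When more than one kind of failure is expected, each can get its own ``except`` clause. The following is a minimal sketch rather than a recommended design: the function name ``describe_division`` and its return strings are hypothetical, and the optional ``else`` clause (which runs only when the ``try`` block raised nothing) is an extra piece of syntax not covered above.

```python
def describe_division(a, b):
    """Return a short description of what happened when computing a / b."""
    try:
        result = a / b
    except ZeroDivisionError:
        # only division by zero lands here
        return "cannot divide by zero"
    except TypeError:
        # only unsupported operand types land here
        return "operands must be numbers"
    else:
        # runs only if the try block finished without raising
        return f"result is {result}"

print(describe_division(1, 2))
print(describe_division(1, 0))
print(describe_division(1, '2'))
```

Any exception that is neither a ``ZeroDivisionError`` nor a ``TypeError`` still propagates to the caller, which is usually what you want.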

## Raising Exceptions: ``raise``
We've seen how valuable it is to have informative exceptions when using parts of the Python language.
It's equally valuable to make use of informative exceptions within the code you write, so that users of your code (foremost yourself!) can figure out what caused their errors.

The way you raise your own exceptions is with the ``raise`` statement. For example:

In [134]:
raise RuntimeError("my error message")

RuntimeError: my error message

As an example of where this might be useful, consider a function that returns the first ``N`` Fibonacci numbers:

In [135]:
def fibonacci(N):
    L = []
    a, b = 0, 1
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L

One potential problem here is that the input value could be negative.
This will not currently cause any error in our function, but we might want to let the user know that a negative ``N`` is not supported.
Errors stemming from invalid parameter values, by convention, lead to a ``ValueError`` being raised:

In [136]:
def fibonacci(N):
    if N < 0:
        raise ValueError("N must be non-negative")
    L = []
    a, b = 0, 1
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L

In [137]:
fibonacci(10)

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

In [138]:
fibonacci(-10)

ValueError: N must be non-negative

Now the user knows exactly why the input is invalid, and could even use a ``try``...``except`` block to handle it!

In [139]:
N = -10
try:
    print("trying this...")
    print(fibonacci(N))
except ValueError:
    print("Bad value: need to do something else")

trying this...
Bad value: need to do something else
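
A common follow-up is to read the message carried by the exception, which ``except ... as`` makes available. The sketch below repeats the ``fibonacci`` definition from above so that it runs on its own; the wrapper name ``fibonacci_or_empty`` is hypothetical, and returning an empty list is just one possible fallback.

```python
def fibonacci(N):
    if N < 0:
        raise ValueError("N must be non-negative")
    L = []
    a, b = 0, 1
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L

def fibonacci_or_empty(N):
    """Hypothetical wrapper: fall back to an empty list for invalid N."""
    try:
        return fibonacci(N)
    except ValueError as err:
        # `err` is the exception object; printing it shows the message
        print("falling back to []:", err)
        return []

print(fibonacci_or_empty(5))
print(fibonacci_or_empty(-3))
```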


# Iterators
Often an important piece of data analysis is repeating a similar calculation, over and over, in an automated fashion.
For example, you may have a table of names that you'd like to split into first and last, or perhaps of dates that you'd like to convert to some standard format.
One of Python's answers to this is the *iterator* syntax.
We've seen this already with the ``range`` iterator:

In [140]:
for i in range(10):
    print(i, end=' ')

0 1 2 3 4 5 6 7 8 9 

Here we're going to dig a bit deeper.
It turns out that in Python, ``range`` does not produce a list, but a lazy object that hands out an *iterator*, and learning how this works is key to understanding a wide class of very useful Python functionality.

## Iterating over lists
Iterators are perhaps most easily understood in the concrete case of iterating through a list.
Consider the following:

In [141]:
for value in [2, 4, 6, 8, 10]:
    # do some operation
    print(value + 1, end=' ')

3 5 7 9 11 

The familiar "``for x in y``" syntax allows us to repeat some operation for each value in the list.
The fact that the syntax of the code is so close to its English description ("*for [each] value in [the] list*") is just one of the syntactic choices that makes Python such an intuitive language to learn and use.

But the face-value behavior is not what's *really* happening.
When you write something like "``for val in L``", the Python interpreter checks whether ``L`` has an *iterator* interface, which you can check yourself with the built-in ``iter`` function:

In [142]:
iter([2, 4, 6, 8, 10])

<list_iterator at 0x1069de020>

It is this iterator object that provides the functionality required by the ``for`` loop.
The iterator object is a container that gives you access to the next object for as long as it's valid, which can be seen with the built-in function ``next``:

In [143]:
I = iter([2, 4, 6, 8, 10])

In [144]:
print(next(I))

2


In [145]:
print(next(I))

4


In [146]:
print(next(I))

6
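
What happens when the values run out? Calling ``next`` on an exhausted iterator raises a ``StopIteration`` exception, and the ``for`` loop uses that as its signal to stop. The sketch below spells out roughly what a ``for`` loop does behind the scenes; the variable names are chosen for illustration, and the real interpreter does this more efficiently.

```python
values = [2, 4, 6, 8, 10]
collected = []

iterator = iter(values)          # step 1: ask the list for an iterator
while True:
    try:
        value = next(iterator)   # step 2: fetch the next item
    except StopIteration:        # step 3: the iterator signals it is exhausted
        break
    collected.append(value + 1)  # the loop body

print(collected)
```

This produces the same values as the ``for value in [2, 4, 6, 8, 10]`` loop shown earlier.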


## ``range()``: A List Is Not Always a List
Perhaps the most common example of this indirect iteration is the ``range()`` function in Python, which returns not a list, but a special ``range`` object:

In [147]:
range(10)

range(0, 10)

``range``, like a list, exposes an iterator:

In [148]:
iter(range(10))

<range_iterator at 0x1069de520>

So Python knows to treat it *as if* it's a list:

In [149]:
for i in range(10):
    print(i, end=' ')

0 1 2 3 4 5 6 7 8 9 

The benefit of the iterator indirection is that *the full list is never explicitly created!*
We can see this by doing a range calculation that would overwhelm our system memory if we actually instantiated it.

In [150]:
N = 10 ** 12
for i in range(N):
    if i >= 10: break
    print(i, end=', ')

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 

If ``range`` were to actually create that list of one trillion values, it would occupy tens of terabytes of machine memory: a waste, given the fact that we're ignoring all but the first 10 values!
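
One way to see this for yourself is to compare object sizes with ``sys.getsizeof``. This is a rough sketch assuming a 64-bit CPython installation; exact byte counts vary by version and platform, so none are listed here.

```python
import sys

lazy = range(10 ** 12)           # no trillion-element list is built
small_list = list(range(1000))   # a real list of only 1,000 values

print(sys.getsizeof(lazy))        # small, no matter how long the range is
print(sys.getsizeof(small_list))  # grows with the number of elements

# A range computes its elements on demand from start, stop, and step
print(lazy[10 ** 11])
print(999_999_999_999 in lazy)
```

The ``range`` object only stores its start, stop, and step, so indexing and membership tests are answered by arithmetic rather than by looking through stored values.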

In fact, there's no reason that iterators ever have to end at all!
Python's ``itertools`` library contains a ``count`` function that acts as an infinite range:

In [151]:
from itertools import count

for i in count():
    if i >= 10:
        break
    print(i, end=', ')

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 

Had we not thrown in a loop break here, it would go on happily counting until the process is manually interrupted or killed (using, for example, ``ctrl-C``).
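
If you only want a finite piece of an infinite iterator, ``itertools.islice`` is a convenient alternative to a manual ``break``. A small sketch, with variable names chosen for illustration:

```python
from itertools import count, islice

# take the first five values of an endless stream 0, 2, 4, ...
first_evens = list(islice(count(start=0, step=2), 5))
print(first_evens)
```

Here ``islice`` stops asking for values after five items, so the infinite ``count`` is never run to completion.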

## Useful Iterators
This iterator syntax is used nearly universally in Python built-in types as well as the more data science-specific objects we'll explore in later sections.
Here we'll cover some of the more useful iterators in the Python language:

### ``enumerate``
Often you need not only to iterate over the values in a list, but also to keep track of the index.
You might be tempted to do things this way:

In [152]:
L = [2, 4, 6, 8, 10]
for i in range(len(L)):
    print(i, L[i])

0 2
1 4
2 6
3 8
4 10


Although this does work, Python provides a cleaner syntax using the ``enumerate`` iterator:

In [153]:
for i, val in enumerate(L):
    print(i, val)

0 2
1 4
2 6
3 8
4 10
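
``enumerate`` also accepts an optional ``start`` argument, which is handy when you want human-friendly numbering that begins at 1. A short sketch:

```python
L = [2, 4, 6, 8, 10]

# each item produced by enumerate is an (index, value) tuple
numbered = list(enumerate(L, start=1))
print(numbered)
```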


### ``zip``
Other times, you may have multiple lists that you want to iterate over simultaneously.
You could certainly iterate over the index as in the non-Pythonic example we looked at previously, but it is better to use the ``zip`` iterator, which zips together iterables:

In [154]:
L = [2, 4, 6, 8, 10]
R = [3, 6, 9, 12, 15]
for lval, rval in zip(L, R):
    print(lval, rval)

2 3
4 6
6 9
8 12
10 15


Any number of iterables can be zipped together, and if they are different lengths, the shortest will determine the length of the ``zip``.
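
The truncation rule is easy to check with three lists of different lengths. The list names and contents here are made up for illustration:

```python
names = ['a', 'b', 'c', 'd']   # length 4
numbers = [1, 2, 3]            # length 3
flags = [True, False]          # length 2, the shortest

zipped = list(zip(names, numbers, flags))
print(zipped)
```

Only two tuples come out, because ``flags`` runs out first; the leftover items in the longer lists are silently ignored.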

### ``map`` and ``filter``
The ``map`` iterator takes a function and applies it to the values in an iterator:

In [155]:
# find the first 10 square numbers
square = lambda x: x ** 2
for val in map(square, range(10)):
    print(val, end=' ')

0 1 4 9 16 25 36 49 64 81 

The ``filter`` iterator looks similar, except it only passes through values for which the filter function evaluates to ``True``:

In [156]:
# find values up to 10 for which x % 2 is zero
is_even = lambda x: x % 2 == 0
for val in filter(is_even, range(10)):
    print(val, end=' ')

0 2 4 6 8 
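
Like ``range``, both ``map`` and ``filter`` are lazy: they return iterator objects and compute nothing until values are requested. The sketch below chains the two and uses ``list`` to pull out the results; the variable names are illustrative.

```python
squares = map(lambda x: x ** 2, range(10))             # nothing computed yet
even_squares = filter(lambda x: x % 2 == 0, squares)   # still nothing computed

first_pass = list(even_squares)    # values are produced here
second_pass = list(even_squares)   # the iterator is now exhausted

print(first_pass)
print(second_pass)
```

The second call returns an empty list, because an iterator can only be consumed once.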

# List Comprehensions
If you read enough Python code, you'll eventually come across the terse and efficient construction known as a *list comprehension*.
This is one feature of Python I expect you will fall in love with if you've not used it before; it looks something like this:

In [157]:
[i for i in range(20) if i % 3 > 0]

[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

The result of this is a list of numbers which excludes multiples of 3.
While this example may seem a bit confusing at first, as familiarity with Python grows, reading and writing list comprehensions will become second nature.

## Basic List Comprehensions
List comprehensions are simply a way to compress a list-building for-loop into a single short, readable line.
For example, here is a loop that constructs a list of the first 12 square integers:

In [158]:
L = []
for n in range(12):
    L.append(n ** 2)
L

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

The list comprehension equivalent of this is the following:

In [159]:
[n ** 2 for n in range(12)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

This basic syntax, then, is ``[``*``expr``* ``for`` *``var``* ``in`` *``iterable``*``]``, where *``expr``* is any valid expression, *``var``* is a variable name, and *``iterable``* is any iterable Python object.
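
Since the iterable can be anything, the same pattern works with lists of strings or with the iterators from the previous section. A short sketch with made-up data:

```python
words = ['dot', 'norm', 'matrix']
lengths = [len(word) for word in words]          # expr is a function call

left = [1, 2, 3]
right = [10, 20, 30]
sums = [l + r for l, r in zip(left, right)]      # iterable is a zip object

print(lengths)
print(sums)
```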

## Conditionals on the Iterator
You can further control the iteration by adding a conditional to the end of the expression.
In the first example of the section, we iterated over all numbers from 0 to 19, but left out multiples of 3.
Look at this again, and notice the construction:

In [160]:
[val for val in range(20) if val % 3 > 0]

[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

The expression ``(val % 3 > 0)`` evaluates to ``True`` unless ``val`` is divisible by 3.
Again, the English language meaning can be immediately read off: "Construct a list of values for each value below 20, but only if the value is not divisible by 3".
Once you are comfortable with it, this is much easier to write – and to understand at a glance – than the equivalent loop syntax:

In [161]:
L = []
for val in range(20):
    if val % 3:
        L.append(val)
L

[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

## Conditionals on the Value
If you've programmed in C, you might be familiar with the single-line conditional enabled by the ``?`` operator:
``` C
int absval = (val < 0) ? -val : val;
```
Python has something very similar to this, which is most often used within list comprehensions, where a simple expression is desired:

In [162]:
val = -10
val if val >= 0 else -val

10
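
Here is how that conditional expression looks inside a list comprehension. Note the two different roles of ``if``: the one before ``for`` chooses *what value* to emit, while the one after the iterable chooses *whether* to emit anything. The data is made up for illustration.

```python
data = [-3, 1, -4, 1, -5]

# conditional on the value: every element is kept, but transformed
absolute = [val if val >= 0 else -val for val in data]

# both kinds together: skip multiples of 3, negate the even survivors
mixed = [val if val % 2 else -val for val in range(10) if val % 3]

print(absolute)
print(mixed)
```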

# Modules and Packages
One feature of Python that makes it useful for a wide range of tasks is the fact that it comes "batteries included" – that is, the Python standard library contains useful tools for a wide range of tasks.
On top of this, there is a broad ecosystem of third-party tools and packages that offer more specialized functionality.
Here we'll take a look at importing standard library modules, tools for installing third-party modules, and a description of how you can make your own modules.

## Loading Modules: the ``import`` Statement

For loading built-in and third-party modules, Python provides the ``import`` statement.
There are a few ways to use the statement, which we will mention briefly here, from most recommended to least recommended.

### Explicit module import

Explicit import of a module preserves the module's content in a namespace.
The namespace is then used to refer to its contents with a "``.``" between them.
For example, here we'll import the built-in ``math`` module and compute the cosine of pi:

In [163]:
import math
math.cos(math.pi)

-1.0

### Explicit module import by alias

For longer module names, it's not convenient to use the full module name each time you access some element.
For this reason, we'll commonly use the "``import ... as ...``" pattern to create a shorter alias for the namespace.
For example, the NumPy (Numerical Python) package, a popular third-party package useful for data science, is by convention imported under the alias ``np``:

In [164]:
import numpy as np
np.cos(np.pi)

-1.0

### Explicit import of module contents

Sometimes rather than importing the module namespace, you would just like to import a few particular items from the module.
This can be done with the "``from ... import ...``" pattern.
For example, we can import just the ``cos`` function and the ``pi`` constant from the ``math`` module:

In [165]:
from math import cos, pi
cos(pi)

-1.0
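
All three import styles give access to the same underlying objects; they differ only in which names end up in your namespace. A small sketch to confirm this (the alias ``m`` is arbitrary):

```python
import math
import math as m
from math import cos, pi

print(m is math)        # the alias refers to the very same module object
print(cos is math.cos)  # the imported name refers to the same function
print(math.cos(math.pi), m.cos(m.pi), cos(pi))
```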

## Importing from Python's Standard Library

Python's standard library contains many useful built-in modules, which you can read about fully in [Python's documentation](https://docs.python.org/3/library/).
Any of these can be imported with the ``import`` statement, and then explored using the help function seen in the previous section.
Here is an extremely incomplete list of some of the modules you might wish to explore and learn about:

- ``os`` and ``sys``: Tools for interfacing with the operating system, including navigating file directory structures and executing shell commands
- ``math`` and ``cmath``: Mathematical functions and operations on real and complex numbers
- ``itertools``: Tools for constructing and interacting with iterators and generators
- ``functools``: Tools that assist with functional programming
- ``random``: Tools for generating pseudorandom numbers
- ``pickle``: Tools for object persistence: saving objects to and loading objects from disk
- ``json`` and ``csv``: Tools for reading JSON-formatted and CSV-formatted files
- ``urllib``: Tools for doing HTTP and other web requests

You can find information on these, and many more, in the Python standard library documentation: https://docs.python.org/3/library/.
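
As a quick taste, here is a sketch that touches three of the modules listed above. The dictionary contents and variable names are invented for the example.

```python
import functools
import json
import random

# json: convert a Python object to text and back again
record = {"name": "example", "values": [1, 2, 3]}
text = json.dumps(record)
restored = json.loads(text)

# random: seeding makes the pseudorandom sequence reproducible
random.seed(0)
first_draw = random.random()
random.seed(0)
second_draw = random.random()

# functools: reduce folds a two-argument function over a sequence
total = functools.reduce(lambda a, b: a + b, [1, 2, 3, 4])

print(restored == record, first_draw == second_draw, total)
```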

## Importing from Third-Party Modules

One of the things that makes Python useful, especially within the world of data science, is its ecosystem of third-party modules.
These can be imported just as the built-in modules, but first the modules must be installed on your system.
The standard registry for such modules is the Python Package Index (*PyPI* for short), found on the Web at http://pypi.python.org/.
For convenience, Python comes with a program called ``pip`` (a recursive acronym meaning "pip installs packages"), which will automatically fetch packages released and listed on PyPI (if you use Python version 2, ``pip`` must be installed separately).
For example, if you'd like to install the ``supersmoother`` package (written by Jake VanderPlas), all that is required is to type the following at the command line:
```
$ pip install supersmoother
```
The source code for the package will be automatically downloaded from the PyPI repository, and the package installed in the standard Python path (assuming you have permission to do so on the computer you're using).

For more information about PyPI and the ``pip`` installer, refer to the documentation at http://pypi.python.org/.
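
Before importing a third-party package, it can be useful to check from within Python whether it is installed. One way to do this is with the standard library's ``importlib.util.find_spec``; the helper name ``is_installed`` below is hypothetical, and this is only a sketch of the idea.

```python
import importlib.util

def is_installed(module_name):
    """Hypothetical helper: True if `module_name` can be imported here."""
    return importlib.util.find_spec(module_name) is not None

print(is_installed("math"))            # part of the standard library
print(is_installed("supersmoother"))   # depends on your environment
```

If the second line prints ``False``, running the ``pip install`` command shown above is the next step.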

# A Preview of Data Science Tools
This section introduces a couple more important tools that we'll use in this course that are prevalent throughout
data science and machine learning applications with Python. These packages are pre-installed in most popular web-based
Python notebook applications such as [Google Colab](https://colab.research.google.com/). On your local machine, if you
are using [Anaconda or Miniconda](https://www.anaconda.com/download), these can be easily installed using the following 
command in your terminal:

`$ conda install numpy scipy pandas matplotlib scikit-learn`

## NumPy: Numerical Python

NumPy provides an efficient way to store and manipulate multi-dimensional dense arrays in Python.
The important features of NumPy are:

- It provides an ``ndarray`` structure, which allows efficient storage and manipulation of vectors, matrices, and higher-dimensional datasets.
- It provides a readable and efficient syntax for operating on this data, from simple element-wise arithmetic to more complicated linear algebraic operations.

In the simplest case, NumPy arrays look a lot like Python lists.
For example, here is an array containing the range of numbers 1 to 9 (compare this with Python's built-in ``range()``):

In [166]:
import numpy as np
x = np.arange(1, 10)
x

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

NumPy's arrays offer both efficient storage of data and efficient element-wise operations on the data.
For example, to square each element of the array, we can apply the "``**``" operator to the array directly:

In [167]:
x ** 2

array([ 1,  4,  9, 16, 25, 36, 49, 64, 81])

Compare this with the much more verbose Python-style list comprehension for the same result:

In [168]:
[val ** 2 for val in range(1, 10)]

[1, 4, 9, 16, 25, 36, 49, 64, 81]
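
The same element-wise behavior applies to the other arithmetic operators, both between an array and a single number and between two arrays of the same shape. A brief sketch, assuming NumPy is installed (the names ``y`` and ``z`` are illustrative):

```python
import numpy as np

x = np.arange(1, 10)

y = 2 * x + 1   # the scalars are applied to every element
z = x * y       # arrays of equal shape are combined element by element

print(y)
print(z)
print(x.sum())  # aggregates are available as methods
```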

Unlike Python lists (which are limited to one dimension), NumPy arrays can be multi-dimensional.
For example, here we will reshape our ``x`` array into a 3x3 array:

In [169]:
M = x.reshape((3, 3))
M

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

A two-dimensional array is one representation of a matrix, and NumPy knows how to efficiently do typical matrix operations. For example, you can compute the transpose using ``.T``:

In [170]:
M.T

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

or a matrix-vector product using ``np.dot``:

In [171]:
np.dot(M, [5, 6, 7])

array([ 38,  92, 146])

and even more sophisticated operations like computing eigenvalues:

In [172]:
np.linalg.eigvals(M)

array([ 1.61168440e+01, -1.11684397e+00, -8.58274334e-16])
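
Two related tools are worth knowing: the ``@`` operator, which performs matrix multiplication, and ``np.linalg.eig``, which returns eigenvectors along with the eigenvalues. The sketch below rebuilds ``M`` so that it runs on its own and checks the defining relation $Mv = \lambda v$ numerically; the variable names are chosen for illustration.

```python
import numpy as np

M = np.arange(1, 10).reshape((3, 3))
v = np.array([5, 6, 7])

# The @ operator computes the same matrix-vector product as np.dot
print(M @ v)

# np.linalg.eig returns the eigenvalues and a matrix whose
# columns are the corresponding eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(M)
first_vec = eigenvectors[:, 0]

# check M v = lambda v up to floating-point error
print(np.allclose(M @ first_vec, eigenvalues[0] * first_vec))
```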

The full [NumPy documentation](https://numpy.org/doc/) is comprehensive and helpful. Usually, a quick Google search
will tell you how to do whatever you need to do with vectors and matrices in `numpy`.

## Pandas: Labeled Column-oriented Data

Pandas is a much newer package than NumPy, and is in fact built on top of it.
What Pandas provides is a labeled interface to multi-dimensional data, in the form of a DataFrame object that will feel very familiar to users of R and related languages.
DataFrames in Pandas look something like this:

In [173]:
import pandas as pd
df = pd.DataFrame({'label': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'value': [1, 2, 3, 4, 5, 6]})
df

  label  value
0     A      1
1     B      2
2     C      3
3     A      4
4     B      5
5     C      6


The Pandas interface allows you to do things like select columns by name:

In [174]:
df['label']

0    A
1    B
2    C
3    A
4    B
5    C
Name: label, dtype: object

Apply string operations across string entries:

In [175]:
df['label'].str.lower()

0    a
1    b
2    c
3    a
4    b
5    c
Name: label, dtype: object

Apply aggregates across numerical entries:

In [176]:
df['value'].sum()

21

And, perhaps most importantly, do efficient database-style joins and groupings:

In [177]:
df.groupby('label').sum()

       value
label
A          5
B          7
C          9


## Matplotlib: MATLAB-style Scientific Visualization

Matplotlib is currently one of the most popular scientific visualization packages in Python.
Even proponents admit that its interface is sometimes overly verbose, but it is a powerful library for creating a large range of plots.

To use Matplotlib, we start by importing its ``pyplot`` interface under the conventional alias ``plt`` and choosing a plot style:

In [None]:
import matplotlib.pyplot as plt

plt.style.use('ggplot')  # make graphs in the style of R's ggplot


Now let's create some data (as NumPy arrays, of course) and plot the results:

In [188]:
x = np.linspace(0, 10)  # range of values from 0 to 10
y = np.sin(x)           # sine of these values
plt.plot(x, y)         # plot as a line

[<matplotlib.lines.Line2D at 0x13a3feed0>]

This is the simplest example of a Matplotlib plot; for ideas on the wide range of plot types available, see [Matplotlib's online gallery](http://matplotlib.org/gallery.html) as well as other references listed in [Resources for Further Learning](16-Further-Resources.ipynb).

## SciPy: Scientific Python

SciPy is a collection of scientific functionality that is built on NumPy.
The package began as a set of Python wrappers to well-known Fortran libraries for numerical computing, and has grown from there.
The package is arranged as a set of submodules, each implementing some class of numerical algorithms.
Here is an incomplete sample of some of the more important ones for data science:

- ``scipy.fftpack``: Fast Fourier transforms
- ``scipy.integrate``: Numerical integration
- ``scipy.interpolate``: Numerical interpolation
- ``scipy.linalg``: Linear algebra routines
- ``scipy.optimize``: Numerical optimization of functions
- ``scipy.sparse``: Sparse matrix storage and linear algebra
- ``scipy.stats``: Statistical analysis routines

For example, let's take a look at interpolating a smooth curve between some data points:

In [189]:
from scipy import interpolate

# choose eight points between 0 and 10
x = np.linspace(0, 10, 8)
y = np.sin(x)

# create a cubic interpolation function
func = interpolate.interp1d(x, y, kind='cubic')

# interpolate on a grid of 1,000 points
x_interp = np.linspace(0, 10, 1000)
y_interp = func(x_interp)

# plot the results
plt.figure()  # new figure
plt.plot(x, y, 'o')
plt.plot(x_interp, y_interp);
