| @Apptrainers
Content
• Introduction to Python
• Numpy
• Pandas
“In December 1989, I was looking for a "hobby"
programming project that would keep me occupied during
the week around Christmas. My office ... would be closed,
but I had a home computer, and not much else on my
hands. I decided to write an interpreter for the new
scripting language I had been thinking about lately: a
descendant of ABC that would appeal to Unix/C hackers. I
chose Python as a working title for the project, being in a
slightly irreverent mood (and a big fan of Monty Python's
Flying Circus).”
— Guido van Rossum
• The big technology companies have each largely aligned themselves with different language
stacks.
• Oracle and IBM are aligned with Java (Oracle actually owns Java).
• Google is known for its use of Python (1997), a very versatile, dynamic and extensible
language, although in reality it is also a heavy user of C++ and Java. It has also created
its own language called Go (2009).
• Easy to learn and powerful programming language.
• It has efficient high-level data structures and a simple but effective approach to object-oriented programming.
• Freely available in source or binary form for all major platforms from the Python website, https://www.python.org/
• The Python interpreter is easily extended with new functions and data types implemented in C or C++ (or other languages callable from C).
• Python is also suitable as an extension language for customizable applications.
• Widely used (Google, NASA, Quora).
When you run a Python program, the interpreter executes it statement by statement, whereas
for compiled languages such as C or C++ a compiler first translates the whole program
and only then is it run.
The difference is that interpreted languages are usually somewhat slower than compiled languages.
• In Python you don’t need to declare a variable's data type ahead of time; Python determines
the type automatically from the value the variable currently holds.
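A minimal sketch of the point above: the same name can be rebound to values of different types, and the built-in type() reports the type of the current value. The variable names are illustrative only.

```python
# The type belongs to the value, not to the variable name.
x = 5
print(type(x))   # <class 'int'>
x = 3.14
print(type(x))   # <class 'float'>
x = "five"
print(type(x))   # <class 'str'>

# Collect the type names of a few sample values.
type_names = [type(v).__name__ for v in (5, 3.14, "five")]
print(type_names)
```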
Python code is typically one-third to one-fifth the length of the equivalent Java code. This means we can write less code in Python
to achieve the same thing as in Java.
• There are many good options for writing and editing code:
  Sublime Text (unlimited free trial available)
  Notepad++
  Xcode (Mac)
  TextWrangler (Mac)
  TextEdit (Mac)
• There are now multiple platforms for taking online courses for free:
  Coursera
  edX
  Stanford Online
  Khan Academy
  Udacity
• To download Python, follow the instructions on the official website:
https://www.python.org/
I would strongly recommend this video:
https://www.youtube.com/watch?v=HW29067qVWk
https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
https://github.com
“GitHub is a code hosting platform for version control and collaboration. It lets you and others
work together on projects from anywhere”.
GitHub repositories can be public (visible to everyone) or private (visible only to you and the people you invite).
A repository is usually used to organize a single project. It contains folders and files, images,
videos, spreadsheets, and data sets – anything your project needs.
Master in a repository: the final version.
Branch: used to try out new ideas that do not affect master unless a pull request is accepted.
Changes committed to a branch are recorded on that branch, so you can keep track of different versions.
Adding commits: keeps track (a history) of your progress on a branch or on master.
Forking a repository: creates a copy of the repo. Submit a pull request to the owner so that the
owner can incorporate your changes.
• Download Python and Jupyter Notebook
• Write Python code to print your name, your id, and your favorite quote!
• Save the project as .html and as .ipynb
• Install git and create a GitHub account
• Upload your first project as .html to e-learning
• Upload your first project as .ipynb to your GitHub account
• Share the link to your GitHub account with me on e-learning
https://www.tutorialspoint.com/execute_python_online.php
https://www.onlinegdb.com/online_python_compiler
You can type things directly into a running Python session.
Most programming languages, such as C, C++, and Java, use braces { } to define a block of code.
Python uses indentation.
A code block (body of a function, loop, etc.) starts with indentation and ends with the first
unindented line. The amount of indentation is up to you, but it must be consistent throughout
that block.
Generally, four spaces are used for indentation and are preferred over tabs. Here is an
example.
for i in range(1,11):
    print(i)
    if i == 5:
        break
Incorrect indentation will result in an IndentationError.
In Python, we use the hash (#) symbol to start writing a comment.
It extends up to the newline character. Comments help programmers understand
a program. The Python interpreter ignores comments.
#This is a comment
#print out Hello
print('Hello')
If a comment extends over multiple lines, one way of writing it is to put a hash (#) at the
beginning of each line.
Another way of doing this is to use triple quotes, either ''' or """.
Triple quotes are generally used for multi-line strings. A string that is not assigned to anything
has no effect, so it can be used as a multi-line comment as well.
"""This is also
a perfect example
of multi-line comments"""
expression: A data value or set of operations to compute a value.
Examples: 1 + 4 * 3
42
Arithmetic operators we will use:
+ - * / addition, subtraction, multiplication, division
% modulus, a.k.a. remainder
** exponentiation
precedence: Order in which operations are computed.
** has the highest precedence, followed by * / %, which in turn have a higher precedence than + -
1 + 3 * 4 is 13
Parentheses can be used to force a certain order of evaluation.
(1 + 3) * 4 is 16
Operator   Description      Example
=          Assignment       num = 7
+          Addition         num = 2 + 2
-          Subtraction      num = 6 - 4
*          Multiplication   num = 5 * 4
/          Division         num = 25 / 5
%          Modulo           num = 8 % 3
**         Exponent         num = 9 ** 2
In Python 3, dividing integers with / always produces a float. Use // (floor division) when the quotient
should also be an integer. (In Python 2, / on two integers behaved like //.)
• 35 // 5 is 7
• 84 // 10 is 8
• 156 // 100 is 1
The % operator computes the remainder from a division of integers.
• The operators + - * / % ** ( ) all work for real numbers.
• The / produces an exact answer: 15.0 / 2.0 is 7.5
• The same rules of precedence also apply to real numbers:
Evaluate ( ) before ** before * / // % before + -
• When integers and reals are mixed, the result is a real number.
• Example: 1 / 2.0 is 0.5
The conversion occurs on a per-operator basis:
7 // 3 * 1.2 + 3 // 2
2 * 1.2 + 3 // 2
2.4 + 3 // 2
2.4 + 1
3.4
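The division rules above can be checked with a short sketch. It assumes Python 3; the variable names are illustrative, and divmod is a built-in that returns the floor quotient and the remainder together.

```python
# Python 3 division operators side by side.
quotient = 7 // 2      # floor division keeps an integer result
remainder = 7 % 2      # remainder of the same division
exact = 7 / 2          # true division always yields a float
both = divmod(7, 2)    # (quotient, remainder) in one call

# Mixed integer/float expression, evaluated operator by operator.
mixed = 7 // 3 * 1.2 + 3 // 2

print(quotient, remainder, exact, both, mixed)
```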
Python has useful commands for performing calculations.
Command name Description
abs(value) absolute value
ceil(value) rounds up
cos(value) cosine, in radians
floor(value) rounds down
log(value) logarithm, base e
log10(value) logarithm, base 10
max(value1, value2) larger of two values
min(value1, value2) smaller of two values
round(value) nearest whole number
sin(value) sine, in radians
sqrt(value) square root
Constant Description
e 2.7182818...
pi 3.1415926...
To use most of these commands, you must write the following at the top of your
Python program (abs, max, min, and round are built in and need no import):
from math import *
variable: A named piece of memory that can store a value.
Usage:
 Compute an expression's result,
 store that result into a variable,
 and use that variable later in the program.
assignment statement: Stores a value into a variable.
Syntax:
name = value
Examples:
x = 5
gpa = 3.14
A variable that has been given a value can be used in expressions.
x + 4 is 9
Exercise: Evaluate the quadratic equation for a given a, b, and c.
• print : Produces text output on the console.
• Syntax:
print("Message")
print(Expression)
• Prints the given text message or expression value on the console, and moves the cursor down to the
next line.
print(Item1, Item2, ..., ItemN)
• Prints several messages and/or expressions on the same line.
• Examples:
print("Hello, world!")
age = 45
print("You have", 65 - age, "years until retirement")
Output:
Hello, world!
You have 20 years until retirement
• input : Reads a line of text from user input and returns it as a string; wrap it in int() to get a number.
• You can assign (store) the result of input into a variable.
• Example:
age = int(input("How old are you? "))
print("Your age is", age)
print("You have", 65 - age, "years until retirement")
Output:
How old are you? 53
Your age is 53
You have 12 years until retirement
• Exercise: Write a Python program that prompts the user for his/her amount of money, then
reports how many Nintendo Wiis the person can afford, and how much more money he/she
will need to afford an additional Wii.
for loop: Repeats a set of statements over a group of values.
• Syntax:
for variableName in groupOfValues:
    statements
• We indent the statements to be repeated with tabs or spaces.
• variableName gives a name to each value, so you can refer to it in the statements.
• groupOfValues can be a range of integers, specified with the range function.
• Example:
for x in range(1, 6):
    print(x, "squared is", x * x)
Output:
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
The range function specifies a range of integers:
• range(start, stop) - the integers between start (inclusive) and stop (exclusive)
It can also accept a third value specifying the change between values.
• range(start, stop, step) - the integers between start (inclusive) and stop (exclusive) by step
Example:
for x in range(5, 0, -1):
    print(x)
print("Hello!")
Output:
5
4
3
2
1
Hello!
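To see exactly which integers a range produces, it can be converted to a list. This is a small sketch with illustrative variable names; in Python 3, range itself is a lazy sequence, so list() is only used here for display.

```python
# start is inclusive, stop is exclusive
up = list(range(1, 6))
# a negative step counts down; stop (0) is still excluded
down = list(range(5, 0, -1))
# a step of 2 skips every other value
evens = list(range(0, 10, 2))
# start >= stop with a positive step gives an empty range
empty = list(range(5, 1))

print(up, down, evens, empty)
```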
• Some loops incrementally compute a value that is initialized outside the loop. This is
sometimes called a cumulative sum.
sum = 0
for i in range(1, 11):
    sum = sum + (i * i)
print("sum of first 10 squares is", sum)
Output:
sum of first 10 squares is 385
Exercise: Write a Python program that computes the factorial of an integer.
if statement: Executes a group of statements only if a certain condition is true. Otherwise,
the statements are skipped.
Syntax:
if condition:
    statements
Example:
gpa = 3.4
if gpa > 2.0:
    print("Your application is accepted.")
if/else statement: Executes one block of statements if a certain
condition is True, and a second block of statements if it is False.
• Syntax:
if condition:
    statements
else:
    statements
Example:
gpa = 1.4
if gpa > 2.0:
    print("Welcome to JUST University!")
else:
    print("Your application is denied.")
Multiple conditions can be chained with elif ("else if"):
if condition:
    statements
elif condition:
    statements
else:
    statements
while loop: Executes a group of statements as long as a condition is True.
Good for indefinite loops (repeat an unknown number of times).
Syntax:
while condition:
    statements
Example:
number = 1
while number < 200:
    print(number, end=" ")
    number = number * 2
Output:
1 2 4 8 16 32 64 128
Many logical expressions use relational operators:
Operator  Meaning                   Example      Result
==        equals                    1 + 1 == 2   True
!=        does not equal            3.2 != 2.5   True
<         less than                 10 < 5       False
>         greater than              10 > 5       True
<=        less than or equal to     126 <= 100   False
>=        greater than or equal to  5.0 >= 5.0   True
Logical expressions can be combined with logical operators:
Operator  Example            Result
and       9 != 6 and 2 < 3   True
or        2 == 3 or -1 < 5   True
not       not 7 > 0          False
Exercise: Write code to display and count the factors of a number.
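The table rows above can be evaluated directly. The sketch below also shows relational and logical operators combined in a small helper; in_closed_range is a hypothetical function introduced only for illustration.

```python
# The three logical-operator examples from the table.
results = [9 != 6 and 2 < 3, 2 == 3 or -1 < 5, not 7 > 0]
print(results)

def in_closed_range(value, low, high):
    """Hypothetical helper: True when low <= value <= high."""
    return value >= low and value <= high

print(in_closed_range(5, 1, 10))
print(in_closed_range(11, 1, 10))
```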
• string: A sequence of text characters in a program.
• Strings start and end with quotation mark " or apostrophe ' characters.
• Examples:
"hello"
"This is a string"
"This, too, is a string. It can be very long!"
• A string written with " may not span across multiple lines or contain an unescaped " character.
"This is not
a legal String."
"This is not a "legal" String either."
• A string can represent special characters by preceding them with a backslash.
• \t tab character
• \n new line character
• \" quotation mark character
• \\ backslash character
• Example: "Hello\tthere\nHow are you?"
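As a quick illustration of the escape sequences listed above, the following sketch prints the example string and builds a second string that contains quotation marks and a backslash. Variable names are illustrative.

```python
# \t becomes a tab and \n starts a new line when printed.
message = "Hello\tthere\nHow are you?"
print(message)

# Splitting on the newline shows the two printed lines.
lines = message.split("\n")

# \" embeds a quotation mark, \\ embeds a single backslash.
quoted = "She said \"hi\" and typed a backslash: \\"
print(quoted)
```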
• Characters in a string are numbered with indexes starting at 0:
• Example:
name = "P. Diddy"
index      0  1  2  3  4  5  6  7
character  P  .     D  i  d  d  y
(the character at index 2 is a space)
• Accessing an individual character of a string:
variableName [ index ]
• Example:
print(name, "starts with", name[0])
Output:
P. Diddy starts with P
len(string) - number of characters in a string (including spaces)
str.lower(string) - lowercase version of a string
str.upper(string) - uppercase version of a string
Example:
name = "Martin Douglas Stepp"
length = len(name)
big_name = str.upper(name)
print(big_name, "has", length, "characters")
Output:
MARTIN DOUGLAS STEPP has 20 characters
A compound data type:
[0]
[2.3, 4.5]
[5, "Hello", "there", 9.8]
[]
Use len() to get the length of a list
>>> names = ["Ben", "Chen", "Yaqin"]
>>> len(names)
3
http://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html
Certain features of Python are not loaded by default
In order to use these features, you’ll need to import the modules that contain them.
E.g.
import matplotlib.pyplot as plt
import numpy as np
f = 7 / 2 # in Python 2, f is 3, unless "from __future__ import division" is used
f = 7 / 2 # in Python 3, f is 3.5
f = 7 // 2 # f = 3 in both python 2 and 3
f = 7 / 2. # f = 3.5 in both python 2 and 3
f = 7 / float(2) # f is 3.5 in both python 2 and 3
f = int(7 / 2) # f is 3 in both python 2 and 3
 Get the i-th element of a list
x = [i for i in range(10)] # is the list [0, 1, ..., 9]
zero = x[0] # equals 0, lists are 0-indexed
one = x[1] # equals 1
nine = x[-1] # equals 9, 'Pythonic' for last element
eight = x[-2] # equals 8, 'Pythonic' for next-to-last element
one_to_four = x[1:5] # [1, 2, 3, 4]
first_three = x[:3] # [0, 1, 2]
last_three = x[-3:] # [7, 8, 9]
three_to_end = x[3:] # [3, 4, ..., 9]
without_first_and_last = x[1:-1] # [1, 2, ..., 8]
copy_of_x = x[:] # [0, 1, 2, ..., 9]
another_copy_of_x = x[:3] + x[3:] # [0, 1, 2, ..., 9]
1 in [1, 2, 3] # True
0 in [1, 2, 3] # False
x = [1, 2, 3]
y = [4, 5, 6]
x.extend(y) # x is now [1,2,3,4,5,6]
x = [1, 2, 3]
y = [4, 5, 6]
z = x + y # z is [1,2,3,4,5,6]; x is unchanged.
x, y = [1, 2] # x is 1 and y is 2
[x, y] = 1, 2 # same as above
x, y = [1, 2] # same as above
x, y = 1, 2 # same as above
_, y = [1, 2] # y is 2, didn't care about the first element
>>> a = ['Mary', 'had', 'a', 'little', 'lamb']
>>> for i in range(len(a)):
...     print(i, a[i])
...
0 Mary
1 had
2 a
3 little
4 lamb
What is the expected output of the following code?
a = list(range(10))
b = a
b[0] = 100
print(a)
a = list(range(10))
b = a[:]
b[0] = 100
print(a)
[100, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
a = [0, 1, 2, 3, 4]
b = a
c = a[:]
a == b
Out[129]: True
a is b
Out[130]: True
a == c
Out[132]: True
a is c
Out[133]: False
Similar to lists, but immutable
a_tuple = (0, 1, 2, 3, 4)
Other_tuple = 3, 4
Another_tuple = tuple([0, 1, 2, 3, 4])
Heterogeneous_tuple = ('john', 1.1, [1, 2])
Can be sliced, concatenated, or repeated
a_tuple[2:4] # evaluates to (2, 3)
Cannot be modified
a_tuple[2] = 5
TypeError: 'tuple' object does not support item assignment
Note: a tuple is defined by the comma, not the
parentheses, which are only used for
convenience and for grouping elements. So a = (1)
is not a tuple, but a = (1,) is.
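The note about commas can be verified with type(). A minimal sketch with illustrative names; the empty tuple is the one case written with parentheses alone.

```python
not_a_tuple = (1)    # just the integer 1 in parentheses
one_tuple = (1,)     # the trailing comma makes a 1-element tuple
also_tuple = 1,      # parentheses are optional
empty_tuple = ()     # special case: the empty tuple

print(type(not_a_tuple).__name__, type(one_tuple).__name__,
      type(also_tuple).__name__, type(empty_tuple).__name__)
```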
Useful for returning multiple values from functions
Tuples and lists can also be used for multiple assignments
def sum_and_product(x, y):
    return (x + y), (x * y)
sp = sum_and_product(2, 3) # equals (5, 6)
s, p = sum_and_product(5, 10) # s is 15, p is 50
x, y = 1, 2
[x, y] = [1, 2]
(x, y) = (1, 2)
x, y = y, x
a = [1, 2, 3, 4, 5, 6]
my_tuple=(a,)
my_tuple[0]=a #### ERROR
a = [1, 2, 3, 4, 5, 6]
my_tuple=(a)
my_tuple[0]=a #### No ERROR
a = [1, 2, 3, 4, 5, 6]
my_tuple=(a,)
my_tuple[0]=5 #### ERROR
a = [1, 2, 3, 4, 5, 6]
my_tuple=(a,)
my_tuple[0][0]=5 #### No ERROR
A dictionary associates values with unique keys
empty_dict = {} # Pythonic
empty_dict2 = dict() # less Pythonic
grades = { "Joel" : 80, "Tim" : 95 } # dictionary literal
joels_grade = grades["Joel"] # equals 80
grades["Tim"] = 99 # replaces the old value
grades["Kate"] = 100 # adds a third entry
num_students = len(grades) # equals 3
• Access/modify value with key
try:
    kates_grade = grades["Kate"]
except KeyError:
    print("no grade for Kate!")
• Check for existence of key
joel_has_grade = "Joel" in grades # True
kate_has_grade = "Kate" in grades # False if "Kate" was never added
• Use "get" to avoid KeyError and supply a default value
joels_grade = grades.get("Joel", 0) # equals 80
kates_grade = grades.get("Kate", 0) # equals 0 if "Kate" was never added
no_ones_grade = grades.get("No One") # default is None
• Get all items
all_keys = grades.keys() # all keys
all_values = grades.values() # all values
all_pairs = grades.items() # (key, value) tuples
In Python 3, these three calls do not return lists but iterable view objects.
#Which of the following is faster?
'Joel' in grades # faster. Hashtable
'Joel' in all_keys # slower when all_keys is a list (Python 2); a Python 3 keys view is also hash-based
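A short sketch of what "view object" means in Python 3: the object returned by keys() is not a list, and it reflects later changes to the dictionary. It assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
grades = {"Joel": 80, "Tim": 95}
all_keys = grades.keys()        # a dynamic view, not a list

grades["Kate"] = 100            # modify the dict after taking the view

key_list = list(all_keys)       # convert to a real list when one is needed
has_kate = "Kate" in all_keys   # membership test works on the view directly

print(type(all_keys).__name__, key_list, has_kate)
```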
a = [0, 0, 0, 1]
any(a)
Out[135]: True
all(a)
Out[136]: False
try:
    print(0 / 0)
except ZeroDivisionError:
    print("cannot divide by zero")
https://docs.python.org/3/tutorial/errors.html
Functions are defined using def
def double(x):
    """this is where you put an optional docstring
    that explains what the function does.
    for example, this function multiplies its
    input by 2"""
    return x * 2
• You can call a function after it is defined
z = double(10) # z is 20
• You can give default values to parameters
def my_print(message="my default message"):
    print(message)
my_print("hello") # prints 'hello'
my_print() # prints 'my default message'
Sometimes it is useful to specify arguments by name
def subtract(a=0, b=0):
    return a - b
subtract(10, 5) # returns 5
subtract(0, 5) # returns -5
subtract(b = 5) # same as above
subtract(b = 5, a = 0) # same as above
Functions are objects too
In [12]: def double(x): return x * 2
...: DD = double;
...: DD(2)
...:
Out[12]: 4
In [16]: def apply_to_one(f):
...: return f(1)
...: x=apply_to_one(DD)
...: x
...:
Out[16]: 2
Small anonymous functions can be created with the lambda keyword.
The power of lambda is better shown when you use them
as an anonymous function inside another function.
def myfunc(n):
    return lambda a : a * n
mydoubler = myfunc(2)
mytripler = myfunc(3)
print(mydoubler(11))
print(mytripler(11))
A lambda function can take any number of arguments, but can only
have one expression.
x = lambda a : a + 10
print(x(5))
x = lambda a, b, c : a * b - c
print(x(5, 6, 2))
pairs = [(2, 'two'), (3, 'three'), (1, 'one'), (4, 'four')]
pairs.sort(key = lambda pair: pair[0])
print (pairs)
Out[22]: [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
def getKey(pair): return pair[0]
pairs.sort(key=getKey)
print (pairs)
Out[107]: [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
A very convenient way to create a new list
squares = [x * x for x in range(5)]
print(squares)
Out[52]: [0, 1, 4, 9, 16]
squares=[0,0,0,0,0]
for x in range(5):
    squares[x] = x * x
print(squares)
Out[64]: [0, 1, 4, 9, 16]
Can also be used to filter a list
In [68]: even_numbers = []
In [69]: for x in range(5):
    ...:     if x % 2 == 0:
    ...:         even_numbers.append(x)
    ...: even_numbers
Out[69]: [0, 2, 4]
In [65]: even_numbers = [x for x in range(5) if x % 2 == 0]
In [66]: even_numbers
Out[66]: [0, 2, 4]
More complex examples:
# create 100 pairs (0,0) (0,1) ... (9,8), (9,9)
pairs = [(x, y)
for x in range(10)
for y in range(10)]
# only pairs with x < y,
# range(lo, hi) equals
# [lo, lo + 1, ..., hi - 1]
increasing_pairs = [(x, y)
for x in range(10)
for y in range(x + 1, 10)]
[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 2), (1, 3) …etc
Convenient tools in python to apply function to sequences of data
def double(x): return 2*x
b=range(5)
list(map(double, b))
Out[203]: [0, 2, 4, 6, 8]
In [204]: double(b)
Traceback (most recent call last):
TypeError: unsupported operand type(s) for *: 'int' and 'range'
def double(x): return 2*x
print ([double(i) for i in range(5)])
Out[205]: [0, 2, 4, 6, 8]
map_output = map(lambda x: x*2, [1, 2, 3, 4])
print(map_output) # Output: map object: <map object at 0x04D6BAB0>
list_map_output = list(map_output)
print(list_map_output) # Output: [2, 4, 6, 8]
list(map(lambda x: x*2, [1, 2, 3, 4])) # Output: [2, 4, 6, 8]
list_a, list_b = [1, 2, 3], [10, 20, 30] # example inputs (assumed; not shown on the slide)
list(map(lambda x, y: x + y, list_a, list_b)) # Output: [11, 22, 33]
def is_even(x): return x%2==0
a=[0, 1, 2, 3]
list(filter(is_even, a))
Out[208]: [0, 2]
In [209]: [x for x in a if is_even(x)]
Out[209]: [0, 2]
a = [1, 2, 3, 4, 5, 6]
print(list(filter(lambda x : x % 2 == 0, a))) # Output: [2, 4, 6]
In [216]: from functools import reduce
In [217]: reduce(lambda x, y: x+y, range(10))
Out[217]: 45
In [220]: reduce(lambda x, y: x*y, [1, 2, 3, 4])
Out[220]: 24
Useful for combining multiple lists into a list of tuples
In [238]: list(zip(['a', 'b', 'c'], [1, 2, 3], ['A', 'B', 'C']))
Out[238]: [('a', 1, 'A'), ('b', 2, 'B'), ('c', 3, 'C')]
In [245]: names = ['James', 'Tom', 'Mary']
...: grades = [100, 90, 95]
...: list(zip(names, grades))
...:
Out[245]: [('James', 100), ('Tom', 90), ('Mary', 95)]
• file object = open(file_name [, access_mode])
access_mode: determines the mode in which the file
is opened, i.e., read, write,
append, etc. A complete list of possible
values is given below in the table. This
is an optional parameter, and the default
file access mode is read (r).
read(): reads the entire file and returns its contents as a string
readline(): reads the next line of the file (the first line on a freshly opened file), i.e. up to a newline character,
or up to EOF for a single-line file, and returns it as a string
readlines(): reads the entire file line by line and returns a list of line strings
1 hello 40 50 hi
This is my course
Welcome to this course n wish you all the best
f = open("my_file2.txt", 'w')
f.write("Hello Everyone!")
f.close() # close the file so the text is flushed to disk
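The three read methods can be compared on the same file. This is a sketch under a few assumptions: the file is written to a temporary directory so nothing in the working directory is touched, the two-line content is made up for the example, and the with statement is used so each file is closed automatically.

```python
import os
import tempfile

# Write a small two-line file into a temporary directory (illustrative content).
path = os.path.join(tempfile.mkdtemp(), "my_file2.txt")
with open(path, "w") as f:
    f.write("Hello Everyone!\nThis is my course\n")

with open(path) as f:          # default mode is 'r'
    whole = f.read()           # entire file as one string

with open(path) as f:
    first = f.readline()       # only the first line, newline included

with open(path) as f:
    all_lines = f.readlines()  # list with one string per line

print(repr(whole), repr(first), all_lines)
```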
Notice how each piece of data is
separated by a comma.
Numpy
Numerical Computing in Python
What is Numpy?
• Numpy, Scipy, and Matplotlib provide MATLAB-like functionality in Python.
• Numpy Features:
  • Typed multidimensional arrays (matrices)
  • Fast numerical computations (matrix math)
  • High-level math functions
Why do we need NumPy
Let’s see for ourselves!
• Python does numerical computations slowly.
• 1000 x 1000 matrix multiply
  • Python triple loop takes > 10 min.
  • Numpy takes ~0.03 seconds
NumPy Overview
1. Arrays
2. Shaping and transposition
3. Mathematical Operations
4. Indexing and slicing
5. Broadcasting
Arrays
Structured lists of numbers.
• Vectors
• Matrices
• Images
• Tensors
• ConvNets
Vector: (p_x, p_y, p_z)
Matrix: m x n array with entries a_11 ... a_mn
Arrays, Basic Properties
import numpy as np
a = np.array([[1,2,3],[4,5,6]],dtype=np.float32)
print(a.ndim, a.shape, a.dtype)
1. Arrays can have any number of dimensions, including zero (a scalar).
2. Arrays are typed: np.uint8, np.int64, np.float32, np.float64
3. Arrays are dense. Each element of the array exists and has the same type.
Arrays, creation
• np.ones, np.zeros
• np.arange
• np.concatenate
• astype (usually called as an array method: a.astype(dtype))
• np.zeros_like, np.ones_like
• np.random.random
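The creation functions listed above can be tried together in a short sketch. The shapes and values are arbitrary choices for illustration, and astype is called as an array method.

```python
import numpy as np

ones = np.ones((2, 3))                           # 2x3 array of 1.0 (float64 by default)
zeros = np.zeros((2, 3), dtype=np.float32)       # dtype can be given explicitly
steps = np.arange(0, 10, 2)                      # like range(), but returns an array
stacked = np.concatenate([ones, zeros], axis=0)  # join along rows -> 4x3
as_int = stacked.astype(np.int64)                # convert element type (returns a copy)
like = np.zeros_like(steps)                      # zeros with the shape and dtype of steps
noise = np.random.random((2, 2))                 # uniform samples in [0, 1)

print(stacked.shape, as_int.dtype, like.shape, noise.shape)
```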
Arrays, danger zone
• Must be dense, no holes.
• Must be one type
• Cannot combine arrays of different shape
Shaping
a = np.array([1,2,3,4,5,6])
a = a.reshape(3,2)
a = a.reshape(2,-1)
a = a.ravel()
1. Total number of elements cannot change.
2. Use -1 to infer axis shape
3. Row-major by default (MATLAB is column-major)
import numpy as np
a = np.array([1,2,3,4,5,6])
print(a)
print('-'*20)
b=a.reshape(3,2)
print(b)
print('-'*20)
c=a.reshape(2,-1)
print(c)
print('-'*20)
d= a.ravel()
print(d)
Return values
• Numpy functions return either views or copies.
• Views share data with the original array, like
references in Java/C++. Altering entries of a
view changes the same entries in the original.
• The numpy documentation says which functions
return views or copies.
• The array methods a.copy() and a.view() make explicit copies and views.
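The difference between a view and a copy can be observed directly. A minimal sketch: basic slicing returns a view, copy() returns independent data, and np.shares_memory reports whether two arrays overlap in memory.

```python
import numpy as np

a = np.arange(6)       # [0 1 2 3 4 5]

v = a[:3]              # basic slicing returns a view
v[0] = 100             # this also changes a[0]

c = a[:3].copy()       # explicit copy owns its own data
c[1] = -1              # a is not affected

print(a)
print(np.shares_memory(a, v), np.shares_memory(a, c))
```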
Transposition
a = np.arange(10).reshape(5,2)
a = a.T
a = a.transpose((1,0))
np.transpose permutes axes.
a.T reverses the order of all axes (for a 2-D array, it swaps rows and columns).
Saving and loading arrays
np.savez('data.npz', a=a)
data = np.load('data.npz')
a = data['a']
1. NPZ files can hold multiple arrays
2. np.savez_compressed works similarly.
Mathematical operators
• Arithmetic operations are element-wise
• Logical operators return a bool array
• In-place operations modify the array
Math, upcasting
Just as in Python and Java, the result of a math
operator is cast to the more general or precise
datatype.
uint64 + uint16 => uint64
float32 / int32 => float64 (float32 / int16 => float32)
Warning: upcasting does not prevent
overflow/underflow. You must manually cast first.
Use case: images are often stored as uint8. You should
convert to float32 or float64 before doing math.
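The overflow warning above is easy to reproduce. In this sketch the two pixel values are made up; adding two uint8 arrays stays in uint8 and wraps modulo 256, while casting to float32 first gives the expected sums. np.result_type is used to ask NumPy which dtype a combination promotes to.

```python
import numpy as np

img = np.array([200, 100], dtype=np.uint8)   # made-up pixel values

wrapped = img + img                          # stays uint8: 400 wraps to 144
safe = img.astype(np.float32) + img.astype(np.float32)

print(wrapped, wrapped.dtype)
print(safe, safe.dtype)

# Promotion rules for mixed dtypes
print(np.result_type(np.uint64, np.uint16))
print(np.result_type(np.float32, np.int32))
```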
Math, universal functions
Also called ufuncs
Element-wise
Examples:
 np.exp
 np.sqrt
 np.sin
 np.cos
 np.isnan
Indexing
x[0,0] # top-left element
x[0,-1] # first row, last column
x[0,:] # first row (many entries)
x[:,0] # first column (many entries)
Notes:
 Zero-indexing
 Multi-dimensional indices are comma-separated (i.e., a
tuple)
Python Slicing
Syntax: start:stop:step
a = list(range(10))
a[:3] # indices 0, 1, 2
a[-3:] # indices 7, 8, 9
a[3:8:2] # indices 3, 5, 7
a[4:1:-1] # indices 4, 3, 2 (this one is tricky)
Axes
a.sum() # sum all entries
a.sum(axis=0) # sum along axis 0: collapses the rows, one sum per column
a.sum(axis=1) # sum along axis 1: collapses the columns, one sum per row
a.sum(axis=1, keepdims=True)
1. Use the axis parameter to control which axis
NumPy operates on
2. Typically, the axis specified will disappear,
keepdims keeps all dimensions
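A small worked example of the axis parameter, using an arbitrary 2 x 3 array. The shapes in the printout show which axis disappears and what keepdims changes.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)        # [[0 1 2], [3 4 5]]

total = a.sum()                       # all entries
per_column = a.sum(axis=0)            # axis 0 (rows) is collapsed
per_row = a.sum(axis=1)               # axis 1 (columns) is collapsed
kept = a.sum(axis=1, keepdims=True)   # summed axis stays with size 1

print(total, per_column, per_row)
print(per_row.shape, kept.shape)
```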
Broadcasting
a = a + 1 # add one to every element
When operating on multiple arrays, broadcasting rules are
used.
Dimensions are compared from right to left, and each pair must be compatible:
1. Dimensions of size 1 will broadcast (as if the value was
repeated).
2. Otherwise, the dimensions must have the same size.
3. Extra dimensions of size 1 are added to the left as needed.
Broadcasting example
Suppose we want to add a color value to an image
a.shape is 100, 200, 3
b.shape is 3
a + b will pad b with two extra dimensions so it
has an effective shape of 1 x 1 x 3.
So, the addition will broadcast over the first and
second dimensions.
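The image example above can be sketched as follows. The array contents are placeholders (an all-zero image and an arbitrary color); the second part shows the failure case when the trailing dimensions do not match.

```python
import numpy as np

a = np.zeros((100, 200, 3))          # placeholder "image": height x width x channels
b = np.array([10.0, 20.0, 30.0])     # one value per color channel, shape (3,)

result = a + b                       # b is treated as shape (1, 1, 3)
print(result.shape, result[5, 7])

# Trailing dimension 4 does not match 3, so broadcasting fails.
try:
    a + np.ones(4)
    failed = False
except ValueError as err:
    failed = True
    print("broadcast error:", err)
```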
Broadcasting failures
If a.shape is 100, 200, 3 but b.shape is 4 then a + b
will fail. The trailing dimensions must have the
same shape (or be 1)
Tips to avoid bugs
1. Know what your datatypes are.
2. Check whether you have a view or a copy.
3. Know np.dot vs np.multiply.
numpy.dot
numpy.dot(a, b, out=None)
Dot product of two arrays. Specifically,
• If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
• If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
• If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred.
• If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
• If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and
the second-to-last axis of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
numpy.multiply
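To make the np.dot versus np.multiply distinction concrete, here is a small sketch on two arbitrary 2 x 2 matrices. np.multiply (same as a * b) works element by element, while np.dot on 2-D arrays is matrix multiplication (same as a @ b).

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = np.multiply(a, b)   # same as a * b
matrix = np.dot(a, b)             # same as a @ b for 2-D arrays

# For 1-D arrays, np.dot is the inner product.
inner = np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))

print(elementwise)
print(matrix)
print(inner)
```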
Data Science
What is Pandas?
Pandas is a Python module that rounds out the capabilities of Numpy,
Scipy and Matplotlib. The name pandas is derived
from:
"Python and data analysis" and "panel data".
There is often some confusion about whether Pandas is an alternative to
Numpy, SciPy and Matplotlib.
The truth is that it is built on top of Numpy. This means that Numpy is
required by pandas.
Scipy and Matplotlib, on the other hand, are not required by pandas, but they
are extremely useful. That's why the Pandas project lists them as "optional
dependencies".
What is Pandas?
• Pandas is a software library written for the Python programming
language.
• It is used for data manipulation and analysis.
• It provides special data structures and operations for the
manipulation of numerical tables and time series.
Common Data Structures in Pandas
• Series
• Data Frame
Series
• A Series is a one-dimensional labelled array-like object.
• It is capable of holding any data type, e.g. integers, floats, strings,
Python objects, and so on.
• It can be seen as a data structure with two arrays: one functioning as
the index, i.e. the labels, and the other one contains the actual data.
Example
import pandas as pd
S = pd.Series([11, 28, 72, 3, 5, 8])
S
The above code returns:
0 11
1 28
2 72
3 3
4 5
5 8
dtype: int64
• We can directly access the index and the values of our Series S:
print(S.index)
print(S.values)
RangeIndex(start=0, stop=6, step=1)
[11 28 72 3 5 8]
• If we compare this to creating an array in NumPy, we find many
similarities:
import numpy as np
X = np.array([11, 28, 72, 3, 5, 8])
print(X)
print(S.values)
# both are the same type:
print(type(S.values), type(X))
[11 28 72 3 5 8]
[11 28 72 3 5 8]
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
Another example:
fruits = ['apples', 'oranges', 'cherries', 'pears']
quantities = [20, 33, 52, 10]
S = pd.Series(quantities, index=fruits)
S
Output:
apples 20
oranges 33
cherries 52
pears 10
dtype: int64
If we add two series with the same indices, we get a new series with the same
index and the corresponding values will be added:
fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits)
print(S + S2)
print("sum of S:", sum(S))
Output:
apples 37
oranges 46
cherries 83
pears 42
dtype: int64
sum of S: 115
The indices do not have to be the same for Series addition. The resulting index is the
"union" of both indices. If a label does not occur in both Series, the value for that label
will be NaN:
fruits = ['peaches', 'oranges', 'cherries', 'pears']
fruits2 = ['raspberries', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits2)
print(S + S2)
Output:
cherries 83.0
oranges 46.0
peaches NaN
pears 42.0
raspberries NaN
dtype: float64
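If NaN is not the desired result for labels that occur in only one Series, the method Series.add with its fill_value parameter is one option. A minimal sketch, reusing the data from the example above:

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['peaches', 'oranges', 'cherries', 'pears'])
S2 = pd.Series([17, 13, 31, 32],
               index=['raspberries', 'oranges', 'cherries', 'pears'])

# A label missing from one Series is treated as 0 instead of producing NaN
total = S.add(S2, fill_value=0)
print(total)
```

Whether 0 is a sensible stand-in depends on the data; for stock quantities it usually is.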
If the two indices have no label in common, every value of the sum is NaN:
fruits = ['apples', 'oranges', 'cherries', 'pears']
fruits_ro = ["mere", "portocale", "cireșe", "pere"]
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits_ro)
print(S+S2)
Output:
apples NaN
cherries NaN
cireșe NaN
mere NaN
oranges NaN
pears NaN
pere NaN
portocale NaN
dtype: float64
It's possible to access single values of a Series or more than one value
by a list of indices:
print(S['apples'])
20
print(S[['apples', 'oranges', 'cherries']])
apples 20
oranges 33
cherries 52
dtype: int64
As in NumPy, we can apply scalar operations or mathematical functions to a Series:
import numpy as np
print((S + 3) * 4)
print("======================")
print(np.sin(S))
Output:
apples 92
oranges 144
cherries 220
pears 52
dtype: int64
======================
apples 0.912945
oranges 0.999912
cherries 0.986628
pears -0.544021
dtype: float64
pandas.Series.apply
Series.apply(func, convert_dtype=True, args=(), **kwds)
Parameter: Meaning
func: A function. It can be a NumPy function, which is applied to the entire Series, or a Python function, which is applied to every single value of the Series.
convert_dtype: A Boolean value. If it is True (the default), apply tries to find a better dtype for the elementwise function results. If False, the result is left as dtype=object. (Deprecated since pandas 2.1.)
args: Positional arguments that are passed to func in addition to the values from the Series.
**kwds: Additional keyword arguments that are passed as keywords to func.
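The args and keyword parameters can be sketched as follows. The helper restock and its parameters threshold and amount are hypothetical names chosen for this illustration.

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['apples', 'oranges', 'cherries', 'pears'])

def restock(quantity, threshold, amount=10):
    """Hypothetical helper: add `amount` when stock is below `threshold`."""
    return quantity + amount if quantity < threshold else quantity

# 50 is passed positionally via args, amount is passed as a keyword argument
restocked = S.apply(restock, args=(50,), amount=5)
print(restocked)
```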
S.apply(np.sin)
apples 0.912945
oranges 0.999912
cherries 0.986628
pears -0.544021
dtype: float64
• We can also use Python lambda functions. Let's assume we have the
following task: check the amount of fruit for every kind. If fewer
than 50 are available, we increase the stock by 10:
S.apply(lambda x: x if x >= 50 else x + 10)
apples 30
oranges 43
cherries 52
pears 20
dtype: int64
Filtering with a Boolean array:
S[S>30]
oranges 33
cherries 52
dtype: int64
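Several conditions can be combined into one Boolean mask. A small sketch, assuming the fruit Series from the earlier slides; note that the element-wise operators & and | are used, and that each condition needs its own parentheses:

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['apples', 'oranges', 'cherries', 'pears'])

mask = S > 30                           # a Boolean Series with the same index
medium = S[(S > 15) & (S < 50)]         # both conditions must hold
extremes = S[(S < 15) | (S > 50)]       # at least one condition must hold

print(mask)
print(medium)
print(extremes)
```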
• A series can be seen as an ordered Python dictionary with a fixed
length.
"apples" in S
True
• We can even pass a dictionary to a Series object when we create it.
We get a Series with the dict's keys as the index. Current pandas versions
keep the keys in the dictionary's insertion order; older versions sorted them.
cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235,
"Rome": 2874038, "Paris": 2273305, "Vienna": 1805681,
"Bucharest":1803425, "Hamburg": 1760433, "Budapest": 1754000,
"Warsaw": 1740119, "Barcelona":1602386, "Munich": 1493900,
"Milan": 1350680}
city_series = pd.Series(cities)
print(city_series)
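The ordering of the resulting index can be checked with a shortened version of the dictionary. This sketch assumes a recent pandas (0.23 or later) on Python 3.7+, where the insertion order of the keys is kept; an explicit sort is available when a sorted index is wanted.

```python
import pandas as pd

cities = {"London": 8615246, "Berlin": 3562166,
          "Madrid": 3165235, "Rome": 2874038}
city_series = pd.Series(cities)

by_name = city_series.sort_index()          # sorted by label
by_population = city_series.sort_values()   # sorted by value, ascending

print(city_series)
print(by_name)
print(by_population)
```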
NaN
Missing data is a common problem in data analysis tasks. Pandas makes it
as easy as possible to work with missing data. In the following example, the
cities that are not keys of the dictionary (Zurich and Stuttgart) get the value NaN:
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart",
"Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
my_city_series
• Due to the NaN values, the population values of the other cities are
converted to floats. There is no missing data in the following example,
so the values remain integers:
my_cities = ["London", "Paris", "Berlin", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
my_city_series
The Methods isnull() and notnull()
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart",
"Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
print(my_city_series.isnull())
print(my_city_series.notnull())
• We also get a NaN if a value in the dictionary is None:
d = {"a":23, "b":45, "c":None, "d":0}
S = pd.Series(d)
print(S)
print(pd.isnull(S))
print(pd.notnull(S))
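The Boolean results of isnull and notnull are mostly used for counting or filtering. A minimal sketch with the dictionary from above; the variable names missing_count and present are chosen for this illustration.

```python
import pandas as pd

d = {"a": 23, "b": 45, "c": None, "d": 0}
S = pd.Series(d)

missing_count = S.isnull().sum()   # True counts as 1 when summing
present = S[S.notnull()]           # keep only the entries that hold a value

print(missing_count)
print(present)
```

The value 0 under "d" is a regular value and is not treated as missing.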
Filtering out Missing Data
It's possible to filter out missing data with the Series method dropna. It
returns a Series which consists only of non-null data:
import pandas as pd
cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235, "Rome":
2874038, "Paris": 2273305, "Vienna": 1805681, "Bucharest":1803425,
"Hamburg": 1760433, "Budapest": 1754000, "Warsaw": 1740119,
"Barcelona":1602386, "Munich": 1493900, "Milan": 1350680}
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
print(my_city_series.dropna())
Filling in Missing Data
• In many cases you don't want to filter out missing data, but you want to fill in
appropriate data for the empty gaps. A suitable method in many situations will be
fillna:
print(my_city_series.fillna(0))
London 8615246.0
Paris 2273305.0
Zurich 0.0
Berlin 3562166.0
Stuttgart 0.0
Hamburg 1760433.0
dtype: float64
• If we call fillna with a dictionary, we can provide the appropriate data, i.e.
the population of Zurich and Stuttgart:
missing_cities = {"Stuttgart":597939, "Zurich":378884}
my_city_series.fillna(missing_cities)
London 8615246.0
Paris 2273305.0
Zurich 378884.0
Berlin 3562166.0
Stuttgart 597939.0
Hamburg 1760433.0
dtype: float64
cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235,
"Rome": 2874038, "Paris": 2273305, "Vienna": 1805681,
"Bucharest":1803425, "Hamburg": 1760433, "Budapest": 1754000,
"Warsaw": 1740119, "Barcelona":1602386, "Munich": 1493900,
"Milan": 1350680}
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart",
"Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
my_city_series = my_city_series.fillna(0).astype(int)
print(my_city_series)
London 8615246
Paris 2273305
Zurich 0
Berlin 3562166
Stuttgart 0
Hamburg 1760433
dtype: int64
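Filling with 0 makes a missing population indistinguishable from a real zero. As an alternative sketch, not taken from the slides, pandas offers the nullable integer dtype "Int64", which keeps integers and still marks the gaps as missing. The dictionary is shortened here for brevity.

```python
import pandas as pd

cities = {"London": 8615246, "Berlin": 3562166,
          "Paris": 2273305, "Hamburg": 1760433}
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)   # float64 because of NaN

# Nullable integers: whole numbers stay integers, gaps stay missing
as_nullable_int = my_city_series.astype("Int64")
print(as_nullable_int)
```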
DataFrame
• The underlying idea of a DataFrame is based on spreadsheets. We
can see the data structure of a DataFrame as tabular and
spreadsheet-like.
• A DataFrame logically corresponds to a "sheet" of an Excel document.
• A DataFrame has both a row and a column index.
• Like a spreadsheet or Excel sheet, a DataFrame object contains an
ordered collection of columns.
• Each column holds a single data type, but different columns can
have different types, e.g. the first column may consist of integers,
while the second one consists of Boolean values and so on.
• There is a close connection between the DataFrames and the Series
of Pandas.
• A DataFrame can be seen as a concatenation of Series, each Series
having the same index, i.e. the index of the DataFrame.
import pandas as pd
years = range(2014, 2018)
shop1 = pd.Series([2409.14, 2941.01, 3496.83, 3119.55], index=years)
shop2 = pd.Series([1203.45, 3441.62, 3007.83, 3619.53], index=years)
shop3 = pd.Series([3412.12, 3491.16, 3457.19, 1963.10], index=years)
print(pd.concat([shop1, shop2, shop3]))
• This result is not what we have intended or expected. The reason is
that concat used 0 as the default for the axis parameter. Let's do it
with "axis=1":
shops_df = pd.concat([shop1, shop2, shop3], axis=1)
print(shops_df)
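The printed results of both calls were lost with the slide images. The difference between the two axis settings can be sketched by comparing the shapes of the results; the keys argument, which names the resulting columns, is an addition for this illustration.

```python
import pandas as pd

years = range(2014, 2018)
shop1 = pd.Series([2409.14, 2941.01, 3496.83, 3119.55], index=years)
shop2 = pd.Series([1203.45, 3441.62, 3007.83, 3619.53], index=years)
shop3 = pd.Series([3412.12, 3491.16, 3457.19, 1963.10], index=years)

# axis=0 (default): the Series are stacked into one long Series
stacked = pd.concat([shop1, shop2, shop3])

# axis=1: each Series becomes one column of a DataFrame
side_by_side = pd.concat([shop1, shop2, shop3], axis=1,
                         keys=["shop1", "shop2", "shop3"])

print(stacked.shape)
print(side_by_side.shape)
```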
cities = ["Zürich", "Winterthur", "Freiburg"]
shops_df.columns = cities
print(shops_df)
# alternative way: give names to series:
shop1.name = "Zürich"
shop2.name = "Winterthur"
shop3.name = "Freiburg"
print("------")
shops_df2 = pd.concat([shop1, shop2, shop3], axis=1)
print(shops_df2)
print(type(shops_df))
<class 'pandas.core.frame.DataFrame'>
DataFrames from Dictionaries
cities = {"name": ["London", "Berlin", "Madrid", "Rome", "Paris",
"Vienna", "Bucharest", "Hamburg", "Budapest", "Warsaw",
"Barcelona", "Munich", "Milan"],
"population": [8615246, 3562166, 3165235, 2874038, 2273305,
1805681, 1803425, 1760433, 1754000, 1740119, 1602386, 1493900,
1350680],
"country": ["England", "Germany", "Spain", "Italy", "France", "Austria",
"Romania", "Germany", "Hungary", "Poland", "Spain", "Germany",
"Italy"]}
city_frame = pd.DataFrame(cities)
print(city_frame)
Retrieving the Column Names
city_frame.columns.values
Output:
array(['name', 'population', 'country'], dtype=object)
Custom Index
• We can see that an index (0,1,2, ...) has been automatically assigned
to the DataFrame. We can also assign a custom index to the
DataFrame object:
ordinals = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh",
"eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth"]
city_frame = pd.DataFrame(cities, index=ordinals)
print(city_frame)
Rearranging the Order of Columns
We can also define and rearrange the order of the columns at the time
of creation of the DataFrame.
This also ensures a defined ordering of the columns when we create the
DataFrame from a dictionary. Before Python 3.7, dictionaries were not
guaranteed to be ordered; current pandas versions keep the insertion order of the keys.
city_frame = pd.DataFrame(cities, columns=["name", "country",
"population"])
print(city_frame)
• But what if you want to change the ordering of the columns
of an existing DataFrame? reindex returns a new DataFrame, so we assign the result:
city_frame = city_frame.reindex(columns=["country", "name", "population"])
print(city_frame)
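Two common ways to reorder the columns of an existing DataFrame are sketched below with a shortened city table. Both return a new DataFrame and leave the original untouched, which is why the result has to be assigned.

```python
import pandas as pd

city_frame = pd.DataFrame({"name": ["London", "Berlin"],
                           "population": [8615246, 3562166],
                           "country": ["England", "Germany"]})

# Variant 1: reindex along the columns (the default axis would be the rows)
reordered = city_frame.reindex(columns=["country", "name", "population"])

# Variant 2: select the columns in the desired order
selected = city_frame[["country", "name", "population"]]

print(reordered)
print(list(city_frame.columns))   # the original order is unchanged
```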
• Now we want to rename our columns. For this purpose, we use
the DataFrame method 'rename'. This method supports two calling
conventions:
• (index=index_mapper, columns=columns_mapper, ...)
• (mapper, axis={'index', 'columns'}, ...)
• In the following example, we rename the columns of our DataFrame to
Romanian names.
• We set the parameter inplace to True so that the DataFrame itself is
changed. With inplace=False, which is the default, rename leaves the
DataFrame unchanged and returns a new one.
city_frame.rename(columns={"name":"Nume", "country":"țară",
"population":"populație"}, inplace=True)
print(city_frame)
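Both calling conventions of rename can be sketched side by side. The new column name "city" and the use of str.upper as a mapper are choices made for this illustration.

```python
import pandas as pd

city_frame = pd.DataFrame({"name": ["London", "Berlin"],
                           "population": [8615246, 3562166]})

# Convention 1: one mapping per axis; a new DataFrame is returned by default
renamed = city_frame.rename(columns={"name": "city"})

# Convention 2: a mapper (dict or function) plus the axis it applies to
upper = city_frame.rename(str.upper, axis="columns")

print(renamed.columns.tolist())
print(upper.columns.tolist())
print(city_frame.columns.tolist())   # unchanged, since inplace was not set
```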
Existing Column as the Index of a DataFrame
• We want to create a more useful index in the following example. We
will use the country name as the index, i.e. the list value associated to
the key "country" of our cities dictionary:
city_frame = pd.DataFrame(cities, columns=["name", "population"],
index=cities["country"])
print(city_frame)
• Alternatively, we can change an existing DataFrame.
• We can use the method set_index to turn a column into an index.
• By default, set_index does not work in place; it returns a new DataFrame with
the chosen column as the index:
city_frame = pd.DataFrame(cities)
city_frame2 = city_frame.set_index("country")
print(city_frame2)
• We saw in the previous example that the set_index method returns a
new DataFrame object and doesn't change the original DataFrame. If
we set the optional parameter "inplace" to True, the DataFrame will
be changed in place, i.e. no new object will be created:
city_frame = pd.DataFrame(cities)
city_frame.set_index("country", inplace=True)
print(city_frame)
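The reverse operation is reset_index, which turns the index back into a regular column. A minimal sketch with a shortened city table:

```python
import pandas as pd

city_frame = pd.DataFrame({"name": ["London", "Berlin", "Hamburg"],
                           "country": ["England", "Germany", "Germany"]})

indexed = city_frame.set_index("country")   # "country" becomes the index
restored = indexed.reset_index()            # "country" is a column again

print(indexed)
print(restored)
```

After reset_index the former index column is placed first, so the column order can differ from the original frame.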
Label-Indexing on the Rows
• So far we have indexed DataFrames via the columns. We will
now demonstrate how we can access rows of a DataFrame via the
locators 'loc' and 'iloc'. ('ix' was deprecated in pandas 0.20 and removed in
pandas 1.0.)
city_frame = pd.DataFrame(cities, columns=["name",
"population"], index=cities["country"])
print(city_frame.loc["Germany"])
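The further examples on the following slides were lost with the slide images. As a hedged sketch with a shortened city table, loc selects by label and iloc selects by position:

```python
import pandas as pd

city_frame = pd.DataFrame(
    {"name": ["London", "Berlin", "Hamburg", "Paris"],
     "population": [8615246, 3562166, 1760433, 2273305]},
    index=["England", "Germany", "Germany", "France"])

german = city_frame.loc["Germany"]     # all rows labelled "Germany"
first_row = city_frame.iloc[0]         # first row by position, as a Series
subset = city_frame.loc[["France", "England"], "name"]   # rows and a column by label

print(german)
print(first_row)
print(subset)
```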
Sum and Cumulative Sum
• We can calculate the sum of all the columns of a DataFrame or the
sum of certain columns:
print(city_frame.sum())
city_frame["population"].sum()
33800614
We can use "cumsum" to calculate the cumulative sum:
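The code for this step did not survive extraction. A plausible reconstruction, assuming that x holds the cumulative sum of the population column, is sketched here with a shortened city table:

```python
import pandas as pd

city_frame = pd.DataFrame({"name": ["London", "Berlin", "Madrid"],
                           "population": [8615246, 3562166, 3165235]})

# x is an assumption: the running total of the population column
x = city_frame["population"].cumsum()
print(x)
```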
Assigning New Values to Columns
• x is a Pandas Series.
• We can reassign the previously calculated cumulative sums to the
population column:
city_frame["population"] = x
print(city_frame)
• Instead of replacing the values of the population column
with the cumulative sum, we want to add the cumulative
population sum as a new column with the name
"cum_population".
city_frame = pd.DataFrame(cities, columns=["country",
"population", "cum_population"], index=cities["name"])
print(city_frame)
• We can see that the column "cum_population" is set to NaN, as we haven't
provided any data for it.
• We will now assign the cumulative sums to this column:
city_frame["cum_population"] = city_frame["population"].cumsum()
print(city_frame)
• We can also include a column name which is not contained
in the dictionary, when we create the DataFrame from the
dictionary. In this case, all the values of this column will be
set to NaN:
city_frame = pd.DataFrame(cities, columns=["country",
"area", "population"], index=cities["name"])
print(city_frame)
Accessing the Columns of a DataFrame
• There are two ways to access a column of a DataFrame. The result is
in both cases a Series:
# in a dictionary-like way:
print(city_frame["population"])
# as an attribute
print(city_frame.population)
print(type(city_frame.population))
<class 'pandas.core.series.Series'>
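city_frame.population
Attribute access does not copy the population column. In older pandas
versions "p", i.e. the Series returned by city_frame.population, is a
view on the data of city_frame, so changes made through p also change
city_frame. With Copy-on-Write, the default from pandas 3.0, such changes
no longer propagate to city_frame.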
Assigning New Values to a Column
• The column area is still not defined. We can set all elements of the
column to the same value:
city_frame["area"] = 1572
print(city_frame)
• It is certainly better to assign the actual area to each of the
cities. The list of area values needs to have the same length as
the number of rows in our DataFrame.
# area in square km:
area = [1572, 891.85, 605.77, 1285, 105.4, 414.6, 228, 755, 525.2, 517,
101.9, 310.4, 181.8]
# area could have been defined as a list, a Series, an array or a scalar
city_frame["area"] = area
print(city_frame)
Sorting DataFrames
city_frame = city_frame.sort_values(by="area", ascending=False)
print(city_frame)
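sort_values also accepts several columns, each with its own direction. A small sketch with a shortened city table; sort_index restores the order of the original index afterwards.

```python
import pandas as pd

city_frame = pd.DataFrame(
    {"name": ["Berlin", "Madrid", "Hamburg", "Barcelona"],
     "country": ["Germany", "Spain", "Germany", "Spain"],
     "population": [3562166, 3165235, 1760433, 1602386]})

# Countries alphabetically, and within each country the largest city first
by_country = city_frame.sort_values(by=["country", "population"],
                                    ascending=[True, False])
by_index = by_country.sort_index()   # back to the original row order

print(by_country)
```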
Let's assume, we have only the areas of London, Hamburg and Milan.
The areas are in a series with the correct indices. We can assign this
series as well:
city_frame = pd.DataFrame(cities, columns=["country", "area",
"population"], index=cities["name"])
some_areas = pd.Series([1572, 755, 181.8], index=['London',
'Hamburg', 'Milan'])
city_frame['area'] = some_areas
print(city_frame)
Inserting new columns into existing DataFrames
• In the previous example we added the column area at creation
time. Quite often it will be necessary to add or insert columns into
existing DataFrames.
• For this purpose the DataFrame class provides a method "insert",
which allows us to insert a column into a DataFrame at a specified
location:
insert(self, loc, column, value, allow_duplicates=False)
city_frame = pd.DataFrame(cities, columns=["country",
"population"], index=cities["name"])
idx = 1
city_frame.insert(loc=idx, column='area', value=area)
print(city_frame)
<class 'pandas.core.frame.DataFrame'>
DataFrame from Nested Dictionaries
A dictionary of dictionaries can be passed to a DataFrame as
well.
The keys of the outer dictionary are taken as the columns, and the
inner keys, i.e. the keys of the nested dictionaries, are used as the row
indices:
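The original example data was lost with the slide image. The following sketch uses the name growth_frame from the later slides, while the dictionary name growth and all numbers are placeholders for illustration only, not real statistics.

```python
import pandas as pd

# Placeholder numbers for illustration only, not real statistics
growth = {"Switzerland": {2014: 1.0, 2015: 2.0},
          "Germany": {2014: 3.0, 2015: 4.0},
          "France": {2014: 5.0, 2016: 6.0}}

# Outer keys become the columns, inner keys become the row index
growth_frame = pd.DataFrame(growth)
print(growth_frame)
```

Inner keys that are missing from one of the nested dictionaries, such as 2016 for Germany here, result in NaN.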
• Would you like to have the years in the columns and the countries in the
rows? No problem, you can transpose the data:
growth_frame.T
• Consider:
growth_frame = growth_frame.T
growth_frame2 = growth_frame.reindex(["Switzerland", "Italy",
"Germany", "Greece"]) # remove France
print(growth_frame2)
Filling a DataFrame with random values:
import numpy as np
names = ['Frank', 'Eve', 'Stella', 'Guido', 'Lara']
index = ["January", "February", "March", "April", "May", "June", "July",
"August", "September", "October", "November", "December"]
df = pd.DataFrame((np.random.randn(12, 5)*1000).round(2),
columns=names, index=index)
print(df)
randn(d0, d1, ...) takes the dimensions of the result as its arguments and returns
samples from the standard normal distribution (mean 0, variance 1).
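To draw from a normal distribution with a different mean and standard deviation, the standard normal samples can be shifted and scaled. A minimal sketch; the target values for mean and std are hypothetical, and the Generator API is shown as an alternative that takes these values directly.

```python
import numpy as np

np.random.seed(0)                    # fixed seed so the run is repeatable
standard = np.random.randn(12, 5)    # arguments are dimensions: 12 rows, 5 columns

# Shift and scale to obtain a different mean and standard deviation
mean, std = 1000.0, 250.0            # hypothetical target values
shifted = mean + std * np.random.randn(100_000)

# Alternative: the Generator API takes loc (mean) and scale (std) directly
rng = np.random.default_rng(0)
samples = rng.normal(loc=mean, scale=std, size=100_000)

print(standard.shape)
print(shifted.mean(), shifted.std())
```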
Summary
• So far we have covered the following:
• Python 3 (scalars, lists, dictionaries, loops, selection, functions)
• Numpy
• Pandas
• The reason for studying these packages is to be able to program the 5
steps in any data science process.
Human-AI Co-Creation of Worked Examples for Programming Classes
 
How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17
 
General views of Histopathology and step
General views of Histopathology and stepGeneral views of Histopathology and step
General views of Histopathology and step
 
How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.
 
How to Solve Singleton Error in the Odoo 17
How to Solve Singleton Error in the  Odoo 17How to Solve Singleton Error in the  Odoo 17
How to Solve Singleton Error in the Odoo 17
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 Sales
 

Data Science

  • 2. • Introduction to Python • Numpy • Pandas | @Apptrainers content
  • 4. “In December 1989, I was looking for a "hobby" programming project that would keep me occupied during the week around Christmas. My office ... would be closed, but I had a home computer, and not much else on my hands. I decided to write an interpreter for the new scripting language I had been thinking about lately: a descendant of ABC that would appeal to Unix/C hackers. I chose Python as a working title for the project, being in a slightly irreverent mood (and a big fan of Monty Python's Flying Circus).” (Guido van Rossum)
  • 5. The big technology companies have each largely aligned themselves with different language stacks. Oracle and IBM are aligned with Java (Oracle owns Java). Google is known for its use of Python, a versatile, dynamic and extensible language first released in 1991, although in practice it is also a heavy user of C++ and Java. Google has also created its own language, Go (2009).
  • 6. Python is an easy-to-learn yet powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. It is freely available in source or binary form for all major platforms from the Python web site, https://www.python.org/. The Python interpreter is easily extended with new functions and data types implemented in C or C++ (or other languages callable from C). Python is also suitable as an extension language for customizable applications. It is widely used (Google, NASA, Quora).
  • 7. When you run a Python program, the interpreter executes it statement by statement. In a compiled language such as C or C++, a compiler first translates the whole program and only then does the program run. The trade-off is that interpreted languages are usually somewhat slower than compiled ones.
  • 8. In Python you do not declare a variable's data type ahead of time. The type is determined at run time from the value the variable currently holds.
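As a minimal sketch of the dynamic typing described above (the variable names are only illustrative), the same name can be rebound to values of different types, and `type()` reports whatever it currently holds:

```python
# The same name can be rebound to values of different types;
# type() reports the type of the object currently bound to it.
x = 42
type_when_int = type(x).__name__      # 'int'

x = "forty-two"
type_when_str = type(x).__name__      # 'str'

x = 4.2
type_when_float = type(x).__name__    # 'float'

print(type_when_int, type_when_str, type_when_float)
```

No declaration is needed at any point; the type belongs to the value, not to the name.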
  • 9. Python programs are often one-third to one-fifth the length of the equivalent Java code, so the same task usually takes less code in Python than in Java.
  • 10. There are many good options for saving and editing code: Sublime Text (unlimited free trial available), Notepad++, Xcode (Mac), TextWrangler (Mac), TextEdit (Mac). There are also several platforms for taking online courses for free: Coursera, edX, Stanford Online, Khan Academy, Udacity.
  • 11. To download Python, follow the instructions on the official website: https://www.python.org/
  • 12. I would strongly recommend this video: https://www.youtube.com/watch?v=HW29067qVWk
  • 15. “GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere”. GitHub accounts can be public (free) or private (not free). A repository is usually used to organize a single project. It contains folders and files, images, videos, spreadsheets, and data sets: anything your project needs.
  • 16. Master: the branch of a repository that holds the final version. Branch: a place to try out new ideas without affecting master unless a pull request is accepted; changes committed to a branch are recorded, so you can keep track of different versions. Commit: a saved change that records the history of progress on a branch or on master. Fork: a copy of a repository; submit a pull request so that the owner can incorporate your changes.
  • 17. Assignment:
      1. Download Python and Jupyter Notebook.
      2. Write Python code to print your name, your id, and your favorite quote.
      3. Save the project as .html and as .ipynb.
      4. Install git and create a GitHub account.
      5. Upload your first project as .html to e-learning.
      6. Upload your first project as .ipynb to your GitHub account.
      7. Share the link to your GitHub with me on e-learning.
  • 19. You can type things directly into a running Python session.
  • 20. Most programming languages, such as C, C++ and Java, use braces { } to define a block of code. Python uses indentation. A code block (the body of a function, loop, etc.) starts with indentation and ends with the first unindented line. The amount of indentation is up to you, but it must be consistent throughout that block. Four spaces are generally used and are preferred over tabs. Here is an example:
      for i in range(1, 11):
          print(i)
          if i == 5:
              break
    Incorrect indentation results in an IndentationError.
  • 21. In Python, the hash (#) symbol starts a comment, which extends to the end of the line. Comments help programmers understand a program; the Python interpreter ignores them.
      # This is a comment
      # print out Hello
      print('Hello')
    For comments that span multiple lines, one option is to put a hash (#) at the beginning of each line. Another is to use triple quotes, either ''' or """. Triple quotes are generally used for multi-line strings, but they can serve as multi-line comments as well.
      """This is also a
      perfect example of
      multi-line comments"""
  • 22. expression: A data value or set of operations to compute a value. Examples: 1 + 4 * 3, 42. Arithmetic operators we will use: + - * / (addition, subtraction, multiplication, division), % (modulus, a.k.a. remainder), ** (exponentiation). precedence: Order in which operations are computed. * / % ** have a higher precedence than + -, so 1 + 3 * 4 is 13. Parentheses can be used to force a certain order of evaluation: (1 + 3) * 4 is 16.
      Operator  Description     Example
      =         Assignment      num = 7
      +         Addition        num = 2 + 2
      -         Subtraction     num = 6 - 4
      *         Multiplication  num = 5 * 4
      /         Division        num = 25 / 5
      %         Modulo          num = 8 % 3
      **        Exponent        num = 9 ** 2
  • 23. In Python 3, dividing integers with / gives a float (35 / 5 is 7.0); use // when you want the integer quotient. (In Python 2, / on two integers already returned an integer.)
      35 // 5 is 7
      84 // 10 is 8
      156 // 100 is 1
    The % operator computes the remainder from a division of integers. The operators + - * / % ** ( ) all work for real numbers. The / operator produces an exact answer: 15.0 / 2.0 is 7.5. The same rules of precedence also apply to real numbers: evaluate ( ) before * / % before + -. When integers and reals are mixed, the result is a real number, for example 1 / 2.0 is 0.5. The conversion occurs on a per-operator basis:
      7 // 3 * 1.2 + 3 // 2
      2 * 1.2 + 3 // 2
      2.4 + 3 // 2
      2.4 + 1
      3.4
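Because the behaviour of `/` differs between Python 2 and Python 3, a short Python 3 sketch of the three division-related operators may help (the variable names are illustrative only):

```python
# Python 3: / is true division, // is floor division, % is the remainder.
true_quotient = 84 / 10      # a float, even though both operands are ints
floor_quotient = 84 // 10    # the integer quotient
remainder = 84 % 10          # what is left over

# Mixed expression, evaluated operator by operator:
# 7 // 3 -> 2, then 2 * 1.2 -> 2.4, then 3 // 2 -> 1, then 2.4 + 1
mixed = 7 // 3 * 1.2 + 3 // 2

print(true_quotient, floor_quotient, remainder, mixed)
```

The quotient and remainder always satisfy `84 == floor_quotient * 10 + remainder`.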
  • 24. Python has useful commands for performing calculations.
      Command name          Description
      abs(value)            absolute value
      ceil(value)           rounds up
      cos(value)            cosine, in radians
      floor(value)          rounds down
      log(value)            logarithm, base e
      log10(value)          logarithm, base 10
      max(value1, value2)   larger of two values
      min(value1, value2)   smaller of two values
      round(value)          nearest whole number
      sin(value)            sine, in radians
      sqrt(value)           square root
      Constant              Description
      e                     2.7182818...
      pi                    3.1415926...
    abs, max, min and round are built in. To use the other commands and the constants, write the following at the top of your Python program:
      from math import *
  • 25. variable: A named piece of memory that can store a value. Usage: compute an expression's result, store that result into a variable, and use that variable later in the program. assignment statement: Stores a value into a variable. Syntax: name = value. Examples:
      x = 5
      gpa = 3.14
    A variable that has been given a value can be used in expressions: x + 4 is 9. Exercise: Evaluate the quadratic equation for a given a, b, and c.
  • 26. print: Produces text output on the console. Syntax:
      print("Message")
      print(Expression)
    Prints the given text message or expression value on the console, and moves the cursor down to the next line.
      print(Item1, Item2, ..., ItemN)
    Prints several messages and/or expressions on the same line. Examples:
      print("Hello, world!")
      age = 45
      print("You have", 65 - age, "years until retirement")
    Output:
      Hello, world!
      You have 20 years until retirement
  • 27. input: Reads a line of text from the user and returns it as a string; wrap it in int() or float() to get a number. You can assign (store) the result into a variable. Example:
      age = int(input("How old are you? "))
      print("Your age is", age)
      print("You have", 65 - age, "years until retirement")
    Output:
      How old are you? 53
      Your age is 53
      You have 12 years until retirement
    Exercise: Write a Python program that prompts the user for his/her amount of money, then reports how many Nintendo Wiis the person can afford, and how much more money he/she will need to afford an additional Wii.
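In Python 3, `input()` always returns a string, so the text must be converted before doing arithmetic with it. The sketch below isolates that conversion in a hypothetical helper, `years_until_retirement` (a name invented for this example), so it can be tried without typing at a prompt:

```python
def years_until_retirement(raw_age, retirement_age=65):
    """Convert the text a user typed into an int and compute the years left.

    raw_age is the string returned by input(); int() raises ValueError
    if it does not look like a whole number.
    """
    age = int(raw_age.strip())
    return retirement_age - age


# In an interactive program this would be:
#     years_until_retirement(input("How old are you? "))
print(years_until_retirement("53"))
```

Without the `int()` call, `65 - "53"` raises a TypeError because a string cannot be subtracted from an integer.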
  • 28. for loop: Repeats a set of statements over a group of values. Syntax:
      for variableName in groupOfValues:
          statements
    We indent the statements to be repeated with tabs or spaces. variableName gives a name to each value, so you can refer to it in the statements. groupOfValues can be a range of integers, specified with the range function. Example:
      for x in range(1, 6):
          print(x, "squared is", x * x)
    Output:
      1 squared is 1
      2 squared is 4
      3 squared is 9
      4 squared is 16
      5 squared is 25
  • 30. The range function specifies a range of integers. range(start, stop) gives the integers between start (inclusive) and stop (exclusive). It can also accept a third value specifying the change between values: range(start, stop, step) gives the integers between start (inclusive) and stop (exclusive) by step. Example:
      for x in range(5, 0, -1):
          print(x)
      print("Hello!")
    Output:
      5
      4
      3
      2
      1
      Hello!
  • 31. Some loops incrementally compute a value that is initialized outside the loop. This is sometimes called a cumulative sum.
      total = 0
      for i in range(1, 11):
          total = total + (i * i)
      print("sum of first 10 squares is", total)
    Output:
      sum of first 10 squares is 385
    Exercise: Write a Python program that computes the factorial of an integer.
  • 32. if statement: Executes a group of statements only if a certain condition is true. Otherwise, the statements are skipped. Syntax:
      if condition:
          statements
    Example:
      gpa = 3.4
      if gpa > 2.0:
          print("Your application is accepted.")
  • 33. if/else statement: Executes one block of statements if a certain condition is True, and a second block of statements if it is False. Syntax:
      if condition:
          statements
      else:
          statements
    Example:
      gpa = 1.4
      if gpa > 2.0:
          print("Welcome to JUST University!")
      else:
          print("Your application is denied.")
    Multiple conditions can be chained with elif ("else if"):
      if condition:
          statements
      elif condition:
          statements
      else:
          statements
  • 34. while loop: Executes a group of statements as long as a condition is True. It is good for indefinite loops (repeat an unknown number of times). Syntax:
      while condition:
          statements
    Example:
      number = 1
      while number < 200:
          print(number, end=" ")
          number = number * 2
    Output:
      1 2 4 8 16 32 64 128
  • 35. Many logical expressions use relational operators:
      Operator  Meaning                   Example      Result
      ==        equals                    1 + 1 == 2   True
      !=        does not equal            3.2 != 2.5   True
      <         less than                 10 < 5       False
      >         greater than              10 > 5       True
      <=        less than or equal to     126 <= 100   False
      >=        greater than or equal to  5.0 >= 5.0   True
    Logical expressions can be combined with logical operators:
      Operator  Example            Result
      and       9 != 6 and 2 < 3   True
      or        2 == 3 or -1 < 5   True
      not       not 7 > 0          False
    Exercise: Write code to display and count the factors of a number.
  • 36. string: A sequence of text characters in a program. Strings start and end with quotation mark " or apostrophe ' characters. Examples:
      "hello"
      "This is a string"
      "This, too, is a string. It can be very long!"
    A string written this way may not span multiple lines or contain an unescaped " character:
      "This is not
      a legal String."
      "This is not a "legal" String either."
    A string can represent special characters by preceding them with a backslash:
      \t  tab character
      \n  new line character
      \"  quotation mark character
      \\  backslash character
    Example: "Hello\tthere\nHow are you?"
  • 37. Characters in a string are numbered with indexes starting at 0. Example: name = "P. Diddy"
      index      0  1  2  3  4  5  6  7
      character  P  .     D  i  d  d  y
    Accessing an individual character of a string: variableName[index]. Example:
      print(name, "starts with", name[0])
    Output:
      P. Diddy starts with P
  • 38. len(string): number of characters in a string (including spaces). str.lower(string): lowercase version of a string. str.upper(string): uppercase version of a string. Example:
      name = "Martin Douglas Stepp"
      length = len(name)
      big_name = str.upper(name)
      print(big_name, "has", length, "characters")
    Output:
      MARTIN DOUGLAS STEPP has 20 characters
  • 39. A list is a compound data type:
      [0]
      [2.3, 4.5]
      [5, "Hello", "there", 9.8]
      []
    Use len() to get the length of a list:
      >>> names = ["Ben", "Chen", "Yaqin"]
      >>> len(names)
      3
  • 43. Certain features of Python are not loaded by default. To use these features, you need to import the modules that contain them, e.g.:
      import matplotlib.pyplot as plt
      import numpy as np
  • 44. Division in Python 2 and Python 3:
      f = 7 / 2         # Python 2: f is 3, unless "from __future__ import division" is used
      f = 7 / 2         # Python 3: f is 3.5
      f = 7 // 2        # f is 3 in both Python 2 and 3
      f = 7 / 2.        # f is 3.5 in both Python 2 and 3
      f = 7 / float(2)  # f is 3.5 in both Python 2 and 3
      f = int(7 / 2)    # f is 3 in both Python 2 and 3
  • 45. Get the i-th element of a list:
      x = [i for i in range(10)]         # is the list [0, 1, ..., 9]
      zero = x[0]                        # equals 0, lists are 0-indexed
      one = x[1]                         # equals 1
      nine = x[-1]                       # equals 9, 'Pythonic' for last element
      eight = x[-2]                      # equals 8, 'Pythonic' for next-to-last element
      one_to_four = x[1:5]               # [1, 2, 3, 4]
      first_three = x[:3]                # [0, 1, 2]
      last_three = x[-3:]                # [7, 8, 9]
      three_to_end = x[3:]               # [3, 4, ..., 9]
      without_first_and_last = x[1:-1]   # [1, 2, ..., 8]
      copy_of_x = x[:]                   # [0, 1, 2, ..., 9]
      another_copy_of_x = x[:3] + x[3:]  # [0, 1, 2, ..., 9]
  • 46. Membership, concatenation and unpacking:
      1 in [1, 2, 3]   # True
      0 in [1, 2, 3]   # False
      x = [1, 2, 3]
      y = [4, 5, 6]
      x.extend(y)      # x is now [1,2,3,4,5,6]
      x = [1, 2, 3]
      y = [4, 5, 6]
      z = x + y        # z is [1,2,3,4,5,6]; x is unchanged.
      x, y = [1, 2]    # x is 1 and y is 2
      [x, y] = 1, 2    # same as above
      x, y = [1, 2]    # same as above
      x, y = 1, 2      # same as above
      _, y = [1, 2]    # y is 2, didn't care about the first element
  • 47. Looping over a list by index:
      >>> a = ['Mary', 'had', 'a', 'little', 'lamb']
      >>> for i in range(len(a)):
      ...     print(i, a[i])
      ...
      0 Mary
      1 had
      2 a
      3 little
      4 lamb
  • 48. What is the expected output of the following code?
      a = list(range(10))
      b = a
      b[0] = 100
      print(a)          # [100, 1, 2, 3, 4, 5, 6, 7, 8, 9]
      a = list(range(10))
      b = a[:]
      b[0] = 100
      print(a)          # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    Plain assignment makes b another name for the same list, while a[:] makes a copy:
      a = [0, 1, 2, 3, 4]
      b = a
      c = a[:]
      a == b   # True
      a is b   # True
      a == c   # True
      a is c   # False
  • 49. Tuples are similar to lists, but are immutable.
      a_tuple = (0, 1, 2, 3, 4)
      other_tuple = 3, 4
      another_tuple = tuple([0, 1, 2, 3, 4])
      heterogeneous_tuple = ('john', 1.1, [1, 2])
    They can be sliced, concatenated, or repeated:
      a_tuple[2:4]   # (2, 3)
    They cannot be modified:
      a_tuple[2] = 5
      TypeError: 'tuple' object does not support item assignment
    Note: a tuple is defined by the comma, not the parentheses, which are only used for convenience and for grouping elements. So a = (1) is not a tuple, but a = (1,) is.
  • 50. Tuples are useful for returning multiple values from functions. Tuples and lists can also be used for multiple assignment.
      def sum_and_product(x, y):
          return (x + y), (x * y)
      sp = sum_and_product(2, 3)      # equals (5, 6)
      s, p = sum_and_product(5, 10)   # s is 15, p is 50
      x, y = 1, 2
      [x, y] = [1, 2]
      (x, y) = (1, 2)
      x, y = y, x
  • 51. Which assignments raise an error?
      a = [1, 2, 3, 4, 5, 6]
      my_tuple = (a,)
      my_tuple[0] = a      # ERROR: my_tuple is a tuple
      a = [1, 2, 3, 4, 5, 6]
      my_tuple = (a)
      my_tuple[0] = a      # No ERROR: without the comma, my_tuple is the list itself
      a = [1, 2, 3, 4, 5, 6]
      my_tuple = (a,)
      my_tuple[0] = 5      # ERROR
      a = [1, 2, 3, 4, 5, 6]
      my_tuple = (a,)
      my_tuple[0][0] = 5   # No ERROR: the list inside the tuple is still mutable
  • 52. A dictionary associates values with unique keys.
      empty_dict = {}                          # Pythonic
      empty_dict2 = dict()                     # less Pythonic
      grades = { "Joel" : 80, "Tim" : 95 }     # dictionary literal
    Access or modify a value with its key:
      joels_grade = grades["Joel"]             # equals 80
      grades["Tim"] = 99                       # replaces the old value
      grades["Kate"] = 100                     # adds a third entry
      num_students = len(grades)               # equals 3
    Looking up a missing key raises KeyError:
      try:
          kates_grade = grades["Kate"]
      except KeyError:
          print("no grade for Kate!")
  • 54. Check for the existence of a key (the results below assume grades holds only Joel and Tim):
      joel_has_grade = "Joel" in grades        # True
      kate_has_grade = "Kate" in grades        # False
    Use get to avoid KeyError and supply a default value:
      joels_grade = grades.get("Joel", 0)      # equals 80
      kates_grade = grades.get("Kate", 0)      # equals 0
      no_ones_grade = grades.get("No One")     # default default is None
    Get all items:
      all_keys = grades.keys()                 # all keys
      all_values = grades.values()             # all values
      all_pairs = grades.items()               # (key, value) tuples
    In Python 2 these three calls return lists; in Python 3 they return iterable view objects instead. Which of the following is faster?
      'Joel' in grades                         # faster: hash table lookup
      'Joel' in all_keys                       # slower when all_keys is a list
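The Python 3 behaviour of `keys()` can be checked directly. This small sketch (variable names are illustrative) shows that the returned object is a live view of the dictionary, and that `list()` is needed to take a fixed snapshot:

```python
grades = {"Joel": 80, "Tim": 95}

keys_view = grades.keys()            # a live view, not a list
keys_snapshot = list(grades.keys())  # an independent list copy

grades["Kate"] = 100                 # modify the dictionary afterwards

view_sees_kate = "Kate" in keys_view          # the view reflects the change
snapshot_sees_kate = "Kate" in keys_snapshot  # the list does not

# get() returns a default instead of raising KeyError for a missing key.
missing_grade = grades.get("No One", 0)

print(view_sees_kate, snapshot_sees_kate, missing_grade)
```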
  • 55. any and all:
      a = [0, 0, 0, 1]
      any(a)   # True
      all(a)   # False
  • 56. Handling exceptions:
      try:
          print(0 / 0)
      except ZeroDivisionError:
          print("cannot divide by zero")
    See https://docs.python.org/3/tutorial/errors.html
  • 57. Functions are defined using def:
      def double(x):
          """this is where you put an optional docstring
          that explains what the function does.
          for example, this function multiplies its input by 2"""
          return x * 2
    You can call a function after it is defined:
      z = double(10)   # z is 20
    You can give default values to parameters:
      def my_print(message="my default message"):
          print(message)
      my_print("hello")   # prints 'hello'
      my_print()          # prints 'my default message'
  • 58. Sometimes it is useful to specify arguments by name:
      def subtract(a=0, b=0):
          return a - b
      subtract(10, 5)         # returns 5
      subtract(0, 5)          # returns -5
      subtract(b = 5)         # same as above
      subtract(b = 5, a = 0)  # same as above
  • 59. Functions are objects too:
      def double(x): return x * 2
      DD = double
      DD(2)                  # 4
      def apply_to_one(f):
          return f(1)
      x = apply_to_one(DD)   # x is 2
  • 60. Small anonymous functions can be created with the lambda keyword. The power of lambda is better shown when you use it as an anonymous function inside another function.
      def myfunc(n):
          return lambda a : a * n
      mydoubler = myfunc(2)
      mytripler = myfunc(3)
      print(mydoubler(11))
      print(mytripler(11))
    A lambda function can take any number of arguments, but can only have one expression.
      x = lambda a : a + 10
      print(x(5))
      x = lambda a, b, c : a * b - c
      print(x(5, 6, 2))
  • 61. Sorting with a key function:
      pairs = [(2, 'two'), (3, 'three'), (1, 'one'), (4, 'four')]
      pairs.sort(key = lambda pair: pair[0])
      print(pairs)   # [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
      def getKey(pair):
          return pair[0]
      pairs.sort(key=getKey)
      print(pairs)   # [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
  • 62. A list comprehension is a very convenient way to create a new list:
      squares = [x * x for x in range(5)]
      print(squares)   # [0, 1, 4, 9, 16]
    The equivalent loop:
      squares = [0, 0, 0, 0, 0]
      for x in range(5):
          squares[x] = x * x
      print(squares)   # [0, 1, 4, 9, 16]
  • 63. A list comprehension can also be used to filter a list:
      even_numbers = []
      for x in range(5):
          if x % 2 == 0:
              even_numbers.append(x)
      even_numbers   # [0, 2, 4]
      even_numbers = [x for x in range(5) if x % 2 == 0]
      even_numbers   # [0, 2, 4]
  • 64. More complex examples:
      # create 100 pairs (0,0) (0,1) ... (9,8), (9,9)
      pairs = [(x, y) for x in range(10) for y in range(10)]
      # only pairs with x < y,
      # range(lo, hi) equals [lo, lo + 1, ..., hi - 1]
      increasing_pairs = [(x, y) for x in range(10) for y in range(x + 1, 10)]
      # [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 2), (1, 3), ... etc.
  • 65. map is a convenient tool for applying a function to a sequence of data:
      def double(x): return 2*x
      b = range(5)
      list(map(double, b))   # [0, 2, 4, 6, 8]
    Calling the function on the whole sequence does not work:
      double(b)
      TypeError: unsupported operand type(s) for *: 'int' and 'range'
    The same result with a list comprehension:
      print([double(i) for i in range(5)])   # [0, 2, 4, 6, 8]
  • 66. In Python 3, map returns a lazy map object; wrap it in list() to see the values.
      map_output = map(lambda x: x*2, [1, 2, 3, 4])
      print(map_output)        # <map object at 0x04D6BAB0>
      list_map_output = list(map_output)
      print(list_map_output)   # [2, 4, 6, 8]
    map can also take several sequences. list_a and list_b are not defined on the slide; the values below are assumed so that they match the stated output:
      list_a = [1, 2, 3]
      list_b = [10, 20, 30]
      list(map(lambda x, y: x + y, list_a, list_b))   # [11, 22, 33]
  • 67. filter keeps the elements for which a function returns True:
      def is_even(x): return x % 2 == 0
      a = [0, 1, 2, 3]
      list(filter(is_even, a))          # [0, 2]
      [x for x in a if is_even(x)]      # [0, 2]
      a = [1, 2, 3, 4, 5, 6]
      print(list(filter(lambda x : x % 2 == 0, a)))   # [2, 4, 6]
  • 68. reduce combines the elements of a sequence into a single value:
      from functools import reduce
      reduce(lambda x, y: x+y, range(10))      # 45
      reduce(lambda x, y: x*y, [1, 2, 3, 4])   # 24
  • 69. zip is useful for combining multiple lists into a list of tuples:
      list(zip(['a', 'b', 'c'], [1, 2, 3], ['A', 'B', 'C']))
      # [('a', 1, 'A'), ('b', 2, 'B'), ('c', 3, 'C')]
      names = ['James', 'Tom', 'Mary']
      grades = [100, 90, 95]
      list(zip(names, grades))
      # [('James', 100), ('Tom', 90), ('Mary', 95)]
  • 70. file object = open(file_name [, access_mode]). access_mode determines the mode in which the file is opened, i.e., read, write, append, etc. A complete list of possible values is given below in the table. This parameter is optional, and the default file access mode is read (r).
  • 72. read(): reads the entire file and returns its contents as a string. readline(): reads the next line of the file, i.e. up to a newline character (or up to EOF if the file has a single line), and returns a string. readlines(): reads the entire file line by line and returns a list of line strings. Sample text from the slide's example file: 1 hello 40 50 hi This is my course Welcome to this course n wish you all the best. Writing to a file:
      f = open("my_file2.txt", 'w')
      f.write("Hello Everyone!")
      f.close()
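To compare the three reading methods side by side, here is a small sketch that writes a two-line file into a temporary directory and reads it back in each way. The file contents and the use of `tempfile` are assumptions made for the example; the `with` statement closes each file automatically.

```python
import os
import tempfile

# Write a small two-line file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "my_file2.txt")
with open(path, "w") as f:
    f.write("Hello Everyone!\nWelcome to this course\n")

# read(): the whole file as one string.
with open(path) as f:
    whole = f.read()

# readline(): one line per call, newline included.
with open(path) as f:
    first = f.readline()
    second = f.readline()

# readlines(): a list with one string per line.
with open(path) as f:
    lines = f.readlines()

print(repr(whole))
print(repr(first), repr(second))
print(lines)
```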
  • 73. Notice how each piece of data is separated by a comma.
  • 77. What is NumPy? NumPy, SciPy, and Matplotlib provide MATLAB-like functionality in Python. NumPy features: typed multidimensional arrays (matrices), fast numerical computations (matrix math), high-level math functions.
  • 78. Why do we need NumPy? Let's see for ourselves!
  • 79. Why do we need NumPy? Python does numerical computations slowly. For a 1000 x 1000 matrix multiply, a Python triple loop takes > 10 min, while NumPy takes ~0.03 seconds.
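The comparison can be reproduced on a smaller scale. The sketch below uses 60 x 60 matrices (an assumption, chosen so the pure-Python loop finishes quickly) and a hypothetical helper `matmul_loops`; actual timings depend on the machine, so only the equality of the two results is checked.

```python
import time
import numpy as np


def matmul_loops(a, b):
    """Multiply two square matrices given as lists of lists (triple loop)."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            out[i][j] = s
    return out


n = 60  # kept small so the pure-Python version finishes quickly
rng = np.random.default_rng(0)
a = rng.random((n, n))
b = rng.random((n, n))

t0 = time.perf_counter()
slow = matmul_loops(a.tolist(), b.tolist())
t1 = time.perf_counter()
fast = a @ b
t2 = time.perf_counter()

print(f"loops: {t1 - t0:.4f} s, numpy: {t2 - t1:.6f} s")
results_match = np.allclose(slow, fast)
```

The triple loop does about n**3 multiply-add steps in interpreted Python, which is why the gap widens so quickly as n grows.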
  • 80. NumPy Overview: 1. Arrays 2. Shaping and transposition 3. Mathematical Operations 4. Indexing and slicing 5. Broadcasting
  • 81. Arrays: structured lists of numbers. Examples: vectors, matrices, images, tensors, ConvNets.
  • 82. Arrays: a vector such as (p_x, p_y, p_z) is a 1-D array, and an m x n matrix with entries a_11 ... a_1n in its first row through a_m1 ... a_mn in its last row is a 2-D array.
  • 86. Arrays, basic properties:
      import numpy as np
      a = np.array([[1,2,3],[4,5,6]], dtype=np.float32)
      print(a.ndim, a.shape, a.dtype)
    1. Arrays can have any number of dimensions, including zero (a scalar). 2. Arrays are typed: np.uint8, np.int64, np.float32, np.float64. 3. Arrays are dense. Each element of the array exists and has the same type.
  • 87. Arrays, creation: np.ones, np.zeros; np.arange; np.concatenate; astype (called on an array, e.g. a.astype(np.float32)); np.zeros_like, np.ones_like; np.random.random.
  • 96. Arrays, danger zone: arrays must be dense, with no holes; they must be one type; arrays of different shape cannot be combined.
  • 97. Shaping:
      a = np.array([1,2,3,4,5,6])
      a = a.reshape(3,2)
      a = a.reshape(2,-1)
      a = a.ravel()
    1. The total number of elements cannot change. 2. Use -1 to infer an axis length. 3. Row-major by default (MATLAB is column-major).
  • 98. Reshaping example:
      import numpy as np
      a = np.array([1,2,3,4,5,6])
      print(a)
      print('-'*20)
      b = a.reshape(3,2)
      print(b)
      print('-'*20)
      c = a.reshape(2,-1)
      print(c)
      print('-'*20)
      d = a.ravel()
      print(d)
  • 100. Return values: NumPy functions return either views or copies. Views share data with the original array, like references in Java/C++. Altering entries of a view changes the same entries in the original. The NumPy documentation says which functions return views and which return copies. np.copy and the array method view make explicit copies and views.
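A short sketch of the view/copy distinction, using basic slicing (which returns a view) and `.copy()` (which allocates new memory). `np.shares_memory` is a convenient way to check which one you have; the variable names are illustrative.

```python
import numpy as np

a = np.arange(6)           # [0 1 2 3 4 5]

view = a[1:4]              # basic slicing returns a view of a
copy = a[1:4].copy()       # .copy() makes an independent array

view[0] = 100              # writes through to a[1]
copy[1] = -1               # leaves a untouched

print(a)                           # a[1] is now 100
print(np.shares_memory(a, view))   # the view shares a's buffer
print(np.shares_memory(a, copy))   # the copy does not
```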
  • 101. Transposition:
      a = np.arange(10).reshape(5,2)
      a = a.T
      a = a.transpose((1,0))
    np.transpose permutes axes. a.T reverses the order of the axes; for a 2-D array this swaps rows and columns.
  • 104. Saving and loading arrays:
      np.savez('data.npz', a=a)
      data = np.load('data.npz')
      a = data['a']
    1. NPZ files can hold multiple arrays. 2. np.savez_compressed works the same way but compresses the file.
  • 105. Mathematical operators: arithmetic operations are element-wise; logical operators return a bool array; in-place operations modify the array.
  • 109. Math, upcasting: just as in Python and Java, the result of a math operator is cast to the more general or precise datatype.
      uint64 + uint16 => uint64
      float32 / int32 => float64 (float32 cannot represent every int32 value exactly)
    Warning: upcasting does not prevent overflow/underflow. You must manually cast first. Use case: images are often stored as uint8. You should convert to float32 or float64 before doing math.
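The overflow warning is easy to demonstrate with a tiny uint8 "image". In this sketch (values chosen for illustration), adding two uint8 arrays silently wraps around modulo 256, while casting to float32 first gives the expected sums:

```python
import numpy as np

img = np.array([200, 250], dtype=np.uint8)

# uint8 + uint8 stays uint8, so results above 255 wrap around:
# 400 % 256 = 144 and 500 % 256 = 244.
wrapped = img + img

# Casting first avoids the overflow.
safe = img.astype(np.float32) + img.astype(np.float32)

print(wrapped, wrapped.dtype)
print(safe, safe.dtype)
```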
  • 110. Math, universal functions: also called ufuncs. They operate element-wise. Examples: np.exp, np.sqrt, np.sin, np.cos, np.isnan.
  • 113. Indexing:
      x[0,0]    # top-left element
      x[0,-1]   # first row, last column
      x[0,:]    # first row (many entries)
      x[:,0]    # first column (many entries)
    Notes: indexing is zero-based; multi-dimensional indices are comma-separated (i.e., a tuple).
  • 115. Python slicing. Syntax: start:stop:step
      a = list(range(10))
      a[:3]      # indices 0, 1, 2
      a[-3:]     # indices 7, 8, 9
      a[3:8:2]   # indices 3, 5, 7
      a[4:1:-1]  # indices 4, 3, 2 (this one is tricky)
  • 117. Axes:
      a.sum()         # sum all entries
      a.sum(axis=0)   # sum over rows: collapses axis 0, one total per column
      a.sum(axis=1)   # sum over columns: collapses axis 1, one total per row
      a.sum(axis=1, keepdims=True)
    1. Use the axis parameter to control which axis NumPy operates on. 2. Typically, the axis specified will disappear; keepdims keeps all dimensions.
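A small 2 x 3 array makes the effect of the `axis` argument concrete. The sketch below (array values chosen for illustration) shows which axis disappears in each case and what `keepdims=True` changes:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                 #  [3 4 5]]

total = a.sum()                          # all entries
per_column = a.sum(axis=0)               # axis 0 collapses: shape (3,)
per_row = a.sum(axis=1)                  # axis 1 collapses: shape (2,)
per_row_kept = a.sum(axis=1, keepdims=True)   # shape (2, 1)

print(total)
print(per_column, per_column.shape)
print(per_row, per_row.shape)
print(per_row_kept.shape)
```

Keeping the dimension is handy when the result is combined with the original array again, for example `a / per_row_kept` to normalize each row.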
  • 119. Broadcasting
    a = a + 1  # add one to every element
    When operating on multiple arrays, broadcasting rules are used. Shapes are compared dimension by dimension, from right to left:
    1. Dimensions of size 1 will broadcast (as if the value was repeated).
    2. Otherwise, the dimensions must have the same size.
    3. Extra dimensions of size 1 are added on the left as needed.
  • 120. Broadcasting example Suppose we want to add a color value to an image. a.shape is (100, 200, 3) and b.shape is (3,). a + b will pad b with two extra dimensions so that it has an effective shape of 1 x 1 x 3. So, the addition will broadcast over the first and second dimensions.
  • 121. Broadcasting failures If a.shape is (100, 200, 3) but b.shape is (4,), then a + b will fail. The trailing dimensions must have the same size (or be 1).
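    Both the working and the failing case can be checked directly. This is a minimal sketch; the image a is filled with zeros and the color values in b are invented for illustration.

```python
import numpy as np

a = np.zeros((100, 200, 3))        # hypothetical image: height x width x channels
b = np.array([0.1, 0.2, 0.3])      # one value per color channel, shape (3,)

out = a + b                        # b is treated as shape (1, 1, 3)

bad = np.ones(4)                   # trailing dimension 4 does not match 3
try:
    a + bad
    failed = False
except ValueError:
    failed = True                  # NumPy reports that the shapes cannot be broadcast
```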
  • 122. Tips to avoid bugs 1. Know what your datatypes are. 2. Check whether you have a view or a copy. 3. Know np.dot vs np.multiply.
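    For tip 2, one way to check whether two arrays share data is np.shares_memory. The sketch below uses a made-up array x and contrasts basic slicing (a view) with indexing by a list (a copy).

```python
import numpy as np

x = np.arange(6)       # [0, 1, 2, 3, 4, 5]
v = x[1:4]             # basic slicing returns a view on x
c = x[[1, 2, 3]]       # indexing with a list returns a copy

v[0] = 100             # writes through to x[1]
c[1] = -1              # x is unaffected

is_view = np.shares_memory(x, v)
is_copy = not np.shares_memory(x, c)
```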
  • 123. numpy.dot
    numpy.dot(a, b, out=None)
    Dot product of two arrays. Specifically,
    • If both a and b are 1-D arrays, it is the inner product of vectors (without complex conjugation).
    • If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
    • If either a or b is 0-D (scalar), it is equivalent to multiply, and using numpy.multiply(a, b) or a * b is preferred.
    • If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
    • If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b: dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
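    The difference between the element-wise product and the dot product shows up on two small 2 x 2 matrices. The values below are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B            # same as np.multiply(A, B)
matrix = A @ B                 # same as np.dot(A, B) for 2-D arrays
inner = np.dot(A[0], B[0])     # 1-D inputs: inner product 1*5 + 2*6
```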
  • 129. What is Pandas? Pandas is a Python module that rounds out the capabilities of NumPy, SciPy and Matplotlib. The name pandas is derived from "Python and data analysis" and "panel data". There is often some confusion about whether Pandas is an alternative to NumPy, SciPy and Matplotlib. The truth is that it is built on top of NumPy, which means that NumPy is required by pandas. SciPy and Matplotlib, on the other hand, are not required by pandas, but they are extremely useful. That's why the Pandas project lists them as optional dependencies.
  • 130. What is Pandas? • Pandas is a software library written for the Python programming language. • It is used for data manipulation and analysis. • It provides special data structures and operations for the manipulation of numerical tables and time series.
  • 131. Common Data Structures in Pandas • Series • DataFrame
  • 132. Series • A Series is a one-dimensional labelled array-like object. • It is capable of holding any data type, e.g. integers, floats, strings, Python objects, and so on. • It can be seen as a data structure with two arrays: one functions as the index, i.e. the labels, and the other contains the actual data.
  • 133. Example import pandas as pd S = pd.Series([11, 28, 72, 3, 5, 8]) S The above code returns: 0 11 1 28 2 72 3 3 4 5 5 8 dtype: int64 | @Apptrainers
  • 134. • We can directly access the index and the values of our Series S: print(S.index) print(S.values) RangeIndex(start=0, stop=6, step=1) [11 28 72 3 5 8] | @Apptrainers
  • 135. • If we compare this to creating an array in numpy, there are still lots of similarities: import numpy as np X = np.array([11, 28, 72, 3, 5, 8]) print(X) print(S.values) # both are the same type: print(type(S.values), type(X)) [11 28 72 3 5 8] [11 28 72 3 5 8] <class 'numpy.ndarray'> <class 'numpy.ndarray'> | @Apptrainers
  • 136. Another example: fruits = ['apples', 'oranges', 'cherries', 'pears'] quantities = [20, 33, 52, 10] S = pd.Series(quantities, index=fruits) S Output: apples 20 oranges 33 cherries 52 pears 10 dtype: int64 | @Apptrainers
  • 137. If we add two Series with the same index, we get a new Series with that index, and the corresponding values are added:
    fruits = ['apples', 'oranges', 'cherries', 'pears']
    S = pd.Series([20, 33, 52, 10], index=fruits)
    S2 = pd.Series([17, 13, 31, 32], index=fruits)
    print(S + S2)
    print("sum of S: ", sum(S))
    Output: apples 37 oranges 46 cherries 83 pears 42 dtype: int64 sum of S: 115
  • 138. The indices do not have to be the same for the Series addition. The index of the result will be the "union" of both indices. If a label doesn't occur in both Series, the result for that label will be NaN:
    fruits = ['peaches', 'oranges', 'cherries', 'pears']
    fruits2 = ['raspberries', 'oranges', 'cherries', 'pears']
    S = pd.Series([20, 33, 52, 10], index=fruits)
    S2 = pd.Series([17, 13, 31, 32], index=fruits2)
    print(S + S2)
    Output: cherries 83.0 oranges 46.0 peaches NaN pears 42.0 raspberries NaN dtype: float64
  • 139. fruits = ['apples', 'oranges', 'cherries', 'pears'] fruits_ro = ["mere", "portocale", "cireșe", "pere"] S = pd.Series([20, 33, 52, 10], index=fruits) S2 = pd.Series([17, 13, 31, 32], index=fruits_ro) print(S+S2) Output: apples NaN cherries NaN cireșe NaN mere NaN oranges NaN pears NaN pere NaN portocale NaN dtype: float64 | @Apptrainers
  • 140. It's possible to access single values of a Series or more than one value by a list of indices: print(S['apples']) 20 print(S[['apples', 'oranges', 'cherries']]) apples 20 oranges 33 cherries 52 dtype: int64 | @Apptrainers
  • 141. Similar to Numpy we can use scalar operations or mathematical functions on a series: import numpy as np print((S + 3) * 4) print("======================") print(np.sin(S)) Output: apples 92 oranges 144 cherries 220 pears 52 dtype: int64 ====================== apples 0.912945 oranges 0.999912 cherries 0.986628 pears -0.544021 dtype: float64 | @Apptrainers
  • 142. Series.apply
    Series.apply(func, convert_dtype=True, args=(), **kwds)
    Parameter: Meaning
    func: a function, which can be a NumPy function that will be applied to the entire Series, or a Python function that will be applied to every single value of the Series
    convert_dtype: a boolean value. If it is set to True (default), apply will try to find a better dtype for elementwise function results. If False, leave as dtype=object
    args: positional arguments which will be passed to the function "func" in addition to the values from the Series
    **kwds: additional keyword arguments which will be passed as keywords to the function
  • 143. S.apply(np.sin) apples 0.912945 oranges 0.999912 cherries 0.986628 pears -0.544021 dtype: float64 | @Apptrainers
  • 144. • We can also use Python lambda functions. Let's assume we have the following task: check the amount of fruit for every kind. If fewer than 50 are available, we will augment the stock by 10:
    S.apply(lambda x: x if x >= 50 else x + 10)
    apples 30 oranges 43 cherries 52 pears 20 dtype: int64
  • 145. Filtering with a Boolean array: S[S>30] oranges 33 cherries 52 dtype: int64 | @Apptrainers
  • 146. • A series can be seen as an ordered Python dictionary with a fixed length. "apples" in S True | @Apptrainers
  • 147. • We can even pass a dictionary to a Series object when we create it. We get a Series with the dict's keys as the index. Older pandas versions sorted the index by key; since pandas 0.23 (on Python 3.6 and later) the insertion order of the dictionary is kept.
    cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235, "Rome": 2874038, "Paris": 2273305, "Vienna": 1805681, "Bucharest": 1803425, "Hamburg": 1760433, "Budapest": 1754000, "Warsaw": 1740119, "Barcelona": 1602386, "Munich": 1493900, "Milan": 1350680}
    city_series = pd.Series(cities)
    print(city_series)
  • 149. NaN Missing data is a common problem in data analysis tasks. Pandas makes it as easy as possible to work with missing data. In the following example, "Zurich" and "Stuttgart" are not keys of the cities dictionary, so their values become NaN:
    my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]
    my_city_series = pd.Series(cities, index=my_cities)
    my_city_series
  • 151. • Because NaN is a float, the population values of the other cities are turned into floats as well. There is no missing data in the following example, so the values stay int:
    my_cities = ["London", "Paris", "Berlin", "Hamburg"]
    my_city_series = pd.Series(cities, index=my_cities)
    my_city_series
  • 152. The Methods isnull() and notnull() my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"] my_city_series = pd.Series(cities, index=my_cities) print(my_city_series.isnull()) | @Apptrainers
  • 154. • We also get a NaN if a value in the dictionary is None:
    d = {"a": 23, "b": 45, "c": None, "d": 0}
    S = pd.Series(d)
    print(S)
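    The methods isnull() and notnull() can be combined with Boolean filtering to find or keep the affected entries. This is a minimal sketch that reuses the dictionary from the slide; the names s, missing and present are chosen here for illustration.

```python
import pandas as pd

s = pd.Series({"a": 23, "b": 45, "c": None, "d": 0})

missing = s.isnull()          # True where the value is missing (the None under "c")
present = s[s.notnull()]      # keep only the entries that have a value
```

    Note that 0 under "d" is a regular value and is not treated as missing.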
  • 157. Filtering out Missing Data It's possible to filter out missing data with the Series method dropna. It returns a Series which consists only of non-null data: import pandas as pd cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235, "Rome": 2874038, "Paris": 2273305, "Vienna": 1805681, "Bucharest":1803425, "Hamburg": 1760433, "Budapest": 1754000, "Warsaw": 1740119, "Barcelona":1602386, "Munich": 1493900, "Milan": 1350680} my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"] my_city_series = pd.Series(cities, index=my_cities) print(my_city_series.dropna()) | @Apptrainers
  • 159. Filling in Missing Data • In many cases you don't want to filter out missing data, but you want to fill in appropriate data for the empty gaps. A suitable method in many situations will be fillna: print(my_city_series.fillna(0)) London 8615246.0 Paris 2273305.0 Zurich 0.0 Berlin 3562166.0 Stuttgart 0.0 Hamburg 1760433.0 dtype: float64 | @Apptrainers
  • 160. • If we call fillna with a dictionary, we can provide the appropriate data, i.e. the population of Zurich and Stuttgart: missing_cities = {"Stuttgart":597939, "Zurich":378884} my_city_series.fillna(missing_cities) London 8615246.0 Paris 2273305.0 Zurich 378884.0 Berlin 3562166.0 Stuttgart 597939.0 Hamburg 1760433.0 dtype: float64 | @Apptrainers
  • 161. cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235, "Rome": 2874038, "Paris": 2273305, "Vienna": 1805681, "Bucharest":1803425, "Hamburg": 1760433, "Budapest": 1754000, "Warsaw": 1740119, "Barcelona":1602386, "Munich": 1493900, "Milan": 1350680} my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"] my_city_series = pd.Series(cities, index=my_cities) my_city_series = my_city_series.fillna(0).astype(int) print(my_city_series) | @Apptrainers
  • 162. London 8615246 Paris 2273305 Zurich 0 Berlin 3562166 Stuttgart 0 Hamburg 1760433 dtype: int64 | @Apptrainers
  • 163. DataFrame • The underlying idea of a DataFrame is based on spreadsheets. We can see the data structure of a DataFrame as tabular and spreadsheet-like. • A DataFrame logically corresponds to a "sheet" of an Excel document. • A DataFrame has both a row and a column index. | @Apptrainers
  • 164. • Like a spreadsheet or Excel sheet, a DataFrame object contains an ordered collection of columns. • Each column has a single data type, but different columns can have different types, e.g. the first column may consist of integers, while the second one consists of Boolean values, and so on. • There is a close connection between the DataFrames and the Series of Pandas. • A DataFrame can be seen as a concatenation of Series, each Series having the same index, i.e. the index of the DataFrame.
  • 165. import pandas as pd years = range(2014, 2018) shop1 = pd.Series([2409.14, 2941.01, 3496.83, 3119.55], index=years) shop2 = pd.Series([1203.45, 3441.62, 3007.83, 3619.53], index=years) shop3 = pd.Series([3412.12, 3491.16, 3457.19, 1963.10], index=years) print(pd.concat([shop1, shop2, shop3])) | @Apptrainers
  • 167. • This result is not what we have intended or expected. The reason is that concat used 0 as the default for the axis parameter. Let's do it with "axis=1": shops_df = pd.concat([shop1, shop2, shop3], axis=1) print(shops_df) | @Apptrainers
  • 169. cities = ["Zürich", "Winterthur", "Freiburg"] shops_df.columns = cities print(shops_df) # alternative way: give names to series: shop1.name = "Zürich" shop2.name = "Winterthur" shop3.name = "Freiburg" print("------") shops_df2 = pd.concat([shop1, shop2, shop3], axis=1) print(shops_df2) | @Apptrainers
  • 172. DataFrames from Dictionaries cities = {"name": ["London", "Berlin", "Madrid", "Rome", "Paris", "Vienna", "Bucharest", "Hamburg", "Budapest", "Warsaw", "Barcelona", "Munich", "Milan"], "population": [8615246, 3562166, 3165235, 2874038, 2273305, 1805681, 1803425, 1760433, 1754000, 1740119, 1602386, 1493900, 1350680], "country": ["England", "Germany", "Spain", "Italy", "France", "Austria", "Romania", "Germany", "Hungary", "Poland", "Spain", "Germany", "Italy"]} city_frame = pd.DataFrame(cities) print(city_frame) | @Apptrainers
  • 174. Retrieving the Column Names city_frame.columns.values Output: array(['country', 'name', 'population'], dtype=object) | @Apptrainers
  • 175. Custom Index • We can see that an index (0, 1, 2, ...) has been automatically assigned to the DataFrame. We can also assign a custom index to the DataFrame object:
    ordinals = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth"]
    city_frame = pd.DataFrame(cities, index=ordinals)
    print(city_frame)
  • 177. Rearranging the Order of Columns We can also define and rearrange the order of the columns at the time of creation of the DataFrame. This also guarantees a defined column order when we create the DataFrame from a dictionary: before Python 3.7, dictionaries did not guarantee any key order, and older pandas versions sorted the columns alphabetically.
  • 178. city_frame = pd.DataFrame(cities, columns=["name", "country", "population"]) print(city_frame) | @Apptrainers
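  • 180. • But what if you want to change the column names and the ordering of an existing DataFrame? For the ordering we can use reindex. It returns a new DataFrame and by default acts on the row index, so we pass the columns keyword and assign the result:
    city_frame = city_frame.reindex(columns=["country", "name", "population"])
    print(city_frame)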
  • 182. • Now, we want to rename our columns. For this purpose, we will use the DataFrame method 'rename'. This method supports two calling conventions: • (index=index_mapper, columns=columns_mapper, ...) • (mapper, axis={'index', 'columns'}, ...) • We will rename the columns of our DataFrame into Romanian names in the following example. • We set the parameter inplace to True so that our DataFrame itself is changed. With inplace set to False, which is the default, rename leaves the DataFrame untouched and returns a new one.
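    Both behaviours of rename can be seen on a two-row frame. This is a sketch under assumptions: the small DataFrame df and the Romanian column names used in the mapping are chosen here for illustration and may differ from the ones on the original slide.

```python
import pandas as pd

df = pd.DataFrame({"name": ["London", "Berlin"],
                   "population": [8615246, 3562166]})
mapping = {"name": "nume", "population": "populație"}  # illustrative names

renamed = df.rename(columns=mapping)    # default inplace=False: returns a new DataFrame
before = list(df.columns)               # df itself is still unchanged here

df.rename(columns=mapping, inplace=True)  # changes df itself
after = list(df.columns)
```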
  • 185. Existing Column as the Index of a DataFrame • We want to create a more useful index in the following example. We will use the country name as the index, i.e. the list value associated to the key "country" of our cities dictionary: city_frame = pd.DataFrame(cities, columns=["name", "population"], index=cities["country"]) print(city_frame) | @Apptrainers
  • 187. • Alternatively, we can change an existing DataFrame. • We can use the method set_index to turn a column into an index. • "set_index" does not work in-place, it returns a new data frame with the chosen column as the index: | @Apptrainers
  • 188. city_frame = pd.DataFrame(cities) city_frame2 = city_frame.set_index("country") print(city_frame2) | @Apptrainers
  • 190. • We saw in the previous example that the set_index method returns a new DataFrame object and doesn't change the original DataFrame. If we set the optional parameter "inplace" to True, the DataFrame will be changed in place, i.e. no new object will be created: city_frame = pd.DataFrame(cities) city_frame.set_index("country", inplace=True) print(city_frame) | @Apptrainers
  • 192. Label-Indexing on the Rows • So far we have indexed DataFrames via the columns. We will now demonstrate how we can access rows of DataFrames via the locators 'loc' and 'iloc'. ('ix' was deprecated in pandas 0.20 and removed in pandas 1.0.)
    city_frame = pd.DataFrame(cities, columns=("name", "population"), index=cities["country"])
    print(city_frame.loc["Germany"])
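    The difference between the two locators is that loc selects by label and iloc by integer position. The sketch below uses a three-row excerpt of the cities data; the frame name df is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Berlin", "Madrid", "Hamburg"],
     "population": [3562166, 3165235, 1760433]},
    index=["Germany", "Spain", "Germany"],
)

german = df.loc["Germany"]   # label-based: every row labelled "Germany"
first = df.iloc[0]           # position-based: the first row, returned as a Series
```

    Because the label "Germany" occurs twice, german is a DataFrame with two rows rather than a single Series.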
  • 196. Sum and Cumulative Sum • We can calculate the sum of all the columns of a DataFrame or the sum of certain columns: print(city_frame.sum()) | @Apptrainers
  • 198. We can use "cumsum" to calculate the cumulative sum: | @Apptrainers
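    The code of this slide is not preserved in the transcript. A minimal sketch, assuming that x is the cumulative sum of the population column, could look like this, with a three-row excerpt of the cities data:

```python
import pandas as pd

city_frame = pd.DataFrame(
    {"population": [8615246, 3562166, 3165235]},
    index=["London", "Berlin", "Madrid"],
)

# running total down the column; the result is a Series with the same index
x = city_frame["population"].cumsum()
print(x)
```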
  • 199. Assigning New Values to Columns • x is a Pandas Series. • We can reassign the previously calculated cumulative sums to the population column: city_frame["population"] = x print(city_frame) | @Apptrainers
  • 201. • Instead of replacing the values of the population column with the cumulative sum, we want to add the cumulative population sum as a new column with the name "cum_population". city_frame = pd.DataFrame(cities, columns=["country", "population", "cum_population"], index=cities["name"]) print(city_frame) | @Apptrainers
  • 203. • We can see that the column "cum_population" is set to NaN, as we haven't provided any data for it. • We will assign now the cumulative sums to this column: city_frame["cum_population"] =city_frame["population"].cumsum() print(city_frame) | @Apptrainers
  • 205. • We can also include a column name which is not contained in the dictionary, when we create the DataFrame from the dictionary. In this case, all the values of this column will be set to NaN: city_frame = pd.DataFrame(cities, columns=["country", "area", "population"], index=cities["name"]) print(city_frame) | @Apptrainers
  • 207. Accessing the Columns of a DataFrame • There are two ways to access a column of a DataFrame. The result is in both cases a Series: # in a dictionary-like way: print(city_frame["population"]) | @Apptrainers
  • 209. # as an attribute print(city_frame.population) | @Apptrainers
  • 212. city_frame.population From the previous example, we can see that we have not copied the population column. "p" is a view on the data of city_frame. | @Apptrainers
  • 213. Assigning New Values to a Column • The column area is still not defined. We can set all elements of the column to the same value: city_frame["area"] = 1572 print(city_frame) | @Apptrainers
  • 215. • In this case, it is definitely better to assign the exact area to each city. The list with the area values needs to have the same length as the number of rows in our DataFrame.
    # area in square km:
    area = [1572, 891.85, 605.77, 1285, 105.4, 414.6, 228, 755, 525.2, 517, 101.9, 310.4, 181.8]
    # area could have been defined as a list, a Series, an array or a scalar
    city_frame["area"] = area
    print(city_frame)
  • 217. Sorting DataFrames city_frame = city_frame.sort_values(by="area", ascending=False) print(city_frame) | @Apptrainers
  • 218. Let's assume, we have only the areas of London, Hamburg and Milan. The areas are in a series with the correct indices. We can assign this series as well: city_frame = pd.DataFrame(cities, columns=["country", "area", "population"], index=cities["name"]) some_areas = pd.Series([1572, 755, 181.8], index=['London', 'Hamburg', 'Milan']) city_frame['area'] = some_areas print(city_frame) | @Apptrainers
  • 220. Inserting new columns into existing DataFrames • In the previous example we added the column area at creation time. Quite often it will be necessary to add or insert columns into existing DataFrames. • For this purpose the DataFrame class provides a method "insert", which allows us to insert a column into a DataFrame at a specified location: insert(self, loc, column, value, allow_duplicates=False)
  • 222. city_frame = pd.DataFrame(cities, columns=["country", "population"], index=cities["name"]) idx = 1 city_frame.insert(loc=idx, column='area', value=area) print(city_frame) <class 'pandas.core.frame.DataFrame'> | @Apptrainers
  • 225. DataFrame from Nested Dictionaries A nested dictionary of dictionaries can be passed to a DataFrame as well. The keys of the outer dictionary are taken as the columns, and the inner keys, i.e. the keys of the nested dictionaries, are used as the row indices:
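    The code of this slide is not preserved in the transcript. The following sketch shows the idea with a hypothetical dictionary growth whose numbers are made-up placeholders, not real growth rates.

```python
import pandas as pd

# outer keys become columns, inner keys become the row index
growth = {
    "Switzerland": {2010: 3.0, 2011: 1.8},   # placeholder values
    "Germany": {2010: 4.1, 2011: 3.6},       # placeholder values
}
growth_frame = pd.DataFrame(growth)
print(growth_frame)
```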
  • 228. • Would you like to have the years in the columns and the countries in the rows? No problem, you can transpose the data: growth_frame.T
  • 230. • Consider: growth_frame = growth_frame.T growth_frame2 = growth_frame.reindex(["Switzerland", "Italy", "Germany", "Greece"]) # remove France print(growth_frame2) | @Apptrainers
  • 232. Filling a DataFrame with random values:
    import numpy as np
    names = ['Frank', 'Eve', 'Stella', 'Guido', 'Lara']
    index = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
    df = pd.DataFrame((np.random.randn(12, 5)*1000).round(2), columns=names, index=index)
    print(df)
    randn returns samples from the standard normal distribution (mean 0, variance 1). Its arguments are the dimensions of the returned array, here 12 x 5. Multiplying by 1000 scales the standard deviation to 1000.
  • 234. Summary • So far we have covered the following: • Python 3.0 (scalars, lists, dictionaries, loops, selection, functions) • NumPy • Pandas • The reason for studying these packages is to be able to program the 5 steps in any data science process.