Algorithm analysis
Introduction
Life’s too easy so let’s complicate it, shall we?
Are you ready to fight the invisible enemy of computational complexity? If not, please read the references first:
Erik Dassi’s algorithms slides 1 and 2
List performance
Python lists are generic containers, they are useful in a variety of scenarios but sometimes their perfomance can be disappointing, so it’s best to know and avoid potentially expensive operations. Table from the Python DS book Chapter 2.6: Lists:
Operation |
Big-O Efficiency |
Operation |
Big-O Efficiency |
|---|---|---|---|
index [] |
\(O(1)\) |
|
\(O(n)\) |
index assignment |
\(O(1)\) |
get slice |
\(O(k)\) |
append |
\(O(1)\) |
del slice |
\(O(n)\) |
pop() |
\(O(1)\) |
set slice |
\(O(n+k)\) |
pop(i) |
\(O(n)\) |
reverse |
\(O(n)\) |
extend(lb) |
\(O(m)\) |
la |
\(O(n+m)\) |
insert(i,item) |
\(O(n)\) |
sort |
\(O(n \log{n})\) |
del operator |
\(O(n)\) |
multiply |
\(O(nk)\) |
remove |
\(O(n)\) |
min |
\(O(n)\) |
iteration |
\(O(n)\) |
max |
\(O(n)\) |
len(lst) |
\(O(1)\) |
sum |
\(O(n)\) |
equality |
\(O(n)\) |
Fast or not?
Given a list x of \(n\) characters, for each of the following expressions, try giving the complexity:
x[n // 2]
x[n // 2] = 'd'
x.append('c')
x.insert(0, 'd')
x + x
x[3:5]
x.sort()
len(x)
x.pop(1)
x.pop(-1)
'z' in x
x.extend(1000)
y = [c for c in x]
list(range(n)) == list(range(n))
Sublist iteration performance
get slice time complexity is \(O(k)\), but what about memory? It’s the same!
So if you want to iterate a part of a list, beware of slicing! For example, depending pn Python version you have, slicing a list like this can occupy much more memory than necessary:
[1]:
# memory occupied
r = range(1000) # Python 2 creates a list: O(n)
# Python 3 creates a 'range' object: O(1)
# same for range slices: memory occupied
print([2*y for y in r[100:200]]) # r[100:200] Python 2 O(k)
# Python 3 O(1)
[200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398]
The reason is that, according to your Python interpreter, slicing like r[100:200]at loop start can create a new list. If we want to explicitly tell Python we just want to iterate through a sequence, we can use the so-called itertools. In particular, the islice method is handy, we can rewrite the list comprehension above with it like this:
[2]:
import itertools
r = range(1000)
print([2*y for y in itertools.islice(r, 100, 200)])
[200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398]
Some formulas
Lists - exercises
Who’s the fastest? Let’s start the racing game!
For each of the following code samples, try giving the worst-case running time in Big O notation.
Exercise - rollsroyce
x: intlst: list of \(n\) integers
[3]:
def rollsroyce(x, lst):
return x in lst
Exercise - honda
x: list of \(m\) elementslst: list of lists with \(n\) rows and \(m\) columns
[4]:
def honda(x, lst):
return x in lst
Exercise - lamborghini
lst: list of \(n\) elementsm: int
[5]:
def lamborghini(lst, m):
return lst * m
Exercise - maserati
lst: list of \(n\) elementsm: int
[6]:
def maserati(lst, m):
return [m for el in lst]
Exercise - toyota
lst: list of \(n\) elementsm: int
HINT: try rewriting code as a for, then add costs
[7]:
def toyota(lst, m):
return [lst[:m] for el in lst]
Exercise - mercedes
lst: list of \(n\) elements
[8]:
def mercedes(lst):
for i in range(len(lst)):
lst.append('z')
return lst
Exercise - acura
lst: list of \(n\) elementsm: int
[9]:
def acura(lst, m):
return lst[:m] + lst
Exercise - alfaromeo
lst: list of \(n\) elements
[10]:
def alfaromeo(lst):
return [(lst[0] + lst[-1]) for i in range(len(lst))]
Exercise - jeep
lst : list of \(n\) elements
[11]:
def jeep(lst):
return [lst[:i] for i in range(1, len(lst))]
Exercise - chevrolet
lst : list of \(n\) elements
[12]:
def chevrolet(lst):
for i in range(len(lst)):
lst.insert(0,'z')
return lst
Exercise - kia
lst : list of \(n\) characters
What is the worst case complexity?
What is the best case complexity?
[13]:
def kia(lst):
for i in range(len(lst) // 2):
lst.remove('c')
Exercise - aston_martin
lst: list of \(n\) elements
[14]:
def aston_martin(lst):
ret = []
if lst[0] > 5:
for i in range(len(lst)):
for j in range(len(lst)):
ret.append(lst[i])
else:
for i in range(len(lst)):
ret.append(lst[i] * 2)
return ret
Exercise - subaru
mat: list of \(n\) lists of \(n\) elements each
What’s the complexity?
Follow up: Can you write a more perfomant version?
[15]:
def subaru(mat):
n = len(mat)
ret = []
for i in range(n):
for j in range(n):
if i == n - j - 1:
ret.append(mat[i][j])
return ret
Exercise - dodge
mat: list of \(n\) lists of \(m\) columns
[16]:
def dodge(mat):
n = len(mat)
m = len(mat[0])
ret = []
for i in range(n):
for j in range(1,m):
if max(mat[i][:j]) == 3:
ret.append(mat[i][j])
return ret
Exercise - lotus
lst : list of \(n\) elements
[17]:
def lotus(lst):
while len(lst) > 0:
el = lst.pop(len(lst) // 2)
ret.append(el)
return ret
Exercise - jaguar
mat: list of \(n\) lists of \(m\) columns
[18]:
def jaguar(mat):
n = len(mat)
m = len(mat[0])
for i in range(1, n):
if mat[i-1][0] in mat[i]:
for j in range(n-1,0,-1):
mat[j] = [2*x for x in mat[j]]
return mat
Exercise - hyundai
lst: list of \(n\) elementsm: integer, \(0 <= m <= n\)
[19]:
def hyundai(lst, m):
lb = ['a' in range(m)]
lb.extend(lst)
for i in range(m):
lb.remove('a')
Exercise - buick
lb : list of \(n\) positive integers
HINT: Try giving a tight complexity bound for the function, considering we assume lb is only made of positive integers
[20]:
def buick(lb):
la = []
n = len(lb)
while len(lb) > 0:
el = lb.pop()
la.insert(0,el)
if sum(la) == 10:
for i in range(n):
for j in range(n):
la[0] += 1
return la
Exercise - saab
lst : list of \(n\) integers
To solve this properly you will need to resort to some classical analysis:
for determining worst case, try proving the sum is less than another sum
to see if the bound is tight, try determining the best case complexity \(\Omega\) by proving that the sum made by half of the terms (which half?) is greater than another sum: you should get the same result as worst case thus demonstrating you’ve got the \(\Theta\) complexity.
Can we write a much faster function?
[21]:
def saab(lst):
for i in range(1, len(lst)):
lst.sort(lst[0:i])
Sets performance
Let’s review set average operations costs from Python official wiki:
Operation |
Big-O Efficiency |
Operation |
Big-O Efficiency |
|---|---|---|---|
x in s |
\(O(1)\) |
union sa | sb |
\(O(n + m)\) |
intersection sa & sb |
\(O(min(n,m))\) |
difference |
\(O(n)\) |
sa.difference_update(sb) |
\(O(m)\) |
set(sequence) |
\(O(n)\) |
len(s) |
\(O(1)\) |
min(s) |
\(O(n)\) |
sum(s) |
\(O(n)\) |
max(s) |
\(O(n)\) |
Sets are pretty fast but notice in degenerate cases we might still get \(O(n)\) complexities instead of \(O(1)\) (like when there too many keys collisions or with ill-devised hashing functions). For the purposes of this course but won’t consider those occurrences.
WARNING: LOOPING ON SETS IS SLOW!
A loop typically takes \(O(n)\), but sets were designed to offer \(O(1)\) read / write access by using the square brackets operator [] or containment operator in.
If you’re thinking about using a loop, always ask yourself if you could rewrite your algorithm in some other way
HAVE YOU READ THE WARNING ABOVE??
People who don’t understand it are usually condemned to a dreaded yet-another-retake-of-sciprog exam. You’re warned.
Sets - exercises
Find the worst case compexity in Big-O notation for each of the following exercises.
Exercise - land_rover
sa: set of \(n\) integerssb: set of \(m\) integers
[22]:
def land_rover(sa, sb):
return [el in sa for el in sb]
Exercise - volkswagen
la: list of \(n\) integerslb: list of \(m\) integers
[23]:
def volkswagen(la, lb):
sa = set(la)
sc = set()
for el in lb:
if el % 2 != 0:
sc.add(el)
return (sa | sc) - (sa & sc)
Exercise - pontiac
mat : list of \(n\) lists of \(m\) integers
[24]:
def pontiac(mat):
ret = []
sets = [set(row) for row in mat]
for i in range(len(mat)):
for s in sets:
if mat[i][0]*2 in s:
ret.append(mat[i][0])
return ret
Exercise - volvo
la: list of \(n\) integerslb: list of \(m\) integers
[25]:
def volvo(la, lb):
ret = set(lb)
for i in range(len(la)):
sli = la[i+1:]
if la[i] in sli:
ret.add(la[i])
for el in la:
if el in ret:
ret.add(el*2)
return ret
Exercise - chrysler
la: list of \(n\) integers
HINT 1: remember the sum of first \(n\) integers is \(\displaystyle\sum_{i=1}^n{i}=\frac{n^2+n}{2}\)
HINT 2: remember the sum of first \(n\) squared integers is \(\displaystyle\sum_{i=1}^n{i^2}=\frac{n(n+1)(2n+1)}{6}\)
[26]:
def chrysler(la):
s = {0}
lb = []
for i in range(len(la)):
lb.extend([la[i]] * i)
if min(s) < 20:
s = {x for x in la}
s = s | set(lb)
return s
Dictionaries performance
Let’s review dictionary operations costs from the Python DS book Chapter 3.7: Dictionaries:
Operation |
Example |
Big-O Efficiency |
|---|---|---|
key containment |
|
\(O(1)\) |
read a value given a key |
|
\(O(1)\) |
write an association |
|
\(O(1)\) |
delete an association |
|
\(O(1)\) |
iterating keys / 1 |
|
\(O(n)\) |
iterating keys / 2 |
|
\(O(n)\) |
iterating values |
|
\(O(n)\) |
iterating associations |
|
\(O(n)\) |
shallow copy |
|
\(O(n)\) |
Dictionaries are pretty fast but notice in degenerate cases we might still get \(O(n)\) complexity instead of \(O(1)\), like when there too many keys collisions or with ill-devised hashing functions. For the purposes of this course but won’t consider those occurrences.
WARNING: LOOPING ON DICTIONARIES IS SLOW!
A loop typically takes \(O(n)\), but dictionaries were designed to offer \(O(1)\) read / write access by using the square brackets operator [] or containment operator in.
If you’re thinking about using a loop, always ask yourself if you could rewrite your algorithm in some other way
HAVE YOU READ THE WARNING ABOVE??
People who don’t understand it are usually condemned to a dreaded yet-another-retake-of-sciprog exam. You’re warned.
Dictionaries - exercises
Find the worst case compexity in Big-O notation for each of the following exercises.
Exercise - tesla
x: intd: dictionary with \(n\) key/value mappings
[27]:
def tesla(x, d):
return x in d
Exercise - bmw
d: dictionary of \(n\) key/value mappingsel: int
[28]:
def bmw(d, el):
if el in d:
return d[el]
return None
Exercise - nissan
d: dictionary of \(n\) key/value mappingsel: int
Does this code behaves differently from code before? Compare complexity.
[29]:
def nissan(d, el):
for k in d:
if el == k:
return d[el]
return None
Exercise - ferrari
x: intd: dictionary with \(n\) key/value mappings
Find the complexity and then ask yourself if we can do any better.
[30]:
def ferrari(x, d):
return x in d.values()
Exercise - bentley
x: intd: dictionary with \(n\) key/value mappings
[31]:
def bentley(x, d):
return x,x in d.items()
Exercise - mclaren
d: dictionary of \(n\) key/value mappingsk: int
[32]:
def mclaren(d,k):
ret = []
for i in range(k):
if i in d:
ret.append(d[i])
return ret
Exercise - fiat
d: dictionary of \(n\) key/value mappingsk: int
Compare this function to previous one. Is there any difference?
[33]:
def fiat(d,k):
lst = list(d.items())
ret = []
for i,v in lst:
if i >= 0 and i < k:
ret.append(d[i])
return ret
Exercise - mustang
d: dictionary of \(n\) key/value mappingslst: list of \(m\) integers
[34]:
def mustang(d, lst):
ret = {}
for el in lst:
if el in d:
if el in list(d.values()):
ret[el] = True
else:
ret[el] = False
return ret
Recursion
References:
When confronted with a problem, we can try to solve it by splitting its dimension in half (or more), look for solutions in each of the halves and then decide what to do with the found solutions, if any.
Several cases may occur:
No solution is found
One solution is found
Two solutions are found
case 1): we can only give up
case 2): we have only one solution, so we can just return that one
case 3): we have two solutions, so we need to decide what is the purpose of the algorithm
Is the purpose to …
find all possible solutions? Then we return both of them.
find the best solution, according to some measure of ‘goodness’? Then we measure each of the solutions and give back the highest scoring one.
always provide a combination of existing solutions, according to some combination method? Then we combine the found solutions and give them back
The Master Theorem
To get a precise calculation of complexity of recursive algorithms, let’s recall the Master Theorem:
Let \(a\) and \(b\) two integer constants such that \(\normalsize a \geq 1\) and \(\normalsize b \geq 2\), and let \(\normalsize c, \beta\) be two real constants such that \(\normalsize c > 0\) and \(\normalsize \beta \geq 0\). Let \(\normalsize T(n)\) be defined by the following recurrence:
Given \(\large {\alpha = \frac{\log{a}}{\log{b}} = \log_b{a}}\), then:
The theorem covers cases when:
input of size \(n\) is split in \(b\) subproblems
the algorithm is applied recursively \(n\) times
\(\normalsize cn^\beta\) is the cost of the algorithm after the recursive steps
Recursion - exercises
Exercise - yamaha
What does this function do?
What’s the time complexity? Give both an informal argument and a formal one using the Master Theorem.
lst: a list of \(n\) numbersi: an index from0to \(n\) included
level is the recursion level, we can use it to help ourselves while debugging
[35]:
def yamaha(lst, i, level=0):
print(' ' * level, lst)
print(' ' * level, i)
if i == 0:
return 0
else:
q = lst[i-1] + yamaha(lst, i-1, level + 1)
print(' ' * level, q)
return q
print(yamaha([5,7,2,9], 4))
[5, 7, 2, 9]
4
[5, 7, 2, 9]
3
[5, 7, 2, 9]
2
[5, 7, 2, 9]
1
[5, 7, 2, 9]
0
5
12
14
23
23
Exercise - cadillac
Look at the following code:
To get a better understanding, try adding an integer optional parameter
level=0with the recursion level, and print the function passages indented according to the recursionlevelWhat is this function doing?
Is recursion eventually terminating after some time?
What’s the time complexity? Try giving both an informal argument and a precise one using the Master Theorem
lst: list of \(n\) integers
[37]:
def cadillac(lst):
if len(lst) == 0:
return 0
if len(lst) == 1:
return lst[0]
k = len(lst) // 2
p = cadillac(lst[:k])
q = cadillac(lst[k:])
return p + q
a) Solution for code with added debug prints and level=0 param:
[38]:
def cadillac(lst, level=0):
raise Exception('TODO IMPLEMENT ME !')
res = cadillac([4,2,6,2,9,4,8,1,5,7,1])
print(res)
assert res == 49
[4, 2, 6, 2, 9, 4, 8, 1, 5, 7, 1]
[4, 2, 6, 2, 9]
[4, 2]
[4]
p= 4
[2]
q= 2
p= 6
[6, 2, 9]
[6]
p= 6
[2, 9]
[2]
p= 2
[9]
q= 9
q= 11
q= 17
p= 23
[4, 8, 1, 5, 7, 1]
[4, 8, 1]
[4]
p= 4
[8, 1]
[8]
p= 8
[1]
q= 1
q= 9
p= 13
[5, 7, 1]
[5]
p= 5
[7, 1]
[7]
p= 7
[1]
q= 1
q= 8
q= 13
q= 26
49
b,c,d) Answers to remaining questions:
Show answerExercise - ducati
The following function is similar to the previous one, but doesn’t create slices
lst: list of \(n\) integersi: left integer index includedj: right integer index excluded
a). Does it behave the same?
b). What’s the time complexity? Try giving both an informal argument and a precise one through the Master Theorem
[39]:
def ducati_rec(lst, i,j):
interval = j - i
if interval == 0:
return 0
if interval == 1:
return lst[i]
k = i + (interval // 2)
p = ducati_rec(lst,i,k)
q = ducati_rec(lst,k,j)
return p + q
def ducati(lst):
return ducati_rec(lst,0,len(lst))
Analysis - more exercises
Theory exercises (complexity, tree visits, graph visits) - by Alberto Montresor
[ ]: