# Algorithm analysis

## Introduction

Life’s too easy so let’s complicate it, shall we?

Are you ready to fight the invisible enemy of computational complexity? If not, please read the references first:

## List performance

Python lists are generic containers, they are useful in a variety of scenarios but sometimes their perfomance can be disappointing, so it’s best to know and avoid potentially expensive operations. Table from the Python DS book Chapter 2.6: Lists:

Operation

Big-O Efficiency

Operation

Big-O Efficiency

index []

$$O(1)$$

in (contains)

$$O(n)$$

index assignment

$$O(1)$$

get slice [x:y]

$$O(k)$$

append

$$O(1)$$

del slice

$$O(n)$$

pop()

$$O(1)$$

set slice

$$O(n+k)$$

pop(i)

$$O(n)$$

reverse

$$O(n)$$

extend(lb)

$$O(m)$$

la + lb

$$O(n+m)$$

insert(i,item)

$$O(n)$$

sort

$$O(n \log{n})$$

del operator

$$O(n)$$

multiply

$$O(nk)$$

remove

$$O(n)$$

min

$$O(n)$$

iteration

$$O(n)$$

max

$$O(n)$$

len(lst)

$$O(1)$$

sum

$$O(n)$$

equality

$$O(n)$$

### Fast or not?

Given a list x of $$n$$ characters, for each of the following expressions, try giving the complexity:

1. x[n // 2]

2. x[n // 2] = 'd'

3. x.append('c')

4. x.insert(0, 'd')

5. x + x

6. x[3:5]

7. x.sort()

8. len(x)

9. x.pop(1)

10. x.pop(-1)

11. 'z' in x

12. x.extend(1000)

13. y = [c for c in x]

14. list(range(n)) == list(range(n))


### Sublist iteration performance

get slice time complexity is $$O(k)$$, but what about memory? It’s the same!

So if you want to iterate a part of a list, beware of slicing! For example, depending pn Python version you have, slicing a list like this can occupy much more memory than necessary:

:

                 #                               memory occupied
r = range(1000)  # Python 2 creates a list:           O(n)
# Python 3 creates a 'range' object: O(1)

#                                   same for range slices: memory occupied
print([2*y for y in r[100:200]])  #  r[100:200] Python 2        O(k)
#             Python 3        O(1)

[200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398]


The reason is that, according to your Python interpreter, slicing like r[100:200]at loop start can create a new list. If we want to explicitly tell Python we just want to iterate through a sequence, we can use the so-called itertools. In particular, the islice method is handy, we can rewrite the list comprehension above with it like this:

:

import itertools

r = range(1000)

print([2*y for y in itertools.islice(r, 100, 200)])

[200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398]


## Some formulas

$\begin{split}\begin{gather*} & & & & && \text{Complexity}\\ 1 + 2 + 3 + \dots + n & = & \sum_{i=1}^n{i} & = & \frac{n^2+n}{2} && O(n^2) \\ 1^2 + 2^2 + 3^2 + \dots + n^2 & = & \sum_{i=1}^n{i^2} & = & \frac{n(n+1)(2n+1)}{6} && O(n^3) \\ \end{gather*}\end{split}$

## Lists - exercises

Who’s the fastest? Let’s start the racing game!

For each of the following code samples, try giving the worst-case running time in Big O notation.

### Exercise - rollsroyce

• x : int

• lst : list of $$n$$ integers

:

def rollsroyce(x, lst):
return x in lst


### Exercise - honda

• x : list of $$m$$ elements

• lst: list of lists with $$n$$ rows and $$m$$ columns

:

def honda(x, lst):
return x in lst


### Exercise - lamborghini

• lst : list of $$n$$ elements

• m : int

:

def lamborghini(lst, m):
return lst * m


### Exercise - maserati

• lst : list of $$n$$ elements

• m : int

:

def maserati(lst, m):
return [m for el in lst]


### Exercise - toyota

• lst : list of $$n$$ elements

• m : int

HINT: try rewriting code as a for, then add costs

:

def toyota(lst, m):
return [lst[:m] for el in lst]


### Exercise - mercedes

• lst : list of $$n$$ elements

:

def mercedes(lst):
for i in range(len(lst)):
lst.append('z')
return lst


### Exercise - acura

• lst : list of $$n$$ elements

• m : int

:

def acura(lst, m):
return lst[:m] + lst


### Exercise - alfaromeo

• lst : list of $$n$$ elements

:

def alfaromeo(lst):
return [(lst + lst[-1]) for i in range(len(lst))]


### Exercise - jeep

lst : list of $$n$$ elements

:

def jeep(lst):
return [lst[:i] for i in range(1, len(lst))]


### Exercise - chevrolet

lst : list of $$n$$ elements

:

def chevrolet(lst):
for i in range(len(lst)):
lst.insert(0,'z')
return lst


### Exercise - kia

lst : list of $$n$$ characters

1. What is the worst case complexity?

2. What is the best case complexity?

:

def kia(lst):
for i in range(len(lst) // 2):
lst.remove('c')


### Exercise - aston_martin

lst: list of $$n$$ elements

:

def aston_martin(lst):
ret = []
if lst > 5:

for i in range(len(lst)):
for j in range(len(lst)):
ret.append(lst[i])
else:
for i in range(len(lst)):
ret.append(lst[i] * 2)
return ret


### Exercise - subaru

• mat : list of $$n$$ lists of $$n$$ elements each

1. What’s the complexity?

2. Follow up: Can you write a more perfomant version?

:

def subaru(mat):
n = len(mat)
ret = []
for i in range(n):
for j in range(n):
if i == n - j - 1:
ret.append(mat[i][j])
return ret


### Exercise - dodge

• mat : list of $$n$$ lists of $$m$$ columns

:

def dodge(mat):
n = len(mat)
m = len(mat)
ret = []
for i in range(n):
for j in range(1,m):
if max(mat[i][:j]) == 3:
ret.append(mat[i][j])
return ret


### Exercise - lotus

lst : list of $$n$$ elements

:

def lotus(lst):
while len(lst) > 0:
el = lst.pop(len(lst) // 2)
ret.append(el)
return ret


### Exercise - jaguar

• mat : list of $$n$$ lists of $$m$$ columns

:

def jaguar(mat):
n = len(mat)
m = len(mat)
for i in range(1, n):
if mat[i-1] in mat[i]:
for j in range(n-1,0,-1):
mat[j] = [2*x for x in mat[j]]

return mat


### Exercise - hyundai

• lst : list of $$n$$ elements

• m: integer, $$0 <= m <= n$$

:

def hyundai(lst, m):

lb = ['a' in range(m)]

lb.extend(lst)

for i in range(m):
lb.remove('a')


### Exercise - buick

lb : list of $$n$$ positive integers

HINT: Try giving a tight complexity bound for the function, considering we assume lb is only made of positive integers

:

def buick(lb):
la = []
n = len(lb)
while len(lb) > 0:
el = lb.pop()
la.insert(0,el)
if sum(la) == 10:
for i in range(n):
for j in range(n):
la += 1

return la


### Exercise - saab

lst : list of $$n$$ integers

1. To solve this properly you will need to resort to some classical analysis:

1. for determining worst case, try proving the sum is less than another sum

2. to see if the bound is tight, try determining the best case complexity $$\Omega$$ by proving that the sum made by half of the terms (which half?) is greater than another sum: you should get the same result as worst case thus demonstrating you’ve got the $$\Theta$$ complexity.

1. Can we write a much faster function?

:

def saab(lst):

for i in range(1, len(lst)):
lst.sort(lst[0:i])


## Sets performance

Let’s review set average operations costs from Python official wiki:

Operation

Big-O Efficiency

Operation

Big-O Efficiency

x in s

$$O(1)$$

union sa | sb

$$O(n + m)$$

intersection sa & sb

$$O(min(n,m))$$

difference sa-sb

$$O(n)$$

sa.difference_update(sb)

$$O(m)$$

set(sequence)

$$O(n)$$

len(s)

$$O(1)$$

min(s)

$$O(n)$$

sum(s)

$$O(n)$$

max(s)

$$O(n)$$

Sets are pretty fast but notice in degenerate cases we might still get $$O(n)$$ complexities instead of $$O(1)$$ (like when there too many keys collisions or with ill-devised hashing functions). For the purposes of this course but won’t consider those occurrences.

WARNING: LOOPING ON SETS IS SLOW!

A loop typically takes $$O(n)$$, but sets were designed to offer $$O(1)$$ read / write access by using the square brackets operator [] or containment operator in.

If you’re thinking about using a loop, always ask yourself if you could rewrite your algorithm in some other way

HAVE YOU READ THE WARNING ABOVE??

People who don’t understand it are usually condemned to a dreaded yet-another-retake-of-sciprog exam. You’re warned.

## Sets - exercises

Find the worst case compexity in Big-O notation for each of the following exercises.

### Exercise - land_rover

• sa : set of $$n$$ integers

• sb : set of $$m$$ integers

:

def land_rover(sa, sb):
return [el in sa for el in sb]


### Exercise - volkswagen

• la : list of $$n$$ integers

• lb : list of $$m$$ integers

:

def volkswagen(la, lb):
sa = set(la)
sc = set()
for el in lb:
if el % 2 != 0:
return (sa | sc) - (sa & sc)


### Exercise - pontiac

mat : list of $$n$$ lists of $$m$$ integers

:

def pontiac(mat):
ret = []
sets = [set(row) for row in mat]
for i in range(len(mat)):
for s in sets:
if mat[i]*2 in s:
ret.append(mat[i])
return ret


### Exercise - volvo

• la : list of $$n$$ integers

• lb : list of $$m$$ integers

:

def volvo(la, lb):
ret = set(lb)
for i in range(len(la)):
sli = la[i+1:]
if la[i] in sli:
for el in la:
if el in ret:
return ret


### Exercise - chrysler

la: list of $$n$$ integers

HINT 1: remember the sum of first $$n$$ integers is $$\displaystyle\sum_{i=1}^n{i}=\frac{n^2+n}{2}$$

HINT 2: remember the sum of first $$n$$ squared integers is $$\displaystyle\sum_{i=1}^n{i^2}=\frac{n(n+1)(2n+1)}{6}$$

:

def chrysler(la):

s = {0}
lb = []
for i in range(len(la)):
lb.extend([la[i]] * i)
if min(s) < 20:
s = {x for x in la}
s = s | set(lb)
return s


## Dictionaries performance

Let’s review dictionary operations costs from the Python DS book Chapter 3.7: Dictionaries:

Operation

Example

Big-O Efficiency

key containment

key in d

$$O(1)$$

read a value given a key

d[key]

$$O(1)$$

write an association

d[key] = value

$$O(1)$$

delete an association

del d[key]

$$O(1)$$

iterating keys / 1

for key in d:

$$O(n)$$

iterating keys / 2

for key in d.keys():

$$O(n)$$

iterating values

for value in d.values():

$$O(n)$$

iterating associations

for key,value in d.items():

$$O(n)$$

shallow copy

d.copy()

$$O(n)$$

Dictionaries are pretty fast but notice in degenerate cases we might still get $$O(n)$$ complexity instead of $$O(1)$$, like when there too many keys collisions or with ill-devised hashing functions. For the purposes of this course but won’t consider those occurrences.

WARNING: LOOPING ON DICTIONARIES IS SLOW!

A loop typically takes $$O(n)$$, but dictionaries were designed to offer $$O(1)$$ read / write access by using the square brackets operator [] or containment operator in.

If you’re thinking about using a loop, always ask yourself if you could rewrite your algorithm in some other way

HAVE YOU READ THE WARNING ABOVE??

People who don’t understand it are usually condemned to a dreaded yet-another-retake-of-sciprog exam. You’re warned.

## Dictionaries - exercises

Find the worst case compexity in Big-O notation for each of the following exercises.

### Exercise - tesla

• x : int

• d: dictionary with $$n$$ key/value mappings

:

def tesla(x, d):
return x in d


### Exercise - bmw

• d : dictionary of $$n$$ key/value mappings

• el : int

:

def bmw(d, el):

if el in d:
return d[el]

return None


### Exercise - nissan

• d : dictionary of $$n$$ key/value mappings

• el : int

Does this code behaves differently from code before? Compare complexity.

:

def nissan(d, el):

for k in d:
if el == k:
return d[el]

return None


### Exercise - ferrari

• x : int

• d: dictionary with $$n$$ key/value mappings

Find the complexity and then ask yourself if we can do any better.

:

def ferrari(x, d):
return x in d.values()


### Exercise - bentley

• x : int

• d : dictionary with $$n$$ key/value mappings

:

def bentley(x, d):

return x,x in d.items()


### Exercise - mclaren

• d : dictionary of $$n$$ key/value mappings

• k : int

:

def mclaren(d,k):
ret = []
for i in range(k):
if i in d:
ret.append(d[i])
return ret


### Exercise - fiat

• d : dictionary of $$n$$ key/value mappings

• k : int

Compare this function to previous one. Is there any difference?

:

def fiat(d,k):
lst = list(d.items())
ret = []
for i,v in lst:
if i >= 0 and i < k:
ret.append(d[i])
return ret


### Exercise - mustang

• d : dictionary of $$n$$ key/value mappings

• lst : list of $$m$$ integers

:

def mustang(d, lst):

ret = {}
for el in lst:

if el in d:
if el in list(d.values()):
ret[el] = True
else:
ret[el] = False
return ret


## Recursion

References:

When confronted with a problem, we can try to solve it by splitting its dimension in half (or more), look for solutions in each of the halves and then decide what to do with the found solutions, if any.

Several cases may occur:

1. No solution is found

2. One solution is found

3. Two solutions are found

case 1): we can only give up

case 2): we have only one solution, so we can just return that one

case 3): we have two solutions, so we need to decide what is the purpose of the algorithm

Is the purpose to …

• find all possible solutions? Then we return both of them.

• find the best solution, according to some measure of ‘goodness’? Then we measure each of the solutions and give back the highest scoring one.

• always provide a combination of existing solutions, according to some combination method? Then we combine the found solutions and give them back

## Recursion - exercises

### Exercise - yamaha

What does this function do? What’s the time complexity?

• lst: a list of $$n$$ numbers

• i: an index from 0 to $$n$$ included

level is the recursion level, we can use it to help ourselves while debugging

:

def yamaha(lst, i, level=0):
print('  ' * level, lst)
print('  ' * level, i)

if i == 0:
return 0
else:
q = lst[i-1] + yamaha(lst, i-1, level + 1)
print('  ' * level, q)
return q

print(yamaha([5,7,2,9], 4))

 [5, 7, 2, 9]
4
[5, 7, 2, 9]
3
[5, 7, 2, 9]
2
[5, 7, 2, 9]
1
[5, 7, 2, 9]
0
5
12
14
23
23


Look at the following code:

1. Is recursion eventually terminating after some time?

2. What’s the time complexity?

3. What is this function doing?

To get a better understanding, try adding an integer optional parameter level=0 with the recursion level, and print the function passages indented according to the recursion level

• lst : list of $$n$$ integers

:

def cadillac(lst):

if len(lst) == 0:
return 0

if len(lst) == 1:
return lst

k = len(lst) // 2
return p + q


Code with added debug prints and level=0 param:

Show solution
:

def cadillac(lst, level=0):
raise Exception('TODO IMPLEMENT ME !')

print(res)
assert res == 49

 [4, 2, 6, 2, 9, 4, 8, 1, 5, 7, 1]
[4, 2, 6, 2, 9]
[4, 2]

p= 4

q= 2
p= 6
[6, 2, 9]

p= 6
[2, 9]

p= 2

q= 9
q= 11
q= 17
p= 23
[4, 8, 1, 5, 7, 1]
[4, 8, 1]

p= 4
[8, 1]

p= 8

q= 1
q= 9
p= 13
[5, 7, 1]

p= 5
[7, 1]

p= 7

q= 1
q= 8
q= 13
q= 26
49


### Exercise - ducati

The following function is similar to the previous one, but doesn’t create slices

• lst : list of $$n$$ integers

• i : left integer index included

• j : right integer index excluded

1. Does it behave the same?

2. What’s the time complexity?

:

def ducati_rec(lst, i,j):
interval =  j - i

if interval == 0:
return 0

if interval == 1:
return lst[i]

k = i + (interval // 2)
p = ducati_rec(lst,i,k)
q = ducati_rec(lst,k,j)
return p + q

def ducati(lst):
return ducati_rec(lst,0,len(lst))

[ ]: