Algorithm analysis and recursion¶

Browse files online

Introduction¶

Life’s too easy so let’s complicate it, shall we?

Are you ready to fight the invisible enemy of computational complexity? If not, please read the references first:

List performance¶

Python lists are generic containers, they are useful in a variety of scenarios but sometimes their perfomance can be disappointing, so it’s best to know and avoid potentially expensive operations. Table from the Python DS book Chapter 2.6: Lists:

Operation

Big-O Efficiency

Operation

Big-O Efficiency

index []

$$O(1)$$

in (contains)

$$O(n)$$

index assignment

$$O(1)$$

get slice [x:y]

$$O(k)$$

append

$$O(1)$$

del slice

$$O(n)$$

pop()

$$O(1)$$

set slice

$$O(n+k)$$

pop(i)

$$O(n)$$

reverse

$$O(n)$$

extend(lb)

$$O(m)$$

la + lb

$$O(n+m)$$

insert(i,item)

$$O(n)$$

sort

$$O(n \log{n})$$

del operator

$$O(n)$$

multiply

$$O(nk)$$

remove

$$O(n)$$

min

$$O(n)$$

iteration

$$O(n)$$

max

$$O(n)$$

len(lst)

$$O(1)$$

sum

$$O(n)$$

equality

$$O(n)$$

Fast or not?¶

Given a list x of $$n$$ characters, for each of the following expressions, try giving the complexity:

1. x[n // 2]

2. x[n // 2] = 'd'

3. x.append('c')

4. x.insert(0, 'd')

5. x + x

6. x[3:5]

7. x.sort()

8. len(x)

9. x.pop(1)

10. x.pop(-1)

11. 'z' in x

12. x.extend(1000)

13. y = [c for c in x]

14. list(range(n)) == list(range(n))


Sublist iteration performance¶

get slice time complexity is $$O(k)$$, but what about memory? It’s the same!

So if you want to iterate a part of a list, beware of slicing! For example, depending pn Python version you have, slicing a list like this can occupy much more memory than necessary:

[1]:

                 #                               memory occupied
r = range(1000)  # Python 2 creates a list:           O(n)
# Python 3 creates a 'range' object: O(1)

#                                   same for range slices: memory occupied
print([2*y for y in r[100:200]])  #  x[100:200] Python 2        O(k)
#             Python 3        O(1)

[200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398]


The reason is that, according to your Python interpreter, slicing like r[100:200]at loop start can create a new list. If we want to explicitly tell Python we just want to iterate through a sequence, we can use the so-called itertools. In particular, the islice method is handy, we can rewrite the list comprehension above with it like this:

[2]:

import itertools

r = range(1000)

print([2*y for y in itertools.islice(r, 100, 200)])

[200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398]


Some formulas¶

$\begin{split}\begin{gather*} & & & & && \text{Complexity}\\ 1 + 2 + 3 + \dots + n & = & \sum_{i=1}^n{i} & = & \frac{n^2+n}{2} && O(n^2) \\ 1^2 + 2^2 + 3^2 + \dots + n^2 & = & \sum_{i=1}^n{i^2} & = & \frac{n(n+1)(2n+1)}{6} && O(n^3) \\ \end{gather*}\end{split}$

Lists - exercises¶

Who’s the fastest? Let’s start the racing game!

For each of the following code samples, try giving the worst-case running time in Big O notation.

Exercise - rollsroyce¶

• x : int

• lst : list of $$n$$ integers

[3]:

def rollsroyce(x, lst):
return x in lst


Exercise - honda¶

• x : list of $$m$$ elements

• lst: list of lists with $$n$$ rows and $$m$$ columns

[4]:

def honda(x, lst):
return x in lst


Exercise - lamborghini¶

• lst : list of $$n$$ elements

• m : int

[5]:

def lamborghini(lst, m):
return lst * m


Exercise - maserati¶

• lst : list of $$n$$ elements

• m : int

[6]:

def maserati(lst, m):
return [m for el in lst]


Exercise - toyota¶

• lst : list of $$n$$ elements

• m : int

HINT: try rewriting code as a for, then add costs

[7]:

def toyota(lst, m):
return [lst[:m] for el in lst]


Exercise - mercedes¶

• lst : list of $$n$$ elements

[8]:

def mercedes(lst):
for i in range(len(lst)):
lst.append('z')
return lst


Exercise - acura¶

• lst : list of $$n$$ elements

• m : int

[9]:

def acura(lst, m):
return lst[:m] + lst


Exercise - alfaromeo¶

• lst : list of $$n$$ elements

[10]:

def alfaromeo(lst):
return [(lst[0] + lst[-1]) for i in range(len(lst))]


Exercise - jeep¶

lst : list of $$n$$ elements

[11]:

def jeep(lst):
return [lst[:i] for i in range(len(1,lst))]


Exercise - chevrolet¶

lst : list of $$n$$ elements

[12]:

def chevrolet(lst):
for i in range(len(lst)):
lst.insert(0,'z')
return lst


Exercise - kia¶

lst : list of $$n$$ characters

1. What is the worst case complexity?

2. What is the best case complexity?

[13]:

def kia(lst):
for i in range(len(lst) // 2):
lst.remove('c')


Exercise - aston_martin¶

lst: list of $$n$$ elements

[14]:

def aston_martin(lst):
ret = []
if lst[0] > 5:

for i in range(len(lst)):
for j in range(len(lst)):
ret.append(lst[i])
else:
for i in range(len(lst)):
ret.append(lst[i] * 2)
return ret


Exercise - subaru¶

• mat : list of $$n$$ lists of $$n$$ elements each

1. What’s the complexity?

2. Follow up: Can you write a more perfomant version?

[15]:

def subaru(mat):
n = len(mat)
ret = []
for i in range(n):
for j in range(n):
if i == n - j - 1:
ret.append(mat[i][j])
return ret


Exercise - dodge¶

• mat : list of $$n$$ lists of $$m$$ columns

[16]:

def dodge(mat):
n = len(mat)
m = len(mat[0])
ret = []
for i in range(n):
for j in range(1,m):
if max(mat[i][:j]) == 3:
ret.append(mat[i][j])
return ret


Exercise - lotus¶

lst : list of $$n$$ elements

[17]:

def lotus(lst):
while len(lst) > 0:
el = lst.pop(len(lst) // 2)
ret.append(el)
return ret


Exercise - jaguar¶

• mat : list of $$n$$ lists of $$m$$ columns

[18]:

def jaguar(mat):
n = len(mat)
m = len(mat[0])
for i in range(1, n):
if mat[i-1][0] in mat[i]:
for j in range(n-1,0,-1):
mat[j] = [2*x for x in mat[j]]

return mat


Exercise - hyundai¶

• lst : list of $$n$$ elements

• m: integer, $$0 <= m <= n$$

[19]:

def hyundai(lst, m):

lb = ['a' in range(m)]

lb.extend(lst)

for i in range(m):
lb.remove('a')


Exercise - buick¶

lb : list of $$n$$ positive integers

HINT: Try giving a tight complexity bound for the function, considering we assume lb is only made of positive integers

[1]:

def buick(lb):
la = []
n = len(lb)
while len(lb) > 0:
el = lb.pop()
la.insert(0,el)
if sum(la) == 10:
for i in range(n):
for j in range(n):
la[0] += 1

return la


Exercise - saab¶

lst : list of $$n$$ integers

1. To solve this properly you will need to resort to some classical analysis:

1. for determining worst case, try proving the sum is less than another sum

2. to see if the bound is tight, try determining the best case complexity $$\Omega$$ by proving that the sum made by half of the terms (which half?) is greater than another sum: you should get the same result as worst case thus demonstrating you’ve got the $$\Theta$$ complexity.

1. Can we write a much faster function?

[21]:

def saab(lst):

for i in range(1, len(lst)):
lst.sort(lst[0:i])


Sets performance¶

Let’s review set average operations costs from Python official wiki:

Operation

Big-O Efficiency

Operation

Big-O Efficiency

x in s

$$O(1)$$

union sa | sb

$$O(n + m)$$

intersection sa & sb

$$O(min(n,m))$$

difference sa-sb

$$O(n)$$

sa.difference_update(sb)

$$O(m)$$

set(sequence)

$$O(n)$$

len(s)

$$O(1)$$

min(s)

$$O(n)$$

sum(s)

$$O(n)$$

max(s)

$$O(n)$$

Sets are pretty fast but notice in degenerate cases we might still get $$O(n)$$ complexities instead of $$O(1)$$ (like when there too many keys collisions or with ill-devised hashing functions). For the purposes of this course but won’t consider those occurrences.

Sets - exercises¶

Find the worst case compexity in Big-O notation for each of the following exercises.

Exercise - land_rover¶

• sa : set of $$n$$ integers

• sb : set of $$m$$ integers

[22]:

def land_rover(sa, sb):
return [el in sa for el in sb]


Exercise - volkswagen¶

• la : list of $$n$$ integers

• lb : list of $$m$$ integers

[23]:

def volkswagen(la, lb):
sa = set(la)
sc = set()
for el in lb:
if el % 2 != 0:
return (sa | sc) - (sa & sc)


Exercise - pontiac¶

mat : list of $$n$$ lists of $$m$$ integers

[24]:

def pontiac(mat):
ret = []
sets = [set(row) for row in mat]
for i in range(len(mat)):
for s in sets:
if mat[i][0]*2 in s:
ret.append(mat[i][0])
return ret


Exercise - volvo¶

• la : list of $$n$$ integers

• lb : list of $$m$$ integers

[25]:


def volvo(la, lb):
ret = set(lb)
for i in range(len(la)):
sli = la[i+1:]
if la[i] in sli:
for el in la:
if el in ret:
return ret


Exercise - chrysler¶

la: list of $$n$$ integers

HINT 1: remember the sum of first $$n$$ integers is $$\displaystyle\sum_{i=1}^n{i}=\frac{n^2+n}{2}$$

HINT 2: remember the sum of first $$n$$ squared integers is $$\displaystyle\sum_{i=1}^n{i^2}=\frac{n(n+1)(2n+1)}{6}$$

[26]:

def chrysler(la):

s = {0}
lb = []
for i in range(len(la)):
lb.extend([la[i]] * i)
if min(s) < 20:
s = {x for x in la}
s = s | set(lb)
return s


Dictionaries performance¶

Let’s review dictionary operations costs from the Python DS book Chapter 3.7: Dictionaries:

Dictionaries are pretty fast but notice in degenerate cases we might still get $$O(n)$$ complexity instead of $$O(1)$$ (like when there too many keys collisions or with ill-devised hashing functions). For the purposes of this course but won’t consider those occurrences.

Operation

Big-O Efficiency

Operation

Big-O Efficiency

copy

$$O(n)$$

in operator

$$O(1)$$

get item

$$O(1)$$

set item

$$O(1)$$

delete item

$$O(1)$$

iteration

$$O(n)$$

Dictionaries - exercises¶

Find the worst case compexity in Big-O notation for each of the following exercises.

Exercise - tesla¶

• x : int

• d: dictionary with $$n$$ key/value mappings

[27]:

def tesla(x, d):
return x in d


Exercise - bmw¶

• d : dictionary of $$n$$ key/value mappings

• el : int

[28]:

def bmw(d, el):

if el in d:
return d[el]

return None


Exercise - nissan¶

• d : dictionary of $$n$$ key/value mappings

• el : int

Does this code behaves differently from code before? Compare complexity.

[29]:

def nissan(d, el):

for k in d:
if el == k:
return d[el]

return None


Exercise - ferrari¶

• x : int

• d: dictionary with $$n$$ key/value mappings

Find the complexity and then ask yourself if we can do any better.

[30]:

def ferrari(x, d):
return x in d.values()


Exercise - bentley¶

• x : int

• d : dictionary with $$n$$ key/value mappings

[31]:

def bentley(x, d):

return x,x in d.items()


Exercise - mclaren¶

• d : dictionary of $$n$$ key/value mappings

• k : int

[32]:

def mclaren(d,k):
ret = []
for i in range(k):
if i in d:
ret.append(d[i])
return ret


Exercise - fiat¶

• d : dictionary of $$n$$ key/value mappings

• k : int

Compare this function to previous one. Is there any difference?

[33]:

def fiat(d,k):
lst = list(d.items())
ret = []
for i,v in lst:
if i >= 0 and i < k:
ret.append(d[i])
return ret


Exercise - mustang¶

• d : dictionary of $$n$$ key/value mappings

• lst : list of $$m$$ integers

[34]:

def mustang(d, lst):

ret = {}
for el in lst:

if el in d:
if el in list(d.values()):
ret[el] = True
else:
ret[el] = False
return ret


Recursion¶

References:

When confronted with a problem, we can try to solve it by splitting its dimension in half (or more), look for solutions in each of the halves and then decide what to do with the found solutions, if any.

Several cases may occur:

1. No solution is found

2. One solution is found

3. Two solutions are found

case 1): we can only give up

case 2): we have only one solution, so we can just return that one

case 3): we have two solutions, so we need to decide what is the purpose of the algorithm

Is the purpose to …

• find all possible solutions? Then we return both of them.

• find the best solution, according to some measure of ‘goodness’? Then we measure each of the solutions and give back the highest scoring one.

• always provide a combination of existing solutions, according to some combination method? Then we combine the found solutions and give them back

Recursion - exercises¶

Exercise - gap_rec¶

✪✪ In a list $$L$$ containing $$n≥2$$ integers, a gap is an index $$i$$, $$0< i < n$$, such that $$L[i−1]< L[i]$$

If $$n≥2$$ and $$L[0]< L[n−1]$$, $$L$$ contains at least one gap

Design an algorithm that, given a list $$L$$ containing $$n≥2$$ integers such that $$L[0]< L[n−1]$$, finds a gap in the list.

We wrote here the gap function as pseudocode, try coding it in Python:

Use the following skeleton and add some test. To understand what’s going on, try copy-pasting in Python tutor

Notice that

• We created a function gap_rec to differentiate it from the iterative one

• Users of gap_rec function might want to call it by passing just a list, in order to find any gap in the whole list. So for convenience the new function gap_rec(L) only accepts a list, without indexes i and j. This function just calls the other function gap_rec_helper that will actually contain the recursive calls. So your task is to translate the pseudocode of gap into the Python code of gap_rec_helper, which takes as input the array and the indexes as gap does. Adding a helper function is a frequent pattern you can find when programming recursive functions.

CAREFUL ABOUT gap_rec ASSUMPTIONS!

The specification of gap_rec assumes the input is always a list of at least two elements, and that the first element is less or equal than the last one. If these conditions are not met, function behaviour could be completely erroneus!

When preconditions are not met, execution could stop because of an error like index out of bounds, or, even worse, we might get back some wrong index as a gap! To prevent misuse of the function, a good idea can be putting a check at the beginning of the gap_rec function. Such check should immediately stop the execution and raise an error if the parameters don’t satisfy the preconditions. One way to do this could be to to some assertion like this:

def gap_rec(L, i , j):
assert len(L) >= 2
assert L[0] <= L[len(L)-1]

• These commands will make python interrupt execution and throw an error as soon it detects list L is too small or with wrong values

• This kind of behaviour is also called fail fast, which is better than returning wrong values!

• You can put any condition you want after assert, but ideally they should be fast to execute.

• asserts might be better here than raise Exception constructs because asserts can be disabled with a flag passed to the interpreter. So, when you debug you can take advantage of them, and when the code is production quality and supposed to be bug free you can disable all assertions at once to gain in execution speed.

GOOD PRACTICE: Notice we wrote what the helper function is expected to receive as a comment. Writing down specs often helps understanding what the function is supposed to do, and helps users of your code as well!

VII COMMANDMENT: You shall also write on paper!

To get an idea of how gap_rec is working, draw histograms on paper like the following, with different heights at index m:

Notice how at each recursive call, we end up with a histogram that is similar to the inital one, that is, it respects the same preconditions (a list of size >= 2 where first element is smaller or equal than the last one)

Show solution
[35]:

def gap_rec(L, i, j):
raise Exception('TODO IMPLEMENT ME !')

def gap(L):
raise Exception('TODO IMPLEMENT ME !')

# try also to write asserts



Exercise - yamaha¶

What does this function do? What’s the time complexity?

• lst: a list of $$n$$ numbers

• i: an index from 0 to $$n$$ included

level is the recursion level, we can use it to help ourselves while debugging

[36]:


def yamaha(lst, i, level=0):
print('  ' * level, lst)
print('  ' * level, i)

if i == 0:
return 0
else:
q = lst[i-1] + yamaha(lst, i-1, level + 1)
print('  ' * level, q)
return q

print(yamaha([5,7,2,9], 4))

 [5, 7, 2, 9]
4
[5, 7, 2, 9]
3
[5, 7, 2, 9]
2
[5, 7, 2, 9]
1
[5, 7, 2, 9]
0
5
12
14
23
23


Exercise - mul2¶

Write a recursive function mul_rec which given a list RETURN a NEW list with all elements multiplied by two.

ONLY USE .append() to build output list: in order to use it, you will need the parameter acc, which is a list into which you will accumulate the elements found during the recursion. At the end of recursion you will return acc

MUST execute in $$O(n)$$

DO NOT use for, while cycles nor list comprehensions

DO NOT use slices

DO NOT use in operator, .find nor .index methods

DO NOT use .extend method nor + operator - they’re expensive !

Show solution
[38]:

def mul2_rec(lst, acc):
raise Exception('TODO IMPLEMENT ME !')

def mul2(lst):
return mul2_rec(lst, [])

assert(mul2([])) == []
assert(mul2([3])) == [6]
assert(mul2([5,8])) == [10,16]

nums = [4,6,2,7,3]
assert(mul2(nums)) == [8,12,4,14,6]
assert nums == [4,6,2,7,3]


Exercise - search_rec¶

Write a recursive function which RETURN True if el is in the first i elements of lst and False otherwise

• lst is a list of $$n$$ elements

• i: an index from 0 to $$n$$ included

DO NOT use in operator, .find nor .index methods

DO NOT use for, while cycles nor list comprehensions

DO NOT use slices

MUST execute in $$O(n)$$

Show solution
[39]:

def search_rec(el, lst, i):
raise Exception('TODO IMPLEMENT ME !')

def search(el, lst):
return find_rec(el, lst, len(lst))

assert search_rec(6, [], 0) == False
assert search_rec(5, [5], 0) == False
assert search_rec(5, [5], 1) == True
assert search_rec(2, [2,9], 0) == False
assert search_rec(2, [2,9], 1) == True
assert search_rec(2, [2,9], 2) == True
assert search_rec(5, [9,5], 2) == True
assert search_rec('c', [5,'r',6,'c'], 4) == True
assert search_rec(2, [4,'z',8,3,2], 5) == True
assert search_rec(7, [4,9,8,'x',2], 5) == False


Exercise - rev¶

Write a recursive function rev which given a list and an index i, RETURN a NEW sublist with the elements starting from i included in reverse order.

• lst is a list of $$n$$ integers

• i is an index between 0 included and $$n$$ included

>>> rev(['a','b','c','d','e'], i=0)
['e', 'd', 'c', 'b', 'a']
>>> rev(['a','b','c','d','e'], i=2)
['e', 'd', 'c']


ONLY USE .append() to build return list

MUST execute in $$O(n)$$

DO NOT use for, while cycles nor list comprehensions

DO NOT use .reverse() nor reversed()

DO NOT use slices

DO NOT use .extend method nor + operator - they’re expensive !

DO NOT use in operator, .find nor .index methods

Show solution
[41]:

def rev(lst, i=0):
raise Exception('TODO IMPLEMENT ME !')

assert(rev([])) == []
assert(rev(['z'])) == ['z']
assert(rev(['a','z'])) == ['z','a']

chars = ['a','b','c','d','e']
assert(rev(chars)) == ['e', 'd', 'c', 'b', 'a']
assert chars == ['a','b','c','d','e']


Exercise - unnest¶

Write a recursive function unnest_rec which takes a list with an element and a nested list, and outputs a NEW list with all the nested elements in sequence.

DO NOT use for nor while cycles or list comprehensions

DO NOT use slices

MUST execute in $$O(n)$$

NOTE 1: if you use the + operator or .extend methods, you will likely get a $$O(n^2)$$ complexity (remember + takes $$O(n)$$ and .extend $$O(m)$$).

.append would be much faster, but to do so you need the parameter acc, which is a list into which you will accumulate the elements found during the recursion. At the end of recursion you will return acc

NOTE 2: last list contains only one element

Show solution
[42]:


def unnest_rec(lst, acc):
raise Exception('TODO IMPLEMENT ME !')

def unnest(lst):
return unnest_rec(lst, [])

print(unnest(['a',['b',['c']]]))

assert unnest([]) == []
assert unnest(['a']) == ['a']
assert unnest(['a',['b']]) == ['a', 'b']
assert unnest(['a',['b',['c']]]) == ['a','b','c']
assert unnest(['z',['q',['r']]]) == ['z','q','r']
assert unnest(['a',['b',['c',['d',['e']]]]]) == ['a','b','c','d','e']

['a', 'b', 'c']


Look at the following code:

1. Is recursion eventually terminating after some time?

2. What’s the time complexity?

3. What is this function doing?

To get a better understanding, try adding an integer optional parameter level=0 with the recursion level, and print the function passages indented according to the recursion level

• lst : list of $$n$$ integers

[43]:

def cadillac(lst):

if len(lst) == 0:
return 0

if len(lst) == 1:
return lst[0]

k = len(lst) // 2
return p + q


Code with added debug prints and level=0 param:

Show solution
[44]:

def cadillac(lst, level=0):
raise Exception('TODO IMPLEMENT ME !')

print(res)
assert res == 49

 [4, 2, 6, 2, 9, 4, 8, 1, 5, 7, 1]
[4, 2, 6, 2, 9]
[4, 2]
[4]
p= 4
[2]
q= 2
p= 6
[6, 2, 9]
[6]
p= 6
[2, 9]
[2]
p= 2
[9]
q= 9
q= 11
q= 17
p= 23
[4, 8, 1, 5, 7, 1]
[4, 8, 1]
[4]
p= 4
[8, 1]
[8]
p= 8
[1]
q= 1
q= 9
p= 13
[5, 7, 1]
[5]
p= 5
[7, 1]
[7]
p= 7
[1]
q= 1
q= 8
q= 13
q= 26
49


Exercise - ducati¶

The following function is similar to the previous one, but doesn’t create slices

• lst : list of $$n$$ integers

• i : left integer index included

• j : right integer index excluded

1. Does it behave the same?

2. What’s the time complexity?

[45]:

def ducati_rec(lst, i,j):
interval =  j - i

if interval == 0:
return 0

if interval == 1:
return lst[i]

k = i + (interval // 2)
p = ducati_rec(lst,i,k)
q = ducati_rec(lst,k,j)
return p + q

def ducati(lst):
return ducati_rec(lst,0,len(lst))


Recursion - more exercises¶

Computer Science Circles: nice chapter about recursion with some exercises

W3 Resources: some recursion exercises

Edabit: Search by setting ‘Recursion’ as tag - notice many provided solutions are not actually in recursive format.

Geeks for geeks (Pyhton solutions are typically provided down in the pages):

[ ]: