Algorithm analysis

Introduction

Life’s too easy so let’s complicate it, shall we?

Are you ready to fight the invisible enemy of computational complexity? If not, please read the references first:

List performance

Python lists are generic containers, they are useful in a variety of scenarios but sometimes their perfomance can be disappointing, so it’s best to know and avoid potentially expensive operations. Table from the Python DS book Chapter 2.6: Lists:

Operation

Big-O Efficiency

Operation

Big-O Efficiency

index []

\(O(1)\)

in (contains)

\(O(n)\)

index assignment

\(O(1)\)

get slice [x:y]

\(O(k)\)

append

\(O(1)\)

del slice

\(O(n)\)

pop()

\(O(1)\)

set slice

\(O(n+k)\)

pop(i)

\(O(n)\)

reverse

\(O(n)\)

extend(lb)

\(O(m)\)

la + lb

\(O(n+m)\)

insert(i,item)

\(O(n)\)

sort

\(O(n \log{n})\)

del operator

\(O(n)\)

multiply

\(O(nk)\)

remove

\(O(n)\)

min

\(O(n)\)

iteration

\(O(n)\)

max

\(O(n)\)

len(lst)

\(O(1)\)

sum

\(O(n)\)

equality

\(O(n)\)

Fast or not?

Given a list x of \(n\) characters, for each of the following expressions, try giving the complexity:

  1. x[n // 2]
    
  2. x[n // 2] = 'd'
    
  3. x.append('c')
    
  4. x.insert(0, 'd')
    
  5. x + x
    
  6. x[3:5]
    
  7. x.sort()
    
  8. len(x)
    
  9. x.pop(1)
    
  10. x.pop(-1)
    
  11. 'z' in x
    
  12. x.extend(1000)
    
  13. y = [c for c in x]
    
  14. list(range(n)) == list(range(n))
    

Sublist iteration performance

get slice time complexity is \(O(k)\), but what about memory? It’s the same!

So if you want to iterate a part of a list, beware of slicing! For example, depending pn Python version you have, slicing a list like this can occupy much more memory than necessary:

[1]:
                 #                               memory occupied
r = range(1000)  # Python 2 creates a list:           O(n)
                 # Python 3 creates a 'range' object: O(1)

#                                   same for range slices: memory occupied
print([2*y for y in r[100:200]])  #  r[100:200] Python 2        O(k)
                                  #             Python 3        O(1)
[200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398]

The reason is that, according to your Python interpreter, slicing like r[100:200]at loop start can create a new list. If we want to explicitly tell Python we just want to iterate through a sequence, we can use the so-called itertools. In particular, the islice method is handy, we can rewrite the list comprehension above with it like this:

[2]:
import itertools

r = range(1000)

print([2*y for y in itertools.islice(r, 100, 200)])
[200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398]

Some formulas

\[\begin{split}\begin{gather*} & & & & && \text{Complexity}\\ 1 + 2 + 3 + \dots + n & = & \sum_{i=1}^n{i} & = & \frac{n^2+n}{2} && O(n^2) \\ 1^2 + 2^2 + 3^2 + \dots + n^2 & = & \sum_{i=1}^n{i^2} & = & \frac{n(n+1)(2n+1)}{6} && O(n^3) \\ \end{gather*}\end{split}\]

Lists - exercises

Who’s the fastest? Let’s start the racing game!

For each of the following code samples, try giving the worst-case running time in Big O notation.

Exercise - rollsroyce

  • x : int

  • lst : list of \(n\) integers

[3]:
def rollsroyce(x, lst):
    return x in lst
Show answer

Exercise - honda

  • x : list of \(m\) elements

  • lst: list of lists with \(n\) rows and \(m\) columns

[4]:
def honda(x, lst):
    return x in lst
Show answer

Exercise - lamborghini

  • lst : list of \(n\) elements

  • m : int

[5]:
def lamborghini(lst, m):
    return lst * m
Show answer

Exercise - maserati

  • lst : list of \(n\) elements

  • m : int

[6]:
def maserati(lst, m):
    return [m for el in lst]
Show answer

Exercise - toyota

  • lst : list of \(n\) elements

  • m : int

HINT: try rewriting code as a for, then add costs

[7]:
def toyota(lst, m):
    return [lst[:m] for el in lst]
Show answer

Exercise - mercedes

  • lst : list of \(n\) elements

[8]:
def mercedes(lst):
    for i in range(len(lst)):
        lst.append('z')
    return lst
Show answer

Exercise - acura

  • lst : list of \(n\) elements

  • m : int

[9]:
def acura(lst, m):
    return lst[:m] + lst
Show answer

Exercise - alfaromeo

  • lst : list of \(n\) elements

[10]:
def alfaromeo(lst):
    return [(lst[0] + lst[-1]) for i in range(len(lst))]
Show answer

Exercise - jeep

lst : list of \(n\) elements

[11]:

def jeep(lst): return [lst[:i] for i in range(1, len(lst))]
Show answer

Exercise - chevrolet

lst : list of \(n\) elements

[12]:
def chevrolet(lst):
    for i in range(len(lst)):
        lst.insert(0,'z')
    return lst
Show answer

Exercise - kia

lst : list of \(n\) characters

  1. What is the worst case complexity?

  2. What is the best case complexity?

[13]:
def kia(lst):
    for i in range(len(lst) // 2):
        lst.remove('c')
Show answer

Exercise - aston_martin

lst: list of \(n\) elements

[14]:
def aston_martin(lst):
    ret = []
    if lst[0] > 5:

        for i in range(len(lst)):
            for j in range(len(lst)):
                ret.append(lst[i])
    else:
        for i in range(len(lst)):
            ret.append(lst[i] * 2)
    return ret
Show answer

Exercise - subaru

  • mat : list of \(n\) lists of \(n\) elements each

  1. What’s the complexity?

  2. Follow up: Can you write a more perfomant version?

[15]:
def subaru(mat):
    n = len(mat)
    ret = []
    for i in range(n):
        for j in range(n):
            if i == n - j - 1:
                ret.append(mat[i][j])
    return ret
Show answer

Exercise - dodge

  • mat : list of \(n\) lists of \(m\) columns

[16]:
def dodge(mat):
    n = len(mat)
    m = len(mat[0])
    ret = []
    for i in range(n):
        for j in range(1,m):
            if max(mat[i][:j]) == 3:
                ret.append(mat[i][j])
    return ret
Show answer

Exercise - lotus

lst : list of \(n\) elements

[17]:
def lotus(lst):
    while len(lst) > 0:
        el = lst.pop(len(lst) // 2)
        ret.append(el)
    return ret
Show answer

Exercise - jaguar

  • mat : list of \(n\) lists of \(m\) columns

[18]:
def jaguar(mat):
    n = len(mat)
    m = len(mat[0])
    for i in range(1, n):
        if mat[i-1][0] in mat[i]:
            for j in range(n-1,0,-1):
                mat[j] = [2*x for x in mat[j]]

    return mat
Show answer

Exercise - hyundai

  • lst : list of \(n\) elements

  • m: integer, \(0 <= m <= n\)

[19]:
def hyundai(lst, m):

    lb = ['a' in range(m)]

    lb.extend(lst)

    for i in range(m):
        lb.remove('a')
Show answer

Exercise - buick

lb : list of \(n\) positive integers

HINT: Try giving a tight complexity bound for the function, considering we assume lb is only made of positive integers

[20]:
def buick(lb):
    la = []
    n = len(lb)
    while len(lb) > 0:
        el = lb.pop()
        la.insert(0,el)
        if sum(la) == 10:
            for i in range(n):
                for j in range(n):
                    la[0] += 1


    return la
Show answer

Exercise - saab

lst : list of \(n\) integers

  1. To solve this properly you will need to resort to some classical analysis:

  1. for determining worst case, try proving the sum is less than another sum

  2. to see if the bound is tight, try determining the best case complexity \(\Omega\) by proving that the sum made by half of the terms (which half?) is greater than another sum: you should get the same result as worst case thus demonstrating you’ve got the \(\Theta\) complexity.

  1. Can we write a much faster function?

[21]:
def saab(lst):

    for i in range(1, len(lst)):
        lst.sort(lst[0:i])
Show answer

Sets performance

Let’s review set average operations costs from Python official wiki:

Operation

Big-O Efficiency

Operation

Big-O Efficiency

x in s

\(O(1)\)

union sa | sb

\(O(n + m)\)

intersection sa & sb

\(O(min(n,m))\)

difference sa-sb

\(O(n)\)

sa.difference_update(sb)

\(O(m)\)

set(sequence)

\(O(n)\)

len(s)

\(O(1)\)

min(s)

\(O(n)\)

sum(s)

\(O(n)\)

max(s)

\(O(n)\)

Sets are pretty fast but notice in degenerate cases we might still get \(O(n)\) complexities instead of \(O(1)\) (like when there too many keys collisions or with ill-devised hashing functions). For the purposes of this course but won’t consider those occurrences.

WARNING: LOOPING ON SETS IS SLOW!

A loop typically takes \(O(n)\), but sets were designed to offer \(O(1)\) read / write access by using the square brackets operator [] or containment operator in.

If you’re thinking about using a loop, always ask yourself if you could rewrite your algorithm in some other way

HAVE YOU READ THE WARNING ABOVE??

People who don’t understand it are usually condemned to a dreaded yet-another-retake-of-sciprog exam. You’re warned.

Sets - exercises

Find the worst case compexity in Big-O notation for each of the following exercises.

Exercise - land_rover

  • sa : set of \(n\) integers

  • sb : set of \(m\) integers

[22]:
def land_rover(sa, sb):
    return [el in sa for el in sb]
Show answer

Exercise - volkswagen

  • la : list of \(n\) integers

  • lb : list of \(m\) integers

[23]:
def volkswagen(la, lb):
    sa = set(la)
    sc = set()
    for el in lb:
        if el % 2 != 0:
            sc.add(el)
    return (sa | sc) - (sa & sc)
Show answer

Exercise - pontiac

mat : list of \(n\) lists of \(m\) integers

[24]:
def pontiac(mat):
    ret = []
    sets = [set(row) for row in mat]
    for i in range(len(mat)):
        for s in sets:
            if mat[i][0]*2 in s:
                ret.append(mat[i][0])
    return ret
Show answer

Exercise - volvo

  • la : list of \(n\) integers

  • lb : list of \(m\) integers

[25]:

def volvo(la, lb): ret = set(lb) for i in range(len(la)): sli = la[i+1:] if la[i] in sli: ret.add(la[i]) for el in la: if el in ret: ret.add(el*2) return ret
Show answer

Exercise - chrysler

la: list of \(n\) integers

HINT 1: remember the sum of first \(n\) integers is \(\displaystyle\sum_{i=1}^n{i}=\frac{n^2+n}{2}\)

HINT 2: remember the sum of first \(n\) squared integers is \(\displaystyle\sum_{i=1}^n{i^2}=\frac{n(n+1)(2n+1)}{6}\)

[26]:
def chrysler(la):

    s = {0}
    lb = []
    for i in range(len(la)):
        lb.extend([la[i]] * i)
        if min(s) < 20:
            s = {x for x in la}
        s = s | set(lb)
    return s
Show answer

Dictionaries performance

Let’s review dictionary operations costs from the Python DS book Chapter 3.7: Dictionaries:

Operation

Example

Big-O Efficiency

key containment

key in d

\(O(1)\)

read a value given a key

d[key]

\(O(1)\)

write an association

d[key] = value

\(O(1)\)

delete an association

del d[key]

\(O(1)\)

iterating keys / 1

for key in d:

\(O(n)\)

iterating keys / 2

for key in d.keys():

\(O(n)\)

iterating values

for value in d.values():

\(O(n)\)

iterating associations

for key,value in d.items():

\(O(n)\)

shallow copy

d.copy()

\(O(n)\)

Dictionaries are pretty fast but notice in degenerate cases we might still get \(O(n)\) complexity instead of \(O(1)\), like when there too many keys collisions or with ill-devised hashing functions. For the purposes of this course but won’t consider those occurrences.

WARNING: LOOPING ON DICTIONARIES IS SLOW!

A loop typically takes \(O(n)\), but dictionaries were designed to offer \(O(1)\) read / write access by using the square brackets operator [] or containment operator in.

If you’re thinking about using a loop, always ask yourself if you could rewrite your algorithm in some other way

HAVE YOU READ THE WARNING ABOVE??

People who don’t understand it are usually condemned to a dreaded yet-another-retake-of-sciprog exam. You’re warned.

Dictionaries - exercises

Find the worst case compexity in Big-O notation for each of the following exercises.

Exercise - tesla

  • x : int

  • d: dictionary with \(n\) key/value mappings

[27]:
def tesla(x, d):
    return x in d
Show answer

Exercise - bmw

  • d : dictionary of \(n\) key/value mappings

  • el : int

[28]:
def bmw(d, el):

    if el in d:
        return d[el]

    return None
Show answer

Exercise - nissan

  • d : dictionary of \(n\) key/value mappings

  • el : int

Does this code behaves differently from code before? Compare complexity.

[29]:
def nissan(d, el):

    for k in d:
        if el == k:
            return d[el]

    return None
Show answer

Exercise - ferrari

  • x : int

  • d: dictionary with \(n\) key/value mappings

Find the complexity and then ask yourself if we can do any better.

[30]:
def ferrari(x, d):
    return x in d.values()
Show answer

Exercise - bentley

  • x : int

  • d : dictionary with \(n\) key/value mappings

[31]:
def bentley(x, d):

    return x,x in d.items()
Show answer

Exercise - mclaren

  • d : dictionary of \(n\) key/value mappings

  • k : int

[32]:
def mclaren(d,k):
    ret = []
    for i in range(k):
        if i in d:
            ret.append(d[i])
    return ret
Show answer

Exercise - fiat

  • d : dictionary of \(n\) key/value mappings

  • k : int

Compare this function to previous one. Is there any difference?

[33]:
def fiat(d,k):
    lst = list(d.items())
    ret = []
    for i,v in lst:
        if i >= 0 and i < k:
            ret.append(d[i])
    return ret
Show answer

Exercise - mustang

  • d : dictionary of \(n\) key/value mappings

  • lst : list of \(m\) integers

[34]:
def mustang(d, lst):

    ret = {}
    for el in lst:

        if el in d:
            if el in list(d.values()):
                ret[el] = True
        else:
            ret[el] = False
    return ret
Show answer

Recursion

References:

When confronted with a problem, we can try to solve it by splitting its dimension in half (or more), look for solutions in each of the halves and then decide what to do with the found solutions, if any.

Several cases may occur:

  1. No solution is found

  2. One solution is found

  3. Two solutions are found

case 1): we can only give up

case 2): we have only one solution, so we can just return that one

case 3): we have two solutions, so we need to decide what is the purpose of the algorithm

Is the purpose to …

  • find all possible solutions? Then we return both of them.

  • find the best solution, according to some measure of ‘goodness’? Then we measure each of the solutions and give back the highest scoring one.

  • always provide a combination of existing solutions, according to some combination method? Then we combine the found solutions and give them back

The Master Theorem

To get a precise calculation of complexity of recursive algorithms, let’s recall the Master Theorem:

Let \(a\) and \(b\) two integer constants such that \(\normalsize a \geq 1\) and \(\normalsize b \geq 2\), and let \(\normalsize c, \beta\) be two real constants such that \(\normalsize c > 0\) and \(\normalsize \beta \geq 0\). Let \(\normalsize T(n)\) be defined by the following recurrence:

\[\begin{split}\large T(n) = \begin{cases} a T(n/b)+cn^\beta & n > 1 \\ \Theta(n) & n \leq 1 \end{cases}\end{split}\]

Given \(\large {\alpha = \frac{\log{a}}{\log{b}} = \log_b{a}}\), then:

\[\begin{split}\large T(n) = \begin{cases} \Theta(n^\alpha) & \alpha > \beta \\ \Theta(n^\alpha \log{n}) & \alpha = \beta \\ \Theta(n^\beta) & \alpha < \beta \end{cases}\end{split}\]

The theorem covers cases when:

  • input of size \(n\) is split in \(b\) subproblems

  • the algorithm is applied recursively \(n\) times

  • \(\normalsize cn^\beta\) is the cost of the algorithm after the recursive steps

Recursion - exercises

Exercise - yamaha

  1. What does this function do?

  2. What’s the time complexity? Give both an informal argument and a formal one using the Master Theorem.

  • lst: a list of \(n\) numbers

  • i: an index from 0 to \(n\) included

level is the recursion level, we can use it to help ourselves while debugging

[35]:

def yamaha(lst, i, level=0): print(' ' * level, lst) print(' ' * level, i) if i == 0: return 0 else: q = lst[i-1] + yamaha(lst, i-1, level + 1) print(' ' * level, q) return q print(yamaha([5,7,2,9], 4))
 [5, 7, 2, 9]
 4
   [5, 7, 2, 9]
   3
     [5, 7, 2, 9]
     2
       [5, 7, 2, 9]
       1
         [5, 7, 2, 9]
         0
       5
     12
   14
 23
23
Show answer

Exercise - cadillac

Look at the following code:

  1. To get a better understanding, try adding an integer optional parameter level=0 with the recursion level, and print the function passages indented according to the recursion level

  2. What is this function doing?

  3. Is recursion eventually terminating after some time?

  4. What’s the time complexity? Try giving both an informal argument and a precise one using the Master Theorem

  • lst : list of \(n\) integers

[37]:
def cadillac(lst):

    if len(lst) == 0:
        return 0

    if len(lst) == 1:
        return lst[0]

    k = len(lst) // 2
    p = cadillac(lst[:k])
    q = cadillac(lst[k:])
    return p + q

a) Solution for code with added debug prints and level=0 param:

Show solution
[38]:
def cadillac(lst, level=0):
    raise Exception('TODO IMPLEMENT ME !')

res = cadillac([4,2,6,2,9,4,8,1,5,7,1])
print(res)
assert res == 49
 [4, 2, 6, 2, 9, 4, 8, 1, 5, 7, 1]
     [4, 2, 6, 2, 9]
         [4, 2]
             [4]
         p= 4
             [2]
         q= 2
     p= 6
         [6, 2, 9]
             [6]
         p= 6
             [2, 9]
                 [2]
             p= 2
                 [9]
             q= 9
         q= 11
     q= 17
 p= 23
     [4, 8, 1, 5, 7, 1]
         [4, 8, 1]
             [4]
         p= 4
             [8, 1]
                 [8]
             p= 8
                 [1]
             q= 1
         q= 9
     p= 13
         [5, 7, 1]
             [5]
         p= 5
             [7, 1]
                 [7]
             p= 7
                 [1]
             q= 1
         q= 8
     q= 13
 q= 26
49

b,c,d) Answers to remaining questions:

Show answer

Exercise - ducati

The following function is similar to the previous one, but doesn’t create slices

  • lst : list of \(n\) integers

  • i : left integer index included

  • j : right integer index excluded

a). Does it behave the same?

b). What’s the time complexity? Try giving both an informal argument and a precise one through the Master Theorem

[39]:
def ducati_rec(lst, i,j):
    interval =  j - i

    if interval == 0:
        return 0

    if interval == 1:
        return lst[i]

    k = i + (interval // 2)
    p = ducati_rec(lst,i,k)
    q = ducati_rec(lst,k,j)
    return p + q

def ducati(lst):
    return ducati_rec(lst,0,len(lst))
Show answer

Analysis - more exercises

Theory exercises (complexity, tree visits, graph visits) - by Alberto Montresor

[ ]: