Exam - Tue 02, July 2019
Scientific Programming - Data Science Master @ University of Trento
Download exercises and solution
Introduction
Taking part to this exam erases any vote you had before
Grading
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (i.e. array of fixed size 2) commenting why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You got style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like i.e.
if x > 5: return True else: return False
when you could write just
return x > 5
Valid code
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are given to implement:
def f(x):
raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
# a super fast, correct and stylish implementation
def f(x):
raise Exception("TODO implement me")
We will assess only the latter one f(x)
, and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x)
implementation calls some other function you defined like my_f
here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
# bla
# Called by f, will be graded:
def my_f(y,z):
# bla
def f(x):
my_f(x,5)
How to edit and run
To edit the files, you can use any editor of your choice, you can find them under Applications->Programming:
Visual Studio Code
Editra is easy to use, you can find it under Applications->Programming->Editra.
Others could be GEdit (simpler), or PyCharm (more complex).
To run the tests, use the Terminal which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you still do it, they won’t be evaluated.
Debugging
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do
Download
sciprog-ds-2019-07-02-exam.zip
and extract it on your desktop. Folder content should be like this:
sciprog-ds-2019-07-02-FIRSTNAME-LASTNAME-ID
exam-2019-07-02.ipynb
theory.txt
linked_sort.py
linked_sort_test.py
stacktris.py
stacktris_test.py
jupman.py
sciprog.py
Rename
sciprog-ds-2019-07-02-FIRSTNAME-LASTNAME-ID
folder: put your name, lastname an id number, likesciprog-ds-2019-07-02-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise.
When done:
if you have unitn login: zip and send to examina.icts.unitn.it/studente
If you don’t have unitn login: tell instructors and we will download your work manually
Part A
Open Jupyter and start editing this notebook exam-2019-07-02.ipynb
A1 Botteghe storiche
NOTICE: this part of the exam was ported to softpython website
There you can find a more curated version (notice it may be longer than here)
You will work on the dataset of Botteghe storiche del Trentino (small shops, workshops of Trentino)
Data provider: Provincia Autonoma di Trento - dati.trentino.it
A function load_botteghe
is given to load the dataset (you don’t need to implement it):
[2]:
def load_botteghe():
"""Loads file data and RETURN a list of dictionaries with the botteghe dati
"""
import csv
with open('data/botteghe.csv', newline='', encoding='utf-8',) as csvfile:
reader = csv.DictReader(csvfile, delimiter=',')
lst = []
for d in reader:
lst.append(d)
return lst
botteghe = load_botteghe()
IMPORTANT: look at the dataset !
Here we show only first 5 rows, but to get a clear picture of the dataset you should explore it further.
[3]:
botteghe[:5]
[3]:
[OrderedDict([('Numero', '1'),
('Insegna', 'BAZZANELLA RENATA'),
('Indirizzo', 'Via del Lagorai'),
('Civico', '30'),
('Comune', 'Sover'),
('Cap', '38068'),
('Frazione/Località', 'Piscine di Sover'),
('Note', 'generi misti, bar - ristorante')]),
OrderedDict([('Numero', '2'),
('Insegna', 'CONFEZIONI MONTIBELLER S.R.L.'),
('Indirizzo', 'Corso Ausugum'),
('Civico', '48'),
('Comune', 'Borgo Valsugana'),
('Cap', '38051'),
('Frazione/Località', ''),
('Note', 'esercizio commerciale')]),
OrderedDict([('Numero', '3'),
('Insegna', 'FOTOGRAFICA TRINTINAGLIA UMBERTO S.N.C.'),
('Indirizzo', 'Largo Dordi'),
('Civico', '8'),
('Comune', 'Borgo Valsugana'),
('Cap', '38051'),
('Frazione/Località', ''),
('Note', 'esercizio commerciale, attività artigianale')]),
OrderedDict([('Numero', '4'),
('Insegna', 'BAR SERAFINI DI MINATI RENZO'),
('Indirizzo', ''),
('Civico', '24'),
('Comune', 'Grigno'),
('Cap', '38055'),
('Frazione/Località', 'Serafini'),
('Note', 'esercizio commerciale')]),
OrderedDict([('Numero', '6'),
('Insegna', 'SEMBENINI GINO & FIGLI S.R.L.'),
('Indirizzo', 'Via S. Francesco'),
('Civico', '35'),
('Comune', 'Riva del Garda'),
('Cap', '38066'),
('Frazione/Località', ''),
('Note', '')])]
We would like to know which different categories of bottega there are, and count them. Unfortunately, there is no specific field for Categoria, so we will need to extract this information from other fields such as Insegna
and Note
. For example, this Insegna
contains the category BAR
, while the Note
(commercial enterprise) is a bit too generic to be useful:
'Insegna': 'BAR SERAFINI DI MINATI RENZO',
'Note': 'esercizio commerciale',
while this other Insegna
contains just the owner name and Note
holds both the categories bar
and ristorante
:
'Insegna': 'BAZZANELLA RENATA',
'Note': 'generi misti, bar - ristorante',
As you see, data is non uniform:
sometimes the category is in the
Insegna
sometimes is in the
Note
sometimes is in both
sometimes is lowercase
sometimes is uppercase
sometimes is single
sometimes is multiple (
bar - ristorante
)
First we want to extract all categories we can find, and rank them according their frequency, from most frequent to least frequent.
To do so, you need to
count all words you can find in both
Insegna
andNote
fields, and sort them. Note you need to normalize the uppercase.consider a category relevant if it is present at least 11 times in the dataset.
filter non relevant words: some words like prepositions, type of company (
'S.N.C'
,S.R.L.
, ..), etc will appear a lot, and will need to be ignored. To detect them, you are given a list calledstopwords
.
NOTE: the rules above do not actually extract all the categories, for the sake of the exercise we only keep the most frequent ones.
A1.1 rank_categories
Show solution[4]:
def rank_categories(db, stopwords):
raise Exception('TODO IMPLEMENT ME !')
stopwords = ['',
'S.N.C.', 'SNC','S.A.S.', 'S.R.L.', 'S.C.A.R.L.', 'SCARL','S.A.S', 'COMMERCIALE','FAMIGLIA','COOPERATIVA',
'-', '&', 'C.', 'ESERCIZIO',
'IL', 'DE', 'DI','A', 'DA', 'E', 'LA', 'AL', 'DEL', 'ALLA', ]
categories = rank_categories(botteghe, stopwords)
categories
[4]:
[('BAR', 191),
('RISTORANTE', 150),
('HOTEL', 67),
('ALBERGO', 64),
('MACELLERIA', 27),
('PANIFICIO', 22),
('CALZATURE', 21),
('FARMACIA', 21),
('ALIMENTARI', 20),
('PIZZERIA', 16),
('SPORT', 16),
('TABACCHI', 12),
('FERRAMENTA', 12),
('BAZAR', 11)]
A1.2 plot
Now plot the 10 most frequent categories. Please pay attention to plot title, width and height, axis labels. Everything MUST display in a readable way.
[5]:
# write here
[6]:
A1.3 enrich
Once you found the categories, implement function enrich
, which takes the db and previously computed categories, and RETURN a NEW DB where the dictionaries are enriched with a new field Categorie
, which holds a list of the categories a particular bottega belongs to.
[7]:
def enrich(db, categories):
raise Exception('TODO IMPLEMENT ME !')
new_db = enrich(botteghe, rank_categories(botteghe, stopwords))
new_db[:6] #NOTE here we only show a sample
[7]:
[{'Numero': '1',
'Insegna': 'BAZZANELLA RENATA',
'Indirizzo': 'Via del Lagorai',
'Civico': '30',
'Comune': 'Sover',
'Cap': '38068',
'Frazione/Località': 'Piscine di Sover',
'Note': 'generi misti, bar - ristorante',
'Categorie': ['BAR', 'RISTORANTE']},
{'Numero': '2',
'Insegna': 'CONFEZIONI MONTIBELLER S.R.L.',
'Indirizzo': 'Corso Ausugum',
'Civico': '48',
'Comune': 'Borgo Valsugana',
'Cap': '38051',
'Frazione/Località': '',
'Note': 'esercizio commerciale',
'Categorie': []},
{'Numero': '3',
'Insegna': 'FOTOGRAFICA TRINTINAGLIA UMBERTO S.N.C.',
'Indirizzo': 'Largo Dordi',
'Civico': '8',
'Comune': 'Borgo Valsugana',
'Cap': '38051',
'Frazione/Località': '',
'Note': 'esercizio commerciale, attività artigianale',
'Categorie': []},
{'Numero': '4',
'Insegna': 'BAR SERAFINI DI MINATI RENZO',
'Indirizzo': '',
'Civico': '24',
'Comune': 'Grigno',
'Cap': '38055',
'Frazione/Località': 'Serafini',
'Note': 'esercizio commerciale',
'Categorie': ['BAR']},
{'Numero': '6',
'Insegna': 'SEMBENINI GINO & FIGLI S.R.L.',
'Indirizzo': 'Via S. Francesco',
'Civico': '35',
'Comune': 'Riva del Garda',
'Cap': '38066',
'Frazione/Località': '',
'Note': '',
'Categorie': []},
{'Numero': '7',
'Insegna': 'HOTEL RISTORANTE PIZZERIA “ALLA NAVE”',
'Indirizzo': 'Via Nazionale',
'Civico': '29',
'Comune': 'Lavis',
'Cap': '38015',
'Frazione/Località': 'Nave San Felice',
'Note': '',
'Categorie': ['RISTORANTE', 'HOTEL', 'PIZZERIA']}]
A2 dump
The multinational ToxiCorp wants to hire you for devising an automated truck driver which will deposit highly contaminated waste in the illegal dumps they own worldwide. You find it ethically questionable, but they pay well, so you accept.
A dump is modelled as a rectangular region of dimensions nrow
and ncol
, implemented as a list of lists matrix. Every cell i
, j
contains the tons of waste present, and can contain at most 7
tons of waste.
The dumpster truck will transport q
tons of waste, and try to fill the dump by depositing waste in the first row, filling each cell up to 7 tons. When the first row is filled, it will proceed to the second one from the left , then to the third one again from the left until there is no waste to dispose of.
Function dump(m, q)
takes as input the dump mat
and the number of tons q
to dispose of, and RETURN a NEW list representing a plan with the sequence of tons to dispose. If waste to dispose exceeds dump capacity, raises ValueError
.
NOTE: the function does not modify the matrix
Example:
m = [
[5,4,6],
[4,7,1],
[3,2,6],
[3,6,2],
]
dump(m, 22)
[2, 3, 1, 3, 0, 6, 4, 3]
For first row we dispose of 2,3,1 tons in three cells, for second row we dispose of 3,0,6 tons in three cells, for third row we only dispose 4, 3 tons in two cells as limit q=22 is reached.
Show solution[8]:
def dump(mat, q):
raise Exception('TODO IMPLEMENT ME !')
m1 = [
[5]
]
assert dump(m1,0) == [] # nothing to dump
m2 = [
[4]
]
assert dump(m2,2) == [2]
m3 = [
[5,4]
]
assert dump(m3,3) == [2, 1]
m3 = [
[5,7,3]
]
assert dump(m3,3) == [2, 0, 1]
m5 = [
[2,5], # 5 2
[4,3] # 3 1
]
assert dump(m5,11) == [5,2,3,1]
m6 = [ # tons to dump in each cell
[5,4,6], # 2 3 1
[4,7,1], # 3 0 6
[3,2,6], # 4 3 0
[3,6,2], # 0 0 0
]
assert dump(m6, 22) == [2,3,1,3,0,6,4,3]
try:
dump ([[5]], 10)
raise Exception("Should have failed !")
except ValueError:
pass
Part B
B1 Theory
Write the solution in separate ``theory.txt`` file
Let L1
and L2
be two lists containing n
lists, each of them of size n
. Compute the computational complexity of function fun()
with respect to n
.
def fun(L1,L2):
for r1 in L1:
for val in r1:
for r2 in L2:
if val = sum(r2):
print(val)
B2 Linked List sorting
Open a text editor and edit file linked_sort.py
B2.1 bubble_sort
You will implement bubble sort on a LinkedList
.
def bubble_sort(self):
""" Sorts in-place this linked list using the method of bubble sort
- MUST execute in O(n^2) where n is the length of the linked list
"""
Testing: python3 -m unittest linked_sort_test.BubbleSortTest
As a reference, you can look at this example_bubble
implementation below that operates on regular python lists. Basically, you will have to translate the for
cycles into two suitable while
and use node pointers.
NOTE: this version of the algorithm is inefficient as we do not use j
in the inner loop: your linked list implementation can have this inefficiency as well.
[9]:
def example_bubble(plist):
for j in range(len(plist)):
for i in range(len(plist)):
if i + 1 < len(plist) and plist[i]>plist[i+1]:
temp = plist[i]
plist[i] = plist[i+1]
plist[i+1] = temp
my_list = [23, 34, 55, 32, 7777, 98, 3, 2, 1]
example_bubble(my_list)
print(my_list)
[1, 2, 3, 23, 32, 34, 55, 98, 7777]
B2.2 merge
Implement this method:
def merge(self,l2):
""" Assumes this linkedlist and l2 linkedlist contain integer numbers
sorted in ASCENDING order, and RETURN a NEW LinkedList with
all the numbers from this and l2 sorted in DESCENDING order
IMPORTANT 1: *MUST* EXECUTE IN O(n1+n2) TIME where n1 and n2 are
the sizes of this and l2 linked_list, respectively
IMPORTANT 2: *DO NOT* attempt to convert linked lists to
python lists!
"""
Testing: python3 -m unittest linked_sort_test.MergeTest
B3 Stacktris
Open a text editor and edit file stacktris.py
A Stacktris
is a data structure that operates like the famous game Tetris, with some restrictions:
Falling pieces can be either of length 1 or 2. We call them
1-block
and2-block
respectivelyThe pit has a fixed width of 3 columns
2-block
s can only be in horizontal
We print a Stacktris
like this:
\ j 012
i
4 | 11| # two 1-block
3 | 22| # one 2-block
2 | 1 | # one 1-block
1 |22 | # one 2-block
0 |1 1| # on the ground there are two 1-block
In Python, we model the Stacktris
as a class holding in the variable _stack
a list of lists of integers, which models the pit:
class Stacktris:
def __init__(self):
""" Creates a Stacktris
"""
self._stack = []
So in the situation above the _stack
variable would look like this (notice row order is inverted with respect to the print)
[
[1,0,1],
[2,2,0],
[0,1,0],
[0,2,2],
[0,1,1],
]
The class has three methods of interest which you will implement, drop1(j)
, drop2h(j)
and _shorten
Example
Let’s see an example:
[10]:
from stacktris_sol import *
st = Stacktris()
At the beginning the pit is empty:
[11]:
st
[11]:
Stacktris:
EMPTY
We can start by dropping from the ceiling a block of dimension 1 into the last column at index j=2
. By doing so, a new row will be created, and will be a list containing the numbers [0,0,1]
IMPORTANT: zeroes are not displayed
[12]:
st.drop1(2)
DEBUG: Stacktris:
| 1|
[12]:
[]
Now we drop an horizontal block of dimension 2 (a 2-block
) having the leftmost block at column j=1
. Since below in the pit there is already the 1
block we previosly put, the new block will fall and stay upon it. Internally, we will add a new row as a python list containing the numbers [0,2,2]
[13]:
st.drop2h(1)
DEBUG: Stacktris:
| 22|
| 1|
[13]:
[]
We see the zeroth column is empty, so if we drop there a 1-block
it will fall to the ground. Internally, the zeroth list will become [1,0,1]
:
[14]:
st.drop1(0)
DEBUG: Stacktris:
| 22|
|1 1|
[14]:
[]
Now we drop again a 2-block
at column j=2
, on top of the previously laid one. This will add a new row as list [0,2,2]
.
[15]:
st.drop2h(1)
DEBUG: Stacktris:
| 22|
| 22|
|1 1|
[15]:
[]
In the game Tetris, when a row becomes completely filled it disappears. So if we drop a 1-block
to the leftmost column, the mid line should be removed.
NOTE: The messages on the console are just debug print, the function drop1
only returns the extracted line [1,2,2]
:
[16]:
st.drop1(0)
DEBUG: Stacktris:
| 22|
|122|
|1 1|
DEBUG: POPPING [1, 2, 2]
DEBUG: Stacktris:
| 22|
|1 1|
[16]:
[1, 2, 2]
Now we insert another 2-block
starting at j=0
. It will fall upon the previously laid one:
[17]:
st.drop2h(0)
DEBUG: Stacktris:
|22 |
| 22|
|1 1|
[17]:
[]
We can complete teh topmost row by dropping a 1-block
to the rightmost column. As a result, the row will be removed from the stack and the row will be returned by the call to drop1
:
[18]:
st.drop1(2)
DEBUG: Stacktris:
|221|
| 22|
|1 1|
DEBUG: POPPING [2, 2, 1]
DEBUG: Stacktris:
| 22|
|1 1|
[18]:
[2, 2, 1]
Another line completion with a drop1
at column j=0
:
[19]:
st.drop1(0)
DEBUG: Stacktris:
|122|
|1 1|
DEBUG: POPPING [1, 2, 2]
DEBUG: Stacktris:
|1 1|
[19]:
[1, 2, 2]
We can finally empty the Stacktris by dropping a 1-block
in the mod column:
[20]:
st.drop1(1)
DEBUG: Stacktris:
|111|
DEBUG: POPPING [1, 1, 1]
DEBUG: Stacktris:
EMPTY
[20]:
[1, 1, 1]
B3.1 _shorten
Start by implementing this private method:
def _shorten(self):
""" Scans the Stacktris from top to bottom searching for a completely filled line:
- if found, remove it from the Stacktris and return it as a list.
- if not found, return an empty list.
"""
If you wish, you can add debug prints but they are not mandatory
Testing: python3 -m unittest stacktris_test.ShortenTest
B3.2 drop1
Once you are done with the previous function, implement drop1
method:
NOTE: In the implementation, feel free to call the previously implemented _shorten
method.
def drop1(self, j):
""" Drops a 1-block on column j.
- If another block is found, place the 1-block on top of that block,
otherwise place it on the ground.
- If, after the 1-block is placed, a row results completely filled, removes
the row and RETURN it. Otherwise, RETURN an empty list.
- if index `j` is outside bounds, raises ValueError
"""
Testing: python3 -m unittest stacktris_test.Drop1Test
B3.3 drop2h
Once you are done with the previous function, implement drop2
method:
def drop2h(self, j):
""" Drops a 2-block horizontally with left block on column j,
- If another block is found, place the 2-block on top of that block,
otherwise place it on the ground.
- If, after the 2-block is placed, a row results completely filled,
removes the row and RETURN it. Otherwise, RETURN an empty list.
- if index `j` is outside bounds, raises ValueError
"""
Testing: python3 -m unittest stacktris_test.Drop2hTest
[ ]: