Exam - Wed 23, Jan 2019¶
Scientific Programming - Data Science Master @ University of Trento
Download exercises and solution¶
What to do¶
Download
sciprog-ds-2019-01-23-exam.zip
and extract it on your desktop. Folder content should be like this:
sciprog-ds-2019-01-23-FIRSTNAME-LASTNAME-ID
exam-2019-01-23.ipynb
list.py
list_test.py
tree.py
tree_test.py
jupman.py
sciprog.py
Rename
sciprog-ds-2019-01-23-FIRSTNAME-LASTNAME-ID
folder: put your name, lastname an id number, likesciprog-ds-2019-01-23-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
if you have unitn login: zip and send to examina.icts.unitn.it/studente
If you don’t have unitn login: tell instructors and we will download your work manually
Part A¶
Open Jupyter and start editing this notebook exam-2019-01-23.ipynb
A.1 table_to_adj¶
Suppose you have a table expressed as a list of lists with headers like this:
[2]:
m0 = [
['Identifier','Price','Quantity'],
['a',1,1],
['b',5,8],
['c',2,6],
['d',8,5],
['e',7,3]
]
where a
, b
, c
etc are the row identifiers (imagine they represent items in a store), Price
and Quantity
are properties they might have. NOTE: here we put two properties, but they might have n
properties !
We want to transform such table into a graph-like format as a dictionary of lists, which relates store items as keys to the properties they might have. To include in the list both the property identifier and its value, we will use tuples. So you need to write a function that transforms the above input into this:
[3]:
res0 = {
'a':[('Price',1),('Quantity',1)],
'b':[('Price',5),('Quantity',8)],
'c':[('Price',2),('Quantity',6)],
'd':[('Price',8),('Quantity',5)],
'e':[('Price',7),('Quantity',3)]
}
[4]:
def table_to_adj(table):
raise Exception('TODO IMPLEMENT ME !')
m0 = [
['I','P','Q']
]
res0 = {}
assert res0 == table_to_adj(m0)
m1 = [
['Identifier','Price','Quantity'],
['a',1,1],
['b',5,8],
['c',2,6],
['d',8,5],
['e',7,3]
]
res1 = {
'a':[('Price',1),('Quantity',1)],
'b':[('Price',5),('Quantity',8)],
'c':[('Price',2),('Quantity',6)],
'd':[('Price',8),('Quantity',5)],
'e':[('Price',7),('Quantity',3)]
}
assert res1 == table_to_adj(m1)
m2 = [
['I','P','Q'],
['a','x','y'],
['b','w','z'],
['c','z','x'],
['d','w','w'],
['e','y','x']
]
res2 = {
'a':[('P','x'),('Q','y')],
'b':[('P','w'),('Q','z')],
'c':[('P','z'),('Q','x')],
'd':[('P','w'),('Q','w')],
'e':[('P','y'),('Q','x')]
}
assert res2 == table_to_adj(m2)
m3 = [
['I','P','Q', 'R'],
['a','x','y', 'x'],
['b','z','x', 'y'],
]
res3 = {
'a':[('P','x'),('Q','y'), ('R','x')],
'b':[('P','z'),('Q','x'), ('R','y')],
}
assert res3 == table_to_adj(m3)
A.2 bus stops¶
Today we will analzye intercity bus network in GTFS format taken from dati.trentino.it, MITT service.
Original GTFS data was split in several files which we merged into dataset data/network.csv containing the bus stop times of three extra-urban routes. To load it, we provide this function:
[5]:
def load_stops():
"Loads file network.csv and RETURN a list of dictionaries with the stop times"
import csv
with open('data/network.csv', newline='', encoding='UTF-8') as csvfile:
reader = csv.DictReader(csvfile)
lst = []
for d in reader:
lst.append(d)
return lst
[6]:
stops = load_stops()
stops[0:2]
[6]:
[OrderedDict([('', '1'),
('route_id', '76'),
('agency_id', '12'),
('route_short_name', 'B202'),
('route_long_name',
'Trento-Sardagna-Candriai-Vaneze-Vason-Viote'),
('route_type', '3'),
('service_id', '22018091220190621'),
('trip_id', '0002402742018091220190621'),
('trip_headsign', 'Trento-Autostaz.'),
('direction_id', '0'),
('arrival_time', '06:25:00'),
('departure_time', '06:25:00'),
('stop_id', '844'),
('stop_sequence', '2'),
('stop_code', '2620'),
('stop_name', 'Sardagna'),
('stop_desc', ''),
('stop_lat', '46.064848'),
('stop_lon', '11.09729'),
('zone_id', '2620.0')]),
OrderedDict([('', '2'),
('route_id', '76'),
('agency_id', '12'),
('route_short_name', 'B202'),
('route_long_name',
'Trento-Sardagna-Candriai-Vaneze-Vason-Viote'),
('route_type', '3'),
('service_id', '22018091220190621'),
('trip_id', '0002402742018091220190621'),
('trip_headsign', 'Trento-Autostaz.'),
('direction_id', '0'),
('arrival_time', '06:26:00'),
('departure_time', '06:26:00'),
('stop_id', '5203'),
('stop_sequence', '3'),
('stop_code', '2620VD'),
('stop_name', 'Sardagna Civ. 22'),
('stop_desc', ''),
('stop_lat', '46.069494'),
('stop_lon', '11.095252'),
('zone_id', '2620.0')])]
Of interest to you are the fields route_short_name
, arrival_time
, and stop_lat
and stop_lon
which provide the geographical coordinates of the stop. Stops are already sorted in the file from earliest to latest.
Given a route_short_name
, like B202
, we want to plot the graph of bus velocity measured in km/hours at each stop. We define velocity at stop n
as
\(velocity_n = \frac{\Delta space_n}{\Delta time_n }\)
where
\(\Delta time_n = time_n - time_{n-1}\) as the time in hours the bus takes between stop \(n\) and stop \(n-1\).
and
\(\Delta space_n = space_n - space_{n-1}\) is the distance the bus has moved between stop \(n\) and stop \(n-1\).
We also set \(velocity_0 = 0\)
NOTE FOR TIME: When we say time in hours, it means that if you have the time as string 08:27:42
, its number in seconds since midnight is like:
[7]:
secs = 8*60*60+27*60+42
and to calculate the time in float hours you need to divide secs
by 60*60=3600
:
[8]:
hours_float = secs / (60*60)
hours_float
[8]:
8.461666666666666
NOTE FOR SPACE: Unfortunately, we could not find the actual distance as road length done by the bus between one stop and the next one. So, for the sake of the exercise, we will take the geo distance, that is, we will calculate it using the line distance between the points of the stops, using their geographical coordinates. The function to calculate the geo_distance
is already implemented :
[9]:
def geo_distance(lat1, lon1, lat2, lon2):
""" Return the geo distance in kilometers
between the points 1 and 2 at provided geographical coordinates.
"""
# Shamelessly copied from https://stackoverflow.com/a/19412565
from math import sin, cos, sqrt, atan2, radians
# approximate radius of earth in km
R = 6373.0
lat1 = radians(lat1)
lon1 = radians(lon1)
lat2 = radians(lat2)
lon2 = radians(lon2)
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))
return R * c
In the following we see the bus line B102
, going from Sardagna to Trento. The graph should show something like the following.
We can see that as long as the bus is taking stops within Sardagna town, velocity (always intended as air-line velocity ) is high, but when the bus has to go to Trento, since there are many twists and turns on the road, it takes a while to arrive even if in geo-distance Trento is near, so actual velocity decreases. In such case it would be much more convenient to take the cable car.
These type of graphs might show places in the territory where shortcuts such as cable cars, tunnels or bridges might be helpful for transportation.
Show solution[10]:
def to_float_hour(time_string):
"""
Takes a time string in the format like 08:27:42
and RETURN the time since midnight in hours as a float (es 8.461666666666666)
"""
raise Exception('TODO IMPLEMENT ME !')
def plot(route_short_name):
""" Takes a route_short_name and *PLOTS* with matplotlib a graph of the velocity of
the the bus trip for that route
- just use matplotlib, you *don't* need pandas and *don't* need numpy
- xs positions MUST be in *float hours*, distanced at lengths proportional
to the actual time the bus arrives that stop
- xticks MUST show
- the stop name *NICELY* (with carriage returns)
- the time in *08:50:12 format*
- ys MUST show the velocity of the bus at that time
- assume velocity at stop 0 equals 0
- remember to set the figure width and heigth
- remember to set axis labels and title
"""
raise Exception('TODO IMPLEMENT ME !')
plot('B202')
xs = [6.416666666666667, 6.433333333333334, 6.45, 6.466666666666667, 6.516666666666667, 6.55, 6.566666666666666, 6.616666666666666, 6.65, 6.683333333333334]
ys = [0, 32.410644806589666, 25.440452145453996, 29.058090168277648, 4.151814096935986, 7.514788081665398, 24.226499833822754, 3.8149164687282586, 34.89698602693173, 14.321244382769315]
xticks = ['Sardagna\n06:25:00', 'Sardagna\nCiv.\n22\n06:26:00', 'Sardagna\nCiv.20\n06:27:00', 'Sardagna\nMaso\nScala\n06:28:00', 'Trento\nLoc.\nS.Antonio\n06:31:00', 'Trento\nVia\nSardagna\nCiv.\n104\n06:33:00', 'Trento\nMaso\nPedrotti\n06:34:00', 'Trento\nLoc.Conotter\n06:37:00', 'Trento\nVia\nBrescia\n4\n06:39:00', 'Trento\nAutostaz.\n06:41:00']
plot('B201')
xs = [18.25, 18.283333333333335, 18.333333333333332, 18.533333333333335, 18.75, 19.166666666666668]
ys = [0, 57.11513455659372, 27.731105466934423, 41.63842308087865, 28.5197376150513, 31.49374154105802]
xticks = ['Tione\nAutostazione\n18:15:00', 'Zuclo\nSs237\n"Superm.\nLidl"\n18:17:00', 'Saone\n18:20:00', 'Ponte\nArche\nAutost.\n18:32:00', 'Sarche\nCentro\nComm.\n18:45:00', 'Trento\nAutostaz.\n19:10:00']
plot('B301')
xs = [17.583333333333332, 17.666666666666668, 17.733333333333334, 17.766666666666666, 17.8, 17.833333333333332, 17.883333333333333, 17.9, 17.916666666666668, 17.933333333333334, 17.983333333333334, 18.0, 18.05, 18.066666666666666, 18.083333333333332, 18.1, 18.133333333333333, 18.15, 18.166666666666668, 18.183333333333334, 18.25, 18.266666666666666, 18.3, 18.316666666666666, 18.35, 18.383333333333333, 18.4]
ys = [0, 12.183536596091201, 11.250009180954352, 16.612469697023045, 20.32290877261807, 29.650645502388567, 43.45858933073937, 33.590326783093374, 51.14340770207765, 31.710506116846854, 24.12416002315475, 68.52690370810224, 66.54632979050625, 36.97129817779247, 29.62791050495846, 34.08490909322781, 29.184331044522004, 19.648559840967014, 37.7140096915846, 43.892216115372726, 33.48796397878209, 29.521341752309603, 32.83990219938084, 38.20505182104893, 27.292895333249888, 12.602972475349818, 28.804672730461583]
xticks = ['Trento\nAutostaz.\n17:35:00', 'Trento\nC.So\nTre\nNovembre\n17:40:00', 'Trento\nViale\nVerona\n17:44:00', 'Trento\nS.Bartolameo\n17:46:00', 'Trento\nViale\nVerona\nBig\nCenter\n17:48:00', 'Trento\nMan\n17:50:00', 'Mattarello\nLoc.Ronchi\n17:53:00', 'Mattarello\nVia\nNazionale\n17:54:00', 'Mattarello\n17:55:00', 'Mattarello\nEx\nSt.Vestimenta\n17:56:00', 'Acquaviva\n17:59:00', 'Acquaviva\nPizzeria\n18:00:00', 'Besenello\nPosta\nVecchia\n18:03:00', 'Besenello\nFerm.\nNord\n18:04:00', 'Besenello\n18:05:00', 'Besenello\nFerm.\nSud\n18:06:00', 'Calliano\nSp\n49\n"Cimitero"\n18:08:00', 'Calliano\n18:09:00', 'Calliano\nGrafiche\nManfrini\n18:10:00', 'Castelpietra\n18:11:00', 'Volano\n18:15:00', 'Volano\nVia\nDes\nTor\n18:16:00', 'Ss.12\nS.Ilario/Via\nStroperi\n18:18:00', 'S.Ilario\n18:19:00', 'Rovereto\nV.Le\nTrento\n18:21:00', 'Rovereto\nVia\nBarattieri\n18:23:00', 'Rovereto\nVia\nManzoni\n18:24:00']
Part B¶
B.1 Theory¶
Let L
a list of size n
, and i
and j
two indeces. Return the computational complexity of function fun()
with respect to n.
def fun(L,i,j):
if i==j:
return 0
else:
m = (i+j)//2
count = 0
for x in L[i:m]:
for y in L[m:j+1]:
if x==y:
count = count+1
left = fun(L,i,m)
right = fun(L,m+1,j)
return left+right+count
B.2 Linked List flatv¶
Suppose a LinkedList
only contains integer numbers, say 3,8,8,7,5,8,6,3,9. Implement method flatv
which scans the list: when it finds the first occurence of a node which contains a number which is less then the previous one, and the less than successive one, it inserts after the current one another node with the same data as the current one, and exits.
Example:
for Linked list 3,8,8,7,5,8,6,3,9
calling flatv
should modify the linked list so that it becomes
Linked list 3,8,8,7,5,5,8,6,3,9
Note that it only modifies the first occurrence found 7,5,8 to 7,5,5,8 and the successive sequence 6,3,9 is not altered
Open list.py
and implement this method:
def flatv(self):
B.3 Generic Tree rightmost¶
In the example above, the rightmost branch of a
is given by the node sequence a
,d
,n
Open tree.py
and implement this method:
def rightmost(self):
""" RETURN a list containing the *data* of the nodes
in the *rightmost* branch of the tree.
Example:
a
├b
├c
|└e
└d
├f
└g
├h
└i
should give
['a','d','g','i']
"""
[ ]: