# Exam - Wed 23, Jan 2019¶

Scientific Programming - Data Science Master @ University of Trento

### What to do¶

1. Download sciprog-ds-2019-01-23-exam.zip and extract it on your desktop. Folder content should be like this:

sciprog-ds-2019-01-23-FIRSTNAME-LASTNAME-ID
exam-2019-01-23.ipynb
list.py
list_test.py
tree.py
tree_test.py
jupman.py
sciprog.py

1. Rename sciprog-ds-2019-01-23-FIRSTNAME-LASTNAME-ID folder: put your name, lastname an id number, like sciprog-ds-2019-01-23-john-doe-432432

From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.

1. Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.

2. When done:

• if you have unitn login: zip and send to examina.icts.unitn.it/studente

## Part A¶

Open Jupyter and start editing this notebook exam-2019-01-23.ipynb

Suppose you have a table expressed as a list of lists with headers like this:

[2]:

m0 =    [
['Identifier','Price','Quantity'],
['a',1,1],
['b',5,8],
['c',2,6],
['d',8,5],
['e',7,3]
]



where a, b, c etc are the row identifiers (imagine they represent items in a store), Price and Quantity are properties they might have. NOTE: here we put two properties, but they might have n properties !

We want to transform such table into a graph-like format as a dictionary of lists, which relates store items as keys to the properties they might have. To include in the list both the property identifier and its value, we will use tuples. So you need to write a function that transforms the above input into this:

[3]:

res0 =  {
'a':[('Price',1),('Quantity',1)],
'b':[('Price',5),('Quantity',8)],
'c':[('Price',2),('Quantity',6)],
'd':[('Price',8),('Quantity',5)],
'e':[('Price',7),('Quantity',3)]
}

Show solution
[4]:

def table_to_adj(table):
raise Exception('TODO IMPLEMENT ME !')

m0 = [
['I','P','Q']
]
res0 = {}

m1 =    [
['Identifier','Price','Quantity'],
['a',1,1],
['b',5,8],
['c',2,6],
['d',8,5],
['e',7,3]
]
res1 = {
'a':[('Price',1),('Quantity',1)],
'b':[('Price',5),('Quantity',8)],
'c':[('Price',2),('Quantity',6)],
'd':[('Price',8),('Quantity',5)],
'e':[('Price',7),('Quantity',3)]
}

m2 =    [
['I','P','Q'],
['a','x','y'],
['b','w','z'],
['c','z','x'],
['d','w','w'],
['e','y','x']
]
res2 =  {
'a':[('P','x'),('Q','y')],
'b':[('P','w'),('Q','z')],
'c':[('P','z'),('Q','x')],
'd':[('P','w'),('Q','w')],
'e':[('P','y'),('Q','x')]
}

m3 = [
['I','P','Q', 'R'],
['a','x','y', 'x'],
['b','z','x', 'y'],
]

res3 = {
'a':[('P','x'),('Q','y'), ('R','x')],
'b':[('P','z'),('Q','x'), ('R','y')],

}



## A.2 bus stops¶

Today we will analzye intercity bus network in GTFS format taken from dati.trentino.it, MITT service.

Original GTFS data was split in several files which we merged into dataset data/network.csv containing the bus stop times of three extra-urban routes. To load it, we provide this function:

[5]:

def load_stops():
"Loads file network.csv and RETURN a list of dictionaries with the stop times"

import csv
with open('data/network.csv', newline='', encoding='UTF-8') as csvfile:
lst = []
lst.append(d)
return lst

[6]:

stops = load_stops()

stops[0:2]

[6]:

[OrderedDict([('', '1'),
('route_id', '76'),
('agency_id', '12'),
('route_short_name', 'B202'),
('route_long_name',
'Trento-Sardagna-Candriai-Vaneze-Vason-Viote'),
('route_type', '3'),
('service_id', '22018091220190621'),
('trip_id', '0002402742018091220190621'),
('direction_id', '0'),
('arrival_time', '06:25:00'),
('departure_time', '06:25:00'),
('stop_id', '844'),
('stop_sequence', '2'),
('stop_code', '2620'),
('stop_name', 'Sardagna'),
('stop_desc', ''),
('stop_lat', '46.064848'),
('stop_lon', '11.09729'),
('zone_id', '2620.0')]),
OrderedDict([('', '2'),
('route_id', '76'),
('agency_id', '12'),
('route_short_name', 'B202'),
('route_long_name',
'Trento-Sardagna-Candriai-Vaneze-Vason-Viote'),
('route_type', '3'),
('service_id', '22018091220190621'),
('trip_id', '0002402742018091220190621'),
('direction_id', '0'),
('arrival_time', '06:26:00'),
('departure_time', '06:26:00'),
('stop_id', '5203'),
('stop_sequence', '3'),
('stop_code', '2620VD'),
('stop_name', 'Sardagna Civ. 22'),
('stop_desc', ''),
('stop_lat', '46.069494'),
('stop_lon', '11.095252'),
('zone_id', '2620.0')])]


Of interest to you are the fields route_short_name, arrival_time, and stop_lat and stop_lon which provide the geographical coordinates of the stop. Stops are already sorted in the file from earliest to latest.

Given a route_short_name, like B202, we want to plot the graph of bus velocity measured in km/hours at each stop. We define velocity at stop n as

$$velocity_n = \frac{\Delta space_n}{\Delta time_n }$$

where

$$\Delta time_n = time_n - time_{n-1}$$ as the time in hours the bus takes between stop $$n$$ and stop $$n-1$$.

and

$$\Delta space_n = space_n - space_{n-1}$$ is the distance the bus has moved between stop $$n$$ and stop $$n-1$$.

We also set $$velocity_0 = 0$$

NOTE FOR TIME: When we say time in hours, it means that if you have the time as string 08:27:42, its number in seconds since midnight is like:

[7]:

secs = 8*60*60+27*60+42


and to calculate the time in float hours you need to divide secs by 60*60=3600:

[8]:

hours_float = secs / (60*60)
hours_float

[8]:

8.461666666666666


NOTE FOR SPACE: Unfortunately, we could not find the actual distance as road length done by the bus between one stop and the next one. So, for the sake of the exercise, we will take the geo distance, that is, we will calculate it using the line distance between the points of the stops, using their geographical coordinates. The function to calculate the geo_distance is already implemented :

[9]:

def geo_distance(lat1, lon1, lat2, lon2):
""" Return the geo distance in kilometers
between the points 1 and 2 at provided geographical coordinates.

"""
# Shamelessly copied from https://stackoverflow.com/a/19412565

from math import sin, cos, sqrt, atan2, radians

# approximate radius of earth in km
R = 6373.0

dlon = lon2 - lon1
dlat = lat2 - lat1

a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))

return R * c



In the following we see the bus line B102, going from Sardagna to Trento. The graph should show something like the following.

We can see that as long as the bus is taking stops within Sardagna town, velocity (always intended as air-line velocity ) is high, but when the bus has to go to Trento, since there are many twists and turns on the road, it takes a while to arrive even if in geo-distance Trento is near, so actual velocity decreases. In such case it would be much more convenient to take the cable car.

These type of graphs might show places in the territory where shortcuts such as cable cars, tunnels or bridges might be helpful for transportation.

Show solution
[10]:

def to_float_hour(time_string):
"""
Takes a time string in the format like 08:27:42
and RETURN the time since midnight in hours as a float (es 8.461666666666666)
"""
raise Exception('TODO IMPLEMENT ME !')

def plot(route_short_name):
""" Takes a route_short_name and *PLOTS* with matplotlib a graph of the velocity of
the the bus trip for that route

- just use matplotlib, you *don't* need pandas and *don't* need numpy
- xs positions MUST be in *float hours*,  distanced at lengths proportional
to the actual time the bus arrives that stop
- xticks MUST show
- the stop name *NICELY* (with carriage returns)
- the time in *08:50:12 format*
- ys MUST show the velocity of the bus at that time
- assume velocity at stop 0 equals 0
- remember to set the figure width and heigth
- remember to set axis labels and title
"""
raise Exception('TODO IMPLEMENT ME !')

plot('B202')

xs = [6.416666666666667, 6.433333333333334, 6.45, 6.466666666666667, 6.516666666666667, 6.55, 6.566666666666666, 6.616666666666666, 6.65, 6.683333333333334]
ys = [0, 32.410644806589666, 25.440452145453996, 29.058090168277648, 4.151814096935986, 7.514788081665398, 24.226499833822754, 3.8149164687282586, 34.89698602693173, 14.321244382769315]
xticks = ['Sardagna\n06:25:00', 'Sardagna\nCiv.\n22\n06:26:00', 'Sardagna\nCiv.20\n06:27:00', 'Sardagna\nMaso\nScala\n06:28:00', 'Trento\nLoc.\nS.Antonio\n06:31:00', 'Trento\nVia\nSardagna\nCiv.\n104\n06:33:00', 'Trento\nMaso\nPedrotti\n06:34:00', 'Trento\nLoc.Conotter\n06:37:00', 'Trento\nVia\nBrescia\n4\n06:39:00', 'Trento\nAutostaz.\n06:41:00']


plot('B201')

xs = [18.25, 18.283333333333335, 18.333333333333332, 18.533333333333335, 18.75, 19.166666666666668]
ys = [0, 57.11513455659372, 27.731105466934423, 41.63842308087865, 28.5197376150513, 31.49374154105802]
xticks = ['Tione\nAutostazione\n18:15:00', 'Zuclo\nSs237\n"Superm.\nLidl"\n18:17:00', 'Saone\n18:20:00', 'Ponte\nArche\nAutost.\n18:32:00', 'Sarche\nCentro\nComm.\n18:45:00', 'Trento\nAutostaz.\n19:10:00']


plot('B301')

xs = [17.583333333333332, 17.666666666666668, 17.733333333333334, 17.766666666666666, 17.8, 17.833333333333332, 17.883333333333333, 17.9, 17.916666666666668, 17.933333333333334, 17.983333333333334, 18.0, 18.05, 18.066666666666666, 18.083333333333332, 18.1, 18.133333333333333, 18.15, 18.166666666666668, 18.183333333333334, 18.25, 18.266666666666666, 18.3, 18.316666666666666, 18.35, 18.383333333333333, 18.4]
ys = [0, 12.183536596091201, 11.250009180954352, 16.612469697023045, 20.32290877261807, 29.650645502388567, 43.45858933073937, 33.590326783093374, 51.14340770207765, 31.710506116846854, 24.12416002315475, 68.52690370810224, 66.54632979050625, 36.97129817779247, 29.62791050495846, 34.08490909322781, 29.184331044522004, 19.648559840967014, 37.7140096915846, 43.892216115372726, 33.48796397878209, 29.521341752309603, 32.83990219938084, 38.20505182104893, 27.292895333249888, 12.602972475349818, 28.804672730461583]
xticks = ['Trento\nAutostaz.\n17:35:00', 'Trento\nC.So\nTre\nNovembre\n17:40:00', 'Trento\nViale\nVerona\n17:44:00', 'Trento\nS.Bartolameo\n17:46:00', 'Trento\nViale\nVerona\nBig\nCenter\n17:48:00', 'Trento\nMan\n17:50:00', 'Mattarello\nLoc.Ronchi\n17:53:00', 'Mattarello\nVia\nNazionale\n17:54:00', 'Mattarello\n17:55:00', 'Mattarello\nEx\nSt.Vestimenta\n17:56:00', 'Acquaviva\n17:59:00', 'Acquaviva\nPizzeria\n18:00:00', 'Besenello\nPosta\nVecchia\n18:03:00', 'Besenello\nFerm.\nNord\n18:04:00', 'Besenello\n18:05:00', 'Besenello\nFerm.\nSud\n18:06:00', 'Calliano\nSp\n49\n"Cimitero"\n18:08:00', 'Calliano\n18:09:00', 'Calliano\nGrafiche\nManfrini\n18:10:00', 'Castelpietra\n18:11:00', 'Volano\n18:15:00', 'Volano\nVia\nDes\nTor\n18:16:00', 'Ss.12\nS.Ilario/Via\nStroperi\n18:18:00', 'S.Ilario\n18:19:00', 'Rovereto\nV.Le\nTrento\n18:21:00', 'Rovereto\nVia\nBarattieri\n18:23:00', 'Rovereto\nVia\nManzoni\n18:24:00']


## B.1 Theory¶

Let L a list of size n, and i and j two indeces. Return the computational complexity of function fun() with respect to n.

def fun(L,i,j):
if i==j:
return 0
else:
m = (i+j)//2
count = 0
for x in L[i:m]:
for y in L[m:j+1]:
if x==y:
count = count+1
left = fun(L,i,m)
right = fun(L,m+1,j)
return left+right+count


Suppose a LinkedList only contains integer numbers, say 3,8,8,7,5,8,6,3,9. Implement method flatv which scans the list: when it finds the first occurence of a node which contains a number which is less then the previous one, and the less than successive one, it inserts after the current one another node with the same data as the current one, and exits.

Example:

calling flatv should modify the linked list so that it becomes

Note that it only modifies the first occurrence found 7,5,8 to 7,5,5,8 and the successive sequence 6,3,9 is not altered

Open list.py and implement this method:

def flatv(self):


## B.3 Generic Tree rightmost¶

In the example above, the rightmost branch of a is given by the node sequence a,d,n

Open tree.py and implement this method:

def rightmost(self):
""" RETURN a list containing the *data* of the nodes
in the *rightmost* branch of the tree.

Example:

a
├b
├c
|└e
└d
├f
└g
├h
└i

should give

['a','d','g','i']
"""

[ ]: