Exam - Wed 09, Feb 2022¶
Scientific Programming - Data Science Master @ University of Trento
Part A - Zoom surveillance¶
NOTICE: this part of the exam was ported to softpython website
There you can find a more curated version (notice it may be longer than here)
Open Jupyter and start editing this notebook exam-2022-02-09.ipynb
A training center holds online courses using Zoom. Participant attendance is mandatory, and teachers want to determine who left, when, and for what reason. Zoom can save a meeting log in a CSV-like format which holds the join and leave times of each student. You will clean the file content and show relevant data in charts.
CSV format¶
You are provided with the file UserQos_12345678901.csv. Unfortunately, it is a weird CSV which looks like two completely different CSVs merged together, one after the other. It contains the following:
1st line: general meeting header
2nd line: general meeting data
3rd line: empty
4th line: a completely different header for the participant sessions of that meeting. Each session has a join time and a leave time, and each participant can have multiple sessions in a meeting.
5th line and following: sessions data
The file has lots of useless fields: try to explore it and understand the format (LibreOffice Calc can help).
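The layout above can be explored with a few lines of code. What follows is only a sketch under assumptions: the `SAMPLE` text is a hypothetical miniature of the file (the column names are taken from this page, the `Other` column stands in for the fields we do not need), and `split_sections` is an illustrative helper name, not part of the exam.

```python
import csv
import io

# Hypothetical miniature of the file: column names come from the text,
# the 'Other' column stands in for the many fields we do not care about.
SAMPLE = '''Meeting ID,Topic,Start Time,Other
123 4567 8901,Hydraulics Exam,"Apr 17, 2020 02:00 PM",x

Participant,Join Time,Leave Time,Other
Luigi,01:54 PM,03:10 PM(Luigi got disconnected from the meeting.Reason: Network connection error. ),y
'''

def split_sections(f):
    """Return (meeting, sessions): a dict for the meeting and a list of dicts for the sessions."""
    reader = csv.reader(f)
    meeting_header = next(reader)    # 1st line: general meeting header
    meeting_data = next(reader)      # 2nd line: general meeting data
    next(reader)                     # 3rd line: empty
    session_header = next(reader)    # 4th line: participant sessions header
    meeting = dict(zip(meeting_header, meeting_data))
    # 5th line and following: one dict per session, skipping blank rows
    sessions = [dict(zip(session_header, row)) for row in reader if row]
    return meeting, sessions

meeting, sessions = split_sections(io.StringIO(SAMPLE))
print(meeting['Start Time'])
print(sessions[0]['Participant'], sessions[0]['Join Time'])
```

With the real file you would pass an object obtained from `open(filepath, newline='', encoding='utf-8')` instead of `io.StringIO`; the encoding is an assumption to verify against the actual file.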
Here we only show the few fields we are actually interested in, and examples of the transformations you should apply:
From general meeting information section:
Meeting ID: 123 4567 8901
Topic: Hydraulics Exam
Start Time: "Apr 17, 2020 02:00 PM" should become Apr 17, 2020
From participant sessions section:
Participant: Luigi
Join Time: 01:54 PM should become 13:54
Leave Time: 03:10 PM(Luigi got disconnected from the meeting.Reason: Network connection error. ) should be split into two fields, one for the actual leave time in 15:10 format and another one for the disconnection reason.
There are 3 possible disconnection reasons (try to come up with a general way to parse them - notice that there is no dot at the end of transformed string):
(Luigi got disconnected from the meeting.Reason: Network connection error. ) should become Network connection error
(Bowser left the meeting.Reason: Host closed the meeting. ) should become Host closed the meeting
(Princess Toadstool left the meeting.Reason: left the meeting.) should become left the meeting
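One general way to handle all three cases is to rely only on the opening parenthesis and on the `Reason:` label, ignoring the participant-specific text in between. The helper below is a hedged sketch: `split_leave` is a hypothetical name, and it assumes that neither the time nor the participant name contains `(` or the word `Reason:`.

```python
def split_leave(raw):
    """Split a raw Leave Time field into (time still in 12-hour format, reason)."""
    # the time is everything before the first '(' ...
    time_part, _, rest = raw.partition('(')
    # ... and the reason is whatever follows the 'Reason:' label
    reason = rest.partition('Reason:')[2]
    # drop the closing parenthesis, surrounding blanks and the final dot
    reason = reason.rstrip(') ').strip().rstrip('.').strip()
    return time_part.strip(), reason

print(split_leave('03:10 PM(Luigi got disconnected from the meeting.Reason: Network connection error. )'))
print(split_leave('03:33 PM(Princess Toadstool left the meeting.Reason: left the meeting.)'))
```

The returned time is still in AM/PM form, so it would have to go through the time conversion afterwards.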
Your first goal will be to load the dataset and restructure the data so it looks like this:
[['meeting_id', 'topic', 'date', 'participant', 'join_time', 'leave_time', 'reason'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Luigi','13:54','15:10','Network connection error'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Luigi','15:12','15:54','left the meeting'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Mario','14:02','14:16','Network connection error'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Mario','14:19','15:02','Network connection error'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Mario','15:04','15:50','Network connection error'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Mario','15:52','15:55','Network connection error'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Mario','15:56','16:00','Host closed the meeting'],
...
]
To fix the times, you will first need to implement the following function.
A1 time24¶
[2]:
def time24(t):
    """ Takes a time string like '06:27 PM' and outputs a string like 18:27
    """
    raise Exception('TODO IMPLEMENT ME !')

assert time24('12:00 AM') == '00:00'  # midnight
assert time24('01:06 AM') == '01:06'
assert time24('09:45 AM') == '09:45'
assert time24('12:00 PM') == '12:00'  # special case, it's actually midday
assert time24('01:27 PM') == '13:27'
assert time24('06:27 PM') == '18:27'
assert time24('10:03 PM') == '22:03'
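As a hint on the two special cases in the asserts (midnight and midday), here is one possible approach, given as a sketch and not as the official solution. The name `time24_sketch` is hypothetical, and the input is assumed to always look like `HH:MM AM` or `HH:MM PM`.

```python
def time24_sketch(t):
    """Convert a string like '06:27 PM' to 24-hour format like '18:27'."""
    clock, period = t.split()          # '06:27', 'PM'
    hours, minutes = clock.split(':')  # '06', '27'
    h = int(hours) % 12                # '12' wraps to 0, both for AM and PM
    if period == 'PM':
        h += 12                        # afternoon hours shift by 12, so 12 PM gives 12
    return f'{h:02d}:{minutes}'

print(time24_sketch('12:00 AM'))
print(time24_sketch('06:27 PM'))
```

Taking the hour modulo 12 before adding the PM offset avoids writing separate branches for `12:00 AM` and `12:00 PM`.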
A2 load¶
Implement a function which loads the file UserQos_12345678901.csv
and RETURN a list of lists.
To parse the file, you can use a simple CSV reader as seen in class (there is no need to use pandas).
[3]:
import csv

def load(filepath):
    raise Exception('TODO IMPLEMENT ME !')

meeting_log = load('UserQos_12345678901.csv')

from pprint import pprint
pprint(meeting_log, width=150)
[['meeting_id', 'topic', 'date', 'participant', 'join_time', 'leave_time', 'reason'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Luigi', '13:54', '15:10', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Luigi', '15:12', '15:54', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '14:02', '14:16', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '14:19', '15:02', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:04', '15:50', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:52', '15:55', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:56', '16:00', 'Host closed the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '14:15', '14:30', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '14:54', '15:03', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '15:12', '15:40', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '15:45', '16:00', 'Host closed the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Princess Toadstool', '13:56', '15:33', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:05', '14:10', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:15', '14:29', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:33', '15:10', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '15:25', '15:54', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '15:55', '16:00', 'Host closed the meeting']]
[4]:
EXPECTED_MEETING_LOG = \
[['meeting_id', 'topic', 'date', 'participant', 'join_time', 'leave_time', 'reason'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Luigi', '13:54', '15:10', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Luigi', '15:12', '15:54', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '14:02', '14:16', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '14:19', '15:02', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:04', '15:50', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:52', '15:55', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:56', '16:00', 'Host closed the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '14:15', '14:30', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '14:54', '15:03', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '15:12', '15:40', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '15:45', '16:00', 'Host closed the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Princess Toadstool', '13:56', '15:33', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:05', '14:10', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:15', '14:29', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:33', '15:10', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '15:25', '15:54', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '15:55', '16:00', 'Host closed the meeting']]
assert meeting_log[0] == EXPECTED_MEETING_LOG[0] # header
assert meeting_log[1] == EXPECTED_MEETING_LOG[1] # first Luigi row
assert meeting_log[1:3] == EXPECTED_MEETING_LOG[1:3] # Luigi rows
assert meeting_log[:4] == EXPECTED_MEETING_LOG[:4] # until first Mario row included
assert meeting_log == EXPECTED_MEETING_LOG # all table
A3.1 duration¶
Given two times as strings a and b in a format like 17:34, RETURN the duration in minutes between them as an integer.
To calculate gap durations, we assume a meeting NEVER ends after midnight
[5]:
def duration(a, b):
    raise Exception('TODO IMPLEMENT ME !')

assert duration('15:00','15:34') == 34
assert duration('15:00','17:34') == 120 + 34
assert duration('15:50','16:12') == 22
assert duration('09:55','11:06') == 5 + 60 + 6
assert duration('00:00','00:01') == 1
#assert duration('11:58','00:01') == 3  # no need to support this case !!
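Since meetings are assumed never to cross midnight, one simple idea is to convert both times to minutes since midnight and subtract. The following is a minimal sketch of that idea; `to_minutes` and `duration_sketch` are hypothetical names, and the inputs are assumed to be well-formed `HH:MM` strings with `a` not later than `b`.

```python
def to_minutes(hhmm):
    """Minutes elapsed since midnight for a string like '17:34'."""
    h, m = hhmm.split(':')
    return int(h) * 60 + int(m)

def duration_sketch(a, b):
    """Minutes between a and b, assuming both fall on the same day."""
    return to_minutes(b) - to_minutes(a)

print(duration_sketch('09:55', '11:06'))
```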
A3.2 calc_stats¶
We want to know something about the time each participant has been disconnected from the exam. We call such intervals gaps: a gap is the difference between a session leave time and the join time of the next session.
Implement the function calc_stats that, given a cleaned log produced by load, RETURNs a dictionary mapping each participant to a dictionary with these statistics:
max_gap: the longest time in minutes during which the participant has been disconnected
gaps: the number of disconnections that happened to the participant during the meeting
time_away: the total time in minutes during which the participant has been disconnected during the meeting
To calculate gap durations, we assume a meeting NEVER ends after midnight
For the data format details, see EXPECTED_STATS below.
To test the function, you DON’T NEED to have correctly implemented previous functions
[6]:
def calc_stats(log):
    raise Exception('TODO IMPLEMENT ME !')

stats = calc_stats(meeting_log)

# in case you had trouble implementing load function, use this:
#stats = calc_stats(EXPECTED_MEETING_LOG)

stats
[6]:
{'Luigi': {'max_gap': 2, 'gaps': 1, 'time_away': 2},
'Mario': {'max_gap': 3, 'gaps': 4, 'time_away': 8},
'Bowser': {'max_gap': 24, 'gaps': 3, 'time_away': 38},
'Princess Toadstool': {'max_gap': 0, 'gaps': 0, 'time_away': 0},
'Wario': {'max_gap': 15, 'gaps': 4, 'time_away': 25}}
[7]:
EXPECTED_STATS = {'Bowser': {'gaps': 3, 'max_gap': 24, 'time_away': 38},
'Luigi': {'gaps': 1, 'max_gap': 2, 'time_away': 2},
'Mario': {'gaps': 4, 'max_gap': 3, 'time_away': 8},
'Princess Toadstool': {'gaps': 0, 'max_gap': 0, 'time_away': 0},
'Wario': {'gaps': 4, 'max_gap': 15, 'time_away': 25}}
assert stats == EXPECTED_STATS
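To make the notion of gap concrete, the sketch below pairs each session of a participant with the next one and measures the time between the leave and the following join. It is only one possible approach: `calc_stats_sketch` and `hhmm_to_minutes` are hypothetical names, `SMALL_LOG` is a subset of the rows shown above, and the code assumes that the sessions of each participant appear in chronological order.

```python
def hhmm_to_minutes(hhmm):
    """Minutes since midnight for a string like '14:30'."""
    h, m = hhmm.split(':')
    return int(h) * 60 + int(m)

def calc_stats_sketch(log):
    sessions = {}                      # participant -> list of (join, leave) in minutes
    for row in log[1:]:                # skip the header row
        participant, join, leave = row[3], row[4], row[5]
        span = (hhmm_to_minutes(join), hhmm_to_minutes(leave))
        sessions.setdefault(participant, []).append(span)
    stats = {}
    for participant, spans in sessions.items():
        # a gap goes from the leave of one session to the join of the next one
        gaps = [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]
        stats[participant] = {'max_gap': max(gaps, default=0),
                              'gaps': len(gaps),
                              'time_away': sum(gaps)}
    return stats

# subset of the cleaned log shown above
SMALL_LOG = [
    ['meeting_id', 'topic', 'date', 'participant', 'join_time', 'leave_time', 'reason'],
    ['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '14:15', '14:30', 'Network connection error'],
    ['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '14:54', '15:03', 'Network connection error'],
    ['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '15:12', '15:40', 'Network connection error'],
    ['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '15:45', '16:00', 'Host closed the meeting'],
    ['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Princess Toadstool', '13:56', '15:33', 'left the meeting'],
]
small_stats = calc_stats_sketch(SMALL_LOG)
print(small_stats)
```

A participant with a single session, like Princess Toadstool, has an empty list of gaps, which is why `max` is called with `default=0`.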
Part B¶
B1 Theory¶
Write the solution in a separate theory.txt file
B1.1 myfun¶
Given a list L of \(n\) elements, please compute the asymptotic computational complexity of the myfun function, explaining your reasoning.
[8]:
def myfun(L):
    n = len(L)
    i = 1
    s = 0
    while i < n:
        j = n
        while j > 0:
            for k in range(j, n, 2):
                s += (i + j * k)
            j = j//2
        i = i*2
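Before writing the analysis, it can help to count how often the innermost statement runs for growing values of n. The sketch below is an empirical aid, not a proof: `count_steps` is a hypothetical helper that mirrors the loop structure of myfun (assuming the for loop sits inside the inner while, which sits inside the outer while), and the comparison against \(n \log^2 n\) is just one candidate growth rate to test.

```python
import math

def count_steps(n):
    """Count how many times 's += ...' would execute in myfun for a list of length n."""
    steps = 0
    i = 1
    while i < n:                          # outer loop: i doubles each time
        j = n
        while j > 0:                      # inner loop: j halves each time
            steps += len(range(j, n, 2))  # number of iterations of the for loop
            j = j // 2
        i = i * 2
    return steps

for n in [16, 256, 4096, 65536]:
    steps = count_steps(n)
    # ratio against the candidate n * (log2 n)^2
    print(n, steps, round(steps / (n * math.log2(n) ** 2), 3))
```

If the ratio stays within a narrow band as n grows, the count is consistent with the candidate; the written answer still needs the reasoning about how many times each of the three loops iterates.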
B1.2 nlogn¶
What do we mean when we say that an algorithm has asymptotic computational complexity \(O(n \log n)\)? What do we have to do to prove that an algorithm has asymptotic computational complexity \(O(n \log n)\)?
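As a reminder, one common way to state the definition, writing \(T(n)\) for the running time of the algorithm on inputs of size \(n\) (the symbols \(T\), \(c\) and \(n_0\) are notation introduced here, not taken from the exam text):

\[
T(n) = O(n \log n) \iff \exists\, c > 0,\ \exists\, n_0 \ge 1 \ \text{such that}\ T(n) \le c \cdot n \log n \ \text{ for all } n \ge n_0
\]

Under this definition, a proof amounts to exhibiting suitable constants \(c\) and \(n_0\) and showing that the inequality holds for every \(n \ge n_0\).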
B2 flatv¶
Open Visual Studio Code and start editing the folder on your desktop
Open linked_lists.py and implement method flatv.
Suppose a LinkedList only contains integer numbers, say 3,8,8,7,5,8,6,3,9. Implement method flatv which scans the list: when it finds the first occurrence of a node whose number is less than the previous one and less than the next one, it inserts after that node another node with the same data, and exits.
Example:
for Linked list 3,8,8,7,5,8,6,3,9 calling flatv should modify the linked list so that it becomes Linked list 3,8,8,7,5,5,8,6,3,9
Note that it only modifies the first occurrence found 7,5,8 to 7,5,5,8 and the successive sequence 6,3,9 is not altered. Implement this method:
def flatv(self):
Testing: python3 -m unittest lists_test.FlatvTest
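The scan can be pictured with a pair of references, one to the previous node and one to the current node. The sketch below uses a minimal, hypothetical `Node` and `LinkedList` written only for this illustration: the classes in linked_lists.py may use different attribute and method names, so treat it as an outline of the idea, not as code to paste.

```python
class Node:
    """Hypothetical minimal node: a value and a link to the next node."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node

class LinkedList:
    """Hypothetical minimal singly linked list, for illustration only."""
    def __init__(self, items=()):
        self.head = None
        for item in reversed(list(items)):
            self.head = Node(item, self.head)

    def to_list(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.data)
            cur = cur.next
        return out

    def flatv(self):
        prev, cur = None, self.head
        # a valley needs both a previous and a next node
        while cur is not None and cur.next is not None:
            if prev is not None and cur.data < prev.data and cur.data < cur.next.data:
                cur.next = Node(cur.data, cur.next)  # duplicate the valley node
                return                               # only the first occurrence is modified
            prev, cur = cur, cur.next

ll = LinkedList([3, 8, 8, 7, 5, 8, 6, 3, 9])
ll.flatv()
print(ll.to_list())
```

The first and last nodes can never be valleys, because they lack a previous or a next node; this is why the loop stops as soon as `cur.next` is `None`.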
B3 univalued_rec¶
Open bin_trees.py and implement this method:
def univalued_rec(self):
    """ RETURN True if the tree is univalued, otherwise RETURN False.

        - a tree is univalued when all nodes have the same value as data
        - MUST execute in O(n) where n is the number of nodes of the tree
        - NOTE: with big trees a recursive solution would surely
                exceed the call stack, but here we don't mind
    """
Testing: python3 -m unittest bin_tree_test.UnivaluedRecTest
Example:
[9]:
from bin_tree_test import bt
[10]:
t = bt(3, bt(3), bt(3, bt(3, bt(3, None, bt(3)))))
print(t)
3
├3
└3
 ├3
 │├3
 ││├
 ││└3
 │└
 └
[11]:
t.univalued_rec()
[11]:
True
[12]:
t = bt(2, bt(3), bt(6, bt(3, bt(3, None, bt(3)))))
print(t)
2
├3
└6
 ├3
 │├3
 ││├
 ││└3
 │└
 └
[13]:
t.univalued_rec()
[13]:
False
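One way to stay within O(n) is to let each node compare its own data only with that of its direct children, and then recurse: every node is visited at most once. The class below is a hypothetical stand-in written for this sketch (the real one in bin_trees.py may expose different attributes or accessors), and it rebuilds the two example trees shown above.

```python
class BT:
    """Hypothetical minimal binary tree node, for illustration only."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

    def univalued_rec(self):
        for child in (self.left, self.right):
            if child is not None:
                # a differing child, or a non-univalued subtree, settles the answer
                if child.data != self.data or not child.univalued_rec():
                    return False
        return True

same = BT(3, BT(3), BT(3, BT(3, BT(3, None, BT(3)))))
mixed = BT(2, BT(3), BT(6, BT(3, BT(3, None, BT(3)))))
print(same.univalued_rec(), mixed.univalued_rec())
```

Comparing parent and child at every edge is enough: if all such pairs are equal, all nodes share the value of the root.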