Exam - Mon 10, Feb 2020¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
Taking part to this exam erases any vote you had before
What to do¶
Download
sciprog-ds-2020-02-10-exam.zip
and extract it on your desktop. Folder content should be like this:
sciprog-ds-2020-02-10-FIRSTNAME-LASTNAME-ID
exam-2020-02-10.ipynb
B1-theory.txt
B2_italian_queue_v2.py
B2_italian_queue_v2_test.py
jupman.py
sciprog.py
Rename
sciprog-ds-2020-02-10-FIRSTNAME-LASTNAME-ID
folder: put your name, lastname an id number, likesciprog-ds-2020-02-10-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
if you have unitn login: zip and send to examina.icts.unitn.it/studente
If you don’t have unitn login: tell instructors and we will download your work manually
Part A¶
Open Jupyter and start editing this notebook exam-2020-02-10.ipynb
WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of semantic relations. The resulting network of related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download, making it a useful tool for computational linguistics and natural language processing. Princeton University “About WordNet.” WordNet. Princeton University. 2010
In Python there are specialized libraries to read WordNet like NLTK, but for the sake of this exercise, you will parse the noun database as a text file which can be read line by line.
We will focus on names and how they are linked by IS A relation, for example, a dalmatian
IS A dog
(IS A is also called hypernym relation)
A1 parse_db¶
First, you will begin with parsing an excerpt of wordnet data/dogs.noun
, which is a noun database shown here in its entirety.
According to documentation, a noun database begins with several lines containing a copyright notice, version number, and license agreement: these lines all begin with two spaces and the line number like
1 This software and database is being provided to you, the LICENSEE, by
2 Princeton University under the following license. By obtaining, using
3 and/or copying this software and database, you agree that you have
Afterwards, each of following lines describe a noun synset, that is, a unique concept identified by a number called synset_offset
.
each synset can have many words to represent it - for example, the noun synset
02112993
has03
(w_cnt
) wordsdalmatian
coach_dog
,carriage_dog
.a synset can be linked to other ones by relations. The dalmatian synset is linked to
002
(p_cnt
) other synsets: to synset02086723
by the@
relation, and to synset02113184
by the~
relation. For our purposes, you can focus on the@
symbol which means IS A relation (also calledhypernym
). If you search for a line starting with02086723
, you will see it is the synset fordog
, so Wordnet is telling us adalmatian
IS Adog
.
WARNING 1: lines can be quite long so if they appear to span multiple lines don’t be fooled : remember each name definition only occupies one single line with no carriage returns!
WARNING 2: there are no empty lines between the synsets, here you see them just to visually separate the text blobs
1 This software and database is being provided to you, the LICENSEE, by
2 Princeton University under the following license. By obtaining, using
3 and/or copying this software and database, you agree that you have
4 read, understood, and will comply with these terms and conditions.:
5
6 Permission to use, copy, modify and distribute this software and
7 database and its documentation for any purpose and without fee or
8 royalty is hereby granted, provided that you agree to comply with
9 the following copyright notice and statements, including the disclaimer,
10 and that the same appear on ALL copies of the software, database and
11 documentation, including modifications that you make for internal
12 use or for distribution.
13
14 WordNet 3.1 Copyright 2011 by Princeton University. All rights reserved.
15
16 THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON
17 UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
18 IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON
19 UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANT-
20 ABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE
21 OF THE LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT
22 INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR
23 OTHER RIGHTS.
24
25 The name of Princeton University or Princeton may not be used in
26 advertising or publicity pertaining to distribution of the software
27 and/or database. Title to copyright in this software, database and
28 any associated documentation shall at all times remain with
29 Princeton University and LICENSEE agrees to preserve same.
01320032 05 n 02 domestic_animal 0 domesticated_animal 0 007 @ 00015568 n 0000 ~ 01320304 n 0000 ~ 01320544 n 0000 ~ 01320872 n 0000 ~ 02086723 n 0000 ~ 02124460 n 0000 ~ 02125232 n 0000 | any of various animals that have been tamed and made fit for a human environment
02085998 05 n 02 canine 0 canid 0 011 @ 02077948 n 0000 #m 02085690 n 0000 + 02688440 a 0101 ~ 02086324 n 0000 ~ 02086723 n 0000 ~ 02116752 n 0000 ~ 02117748 n 0000 ~ 02117987 n 0000 ~ 02119787 n 0000 ~ 02120985 n 0000 %p 02442560 n 0000 | any of various fissiped mammals with nonretractile claws and typically long muzzles
02086723 05 n 03 dog 0 domestic_dog 0 Canis_familiaris 0 023 @ 02085998 n 0000 @ 01320032 n 0000 #m 02086515 n 0000 #m 08011383 n 0000 ~ 01325095 n 0000 ~ 02087384 n 0000 ~ 02087513 n 0000 ~ 02087924 n 0000 ~ 02088026 n 0000 ~ 02089774 n 0000 ~ 02106058 n 0000 ~ 02112993 n 0000 ~ 02113458 n 0000 ~ 02113610 n 0000 ~ 02113781 n 0000 ~ 02113929 n 0000 ~ 02114152 n 0000 ~ 02114278 n 0000 ~ 02115149 n 0000 ~ 02115478 n 0000 ~ 02115987 n 0000 ~ 02116630 n 0000 %p 02161498 n 0000 | a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds; “the dog barked all night”
02106058 05 n 01 working_dog 0 016 @ 02086723 n 0000 ~ 02106493 n 0000 ~ 02107175 n 0000 ~ 02109506 n 0000 ~ 02110072 n 0000 ~ 02110741 n 0000 ~ 02110906 n 0000 ~ 02111074 n 0000 ~ 02111324 n 0000 ~ 02111699 n 0000 ~ 02111802 n 0000 ~ 02112043 n 0000 ~ 02112177 n 0000 ~ 02112339 n 0000 ~ 02112463 n 0000 ~ 02112613 n 0000 | any of several breeds of usually large powerful dogs bred to work as draft animals and guard and guide dogs
02112993 05 n 03 dalmatian 0 coach_dog 0 carriage_dog 0 002 @ 02086723 n 0000 ~ 02113184 n 0000 | a large breed having a smooth white coat with black or brown spots; originated in Dalmatia
02107175 05 n 03 shepherd_dog 0 sheepdog 0 sheep_dog 0 012 @ 02106058 n 0000 ~ 02107534 n 0000 ~ 02107903 n 0000 ~ 02108064 n 0000 ~ 02108157 n 0000 ~ 02108293 n 0000 ~ 02108507 n 0000 ~ 02108682 n 0000 ~ 02108818 n 0000 ~ 02109034 n 0000 ~ 02109202 n 0000 ~ 02109314 n 0000 | any of various usually long-haired breeds of dog reared to herd and guard sheep
02111324 05 n 02 bulldog 0 English_bulldog 0 003 @ 02106058 n 0000 + 01121448 v 0101 ~ 02111567 n 0000 | a sturdy thickset short-haired breed with a large head and strong undershot lower jaw; developed originally in England for bull baiting
02116752 05 n 01 wolf 0 007 @ 02085998 n 0000 #m 02086515 n 0000 ~ 01324999 n 0000 ~ 02117019 n 0000 ~ 02117200 n 0000 ~ 02117364 n 0000 ~ 02117507 n 0000 | any of various predatory carnivorous canine mammals of North America and Eurasia that usually hunt in packs
Field description¶
While parsing, skip the copyright notice. Then, each name definition follows the following format:
synset_offset lex_filenum ss_type w_cnt word lex_id [word lex_id...] p_cnt [ptr...] | gloss
synset_offset
: Number identifying the synset, for example02112993
. MUST be converted to a Python intlex_filenum
: Two digit decimal integer corresponding to the lexicographer file name containing the synset, for example03
. MUST be converted to a Python intss_type
: One character code indicating the synset type, store it as a string.
w_cnt
: Two digit hexadecimal integer indicating the number of words in the synset, for exampleb3
. MUST be converted to a Python int.
WARNING: w_cnt
is expressed as hexadecimal!
To convert an hexadecimal number like b3
to a decimal int you will need to specify the base 16 like in int('b3',16)
which produces the decimal integer 179
.
Afterwards, there will be
w_cnt
words, each represented by two fields (for example,dalmatian 0
). You MUST store these fields into a Python list calledwords
containing a dictionary for each word, having these fields:word
: ASCII form of a word (example:dalmatian
), with spaces replaced by underscore characters (_
)lex_id
: One digit hexadecimal integer (example:0
) that MUST be converted to a Python int
WARNING: lex_id
is expressed as hexadecimal!
To convert an hexadecimal number like b3
to a decimal int you will need to specify the base 16 like in int('b3',16)
which produces the decimal integer 179
.
p_cnt
: Three digit decimal integer indicating the number of pointers (that is, relations like for example IS A) from this synset to other synsets. MUST be converted to a Python int
WARNING: differently from w_cnt
, the value p_cnt
is expressed as decimal!
Afterwards, there will be
p_cnt
pointers, each represented by four fieldspointer_symbol
synset_offset
pos
source/target
(for example,@ 02086723 n 0000
). You MUST store these fields into a Python list calledptrs
containing a dictionary for each pointer, having these fields:pointer_symbol
: a symbol indicating the type of relation, for example@
(which represents IS A relation)synset_offset
: the identifier of the target synset, for example02086723
. You MUST convert this to a Python intpos
: just parse it as a string (we will not use it)source/target
: just parse it as a string (we will not use it)
WARNING: DO NOT assume first pointer is an @
(IS A) !!
In the full database, the root synset entity can’t possibly have a parent synset:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
00001740 03 n 01 entity 0 003 ~ 00001930 n 0000 ~ 00002137 n 0000 ~ 04431553 n 0000 | that which is perceived or known or inferred to have its own distinct existence (living or nonliving)
gloss
: Each synset contains a gloss (that is, a description). A gloss is represented as a vertical bar (|
), followed by a text string that continues until the end of the line. For example,a large breed having a smooth white coat with black or brown spots; originated in Dalmatia
implement parse_db¶
Show solution[2]:
def parse_db(filename):
""" Parses noun database filename as a text file and RETURN a dictionary containing
all the synset found. Each key will be a synset_offset mapping to a dictionary
holding the fields of the correspoing synset. See next printout for an example.
"""
raise Exception('TODO IMPLEMENT ME !')
[3]:
dogs_db = parse_db('data/dogs.noun')
from pprint import pprint
pprint(dogs_db)
{1320032: {'gloss': ' any of various animals that have been tamed and made fit '
'for a human environment\n',
'lex_filenum': 5,
'p_cnt': 7,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 15568},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1320304},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1320544},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1320872},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086723},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2124460},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2125232}],
'ss_type': 'n',
'synset_offset': 1320032,
'w_cnt': 2,
'words': [{'lex_id': 0, 'word': 'domestic_animal'},
{'lex_id': 0, 'word': 'domesticated_animal'}]},
2085998: {'gloss': ' any of various fissiped mammals with nonretractile claws '
'and typically long muzzles \n',
'lex_filenum': 5,
'p_cnt': 11,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2077948},
{'pointer_symbol': '#m',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2085690},
{'pointer_symbol': '+',
'pos': 'a',
'source_target': '0101',
'synset_offset': 2688440},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086324},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086723},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2116752},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117748},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117987},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2119787},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2120985},
{'pointer_symbol': '%p',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2442560}],
'ss_type': 'n',
'synset_offset': 2085998,
'w_cnt': 2,
'words': [{'lex_id': 0, 'word': 'canine'},
{'lex_id': 0, 'word': 'canid'}]},
2086723: {'gloss': ' a member of the genus Canis (probably descended from the '
'common wolf) that has been domesticated by man since '
'prehistoric times; occurs in many breeds; "the dog barked '
'all night" \n',
'lex_filenum': 5,
'p_cnt': 23,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2085998},
{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1320032},
{'pointer_symbol': '#m',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086515},
{'pointer_symbol': '#m',
'pos': 'n',
'source_target': '0000',
'synset_offset': 8011383},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1325095},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2087384},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2087513},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2087924},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2088026},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2089774},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2106058},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112993},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2113458},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2113610},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2113781},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2113929},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2114152},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2114278},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2115149},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2115478},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2115987},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2116630},
{'pointer_symbol': '%p',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2161498}],
'ss_type': 'n',
'synset_offset': 2086723,
'w_cnt': 3,
'words': [{'lex_id': 0, 'word': 'dog'},
{'lex_id': 0, 'word': 'domestic_dog'},
{'lex_id': 0, 'word': 'Canis_familiaris'}]},
2106058: {'gloss': ' any of several breeds of usually large powerful dogs '
'bred to work as draft animals and guard and guide '
'dogs \n',
'lex_filenum': 5,
'p_cnt': 16,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086723},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2106493},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2107175},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2109506},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2110072},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2110741},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2110906},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2111074},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2111324},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2111699},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2111802},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112043},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112177},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112339},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112463},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112613}],
'ss_type': 'n',
'synset_offset': 2106058,
'w_cnt': 1,
'words': [{'lex_id': 0, 'word': 'working_dog'}]},
2107175: {'gloss': ' any of various usually long-haired breeds of dog reared '
'to herd and guard sheep\n',
'lex_filenum': 5,
'p_cnt': 12,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2106058},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2107534},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2107903},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108064},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108157},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108293},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108507},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108682},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108818},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2109034},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2109202},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2109314}],
'ss_type': 'n',
'synset_offset': 2107175,
'w_cnt': 3,
'words': [{'lex_id': 0, 'word': 'shepherd_dog'},
{'lex_id': 0, 'word': 'sheepdog'},
{'lex_id': 0, 'word': 'sheep_dog'}]},
2111324: {'gloss': ' a sturdy thickset short-haired breed with a large head '
'and strong undershot lower jaw; developed originally in '
'England for bull baiting \n',
'lex_filenum': 5,
'p_cnt': 3,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2106058},
{'pointer_symbol': '+',
'pos': 'v',
'source_target': '0101',
'synset_offset': 1121448},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2111567}],
'ss_type': 'n',
'synset_offset': 2111324,
'w_cnt': 2,
'words': [{'lex_id': 0, 'word': 'bulldog'},
{'lex_id': 0, 'word': 'English_bulldog'}]},
2112993: {'gloss': ' a large breed having a smooth white coat with black or '
'brown spots; originated in Dalmatia \n',
'lex_filenum': 5,
'p_cnt': 2,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086723},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2113184}],
'ss_type': 'n',
'synset_offset': 2112993,
'w_cnt': 3,
'words': [{'lex_id': 0, 'word': 'dalmatian'},
{'lex_id': 0, 'word': 'coach_dog'},
{'lex_id': 0, 'word': 'carriage_dog'}]},
2116752: {'gloss': ' any of various predatory carnivorous canine mammals of '
'North America and Eurasia that usually hunt in packs \n',
'lex_filenum': 5,
'p_cnt': 7,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2085998},
{'pointer_symbol': '#m',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086515},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1324999},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117019},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117200},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117364},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117507}],
'ss_type': 'n',
'synset_offset': 2116752,
'w_cnt': 1,
'words': [{'lex_id': 0, 'word': 'wolf'}]}}
A2 to_adj¶
Implement a function to_adj
which takes the parsed db and RETURN a graph-like data structure in adjacency list format. Each node represent a synset - as label use the first word of the synset. A node is linked to another one if there is a IS A relation among the nodes, so use the @
symbol to filter the hypernyms.
IMPORTANT: not all linked synsets are present in the dogs excerpt.
HINT: If you couldn’t implement the parse_db
function properly, use as data the result of the previous print.
[4]:
def to_adj(db):
raise Exception('TODO IMPLEMENT ME !')
dogs_graph = to_adj(dogs_db)
from pprint import pprint
pprint(dogs_graph)
{'bulldog': ['working_dog'],
'canine': [],
'dalmatian': ['dog'],
'dog': ['canine', 'domestic_animal'],
'domestic_animal': [],
'shepherd_dog': ['working_dog'],
'wolf': ['canine'],
'working_dog': ['dog']}
Check results¶
If parsing is right, you should get the following graph
DO NOT implement any drawing function, this is just for checking your results
[5]:
from sciprog import draw_adj
draw_adj(dogs_graph, options={'graph':{'rankdir':'BT'}})

A.3 hist¶
You are given a dictionary mapping each relation symbol (i.e. @
) to its description (i.e. Hypernym
).
Implement a function to draw the histogram of relation frequencies found in the relation links of the entire Wordnet, which can be loaded from the file data/data.noun
. If you previously implemented parse_db
in a correct way, you should be able to load the whole db. If for any reasons you can’t, try at least to draw the histogram of frequencies found in dogs_db
sort the histogram from greatest to lowest frequency
do not count the relations containing the word ‘domain’ inside (upper/lowercase)
do not count the ‘’ relation
display the relation names nicely, adding newlines if necessary
[6]:
relation_names = {
'!':'Antonym',
'@':'Hypernym',
'@i':'Instance Hypernym',
'~':'Hyponym',
'~i':'Instance Hyponym',
'#m':'Member holonym',
'#s':'Substance holonym',
'#p':'Part holonym',
'%m':'Member meronym',
'%s':'Substance meronym',
'%p':'Part meronym',
'=':'Attribute',
'+':'Derivationally related form',
';c':'Domain of synset - TOPIC', # DISCARD
'-c':'Member of this domain - TOPIC', # DISCARD
';r':'Domain of synset - REGION', # DISCARD
'-r':'Member of this domain - REGION', # DISCARD
';u':'Domain of synset - USAGE', # DISCARD
'-u':'Member of this domain - USAGE', # DISCARD
'\\': 'Pertainym (pertains to noun)' # DISCARD
}
def draw_hist(db):
raise Exception('TODO IMPLEMENT ME !')
[ ]:
[7]:
wordnet = parse_db('data/data.noun')
draw_hist(wordnet)
{'!': 2153,
'#m': 12287,
'#p': 9110,
'#s': 796,
'%m': 12287,
'%p': 9110,
'%s': 796,
'+': 37235,
'=': 638,
'@': 75915,
'@i': 8588,
'~': 75915,
'~i': 8588}

Part B¶
B1 Theory¶
Write the solution in separate ``theory.txt`` file
B1.1 complexity¶
Given a list 𝐿 of 𝑛 elements, please compute the asymptotic computational complexity of the following function, explaining your reasoning. Any ideas on how to improve the complexity of this code?
def my_fun(L):
n = len(L)
out = []
for i in range(n-2):
out.insert(0,L[i] + L[i+1] + L[i+2])
return out
B1.2 graph visits¶
Briefly describe the two classic ways of visiting the nodes of a graph.
B2 ItalianQueue v2¶
Open a text editor and have a look at file italian_queue_v2.py
In the original v1 implementation of the ItalianQueue we’ve already seen in class, enqueue
can take \(O(n)\): you will improve it by adding further indexing so it runs in \(O(1)\)
An ItalianQueue
is modelled as a LinkedList with two pointers, a _head
and a _tail
:
an element is enqueued scanning from
_head
until a matching group is found, in which case the element is inserted after (that is, at the right) of the matching group, otherwise the element is appended at the very end marked by_tail
an element is dequeued from the
_head
For this improved v2 version, you will use an additional dictionary _tails
which associates to each group present in the queue the node at the tail of that group sequence. This way, instead of scanning you will be able to directly jump to insertion point.
class ItalianQueue:
def __init__(self):
""" Initializes the queue.
- Complexity: O(1)
"""
self._head = None
self._tail = None
self._tails = {} # <---- NEW !
self._size = 0
Example:
If we have the following situation:
data : a -> b -> c -> d -> e -> f -> g -> h
group : x x y y y z z z
^ ^ ^ ^
| | | |
| _tails[x] _tails[y] _tails[z]
| |
_head _tail
By calling
q.enqueue('i','y')
We get:
data : a -> b -> c -> d -> e -> i -> f -> g -> h
group : x x y y y y z z z
^ ^ ^ ^
| | | |
| _tails[x] _tails[y] _tails[z]
| |
_head _tail
We can see here the complete run:
[8]:
from italian_queue_v2_sol import *
q = ItalianQueue()
print(q)
ItalianQueue:
_head: None
_tail: None
_tails: {}
[9]:
q.enqueue('a','x') # 'a' is the element,'x' is the group
[10]:
print(q)
ItalianQueue: a
x
_head: Node(a,x)
_tail: Node(a,x)
_tails: {'x': Node(a,x),}
[11]:
q.enqueue('c','y') # 'c' belongs to new group 'y', goes to the end of the queue
[12]:
print(q)
ItalianQueue: a->c
x y
_head: Node(a,x)
_tail: Node(c,y)
_tails: {'y': Node(c,y),
'x': Node(a,x),}
[13]:
q.enqueue('d','y') # 'd' belongs to existing group 'y', goes to the end of the group
[14]:
print(q)
ItalianQueue: a->c->d
x y y
_head: Node(a,x)
_tail: Node(d,y)
_tails: {'y': Node(d,y),
'x': Node(a,x),}
[15]:
q.enqueue('b','x') # 'b' belongs to existing group 'x', goes to the end of the group
[16]:
print(q)
ItalianQueue: a->b->c->d
x x y y
_head: Node(a,x)
_tail: Node(d,y)
_tails: {'y': Node(d,y),
'x': Node(b,x),}
[17]:
q.enqueue('f','z') # 'f' belongs to new group, goes at the end of the queue
[18]:
print(q)
ItalianQueue: a->b->c->d->f
x x y y z
_head: Node(a,x)
_tail: Node(f,z)
_tails: {'z': Node(f,z),
'y': Node(d,y),
'x': Node(b,x),}
[19]:
q.enqueue('e','y') # 'e' belongs to an existing group 'y', goes at the end of the group
[20]:
print(q)
ItalianQueue: a->b->c->d->e->f
x x y y y z
_head: Node(a,x)
_tail: Node(f,z)
_tails: {'z': Node(f,z),
'y': Node(e,y),
'x': Node(b,x),}
[21]:
q.enqueue('g','z') # 'g' belongs to an existing group 'z', goes at the end of the group
[22]:
print(q)
ItalianQueue: a->b->c->d->e->f->g
x x y y y z z
_head: Node(a,x)
_tail: Node(g,z)
_tails: {'z': Node(g,z),
'y': Node(e,y),
'x': Node(b,x),}
[23]:
q.enqueue('h','z') # 'h' belongs to an existing group 'z', goes at the end of the group
[24]:
print(q)
ItalianQueue: a->b->c->d->e->f->g->h
x x y y y z z z
_head: Node(a,x)
_tail: Node(h,z)
_tails: {'z': Node(h,z),
'y': Node(e,y),
'x': Node(b,x),}
[25]:
q.enqueue('h','z') # 'h' belongs to an existing group 'z', goes at the end of the group
[26]:
print(q)
ItalianQueue: a->b->c->d->e->f->g->h->h
x x y y y z z z z
_head: Node(a,x)
_tail: Node(h,z)
_tails: {'z': Node(h,z),
'y': Node(e,y),
'x': Node(b,x),}
[27]:
q.enqueue('i','y') # 'i' belongs to an existing group 'y', goes at the end of the group
[28]:
print(q)
ItalianQueue: a->b->c->d->e->i->f->g->h->h
x x y y y y z z z z
_head: Node(a,x)
_tail: Node(h,z)
_tails: {'z': Node(h,z),
'y': Node(i,y),
'x': Node(b,x),}
Dequeue is always from the head, without taking in consideration the group:
[29]:
q.dequeue()
[29]:
'a'
[30]:
print(q)
ItalianQueue: b->c->d->e->i->f->g->h->h
x y y y y z z z z
_head: Node(b,x)
_tail: Node(h,z)
_tails: {'z': Node(h,z),
'y': Node(i,y),
'x': Node(b,x),}
[31]:
q.dequeue() # removed last member of group 'x', key 'x' disappears from _tails['x']
[31]:
'b'
[32]:
print(q)
ItalianQueue: c->d->e->i->f->g->h->h
y y y y z z z z
_head: Node(c,y)
_tail: Node(h,z)
_tails: {'z': Node(h,z),
'y': Node(i,y),}
[33]:
q.dequeue()
[33]:
'c'
[34]:
print(q)
ItalianQueue: d->e->i->f->g->h->h
y y y z z z z
_head: Node(d,y)
_tail: Node(h,z)
_tails: {'z': Node(h,z),
'y': Node(i,y),}
B2.1 enqueue¶
Implement enqueue
:
def enqueue(self, v, g):
""" Enqueues provided element v having group g, with the following
criteria:
Queue is scanned from head to find if there is another element
with a matching group:
- if there is, v is inserted after the last element in the
same group sequence (so to the right of the group)
- otherwise v is inserted at the end of the queue
- MUST run in O(1)
"""
Testing: python3 -m unittest italian_queue_test.EnqueueTest
B2.2 dequeue¶
Implement dequeue
:
def dequeue(self):
""" Removes head element and returns it.
- If the queue is empty, raises a LookupError.
- MUST perform in O(1)
- REMEMBER to clean unused _tails keys
"""
IMPORTANT: you can test ``dequeue`` even if you didn’t implement ``enqueue`` correctly
Testing: python3 -m unittest italian_queue_test.DequeueTest
[ ]: