[1]:
# Please execute this cell
import jupman

Midterm - Thu 07, Nov 2019

Scientific Programming - Data Science @ University of Trento

Introduction

  • Taking part in this exam erases any grade you had before

Grading

  • Correct implementations: Correct implementations with the required complexity grant you full grade.

  • Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. an array of fixed size 2), commenting on why you did so.

  • Bonus point: One bonus point can be earned by writing stylish code. You have style if you:

    • do not infringe the Commandments

    • write pythonic code

    • avoid convoluted code such as:

      if x > 5:
          return True
      else:
          return False
      

      when you could write just

      return x > 5
      

Valid code

WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE

WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!

For example, if you are asked to implement:

def f(x):
    raise Exception("TODO implement me")

and you ship this code:

def my_f(x):
    # a super fast, correct and stylish implementation
    pass

def f(x):
    raise Exception("TODO implement me")

We will assess only the latter, f(x), and conclude it doesn’t work at all :P !!!!!!!

Helper functions

Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined like my_f here, it is ok:

# Not called by f, will get ignored:
def my_g(x):
    # bla
    pass

# Called by f, will be graded:
def my_f(y,z):
    # bla
    pass

def f(x):
    return my_f(x,5)

How to edit and run

To edit the files, you can use any editor of your choice; you can find them under Applications->Programming:

  • Visual Studio Code

  • Editra is easy to use, you can find it under Applications->Programming->Editra.

  • Other options are GEdit (simpler) or PyCharm (more complex).

To run the tests, use the Terminal, which can be found in Accessories -> Terminal.

IMPORTANT: Pay close attention to the comments of the functions.

WARNING: DON’T modify function signatures! Just provide the implementation.

WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.

WARNING: DON’T create other files. If you still do it, they won’t be evaluated.

Debugging

If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.

WARNING: even if print statements are allowed, be careful with prints that might break your function!

For example, avoid stuff like this:

x = 0
print(1/x)

What to do

  1. Download sciprog-ds-2019-11-07-exam.zip and extract it on your desktop. The folder content should look like this:

sciprog-ds-2019-11-07-FIRSTNAME-LASTNAME-ID
    jupman.py
    sciprog.py
    exam-2019-11-07.ipynb
  2. Rename the sciprog-ds-2019-11-07-FIRSTNAME-LASTNAME-ID folder: put your first name, last name and ID number, like sciprog-ds-2019-11-07-john-doe-432432

From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.

  3. Edit the files following the instructions in this worksheet for each exercise. Every exercise should take at most 25 minutes. If it takes longer, leave it and try another exercise.

  4. When done:

  • If you have a unitn login: zip the folder and send it to examina.icts.unitn.it/studente

  • If you don’t have a unitn login: tell the instructors and we will download your work manually

Part A

Open Jupyter and start editing this notebook exam-2019-11-07.ipynb

You will work on a dataset of events which occur in the Municipality of Trento in the years 2019-20. Each event can be held on a single day, on two days, or on many days specified as a range. Dates are written in natural language, so we will try to extract them, taking into account that the information is sometimes partial or absent.

Data provider: Comune di Trento

License: Creative Commons Attribution 4.0

WARNING: avoid constants in function bodies !!

In the exercises data you will find many names and connectives such as ‘Giovedì’, ‘Novembre’, ‘e’, ‘a’, etc. DO NOT put such constant names inside the body of functions !! You have to write generic code which works with any input.
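As an illustration of the rule above, here is a minimal sketch. The helper name month_number is hypothetical and not part of the exam: instead of testing for specific names inside its body, it looks the word up in a months list defined outside the function, so the same code works for every month.

```python
# Month names kept as data, outside any function body
months = ['gennaio', 'febbraio', 'marzo', 'aprile', 'maggio', 'giugno',
          'luglio', 'agosto', 'settembre', 'ottobre', 'novembre', 'dicembre']

# Avoid: hard-coding names inside the body, e.g.
#     if word == 'novembre':
#         return 11
#
# Prefer: a generic lookup (month_number is a hypothetical helper)
def month_number(word):
    """Return the month number 1-12 for an Italian month name, in any capitalization."""
    return months.index(word.lower()) + 1
```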

[2]:
import pandas as pd   # we import pandas and for ease we rename it to 'pd'
import numpy as np    # we import numpy and for ease we rename it to 'np'

# remember the encoding !
eventi = pd.read_csv('data/eventi.csv', encoding='UTF-8')
eventi.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 253 entries, 0 to 252
Data columns (total 35 columns):
remoteId                       253 non-null object
published                      253 non-null object
modified                       253 non-null object
Priorità                       253 non-null int64
Evento speciale                0 non-null float64
Titolo                         253 non-null object
Titolo breve                   1 non-null object
Sottotitolo                    227 non-null object
Descrizione                    224 non-null object
Locandina                      16 non-null object
Inizio                         253 non-null object
Termine                        252 non-null object
Quando                         253 non-null object
Orario                         251 non-null object
Durata                         6 non-null object
Dove                           252 non-null object
lat                            253 non-null float64
lon                            253 non-null float64
address                        241 non-null object
Pagina web                     201 non-null object
Contatto email                 196 non-null object
Contatto telefonico            196 non-null object
Informazioni                   62 non-null object
Costi                          132 non-null object
Immagine                       252 non-null object
Evento - manifestazione        252 non-null object
Manifestazione cui fa parte    108 non-null object
Tipologia                      252 non-null object
Materia                        252 non-null object
Destinatari                    24 non-null object
Circoscrizione                 109 non-null object
Struttura ospitante            220 non-null object
Associazione                   1 non-null object
Ente organizzatore             0 non-null float64
Identificativo                 0 non-null float64
dtypes: float64(5), int64(1), object(29)
memory usage: 69.3+ KB

We will concentrate on the Quando (When) column:

[3]:
eventi['Quando']
[3]:
0      venerdì 5 aprile alle 20:30 in via degli Olmi ...
1                                Giovedì 7 novembre 2019
2                               Giovedì 14 novembre 2019
3                               Giovedì 21 novembre 2019
4                               Giovedì 28 novembre 2019
                             ...
248                               sabato 9 novembre 2019
249             da venerdì 8 a domenica 10 novembre 2019
250                              giovedì 7 novembre 2019
251                             giovedì 28 novembre 2019
252                             giovedì 21 novembre 2019
Name: Quando, Length: 253, dtype: object

A.1 leap_year

✪ A leap year has 366 days instead of the regular 365. You are given some criteria to detect whether or not a year is a leap year. Implement them in a function which, given a year as a number, RETURNS True if it is a leap year, False otherwise.

IMPORTANT: in Python there are predefined methods to detect leap years, but here you MUST write your own code!

  1. If the year is evenly divisible by 4, go to step 2. Otherwise, go to step 5.

  2. If the year is evenly divisible by 100, go to step 3. Otherwise, go to step 4.

  3. If the year is evenly divisible by 400, go to step 4. Otherwise, go to step 5.

  4. The year is a leap year (it has 366 days)

  5. The year is not a leap year (it has 365 days)

(if you’re curious about calendars, see this link)
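One possible way to encode steps 1-5, sketched here for reference (it is not the only accepted solution): the five steps collapse into a single boolean expression, which also follows the style advice of returning a condition directly instead of branching with if/else.

```python
def is_leap(year):
    # Steps 1-5: divisible by 4, and either not divisible by 100
    # or also divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```

For example, 1900 is divisible by 100 but not by 400, so is_leap(1900) gives False, while 2000 is divisible by 400, so is_leap(2000) gives True.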

[4]:
def is_leap(year):
    raise Exception('TODO IMPLEMENT ME !')


assert is_leap(4)    == True
assert is_leap(104)  == True
assert is_leap(204)  == True
assert is_leap(400)  == True
assert is_leap(1600) == True
assert is_leap(2000) == True
assert is_leap(2400) == True
assert is_leap(2004) == True
assert is_leap(2008) == True
assert is_leap(2012) == True

assert is_leap(1)    == False
assert is_leap(5)    == False
assert is_leap(100)  == False
assert is_leap(200)  == False
assert is_leap(1700) == False
assert is_leap(1800) == False
assert is_leap(1900) == False
assert is_leap(2100) == False
assert is_leap(2200) == False
assert is_leap(2300) == False
assert is_leap(2500) == False
assert is_leap(2600) == False

A.2 full_date

✪✪ Write a function full_date which takes a natural language text representing a complete date and RETURNS a string in the format yyyy-mm-dd, like 2019-03-25.

  • Dates will be expressed in Italian, so we report the corresponding translations in the next cell

  • Your function should work regardless of the capitalization of the input

  • We assume the date to be always well formed

Examples:

At the beginning you always have the day name (Mercoledì means Wednesday):

>>> full_date("Mercoledì 13 Novembre 2019")
"2019-11-13"

Right after the day name, you may also find a day phase, like mattina for morning:

>>> full_date("Mercoledì mattina 13 Novembre 2019")
"2019-11-13"

Remember you can have lowercase names and single-digit days, which must be padded with a leading zero:

>>> full_date("domenica 4 dicembre 1923")
"1923-12-04"

For more examples, see assertions.

[5]:

days = ['lunedì', 'martedì', 'mercoledì', 'giovedì', 'venerdì', 'sabato', 'domenica']

months = ['gennaio', 'febbraio', 'marzo'    , 'aprile' , 'maggio'  , 'giugno',
          'luglio' , 'agosto'  , 'settembre', 'ottobre', 'novembre', 'dicembre' ]

#             morning,   afternoon,   evening, night
day_phase = ['mattina', 'pomeriggio', 'sera', 'notte']

[6]:
def full_date(text):
    raise Exception('TODO IMPLEMENT ME !')

assert full_date("Giovedì 14 novembre 2019") == "2019-11-14"
assert full_date("Giovedì 7 novembre 2019") == "2019-11-07"
assert full_date("Giovedì pomeriggio 14 novembre 2019") == "2019-11-14"
assert full_date("sabato mattina 25 marzo 2017") == "2017-03-25"
assert full_date("Mercoledì 13 Novembre 2019") == "2019-11-13"
assert full_date("domenica 4 dicembre 1923") == "1923-12-04"
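A minimal sketch of full_date, assuming (as stated above) that the input is always a well-formed complete date. Under that assumption the last three words are always day number, month name and year, so the day name and any optional day phase can be skipped by position. The months list is repeated so the cell runs on its own.

```python
months = ['gennaio', 'febbraio', 'marzo', 'aprile', 'maggio', 'giugno',
          'luglio', 'agosto', 'settembre', 'ottobre', 'novembre', 'dicembre']

def full_date(text):
    # A well-formed complete date always ends with: day number, month name, year
    day, month_name, year = text.lower().split()[-3:]
    month = months.index(month_name) + 1
    # :02d pads single-digit months and days with a leading zero
    return f"{year}-{month:02d}-{int(day):02d}"
```

Taking the last three words, rather than counting from the start, is what makes the optional day phase (as in "Giovedì pomeriggio 14 novembre 2019") harmless.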

A.3 partial_date

✪✪✪ Write a function partial_date which takes a natural language text representing one or more dates, and RETURNS only the FIRST date found, in the format yyyy-mm-dd. If the FIRST date contains insufficient information to form a complete date, in the returned date leave the characters 'yyyy' for an unknown year, 'mm' for an unknown month and 'dd' for an unknown day.

NOTE: Here we only care about the FIRST date: DO NOT attempt to fetch any missing information from the second date; we will deal with that in a later exercise.

Examples:

>>> partial_date("Giovedì 7 novembre 2019")
"2019-11-07"

>>> partial_date("venerdì 15 novembre")
"yyyy-11-15"

>>> partial_date("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019")
"yyyy-mm-15"

For more examples, see the asserts.

[7]:
connective_and = 'e'

connective_from = 'da'
connective_to = 'a'

days = ['lunedì', 'martedì', 'mercoledì', 'giovedì', 'venerdì', 'sabato', 'domenica']
months = ['gennaio', 'febbraio', 'marzo'    , 'aprile' , 'maggio'  , 'giugno',
          'luglio' , 'agosto'  , 'settembre', 'ottobre', 'novembre', 'dicembre' ]

             # morning,   afternoon,   evening, night
day_phases = ['mattina', 'pomeriggio', 'sera', 'notte']
[8]:
def partial_date(text):
    raise Exception('TODO IMPLEMENT ME !')

# complete, uppercase day
assert partial_date("Giovedì 7 novembre 2019") == "2019-11-07"
assert partial_date("Giovedì 14 novembre 2019") == "2019-11-14"
# lowercase day
assert partial_date("mercoledì 13 novembre 2019") == "2019-11-13"
# lowercase, dayphase, missing month and year
assert partial_date("venerdì pomeriggio 15") == "yyyy-mm-15"
# single day, lowercase, no year
assert partial_date("venerdì 15 novembre") == "yyyy-11-15"

# no year,   hour / location to be discarded
assert partial_date("venerdì 5 aprile alle 20:30 in via degli Olmi 26 (Trento sud)")\
                    == "yyyy-04-05"

# two dates, 'and' connective ('e'), day phase morning/afternoon ('mattina'/'pomeriggio')
assert partial_date("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019") \
                    == "yyyy-mm-15"

# two dates, begins with connective 'Da'
assert partial_date("Da lunedì 25 novembre a domenica 01 dicembre 2019") == "yyyy-11-25"
assert partial_date("da giovedì 12 a domenica 15 dicembre 2019") == "yyyy-mm-12"
assert partial_date("da giovedì 9 a domenica 12 gennaio 2020") == "yyyy-mm-09"
assert partial_date("Da lunedì 04 a domenica 10 novembre 2019") == "yyyy-mm-04"
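A possible sketch of partial_date, assuming the first date always follows the pattern seen in the asserts: an optional 'da' connective, a day name, an optional day phase, the day number, then optionally a month name, possibly followed by a year. The function walks the words left to right and stops after the first date, so anything after it (a second date, a time, a place) is ignored. The lists from the cell above are repeated so the block runs on its own.

```python
connective_from = 'da'
days = ['lunedì', 'martedì', 'mercoledì', 'giovedì', 'venerdì', 'sabato', 'domenica']
months = ['gennaio', 'febbraio', 'marzo', 'aprile', 'maggio', 'giugno',
          'luglio', 'agosto', 'settembre', 'ottobre', 'novembre', 'dicembre']
day_phases = ['mattina', 'pomeriggio', 'sera', 'notte']

def partial_date(text):
    words = text.lower().split()
    i = 0
    # optional leading 'from' connective
    if i < len(words) and words[i] == connective_from:
        i += 1
    # day name
    if i < len(words) and words[i] in days:
        i += 1
    # optional day phase, e.g. 'pomeriggio'
    if i < len(words) and words[i] in day_phases:
        i += 1
    year, month, day = 'yyyy', 'mm', 'dd'
    if i < len(words) and words[i].isdigit():
        day = f"{int(words[i]):02d}"
        i += 1
    # month and year belong to the first date only if they follow the day directly
    if i < len(words) and words[i] in months:
        month = f"{months.index(words[i]) + 1:02d}"
        i += 1
        if i < len(words) and words[i].isdigit():
            year = words[i]
    return f"{year}-{month}-{day}"
```

Because the month is only read right after the day number, in "venerdì pomeriggio 15 e sabato mattina 16 novembre 2019" the word after 15 is 'e', so month and year correctly stay unknown.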

A.4 parse_dates_and

✪✪✪ Write a function parse_dates_and which, given a string representing two possibly partial dates separated by the e (and) connective, RETURNS a tuple holding the two extracted dates, each in the format yyyy-mm-dd.

  • IMPORTANT: Notice that the year or month of the first date might actually be indicated in the second date ! In this exercise we want missing information in the first date to be filled in with the year and/or month taken from the second date.

  • HINT: implement this function calling previously defined functions. If you do so, it will be fairly easy.

Examples:

>>> parse_dates_and("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019")
("2019-11-15", "2019-11-16")

>>> parse_dates_and("lunedì 4 e domenica 10 novembre")
("yyyy-11-04","yyyy-11-10")

For more examples, see the asserts.

[9]:

def parse_dates_and(text):
    raise Exception('TODO IMPLEMENT ME !')


# complete dates
assert parse_dates_and("lunedì 25 aprile 2018 e domenica 01 dicembre 2019") == ("2018-04-25","2019-12-01")

# exactly two dates, day phase morning/afternoon ('mattina'/'pomeriggio')
assert parse_dates_and("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019") == ("2019-11-15", "2019-11-16")

# first date missing year
assert parse_dates_and("lunedì 13 settembre e sabato 25 dicembre 2019") == ("2019-09-13","2019-12-25")

# first date missing month and year
assert parse_dates_and("Giovedì 12 e domenica 15 dicembre 2019") == ("2019-12-12","2019-12-15")

assert parse_dates_and("giovedì 9 e domenica 12 gennaio 2020") == ("2020-01-09", "2020-01-12")

assert parse_dates_and("lunedì 4 e domenica 10 novembre 2019") == ("2019-11-04","2019-11-10")

# first missing month and year, second missing year
assert parse_dates_and("lunedì 4 e domenica 10 novembre") == ("yyyy-11-04","yyyy-11-10")

# first missing month and year, second missing month and year
assert parse_dates_and("lunedì 4 e domenica 10") == ("yyyy-mm-04","yyyy-mm-10")
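Following the hint, here is a sketch that splits the text at the 'e' connective, parses each half as a first date with partial_date, and then fills a missing year and/or month of the first date from the second one. So that the block runs on its own, it includes a condensed partial_date helper following the rules of exercise A.3; in the notebook you would simply call your own A.3 function.

```python
connective_and = 'e'
connective_from = 'da'
days = ['lunedì', 'martedì', 'mercoledì', 'giovedì', 'venerdì', 'sabato', 'domenica']
months = ['gennaio', 'febbraio', 'marzo', 'aprile', 'maggio', 'giugno',
          'luglio', 'agosto', 'settembre', 'ottobre', 'novembre', 'dicembre']
day_phases = ['mattina', 'pomeriggio', 'sera', 'notte']

def partial_date(text):
    # condensed version of exercise A.3: first date only, unknown parts kept as yyyy/mm/dd
    words = text.lower().split()
    # drop the optional 'da', the day name and the optional day phase
    while words and (words[0] == connective_from or words[0] in days or words[0] in day_phases):
        words.pop(0)
    day = f"{int(words.pop(0)):02d}" if words and words[0].isdigit() else 'dd'
    month, year = 'mm', 'yyyy'
    if words and words[0] in months:
        month = f"{months.index(words.pop(0)) + 1:02d}"
        if words and words[0].isdigit():
            year = words[0]
    return f"{year}-{month}-{day}"

def parse_dates_and(text):
    words = text.split()
    # position of the 'e' connective, compared case-insensitively
    k = [w.lower() for w in words].index(connective_and)
    first = partial_date(' '.join(words[:k]))
    second = partial_date(' '.join(words[k + 1:]))
    year, month, day = first.split('-')
    year2, month2, _ = second.split('-')
    # fill the gaps of the first date with the second date's values
    if year == 'yyyy':
        year = year2
    if month == 'mm':
        month = month2
    return (f"{year}-{month}-{day}", second)
```

This sketch copies the year as-is, so a pair spanning New Year, such as 'lunedì 30 dicembre e giovedì 2 gennaio 2020', would get the wrong year for the first date; none of the asserts covers that case.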

A.5 Fake news generator

Functional illiteracy means having reading and writing skills that are inadequate “to manage daily living and employment tasks that require reading skills beyond a basic level”

✪✪ Knowing that functional illiteracy is on the rise, a news website wants to fire obsolete human journalists and attract customers by feeding them automatically generated fake news. You are asked to develop the algorithm for producing the texts: the job is ethically questionable, but the company pays well, so you accept.

Typically, a fake news item starts with a real subject and a real fact (the antecedent), and follows it with some invented statement (the consequence). The company provides you with three databases: one of subjects, one of antecedents and one of consequences. Each antecedent and each consequence is associated with a topic.

Write a function fake_news which takes the databases and RETURNS a list of strings holding all possible combinations of subjects, antecedents and consequences where the topic of the antecedent matches the topic of the consequence. See the expected output for more info.

NOTE: Your code MUST work with any database

[10]:
db_subjects = [
    'Government',
    'Party X',
]

db_antecedents = [
    ("passed fiscal reform","economy"),
    ("passed jobs act","economy"),
    ("regulated pollution emissions", "environment"),
    ("restricted building in natural areas", "environment"),
    ("introduced more controls in agrifood production","environment"),
    ("changed immigration policy","foreign policy"),
]

db_consequences = [
    ("economy","now spending is out of control"),
    ("economy","this increased taxes by 10%"),
    ("economy","this increased deficit by a staggering 20%"),
    ("economy","as a consequence our GDP has fallen dramatically"),
    ("environment","businesses had to fire many employees"),
    ("environment","businesses are struggling to meet law requirements"),
    ("foreign policy","immigrants are stealing our jobs"),
]


def fake_news(subjects, antecedents,consequences):
    raise Exception('TODO IMPLEMENT ME !')


#fake_news(db_subjects, db_antecedents, db_consequences)
[11]:
print()
print("  *******************    EXPECTED OUTPUT   *******************")
print()
fake_news(db_subjects, db_antecedents, db_consequences)

  *******************    EXPECTED OUTPUT   *******************

[11]:
['Government passed fiscal reform, now spending is out of control',
 'Government passed fiscal reform, this increased taxes by 10%',
 'Government passed fiscal reform, this increased deficit by a staggering 20%',
 'Government passed fiscal reform, as a consequence our GDP has fallen dramatically',
 'Government passed jobs act, now spending is out of control',
 'Government passed jobs act, this increased taxes by 10%',
 'Government passed jobs act, this increased deficit by a staggering 20%',
 'Government passed jobs act, as a consequence our GDP has fallen dramatically',
 'Government regulated pollution emissions, businesses had to fire many employees',
 'Government regulated pollution emissions, businesses are struggling to meet law requirements',
 'Government restricted building in natural areas, businesses had to fire many employees',
 'Government restricted building in natural areas, businesses are struggling to meet law requirements',
 'Government introduced more controls in agrifood production, businesses had to fire many employees',
 'Government introduced more controls in agrifood production, businesses are struggling to meet law requirements',
 'Government changed immigration policy, immigrants are stealing our jobs',
 'Party X passed fiscal reform, now spending is out of control',
 'Party X passed fiscal reform, this increased taxes by 10%',
 'Party X passed fiscal reform, this increased deficit by a staggering 20%',
 'Party X passed fiscal reform, as a consequence our GDP has fallen dramatically',
 'Party X passed jobs act, now spending is out of control',
 'Party X passed jobs act, this increased taxes by 10%',
 'Party X passed jobs act, this increased deficit by a staggering 20%',
 'Party X passed jobs act, as a consequence our GDP has fallen dramatically',
 'Party X regulated pollution emissions, businesses had to fire many employees',
 'Party X regulated pollution emissions, businesses are struggling to meet law requirements',
 'Party X restricted building in natural areas, businesses had to fire many employees',
 'Party X restricted building in natural areas, businesses are struggling to meet law requirements',
 'Party X introduced more controls in agrifood production, businesses had to fire many employees',
 'Party X introduced more controls in agrifood production, businesses are struggling to meet law requirements',
 'Party X changed immigration policy, immigrants are stealing our jobs']
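A minimal sketch using three nested loops: subjects outermost, then antecedents, then consequences, which reproduces the order of the expected output above. Note that each antecedent is a (fact, topic) pair while each consequence is a (topic, statement) pair, so the tuples are unpacked in different orders.

```python
def fake_news(subjects, antecedents, consequences):
    news = []
    for subject in subjects:
        for fact, topic in antecedents:
            for consequence_topic, statement in consequences:
                # keep only consequences on the same topic as the antecedent
                if topic == consequence_topic:
                    news.append(f"{subject} {fact}, {statement}")
    return news
```

With the databases above this yields 30 strings: for each of the 2 subjects, 2 economy antecedents × 4 consequences, plus 3 environment antecedents × 2, plus 1 foreign policy antecedent × 1, i.e. 15.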