Midterm Sim - Mon 02, Nov 2020

Scientific Programming - Data Science @ University of Trento

Download exercises and solutions

This simulation gives you NO credit whatsoever: If you do everything wrong, you lose nothing. If you do everything correct, you gain nothing

What to do

  1. Download sciprog-ds-2020-11-02-exam.zip and extract it on your desktop.

  2. Rename sciprog-ds-2020-11-02-FIRSTNAME-LASTNAME-ID folder: put your name, lastname an id number, like sciprog-ds-2020-11-02-john-doe-432432

From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.

  1. Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.

  2. When done:

  • if you have unitn login: zip and send to examina.icts.unitn.it/studente

  • If you don’t have unitn login: tell instructors and we will download your work manually

Part A - Galactic Love

Open Jupyter and start editing this notebook exam-2020-11-02.ipynb

Since today is a pseudo-exam, you are going to do pseudo-science!

The company Astro Logic provides horoscopes to thousands of loyal customers, who each day require a number of divinitions. The most requested is whether or not they should engage in love affairs with a potential partner, who is chosen according to rigourous criteria like his/her astrological sign. You are then hired to devise a fancy visualization which given two astrological signs and their love compatibility, displays the constellations of their signs close when the the compatibility is high and far away when compatibility is low.

parse_stars

Let’s start with real astronomical data. You are given a database of constellations called stars.csv (we slightly tweaked it for this occasion - original data source: Space Telescope Science Institute)

[1]:
import pandas as pd

stars_df = pd.read_csv('stars.csv', encoding='UTF-8')
stars_df[0:32]
[1]:
constellation type ra dec description
0 Andromeda 0 3717 2539 move gamma 1
1 Andromeda 1 2091 2137 draw beta
2 Andromeda 1 1179 1851 draw delta
3 Andromeda 1 251 1745 draw alpha
4 Andromeda 0 1716 1405 move eta
5 Andromeda 1 1420 1456 draw zeta
6 Andromeda 1 1156 1758 draw epsilon
7 Andromeda 1 1179 1851 draw delta
8 Andromeda 1 1106 2023 draw pi
9 Andromeda 1 512 2320 draw theta
10 Andromeda 1 42544 2596 draw iota
11 Andromeda 1 42612 2660 draw kappa
12 Andromeda 1 42526 2787 draw lambda
13 Andromeda 0 42544 2596 move iota
14 Andromeda 1 41457 2539 draw omicron
15 Andromeda 0 1106 2023 move pi
16 Andromeda 1 2091 2137 draw beta
17 Andromeda 1 1702 2309 draw mu
18 Andromeda 1 1494 2464 draw nu
19 Andromeda 1 2085 2834 draw phi
20 Andromeda 1 2939 2917 draw 51
21 Andromeda -1 0 0 NaN
22 Antlia 0 17077 -2157 move epsilon
23 Antlia 2 18814 -1864 dotted alpha
24 Antlia 2 19701 -2228 dotted iota
25 Antlia -1 0 0 NaN
26 Apus 0 26635 -4742 move alpha
27 Apus 2 29803 -4733 dotted gamma
28 Apus 2 30092 -4651 dotted beta
29 Apus 2 29410 -4721 dotted delta 1
30 Apus 2 29803 -4733 dotted gamma
31 Apus -1 0 0 NaN

You will have to parse it so to obtain a dictionary which maps each constellation to its stars, expressed as a list of lists of points type and coordinates.

Since later we will need to show points in a 2d chart, you will have to transform the coordinates obtained from the data (right ascension and declination in degrees) as follows:

\(\large x = \frac{15}{1800}ra\)

\(\large y = \frac{dec}{60}\)

Show solution
[2]:
import csv

def parse_stars(filename):
    raise Exception('TODO IMPLEMENT ME !')

stars_db = parse_stars('stars.csv')

You can find the complete output in expected_stars_db.py

Excerpt:

{'Andromeda': [
                  [0, 30.974999999999998, 42.31666666666666],
                  [1, 17.425, 35.61666666666667],
                  [1, 9.825000000000001, 30.849999999999998],
                  [1, 2.091666666666667, 29.083333333333332],
                  [0, 14.3, 23.416666666666668],
                  [1, 11.833333333333332, 24.266666666666666],
                  [1, 9.633333333333333, 29.3],
                  [1, 9.825000000000001, 30.849999999999998],
                  [1, 9.216666666666667, 33.71666666666667],
                  [1, 4.266666666666667, 38.666666666666664],
                  [1, 354.5333333333333, 43.266666666666666],
                  [1, 355.09999999999997, 44.333333333333336],
                  [1, 354.3833333333333, 46.45],
                  [0, 354.5333333333333, 43.266666666666666],
                  [1, 345.475, 42.31666666666666],
                  [0, 9.216666666666667, 33.71666666666667],
                  [1, 17.425, 35.61666666666667],
                  [1, 14.183333333333334, 38.483333333333334],
                  [1, 12.45, 41.06666666666666],
                  [1, 17.375, 47.233333333333334],
                  [1, 24.491666666666667, 48.61666666666667],
                  [-1, 0.0, 0.0]
               ],
 'Antlia':   [
                  [0, 142.30833333333334, -35.95],
                  [2, 156.78333333333333, -31.066666666666666],
                  [2, 164.175, -37.13333333333333],
                  [-1, 0.0, 0.0]
             ],
  .
  .
  .
}

plot_stars 1

Write a function plot_stars to plot constellations

WARNING: DO NOT use GraphViz!

Even if we are making plots which look like networks, for these visualizations you just need basic matplotlib (and some creativity ;-)

WARNING: for now, ignore the new_center parameter

A point type can either be:

  • 0: start a new line not connected with the previous one

  • 1: connect previous point with a straight segment

  • 2: connect previous point with a dotted segment (draw it with linestyle=':' parameter)

  • -1: last point, ignore

Available colorschemes are 'M', 'F', or 'R' (red)

  • to set a black background, set plt.rcParams['axes.facecolor'] = 'black'

  • to get a nice glowing effect for the lines, draw twice: once with a thick line and dark color, and once with a thin line with a bright color. You can find the colors in color_schemes. To set them in plt.plot call, use linewidth (sets width in pixels) and color parameter, note color takes a single parameter

  • draw stars as white dots, setting markersize=6

Show solution
[3]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

color_schemes = {
    'M': ('blue', '#039dfc'),
    'F': ('purple', 'pink'),
    'R': ('darkred', 'red')
}

def plot_stars(constellation_name, color_scheme, stars, new_center=None):
    raise Exception('TODO IMPLEMENT ME !')

from pprint import pprint
pprint(stars_db['Libra'])
plot_stars('Libra', 'M', stars_db)
[[0, 226.01666666666665, -25.266666666666666],
 [1, 222.71666666666667, -16.03333333333333],
 [1, 229.25, -9.366666666666667],
 [1, 233.875, -14.783333333333333],
 [1, 222.71666666666667, -16.03333333333333],
 [0, 233.875, -14.783333333333333],
 [1, 234.25, -28.133333333333333],
 [1, 234.65833333333333, -29.766666666666666],
 [-1, 0.0, 0.0]]
../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_16_1.png
[4]:
stars_db['Cancer']   # has type-2  dotted points
[4]:
[[0, 131.66666666666669, 28.75],
 [2, 130.8166666666667, 21.466666666666665],
 [2, 131.16666666666666, 18.15],
 [2, 134.61666666666667, 11.85],
 [0, 131.16666666666666, 18.15],
 [2, 124.125, 9.183333333333334],
 [-1, 0.0, 0.0]]
[18]:
plot_stars("Cancer", 'F', stars_db)
../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_18_0.png
[6]:
plot_stars("Cygnus", 'M', stars_db)  # mixed segment types
../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_19_0.png

plot_stars 2 - new_center

Change the previous function plot_stars so it accepts a new argument new_center, which is either None or a tuple of coordinates where the constellation should be centered:

  • be precise in determining the boundaries of the constellation

  • DO NOT assume the constallation has a fixed width nor height (so no constants in code!)

[7]:
fig = plt.figure(figsize=(30,7))
plt.xlim(0,360)
plt.ylim(-40,40)
plot_stars('Gemini', 'F',stars_db, new_center=None)  # no translation
../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_21_0.png
[8]:
fig = plt.figure(figsize=(30,7))
plt.xlim(0,360)
plt.ylim(-40,40)
plot_stars('Gemini', 'F',stars_db, new_center=(300, -20))  # centered in 300, -20
../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_22_0.png

parse_zodiac

You are given a file zodiac.csv. For each sign, the table contains astrological information and affinity with other signs, expressed as a relation matrix:

[9]:
import pandas as pd
df = pd.read_csv('zodiac.csv', encoding='UTF-8')
df[:4]
[9]:
Constellation House Glyph Symbol Dates Element Quality Ruling Planet Day/Night Aries ... Gemini Cancer Leo Virgo Libra Scorpius Sagittarius Capricornus Aquarius Pisces
0 Aries 1 ♈︎ Ram 21 March\n-\n20 April Fire Cardinal Mars Day NaN ... 4.0 NaN 5.0 NaN NaN NaN 5.0 NaN 4.0 NaN
1 Taurus 2 ♉︎ Bull 21 April\n-\n21 May Earth Fixed Venus Night NaN ... NaN 4.0 NaN 5.0 NaN NaN NaN 5.0 NaN 4.0
2 Gemini 3 ♊︎ Twins 22 May\n-\n21 June Air Mutable Mercury Day 4.0 ... NaN NaN 4.0 NaN 5.0 NaN NaN NaN 5.0 NaN
3 Cancer 4 ♋︎ Crab 22 June\n-\n21 July Water Cardinal Moon Night NaN ... NaN NaN NaN 4.0 NaN 5.0 NaN NaN NaN 5.0

4 rows × 21 columns

Parse the table so to get a a dictionary of dictionaries, with some selected data:

  • affinities are in the scale 1-5, normalize them to floats 0.0-1.0

  • dates contain \n , normalize them so to have dates separated by a dash as in 21 March-20 April

NOTE: To parse the file, a csv.reader is sufficient, it’s not necessary to use pandas - even if data seem to span multiple lines because of the \n in dates, note they are bounded by " so rows will be correctly parsed by csv.reader

You can find the complete output in expected_zodiac_db.py

{
 'Aquarius': {
              'affinities': {
                              'Aries': 0.8,
                              'Gemini': 1.0,
                              'Libra': 1.0,
                              'Sagittarius': 0.8
                             },
              'dates': '21 January-18 February',
              'glyph': '♒︎',
              'house': 11
             },
 'Aries':    {
              'affinities': {
                              'Aquarius': 0.8,
                              'Gemini': 0.8,
                              'Leo': 1.0,
                              'Sagittarius': 1.0
                            },
              'dates': '21 March-20 April',
              'glyph': '♈︎',
              'house': 1
             },
  .
  .
  .
}
Show solution
[10]:
import csv

def parse_zodiac(filename):
    raise Exception('TODO IMPLEMENT ME !')

zodiac_db = parse_zodiac('zodiac.csv')

from pprint import pprint
#pprint(zodiac_db, width=100)
assert zodiac_db['Aries']['dates'] == '21 March-20 April'
assert zodiac_db['Aries']['affinities'] == {'Aquarius': 0.8, 'Gemini': 0.8, 'Leo': 1.0, 'Sagittarius': 1.0}
assert zodiac_db['Aries']['glyph'] == '♈︎'
assert zodiac_db['Aries']['house'] == 1
assert zodiac_db['Gemini']['dates'] == '22 May-21 June'
assert zodiac_db['Gemini']['affinities'] == {'Aquarius': 1.0, 'Aries': 0.8, 'Leo': 0.8, 'Libra': 1.0}
assert zodiac_db['Gemini']['glyph'] == '♊︎'
assert zodiac_db['Gemini']['house'] == 3

plot_love

In stars.csv we inserted the special (fake!) constellation of 'Love': given the importance, we placed it at the center of the galaxy, positioned at x=180 degrees and y=0. If you try to plot it now, you should get something like this:

[11]:
# 'Aries',   'Taurus',   'Gemini',      'Cancer',      'Leo',      'Virgo',
# 'Libra',  'Scorpius',  'Sagittarius', 'Capricornus', 'Aquarius', 'Pisces'

fig = plt.figure(figsize=(30,7))
plt.xlim(0,360)
plot_stars('Gemini','F', stars_db)
plot_stars('Aquarius','M', stars_db)
plot_stars('Love','R', stars_db)    # fake!
../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_31_0.png

Given two astrological signs, place them on the same y=0 axis as the heart and make them symmetrically closer or farther from it according to their astrological affinity, also displaying their name and astrological glyph:

  • REMEMBER title and xlabels !

  • you can reuse previosly defined plot_stars function

  • constellations x centers should go from 50 to 150 degrees (and symmetrically, from -50 to -150)

  • BUT you will have to display reversed ticks: 100 50 0 for positive (and symmetrically 0 50 100 for negative)

For drawing text:

  • For increasing text size in title, xticks, xlabel, text calls, you can use fontsize=20 parameter (for glyphs you will need a bigger number)

  • for text inside the chart use use plt.text(x,y,"some text")

  • the glyph must be drawn bigger than the sign name, so you will need a separate call to plt.text

Show solution
[12]:
def plot_love(f_sign, m_sign, stars, zodiac):

    fig = plt.figure(figsize=(30,7))  # 30 inches large by 7 high
    plt.xlim(-175,175)

    raise Exception('TODO IMPLEMENT ME !')

plot_love('Gemini','Aquarius', stars_db, zodiac_db)  # 1.0 affinity
../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_36_0.png
[13]:
plot_love('Gemini','Leo', stars_db, zodiac_db)  # 0.8 affinity

../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_37_0.png
[14]:
plot_love('Gemini','Taurus', stars_db, zodiac_db) # 0.0 affinity
../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_38_0.png
[15]:
plot_love('Taurus','Capricornus', stars_db, zodiac_db)  # 1.0 affinity
../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_39_0.png
[16]:
plot_love('Leo','Libra', stars_db, zodiac_db)  # 0.8 affinity
../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_40_0.png
[17]:
plot_love('Taurus','Scorpius', stars_db, zodiac_db)  # 0.0 affinity
../../../_images/exams_2020-11-02_solutions_exam-2020-11-02-sol_41_0.png
[ ]: