Scientific Programming Lab

Data Science Master @University of Trento - AA 2020/21


Teaching assistant: David Leoni david.leoni@unitn.it website: davidleoni.it

This work is licensed under a Creative Commons Attribution 4.0 License CC-BY


Timetable and lecture rooms

Due to the current situation regarding the Covid-19 pandemic, practicals will take place ONLINE this year. They will be held on Mondays from 14:30 to 16:30 and on Wednesdays from 11:30 to 12:30.

Practicals will use the Zoom platform (https://zoom.us/). The link for the connection will be published on the practical page of this site a few minutes before the start of each session.

This first part of the course will tentatively run from Wednesday, September 23rd, 2020 to Monday, November 2nd, 2020.

Moodle

On the Moodle page of the course you can find announcements and videos of the lectures.

News

  • Thursday 22 October: A TUTORING SERVICE for data science students has been set up, see the announcement on Moodle

WARNING: Part A of this website is being gradually improved and transferred to en.softpython.org

Keep an eye on the news to see what has been transferred so far. Whenever a page is moved, I will replace the old page with a link.

MOVED SO FAR:

12 September 2020:

Old news

Slides

See Slides page

Office hours

To schedule a meeting, see here

Labs timetable

For the regular labs timetable please see:

Tutoring

A tutoring service for Scientific Programming has been set up, see the announcement on Moodle

Please take advantage of it as much as possible so you don’t end up writing random code at the exam!

Resources

Part A Resources

Part B Resources

Editors

  • Visual Studio Code: the official course editor.

  • Spyder: Seems like a fine and simple editor

  • PyCharm Community Edition

  • Jupyter Notebook: Nice environment to execute Python commands and display results like graphs. It also lets you include documentation in Markdown format

  • JupyterLab: the next and much better version of Jupyter, although as of Sept 2018 it is still in beta

  • PythonTutor, a visual virtual machine (very useful! can also be found in examples inside the book!)

Further readings

  • Rule based design by Lex Wedemeijer, Stef Joosten, Jaap van der Woude: a very readable text on how to represent information using only binary relations with boolean matrices (not a mandatory read, it only gives context and practical applications for some of the material on graphs presented during the course)

Exams

Exam dates: see Moodle

Past exams

Exam modalities

Practice with the lab computers!!

The exam will be in a Linux Ubuntu environment (even when online) - so learn how to browse folders there and, if in presence, also how to type on noisy lab keyboards :-)

Sciprog exams are open book. You will only be given access to this documentation (if in presence you can also bring a printed version of the material listed below):

Expectations

This is a data science master, so you must learn to be a proficient programmer - no matter the background you have.

Exercises proposed during labs are an example of what you will get during the exam, BUT there is no way you can learn the required level of programming only by doing exercises on this website or softpython. Fortunately, since Python is so trendy nowadays, there are a zillion good resources to hone your skills - you can find some in Resources

To successfully pass the exam, you should be able to quickly solve exercises proposed during labs with difficulty ranging from ✪ to ✪✪✪ stars. By quickly I mean that in half an hour you should be able to solve a three-star exercise ✪✪✪. Typically, an exercise will be divided into two parts: the first easy ✪✪ to introduce you to the concept, and the second more difficult ✪✪✪ to see if you really grasped the idea.

Before getting scared, keep in mind I’m most interested in your capability to understand the problem and find your way to the solution. In real life, senior colleagues often give junior programmers functions to implement based on specifications, possibly along with tests to make sure the implementation meets the specifications. Also, programmers copy code all of the time. This is why during the exam I give you tests for the functions to implement so you can quickly spot errors, and also let you use the course material (see exam modalities).

Part A expectations: performance does not matter: if you are able to run the required algorithm on your computer and the tests pass, it should be fine. Just be careful when given a 100 MB file: in that case bad code may sometimes lead to very slow execution and/or clog the memory.

In particular, in lab computers the whole system can even hang, so watch out for errors such as:

  • an infinite while loop which keeps adding new elements to lists - whenever possible, prefer for loops

  • scanning a big pandas dataframe with a for in loop instead of using pandas native transformations
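To make the first pitfall concrete, here is a minimal sketch (the function names are made up for illustration, they do not come from the lab material). Both functions collect the even numbers of a list. In the while version, forgetting the index update would keep appending the same element until memory runs out, while the for version cannot run past the end of the input.

```python
def collect_even_while(numbers):
    """Risky style: a while loop driven by a manually updated index."""
    result = []
    i = 0
    while i < len(numbers):
        if numbers[i] % 2 == 0:
            result.append(numbers[i])
        i += 1  # forget this line and the loop never ends
    return result


def collect_even_for(numbers):
    """Safer style: the for loop stops by itself at the end of the input."""
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n)
    return result


print(collect_even_for([3, 8, 5, 10]))  # [8, 10]
```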

Part B expectations: performance does matter (e.g. finding the diagonal of an \(n \times n\) matrix should take a time linearly proportional to \(n\), not \(n^2\)). Also, in this part we will deal with more complex data structures. Here we generally follow the Do It Yourself method, reimplementing things from scratch. So please, use your brain:

  • if the exercise is about sorting, do not call Python's .sort() method !!!

  • if the exercise is about data structures, and you are thinking about converting the whole data structure (or part of it) into Python lists, first think about the computational cost of such a conversion, and second, do ask the instructor for permission.
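As an illustration of the diagonal example mentioned for Part B, the sketch below compares two ways of extracting the diagonal of a square matrix stored as a list of lists. The function names and the list-of-lists representation are assumptions made for this example only.

```python
def diag_quadratic(mat):
    """Visits every cell: about n*n steps for an n x n matrix."""
    ret = []
    for i in range(len(mat)):
        for j in range(len(mat)):
            if i == j:
                ret.append(mat[i][j])
    return ret


def diag_linear(mat):
    """Visits only the n cells on the diagonal: about n steps."""
    return [mat[i][i] for i in range(len(mat))]


m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(diag_linear(m))  # [1, 5, 9]
```

Both functions return the same result, but only the second has the linear running time expected in Part B.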

Grading

Taking part in an exam erases *any* grade you had before (except for Midterm B, which of course doesn’t erase Midterm A taken in the same academic year)

Correct implementations: Correct implementations with the required complexity grant you full grade.

Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. an array of fixed size 2), adding a comment that explains why you did so.
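A partial solution of this kind might look like the sketch below. The exercise itself (sorting a list without built-in helpers) and the name my_sort are hypothetical, chosen only to show how to document the subcase you managed to handle.

```python
def my_sort(lst):
    """Hypothetical exercise: return a NEW list with the elements of lst
    in ascending order, without calling .sort() or sorted().
    """
    # PARTIAL SOLUTION: I could not work out the general algorithm in time,
    # so I only handle lists with at most 2 elements.
    if len(lst) <= 1:
        return list(lst)
    if len(lst) == 2:
        a, b = lst
        return [a, b] if a <= b else [b, a]
    raise NotImplementedError("only lists with at most 2 elements are handled")


print(my_sort([5, 2]))  # [2, 5]
```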

When all tests pass you should hopefully get the full grade (although tests are never exhaustive!), but if the code is not correct you will still get a percentage. The percentage is of course subjective, and may depend on unfathomable factors such as the quantity of jam I found in the morning croissant that particular day. Jokes aside, the percentage you get is usually inversely proportional to the amount of time I spend fixing your algorithm.

After exams I publish the code with corrections. If all tests pass and you still don’t get a 100% grade, you may come to my office to question the grade. If tests don’t pass I’m less available for debate - I don’t much like complaints such as ‘my colleague made the same error as me and got more points’ - even worse is complaining without having read the corrections.

Exams FAQ

First and foremost, I’m not the boss here, please refer to the exam rules explained in Andrea Passerini’s slides.

I add here some further questions I sometimes receive - luckily, answers are pretty easy to remember.

I did well in part A/B, can I do only part B/A at the next exam?

No way.

Can I have an additional retake just for me?

No way.

Can I have an additional oral exam to increase the grade?

No way.

I have 7 + \(\sqrt{3}\) INF credits from a Summer School in Applied Calculonics, can I please take only Part B?

I’m not into credits engineering, please ask the administrative office and/or Passerini.

I have another request which does not concern corrections / possibly wrong grading

I’m not the boss, ask Passerini.

I’ve got 26.99 but this is my last exam and I really need 27 so I can get a good final grade for the master, could you please raise the grade by just that little 0.01?

Preposterous requests like this will be forwarded to our T-800 assistant, it’s very efficient.

Exams How To

Make sure all exercises at least compile!

Don’t leave duplicated code around!

If I see duplicated code, I don’t know what to grade, I waste time, and you don’t want me angry while grading.

Only implementations of provided function signatures will be evaluated !!

For example, if you are asked to implement:

def f(x):
    raise Exception("TODO implement me")

and you ship this code:

def my_f(x):
    # a super fast, correct and stylish implementation
    pass

def f(x):
    raise Exception("TODO implement me")

We will assess only the latter one f(x), and conclude it doesn’t work at all :P !!!!!!!

Helper functions

Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined like my_f here, it is ok:

# Not called by f, will get ignored:
def my_g(x):
    # bla
    pass

# Called by f, will be graded:
def my_f(y, z):
    # bla
    pass

def f(x):
    return my_f(x, 5)

How to edit and run

Look in Applications->Programming:

  • Part A: Jupyter: open Terminal and type jupyter notebook

  • Part B: open Visual Studio Code

If for whatever reason tests don’t work in Visual Studio Code, be prepared to run them in the Terminal.
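As a rough idea of what running tests in the Terminal involves, here is a minimal sketch based on Python's standard unittest module. The function double, the class DoubleTest and the file name in the final comment are hypothetical. The files and test classes you receive at the exam will differ.

```python
import unittest


def double(x):
    """Hypothetical function under test."""
    return x * 2


class DoubleTest(unittest.TestCase):

    def test_zero(self):
        self.assertEqual(double(0), 0)

    def test_positive(self):
        self.assertEqual(double(3), 6)


# If the code above were saved in a (hypothetical) file named double_test.py,
# the tests could be run from the Terminal, inside the folder holding the file:
#
#     python3 -m unittest double_test
```

The command prints a dot for each passing test and a traceback for each failing one, so you can spot errors even without the editor's test panel.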

PAY close attention to function comments!

DON’T modify function signatures! Just provide the implementation

DON’T change existing test methods. If you want, you can add tests

DON’T create other files. If you still do it, they won’t be evaluated

Debugging

If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.

Even if print statements are allowed, be careful with prints that might break your function! For example, avoid stuff like this:

x = 0
print(1/x)
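One way to stay safe is to print only plain values, without computing anything inside the print call. The sketch below shows the idea on a hypothetical function average, which is not part of the course material.

```python
def average(numbers):
    """Hypothetical function: mean of a list, 0.0 for an empty list."""
    # Safe: printing existing values cannot make the function fail.
    print("DEBUG numbers =", numbers, "len =", len(numbers))

    # Risky alternative, left commented out: on an empty list the print
    # itself would raise ZeroDivisionError before the check below runs.
    # print("DEBUG mean =", sum(numbers) / len(numbers))

    if len(numbers) == 0:
        return 0.0
    return sum(numbers) / len(numbers)


print(average([2, 4]))  # 3.0
```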

Acknowledgements

This website and the related courses were funded mainly by the Department of Information Engineering and Computer Science (DISI), University of Trento, and also by the Mathematics and CIBIO departments.


I also wish to thank Dr. Luca Bianco for the introductory material on Visual Studio Code and Python, and Dr. Alberto Montresor for having introduced me to the first labs and slides on graphs.

All the material in this website is distributed with license CC-BY 4.0 International Attribution creativecommons.org/licenses/by/4.0/deed.en

Basically, you can freely redistribute and modify the content, just remember to cite University of Trento and the present author.

Technical notes: all website pages are easily modifiable Jupyter notebooks, which were converted to web pages with NBSphinx and the Jupman template. Text sources are on Github at github.com/DavidLeoni/sciprog-ds