Tuesday, 26 March 2019

Jupyter Notebooks


One of the first courses I studied on FutureLearn was "Learn to Code for Data Analysis". It used the Jupyter Notebook to allow you to manipulate large amounts of data.

It uses a paradigm similar to spreadsheets with macros/scripting languages but inverts the emphasis. Normally with a spreadsheet, you start with the sheet of data and that is what you see. You then run some code against the data and return to the sheet.

With Jupyter Notebooks, you manipulate the data in the background and do not need to look at the raw data at all, except during the development process.
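As a rough illustration of that workflow, here is a minimal sketch of a notebook cell that uses only the standard library. The `raw_csv` text, its column names and its values are made up for the example and are not data from the course. The rows are loaded into memory and only a summary is displayed.

```python
import csv
import io
import statistics

# Made-up sample data standing in for a much larger file (illustrative only).
raw_csv = """city,temperature
London,11.5
Paris,13.0
Madrid,18.5
Rome,17.0
"""

# Load every row into memory; nothing is displayed at this point.
rows = list(csv.DictReader(io.StringIO(raw_csv)))
temperatures = [float(row["temperature"]) for row in rows]

# Only the summary is shown, not the underlying rows.
summary = {
    "count": len(temperatures),
    "mean": statistics.mean(temperatures),
    "max": max(temperatures),
}
print(summary)
```

During development you might print `rows` to check the load worked, then remove that line once the cell behaves as expected.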

The language used to manipulate the data is Python. As it is Python, you can use the same code in your notebook that you would use in a standalone program. If you have ever had to stop and think about which language you are using (or which version of the language you are using - such as Visual Basic, VB.Net or VBA), this is a great help.
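To show what that reuse can look like, here is a small sketch. The function name `percent_change` and the figures passed to it are hypothetical, chosen only for illustration. The same text can be saved as a `.py` file and run as a script, or pasted unchanged into a notebook cell.

```python
def percent_change(old, new):
    """Return the percentage change from old to new."""
    return (new - old) / old * 100


# In a standalone script this guard runs the demo. In a notebook cell
# __name__ is also "__main__", so the cell behaves the same way.
if __name__ == "__main__":
    print(round(percent_change(1.25, 1.30), 2))
```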

I am not going to cover the installation; there is an installation guide here: 

You can export the results into various forms, including static HTML pages.

A simple demonstration of the power of Jupyter Notebooks

In an earlier posting, I showed a simple Python web-scraping function to obtain historical currency exchange rates from the US Federal Reserve. This used a number of standard Python libraries.

Now, if you wanted to undertake analysis of exchange rates over time, you could build upon that Python code, but Jupyter Notebooks offer a simpler way of manipulating the data. You can just use the Python code and the associated libraries.
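As a hedged sketch of the kind of analysis this makes easy, the cell below averages exchange rates by month. It does not reproduce the scraping function from the earlier post: the `daily_rates` values are made up, and `monthly_average` is a hypothetical helper written for this example using only the standard library. A real notebook might well use other libraries for the same job.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Made-up (date, rate) pairs standing in for scraped values (illustrative only).
daily_rates = [
    (date(2019, 1, 2), 1.25),
    (date(2019, 1, 3), 1.27),
    (date(2019, 2, 1), 1.30),
    (date(2019, 2, 4), 1.32),
]


def monthly_average(rates):
    """Group (date, rate) pairs by (year, month) and average each group."""
    by_month = defaultdict(list)
    for day, rate in rates:
        by_month[(day.year, day.month)].append(rate)
    # Sort the keys so the months come out in chronological order.
    return {month: mean(values) for month, values in sorted(by_month.items())}


averages = monthly_average(daily_rates)
for (year, month), value in averages.items():
    print(f"{year}-{month:02d}: {value:.4f}")
```

Swapping `daily_rates` for the output of your own data-gathering code is the only change needed to run this against real figures.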

I exported the results of the Jupyter Notebook as HTML and copied it into a post on this blog.

If you want to try it yourself, just copy and paste the cell contents into cells in your own Jupyter Notebook.

References