Tuesday, 26 March 2019

Jupyter Notebooks


One of the first courses I studied on FutureLearn was "Learn to Code for Data Analysis". It used the Jupyter Notebook to allow the manipulate large amounts of data.

It uses a paradigm similar to spreadsheets with macros/scripting languages but inverts the emphasis. Normally with a spreadsheet, you start with the sheet of data and that is what you see. You then run some code against the data and return to the sheet.

With Jupyter Notebooks you do not look at the (raw) data but manipulate it in the background, you do not even need to look at the raw data (except during the development process).

The language used to manipulate the data is Python. As it is Python you can use the same code that you would use in a stand alone program in your notebook. If you have ever had to think which language you are using (or the version of the language you are using - such as Visual Basic, VB.Net or VBA), this is a great help.

I am not going to cover the installation, there is an installation guide here: 

You can export the results into various forms, including static HTML pages.

A simple demonstration of the power of Jupyter Notebooks

In an earlier posting, I showed a simple Python webscraping function to obtain historical currency exhange rates from the US Federal Reserve. This used a number of standard Python libraries.

Now, if you wanted to undertake analysis of exchange rates over time, you could build upon that Python code, but Jupyter Notebooks offers  a simpler way of manipulating the data. You can just use the Python code and the associated libraries.

I exported the results of the Jupyter Notebook as HTML and copied it into a post on this blog.

If you want to try it yourself, just copy and paste the cell contents into cells on your own Jupyter Notebook.

References




Historical Currency Exchange Rates from the US Federal Reserve (Python)

Sometimes you want historical exchange rates, either to track the values over time, or the value at a particular time.

The US Federal Reserve has information on exchange rates, for example Sterling to US Dollars: https://www.federalreserve.gov/releases/h10/hist/dat00_uk.htm

Note that the direction of the exchange is not constant for all currencies.

Examining the structure of the web page showed the table  containing the data is identified by it having a class of "pubtables".

LXML is used to take the table data and allow it to be searched using XPATH.

The program outputs the data as a CSV file.

# Function to extract exchange rates from US Federal Reserve
from lxml import html
import requests
import collections
import csv
from datetime import datetime

url= "https://www.federalreserve.gov/releases/h10/hist/dat00_uk.htm"

def get_US_exchange_rates(aURL):
    exchangerates=collections.OrderedDict()
# Get the page from the URL
    print("Get page")
    page = requests.get(aURL)
# Make an HTML tree from the text
    print("Make HTML page")
    tree = html.fromstring(page.content)
# The data is in a table of class="pubtables"
    r=tree.xpath("//table[@class='pubtables']/tr")
    print("Iterate over xpath")
    for x in r:
        if x[1].text_content().strip()!="ND":
            exchangerates[datetime.strptime(x[0].text_content().strip(),"%d-%b-%y")]=float(x[1].text_content().strip() )  
    print("Done")
    return exchangerates


# Testing code 
if __name__ == "__main__":
    print ('Exchange Rates:')
    if True:
        c=get_US_exchange_rates(url)
        with open('exchange.csv', 'w') as csvfile:
            exchwriter=csv.writer(csvfile,delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
            for x in c:
                print(x,c[x])
                exchwriter.writerow([x,c[x]])