Downloading FTSE Data From Yahoo! and Quandl using Alteryx

Following on from another post on downloading Quandl into Tableau via Python and the Data Extract API, I wanted to try downloading the current components of the FTSE 100, and then for each one download the history of prices using Alteryx.

Yahoo! finance has a web page with the members of the index:

image_thumb31

The list of member is split over 3 pages. By default, Alteryx will automatically encode the URL, so I needed to replace the encoded %5E back to ^. You can change the configuration so that it doesn’t encode URLs if you prefer. I put the list of URLs into a Text Input and fed this into a download tool. Following the download, you end up with the URLs, Data and Headers. Finally, I used as Select tool to pick out just the downloaded HTML data.

image_thumb6

The next step is to generate the list of tickers. As there is a link on each on tickers, I used a regex tool to match hyperlinks starting with ‘/q?s=’. The regex tool in tokenize mode will extract the matches and put them into multiple rows. The regular expression I used was ‘href=”/q\?s=([^”]*)”’. I then filtered out the links which were to the FTSE index itself. The flow produced the following output:

image_thumb8

Quandl have a simple URL based API for retrieving data from Yahoo! finance. For ARM.L, the URL to retrieve a price history is https://www.quandl.com/api/v1/datasets/YAHOO/L_ARM.csv. I used a set of simple formulae to transform the ticker list into a these URLs:

image

One additional thing for Quandl is that you need to use an API key to be allowed to increase the throttle rate at which you can download data. They allow you to pass this as part of the URL. The url becomes , where xxxxx if the API key. If you register with the site then you can get an API key. As always when working with keys I didn’t want to put this into the workflow itself (and then hence into source control), fortunately Alteryx has a simple solution using the Windows registry. I stored the key within registry and then used the ReadRegistryString. This adjusts the URL formula to be:

https://www.quandl.com/api/v1/datasets/YAHOO/”+%5BExch%5D+”_”+%5BCode%5D+”.csv?auth_token=”+ReadRegistryString(‘HKEY_CURRENT_USER\Software\JDunkerley’, ‘QuandlKey’)

This worked for the vast majority of the tickers. Some of the series do not appear to be loaded into the Quandl YFinance database. For example, I cannot find BT-A.L within Quandl. What I decided to do was to revert to Yahoo finance directly for those tickers which failed. Quandl pages have a validate url which reverts back to the original source. For ARM.L, this is:

http://ichart.finance.yahoo.com/table.csv?s=ARM.L&d=31&e=12&f=2039&g=d&a=3&b=7&c=1902&ignore=.csv

This is reasonably easy to create as well. Taking the cases where the download from Quandl fails (determined based on when the Download Headers do not contain ‘HTTP/1.1 200 OK’), I then create a Yahoo URL and download direct from there. Unioning these results with those from Quandl, to produce the complete set of data for all tickers within the FTSE 100:

image

After this I used the methods from the previous post on parsing CSV files to read the data into a single Alteryx DB file. The only little catch is that Adjusted_Close is Adj_Close in the files direct from Yahoo. This can be easily fixed within the flow. The output data looks like:

image

The final workflow is in my Alteryx GitHub project or here:

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s