Geocoding and Finding Nearest Station with Google Web Services

This post started as a question on the Alteryx community about finding the nearest underground (subway if you prefer!) station to an address. I don’t have much experience working with addresses and map data, but thought there was probably a way to do it by leveraging the APIs Google Maps provides.

This feels very much like “standing on the shoulders of giants” that Scott Hanselman talks of in his blog post.

Geocoding

First task was to convert the address into a longitude and latitude. This process is called geocoding. Google Maps has a whole range of different APIs, one of which is Geocoding. The geocoding API has a fairly straightforward syntax, called using a standard HTTP GET request:

https://maps.googleapis.com/maps/api/geocode/json?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&key=YOUR_API_KEY

The API requires an API key. You can sign up for free here. There is a limit of 2,500 requests for free on the standard plan, but should you need more you can enable pay-as-you-go pricing at $0.50 per 1,000 extra requests. If you don’t have a valid API key you will get a request denied message (although oddly not an HTTP error!):

image
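The same steps can be sketched outside Postman. Below is a minimal Python sketch of building the geocode request URL and pulling the location out of a parsed response; the HTTP call itself is omitted, and the sample response is a trimmed version of the one shown later in this post:

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode_url(address, api_key):
    """Build the GET URL for the Google geocoding API."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})

def extract_location(response_json):
    """Pull the lat/lng of the first match out of a geocode response."""
    if response_json.get("status") != "OK":
        raise ValueError("Geocoding failed: " + response_json.get("status", "?"))
    loc = response_json["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

# Trimmed version of the response shown below:
sample = {"status": "OK",
          "results": [{"geometry": {"location": {"lat": 54.978252, "lng": -1.61778}}}]}
print(extract_location(sample))  # (54.978252, -1.61778)
```

In a real workflow you would feed `geocode_url(...)` to whatever HTTP client you prefer and pass the decoded JSON to `extract_location`.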

Having got an API key, I now wanted to experiment with the API and see what format the data came back in. There are various REST API testing tools out there (including just using a browser), but my favourite is Postman – it allows straightforward testing of these kinds of APIs and has a good JSON pretty printer. It also has a JavaScript testing engine inside it which allows you to do some simple scripting on the response. Finally, you can also use environment variables, so I can keep my API key in there and use it within the URI by writing {{ApiKey}}.

To set up the environment and variable in Postman, click the environment drop down and select manage environment:

image

Inside the new window you can create a new environment (I chose to call it GoogleApi) and then add a new key called ApiKey with a value set to the API key:

SNAGHTML3088185

Having created the environment, it should be available in the drop down on the right. Google’s geocoding API has a few different options. In this case I wanted a JSON format return and would be specifying an address. The request URL (using this environment variable) looks like:

https://maps.googleapis.com/maps/api/geocode/json?address=<Address>&key={{ApiKey}}

Taking a simple test case of looking for the address “Scott Logic, Newcastle”, running this query in Postman results in a JSON response like:

{
  "results": [
    {
      "address_components": [
        {
          "long_name": "Newcastle upon Tyne",
          "short_name": "Newcastle upon Tyne",
          "types": [
            "locality",
            "political"
          ]
        }, ...
      ],
      "formatted_address": "Newcastle upon Tyne, UK",
      "geometry": {
        "bounds": {
          "northeast": { "lat": 55.0453044, "lng": -1.5326051 },
          "southwest": { "lat": 54.9594399, "lng": -1.7810817 }
        },
        "location": { "lat": 54.978252, "lng": -1.61778 },
        "location_type": "APPROXIMATE",
        "viewport": {
          "northeast": { "lat": 55.0453044, "lng": -1.5326051 },
          "southwest": { "lat": 54.9594399, "lng": -1.7810817 }
        }
      },
      "partial_match": true,
      "place_id": "ChIJzWRvDH6FfUgRkWGncrBS4gs",
      "types": [ "locality", "political" ]
    }
  ],
  "status": "OK"
}

Some of the address components have been removed from the above; the full JSON file can be downloaded from here. There is a lot of information which can be read out of this response. The first block, address_components, tells us about the matched address, described as a set of fields:

Field Short Name Long Name
locality, political Newcastle upon Tyne Newcastle upon Tyne
postal_town Newcastle upon Tyne Newcastle upon Tyne
administrative_area_level_3, political Newcastle upon Tyne Newcastle upon Tyne
administrative_area_level_2, political Tyne and Wear Tyne and Wear
administrative_area_level_1, political England England
country, political GB United Kingdom

For my purpose, I only need the latitude and the longitude. These are stored within the geometry object as the location. The additional properties of the geometry object specify the region that appears on a Google map search.

I used Postman’s test framework to write a small block of JavaScript which stores these values as environment variables I can then use later:

var jsonData = JSON.parse(responseBody);
postman.setEnvironmentVariable("lat", jsonData.results[0].geometry.location.lat);
postman.setEnvironmentVariable("lng", jsonData.results[0].geometry.location.lng);

Having run this request the environment variables look like:

image

Finding The Nearest Station

So far we have converted the address into a latitude and longitude. The next step is to find the nearest points of interest on the map. Google has another REST API to return information around a GPS co-ordinate – their Places API. Again, it requires the same API key as above. In this case, the Place Search will allow you to find a list of the nearest stations.

The base URI is:

https://maps.googleapis.com/maps/api/place/nearbysearch/output?parameters

You can choose between JSON and XML as the output format (JSON is simpler to deal with, so I would recommend that). As before, you must provide an API key as the key argument. Unlike the geocoding API, it only takes a location as a latitude,longitude pair. Finally, you must specify either to rank the results by distance (rankby=distance) or to search within a specified radius (radius=10000 to search within 10km). If you are using the ranking mode, you are also required to provide either a type, keyword or name to search for.

To find the nearest station, I want to rank the results by distance from the GPS co-ordinate and specify that I am looking for a station. The type filter allows you to specify one of the types listed here – in this case I wanted subway_station. The URI using Postman’s variables is hence:

https://maps.googleapis.com/maps/api/place/nearbysearch/json?key={{ApiKey}}&location={{lat}},{{lng}}&rankby=distance&type=subway_station

You will receive back a JSON structure like:

{
  "html_attributions": [],
  "next_page_token": "CpQCBQEAAHI5uyVLuCCs3K37OvmGVgCNtscQ8uxa3xKJPNQ9mrFMxxwH0OxejgPfOyqZCNhrc6lidxbNyOjklu7fp71rBvbjqWazd2wfKG19ktEp6_QWuCaJPbYLRzpsJNNdiELV73BBw0oYWtHioJbNjNqPC6YsUrP_ByQXPBhiYN8VNI54ZJfHb5bwLSYuVBDwiJ9j4Vjf1vEqnPYnjyLr0Fmo0HDfJrDODQW2R7tgVau58EhpThj23Y-EYaC0ssTjsg3lltqgce1ujxb3Z6lf-TCXJAM9xs8tsBSA-Q8InQjxTbMHqUa5DvDUSi4l0-uHejdm1H6pEgYjdTJmRSnGhX-Z_H3ROBsGuWRPA7dDZ9LFxVZcEhBGeaTGsXmvk17sfFidGPCdGhTUCE975QsCFCutIUZJHXLkAZavdA",
  "results": [
    {
      "geometry": {
        "location": {
          "lat": 54.977481,
          "lng": -1.6138929
        }
      },
      "icon": "https://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png",
      "id": "d5b6ffa31de95679ef807796a9778a8a410a0418",
      "name": "Haymarket",
      "photos": [
        {
          "height": 3085,
          "html_attributions": [
            "<a href=\"https://maps.google.com/maps/contrib/107056009756478253671/photos\">Bocaj Nesnaj</a>"
          ],
          "photo_reference": "CoQBcwAAABS6JL5Gh0S6dRdHx01dkPO4OoxXwS2hBRD58yQjYrNubcroTz4x40mY32hEY0nrP1iBZowoP5FhHDM45WdNQiswcUoxE2mkXYxBoXnC99wxHaGJPQ4-pvMc-jCkH18JnjXJfXzv5dAoNtgWav9Pq0_OO3xFTTl8nQg75Ch_SI2oEhCXxxcDj98Bun0BMW2FJGy_GhSHErdUum1PWUpazHkeUJUBfKYrFQ",
          "width": 4896
        }
      ],
      "place_id": "ChIJDyP2_MtwfkgRDqPu-wmHBMA",
      "reference": "CmRdAAAA2nyuhb9XS9zptApVcc8D88IWc_zdZ-RuunDdkaVYC42U8n2hEuEF7nO4Ol0CWqzI13Nnt2iCn5Opgr9iHjGsOFCSbURd573RKgJWNaNw2VdKgdKIKXKWk8FTIFy2eFaAEhD76A2tt6zeX9bxOKOPUdw1GhSaGzIqZRA0-ZBQJ8lICKRFhlWYPQ",
      "scope": "GOOGLE",
      "types": [
        "subway_station",
        "transit_station",
        "point_of_interest",
        "establishment"
      ],
      "vicinity": "United Kingdom"
    }, ...
  ],
  "status": "OK"
}

The list of the nearest 20 subway stations is inside the results array, each with a name property and a geometry.location property. The results are in order of increasing distance from the original search location. The table below shows the results for Scott Logic, Newcastle:

name geometry.location.lat geometry.location.lng
Haymarket 54.977481 -1.6138929
Monument Metro Station 54.9736621 -1.6134493
St James 54.97408 -1.62099
Manors 54.97403 -1.60484
Jesmond 54.98308 -1.60548
Central Station 54.96912 -1.61615
West Jesmond 54.9936284 -1.6097669
Gateshead 54.9614035 -1.6038992
Byker 54.97615 -1.57957
Ilford Road 55.00017 -1.61085
Gateshead Stadium 54.95782 -1.58808
Chillingham Road 54.98283 -1.57169
South Gosforth 55.00614 -1.60815
Walkergate 54.98549 -1.55908
Longbenton 55.00881 -1.59158
Regent Centre 55.01172 -1.62204
Felling 54.95298 -1.5713
Four Lane Ends 55.01013 -1.57862
Wansbeck Road 55.0142 -1.63489
Fawdon 55.01374 -1.64462
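A table like the one above can be pulled out of the response with a few lines of code. A minimal Python sketch (the sample response is cut down to the first two stations, and the helper name is mine):

```python
def nearest_stations(places_json):
    """Extract (name, lat, lng) tuples from a Places nearby search
    response; with rankby=distance they arrive nearest-first."""
    out = []
    for r in places_json.get("results", []):
        loc = r["geometry"]["location"]
        out.append((r["name"], loc["lat"], loc["lng"]))
    return out

sample = {"status": "OK", "results": [
    {"name": "Haymarket",
     "geometry": {"location": {"lat": 54.977481, "lng": -1.6138929}}},
    {"name": "Monument Metro Station",
     "geometry": {"location": {"lat": 54.9736621, "lng": -1.6134493}}},
]}
print(nearest_stations(sample)[0][0])  # Haymarket
```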

You can download the Postman collection file I used to run these requests here.

Using HERE or Bing Maps

If you prefer not to use the Google Maps APIs, then both HERE and Bing Maps provide REST APIs.

HERE has a set of developer REST APIs, including a Geocoding API. In this case you must provide an app_id and an app_code. You can register for those here. There is a basic plan, which is free, that includes this API, though with a limited number of requests per month. The geocode request looks like:

https://geocoder.cit.api.here.com/6.2/geocode.json?app_id={{app_id}}&app_code={{app_code}}&searchtext=Newcastle,UK

Again, you can choose between JSON and XML. The JSON response is more complicated than in Google’s case, but the matches are found as an array of objects at Response.View.Result. Each object contains a Location with a NavigationPosition holding a Latitude and Longitude. I found the search was not as good at interpreting my input as Google’s.
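As a sketch, navigating the structure just described might look like this in Python. The exact nesting (View and NavigationPosition as arrays) is an assumption based on the description above, and the sample values are illustrative:

```python
def here_location(response_json):
    """Navigate Response.View[0].Result[0].Location.NavigationPosition[0];
    the exact nesting here is an assumption, not from HERE's docs."""
    result = response_json["Response"]["View"][0]["Result"][0]
    pos = result["Location"]["NavigationPosition"][0]
    return pos["Latitude"], pos["Longitude"]

sample = {"Response": {"View": [{"Result": [
    {"Location": {"NavigationPosition": [
        {"Latitude": 54.97784, "Longitude": -1.61292}]}}]}]}}
print(here_location(sample))  # (54.97784, -1.61292)
```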

I didn’t go any further with HERE, but there is a Places API which I am guessing would have the ability to find places of interest, such as a subway station. The most interesting feature that I saw within the HERE APIs was the ability to do a batch geocoding job.

Bing has a massive set of APIs you can use. Geocoding is done by the Locations service. As with both the other services, you need to obtain an API key. The instructions to get a key are here. Again there is a Basic key, which is free, that will cover the usage needed for geocoding within annual limits (125k requests!). The Locations service can again return either JSON (the default) or XML (use the output (o) query argument to specify it, &o=xml). You can provide what you want to find in the query (q) argument:

http://dev.virtualearth.net/REST/v1/Locations?q=Newcastle,UK&key={{ApiKey}}

The structure of the JSON is pretty similar to the results from HERE. The matching GPS co-ordinates are listed in the resourceSets array, inside a resources array, with a point object containing a pair of co-ordinates. Again, the structure feels more complicated than Google’s, and the search results seemed less in line with what I expected.
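Again as a sketch, the resourceSets/resources/point navigation described above could be written like this; the sample values are illustrative:

```python
def bing_location(response_json):
    """resourceSets -> resources -> point.coordinates, a [lat, lng] pair."""
    point = response_json["resourceSets"][0]["resources"][0]["point"]
    lat, lng = point["coordinates"]
    return lat, lng

sample = {"resourceSets": [{"resources": [
    {"point": {"coordinates": [54.97784, -1.61292]}}]}]}
print(bing_location(sample))  # (54.97784, -1.61292)
```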

Reproducing In Alteryx

As this all started as a question on the community, back to Alteryx. Fortunately, downloading and parsing JSON data is part of its bread and butter.

As I don’t like having API keys littering workflows (any more than I do having them in code), I like to put them in as workflow constants. If you are sharing the workflow even this can be a problem, in which case I would put them in a flat file or in the Windows registry. The constants can be accessed by going to the properties of the workflow:

2016-07-05_08-29-31

We can use a Text Input tool to contain the locations we want to search for, and then a Formula tool to construct the URL we need to request the GPS co-ordinates. After this, the Download tool and JSON Parse tool can be used to download and read the JSON response. Finally, some filtering on the parsed result will give you the Latitude and Longitude.

2016-07-05_08-28-57

Basically, we then repeat the above process, feeding in the latitude and longitude to create the required URL in another Formula tool, and then download and parse the result.

You can download the workflow from here.

Adjusting Alteryx Files For Different Versions…

Stealing this idea from Ken Black’s equivalent post on Tableau (I’m sure he won’t mind!).

As version 10.6 has just been released, I think it is worth publishing a little guide on how to make Alteryx files compatible with older versions! This comes up a lot on the community when answering questions and a user doesn’t have the latest version of the Designer.

There are various different file types Alteryx uses:

  • yxmd – Standard Workflow file
  • yxmc – Macro Workflow (all types)
  • yxwz – Analytic App (wz stands for wizard, I believe)
  • yxzp – Workflow Package
  • yxdb – Alteryx Database

The top three are all basically the same format – they are the XML configuration of the workflow. The first section of the XML looks like:

image

You can also view this within Alteryx Designer in the XML View:

image

The AlteryxDocument node is the same for Workflows, Macros and Analytic Apps. The yxmdVer attribute is the part of the file which tells Alteryx which version it came from. If you want the file to be able to be opened by an older version, you can change this number. The table below gives the version number you need for different Alteryx versions:

yxmdVer Alteryx Versions
9.5 9.5
10.0 10, 10.1
10.5 10.5, 10.6
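If you wanted to script the edit rather than do it by hand, a one-line regex substitution on the file text is enough. A minimal Python sketch (the function name and target version are illustrative):

```python
import re

def downgrade_yxmd(xml_text, target="10.0"):
    """Rewrite the yxmdVer attribute on the AlteryxDocument node.
    A regex keeps the rest of the file byte-for-byte unchanged."""
    return re.sub(r'yxmdVer="[^"]*"', 'yxmdVer="%s"' % target, xml_text, count=1)

doc = '<AlteryxDocument yxmdVer="10.5">...</AlteryxDocument>'
print(downgrade_yxmd(doc))  # <AlteryxDocument yxmdVer="10.0">...</AlteryxDocument>
```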

So what are the risks? Well, Alteryx themselves certainly don’t guarantee that the format in which a tool serialises its configuration will be consistent from version to version – though it normally is. The main issue is that the universe of tools is continually growing, and if the Designer can’t find a tool it will add an unknown tool:

image

So that’s the easy ones. What about packages? These are actually just zip files. If you change the extension to zip you can take a look inside:

image

It’s just a case of updating the files inside the zip and then renaming it back to yxzp.
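This can also be scripted: since a yxzp is a plain zip, the standard zipfile module can rewrite the workflow files inside it. A Python sketch (which extensions to touch, and the default target version, are my assumptions):

```python
import io
import re
import zipfile

WORKFLOW_EXTS = (".yxmd", ".yxmc", ".yxwz")

def downgrade_package(yxzp_bytes, target="10.0"):
    """Rewrite the yxmdVer attribute in every workflow file of a
    .yxzp package (which is just a zip archive); return the new bytes."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(yxzp_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.endswith(WORKFLOW_EXTS):
                text = re.sub(r'yxmdVer="[^"]*"', 'yxmdVer="%s"' % target,
                              data.decode("utf-8"), count=1)
                data = text.encode("utf-8")
            dst.writestr(info.filename, data)
    return out.getvalue()
```

Write the result out and rename it back to .yxzp afterwards.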

Finally, the database files. These have, in my limited Alteryx experience (I have only used it since about 9.1, I think), been a fixed format. Ned has a blog post which links to an open source reader they released for these files; I think as long as you aren’t using a data type that didn’t exist in an older version you are fine!

Update:

Alteryx ACE Mark Frisch (a.k.a. MarqueeCrew) put together a macro to allow you change the version of your workflow. You can download it here.

Creating user controlled groupings in a Weighted Median Alteryx Macro

There was an interesting question posted on the Alteryx community about which macro was best for calculating a weighted median. Adam Riley has a macro inside the CReW macros which does this, but I wanted to see if it was possible to create one allowing dynamic selection of the grouping (like in a Multi Row Formula tool). I thought I would document the process I went through.

All of the examples have been created in v10.5. Everything should work fine in older versions as well, but you will need to edit the yxmd or yxmc file and replace the version number at the top (use 10.0 for both v10 and v10.1).

Initial Computation

image

The calculation itself is fairly straightforward, taking the Wikipedia page’s example as a starting point:

    1. First Normalise the Weights so they total 1
    2. Sort into ascending Value order
    3. Calculate the Running Total Weight
    4. Filter to those greater than or equal to 0.5
    5. Select the first row and you will have the Weighted Median
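The five steps above can be sketched in a few lines of Python (this mirrors the workflow’s logic; it is not the workflow itself):

```python
def weighted_median(values, weights):
    """Follow the steps above: normalise the weights, sort by value,
    take a running total, and return the first value whose
    cumulative weight reaches 0.5."""
    total = float(sum(weights))
    pairs = sorted(zip(values, weights))   # step 2: ascending value order
    running = 0.0
    for value, weight in pairs:
        running += weight / total          # steps 1 and 3: normalised running total
        if running >= 0.5:                 # steps 4 and 5: first row at or past 0.5
            return value

print(weighted_median([3, 1, 2], [4, 1, 1]))  # 3
```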

This flow can be downloaded here.

Grouped Calculation

The next step was to change it so it computes the weighted median with a grouping, stealing Adam Riley’s macro sample data to start with. A few fairly straightforward changes and then we are done:

  • Add the grouping to the summarise tool
    image
  • Replace the Append Fields tool with a Join Tool and join on the Group
  • Adjust the grouping in both the Multi Row Formula tool and the Sample Tool
    image

The grouped version can be downloaded here.

Converting To A Macro

Alteryx makes this very simple. Just right click on the Text Input and select Convert to Macro. Select the Macro Input and then specify you want it to have a Field Map. (As an aside, when making this I noticed the Optional Incoming Connection box – this was added in 10.5 and is brilliant!)

image image

The field map causes the input to have columns renamed. Fortunately, Alteryx makes it easy to reverse this process. Drag the magnifying glass output to the lightning box on the Select tool and it will automatically add an Action tool reversing the Field Map.

image image

After that we need to replace the Browse tool with a Macro Output tool. Finally, a little bit of reconfiguring to specify that the output depends on the input and to add an icon.

image

This version of the macro can be downloaded from here.

Adding Dynamic Grouping

So far this has been fairly straightforward. The next step is to make it so we can select different sets and levels of grouping. Alteryx has an interface tool just for this – the List Box. I added a List Box and connected its Question Input to the Macro Input tool’s Question output. I then specified I wanted it to generate a Custom List with a separator I expect to be unique (§).

image

If the user chooses to select nothing then we would have a problem with the Join tool, which needs a join! To avoid this, I added a Formula tool returning a constant of 1 which can be used in all the groupings.

As I am not returning either the Weight or the Value, I removed the Reverse Field Map, as I expected it was just going to add complexity to the macro.

This is where it gets a lot more fiddly, as we are going to be rewriting the XML of tools (which is also a complete pain to test).

Summarize Tool

The next step is to connect the List Box to the Summarize tool. Connecting the List Box Question output to the Summarize tool’s action input automatically adds an Action tool in Update Value mode. Unfortunately, this won’t be able to update the Summarize tool without a degree of fiddling.

Let’s take a look at the XML config of the Summarize tool:

<Configuration>
  <SummarizeFields>
    <SummarizeField field="Constant" action="GroupBy" rename="Constant" />
    <SummarizeField field="Group" action="GroupBy" rename="Group" />
    <SummarizeField field="Weight" action="Sum" rename="Sum_Weight" />
  </SummarizeFields>
</Configuration>

The Action tool is going to need to rewrite the section inside the SummarizeFields node and replace it with some dynamic XML. We know we are going to want to keep the Sum_Weight and the GroupBy on Constant, but are going to have to generate a set of GroupBy entries depending on the values selected. So first we change the Action tool to target the correct XML node and tell it to update the inner XML based on a formula.

image

The formula is the next problem. We have a list of fields with a separator coming out of the List Box. A little experimenting with the XML showed me the rename attribute is optional for a SummarizeField. That means we can do some substitution on the separator and produce a valid XML string for the configuration.

  • We always want to have: <SummarizeField field="Weight" action="Sum" rename="Sum_Weight" />
  • We also always want to have: <SummarizeField field="Constant" action="GroupBy" rename="Constant" />
  • If no grouping fields are selected we don’t want to add anything else.
  • Otherwise for every entry in the list box we need <SummarizeField field="FieldName" action="GroupBy" />
    First we add a prefix: <SummarizeField field="
    We can do this by replacing the separator with: " action="GroupBy" /><SummarizeField field="
    We then finally add a suffix: " action="GroupBy" />

The formula we need is:

'<SummarizeField field="Constant" action="GroupBy" rename="Constant" />'
+ IIF([#1]!="", '<SummarizeField field="' +
Replace([#1], '§', '" action="GroupBy" /><SummarizeField field="') +
'" action="GroupBy" />', "")
+ '<SummarizeField field="Weight" action="Sum" rename="Sum_Weight" />'
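To sanity-check the substitution, here is the same logic as a Python sketch: given the List Box output (field names joined with §), it builds the inner XML the formula produces. The function name is mine, not part of the macro:

```python
SEP = "\u00a7"  # the section-sign separator configured in the List Box

def summarize_config(selected):
    """Reproduce the Action tool formula above: build the inner XML
    for the SummarizeFields node from the List Box output string."""
    xml = '<SummarizeField field="Constant" action="GroupBy" rename="Constant" />'
    if selected:
        xml += "".join('<SummarizeField field="%s" action="GroupBy" />' % f
                       for f in selected.split(SEP))
    xml += '<SummarizeField field="Weight" action="Sum" rename="Sum_Weight" />'
    return xml

print(summarize_config("Group"))
```

The same replace-the-separator trick is reused for the Sample, Multi Row Formula and Join tools below, with different node and attribute names.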

Multi Row Formula Tool and Sample Tool

Both of these tools have a similar section of XML to work on: GroupFields in the Sample tool and GroupByFields in the Multi Row Formula tool. Both of them have a set of Field nodes within them; slightly annoyingly, it has a name attribute in the Sample tool and a field attribute in the Multi Row Formula tool. The sample below is from the Sample tool:

<Configuration>
  <GroupFields orderChanged="False">
    <Field name="Constant" />
    <Field name="Group" />
  </GroupFields>
  <Mode>First</Mode>
  <N>1</N>
</Configuration>

The process is very similar to the SummarizeField above (for the Multi Row Formula tool you need to replace name with field):

  • We always want to have: <Field name="Constant" />
  • If no grouping fields are selected we don’t want to add anything else.
  • Otherwise for every entry in the list box we need <Field name="FieldName" />
    First we add a prefix: <Field name="
    We can do this by replacing the separator with: " /><Field name="
    We then finally add a suffix: " />

The formula is hence:

'<Field name="Constant" />' + IIF([#1]!="", '<Field name="' + Replace([#1], '§', '" /><Field name="') + '" />', "")

Join Tool

Saving the most complicated to last. The Join tool’s XML looks like:

<Configuration joinByRecordPos="False">
  <JoinInfo connection="Left">
    <Field field="Constant" />
    <Field field="Group" />
  </JoinInfo>
  <JoinInfo connection="Right">
    <Field field="Constant" />
    <Field field="Group" />
  </JoinInfo>
  <SelectConfiguration>
    <Configuration outputConnection="Join">
      <OrderChanged value="False" />
      <SelectFields>
        <SelectField field="Right_Constant" selected="False" rename="Right_Constant" />
        <SelectField field="Right_Group" selected="False" rename="Right_Group" />
        <SelectField field="*Unknown" selected="True" />
      </SelectFields>
    </Configuration>
  </SelectConfiguration>
</Configuration>

The JoinInfo nodes are both the same as in the Multi Row Formula tool, so we can just copy the Action tools for it and target the JoinInfo nodes. This leaves the section in SelectFields to do. This is quite similar to the SummarizeFields case above. The rules this time are:

  • We always want to have: <SelectField field="*Unknown" selected="True" />
  • We also always want to have: <SelectField field="Right_Constant" selected="False" rename="Right_Constant" />
  • If no grouping fields are selected we don’t want to add anything else.
  • Otherwise for every entry in the list box we need <SelectField field="Right_FieldName" selected="False" />
    First we add a prefix: <SelectField field="Right_
    We can do this by replacing the separator with: " selected="False" /><SelectField field="Right_
    We then finally add a suffix: " selected="False" />

The formula for this final case is:

'<SelectField field="Right_Constant" selected="False" rename="Right_Constant" />'
+ IIF([#1]!="",
'<SelectField field="Right_' +
Replace([#1], '§', '" selected="False" /><SelectField field="Right_') +
'" selected="False" />',
"")
+ '<SelectField field="*Unknown" selected="True" />'

End Result

The final macro can be downloaded here. There is a simple test workflow here. The user sees:

image

and the end result is a table containing the specified grouping and the calculated Median values:

image

Alteryx Formula Add Ins (v1.1)

Version 1.1 of my Alteryx formula add-ins project has been released. These provide a set of utility functions for use wherever a formula can be used within Alteryx. The functions are developed in C++ and XML using the Alteryx SDK – I will write a few posts with the details of how to create a new function and how the unit test framework works in the next couple of weeks.

Download the zip file from here, and the function reference guide here. The source code can be downloaded directly from GitHub here.

Goal of the Project

The idea of this project is to:

  • Provide a set of useful functions to make data preparation and analysis simpler and easier.

This project started to make string filtering easier within version 9, with the creation of three simple XML based functions:

    • StartsWith
    • EndsWith
    • Contains

These are now part of version 10, and commented out in this project.

Over time, I have expanded the functions based on issues that come up in the community, particularly around handling dates.

  • Increase exposure to the formula add in SDK.

Alteryx provides a great SDK for creating both custom tools and new formulae. Part of the goal of this project is to create a set of sample XML and C++ functions people could use as a basis to create their own.

Installation

Alteryx looks for XML files in a folder called FormulaAddIn within the RuntimeData folder inside the bin directory of the Alteryx install. For an admin install, this will default to be: C:\Program Files\Alteryx\bin\RuntimeData.

There is an Install.bat script which will copy all the XML files and the DLL files into this folder. As Windows UAC protects this folder if within Program Files, the install script will ask for permission to install. The script will also unblock all files, as Windows will possibly have blocked them as they were downloaded from the internet.

If you have a non-admin install of Alteryx, it is possible the install script won’t be able to find the Alteryx folder. In this case, or if the bat file fails, you will need to copy the files manually.

To do this, copy all the XML and DLL files to %AlteryxBinDirectory%\RuntimeData\FormulaAddIn.

Function List

MiscUtils

A few additional general methods for data preparation.

  • IfNull: If first value is null return second
  • Coalesce: Given list of values return first non null value (C++)

MathUtils

Math based functions that I haven’t got a better home for!

  • Modulo: General Double Based Modulo function
  • HexBinX: Given an X,Y point and optional radius, get X co-ordinate of hexagonal bin’s centre
  • HexBinY: Given an X,Y point and optional radius get Y co-ordinate of hexagonal bin’s centre
  • Rand_Triangular: Given a uniform random number transform into a triangular distributed random
  • Avg: Average of a list of values ignoring NULL
  • Count: Count of a list of values ignoring NULL
  • Sum: Sum of a list of values ignoring NULL

DateUtils

Some additional functions for working with Dates and to a certain extent Times.

Date Time Creation Functions (Currently Just Concatenation)
  • MakeDate: Create a new date from Year, Month, Day
  • MakeTime: Create a new time from Hour, Minute, Second
  • MakeDateTime: Create a new DateTime from Year, Month, Day, Hour, Minute, Second
  • ToDate: Truncate a DateTime to a Date
  • ToDateTime: Appends midnight to a Date to create a DateTime
Date Part Functions (returns numbers)
  • Day: Get the day of the Month [1-31]
  • Month: Gets the month number [1-12]
  • Year: Gets the four digit year
  • WeekDay: Gets the day of the week [Sunday (0) through to Saturday (6)]
  • Quarter: Gets the quarter of the date [1-4]
  • OrdinalDay: Gets the day of the year [1-366]
Period Start and End
Other Date Functions
  • DateAdd: Equivalent to DateTimeAdd but returning a Date
  • BusinessDays: Number of weekdays between two dates
  • IsLeapYear: Is a year a leap year
    (takes a year, not a date, as an input – use IsLeapYear(Year([Date])))
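As an illustration of the BusinessDays logic, here is a Python sketch; counting both endpoints inclusively is my assumption, and the add-in may define the range differently:

```python
from datetime import date, timedelta

def business_days(start, end):
    """Count weekdays between two dates, taking both endpoints as
    inclusive (an assumption about the add-in's definition)."""
    if start > end:
        start, end = end, start
    days = (end - start).days + 1
    return sum(1 for i in range(days)
               if (start + timedelta(days=i)).weekday() < 5)

print(business_days(date(2016, 7, 4), date(2016, 7, 8)))  # 5 (that Monday to Friday)
```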

StringUtils

A few extension methods for working with strings.

  • LeftPart: Gets the text before the first instance of a separator
  • RightPart: Gets the text after the first instance of a separator
  • Split: Splits a string into tokens and then returns the specified instance
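A rough Python equivalent of these three helpers, to make the behaviour concrete. The absent-separator behaviour of RightPart follows the v1.1 bug-fix notes below; the 1-based token index for Split and the other edge cases are my assumptions:

```python
def left_part(text, sep):
    """Text before the first instance of the separator (the whole
    string when the separator is absent - an assumed edge case)."""
    return text.split(sep, 1)[0]

def right_part(text, sep):
    """Text after the first instance of the separator; per the v1.1
    bug-fix note, the whole string when the separator is absent."""
    parts = text.split(sep, 1)
    return parts[1] if len(parts) == 2 else text

def split_token(text, sep, index):
    """Split into tokens and return the token at a 1-based index
    (the index base is an assumption)."""
    tokens = text.split(sep)
    return tokens[index - 1] if 1 <= index <= len(tokens) else ""

print(left_part("a,b,c", ","))       # a
print(right_part("a,b,c", ","))      # b,c
print(split_token("a,b,c", ",", 2))  # b
```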

Retired Functions (built into v10)

V1.1 Release Notes

  • Full set of unit tests for all the functions.
  • Wiki now contains a function reference for all the functions.
  • Improved Install.bat to pick up all xml and dll files.
  • New functions: AVG, COUNT, SUM, MAKETIME, MAKEDATETIME, TODATE, TODATETIME, DATEADD
  • HEXBINX and HEXBINY moved to C++ functions
  • Bug Fix: LEFTPART and RIGHTPART missing bracket.
  • Bug Fix: RIGHTPART return whole string if delimiter not found.
  • Bug Fix: MODULO deal with NULL or 0 for divisor.

First Pre-Release of JDTools for Alteryx (v0.1)

Having created my simple framework for making custom C# based Alteryx tools, I thought I would release an initial version for people to take a look at and try. Please note these are pretty experimental, and while stable enough on my PC, I don’t have the resources of Alteryx to test widely and confirm stability. Please let me know if you do use them and whether they work for you.

I won’t go into too much technical detail on the code – that is for another post – but I will give a feeling of how I worked through the initial three tools. The whole project is on GitHub and will continue to be developed there; I welcome any suggestions of features or bug reports (or if anyone is interested in helping out, let me know!).

No pretty installer or anything yet, just a simple batch file which will create the necessary ini file and copy it to the correct location. Download the zip file from GitHub, extract it somewhere that you want to keep it and then run Install.bat.

After that open Alteryx and a new tool group should appear called JDTools:

image

Inside it you will find three tools:

DateTimeInput

A slightly expanded version of the Date Time Now tool. This was the first tool I created, as a learning exercise, but it is still quite useful in itself. It will create a date field equal to one of the following values:

  • Today
  • Yesterday
  • StartOfWeek
  • StartOfMonth
  • StartOfYear
  • PreviousMonthEnd
  • PreviousYearEnd

If you want to get started on creating custom tools, this is probably a good place to look. The code for this tool is here.

CircuitBreaker

This is an expanded version of the Test tool. The idea here is that the input data (I) is only passed on to the rest of the workflow if no data is received on the breaker input (B). I always find not being able to pass data through the Test tool frustrating, as often that was what I needed to happen.

This was a great learning exercise on handling input data, copying and caching records and passing on to the output. The code for this tool is here.

DateTimeParser

Probably the most generally useful tool, this exposes the .Net DateTime parse functions to Alteryx. For this I needed to find a way to allow the user to pick an input field (see the InputFieldTypeConverter), to copy most of the values, parse a date and add it to the result. As this tool supports all the formats that the DateTime.ParseExact function does, I hope it will help the community deal with parsing all the random formats that keep coming up.

Again, I learnt loads writing this tool, but this one involves by far the most moving parts. The code for it is here.

More coming soon…

Lots more hopefully coming soon. I have lots of ideas I’d love to try (like using NuGet as a tool distribution platform and, my favourite, a vlookup style join tool), so hopefully this is just the first of a stream of releases.

Updated 1st March 16 to v0.1.1 release. And to v0.1.2 to try and sort installer!

A Couple Of Pet Alteryx Projects

I am planning to do a blog series on extending Alteryx and the various options that are provided to do this (but it takes me ages to write a post!). As part of learning some of the tricks for this, I have been working on a set of Formula Add-Ins (XML based) and Custom DotNet Tools. I will post a lot more details later, but felt it worth opening up the development work more publicly.

Formula AddIns:

image

At the moment, the formula add-ins are centred around date functions and a few random others. Take a look at the GitHub repository here: https://github.com/jdunkerley/AlteryxFormulaAddOns

I was pleased when I got the HexBins functions working in there as well. The formula is disgusting, but I took the d3 implementation and created a function which reproduces it, and produces the same results as the Tableau functions.

Custom DotNet Tools:

image

The custom tools are just starting off, but I think it is a nice approach. The GitHub repo for this project is here: https://github.com/jdunkerley/AlteryxAddIns

It uses a lot of reflection to hide most of the boilerplate code needed. This means it is less flexible than the base API (for example, no support for union style inputs yet), but hopefully it will make it quicker and easier to create simple tools.

The BaseTool class handles the plugin implementation. It takes a configuration type and an engine type, and follows the same rules as the BaseEngine class to work out incoming and outgoing connections. As a cheat, the WinForms PropertyGrid is used to create a simple GUI for configuring the tool. The standard XmlSerializer is called on the configuration type to allow saving back into, and reading from, the workflow.

The BaseEngine class handles most of the work of interfacing with the Alteryx Engine, requiring only the configuration type and an implementation of PI_PushAllRecords. Output connections are just properties of type PluginOutputConnectionHelper on the derived engine class, using attributes to label and order them. Likewise, input connections are just IIncomingConnectionInterface properties, using the same attributes to label and order them.
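The pattern is easier to see in miniature. The sketch below is purely illustrative of the reflection approach, in Python rather than C#, and is not the Alteryx API: connection members are tagged with a label and an order, and the base class discovers and sorts them at runtime.

```python
def connection(label, order):
    """Attach connection metadata to a method, in the spirit of the
    attributes used on the C# properties."""
    def wrap(fn):
        fn._connection = (order, label)
        return fn
    return wrap

class BaseEngine:
    @classmethod
    def connections(cls):
        # Reflect over the class to find tagged members, sorted by order
        found = [(m._connection[0], m._connection[1], name)
                 for name, m in vars(cls).items()
                 if hasattr(m, "_connection")]
        return [(label, name) for _, label, name in sorted(found)]

class SampleEngine(BaseEngine):
    @connection("Left", order=0)
    def left_input(self): ...

    @connection("Output", order=1)
    def output(self): ...
```

This keeps the derived class down to declaring its connections; the base class does the plumbing, which is the flexibility-for-brevity trade mentioned above.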

At the moment, I haven’t finished the incoming connection work. This will be built up next, and hopefully the project can then be used to help people understand how to build custom tools, and expose the fantastic API to more people.

Downloading and Parsing Met Office Historic Station Data with Alteryx

I was having a look at the public data the Met Office provide; there is a huge amount freely available on their website. I was interested in the historical data they provide for each of the stations, accessible via this web page. The page allows you to download the data one station at a time, but using Alteryx we can do better.

First things first, we need to get a list of all the stations and the URLs of the data files. As the map is dynamically created, we can’t just grab the source of the page. However, opening the page in the Chrome Developer Tools (press F12 to access them; similar tools are available in Internet Explorer/Edge and Firefox) and looking at the Network tab lets us see all of the data files the page requests:

image

We can filter the long list down to just the requests made by JavaScript, using the XHR option in the toolbar. Looking down the list I found historic.json. This contained a list of all of the stations on the map, grouped into ‘open’ and ‘closed’:

{
  "closed": {
    "Cwmystwyth": {
      "id": 371423,
      "lat": 52.35817,
      "lon": -3.80198,
      "opened": 1959,
      "closed": 2011,
      "url": "cwmystwythdata.txt"
    }, …

The URL within each entry is relative, so I needed the full URL for one station from which I could generalise. I downloaded the Aberporth data set and recorded the URL: http://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/aberporthdata.txt

Next step was to create a flow in Alteryx which would download the historic.json data file and parse it into a table, which I could then use to generate a list of URLs to download each of the corresponding txt files. Alteryx has a JSON Parse tool which can be used to split the JSON into a field-based table. Version 10 also lets you use JSON data in an Input or Output tool, but as yet I can’t get it to connect to an http address, so instead I used a Download tool linked into the JSON Parse tool. This produced a table like:

image

All the data I needed was encoded in the JSON_Name and JSON_ValueString fields. I used a Text To Columns tool to split the JSON_Name field, and then a Cross Tab tool to break this up and build a table. The final step was to drop the columns I didn’t need and to create the full URL. The final table looks like:

image

The final flow to download and parse the JSON looked like:

image
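Outside Alteryx, the same download-and-flatten step can be sketched in a few lines of Python. The base URL is taken from the Aberporth example above; HISTORIC_JSON_URL is a placeholder for wherever the page loads historic.json from, and the JSON layout is as shown earlier.

```python
import json
import urllib.request

BASE = "http://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/"

def flatten_stations(data):
    """Turn the {'open': {...}, 'closed': {...}} structure into a flat
    list of station rows with full download URLs."""
    rows = []
    for status, stations in data.items():
        for name, info in stations.items():
            rows.append({
                "name": name,
                "isClosed": status == "closed",
                "lat": info["lat"],
                "lon": info["lon"],
                "url": BASE + info["url"],   # relative URL made absolute
            })
    return rows

# Usage (HISTORIC_JSON_URL is an assumption, not given on the page):
# stations = flatten_stations(json.load(urllib.request.urlopen(HISTORIC_JSON_URL)))
```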

Reading the JSON Data via PowerQuery

A small shout out to PowerQuery in Excel for this process as well. You can get to a table of data from a JSON URL in a couple of clicks. The screenshots below are from the Excel 2016 version, but the same is true in the 2010/13 versions I think. Go to the Data tab in the ribbon, then From Other Sources, From Web, and enter the URL of the JSON file in the dialog box.

image

After that it takes a couple of clicks to convert the records into a table, and then to expand the value column (unselect ‘Use original column name as prefix’) to produce a column-based table like the one below.

image

image

The next step is to unpivot all the columns except the first, and then to expand the resulting value column into a table, again excluding the original column name prefix. Finally, we create a couple of custom columns containing the full URL and an isClosed flag, and reorder and rename the columns (very much as in Alteryx). The resulting table is shown below.

image

PowerQuery is a great free extension to Excel, and while nowhere near as powerful as Alteryx, it is a nice option for quickly pulling some data together. I find it especially good at handling JSON data.

Downloading and Parsing the Data Files

Having got the list of stations and the URL of each data file, the next step was to link another Download tool up to the generated list of URLs. The Download tool reads the data into a string field. The file format looks like:

image

The files are fixed width, with some header text at the top. All the fields are numbers, which makes processing easier. Even though they are fixed width, I used the same technique I used to read CSV files; the only difference is that the separator is a space rather than a comma. As I add a row index as part of the parsing process, I used this to pick out the header row and to skip the first few comment lines. The flow below downloads the files and parses them into a table:

image
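As a rough Python equivalent of the parse step: the column names come from the station files' header row, while treating "---" as missing and stripping the "*" estimate markers is my reading of the format, so check it against a real file. The sample below uses made-up illustrative values, not real readings.

```python
from datetime import datetime

COLUMNS = ["yyyy", "mm", "tmax", "tmin", "af", "rain", "sun"]

def parse_station_file(text):
    """Parse one Met Office station data file into a list of dicts,
    splitting on whitespace just like the space-separated CSV trick."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the comment, header and units lines: data rows start with a year
        if len(parts) < len(COLUMNS) or not parts[0].isdigit():
            continue
        values = [p.rstrip("*#") for p in parts[:len(COLUMNS)]]
        row = {c: (None if v == "---" else v) for c, v in zip(COLUMNS, values)}
        # Year + month to a date, as with DateTimeParse in the Alteryx flow
        row["date"] = datetime(int(row["yyyy"]), int(row["mm"]), 1)
        rows.append(row)
    return rows

sample = """Aberporth
Location 2240E 2520N, 133 metres amsl
   yyyy  mm   tmax    tmin      af    rain     sun
              degC    degC    days      mm   hours
   1942   2    4.2    -0.6     ---    13.8    80.3
   1942   3    9.7     3.7     ---    58.0   117.9
"""
```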

The final step is a little data tidying and type conversion. I used the Find Replace tool to repair the few records which had additional notes on them, and converted the year and month into a date using the DateTimeParse function. The output looked like:

image

You can download the workflow and output TDE file.
