Highlights from the Inspire Europe Conference

Inspire Europe

Last week, I attended the Alteryx Inspire Europe conference at Kings Place, London. This was the first time they had hosted the event outside of the US, and it was a great success with fantastic speakers and events, as well as selling out to over 400 people.

The night before the conference kicked off, about 100 users gathered for the London User Group. There was a Q&A session with Alteryx’s CCO Libby Duane and CTO Ned Harding, as well as a talk by Carto on their geographical visualisation platform and it’s integration with Alteryx. As always there were lively discussions. I spent a great time (unfortunately missing a lot of the Q&A session) chatting over my experiences with the SDK with Ben Gomez, Senior Product Manager at Alteryx. It was great to meet Ben in person and talk over some of the work I have been doing with SDK. The support I have been provided with over past 18 months has been fantastic (including some amazing one on one support from Ned).

Since I started using Alteryx early in 2015, I have found the community to be incredibly friendly and engaging. Over this time, I have got to know a lot of people, many of whom I had not met in person prior to the user group and conference. The user group was a great chance to meet many people and catch up with some others. It was great to see so many employees from Alteryx travel over from the US to be at the conference. A personal shout out to Mark Frish, one of the Alteryx ACEs, and a personal friend who specially made the trip from Chicago to attend the conference.

The conference kicked off with a keynote by Dean Stoecker, the CEO talking about some of the customer use cases and then about Daniel Boorstin’s Knowledge Trilogy. He also discussed the Alteryx for Good program and work it has been doing.

Dean Stoecker - Welcome to Inspire

Following this keynote, there were two separate tracks: The Data Analyst track and The Technology track. The Data Analyst track was given by customer’s describing their use cases and stories of how Alteryx has made a difference in their organisation. The Technology track concentrated on best practices and techniques for using Alteryx as well a session talking about the product road map and the plans for the platform.

Chatting to Ned in the Solution Centre

Throughout the conference, employees, ACEs and partners manned the Solution centre. This is an open space where anyone can bring their questions and difficult workflows and get help in solving any issues or problems they might have. While I didn’t have any specific workflows to discuss, I spent a great hour or so deep in conversation with Ned discussing the platform, engine, SDK and the pros and cons of C++ vs C# programming. Plus, on day two while in the solution centre, Dean demonstrated Alteryx Charades to us!

The second day kicked off with the COO George Mathew’s keynote. He showed off the integration with Carto, PowerBI and Tableau, as well as some of the new functionality coming in version 11. After this there were a couple more sessions in the separate tracks before CCO Libby Duane’s keynote. Libby concentrated on the community of users and announced the results of the Analytic Excellence Awards, the new members of the ACE program and presented the Grand Prix cup. Immediately following Libby’s keynote, the guest keynote was given by Tim Harford, author of the Undercover Economists. He spoke brilliantly about modern presentation of data and the dangers of disingenuous use of statistics and numbers.

A Brief History of Alteryx

The final keynote (or technology track session) was given by Ned Harding. Called ‘A Brief History of Alteryx’, he went over first the original inception of the product (frustration at processing census data) before going into detail over the architecture of the platform. He went all the way down to the merge sort algorithm used within the engine! This was a great overview of how it all works and well worth the time for anyone who wants to truly understand the platform and how to get the most from it.

GRAND PRIX

The Grand Prix is a tradition of the Inspire conferences. It is a data analytics speed race. For the inaugural European race, about 18 people entered the qualification round back in July. The qualification took place via WebEx with the competitors being given 7 workflows to work through. These range from cleansing and parsing data through to spatial analytics and predictive time-series work. For the first time, two of the competitors completed all seven of the problems. The three finalists selected were Charlie Archer from Javelin Group, Remco Diepenbroek from Infotopics and me.

The final took place at the end of the first day of the conference. Three identical workstations were set up in the ‘Battle Bridge’ conference room, with their screens also being shown on TVs on the wall behind. The race consisted of three ten minute rounds (laps), all based on analysing some road traffic accident data from the Leeds area. Each lap involved answering some questions worth one point each, with a bonus point awarded for finishing first. The first round concentrated on basic data preparation, the second round on spatial analytics and the final round was based on a logistical regression.

They make a big deal of all the contestants – giving it the feel of a sporting event rather than a data analytics competition. All the competitors are paired with a ‘Pit Crew’, who reviewed the answers as you went along and score the laps. It was great fun competition and a very close race, I was amazed to emerge the winner. I hope Charlie and Remco will compete again next year and will be cheering for them if they do!

The Inspire Europe Grand Prix

A thanks to Gene Rinas for being my Pit Crew and to Paul Banoub for leading the #TeamDunks support!

ACE PROGRAM

The ACE program is Alteryx’s way of recognising user advocates within the community. The program has some truly incredible members, many of whom I have got to know since starting to use Alteryx. It was an honour to be announced with Daniel Brun as the newest members of the fantastic group during Libby’s keynote.

Daniel, Libby & I

THANKS

Thanks to all the hard work by everyone organising and taking part in the conference. I look forward to Inspire Europe 2017 at the Tobacco Dock in London, even if it will mean handing over the Grand Prix cup to the next winner!

Photos in this post are courtesy of Alteryx. You can see more photos from Inspire Europe here.

Beyond Alteryx Macros (Part 2) – How to Create an XML Macro Function

imageIn the first part of this series, I gave an overview of the Alteryx platform and some of the functionality offered by the SDK. The first parts of the SDK I am going to look at concern adding additional functions into the formula function library. There are two methods of doing this: XML Macro functions and C-style DLL functions. This post will take a look at first of these – XML Macro functions.

This was the first part of the SDK I played with (not really realising the power of the SDK at that point), I came across them based on a blog by Adam RileyFormulaAddIn (or 3Rd Floor Flat part 2). My early experimenting was to add some additional string conditional functions – StartsWith, EndsWith and (following Ned’s suggestion) Contains. These are now built in (though I think as C++ functions in the library) to core functions library.

The function library is used within various of the core first party workflow tools, as well as within some reporting and tools for macros or apps design. One exception worth noting is the In-DB formula tool – this is a SQL function tool not linked to the formula library. Some of the tools using the formula system and the different operation they perform are listed below. This in not an exhaustive list, I have listed the ones I use but gives an idea of how widespread the function library is within Alteryx.

  • FormulaSNAGHTML42082bc
    This lets you create new fields or update existing fields using an expression based on existing values on the row
  • Multi-Row Formula
    This allows you to look ahead or look behind a number of rows and create a new field or update an existing field based on them
  • Multi-Field Formula
    This allows you to perform an operation over all or a subset of the fields, and either create a new field for each (the default) or overwrite the existing field (including changing the type). This can be very useful for data cleansing for example.
  • Filter
    You can use the formula based expressions as custom filters
  • Generate Rows
    This allows you to perform the equivalent of a For…Next loop within Alteryx, generating a set of rows for each input row.
    The initial value, loop step and termination expression are all formula expressions.
  • Testimage
    This tool lets you raise a warning or an error on a condition. This condition can be a formula expression.
    This is a fantastic tool and allows you to add safety checks to joins, etc. We will be using this in the next part of this series for creating test workflows.
  • Message
    This tool allows you send a message either at the start, at the end or based on a conditional expression. The message is also an expression as well
  • imageAction
    This tool allows you to take inputs from interface tools and then update the configuration of other tools within a macro or analytic app.

It is worth noting that XML Macro function cannot add anything that wasn’t possible before. In many ways, they are syntactic sugar. In other words, they have the power to make using function both a lot more pleasant and also a lot quicker. Take as a simple example, the Contains function. Prior to its inclusion in the library you had to write something like:

FindString([Haystack],[Needle])!=-1

While not a big change, the contains function reads cleaner:

Contains([Haystack],[Needle])

As a tutorial, we will create two simple functions:

  • PHI()
    This will return the Golden Ratio
  • IFNULL(<value>,<value_if_null>)
    This will return value unless null, in which case it will return value_if_null

We will look at the process to create both of these. In the next post, we will look at creating a simple installer script and setting up some testing workflows.

The Formula Tool

The basic formula tool is the best place to start when wanting to create new functions.

This post is based around the current version of the formula tool (as of version 10.6). A new UI was demonstrated as part of the keynotes at Inspire 2016 in San Diego (see George Mathew’s Keynote). As discussed in the previous posts, the UI and the engine are actually two separate DLLs connected via the XML configuration. The engine piece is not being changed (I don’t believe) and all formulas and extension functions should work as before within the workflow. From what I have seen the new UI is a fantastic step forward (and great demonstration of using HTML as the UI piece) – giving both IntelliSense suggestions and syntax colouring much like in Tableau.

The first step to creating a macro function is to first work out the formula for the function you want. The formula for the Phi and for IfNull are:

Phi

(1 + SQRT(5)) / 2

IfNull

IIF(ISNULL([value]),[value_if_null],[value])

You can set these up in the usual manner in a formula tool, and then test that it gives you the results you expect:

imageimage

For a XML macro the parameters are passed as P1, P2, P3 … The Phi function is not going to take any so we don’t need to change anything there. I find the easiest way to do this is to add a field called P1 and P2 in formula tool and then replace the fields in the function as needed (this will make Alteryx do all the work of writing the XML for us). You don’t need to surround the field with [ ], this is only needed if your field name contains operators or white space. The new formula tool and results table look like:

imageimage

We now need to get to the XML for the function. This will handle encoding special characters in the XML for us (such as “ and &). The table below gives the characters that need encoding:

Character XML Description
&quot; double quotation mark
&apos; apostrophe
< &lt; less than
> &gt; greater than
& &amp; ampersand

There are two ways simple ways to do this. Either save the workflow and open it in a text editor. If you look at the top of the configuration panel there is a number in brackets (5 in the case above). This is the Tool Id. You can use this to find the section of the xml you need (search for ToolID=”5”). You are looking for the Node element and the Configuration section within it:

image

An easier way to access the XML (and one to get very familiar with when doing SDK development) is to use the XML view within the Alteryx designer. You will need to turn this on within the user settings. Go to Options => User Settings => Edit User Settings. Then within the Advanced tab, there is a tick box ‘Display  XML in Properties Window’:

image

Having done that a new section appears in the Properties Window for the workflow and for the tools within it. For the workflow itself, it gives you read-only access to the whole yxmd file. For a tool, it gives the Configuration element for the tool:

image

Creating The Add In

We now have the XML expression for the functions we want to create, we now need to convert them into a formula add in so we can use them as functions. A formula add in is just an XML with each function defined as a Function Node. Lets start by creating an XML with the following contents:

<?xml version=“1.0” encoding=“utf-8”?>
<FormulaAddIn>
</FormulaAddIn>

I am not sure if it the document element name (FormulaAddIn) matters but have always used this. We now need to create Formula nodes for the two functions. Each Formula node has six required elements for a macro function:

Name Function name
Category Category name for function to appear under. This can be either one of the existing ones or a custom name.
NumParams Specifies the number of parameters. It has a required boolean attribute of variable. For XML Macro functions this must be false.
Description Text to display to inform the user what the function does.
InsertText The text to insert into the Expression box should the user double click on the function in the function list
Formula The actual formula to substitute in place of the function name with P1, P2, … as place holders for the parameters

Lets start by creating the PHI function. We can get the formula we need from the existing formula tool XML. We want this to be within the Math category. It takes 0 parameters. The XML for this looks like:

<Function>
<Name>PHI</Name>
<NumParams variable=“false”>0</NumParams>
<Category>Math</Category>
  <Description>The Golden Ratio</Description>
  <InsertText>PHI()</InsertText>
<Formula>(1 + SQRT(5)) / 2</Formula>
</Function>

Insert this block into the XML file. This specifies the function name (PHI), the category (Math) and description (The Golden Ratio)for the function list. If you double click on PHI within the list, it will insert PHI() into the expression box. This function will be replace with the formula ((1 + SQRT(5)) / 2) when the engine evaluates this expression. The NumParams node is specifying that this function takes no arguments. The variable attribute must be set to false for macro functions, this is for C-style DLL functions which we will look at in another post.

Now moving on to look at the IFNULL function:

<Function>
<Name>IFNULL</Name>
<NumParams variable=“false”>2</NumParams>
<Category>Specialized</Category>
  <Description>If the first parameter is null replace with the second</Description>
  <InsertText>IFNULL(Value, ValueIfNull)</InsertText>
<Formula>IIF(ISNULL(P1),P2,P1)</Formula>
</Function>

This block is very similar to the block for PHI. I have chosen to put the IFNULL function with the Specialized category. The formula is exactly the same as we saw in the formula tool above. The only very slight complication is the InsertText element. This is where you can leave placeholders for the parameters for the user to replace. It worth making these descriptive so the user knows what each parameter does. This is as close as you get to providing syntax help to the user.

We have now created the basic XML add in file. You can download this file from here. This needs to be saved to the bin\RuntimeData\FormulaAddIn within the Alteryx install directory. For an admin install of Alteryx, this will be probably something like C:\Program Files\Alteryx\bin\RuntimeData\FormulaAddIn and the folder will be protected by UAC. To get around UAC save to an unprotect location and then copy to the folder. If you have a non-admin install the equivalent install folder will probably be %LOCALAPPDATA%\Alteryx\bin\RuntimeData\FormulaAddIn. After this restart Alteryx, and the functions should be in your formula list (you don’t need to restart for the engine to be able to use the functions as the engine is a separate process to the designer):

imageimage

You can then use the functions within the formula expression as you would expect:

image

There is a sample workflow for these function you can download here. This workflow will write the add in file to a temporary location and install it on first run. You will get a UAC prompt on this run. You will need to restart Alteryx for it to appear in the function list. Once the add in is installed the UAC prompt wont trigger again. If you wish to uninstall the add in then disable the Installer tool container and enable the Uninstaller tool container.

Limitations of Macro Functions

First as we said above, these are just syntactic sugar. They get replaced with their formulae early on in the evaluation of an expression. This being said, they can make you expression significantly easier to read and write. For example, the IFNULL function means you do not need to repeat the first argument, if this is a long expression then this results in a nice simplification. I also believe the argument is only evaluated once so if the evaluation is hard then this can be a big performance improvement.

Debugging these add ins can be difficult. Alteryx will not give you any error messages if the XML is badly formed. The add ins will just not load and the functions will not be available in the designer or the engine. I have been playing with creating a XSD (XML Schema) file and then using an online XML validation tool (e.g. http://www.utilities-online.info/xsdvalidation/#.V6-VCZgrIcU) but haven’t completed this yet.

Because debugging them are hard – don’t take it too far. In my experimenting with then. I did go as far as to create HexBin functions (based off the d3.js code) in XML (see image below). They took a lot of tuning and playing with to get to working and were impossible to adjust and really understand. I have now re-implemented them as C-style DLL functions.

image

Final mention of some fun and games with recursive calls. I was trying to set up a number of business days function and wanted it to call itself if start was after end. Easy enough to express that in the formula but Alteryx does not like it. It will load the add in quite happily but the second you call the function it will just die! I think the substitution engine cannot cope and while fun to find a reliable way to make Alteryx die (as it is a very stable product), it is trivial to create another function and avoid this issue. I do not believe you can mark a function as internal or private and if you get to much complexity or layers worth thinking about C-style DLL functions instead of XML macros.

Next Post…

We have created our first add in. In the next post, I’ll take a look at source control, unit testing and creating an installer.

Beyond Alteryx Macros – Introduction to the SDK

image

Alteryx has a huge built in set of tools (about 170 tools in v10.1 I think), but when something is missing or you just have a specific way you want to do something, it has various options on how you can extend it.

The simplest way (and one of the most flexible ways) is to create a macro. These allow you to create a reusable workflow performing a specific task or calculation. The various types (Standard, Batch, Iteration and Location Optimisation) provide a huge amount of power on top the basic workflow. The designer makes it easy to take an existing workflow, convert it to take some inputs (via the Macro Input tool), and then produce some outputs (via the Macro Output). There is also a simple interface toolkit allowing them to have some configuration. The idea of this series of blog posts is to look at some other methods provided in the SDK rather than macros, but it is always worth bearing them in mind before diving into an SDK based solution.

The series will start with a overview of the Alteryx platform and the SDK, and then walk through how to create a XML Formula Add On extending the set of functions in the formula tool. After this I will talk about how to create some simple tests and an installation script for it. Then we will take a look at creating C++ based functions, before diving into creating custom tools in C#. I am planning to also look over the new HTML / JavaScript custom tools API (which were added in version 10.1).

All of this will be based on what I have learned from experimenting with the SDK. I’ll describe how I think it all plugs together. Ned Harding did a talk at Inspire on the internal structure of Alteryx – I am looking forward to hearing his Inspire Europe’s version and correcting all the mistakes in my understanding.

In general, I will use examples gathered from the questions asked in community or things I think are useful additions. All of the code will be available on GitHub – either as part of my Formula Add Ins or Custom Tools Add Ins. Hopefully by the end of the series, you will have learnt some very powerful ways to make your analysis easier and learnt some more about how Alteryx works.

Getting Started

The SDK is included in all Alteryx designer installs. It is in the APIs directory within the Alteryx install folder.

image

The AlteryxSDK.zip contains the C-style SDK for creating C/C++ custom tools and formula functions, but even more importantly it also contains the AlteryxSDK.chm documentation file. While you don’t need access to the contents in here to create either formula add-ins or .Net based custom tools, the information in the documentation file has been exceedingly helpful as I have built functions and tools. You will need to contact your sales team to get the password to extract the zip file.

The SampleCode folder contains the custom tools .Net API documentation and examples (as well as various automation examples using .Net). While this documentation was enough to get started – the additional detail provided inside AlteryxSDK.chm has been critical as I moved forward.

A Quick Overview of Alteryx Platform

A big proviso here – all of this is based on my experimenting with SDK and APIs and hence liable to be hugely inaccurate and missing vast sections as I only really know what I need to build my tool set. If you want the true structure and details then you will need to get to Ned Harding’s talk at Inspire Europe 2016.

Alteryx Designer itself consists of two main parts – the Designer and the Engine. The Designer is the front end the user interacts with to create a workflow that the Engine will run. A workflow is defined by an XML based configuration file which tells the engine what tools to create, how to configure them, and how they connect together. The diagram below shows some of the components of the platform:

AlteryxEngine

The Designer

The designer has various core service and features. It is responsible for hosting the engine when running a workflow – collecting the results into Browse Anywhere data stores and log messages into the log buffer. Tools within the designer break into about three different types. A few areas are core functionality of the designer (the Interface tools for example) and are built into it (I believe these are only for things which a designer specific). These cannot be extended using the SDK.

Quite a lot of the tools are actually pre-built macros designed by Alteryx themselves. These act exactly the same as user created macros. You can learn a lot about what is possible and how to build macros by taking a look at these. If you go to Options => User Settings => Edit User Settings, under the Canvas tag there is an option to display an indicator to show something is a macro. You can then right click and view the inside of the macro:

imageimage

Finally the vast majority of the tools come from plugins. Alteryx really seems to practice ‘dogfooding’ – using the SDK to build their own tools. If you go to the bin\Plugins directory within the Alteryx install path you will find a lot of different DLLs. These contain the core tool set of Alteryx.

SNAGHTMLa9191

The designer hosts a collection of DLLs, containing one or more plugins. Each plugin has a configuration designer and an entry point for engine to call. The entry point does not need to point to the same DLL or even be based in the same technology as the configuration GUI. This means you can use the platform which suits you best to design the GUI (either .Net or HTML) and use a different one for the computation (e.g. C++). Historically, Alteryx have used C# for their GUI controls and C++ for the engine side, though I expect to see more HTML based GUIs in the upcoming versions.

One additional part of the designer which is useful to be aware of when working with the SDK are is Default setting file. This tells the designer how to arrange the ribbon and layout the tools. While not technically part of the SDK should you want to put a space in your tool’s name the only way is in this configuration file. It is in the bin/RunTimeData folder and called DefaultSettings.xml.

The Engine

The engine is the magic of Alteryx. It is an exceptionally fast platform for working with data. It is hugely expandable – for example the In DB library is a Engine Plugin adding a completely different way of interacting with a database than was there before, and it can cope with massive volumes of data (even on relatively modest hardware):

2016-06-08_10-05-36.jpg

I am not exactly sure where in the process the Xml in the workflows and macros are decoded (I guess in the Engine itself), but the engine converts it into a calls onto functions in the Engine Plugins. The Entry Points specify the name of DLL and the name of the method to call. The Xml specifies how to the data flows through the workflow.

The formula tool is itself just part of one of the base plugins. It however adds some additional features to the SDK. The XML formula add ins allow you to add additional functions either as simple XML macros or as references to C-style functions.

Other Parts

On top of these two parts, there is also a command line interface to execute workflows (AlteryxEngineCmd.exe) and the server product. I have no experience of the server product and only very limited experience of the AlteryxEngineCmd (I had brief access to it at my previous job). All of the techniques from the SDK work in both these environments I believe although only those relating to the engine are relevant, but I have never used them in anger.

The SDK

There are three versions of the SDK: C-Style, .Net API and the new HTML/JS API. One fantastic feature is you can mix and match as mentioned above – e.g. use HTML to define the GUI and C++ for the engine. I am not sure if 100% free choice – when I get to writing about the HTML API will try calling a C# engine plugin!

C-Style SDK

The C-Style SDK allows you to create either functions for the Formula tool or new engine plugins. In this blog series, I will take a look at using this SDK to create new functions using just XML and also using C/C++ based functions. I am far from an expert on C++ so have not tried using it to create Engine Plugins with it, but this is the platform Alteryx themselves use. The plugins for the designer are .Net based (or HTML based) so the C/C++ SDK wont help with these.

The SDK also provides the ability to run the engine from your own C code. I do not have the appropriate license for this or any experience with using it.

This is the fastest and most native of the engine APIs.

.Net API

The .Net API allows you to create custom tools both the GUI side and the engine side. There are some limitations on what is possible with this API (no access to Browse Anywhere for instance, and I reckon no way to do something like In DB tools via it!) but it is very flexible and allows you to create tools that act and feel like first party ones.

The formula functionality of the C-Style SDK is only available in C.

As with the C-Style SDK, there is an automation style add in allowing you to run the engine and workflows from your own .Net code. Again you need a special license to use this functionality so will be beyond the scope of this series.

HTML/JS

The HTML API is the newest. It has the ability to create custom tools both the GUI and the engine side. This is the furthest away from the native engine code so I expect performance to be slower. That said if you want to read data from web pages it is going to be by far the easiest.

I am still playing with this API but hope to include it as part of this series. There is also some interesting work going on about how to package and release these (I hope it will be extended to cover the older SDKs too), which might make them more appealing within the community context.

Next Post…

First up, will be creating pure XML based, formula functions.

Geocoding and Finding Nearest Station with Google Web Services

imageThis posted started as a question on the Alteryx community about finding the nearest underground (subway if you prefer!) station to an address. I don’t have much experience working with addresses and map data, but thought their was probably a way leveraging the APIs Google maps provide.

This feels very much like “standing on the shoulders of giants” that Scott Hanselman talks of in his blog post.

Geocoding

maps_64dpFirst task was to convert the address into a longitude and latitude. This process is called geocoding. Google Maps has a whole range of different APIs one of which is Geocoding. The geocoding API has a fairly straight forward syntax to call it using a standard http GET request:

https://maps.googleapis.com/maps/api/geocode/json?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&key=YOUR_API_KEY

The API requires an API_KEY. You can sign up for free here. There is a limit of 2,500 requests for free on the standard plan, but should you need more then you can enable pay and go paying $0.50 / 1000 extra request. If you don’t have a valid API key you will get a request denied message (although oddly not an HTTP error!):

image

imagesHaving got an API key, now wanted to experiment with API and see what format the data came back in. There are various different REST API testing tools out there (including just doing it in a browser), but my favourite is Postman – it allows straight forward testing of these kind of APIs and has a good JSON pretty printer. It also has a JavaScript testing engine inside it which allows you to do some simple scripting on the response. Finally, you can also use environment variables, so I can keep my API key in there and use it within the Uri by writing {{ApiKey}}.

To set up the environment and variable in Postman, click the environment drop down and select manage environment:

image

Inside the new window you can create a new environment (I chose to call it GoogleApi) and then add a new key called ApiKey with a value set to the ApiKey:

SNAGHTML3088185

Having created the environment, it should be available in the drop down on the right. Google’s geocoding API has a few different options. In this case I wanted a JSON format return and would be specifying an address. The request URL (using this environment variable) looks like:

https://maps.googleapis.com/maps/api/geocode/json?address=<Address>&key={{ApiKey}}

Taking a simple test case of looking for the address “Scott Logic, Newcastle”. Running this query in postman results in a JSON result like:

{
“results”: [
{
“address_components”: [
{
“long_name”: “Newcastle upon Tyne”,
“short_name”: “Newcastle upon Tyne”,
“types”: [
“locality”,
“political”
]
},…
],
“formatted_address”: “Newcastle upon Tyne, UK”,
“geometry”: {
“bounds”: {
“northeast”: { “lat”: 55.0453044, “lng”: -1.5326051 },
“southwest”: { “lat”: 54.9594399, “lng”: -1.7810817 }
},
“location”: { “lat”: 54.978252, “lng”: -1.61778 },
“location_type”: “APPROXIMATE”,
“viewport”: {
“northeast”: { “lat”: 55.0453044, “lng”: -1.5326051 },
“southwest”: { “lat”: 54.9594399, “lng”: -1.7810817 }
}
},
“partial_match”: true,
“place_id”: “ChIJzWRvDH6FfUgRkWGncrBS4gs”,
“types”: [ “locality”, “political” ]
}
],
“status”: “OK”
}

 

 

Some of the address components have been removed from the above, the full JSON file can be downloaded from here. There is a lot of information which can be read out this response. The first block address components tells us about the matched address, described as a set of fields:

Field Short Name Long Name
locality, political Newcastle upon Tyne Newcastle upon Tyne
postal_town Newcastle upon Tyne Newcastle upon Tyne
administrative_area_level_3, political Newcastle upon Tyne Newcastle upon Tyne
administrative_area_level_2, political Tyne and Wear Tyne and Wear
administrative_area_level_1, political England England
country, political GB United Kingdom

For my purpose, I only need the latitude and the longitude. These are stored within the geometry object as the location. The additional properties of the geometry object specify the region that appears on a Google map search.

I uses Postman’s test framework to write a small block of JavaScript which stores these values as environment values inside I can then use later:

var jsonData = JSON.parse(responseBody);
postman.setEnvironmentVariable(“lat”, jsonData.results[0].geometry.location.lat);
postman.setEnvironmentVariable(“lng”, jsonData.results[0].geometry.location.lng);

Having run this request the environment variables look like:

image

Finding The Nearest Station

So far we have converted the address into a latitude and longitude. The next step is to fund the nearest points of interest on the map. Google has another REST API to return information round a GPS co-ordinate – their places API. Again it requires the same API key as above. For this case, the Place Search will allow you to fin a list of the nearest stations.

The base URI is:

https://maps.googleapis.com/maps/api/place/nearbysearch/output?parameters

You can choose between JSON and XML as the output format (JSON is simpler to deal with so would recommend that). Likewise as before you must provide an API as the key argument. Unlike the geocoding API, it only takes a location as longitude, latitude. Finally, you must specify to either rank the results by distance (rankby=distance) or to search within a specify radius (radius=10000 to search within 10km). If you are using the ranking mode, you are also then required to provide either a type, keyword or name to search for.

To find the nearest station, I want to rank the results by distance from the GPS co-ordinate and specify that I am looking for a station. The type filter allows you specify one of the types listed here – for this case I wanted subway_station. The Uri using Postman’s variables is hence:

https://maps.googleapis.com/maps/api/place/nearbysearch/json?key={{ApiKey}}&location={{lat}},{{lng}}&rankby=distance&type=subway_station

You will receive back a JSON structure like:

{
“html_attributions”: [],
“next_page_token”: “CpQCBQEAAHI5uyVLuCCs3K37OvmGVgCNtscQ8uxa3xKJPNQ9mrFMxxwH0OxejgPfOyqZCNhrc6lidxbNyOjklu7fp71rBvbjqWazd2wfKG19ktEp6_QWuCaJPbYLRzpsJNNdiELV73BBw0oYWtHioJbNjNqPC6YsUrP_ByQXPBhiYN8VNI54ZJfHb5bwLSYuVBDwiJ9j4Vjf1vEqnPYnjyLr0Fmo0HDfJrDODQW2R7tgVau58EhpThj23Y-EYaC0ssTjsg3lltqgce1ujxb3Z6lf-TCXJAM9xs8tsBSA-Q8InQjxTbMHqUa5DvDUSi4l0-uHejdm1H6pEgYjdTJmRSnGhX-Z_H3ROBsGuWRPA7dDZ9LFxVZcEhBGeaTGsXmvk17sfFidGPCdGhTUCE975QsCFCutIUZJHXLkAZavdA”,
“results”: [
{
“geometry”: {
“location”: {
“lat”: 54.977481,
“lng”: -1.6138929
}
},
“icon”: “https://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png”,
“id”: “d5b6ffa31de95679ef807796a9778a8a410a0418”,
“name”: “Haymarket”,
“photos”: [
{
“height”: 3085,
“html_attributions”: [
“<a href=\”https://maps.google.com/maps/contrib/107056009756478253671/photos\”>Bocaj Nesnaj</a>”
],
“photo_reference”: “CoQBcwAAABS6JL5Gh0S6dRdHx01dkPO4OoxXwS2hBRD58yQjYrNubcroTz4x40mY32hEY0nrP1iBZowoP5FhHDM45WdNQiswcUoxE2mkXYxBoXnC99wxHaGJPQ4-pvMc-jCkH18JnjXJfXzv5dAoNtgWav9Pq0_OO3xFTTl8nQg75Ch_SI2oEhCXxxcDj98Bun0BMW2FJGy_GhSHErdUum1PWUpazHkeUJUBfKYrFQ”,
“width”: 4896
}
],
“place_id”: “ChIJDyP2_MtwfkgRDqPu-wmHBMA”,
“reference”: “CmRdAAAA2nyuhb9XS9zptApVcc8D88IWc_zdZ-RuunDdkaVYC42U8n2hEuEF7nO4Ol0CWqzI13Nnt2iCn5Opgr9iHjGsOFCSbURd573RKgJWNaNw2VdKgdKIKXKWk8FTIFy2eFaAEhD76A2tt6zeX9bxOKOPUdw1GhSaGzIqZRA0-ZBQJ8lICKRFhlWYPQ”,
“scope”: “GOOGLE”,
“types”: [
“subway_station”,
“transit_station”,
“point_of_interest”,
“establishment”
],
“vicinity”: “United Kingdom”
}, …
],
“status”: “OK”
}

The list of the nearest 20 subway stations is inside the results array with a name property and a geometry location property. The locations are in increasing distance from the original search location. The table below shows the results for Scott Logic, Newcastle:

name geometry.location.lat geometry.location.lng
Haymarket 54.977481 -1.6138929
Monument Metro Station 54.9736621 -1.6134493
St James 54.97408 -1.62099
Manors 54.97403 -1.60484
Jesmond 54.98308 -1.60548
Central Station 54.96912 -1.61615
West Jesmond 54.9936284 -1.6097669
Gateshead 54.9614035 -1.6038992
Byker 54.97615 -1.57957
Ilford Road 55.00017 -1.61085
Gateshead Stadium 54.95782 -1.58808
Chillingham Road 54.98283 -1.57169
South Gosforth 55.00614 -1.60815
Walkergate 54.98549 -1.55908
Longbenton 55.00881 -1.59158
Regent Centre 55.01172 -1.62204
Felling 54.95298 -1.5713
Four Lane Ends 55.01013 -1.57862
Wansbeck Road 55.0142 -1.63489
Fawdon 55.01374 -1.64462

You can download the Postman collection file I used to run these request here.

Using HERE or Bing Maps

If you prefer not to use Google Maps API then both HERE or Bing Maps provide REST APIs.

HERE has a set developer Rest APIs, including a Geocoding API. In this case you must provide and app_id and an app_code. You can register for those here.  There is a basic plan, which is free, that includes this API thought limited amount per month. The geocode request looks like:

https://geocoder.cit.api.here.com/6.2/geocode.json?app_id={{app_id}}&app_code={{app_code}}&searchtext=Newcastle,UK

Again you can choose between JSON and XML. The JSON response is more complicated than in Google’s case but the matches are found as an array of objects at Response.View.Result. Each object contains a Location with a NavigationPosition with a Latitude and Longitude. I found the search was not as good as interpreting my input as Google’s.

I didn’t go any further with HERE, but there is a Places API which I am guessing would have the ability to find places of interest, such as a subway station. The most interesting feature that I saw within the HERE was the ability to do a batch geocoding job.

Bing has a massive set of APIs you can use. Geocoding is done by the Location service. As with both the other services, you need to obtain an API key. The instructions to get a key are here. Again there is a Basic key, which is free, that will cover the usage need for geocoding within annual limts (125k requests!). The location service can yet again return either JSON (default) or XML (use the output (o) query argument to specify, &o=xml). You can provide what you want to find in the query (q) argument:

http://dev.virtualearth.net/REST/v1/Locations?q=Newcastle,UK&key={{ApiKey}}

The structure of the JSON is pretty similar to the results from HERE. The matching location GPS co-ordinates are listed in the resourceSets array inside resources array with a point objects containing a pair of co-ordinates. Again the structure feels more complicated than Google’s and the search results seemed less in line with what I expected.

Reproducing In Alteryx

As this all started as a question on the community, back to Alteryx. Fortunately, downloading and parsing JSON data is part of it’s bread and butter.

As I don’t like having API keys littering workflows (any more than I do have them in code), I like to put them in as workflow constants. If you are sharing the workflow even this can be a problem, in which case I would put them in a flat file or in the windows registry. The constant can be access by going to the properties tool of the workflow:

2016-07-05_08-29-31

We can use a text input tool to contain the locations we want to search for, and then use a formula tool to construct the URL we need to request to get the GPS co-ordinates. After this the download tool and JSON parser tool can be used to download and read the JSON response. Finally some filtering on parsed result will give you the Latitude and Longitude.

2016-07-05_08-28-57

Basically, we then repeat the above process feeding in the latitude and longitude to create the required URL in another formula tool and then download and parse the result.

You can download the workflow from here.

Adjusting Alteryx Files For Different Versions…

Stealing this idea from Ken Black’s equivalent post on Tableau (sure he won’t mind!).

As we have just have version 10.6 release think worth publishing a little guide on how to make Alteryx compatible with old versions! This comes up on the community a lot when answering questions and a user doesn’t have the latest version of the designer.

There are various different file types Alteryx uses:

yxmd Standard Workflow file
yxmc Macro Workflow (all types)
yxwz Analytic App (wz stands for wizard I believe)
yxzp Workflow Package
yxdb Alteryx Database

The top three are all basically the same format – they are the Xml configuration of the workflow. The first section of the xml looks like:

image

You can also view this within Alteryx Designer in the XML View:

image

The AlteryxDocument node is the same for Workflows, Macros and Analytic Apps. The yxmdVer attribute is the part of the file which tells Alteryx which version this came from. If you want this file to be able to be opened by and older version you can change the number. The table below gives the version number you need for different Alteryx versions:

yxmdVer Alteryx Versions
9.5 9.5
10.0 10, 10.1
10.5 10.5, 10.6

So what are the risks? Well Alteryx themselves certainly don’t guarantee the formatthat a tool serialises its configuration from version to version will be consistent – though it normally is. The main issue is that the universe of tools that exist is continually growing and if it can’t find the tool the designer will add an unknown tool:

image

So that’s the easy ones. What about packages? These are actually just zip files. If you change the extension to zip you can take a look inside:

image

It’s just a case of updating the files inside the zip and then saving renaming it back to yxzp.

Finally, the database files. These have been in my limited Alteryx experience (only used it since about 9.1 I think), a fixed format. Ned has a blog post which links to an open source reader they released on these files, I think as long as you aren’t using a data type that didn’t exist in an older version you are fine!

Update:

Alteryx ACE Mark Frisch (a.k.a. MarqueeCrew) put together a macro to allow you change the version of your workflow. You can download it here.

Creating user controlled groupings in a Weighted Median Alteryx Macro

There was an interesting question posted on the Alteryx community about which macro was best for calculating a weighted median. Adam Riley has a macro inside the CReW macros which does this, but wanted to see if it was possible to create one allowing dynamic selection of the grouping (like in a Multi Row Formula). I thought I would document the process I went through.

All of the examples have been created in v10.5. Everything should work fine in older versions as well, but you will need to edit the yxmd or yxmc file and replace the version number at the top (use 10.0 for both v10 and v10.1).

Initial Computation

image

The calculation itself is fairly straight forward. Taking the Wikipedia page’s example as a starting point.

    1. First Normalise the Weights so they total 1
    2. Sort into ascending Value order
    3. Calculate the Running Total Weight
    4. Filter to those greater than or equal to 0.5
    5. Select the first row and you will have the Weighted Median

This flow can be download here.

Grouped Calculation

The next step was to change it so it computes the weighed median with a grouping. Stealing Adam Riley’s macro sample data to start with. A few fairly straight forward changes and then we are done.

  • Add the grouping to the summarise tool
    image
  • Replace the Append Fields tool with a Join Tool and join on the Group
  • Adjust the grouping in both the Multi Row Formula tool and the Sample Tool
    image

The grouped version can be downloaded here.

Converting To A MacroWeighted Median

Alteryx makes this very simple. Just right click on the text input and select Convert to Macro. Select the macro Input and then specify you want it to have a Field Map. (As a side when making this I noticed the Optional Incoming Connection box – this was added in 10.5 and is brilliant!).

image image

The field map causes the input to have columns renamed. Fortunately Alteryx makes it east to reverse this process. Drag the magnifying glass output to the lightning box on the Select tool and it will automatically add an Action tool reversing the Field Map.

image image

After that need to replace the Browse tool with a Macro Output tool. Finally a little bit of reconfiguring to specify that the output depends on the input and add an Icon.

image

This version of the macro can be downloaded from here.

Adding Dynamic Grouping

So far this has been fairly straight forward. The next step is to make it so we can select different sets and levels of grouping. Alteryx has an interface tool just for this – the List Box. I added a ListBox and connected it’s Question Input to the Macro Input tools Question output. I then specified I wanted it to generate a Custom List with a separator I expect to be unique (§).

image

If the user chose to select nothing then we would have a problem with the Join tool, which needs a join! In order to avoid this I added a formula tool returning a constant of 1 which can be used in all the groupings to avoid this issue.

As I am not returning either the Weight or the Value I removed the Reverse Field Map as expected this was going to just add complexity to the macro.

This is where it gets a lot more fiddly as we are going to be rewriting the Xml of tools (and also a complete pain to test).

Summarize Tool

The next step is to connect the ListBox to the Summarize tool.  Connecting the ListBox Question output to the Summarize tool’s action input automatically adds an Action tool in Update Value mode. Unfortunately this won’t be able to update the summarise tool without a degree of fiddling.

Lets take a look at the Xml config the summarise tool:

<Configuration>
<SummarizeFields>

    <SummarizeField field=“Constant” action=“GroupBy” rename=“Constant” />
    <SummarizeField field=“Group” action=“GroupBy” rename=“Group” />
<SummarizeField field=“Weight” action=“Sum” rename=“Sum_Weight” />
</SummarizeFields>
</Configuration>

The action tool is going to need to rewrite the section inside the SummarizeFields Node and replace it with some dynamic Xml. We know we are going to want to keep the Sum_Weight and the GroupBy Constant but are going to have generate a set of GroupBy entries depending on the values selected. So first we change the Action tool to target the correct Xml node and tell it to update the Inner Xml based on a formula.

image

The formula is the next problem. We have a list of fields with a separator coming out of the ListBox. A little experimenting with the Xml showed me the rename attribute is optional for a SummarizeField. That means we can do some substitution on the separator and produce a valid Xml string for the configuration.

  • We always want to have: <SummarizeField field=“Weight” action=“Sum” rename=“Sum_Weight” />
  • We also always want to have <SummarizeField field=“Constant” action=“GroupBy” rename=“Constant” />
  • If no grouping fields are selected we don’t want to add anything else.
  • Otherwise for every entry in the list box we need <SummarizeField field=FieldName action=“GroupBy” />
    First we add a prefix: <SummarizeField field=
    We can do this by replacing the separator with: action=“GroupBy” /><SummarizeField field=
    We then finally add a suffix: action=“GroupBy” />

The formula we need is:

‘<SummarizeField field=”Constant” action=”GroupBy” rename=”Constant” />’
+ IIF([#1]!=””, ‘<SummarizeField field=”‘ +
Replace([#1], ‘§’, ‘” action=”GroupBy” /><SummarizeField field=”‘)+
‘” action=”GroupBy” />’, “”)
+ ‘<SummarizeField field=”Weight” action=”Sum” rename=”Sum_Weight” />’

Multi Row Formula Tool and Sample Tool

Both of these tools have a similar section of Xml to work on GroupFields in the Sample tool and GroupByFields in the Multi Row Formula Tool. Both of them have a set of Field nodes within them, slightly annoyingly it has a name attribute in the Sample tool and field attribute in the Multi Row Formula Tool. The sample below is from the Sample tool:

<Configuration>
<GroupFields orderChanged=“False”>
<Field name=“Constant” />
<Field name=“Group” />

</GroupFields>
<Mode>First</Mode>
<N>1</N>
</Configuration>

The process is very similar to the SummarizeField above (for the Multi Row Formula you need to replace name with field):

  • We also always want to have <Field name=“Constant” />
  • If no grouping fields are selected we don’t want to add anything else.
  • Otherwise for every entry in the list box we need <Field name=FieldName />
    First we add a prefix: <Field name=
    We can do this by replacing the separator with: /><Field name=
    We then finally add a suffix: />

The formula is hence:

‘<Field name=”Constant” />’ + IIF([#1]!=””, ‘<Field name=”‘ + Replace([#1], ‘§’, ‘” /><Field name=”‘)+ ‘” />’, “”)

Join Tool

Saving the most complicated to last. The Join tool’s Xml looks like:

<Configuration joinByRecordPos=“False”>
<JoinInfo connection=“Left”>
<Field field=“Constant” />
<Field field=“Group” />
</JoinInfo>
  <JoinInfo connection=“Right”>
<Field field=“Constant” />
<Field field=“Group” />
</JoinInfo>
  <SelectConfiguration>
<Configuration outputConnection=“Join”>
<OrderChanged value=“False” />
<SelectFields>
<SelectField field=“Right_Constant” selected=“False” rename=“Right_Constant” />
<SelectField field=“Right_Group” selected=“False” rename=“Right_Group” />
<SelectField field=“*Unknown” selected=“True” />
</SelectFields>
</Configuration>
</SelectConfiguration>
</Configuration>

The JoinInfo nodes are both the same as the Multi Row Formula Tool. So we can just copy the action tools for it and target the JoinInfo nodes. This leaves the section in the SelectFields to do. This is quite similar to the SummarizeFields case above. The rules this time are:

  • We always want to have: <SelectField field=“*Unknown” selected=“True” />
  • We also always want to have <SelectField field=“Right_Constant” selected=“False” rename=“Right_Constant” />
  • If no grouping fields are selected we don’t want to add anything else.
  • Otherwise for every entry in the list box we need <SelectField field=“Right_FieldName selected=“False” />
    First we add a prefix: <SelectField field=
    We can do this by replacing the separator with: selected=“True” /><SelectField field=
    We then finally add a suffix: selected=“True” />

The formula for this final case is:

‘<SelectField field=”Right_Constant” selected=”False” rename=”Right_Constant” />’
+ IIF([#1]!=””,
‘<SelectField field=”Right_’ +
Replace([#1], ‘§’, ‘” selected=”False” /><SelectField field=”Right_’)+
‘” selected=”False” />’,
“”)
+ ‘<SelectField field=”*Unknown” selected=”True” />’

End Result

The final macro can be downloaded here. There is a simple test workflow here. The User sees:

image

and the end result is a table containing the specified grouping and the calculated Median values:

image

Alteryx Formula Add Ins (v1.1)

FormulaAddInsVersion 1.1 of my Alteryx formula add ins project has been released. These provide a set of utility functions for use wherever a formula can be used within Alteryx. The functions are developed in C++ and XML using the Alteryx SDK – I will write a few posts with the details of how to create a new function and how the unit test framework works in the next couple of weeks.

Download the zip file from here, and the function reference guide here. The source code can be downloaded directly from GitHub here.

Goal of the Project

The idea of this project is to:

  • Provide a set of useful functions to make data preparation and analysis simpler and easier.

This project started to make string filtering easier within version 9 with the creation of three simple XML based functions:

These are now part of version 10, and commented out in this project

    • StartsWith
    • EndsWith
    • Contains.

Over time, have expanded functions based on issues that come up in the community, particularly around handling dates.

  • Increase exposure to the formula add in SDK.

Alteryx provides a great SDK for creating both custom tools and new formula. Part of the goals of this project is to create a set of sample XML and C++ functions people could use as basis to create their own.

Installation

Alteryx looks for XML files in a folder called FormulaAddIn within the RuntimeData folder inside the bin directory of the Alteryx install. For an admin install, this will default to be: C:\Program Files\Alteryx\bin\RuntimeData.

There is an Install.bat script which will copy all the XML files and the DLL files into this folder. As Windows UAC protects this folder if within Program Files, the install script will ask for permission to install. The script will also unblock all files, as Windows will possibly have blocked them as they were downloaded from the internet.

If you have a non-admin install of Alteryx, it is possible the install script won’t be able to find the Alteryx folder. In this case or if the bat file fails you will need to copy the files manually.

To do this, copy all the XML and DLL files to %AlteryxBinDirectory%\RuntimeData\FormulaAddIn.

Function List

MiscUtils

A few additional general methods for data preparation.

  • IfNull: If first value is null return second
  • Coalesce: Given list of values return first non null value (C++)

MathUtils

Math based functions that I haven’t got a better home for!

  • Modulo: General Double Based Modulo function
  • HexBinX: Given an X,Y point and optional radius, get X co-ordinate of hexagonal bin’s centre
  • HexBinY: Given an X,Y point and optional radius get Y co-ordinate of hexagonal bin’s centre
  • Rand_Triangular: Given a uniform random number transform into a triangular distributed random
  • Avg: Average of a list of values ignoring NULL
  • Count: Count of a list of values ignoring NULL
  • Sum: Sum of a list of values ignoring NULL

DateUtils

Some additional functions for working with Dates and to a certain extent Times.

Date Time Creation Functions (Currently Just Concatenation)
  • MakeDate: Create a new date from Year, Month, Day
  • MakeTime: Create a new time from Hour, Minute, Second
  • MakeDateTime: Create a new DateTime from Year, Month, Day, Hour, Minute, Second
  • ToDate: Truncate a DateTime to a Date
  • ToDateTime: Appends midnight to a Date to create a DateTime
Date Part Functions (returns numbers)
  • Day: Get the day of the Month [1-31]
  • Month: Gets the month number [1-12]
  • Year: Gets the four digit year
  • WeekDay: Gets the day of the week [Sunday (0) through to Saturday (6)]
  • Quarter: Gets the quarter of the date [1-4]
  • OrdinalDay: Gets the day of the year [1-366]
Period Start and End
Other Date Functions
  • DateAdd: Equivalent to DateTimeAdd but returning a Date
  • BusinessDays: Number of weekdays between two dates
  • IsLeapYear: Is a year a leap year
    (takes a year not a date as an input use IsLearYear(Year([Date]))

StringUtils

A few extension methods for working with strings.

  • LeftPart: Gets the text before the first instance of a separator
  • RightPart: Gets the text after the first instance of a separator
  • Split: Splits a string into tokens and then returns the specified instance

Retired Functions (built into v10)

V1.1 Release Notes

  • Full set of unit tests for all the functions.
  • Wiki now contains a function reference for all the functions.
  • Improved Install.bat to pick up all xml and dll files.
  • New functions: AVG, COUNT, SUM, MAKETIME, MAKEDATETIME, TODATE, TODATETIME, DATEADD
  • HEXBINX and HEXBINY moved to C++ functions
  • Bug Fix: LEFTPART and RIGHTPART missing bracket.
  • Bug Fix: RIGHTPART return whole string if delimiter not found.
  • Bug Fix: MODULO deal with NULL or 0 for divisor.

Musings from a developer and father, on this that and everything…

%d bloggers like this: