This post is the first of a set of posts aiming to demonstrate different ways to build the same Custom Tool. The goal is to create a tool which does the same as the RecordID tool but supports Grouping and Sorting. While this isn’t a complicated thing to do within Alteryx, it will serve as a reasonable demonstration of the different options for building tools. The plan is to cover:
- Standard Macro with Interface tools packaged as YXI
This first post looks at creating a workflow as a test case. And then creating a macro to do it more generally and then finally packaging this macro as a YXI file.
All the code and the processed data for this post is on GitHub at https://github.com/jdunkerley/AlteryxRecordID.
The RecordID UI looks like:
The goal of the macro is to have all of this functionality:
- Choice of Field Name
- Choice of Initial Value
- Choice of Type (Int16, Int32, Int64, and String)
- Choice of Position (First or Last)
But also to add:
- Grouping Columns (the count restarts every time you change grouping)
- Sorting Columns (columns the data is sorted by within the groups)
Initial Test Workflow and Data
For a test data set, I chose to use house prices within the UK from 2016 as I wanted a reasonably large dataset (about a 1 million records) to work with. The git repository accompanying this post has the workflow used to extract the data and the result yxdb used for testing. The cleansed data looks like:
Contains HM Land Registry data © Crown copyright and database right 2017. This data is licensed under the Open Government Licence v3.0.
- Transaction unique identifier (String): A reference number which is generated automatically recording each published sale. The number is unique and will change each time a sale is recorded.
- Price (Int32): Sale price stated on the transfer deed.
- Date of Transfer (Date): Date when the sale was completed, as stated on the transfer deed.
- Postcode (String): This is the postcode used at the time of the original transaction.
- PostCodeArea (String)
- PostCodeDistrict (String)
- PostCodeSector (Byte)
- PostCodeUnit (String)
- Property Type (String): D = Detached, S = Semi-Detached, T = Terraced, F = Flats/Maisonettes, O = Other
- NewBuild (Bool): Is it a Newly Built Property
- Duration (String): F = Freehold, L= Leasehold
- PAON (V_String): Primary Addressable Object Name. Typically the house number or name
- SAON (V_String): Secondary Addressable Object Name.
- Street (V_String)
- Locality (V_String)
- Town/City (V_String)
- District (V_String)
- County (V_String)
- PPDCategory Type (String)
The goal is to create a RecordID, grouped by County and the PostCodeArea (i.e., resetting as you cross each group) and then sorted by Price descending. Fairly straightforward as a specific workflow:
In this workflow, I include the grouping fields in the Sort as well, which means the data is sorted only once. In Alteryx, anytime there is a grouping an implicit Sort is added to the workflow to create the grouped data. As Alteryx attaches metadata, it is aware of the way the data is sorted, so it is very efficient if the grouping matches the sort.
As an aside – a tale of premature optimisation! As I include a sort tool before the multi-row formula, I wondered if it was possible to remove the grouping to improve performance. First some experimenting with the sorting:
Sort (Area, County, Price (desc)) matching with the grouping (Area, County):
Input Data (2) 885589 records were read from "C:\Repos\AlteryxRecordID\2016PPData.yxdb" Browse (5) 885589 records Sort (3) Profile Time: 2476.86ms, 59.74% Multi-Row Formula (4) Profile Time: 956.73ms, 23.08% Input Data (2) Profile Time: 507.26ms, 12.24% Browse (5) Profile Time: 205.04ms, 4.95%
If we make the sort not include the grouping fields, the process is then considerably slower. This makes sense as effectively Alteryx has to perform two complete sort operations on the dataset:
Sort (Price (desc)) matching with the grouping (Area, County):
Input Data (2) 885589 records were read from "C:\Repos\AlteryxRecordID\2016PPData.yxdb" Browse (5) 885589 records Sort (3) Profile Time: 3077.87ms, 48.51% Multi-Row Formula (4) Profile Time: 2106.16ms, 33.19% Input Data (2) Profile Time: 769.76ms, 12.13% Browse (5) Profile Time: 391.12ms, 6.16%
Even a slight mismatch (e.g., County – descending) in the sort tool forces Alteryx to do a sort with impact on the performance.
Next, I wondered, if we can take this even further. If we do the sort explicitly then we can remove the grouping from the Multi-Row and adjust the formula to look for a break in a group:
IIF([Row-1:PostcodeArea] == [PostcodeArea] AND [Row-1:County]==[County],[Row-1:RecordID],0)+1
The odd thing here was this resulted in a much slower result than the original case:
Input Data (2) 885589 records were read from "C:\Repos\AlteryxRecordID\2016PPData.yxdb" Browse (5) 885589 records Sort (3) Profile Time: 2851.47ms, 48.18% Multi-Row Formula (4) Profile Time: 2067.09ms, 34.93% Input Data (2) Profile Time: 771.59ms, 13.04% Browse (5) Profile Time: 227.68ms, 3.85%
I guess that Alteryx parallelizes the processing of groups across multiple threads allowing for quicker net execution than the expression above which is forced into single threaded evaluation across the whole dataset.
Anyway back to the main point of this post.
Creating the Macro
The first step is to create a macro which recreates a RecordID tool without the grouping.
There is quite a lot going on in here:
- The Multi-Field tool creates a new field with the RecordID called
- The Detour tool is used to move the new field to the beginning or the end.
- The two Select tools are used to rename the fields and set the type of the new fields.
- Finally, a Multi-Field tool is used to pad the value if a
WStringis the requested format. The formula is updated by an action tool with the following expression:
IIF( [#1] IN ("String", "WString"), "PadLeft(TRIM([_CurrentField_]),[#1],'0')", "[_CurrentField_]")
At this point, the macro covers most of the functionality of the RecordID tool but lacks the initial value. To add this need to add a Numeric Up-Down control and then drag the question output (the magnifying glass) to the question input on the Multi-Row formula. This makes the value of the control available as a constant in the expression ([#1]). The new formula becomes:
Next to add the grouping. The List Box control allows for you to connect the input and select a set of fields from it. To do this connect the question of the Macro Input to the List Box, and then it should set itself up. Unfortunately to control the grouping we need to do a little more work. Looking at the XML in the sample workflow shows what the configuration needs to look like:
<GroupByFields> <Field field="PostCodeArea" /> <Field field="County" /> </GroupByFields>
We can do this within an Action tool. In the List Box tool switch it to
Generate Custom List. Next, drag the output from the List Box to the Action input of the Multi-Row formula tool. This will automatically create an Action tool. To configure the Action tool choose the action
Update Raw XML with Formula and select the
GroupByFields node in the XML element to update tree view. The formula is a little fiddly:
IIF( IsEmpty([#1]) OR [#1]='""', "<GroupByFields />", '<GroupByFields><Field field=' + Replace([#1], '","', '"/><Field field="') + '/></GroupByFields>')
This takes the custom list and converts it to XML. It checks for the list being empty. Oddly in my testing, Alteryx passes a string of
"" rather than an empty or
NULL string. I found it easier to target the Outer XML and include the
GroupByFields, but I think it would work just as well if targetting the inner XML.
A brief aside on testing these expressions. It’s not easy! The Debug view doesn’t show the column names in the List Box so you can’t pick and choose. The best way I have found is to add a Message tool to your macro and make the Action tool update this and build the expression up piece by piece. Ned added a Message function to the function library in version 11.5 which is fantastic, but the message written out during the Action evaluation does not appear to be captured.
Next, we need to add a Sort tool to the Macro. Unlike the Multi-Row tool, we need to have a sort applied for the tool to be correctly configured. To work around this, I added a formula tool to create a
__placeholder__ column equal to a boolean of true. The Sort can then default on this. Finally, a Select tool is used to remove the extra column.
Looking at the XML of the Sort tool:
<SortInfo locale="0"> <Field field="__PlaceHolder__" order="Ascending" /> </SortInfo>
Let’s start by adding in the Grouping Fields. This is similar to adding the grouping to the Multi-Row tool. The only difference is I chose to replace Inner XML rather than the Outer XML as there must always be a sorted field.
IIF( IsEmpty([#1]) OR [#1]='""', '<Field field="__PlaceHolder__" order="Ascending" />', '<Field field=' + Replace([#1], '","', '" order="Ascending" /><Field field="') + ' order="Ascending" />')
Next, we need to add sorting. The first part is the same as grouping fields. Ideally, I don’t want to be able to sort by a grouped field, but the built-in UI won’t let me build that so will live without. Also, I can’t control the order with the List Box. Add a List Box and connect it up to the Action already linked to the Sort tool. This will come in as Input 2. The formula needs some tweaking:
IIF( (IsEmpty([#1]) OR [#1]='""') AND (IsEmpty([#2]) OR [#1]='""'), '<Field field="__PlaceHolder__" order="Ascending" />', IIF(IsEmpty([#1]) OR [#1]='""', "", '<Field field=' + Replace([#1], '","', '" order="Ascending" /><Field field="') + ' order="Ascending" />') + IIF(IsEmpty([#2]) OR [#2]='""', "", '<Field field=' + Replace([#2], '","', '" order="Ascending" /><Field field="') + ' order="Ascending" />') )
Finally, I want to be able to choose some fields to be descending. While not the perfect UI I again decided to use a List Box. This time the goal is to construct a regular expression we can use to update the Ascending to Descending. To help with this, I chose to make the List Box use a | separator. The expression becomes:
IIF( (IsEmpty([#1]) OR [#1]='""') AND (IsEmpty([#2]) OR [#2]='""'), '<Field field="__PlaceHolder__" order="Ascending" />', IIF(IsEmpty([#1]) OR [#1]='""', "", '<Field field=' + Replace([#1], '","', '" order="Ascending" /><Field field="') + ' order="Ascending" />') + IIF( IsEmpty([#2]) OR [#2]='""', "", REGEX_Replace( '<Field field=' + Replace([#2], '","', '" order="Ascending" /><Field field="') + ' order="Ascending" />', '<Field field=(' + [#3] + ') order="Ascending" />', '<Field field=$1 order="Descending" />')))
It’s not perfect, but I think it should work as long as the field name doesn’t contain single quotes. The final macro looks like:
The final steps are to add the metadata, image and check the UI layout. The screenshot below shows the configuration and layout I used:
Testing the macro
Having built the macro, I wanted some tests for it. First, I tested on a small set of data testing a few variations. The table below shows my small test set:
I wanted to test just creating a record id, creating a grouped record id (by Parent and Key) and creating a grouped and sorted id (group by Parent, sort by Key Ascending, Value Descending). I also wanted to check the different field types were handled successfully. This ended up being 6 cases.
To make testing a little quicker, I use a macro developed as part of the OmniBus and Abacus projects. The ResultCompare macro takes and Expected dataset and an Actual result as inputs. It compares the field names, positions, and types. If there are any discrepancies, it reports these as a warning. It then examines all the data in inputs comparing row to row, column by column. The current version is available at https://github.com/jdunkerley/AlteryxAddIns/blob/master/Test%20Workflows/ResultCompare.yxmc.
The final test workflow looks:
Finally, I tested with the large data set:
In this case, I just compared the resulting RecordID. I could have used the Result Compare again but as the data set is large kept it simple and stupid!
Packaging the Macro
I wanted to make it a little more automated so time to turn to PowerShell. Looking inside the XML of the macro all the details we need are inside:
<?xml version="1.0"?> <AlteryxDocument yxmdVer="11.3"> <Properties> ... <MetaInfo> <NameIsFileName value="False" /> <Name>Grouped RecordID</Name> <Description>Macro creating a RecordID with Grouping and Sorting.</Description> <RootToolName /> <ToolVersion>1.0</ToolVersion> <ToolInDb value="False" /> <CategoryName>Preparation</CategoryName> <SearchTags>unique, identifier, data,preparation category, recordid, order, arrange, group</SearchTags> <Author>James Dunkerley</Author> <Company /> <Copyright>Copyright (c) 2017 James Dunkerley (MIT license)</Copyright> <DescriptionLink actual="https://github.com/jdunkerley/AlteryxAddIns/wiki/Grouped-Record-ID" displayed="https://github.com/jdunkerley/AlteryxAddIns/wiki/Grouped-Record-ID" /> <Example> <Description /> <File /> </Example> </MetaInfo> ... <RunTimeProperties> ... <MacroImage><!-- Base64 Encoded PNG Image File //--></MacroImage> ... </RunTimeProperties> </Properties>
It’s a fairly straightforward process using PowerShell to convert this into a YXI file. I created a script which does the following:
- Create the
Config.xmlfile holding the XML metadata. The result looks like:
- Extract the macro image file, decode the Base64 into a PNG file and save as
- Move any used Macro into a subfolder and reference. This is a recursive process so that it will walk down layer after layer.
Currently, the script only embeds contained macros. I am planning to deal with data files as well but haven’t added that yet. I also experimented with including an example workflow into the Macro. While you can add entries into the XML for this by hand (and the createYXI will include the file), as of 11.5 the path needs to be inside the
Examples folder of Alteryx install directory, so I haven’t taken this any further.
The current version of the PowerShell script can be downloaded from GitHub.
The final packaged macro file structure looks like (rename the yxi to zip to view inside):
├── GroupedRecordID │ └── GroupedRecordID.yxmc ├── Config.xml └── logo.png
To make it quicker to run I also added a batch file which calls the create script:
PowerShell -C "../Scripts/CreateMacroYXI.ps1 ./GroupedRecordID.yxmc"
Running the YXI pop’s up to the installer and then adds the tool to the Preparation category:
If you wish to publish this to the Alteryx Gallery, Peter’s post has a clear set of steps of how to do this.
The completed YXI file can be downloaded from GitHub.
While not a perfect UI the macro will do everything I want. It works on the large scale and is simple to distribute as a YXI file.
**To download files from GitHub right click on the Raw link and choose
Save Link As ...
- Sample Workflow
- Macro Test Workflow
- Macro Sample Workflow
- YXI File
- CreateYXI script
- Git Hub Repository
- Source Code as of this Post
- Peter’s YXI Post