Alteryx – Download, unzip and read data from an online archive
Sometimes the data you want is sitting in a compressed archive on the Internet. In Alteryx, the process of downloading the archive, unzipping it, and reading its contents in a single workflow can be tricky. Here’s a quick blog to show you how it can be done.
We’ll be working with a .zip archive I’ve uploaded to my Github. It’s public facing, so feel free to practice with this file. Inside the archive is a single .csv file containing some dummy data. Our objectives are as follows:
- Download .zip file to Alteryx temporary directory using the Download tool
- Write simple .bat file to unzip the .zip file
- Run the .bat file using the Run Command tool
- Read in the unzipped data using a Dynamic Select tool
Step 1 – Download .zip from Github
In the first container we have a Text Input tool with the URL to the location of the .zip file on Github. Note that we stick ‘?raw=true’ on the end of the URL – this is Github specific, and makes sure we are just downloading the contents of the file and not the Github interface. Next we want to specify, in a formula tool, where on our machines we would like to download the .zip file to. In Alteryx, you can reference the temporary Alteryx directory with the [Engine.TempFilePath] variable. Note that this directory changes every time you run Alteryx, and is deleted once the session is finished. In the Download tool, we specify that we want to download the link in our [URL] field. Additionally, we need to download “To a File”, and we can specify the filename from our [DownloadLocation] field.
If you run this section of the workflow, and check your temporary directory, you should see the Archived_Data.zip file has appeared on our computer. So far so good.
We’re going to separate the next three steps with a Block Until Done tool to ensure that they are performed in a consecutive order.
Step 2 – Write a .bat script
Now we need to unzip the file. To do that we’re going to write something called a .bat script, again to the temporary directory, which can be executed by the Run Command tool in Step 3 to unzip the file. We are going to write this in a new field using a Formula tool.
The structure of the .bat file should be like so:
powershell -command "Expand-Archive 'C:\PathToAlteryxTemporaryDirectory\Archived_Data.zip' 'C:\PathToAlteryxTemporaryDirectory\'"
This is using the ‘Expand-Archive’ function from Powershell, to unzip the ‘Archived_Data.zip’ file to the temporary Alteryx directory. Note that this only works for .zip files, and might not work for more niche archive types. For example, it doesn’t work on Tableau packaged workbook files. In the case that you want to expand a more niche archive, I would use something like 7zip instead of the powershell command. The downside is that 7zip is not a native windows program, and whoever runs this workflow would need to have it downloaded to their machine.
Here’s how I constructed the script in the formula tool:
It can be tricky to get right with all of the quote marks – remember that if you want to string out a quote mark (“) you should wrap it in inverted commas (‘), and vice versa.
You can name the field whatever you like, but prefix it with a comment symbol ‘::’. This will ensure that the field name itself is not executed when we run the script.
When you’ve got it right, stick a Select tool after it to only keep our new .bat script field. We then need to write the .bat file to our temporary directory. In input/output tools you can specify the temporary directory like so – %Engine.TempFilePath%. Also make sure you set the delimiter to ‘\0’ (no delimiter) and that ‘Quote Output Fields’ is set to ‘Never’. We have enough quote marks in our script already, I think.
Run this and check the temporary directory again. You should see the UnzipArchive.bat file in there. Open it in a text editor to check that the structure is correct and if so, move onto the next section.
Step 3 – Run .bat script to unzip archive
The next branch off our Block Until Done tool is connected to the Run Command tool. I find this to be a bit of a weird tool. It makes you specify an output and I have no idea why. My feelings about this tool aside, we can make it work for this purpose.
In the Output field, write to a dummy .yxdb file in the temp directory – we’re never going to open this. In the Command section, point towards our newly written .bat file in the temp directory. Set the Working Directory to be the temp directory as well, just in case.
Step 4 – Read in the unzipped .csv data file
The last step is to read in the contents of that archive we just unzipped. For this we can use a Dynamic Input tool. The annoying thing about this tool is that you need to specify a template file that actually exists, so point that to any .csv file on your computer, or create a dummy .csv file for this purpose and stick it in the same folder as your workflow – you can then reference it with a relative path “.\MyDummyFile.csv”. In this example I’ve used the Data_In_Archive.csv that we just extracted from the .zip archive in the temp directory.
We need to tell the Dynamic Input tool what file we would like to read in. I’ve specified all .csv files “*.csv” in the temp directory. This would read in any .csv files that were extracted from the archive – in this case we know there’s only one, but there could well be more in other examples.
Make sure the Action field of the Dynamic Input tool is set to “Change Entire File Path”.
That should do it. It’s quite a few steps for such a simple task but you could easily macro it up if you’re going to use it again and again. Here’s an image of the full workflow, which you can download from the same Github repository as the example .zip file.