Cache data at a specific step in Alteryx 2018.3
This is probably one of the most requested features in the Alteryx community: “I want to save or cache the data at a certain point of my workflow so I don’t have to run the whole thing again when I fix a small thing.” It is especially useful when some of your processes need more run time, like complicated joins, clustering, data cleansing, spatial match, etc.
The good news is that you can now do this as of Alteryx version 2018.3!
How does it work?
Very simple. Just right-click the tool where you want to cache the data and select the option “Cache and Run Workflow”.
A few things will happen then. As you can imagine, the workflow will be executed, so be careful if you have outputs and don’t want to overwrite any data. Once the workflow finishes, you will see something new: the tools before the cache point appear inside a grey “bubble”, and the tool you selected to cache the data appears inside a blue bubble.
If you now continue building your workflow and run it again after adding one or more steps, you will notice it takes less time to run, because Alteryx no longer executes the entire workflow as it used to. Since the data is cached up to the Data Cleansing tool (in our example), the workflow starts running from that point.
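If it helps to picture what is going on, here is a toy model in Python. It is only a sketch of the idea and not how Alteryx implements caching internally: the Pipeline class, the step names and the sample rows are all made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A step is a (name, function) pair; the function maps rows to rows.
Step = Tuple[str, Callable[[list], list]]


class Pipeline:
    """Hypothetical mini workflow that can cache the output of one step."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = list(steps)
        self.cached_index: Optional[int] = None  # position of the cached step
        self.cached_data: Optional[list] = None  # output stored at that step
        self.executed: List[str] = []            # steps run in the latest run

    def run(self, data: list, cache_at: Optional[str] = None) -> list:
        self.executed = []
        start = 0
        if self.cached_index is not None:
            # Skip everything up to and including the cached step.
            data = list(self.cached_data)
            start = self.cached_index + 1
        for i in range(start, len(self.steps)):
            name, fn = self.steps[i]
            data = fn(data)
            self.executed.append(name)
            if name == cache_at:
                # Remember the output of this step for later runs.
                self.cached_index = i
                self.cached_data = list(data)
        return data


steps = [
    ("input", lambda rows: rows + [" a ", "b ", None]),
    ("data_cleansing", lambda rows: [r.strip() for r in rows if r is not None]),
    ("upper", lambda rows: [r.upper() for r in rows]),
]

pipeline = Pipeline(steps)

# First run: everything executes and the cleansing output is cached.
first = pipeline.run([], cache_at="data_cleansing")
first_executed = list(pipeline.executed)

# Second run: only the step after the cached one executes.
second = pipeline.run([])
second_executed = list(pipeline.executed)

print(first_executed)
print(second_executed)
```

The second run gives the same result as the first, but only the step after the cached one does any work, which is where the time saving comes from.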
Things to keep in mind when caching
First, if you change the configuration of the cached tool or of any tool before it, the cache will be cleared and you will have to cache again if you still need it. Alteryx makes sure you are aware of this, so that you don’t change a configuration by mistake, by showing a note when you click on any of those tools.
You can click “Continue”, as shown in the image above, and see the configuration of the tool. As mentioned, any change will clear the cache. But remember that just viewing a tool’s configuration doesn’t clear the cache, so don’t be afraid of clicking the “Continue” button.
Secondly, and probably more importantly, this only works while your workflow is open. Caches are cleared when a workflow is closed, so don’t expect your data to still be cached at that point if you close the workflow and open it later or the next day. That will not work, and you will need to cache it again. Still, it is an amazing feature to have during development when you work with big data sets or compute-intensive tools like joins, spatial match, data cleansing, sorting, etc.
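The two caveats above can be summed up in another small Python sketch. Again, this is a hypothetical model and not Alteryx’s own code: the CachedWorkflow class and its method names are invented here, and it assumes that changing a tool after the cached one leaves the cache alone.

```python
from typing import List, Optional


class CachedWorkflow:
    """Hypothetical model of when a cache survives and when it is cleared."""

    def __init__(self, configs: List[dict]) -> None:
        self.configs = [dict(c) for c in configs]  # one config per tool
        self.cached_at: Optional[int] = None       # index of the cached tool

    def cache_and_run(self, index: int) -> None:
        # Stands in for "Cache and Run Workflow" on the tool at this index.
        self.cached_at = index

    def view_config(self, index: int) -> dict:
        # Looking at a configuration never touches the cache.
        return dict(self.configs[index])

    def change_config(self, index: int, **changes) -> None:
        self.configs[index].update(changes)
        # A change to the cached tool or any tool before it clears the cache.
        # Assumption: tools after the cached one do not affect it.
        if self.cached_at is not None and index <= self.cached_at:
            self.cached_at = None

    def close(self) -> None:
        # The cache only lives while the workflow is open.
        self.cached_at = None


wf = CachedWorkflow([{"file": "sales.csv"}, {"trim": True}, {"sort": "asc"}])

wf.cache_and_run(1)
wf.view_config(0)
after_view = wf.cached_at            # still cached

wf.change_config(2, sort="desc")
after_downstream = wf.cached_at      # still cached (assumed behaviour)

wf.change_config(0, file="sales_2018.csv")
after_upstream = wf.cached_at        # cleared

wf.cache_and_run(1)
wf.close()
after_close = wf.cached_at           # cleared again
```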