Integrating R in Alteryx
The integration of R in Alteryx makes predictive analysis more accessible and easier to use. A key thing to remember when creating a predictive model is that nearly every modelling paradigm in R has a predict function.
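As a rough illustration of the point about predict, the sketch below fits two different model types in base R and scores new rows with the same predict() call. The built-in mtcars data, the chosen columns, and the new_cars rows are stand-ins picked for illustration only; they are not part of any Alteryx workflow.

```r
# Two different model types fitted on R's built-in mtcars data
linear_fit   <- lm(mpg ~ wt + hp, data = mtcars)                 # linear regression
logistic_fit <- glm(am ~ wt, data = mtcars, family = binomial)   # logistic regression

# Hypothetical new rows to score
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))

# The same generic function works for both model types
linear_pred   <- predict(linear_fit, newdata = new_cars)
logistic_pred <- predict(logistic_fit, newdata = new_cars, type = "response")

print(linear_pred)
print(logistic_pred)
```

The only difference between the two calls is the type argument, which asks the logistic model for probabilities rather than values on the link scale.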
The image below is what the R tool in Alteryx looks like:
The R tool is a code editor for users. The R tool accepts multiple inputs and labels them in order of connection as #1, #2, and so on. The tool outputs up to 5 data streams from its anchors, labeled 1 through 5.
In code, “#1” refers to an input connection label, while output anchors are referred to by number, so 5 refers to the fifth output anchor.
To start simply: how do I read my data using the R tool?
The most important thing to learn with the R tool is how to read data in and write it out. This is done differently from how you would write R in RStudio: in the R tool, data is read with the read.Alteryx() function and written out with the write.Alteryx() function. For example, to read the first input as a data frame:
df <- read.Alteryx("#1", mode="data.frame")
To see your data in one of the tool’s outputs, you have to remember to call write.Alteryx(), for example write.Alteryx(df, 1) to send df to output anchor 1.
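The read-then-write pattern can be sketched as below. read.Alteryx() and write.Alteryx() only exist inside the R tool, so this sketch defines simplified stand-ins when they are missing, purely so it can be tried in a plain R session. The stand-ins, the sample rows, and the NameLength column are all assumptions made for illustration.

```r
# Stand-ins for use OUTSIDE Alteryx only; inside the R tool the real
# functions already exist and these definitions are skipped.
if (!exists("read.Alteryx")) {
  read.Alteryx <- function(name, mode = "data.frame") {
    # Hypothetical sample rows standing in for input "#1"
    data.frame(Name = c("Oliver", "Amelia"),
               Gender = c("Boy", "Girl"),
               stringsAsFactors = FALSE)
  }
}
if (!exists("write.Alteryx")) {
  write.Alteryx <- function(data, nOutput = 1) {
    message("Would send ", nrow(data), " rows to output anchor ", nOutput)
    invisible(data)
  }
}

# Read input "#1" as a data frame
df <- read.Alteryx("#1", mode = "data.frame")

# Do some work in R (here: add a column with the length of each name)
df$NameLength <- nchar(as.character(df$Name))

# Send the result to output anchor 1
write.Alteryx(df, 1)
```

Without the final write.Alteryx() call, nothing would reach the output anchors.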
Consider what happens when we feed input data into the R Tool and don’t attach any outputs. As an example, let’s use the function names(), which lists all the column headers of a data frame; this tells us which name to use to reference each column of data individually.
Why is reading and outputting data in R different in Alteryx?
R is being used within the Alteryx software, so for Alteryx and R to work together, data needs to be passed between them. This happens through a package Alteryx developed called AlteryxRDataX. This is an advantage for Alteryx users because they can draw on the work of the whole R community: installing R packages extends what Alteryx can do. Many of the package’s functions can be auto-populated from a drop-down menu in the R Tool configuration.
Dan Putler has built a great tool that lets you install R packages for use within the R tool: https://bit.ly/2HkW3kJ
Using a dataset which is just names of boys and girls as Input “#1”, I read my data and I want to find out what my column headers are:
My.Dataset <- read.Alteryx("#1", mode="data.frame")
names(My.Dataset)
Because I haven’t called write.Alteryx(), my R Tool doesn’t produce anything on any of its 5 outputs. Instead, what would normally be printed in the R console is listed in the “Messages” section of the Configuration panel.
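To try the column-header check outside Alteryx, a small stand-in data frame can take the place of input “#1”. The column names and rows below are invented for illustration; the real boys-and-girls dataset may look different.

```r
# Hypothetical stand-in for the data read from input "#1"
My.Dataset <- data.frame(Name = c("Oliver", "Amelia", "Noah"),
                         Gender = c("Boy", "Girl", "Boy"),
                         stringsAsFactors = FALSE)

# List the column headers
headers <- names(My.Dataset)
print(headers)

# Use a header to pull out a single column, by name or with $
first_column  <- My.Dataset[[headers[1]]]
gender_column <- My.Dataset$Gender
```

Once the headers are known, each column can be referenced individually in later code.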
The messages also state which version of R the R tool is running. This is important because some packages can’t be installed on an older or newer version of R, which means you can’t load them with library().
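The R version and the availability of a package can also be checked from code. This is a minimal sketch using base R only; the 3.5.0 threshold is an arbitrary example, and the stats package is used simply because it ships with R.

```r
# Version string of the running R session
r_version <- R.version.string
print(r_version)

# Compare against a minimum version (3.5.0 is an arbitrary example threshold)
meets_minimum <- getRversion() >= "3.5.0"
print(meets_minimum)

# Check that a package is installed before trying to load it
has_stats <- requireNamespace("stats", quietly = TRUE)
if (has_stats) {
  library(stats)
}
```

Checking with requireNamespace() first avoids the error that library() raises when a package is missing.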
In Alteryx you will be used to the Select Tool, which not only selects the fields you want but also defines the data type of each field. This works differently in R.
In general, there are two data types that you will want to use within the R Tool:
- Dimensions, which R calls factors
- Measures, which R calls numerics.
A key thing to be aware of is that when reading string data from Alteryx, R can sometimes misclassify these data as character type rather than factor. For this reason, you will want to check each column’s data type with this simple line of code:
sapply(My.Dataset, class)
sapply applies the function class to every column of the specified dataset. If you do want to convert a data type to something different, it is recommended to do that beforehand in Alteryx instead of in the R tool, mainly because it’s easier. If R still assigns the wrong data type, you can use as.factor to force the conversion within your code in the R tool.
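The type check and the as.factor conversion can be tried on a stand-in data frame in plain R. The columns and values here are invented for illustration, and stringsAsFactors = FALSE is set deliberately so the string columns start out as character.

```r
# Hypothetical stand-in for data read from Alteryx
My.Dataset <- data.frame(Name = c("Oliver", "Amelia", "Noah"),
                         Gender = c("Boy", "Girl", "Boy"),
                         Count = c(120, 95, 80),
                         stringsAsFactors = FALSE)

# Check the class of every column
types_before <- sapply(My.Dataset, class)
print(types_before)

# Gender is a dimension, so force it to a factor
My.Dataset$Gender <- as.factor(My.Dataset$Gender)

# Check again to confirm the conversion
types_after <- sapply(My.Dataset, class)
print(types_after)
```

Count stays numeric, which matches the measure type, while Gender changes from character to factor.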
It’s also key to note that all of the predictive tools in Alteryx are built using the R tool. This makes predictive analysis accessible without needing to know how to code, but it also makes it accessible to people who don’t understand the maths that underlie it, which is why it’s crucial to first understand how predictive analysis works before making decisions based on the output.