Leveraging R to perform Sentiment Analysis in Alteryx
Before we start: my colleague Paul Houghton has already produced a great blog on performing sentiment analysis via the MS Cognitive Services and via a dictionary. This blog is slightly different, in that I will look at how we can leverage the ‘code friendly’ aspect of the Alteryx Platform, through the R tool, to perform sentiment analysis.
A couple of my colleagues, Nils Macher and Soha Elghany, looked at and developed this concept whilst working on a client engagement. I have since ‘ripped it off’ and created an Analytical Application, hosted in the Alteryx Public Gallery, that allows users to score their own files for sentiment (note: the application is pending approval, as it uses the R tool to perform the analysis, as mentioned several times already :)).
This post by Gwilym Lockwood (another colleague) also gives a great example of applying sentiment analysis, in this case to the messages he shares with his partner!
So tell me more about the syuzhet package…
To perform our sentiment analysis in Alteryx using the R tool, we need a library; in this case, we will be using the ‘syuzhet’ library.
This library is not installed by default with Alteryx’s R installer, so step 1 is to install the package, which you can do by following the guidelines in this post.
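Once you have followed the install steps, a quick check can confirm that the R session can actually see the package. This is only a sketch using base R functions; it installs nothing, and the messages are my own wording.

```r
# Check whether syuzhet is visible to this R session; nothing is installed here.
has_syuzhet <- requireNamespace("syuzhet", quietly = TRUE)

if (has_syuzhet) {
  message("syuzhet is installed, version ",
          as.character(utils::packageVersion("syuzhet")))
} else {
  # List the library folders R searched, which helps spot an install in the wrong place
  message("syuzhet was not found in: ", paste(.libPaths(), collapse = "; "))
}
```

If the package is reported as missing, the library paths printed are the places R looked, which is usually the first thing to compare against where the package was installed.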
The syuzhet library offers an array of different methodologies for performing sentiment analysis, each of which makes use of a different ‘dictionary’ to return the sentiment score.
This resource gives a lot more detail than I can ever give, with much greater wisdom on the subject area too!
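To make the ‘dictionary’ idea a little more concrete, here is a toy sketch in base R. The word list, the scores and the `score_text` helper are all invented for illustration; they are not one of syuzhet’s dictionaries, and the real package does considerably more than this.

```r
# Toy lexicon: invented words and scores, NOT one of syuzhet's dictionaries.
toy_lexicon <- c(great = 1, love = 1, happy = 1,
                 terrible = -1, hate = -1, slow = -1)

# Hypothetical helper: sum the lexicon scores of the words found in one text string
score_text <- function(text, lexicon) {
  # Lower-case the text and split on anything that is not a letter or apostrophe
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  # Words missing from the lexicon come back as NA and are ignored in the sum
  sum(lexicon[words], na.rm = TRUE)
}

reviews <- c("I love this product, it is great",
             "Terrible service and slow delivery",
             "The parcel arrived on Tuesday")

toy_scores <- vapply(reviews, score_text, numeric(1),
                     lexicon = toy_lexicon, USE.NAMES = FALSE)
print(toy_scores)
```

The first review scores positive, the second negative, and the third scores zero because none of its words appear in the toy lexicon. Swapping the lexicon changes the scores, which is essentially why the different methodologies disagree with each other.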
So how can I leverage this package in Alteryx?
Well, it’s simple(ish) really; first, we must connect our data table to the R tool; we must then read our data into R by using the ‘read.Alteryx’ function.
At the beginning of our code, we should also load the syuzhet library…
library(syuzhet)
##load the syuzhet package
tab <- read.Alteryx("#1", mode="data.frame")
##read our single column table into R as a dataframe
Now, in order to use this package, the field we wish to score must be a character vector.
tab$text <- as.character(tab$text)
##convert our text field to be a character vector
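Depending on your R version and how the data arrives, a text field can come through as a factor rather than as character. The sketch below uses a made-up data frame in place of the one read from Alteryx to show what the conversion does; the final check is an optional guard I have added, not part of the original workflow.

```r
# Stand-in for the data frame read from Alteryx; the rows are made up.
tab <- data.frame(text = c("I love it", "I hate it"), stringsAsFactors = TRUE)

print(class(tab$text))              # the column starts life as a factor
tab$text <- as.character(tab$text)
print(class(tab$text))              # and is now a character vector

# Optional guard: stop early if the column is still not character
stopifnot(is.character(tab$text))
```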
The next step is to score our text column for sentiment, which can be done by using the get_sentiment() function; in this example, I have scored sentiment using each of the four dictionary-based methodologies available in this library.
tab$afinn <- get_sentiment(tab$text, method="afinn")
tab$bing <- get_sentiment(tab$text, method="bing")
tab$nrc <- get_sentiment(tab$text, method="nrc")
tab$syuzhet <- get_sentiment(tab$text, method="syuzhet")
##score our text field for sentiment using the 4 different methodologies
Finally, we can output our dataframe, titled tab, to the ‘1’ output anchor…
write.Alteryx(tab, 1)
##write our dataframe out of the R tool so it can be transformed further using standard Alteryx tools
This initial piece of code will provide us with an output along the lines of that shown in the image below.
You’ll see how each of the different methodologies has a different scale and returns different results, based on its dictionary’s interpretation of the sentence.
I would advise that you select the methodology in advance of writing your code, as otherwise you may select the methodology that best represents your expectations; a level of selection bias if you will.
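Because the four scores sit on different scales, the raw values are not directly comparable. One way to see where the methodologies agree, without favouring any of them, is to compare only the direction of each score. The numbers below are made up and simply stand in for the four score columns.

```r
# Made-up scores standing in for the four columns produced by get_sentiment()
scores <- data.frame(
  afinn   = c(3, -5, 0),
  bing    = c(1, -2, 0),
  nrc     = c(1, -1, 1),
  syuzhet = c(0.75, -1.6, 0)
)

# Reduce every score to its direction: 1 positive, -1 negative, 0 neutral
direction <- sign(as.matrix(scores))

# TRUE where all four methods point the same way for that row
scores$all_agree <- apply(direction, 1, function(row) length(unique(row)) == 1)
print(scores)
```

Rows where the methods disagree are worth reading by hand before you trust any single score for them.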
With the nrc methodology it is possible to supplement your text streams further, and view exactly what emotions were identified in the text string; we can then write this out into our workflow as a separate output.
emotions <- get_nrc_sentiment(tab$text)
##get the individual emotion scores for each text string for the nrc methodology
write.Alteryx(emotions, 2)
##write out the emotion scores to the 2nd output anchor
Once we have generated this 2nd table, we can then merge the two data streams outside of the R tool using the standard join tool (if you are proficient in R, of course you could do this inside, but I prefer to do any tasks that are possible with the standard Alteryx tools, with the standard Alteryx tools).
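For anyone who would rather combine the two tables inside R, a minimal sketch follows. It assumes the emotions table has one row per input text, in the same order as the scored table, so the two can be bound column-wise. Both data frames here are made-up stand-ins, and only a few of the emotion columns are shown.

```r
# Made-up stand-ins: 'tab' as scored earlier, 'emotions' shaped like the nrc emotions table
tab <- data.frame(text = c("I love it", "I hate it"),
                  nrc = c(1, -1),
                  stringsAsFactors = FALSE)
emotions <- data.frame(anger = c(0, 1), joy = c(1, 0),
                       negative = c(0, 1), positive = c(1, 0))

# Assumption: both tables hold one row per text, in the same order
stopifnot(nrow(tab) == nrow(emotions))

# Column-bind so each text sits alongside its emotion scores
combined <- cbind(tab, emotions)
print(combined)
```

The combined table could then be written to a single output anchor instead of two.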
The output of this additional emotions table is shown below…
You can see how the second sentence contains only negative emotions, such as anger, disgust, fear and sadness; this allows you to take your analysis a step deeper and identify possible micro-trends in people’s interactions with your business.
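Picking out rows like that second sentence can be done programmatically. The sketch below flags rows that contain only negative emotions; the emotion counts are made up, and which emotions count as ‘negative’ or ‘positive’ is my own assumption for illustration.

```r
# Made-up emotion counts shaped like a subset of the nrc emotions table
emotions <- data.frame(
  anger = c(0, 2, 0), disgust = c(0, 1, 0),
  fear  = c(0, 1, 1), sadness = c(0, 1, 0),
  joy   = c(2, 0, 1), trust   = c(1, 0, 0)
)

# Assumed grouping of emotions, for illustration only
negative_cols <- c("anger", "disgust", "fear", "sadness")
positive_cols <- c("joy", "trust")

# A row is "negative only" when it has some negative emotion and no positive one
negative_only <- rowSums(emotions[negative_cols]) > 0 &
  rowSums(emotions[positive_cols]) == 0
print(which(negative_only))
```

In this made-up data only the second row is flagged; the third row carries both fear and joy, so it is left out.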
And that is how you can use R and Alteryx to perform sentiment analysis on your data; simple, right?
If you want the workflow shown in this example, then just download the app, and start taking it apart and embedding it in your own workflows and macros!