What is R
R is a very popular data programing language. R is especially used in data analysis, statistics calculations, predictions, data mining and machine learning. R is used by data scientist, economist, genetic scientists, and statisticians. R has very wide usage in real life. Healthcare, finance, marketing, and manufacturing are some of them.
In short, R is/has:
- open source
- flexible for statistics calculation
- huge chart ability
- suitable for big data analyses
- a large community
- the ability to integrate with many programs (Microsoft SQL Server , SAP HANA Oracle vs.)
The capabilities of the R program improve with additional packages. You can easily find or develop additional libraries for R.
What is a “Word cloud”
A “Word cloud” is a technique that allows us to focus on the most repetitive words in a text, article or any word sequence. Using this technique, we can analyze keywords in a word sequence.
To give a simple example, if we analyze product comments of an online shopping website, we can find most common words. These words can tell us what our customers think about this product. Now we will make our article word cloud. As shown in below, our article keywords are “SQL” and “server” if we combine the two words, our article is about probably about “SQL Server”.
Microsoft BI and SQL Server R Services
Microsoft is one of the key players in Business Intelligence market. Gartner positions Microsoft as a leader in Business Intelligence and analytics platforms. Microsoft Business Intelligence platform includes a lot of capable tools (Power BI, Power View, Reporting Service, Microsoft SQL Server Mobile Report, Analysis Services and Integration Services).
Specifically, Microsoft has developed some very strong features for SQL Server. The first come to mind include columnstore index, R script integration, Python integration and graph database. Azure Machine Learning is an impressive tool for predictive analytics. PolyBase allows us to process data outside of the database via T-SQL. With PolyBase we can access Hadoop data. When we look at all these features, the Microsoft BI stack without a doubt, very impressive.
In SQL Server 2016, Microsoft added support for the R language. This capability opens the door to the magical world or R script. With this feature, SQL Server users can now easily calculate using advanced statistical data and predictions.
Let’s give a real-life example now. You are working with a dataset and you are looking for the relationship between two numerical columns. One of the easiest ways to find this relationship is a linear regression (linear regression is a statistical method that summarizes the relationship between numerical data). Now let’s see how we can find this statistical relationship. We are passing dataset and columns to the “sp_execute_external_script” procedure. In R script we are using the linearMod function and then we are getting summary of linear. In the example below we are searching relations between UnitPrice and order quantity.
Now, in this article we will create word cloud in SQL Server Report Builder. If you want you can deploy this report to SQL Server Reporting Service.
Requirements
-
SQL Server 2016 or above (R Services must installed)
Set up SQL Server R Services (In-Database) -
SQL Server 2016 Report Builder
Download Microsoft SQL Server 2016 Report Builder
Getting started
We will perform these steps to show our word cloud on SQL Server Report Builder;
- Enable external scripts option
- Test R script SQL Server Services
- Install R script packages
- Create word cloud script on R with SQL Server Management Studio
- Connect R word cloud script to SQL Server Report Builder
Enable external scripts option
In SSMS you will need to check configuration for “external scripts enables”. If this option is “0” we have to change it to 1.
1 2 3 |
EXEC SP_CONFIGURE 'external scripts enabled'; |
Now we will enable R script run value.
1 2 3 4 5 6 7 |
EXEC SP_CONFIGURE 'external scripts enabled',1 GO RECONFIGURE GO EXEC SP_CONFIGURE 'external scripts enabled' |
We enabled external scripts. SQL Server 2016 only supports R script.
SQL Server 2017 supports R script and Python. We will test R script execution on SQL Server Management Studio.
1 2 3 4 5 6 |
EXEC sp_execute_external_script @language = N'R' ,@script = N'print("Hello SQLShack") print("R Script is working")' |
Now we will install word cloud packages on the SQL Server R Service;
-
Open the SQL Server R Services installation path
You can find it in this folder
“C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\R_SERVICES\bin” - Run console with administrator rights.
We will install word cloud and other packages. This process will take a little longer.
install.packages(“tm”) # for text mining
install.packages(“SnowballC”) # for text stemming
install.packages(“wordcloud”) # word-cloud generator
install.packages(“RColorBrewer”) # color palettes
In this step we will create data for our R script word cloud.
1 2 3 4 5 6 7 8 |
DROP TABLE IF EXISTS FruitList CREATE TABLE FruitList (FruitName VARCHAR(2000)) INSERT INTO FruitList VALUES ('Orange Lemon Melon Banana Pear Apple Peach Pear Apple Peach Orange Papaya Pear Apple Peach Banana') |
Now we will generate R script for the word cloud.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
EXECUTE sp_execute_external_script @language = N'R' , @script = N' library("tm") library("SnowballC") library("wordcloud") library("RColorBrewer") imageDir <- ''C:\\temp''; image_filename = tempfile(pattern = "plot_", tmpdir = imageDir, fileext = ".jpg") print(image_filename); jpeg(filename=image_filename, width=600, height = 800); text <- InputDataSet$FruitName docs <- Corpus(VectorSource(text)) inspect(docs) dtm <- TermDocumentMatrix(docs) m <- as.matrix(dtm) v <- sort(rowSums(m),decreasing=TRUE) d <- data.frame(word = names(v),freq=v) head(d, 10) set.seed(1234) aaa <- wordcloud(words = d$word, freq = d$freq , min.freq = 1 , max.words=200, random.order=FALSE, rot.per=0.35) dev.off(); OutputDataSet <- data.frame(data=readBin(file(image_filename, "rb"), what=raw(), n=1e6))' , @input_data_1 = N'select FruitName from FruitList' WITH RESULT SETS ((plot varbinary(max))); |
We will analyze R script code, piece by piece
This part of script will create a jpeg file.
1 2 3 4 |
imageDir <- ''C:\\temp''; image_filename = tempfile(pattern = "plot_", tmpdir = imageDir, fileext = ".jpg") print(image_filename); jpeg(filename=image_filename, width=600, height = 800); |
In this part, we will send our row data to the R script
1 |
text <- InputDataSet$FruitName |
This will create our word cloud if you run this script in R studio you can see word cloud. But you have to set some values to text variable.
1 2 3 4 5 6 7 8 9 10 11 |
docs <- Corpus(VectorSource(text)) inspect(docs) dtm <- TermDocumentMatrix(docs) m <- as.matrix(dtm) v <- sort(rowSums(m),decreasing=TRUE) d <- data.frame(word = names(v),freq=v) head(d, 10) set.seed(1234) aaa <- wordcloud(words = d$word, freq = d$freq , min.freq = 1 , max.words=200, random.order=FALSE, rot.per=0.35) |
And the finally. this part of code will generate our binary output and we will see below output.
1 2 |
dev.off(); OutputDataSet <- data.frame(data=readBin(file(image_filename, "rb"), what=raw(), n=1e6)) |
In this part of our article we will connect our word cloud to SQL Server 2016 Report Builder.
We will open report builder and create new report.
We will create a blank report.
Right click data source and add data source.
We will select Use a connection embedded in my report then click build button enter server name or IP and last step step select or enter a database name. Next test the connection.
We will add a data set for the executed query.
Now we will select the Use a dataset embedded in my report then select Data source and paste our query.
Now we will add an image to our report. Select the image and add the image to report.
Select the image source Database and select image data. Change the image size and select Orginal size
Finally, we will run the report and view our word cloud!
Conclusion
SQL Server R script Services is a powerful tool. It provides a gateway to the R script world to SQL Server and Microsoft BI tools. In this article we looked R script SQL Server Integration, created R script on SQL Server Manegment Studio and we generated a word cloud report.
- SQL Performance Tuning tips for newbies - April 15, 2024
- SQL Unit Testing reference guide for beginners - August 11, 2023
- SQL Cheat Sheet for Newbies - February 21, 2023