This article is about SQL Machine Learning which is one of the most interesting topics equally attractive to both beginners and professionals of different areas of expertise.
Most importantly this article intends to give the basic understanding that anyone with some background in databases or business intelligence requires to begin their SQL Machine Learning journey.
Additionally, easy-to-understand tips equipped with concepts that help the readers to know the connection between SQL and machine learning along with simple examples to implement at your own pace are part and parcel of this article.
About Machine Learning
Let us first define machine learning so that we know the very basics of what we are going to explore further ahead in this article.
What is Machine Learning
Machine learning is a field of Artificial Intelligence or Computer Science which makes use of (data) models that improve their accuracy with time through experience somewhat similar to humans but with a lot more limitations.
Simple definition
We can simply say that it is all about making machines (data models) learn independently (automatically with limitations) without any pre-programmed or hard-coded logic or instructions.
Who works in this field
Data scientists, Machine Learning Engineers, Data Analyst Programmers, and BI Professionals are capable of working in this area; however, it cannot only be limited to these professionals.
We can say any data professional with the right capability can make use of SQL Machine Learning or similar technologies to achieve their business or professional life objectives whether it be a company revenue generation goal or highly beneficial research in the field of medicine.
About SQL Machine Learning (ML)
As the name suggests SQL ML is a service that you install on your SQL Server instance to use it for the desired purposes.
Two modes of installation
The SQL Machine Learning service can be installed in the following two ways:
- You install the SQL Server instance, and you add the service straight away during the installation
- You have already installed a SQL instance and you add it later on as a feature
Three flavors (languages) of installation
There are three flavors for using SQL Machine Learning which means we can add any of the three languages to utilize this service:
- Python
- R
- Java
Three Benefits
Now we know that SQL Server (databases) is one of the best places for the data to reside but in the past Data, Scientists had to take the data out of SQL to work on it for machine learning purposes.
So, SQL ML provides three main benefits including addressing the above-mentioned concerns:
- In-Database processing (no need to take data out of SQL)
- Full data set processing (we can have full-length data set processing without worrying about space or storage)
- Built-in Security wrapper (SQL security remains intact during the ML processes as well)
Using and running Python Code (SQL Machine Learning)
There are many merits of using Python code in SQL as part of ML:
- The same query editor that we use for writing and running SQL scripts can be used for running Python code
- The Python coding is supported in both main Microsoft SQL database development and management tools like SQL Server Management Studio (SSMS) and Azure Data Studio.
- You have the freedom to store Python code as procedures
- The stored procedures that contain Python code can be reused by other developers easily
SQL Machine Learning (ML) Installation
Let us assume we are installing SQL Server for the first time, and we want to install SQL ML service at the same time so this means we have to install it as a feature.
Please remember if you have multiple instances installed then you can add ML service to any of the installed instances plus it is an optional service that you have to exclusively choose to install.
Please download the SQL Server 2019 version or any compatible version that supports ML by visiting the following page:
SQL Server Downloads | Microsoft
Run the setup and choose New SQL Server stand-alone installation:
After running some configuration checks then Select Python under Machine Learning Services and Language in the Feature Selection tab:
Accept the terms:
If the Next button is disabled then you have to get Python files from the links mentioned below:
Once the installation files are downloaded, please copy them into the Setup folder as follows:
You should be now able to proceed further with SQL Server installation along with SQL Machine Learning service:
Verifying SQL Server Machine Language Installation
Open SQL Server Management Studio (SSMS) or Azure Data Studio and connect to your locally installed SQL Server instance:
Enable External Scripts (sp_configure)
Now the first thing is to enable the external scripts and here in our case, it is Python scripts.
Please run the following script against the Master database of the instance:
1 2 3 4 5 |
-- Enable Python scripts EXEC sp_configure 'external scripts enabled',1 RECONFIGURE WITH OVERRIDE |
The following output is shown:
Check the status of the external script
Please run the following statement to check the status now:
1 2 3 4 |
-- Check sp_configure sp_configure |
The output is shown below:
Please note even after enabling the external scripts you still see that config_value or run_value is 0 for the external scripts enabled column then please consider restarting the SQL instance by clicking on the name of the instance in the Object Explorer and clicking Restart.
Test Python Code
Let us now test some Python code.
Please remember Python is strictly case sensitive (Print is not the same as print) and the code should be well indented too.
Print out
You can simply print out the article titled SQL Machine Learning in Simple Words in Python by using the print command as follows:
1 2 3 4 5 6 |
-- Print article title EXEC sp_execute_external_script @language=N'Python', @script=N'print("SQL Server Machine Learning in Simple Words")' |
The output is as follows:
Get System Version
You can view the system version by the following code:
1 2 3 4 5 6 7 |
-- Get System Version EXEC sp_execute_external_script @language=N'Python', @script=N'import sys print(sys.version)' |
The output is as follows:
Using Variables in Python
Unlike SQL you don’t need to declare a variable type in Python. A python variable can simply be defined and initialized.
Subtracting two numbers in Python
To subtract two numbers in Python we run the following script using two variables First and Second:
1 2 3 4 5 6 7 8 9 10 |
-- Subtracting two variables (numbers) EXEC sp_execute_external_script @language=N'Python', @script=N' First=100 Second=200 print(Second-First, "<=result (second - first)")' |
The results are as follows:
Loop in Python
We can also use looping constructs in Python (SQL Machine Learning language) such as While or For loop. A simple While loop can be executed as follows:
1 2 3 4 5 6 7 8 9 10 11 12 |
-- While looping in Python EXEC sp_execute_external_script @language=N'Python', @script=N' i=1 while i<6: print("Counter", i) i=i+1 ' |
The results are shown below:
SQL Machine Learning (ML) using Python
The three most important objectives of SQL Machine Learning using Python are as follows:
- We need to send/load data from SQL into Python (script)
- Process the loaded/imported data in Python
- Return the processed data to SQL (in a form understandable by SQL)
The Role of InputDataSet variable from Query Output (results) to Python Processing
In other words, we need to understand the following flow:
- SQL query output is changed into the Python script
- This is done by redirecting the SQL output to a special variable called InputDataSet
- Then Python Data Frame processes the data that is in InputDataSet
Now we simply need T-SQL to import SQL data into Python for machine learning purposes such as to be able to process it via Python since Python is a highly efficient language for data processing.
Another way to understand this is as follows:
- Import (data from SQL into Python)
- Manipulate (data in Python)
- Export (data from Python back to SQL)
This can be illustrated as follows:
Setting up the Sample database
Let us quickly build a sample database called StudentsExamDemo:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
-- USER master database USE master; ALTER DATABASE [StudentsExamDemo] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE [StudentsExamDemo] ; CREATE DATABASE StudentsExamDemo; GO USE StudentsExamDemo -- Create Student table DROP TABLE IF EXISTS dbo.Student CREATE TABLE [dbo].[Student] ( [StudentId] INT IDENTITY (1, 1) NOT NULL, [Name] VARCHAR (50) NOT NULL, [Score] INT, ); -- Populate student table SET IDENTITY_INSERT [dbo].[Student] ON INSERT INTO [dbo].[Student] ([StudentId], [Name], [Score]) VALUES (1, N'Atif', 80) INSERT INTO [dbo].[Student] ([StudentId], [Name], [Score]) VALUES (2, N'Asim', 70) INSERT INTO [dbo].[Student] ([StudentId], [Name], [Score]) VALUES (3, N'Sarah',75) INSERT INTO [dbo].[Student] ([StudentId], [Name], [Score]) VALUES (4, N'Mike',85) INSERT INTO [dbo].[Student] ([StudentId], [Name], [Score]) VALUES (5, N'Peter',65) SET IDENTITY_INSERT [dbo].[Student] OFF |
SET IDENTITY_INSERT [dbo].[Student] OFF
Load data from SQL into Python (Import data into Python using T-SQL) – InputData1
In the pursuit of SQL Machine Learning let us now perform the first step of loading this data into Python script which means we are going to convert this data into a Python Data Frame.
Now we have to think in terms of variables to do these tasks. For example, there is a variable that will be assigned to the SQL script to be loaded into the Python script for the second step which is manipulating data. We make use of the input_data_1 property to initialize with SQL script of interest to be then manipulated by Python (data frame).
Build and run the script as follows:
1 2 3 4 5 6 7 8 9 |
--Load data from SQL into Python (Import data into Python using T-SQL) – InputData1 EXEC sp_execute_external_script @language=N'Python', @script=N' ', @input_data_1=N'SELECT * FROM dbo.Student'; |
The results are as follows:
The importance of the Input_Data_1 property and InputDataSet variable
Now, input_data_1 is the property that is going to take the SQL script and convert it into a data frame for Python. When input_data_1 converts the SQL output (resultset) into a data frame it is called Pandas Data Frame and it can be accessed by a variable called InputDataSet.
So, InputDataSet is the variable that can display the contents of the result set ready to be processed (manipulated) using SQL Machine Learning service compatible language like Python.
We can see all this in action by running the following script:
1 2 3 4 5 6 7 8 |
--View the data frame by printing out the value of the InputDataSet variable (after we imported SQL into the Python data frame using the input_data_1 property) EXEC sp_execute_external_script @language=N'Python', @script=N'print(InputDataSet) ', @input_data_1=N'SELECT * FROM dbo.Student'; |
The output is as follows:
So, we can see that the variable InputDataSet shows the contents of the Python Data Frame which was first imported into Python with the help of the input_data_1 property.
An interesting point to note here is that in Python the index begins with 0.
Manipulating the Python Data Frame (Processing/Manipulation)
Now we have the data in the required form but the reason we did all that is just to be able to use the best technology (SQL Machine Learning service) to manipulate the data to explore it and to get value out of it.
To manipulate the data (which now exists as Python Data Frame) we make use of Python libraries also called Pandas Data Frame API reference which is a list of all the available functions that we can use to understand the data from SQL Machine Learning perspective.
Finding integer location iloc of the data (pandas.DataFrame.iloc)
Let us find the integer location of the data by using the iloc property of the Pandas Data Frame:
1 2 3 4 5 6 7 8 |
--Finding integer location iloc of the data (pandas.DataFrame.iloc) EXEC sp_execute_external_script @language=N'Python', @script=N'print(InputDataSet.iloc[[4]]) ', @input_data_1=N'SELECT * FROM dbo.Student'; |
The result set is as follows:
So, this result showed us the student at index location 4 in the data frame.
Similarly, we can also view the data by specifying a range of indexes. So, to view the data at an index ranging from 1 to 3 we can run the following script:
1 2 3 4 5 6 7 8 |
--view the data at an index ranging from 1 to 3 (pandas.DataFrame.iloc) EXEC sp_execute_external_script @language=N'Python', @script=N'print(InputDataSet.iloc[1:3]) ', @input_data_1=N'SELECT * FROM dbo.Student'; |
The output is as follows:
Please note that the range 1:3 means the index location 1 to 2 while index 3 is not included.
Using the head function to manipulate the data frame (pandas.DataFrame.head)
Similarly, we can use the head function to manipulate data by inputting the number of rows to be displayed.
For example, to only view the first 3 rows we will use the head function as follows:
1 2 3 4 5 6 7 8 |
--View only the first three rows using a head function (pandas.DataFrame.head) EXEC sp_execute_external_script @language=N'Python', @script=N'print(InputDataSet.head(3)) ', @input_data_1=N'SELECT * FROM dbo.Student'; |
The result set is as follows:
Please note that there is a rich library of functions for a comprehensive data manipulation although we just focus on the basic functions available in SQL Machine Learning using Python.
Export Data Frame (back to SQL)
We began with importing our SQL result set into Python followed by converting it into a Python data frame and applying some manipulations on the data but finally, we need to push the data back to SQL.
The main benefit of SQL Machine Learning using Python is that we can revert to SQL after doing the complex manipulations and this can be used for the following purposes:
- Reporting based on the returned data set in SQL
- Publishing the returned data set in SQL to a website
- Using the returned data set in SQL as a parameter to be passed into another function for further work (processing)
Export Manipulated data using OutputDataSet Variable
There is a handy variable OutputDataSet available that exports the data back to SQL and this is done by assigning manipulated data frame to it where the manipulated data frame is held by InputDataSet.
Example: Finding the shape of the student table
Let us take an example of the student table we have in our sample database. Now we want to know more about our table which means we want to know the student in at index 0 and 1.
Let us write a script that does the following:
- Import the table into Python Data Frame
- Get the shape of the table (number of rows and columns)
- Export the shape back into SQL
This is done as follows:
1 2 3 4 5 6 7 8 9 |
--Importing SQL into Python Data Frame, finding student rows at index 0 (first row) and 1 (second row), Exporting into SQL EXEC sp_execute_external_script @language=N'Python', @script=N'OutputDataSet=InputDataSet.iloc[0:2] ', @input_data_1=N'SELECT * FROM dbo.Student' WITH RESULT SETS (([StudentId] INT, [Name] VARCHAR(50), [Score] int)); |
The results can be seen below:
Conclusion
The most important part is to understand the business needs of your organization and how machine learning or simply this three-step process (import, manipulate and export to SQL) can help you to achieve those business goals and objectives. You can formalize this process based on what you learn from the day-to-day data needs of your organization.
Congratulations, you have successfully learned the basics of SQL Machine Learning and can also explore it further now.
- How to create and query the Python PostgreSQL database - August 15, 2024
- SQL Machine Learning in simple words - May 15, 2023
- MySQL Cluster in simple words - February 23, 2023