Security has been one of the prime concerns of database developers since the inception of database management systems. Various data protection schemes have been introduced to provide secure access to sensitive data.
One such security feature introduced in SQL Server 2016 is called dynamic data masking. Dynamic data masking (as distinct from static data masking) is used to hide data from the user on the client-side.
It is important to mention that data masking is not the same as data encryption and should not be used as a primary security layer. It is only used to mask sensitive information such as social security numbers, email addresses, phone numbers, credit card numbers, etc. Following are the examples of dynamic data masking:
- Phone number -> xxxx-xxx-xx-5458
- Email -> xxxx@xxxx.com
The data is actually not masked physically in the database. Rather, the data is masked in the query result. By default, all the users will see the masked data in the output. The unmasked data is visible in the actual database.
For an introduction to Dynamic Data Masking in Azure see Implementing Dynamic Data Masking in Azure SQL database article, and for an overview of Dynamic Data Masking see Using Dynamic Data Masking in SQL Server 2016 to protect sensitive data article.
Data masking types
Default
The default mask, masks complete values in the specified column. To specify a mask for a particular column, you have to use the “MASKED WITH” clause. Inside the MASKED WITH clause, you have to specify the FUNCTION that you want to use for masking. If you want to perform default masking, you use the “default()” function.
Let’s take a look at a simple example of default masking. Let’s create a database named Masks. Execute the following script:
1 |
CREATE DATABASE Masks |
Inside the Masks database, we will create a table DefaultMask with four columns: ID, Name, BirthDate, and Social_Security. The Name, BirthDate and Social_Security columns will have default dynamic data masking. Execute the following script:
1 2 3 4 5 6 7 8 9 10 11 12 |
USE Masks DROP TABLE IF EXISTS DefaultMask; CREATE TABLE DefaultMask ( ID INT IDENTITY (1,1) PRIMARY KEY NOT NULL ,Name VARCHAR(255) MASKED WITH (FUNCTION = 'default()') NULL ,BirthDate DATE MASKED WITH (FUNCTION = 'default()') NOT NULL ,Social_Security BIGINT MASKED WITH (FUNCTION = 'default()') NOT NULL ); GO |
Let’s now insert some dummy data in the DefaultMask table that we just created:
1 2 3 4 5 6 7 8 |
INSERT INTO DefaultMask ( Name, BirthDate, Social_Security ) VALUES ('James Jones', '1998-06-01', 784562145987), ( 'Pat Rice', '1982-08-12', 478925416938), ('George Eliot', '1990-05-07', 794613976431); |
Execute the following query to SELECT all the records from the DefaultMask table:
1 |
SELECT * FROM DefaultMask |
The output looks like this:
In the output, you can see the unmasked values. This is because we returned the record as a database user that has full access rights. Let’s create a new user that can only access the DefaultMask table and then select the records using our new user. Execute the following script:
1 2 3 4 5 6 7 8 9 |
DROP USER IF EXISTS DefaultMaskTestUser; CREATE USER DefaultMaskTestUser WITHOUT LOGIN; GRANT SELECT ON DefaultMask TO DefaultMaskTestUser; EXECUTE AS USER = 'DefaultMaskTestUser'; SELECT * FROM DefaultMask; REVERT; |
In the output, you can see that all of the data has been masked:
Partial
The default mask hides everything in the column it is applied to. What if we want to partially display the information in the column while leaving some part of it hidden?
This is where partial masks come in handy. To use a partial mask, you have to pass “partial(start characters, mask, end characters” as the value for the function parameter of the MASKED WITH clause. It is important to mention that the partial mask is only applicable to string type columns.
Let’s take a look at a simple example of a partial mask. Let’s create a table PartialMask in the Masks database that we created earlier. Execute the following script:
1 2 3 4 5 6 7 8 9 10 11 |
USE Masks DROP TABLE IF EXISTS PartialMask; CREATE TABLE PartialMask ( ID INT IDENTITY (1,1) PRIMARY KEY NOT NULL ,Name VARCHAR(255) MASKED WITH (FUNCTION = 'partial(2, "XXXX",2)') NULL ,Comment NVARCHAR(255) MASKED WITH (FUNCTION = 'partial(5, "XXXX", 5)') NOT NULL ); GO |
The PartialMask table has three columns ID, Name and Comment. The Name and the Comment columns have been partially masked. For the Name column, we used the “partial(2, XXXX, 2)” mask. This mask will display the first two characters and the last two characters of the string value in the Name column. The remaining characters will be replaced by XXXX. Similarly, for the Comment column. the first 5 and last 5 characters will be displayed while the remaining characters will be masked by XXXX.
Let’s insert some dummy data in our PartialMask table:
1 2 3 4 5 6 7 8 |
INSERT INTO PartialMask ( Name, Comment ) VALUES ('James Jones', 'The tea was fantastic'), ( 'Pat Rice', 'I like these mangoes' ), ('George Eliot', 'I do not really like this'); |
Let’s select all the records from the PartialMask table using the database user:
1 |
SELECT * FROM PartialMask |
The output looks like this:
Again, let’s create a new user and grant him/her access to the PartialMask table to see what results he/she gets. Execute the following script:
1 2 3 4 5 6 7 |
DROP USER IF EXISTS PartialMaskTestUser; CREATE USER PartialMaskTestUser WITHOUT LOGIN; GRANT SELECT ON PartialMask TO PartialMaskTestUser; EXECUTE AS USER = 'PartialMaskTestUser'; SELECT * FROM PartialMask |
The output of the script looks like this:
As you can see from the output, for Name column, we have the first 2 and last 2 characters displayed and for the Comment column, we have the first 5 and last 5 characters displayed while the rest of the data is hidden.
The email mask is used to dynamically mask data which is in the email format. The function used is “email()”. Let’s create a new table with a column called Email and mask it using an email mask. Execute the following script:
1 2 3 4 5 6 7 8 9 10 11 |
USE Masks DROP TABLE IF EXISTS EmailMask; CREATE TABLE EmailMask ( ID INT IDENTITY (1,1) PRIMARY KEY NOT NULL ,Email VARCHAR(255) MASKED WITH (FUNCTION = 'email()') NULL ); GO |
The following script inserts some dummy records into the EmailMask table:
1 2 3 4 5 6 7 8 |
INSERT INTO EmailMask ( Email ) VALUES ('nickijames@yahoo.com'), ( 'loremipsum@gmail.com' ), ('geowani@hotmail.com'); |
Let’s first retrieve the records using the database user:
1 |
SELECT * FROM EmailMask |
The output looks like this:
Let’s create a new user and then access the data from the EmailMask table to see masked data. Execute the following script:
1 2 3 4 5 6 7 8 9 |
DROP USER IF EXISTS EmailMaskTestUser; CREATE USER EmailMaskTestUser WITHOUT LOGIN; GRANT SELECT ON EmailMask TO EmailMaskTestUser; EXECUTE AS USER = 'EmailMaskTestUser'; SELECT * FROM EmailMask REVERT; |
The output looks like this:
Random
The Random mask is used to mask the integer columns with random values. The range for random values is specified by the random function. Look at the following example. Execute the following script:
1 2 3 4 5 6 7 8 9 10 11 12 |
USE Masks DROP TABLE IF EXISTS RandomMask; CREATE TABLE RandomMask ( ID INT IDENTITY (1,1) PRIMARY KEY NOT NULL ,SSN BIGINT MASKED WITH (FUNCTION = 'random(1,99)') NOT NULL ,Age INT MASKED WITH (FUNCTION = 'random(1,9)') NOT NULL ); GO |
In the script above, we masked the SSN and Age columns. The values in the SSN column will be replaced by a value between 1 and 99 while the values in the Age column will be replaced by a value between 1 and 9.
Let’s insert some dummy data in the RandomMask table that we just created:
1 2 3 4 5 6 7 8 |
INSERT INTO RandomMask ( SSN, Age ) VALUES (478512369874, 56), (697412365824, 78), (896574123589, 28); |
Finally, we will create a new user that will access the data from the RandomMask table:
1 2 3 4 5 6 7 8 9 |
DROP USER IF EXISTS RandomMaskTestUser; CREATE USER RandomMaskTestUser WITHOUT LOGIN; GRANT SELECT ON RandomMask TO RandomMaskTestUser; EXECUTE AS USER = 'RandomMaskTestUser'; SELECT * FROM RandomMask REVERT; |
The output looks like this:
From the output, you can see that the values for the SSN and Age columns have been masked using our random mask.
Conclusion
In this article, we saw how to perform dynamic data masking with the help of several examples. Dynamic Data Masking feature was introduced in SQL Server 2016 to enhance data security on the client-side. There are four major types of masks available in SQL Server: Default, Partial, Random, and Email and we went through all of them.
Suggested Links:
- Working with the SQL MIN function in SQL Server - May 12, 2022
- SQL percentage calculation examples in SQL Server - January 19, 2022
- Working with Power BI report themes - February 25, 2021