Database migration is a vital task in any environment, complex or otherwise. Seamless migrations are the goal but the efforts required to ensure it are tremendous.
Backing up and restoring the database is surely a preferred and a robust approach, but does it work well in all situations? How do we plan this when the source and the destination databases need different configurations? How do we make such a migration seamless?
I recently worked on a database migration project, which needed a collation setting of case sensitivity that is “SQL_Latin1_General_CP1_CS_AS”, but the source setting of the database was “SQL_Latin1_General_CP1_CI_AS”.
Anyone would feel that it is fairly straightforward to restore the database and change the collation of the database to the said one. But when I dug into the database, I found there were various references being made on the columns, and that clustered index was created on many columns. Just like you cannot change the collation on a column that is being referenced by a foreign key, you cannot change the collation setting on a column that is set as the clustered index, which is the primary key column of the table.
A global SQL Server Collation Sequence update is the answer to the puzzle, but it is a multi-step operation. The first step is to identify the case sensitivity of the collation assigned to the repository database. If you find that the repository database is not case sensitive, you must create a new empty (blank) case-sensitive database to migrate the data to. You must then import the data from the old case-insensitive repository database to the new empty case-sensitive database, and then establish the new case-sensitive database as the repository database.
Background
The collation setting provides sorting rules, case sensitivity, and accent sensitivity properties of character data types such as char and varchar. They apply not only to data stored in tables, but also to all text used in SQL Server, including variable names, metadata, and temporary tables. If the collation is case sensitive, then the uppercase letters are treated differently than the lowercase letters. Objects can also be set to accent sensitive in which letters with accent characters will sort in a specific order.
Collation is by default applied to the entire SQL Server instance. Also, when we install SQL Server, it picks the collation setting from the Windows system language. If the systems language setting is set to English (US), then the collation for SQL Server would be “SQL_Latin1_General_CP1_CI_AS” by default.
Collation Names
SQL Server maintains a list of pre-defined collations that conform to the following pattern
SQL_<SortRules><[_Pref]>_<CPCodepage>_<CaseSensitivity>_<AccentSensitivity>
Each field within the collation name is separated by an underscore character (_). Values in these fields display characteristics of the collation and its sequence.
The SortRules and CPCodepage fields identify the name of the language/locale or alphabet that the collation was designed to support and the character numbering rule used when sorting terms. The CaseSensitivity and AccentSensitivity fields identify additional sorting rules to use when two letters within that language/locale or alphabet do not share the same case or accent.
To identify the case sensitivity of a selected collation, review the fifth term (CaseSensitivity) in the collation name. This term can contain one of the following values:
- CI : Case Insensitive
- CS : Case Sensitive
To create a case-sensitive repository database, select the collation that supports the language/locale or alphabet used by your repository database and include the value CS in the fifth term of the collation name.
How to identify collation setting and its various levels
Oftentimes, we’re stuck at a point where we’re unable to find what collation setting is applied to a certain object. Let us look at some of the useful queries that will help you find the collation and its levels.
First of all, you need to know that collation can be set four levels
Server-level collations
- Set during the installation of SQL Server
- Default collation for all the databases, both system and user-defined databases
If you’re uncertain what collation has been assigned to a certain SQL Server instance, use the SERVERPROPERTY system function to find that out:
1 2 3 |
SELECT CONVERT (varchar, SERVERPROPERTY('collation')) SQLServerCollation; |
Review the fifth term of the collation name returned from the query. If the fifth term is CI, the database collation is case-insensitive
Database-level collations
- It can be set during the creation of database using the collate clause of create or alter database SQL statement
- Collation property of a newly-created database would be inherited from the instance, if we don’t explicitly mention the collation setting.
1 2 3 4 5 |
SELECT name, collation_name FROM sys.databases; --OR SELECT CONVERT (varchar, DATABASEPROPERTYEX('SQLShack','collation')); |
In this case the fifth term is CS, which means that the database collation is case-sensitive.
Column-level collation
- It can be set during the time of the creation of the table or altered using the collate clause
- A column is a child object of the database, so the collation is inherited from the database, unless explicitly changed.
1 2 3 |
SELECT name, collation_name FROM sys.columns |
Expression-level collation
- Set when a SQL statement is run using the collate clause
1 2 3 |
SELECT name, collation_name FROM sys.columns order by name collate Latin1_General_100_CI_AS |
The sort order is based on the defined collation setting
To view the all the collations supported in SQL Server, you can use the following query:
1 2 3 4 |
--The sys.fn_helpcollations system table function is used to view all collations supported by SQL Server 2017 SELECT name, description FROM sys.fn_helpcollations(); |
Preparing a Case-Insensitive Repository Database for Migration
The first step in preparing the case-insensitive repository database for transfer to a case-sensitive database is to run the SQL query, DBCC CHECKDB WITH DATA_PURITY. This query identifies any issues you must address before transferring the data.
- In the Object Explorer pane, right-click the repository database objects, and then click New Query.
-
Type the following statement in the SQL Query pane.
12345USE WF_REPPGODBCC CHECKDB WITH DATA_PURITY
- Click Execute
In case of any errors, troubleshoot and fix the problems and run all of the above steps again. In this case, all of the statistics are set to zero (0), which means that there are no issues.
Create the database backup
The second step is to create a backup copy of the case-insensitive version of the repository database using the SQL Server Database Backup. We can use this backup database to recover the data if in case issues occur during or after the transfer from the case-insensitive database to the case-sensitive database
Creating a Case-sensitive Database
- In the Object Explorer pane, right-click the Databases folder, and then click New Database.
-
Type the name of your new repository database in the Name field of the New Database dialog box.
For example, let’s call the new database, WF_REPP - In the Select a page pane, click Options.
- Open the Collation field drop-down list, and click the case-sensitive collation SQL_Latin1_General_CP1_CS_AS
- Click OK.
- An icon for the new repository database appears in the Databases folder in the Object Explorer pane.
Scripting objects of case in-sensitive repository – Generate and Publish Scripts Wizard
- In Object Explorer, expand Databases, right-click a database, point to Tasks, and then click Generate Scripts. Follow the steps in the wizard to script the database objects.
-
On the Choose Objects page, select entire database and all the database objects.
- Click Next
- On the Set Scripting Options page, select Save to new query window
-
To specify advanced scripting options, select the Advanced button and leave the default value of collation and change script to drop and create
-
On the Summary page, review your selections. Click Previous to change your selections. Click Next to generate a script of the objects you selected. On the Save or Publish Scripts page, monitor the progress of the script generation.
- Fix the code and check related objects such UDF and TVFs, Views, computed columns, and constraints.
- Execute the script on the case-sensitive repository.
Identify the dependency
By now, we know how to fix all the logical objects of the database. Let’s now take a look at fixing the collation settings on the tables. It is important to understand the foreign key relationships between the objects. Use the following query to list the dependent objects, where the collation setting SQL_Latin1_General_CPI_CI_AS of the column
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
Select ROW_NUMBER() OVER (ORDER BY object_name(rkeyid), c1.name) as ID, object_name(rkeyid) Parent_Table, object_name(fkeyid) Child_Table, object_name(constid) FKey_Name, c1.name FKey_Col, c2.name Ref_KeyCol From sys.sysforeignkeys s INNER JOIN sys.syscolumns c1 on ( s.fkeyid = c1.id AND s.fkey = c1.colid ) INNER JOIN syscolumns c2 on ( s.rkeyid = c2.id AND s.rkey = c2.colid ) WHERE c1.collation='SQL_Latin1_General_CP1_CI_AS' Order by Parent_Table,Child_Table |
-
Identify Primary Key column dependency
12345678910111213SELECT si.name AS PrimaryKey,OBJECT_NAME(ic.object_id) TableName,COL_NAME(ic.object_id,ic.column_id) ColumnName,sc.collation_name CollationFROM sys.indexes AS si INNER JOINsys.index_columns AS ic ON si.object_id = ic.object_id ANDsi.index_id = ic.index_id INNER JOINsys.columns sc ON ic.object_id=sc.object_id ANDsc.column_id=ic.column_idWHERE si.is_primary_key=1AND sc.collation_name like 'SQL_Latin1_General_CP1_CI_AS'
-
The above information clearly shows that using ALTER COLUMN to modify the collation will not work until the table is rebuilt. Let’s see a demo to change the collation of Primary Key column
123456789101112131415161718192021222324252627282930313233--Step 1: Create the table with CS (Case-Sensitive) collationCREATE TABLE [dbo].[SQLShackDemo]([customer_id] [varchar](20) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL,[customer_name] [varchar](20) NOT NULL,[phone_number] [int] NULL,[sales_total] [int] NULL,CONSTRAINT [PK_CustomerID] PRIMARY KEY CLUSTERED([customer_id] ASC))--Step 2: Insert the Dummy dataInsert into SQLShackDemo values(1,'Prashanth',168082,1000),(2,'Jayaram',68082,1500),(3,'thanVitha',68083,2500)SELECT * from [SQLShackDemo]--Step 3: Drop the PK constraintALTER TABLE [SQLShackDemo] DROP CONSTRAINT [PK_CustomerID]--Step 4: Alter the column with the new collation settingALTER TABLE [<Table>] ALTER COLUMN [<Column>] <DataType(Size)> COLLATE <NewCollationType>ALTER TABLE [SQLShackDemo] ALTER COLUMN [customer_id] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL--Step 5: re-add the Primary Key constraintALTER TABLE [SQLShackDemo] ADD CONSTRAINT [PK_CustomerID] PRIMARY KEY (customer_id)SELECT * from [SQLShackDemo]
Here’s what we did in the last steps:
- Script drop and re-create a foreign key constraint
- Drop all the all foreign key constraints
- Migrate data using SQL Server Import and Export Wizard
- Modify the target database by re-enabling the foreign key constraints
- Rebuild the indexes
Please note that the above steps are an option only when working on a very few tables. Doing this on a huge table would be a daunting task, not to mention the significant amount of resources that would be consumed if the data is too large.
Summary
Many times, as administrators, we test various methodologies of migration. This article is an effort to demonstrate a simple technique to move the data to a different (new) database with a different configuration while ensuring to maintain data integrity.
SQL Server supports storing objects that have different collations within a database—we saw how we can set collation at multiple levels.
The execution plan may vary depending on the type of collation used in the transact SQL statement. The behavior of SQL execution may be different results with different collation settings.
If you want to use a collation other than the default when installing SQL Server, be sure to change the collation on the Collation tab of the wizard’s Server Configuration screen.
You should try your best to get the server collation right when you install SQL Server because changing it after the installation is no small feat. You must take steps such as backing up the data, rebuilding the master database, recreating the user databases and all the objects within and importing the data into the newly created tables.
Fortunately, instead of changing the server collation, you can assign a different collation to your user databases, and you can assign a specific collation to even a character column. But as usual, better to keep in mind the inheritance, so as to keep things simple and manageable.
- Stairway to SQL essentials - April 7, 2021
- A quick overview of database audit in SQL - January 28, 2021
- How to set up Azure Data Sync between Azure SQL databases and on-premises SQL Server - January 20, 2021