I have seen that SQL developers use varchar(max) data while designing the tables or temporary tables. We might not be sure about the data length, or we want to eliminate the string or binary truncation error.
Is it a good practice to use varchar(max) for each usage?
We can define a specific range for the varchar (n) data type, and it is the recommended way to do so. In order to gain understanding about this data type, read SQL varchar(n) article.
We will discuss the use of varchar max and its implications, comparison with the varchar (n) data type in this article.
Overview of the VARCHAR(max) SQL Server Data Type
The SQL Server 2005 introduced this varchar(max) data type. It replaces the large blob object Text, NText and Image data types. All these data types can store data up to 2 GB. As you might be aware that the basic unit of storage in SQL Server is a page. The page size is 8 KB (8192 byes) in SQL Server, and it is fixed. On a page, SQL Server uses 96 bytes for the page header. We can store 8096 bytes ( 8192-96 bytes) for data in SQL Server. Apart from this, page also contains row overhead and row offset and leaves 8000 bytes to use for data storage. Due to this, we can store up to 8000 bytes of data using varchar (8000) data type.
You might think of using the varchar(max) data type to store 2 GB data to resolve the string truncation issues.
Let’s create a few sample tables with different size in varchar data type. We will also create a table with a varchar(max) data type.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
CREATE TABLE dbo.Employee_varchar_2000 (id INT IDENTITY PRIMARY KEY, Col1 VARCHAR(2000) ); CREATE TABLE dbo.Employee_Varchar_4500 (id INT IDENTITY PRIMARY KEY, Col1 VARCHAR(4500) ); CREATE TABLE dbo.Employee_Varchar_8000 (id INT IDENTITY PRIMARY KEY, Col1 VARCHAR(8000) ); CREATE TABLE dbo.Employee_Varchar_Max (id INT IDENTITY PRIMARY KEY, Col1 VARCHAR(MAX) ); |
Let’s insert records into these sample tables using the following queries.
1 2 3 4 5 6 7 8 9 10 11 12 |
INSERT INTO Employee_varchar_2000 (Col1) SELECT REPLICATE('A', 2000); INSERT INTO Employee_varchar_4500 (Col1) SELECT REPLICATE('A', 4500); INSERT INTO Employee_varchar_8000 (Col1) SELECT REPLICATE('A', 8000); INSERT INTO Employee_varchar_max (Col1) SELECT REPLICATE('A', 8000); |
We can verify the data length in these tables using the following queries.
1 2 3 4 5 6 7 8 9 10 |
Use SQLShackDemo go SELECT LEN(col1) AS columnlength FROM Employee_varchar_2000; SELECT LEN(col1) AS columnlength FROM Employee_varchar_4500; SELECT LEN(col1) AS columnlength FROM Employee_varchar_8000; SELECT LEN(col1) AS columnlength FROM Employee_varchar_max; |
In the following screenshot, we can verify the data length is similar to existing table column length.
Now, we can check the object statistics like page count, row count, and allocation unit using the DMV sys.dm_db_index_physical_stats.
1 2 3 4 5 6 7 8 9 |
SELECT OBJECT_NAME([object_id]) AS TableName, alloc_unit_type_desc, record_count, page_count, round(avg_page_space_used_in_percent,0) as avg_page_space_used_in_percent , min_record_size_in_bytes, max_record_size_in_bytes FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') WHERE OBJECT_NAME([object_id]) LIKE 'Employee_varchar%'; |
We can see that all tables contains the allocation unit IN_ROW_Data. SQL Server stores all data in the IN_ROW_Data allocation unit.
We cannot insert more than 8000 bytes data in the varchar(n) data type. If we try to do so , we get the following error message.
1 2 3 4 5 6 7 |
INSERT INTO Employee_varchar_8000 (Col1) SELECT REPLICATE('A', 8001); Go INSERT INTO Employee_varchar_8000 (Col1) SELECT REPLICATE('A', 10000); |
It inserts the data successfully but truncates the values to 8000 characters. Similar truncation occurs for the Employee_varchar_max table containing the varchar(max) data type.
We need to cast the value to varchar(max) and insert for the length above 8000 characters. We get the error message while trying to insert records in Employee_varchar_8000 table.
1 2 |
INSERT INTO Employee_varchar_8000 (Col1) SELECT REPLICATE(CONVERT(VARCHAR(max), 'x'), 8001); |
It successfully inserts records in the Employee_varchar_max table.
Rerun the query to check the allocation unit. We get the LOB_Data allocation unit to store the data more than 8000 bytes in the Employee_Varchar_Max table. We have a pointer to this data in the IN_Row_DATA allocation unit.
We can get the following conclusion from this.
- SQL Server uses the IN_ROW_DATA page for the varchar(max) data type if the data is less than or equal to 8000 bytes.
- If the data grows beyond the 8000 bytes, SQL Server uses LOB_DATA page for the varchar(max) data type
Performance comparison between varchar(max) and varchar(n) data type
Let’s insert 10,000 records into each of the tables we created earlier. We want to check the data insertion time. You can use the ApexSQL Generate tool to insert the data without writing the t-SQL code for it.
In the following screenshot, you can note the following.
- Employee_varchar_2000 insertion time 0.08 Seconds
- Employee_varchar_4500 insertion time 0.19 Seconds
- Employee_varchar_8000 insertion time 0.31 Seconds
- Employee_varchar_Max insertion time 2.72 Seconds
Indexes on VARCHAR(N) and VARCHAR(MAX) columns
As a DBA, you might not design the table. However, it is required to create an Index on the tables to improve the performance of the query.
We can create an index on the key column of the table holding varchar(n) data type.
1 2 3 |
CREATE INDEX IX_Employee_varchar_2000_1 ON dbo.Employee_varchar_2000(col1) GO |
If we try to do the same for the varchar(max) data type, it gives the following error message.
1 2 3 |
CREATE INDEX IX_Employee_varchar_max ON dbo.Employee_varchar_max(col1) GO |
Msg 1919, Level 16, State 1, Line 23 Column ‘col1’ in table ‘dbo.Employee_varchar_max’ is of a type that is invalid for use as a key column in an index.
We can use the varchar(max) column as an included column in the index, but you cannot perform the index seek on this column. It will also require additional storage. Therefore, you should avoid creating an index with the varchar(max) data type.
Execution plan comparison
Let’s compare the execution plan of two select statements.
In the first query, we want to retrieve data from the Employee_Varchar_2000 table and get the actual execution plan.
In the actual execution plan, we can see a non-clustered index seek operator.
If we run the same query with the varchar(max) data type, it uses a clustered index scan operator, and it can be a resource-intensive operator depending upon the number of rows in the table.
select col1 from Employee_varchar_max where col1 like ‘xxxx%’
Let’s compare the execution plan using the Compare Showplan option of SSMS. To compare two execution plans, save one execution plan by right click on the plan and Save Execution Plan as and provide the location to save the plan.
In another query execution plan, right-click and choose Compare Showplan. It opens a window, and you can specify the path of the earlier saved execution plan.
In the following screenshot, you can see the comparison between both execution plans.
- The estimated CPU cost is higher in the varchar(max) data type for a similar activity as of varchar(2000)
- For the varchar(max) it uses clustered index scan operator and scans all records. You can see this the estimated number of rows is 10000 while in the varchar(2000) data type it uses index seek operator and estimated number of rows is 1.96078
- Estimated row size 4035 B is greater than in varchar(max) compare to the 1011 B for the varchar(2000) data type
Difference between the varchar(max) and varchar(n) data type
varchar(max) | varchar(n) |
We can store up to 2 GB of data in this data type | We can store up to 8000 bytes data in this data type |
It uses the allocation unit IN_ROW_Data up to 8000 bytes of data. If data is more than 8000 bytes, it uses the LOB_Data page and stores its pointer in the IN_ROW_Data page | It stores data in the standard data page |
We cannot create an index on the key column of the varchar(max) data type | We can create an index on this data type |
We cannot compress the LOB data | We can compress data for this data type |
Data retrieval and updation on the LOB data is relatively slow | We do not face such issue in the varchar(n) data type |
Conclusion
In this article, we demonstrated varchar(max) data type and also explored several differences between the varchar(max) and varchar(n) data types. You should use an appropriate data type. We should consider the database design, performance, compression, indexes in mind. You should review the data types in your database and change it if required with proper testing.
- Understanding PostgreSQL SUBSTRING function - September 21, 2024
- How to install PostgreSQL on Ubuntu - July 13, 2023
- How to use the CROSSTAB function in PostgreSQL - February 17, 2023