The XML data type is a very common data type that is used to store unstructured or heterogeneous data in SQL Server. In this article, we will discuss the use of the XML data type along with its benefits, disadvantages, and limitations within various use cases.
Prior to the use of the JSON data type in SQL Server, developers worked with the XML data type. These days, we store all data in the relational model with a dynamic column structure. However, early data models may not fit with the relational model. The conventional approach was to store information as XML and then store the XML document in the file system. Sometimes in worse situations, a physical disk can raise problems with the read/write operation with the XML document. So, it makes more data availability to store XML files in the database within the XML data typed column.
The XML data type was introduced with SQL Server 2005 and it was dominant in the enterprise products. Before SQL Server 2005, the XML document was stored inside the VARCHAR or TEXT information type. However, it raised XML validation issues. Indeed, even with an XML data type, we confronted numerous performance issues. For example, querying with an XML tag, performing data manipulations within a document or other documents in the same table or cross table, or making updates resulted in cumbersome XML. The main problem was the cost of parsing the fragment each time it was accessed.
XML has many different use cases in SQL Server. Without XML, a developer would use the VARCHAR or TEXT data types on a table column to store the XML document. As an industry-standard, we need to store unstructured data in a database, where the end-user chooses the structure of information. Unfortunately, this is a tedious way to manage and apply logic or metrics to unstructured information.
These cases increase the data processing time on the back-end platform. As we can see in scenario 1 above, the static pieces of information are being stored in a database with the normal tabular form. To display that form of data to the user, the back-end application needs to retrieve it from the database and then process it for return to the front-end application. With this method, it will take considerable data processing time and consume application resources to prepare large information sets.
In scenario 2, we show storing the same kind of details in XML format in the database. In this scenario, we do not process the information for return to the front-end application.
Today’s platforms, such as those for e-commerce, dynamic site builders, audit and others, are pushing user-defined forms of data to the database. Nowadays, we also have the NoSQL database engines which can store JSON based documents; those were not available earlier. In SQL Server, the XML and JSON documents may have little overhead with the taking up more space compared to other data typed data considering repetitive Element and Attribute names in the XML / JSON data. Even, XML data may not be quickly processable as compare to the regular SQL statements. Because it doesn’t follow a SQL Server standard to store the data into the table. However, we can fetch and manipulate the data from the XML with the business logic using the XPATH method with XQuery.
What is the best way to store unstructured data in the SQL Server database?
Most of the user-defined forms related platforms stores the XML template into the database with XML data type and reference to that template user information will fill-up in the XML document. A template will be the blank structure to the user for filling the details and filled XML will be stored into the database as a user response.
XML Template:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
<employee> <firstname type="textbox" /> <lastname type="textbox" /> <email type="textbox" /> <title type="textbox" /> <division_list type="dropdown"> <division isselected="1">Network</division> <division isselected="0">Database</division> <division isselected="0">Development</division> </division_list> <Employee_type_list type="dropdown"> <Employee_type isselected="1">Permenant</Employee_type> <Employee_type isselected="0">Contract</Employee_type> <Employee_type isselected="0">Internship</Employee_type> </Employee_type_list> <address type="textbox" /> <Age type="textbox" /> </employee> |
Here, the same XML has been filled by the end user’s input. A user can update the same XML again.
XML with user Information:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
<employee> <firstname type="textbox">Jimmy</firstname> <lastname type="textbox">Bischoff</lastname> <email type="textbox">jimmy0@adventure-works.com</email> <title type="textbox">Database Engineer</title> <division_list type="dropdown"> <division isselected="0">Network</division> <division isselected="0">Database</division> <division isselected="1">Development</division> </division_list> <Employee_type_list type="dropdown"> <Employee_type isselected="0">Permanent</Employee_type> <Employee_type isselected="1">Contract</Employee_type> <Employee_type isselected="0">Internship</Employee_type> </Employee_type_list> <address type="textbox">51, United States</room> <Age type="textbox">27</Age> </employee> |
This is considered one of the best approaches to using the XML data type in SQL Server; however, each approach has its own limitations, advantages and disadvantages:
Advantages of XML data type in SQL Server:
- XML stores unstructured data in a relational database model
- XML itself validates the information inside the file with tags and its values using DTD (Document Type Definition) and schema
- XML data type stores independent data structure, hence it can be integrated or used in other database sources easily
- In a way, the XML data type reduces the back-end application burden as the XML can be easily used with a UI
- The XML data type can be used with the input parameter in a function or stored procedure
Disadvantages of XML data type in SQL Server
- The XML data type consumes more space when compared with the relational (row/column) format in SQL Server data pages because the XML format is redundant with the XML tag-value set
- The query execution time and data manipulation process could take longer compared to a normalized structure
- A large XML document will use more server resources like CPU, memory, IO and few others in SQL Server
- XML query structure is complex
Limitation of XML data type
- Primary, unique or foreign constraints cannot be applied on an XML-based column
- The XML data type cannot be used in remote query execution; it needs to be cast or converted with VARCHAR in a query
- Index performance cannot be achieved on the key of XML with a pre-defined XML path
- To perform any action (SELECT, INSERT, UPDATE or DELETE) on the XML column of the table, a user needs to follow the XML query standard
XML declaration in SQL Server
XML is a SQL Server system data type. The variable and column of the table can be defined with the XML column in SQL Server. The current SQL Server databases are native XML integrated and allow the user to store big unstructured data into the XML format and index over the XML data type in order to enhance the Query performance:
1 2 3 4 5 6 7 8 9 |
DECLARE @var_XML XML = '' SET @var_XML = '<employee> <firstname type="textbox">Jimmy</firstname> <lastname type="textbox">Bischoff</lastname> <email type="textbox">jimmy0@adventure-works.com</email> <title type="textbox">Database Engineer</title> </employee>' SELECT @var_XML as employee_xml |
The XML data type itself validates the input values while assigning value to the XML column or variable. If something is wrong with the input, the query will return an error. For example, I removed one closing tag (</title>) in the above XML, while assigning the value to the variable. It will return an error “end tag does not match start tag” while assigning the value to the variable:
1 2 3 4 5 6 7 8 9 |
DECLARE @var_XML XML = '' SET @var_XML = '<employee> <firstname type="textbox">Jimmy</firstname> <lastname type="textbox">Bischoff</lastname> <email type="textbox">jimmy0@adventure-works.com</email> <title type="textbox">Database Engineer </employee>' SELECT @var_XML as employee_xml |
Error message:
As explained in the above sample data, the way of storing the data into the XML is in plain text format. XML is the easiest way to exchange data between multiple different systems or applications.
Conclusion
XML is quite useful for a hierarchical data structure, which could be very difficult to manage in an RDBMS. For unstructured data, it’s an out-of-the-box process to extract it with XML if you are using SQL Server. Here, we explained the basics of XML and the role of the XML data type in SQL Server. We will have more detail about the SELECT, INSERT, UPDATE and DELETE functions in upcoming XQUERY articles.
- Page Life Expectancy (PLE) in SQL Server - July 17, 2020
- How to automate Table Partitioning in SQL Server - July 7, 2020
- Configuring SQL Server Always On Availability Groups on AWS EC2 - July 6, 2020