This article provides a step-by-step guide for migrating SQL Server graph databases to Neo4j using C#.
Why migrating graph databases from SQL Server?
Even if Microsoft introduced the SQL Server graph database in 2017, this feature is still far from the leading database providers in this domain, especially Neo4j. There are many articles published online where you can find different comparisons between graph technologies. In brief, the following are some reasons to migrate graphs from SQL Server to Neo4j:
- Neo4j reached a higher level of maturity since it was first released more than 10 years
- While SQL Server graphs are slowly evolving and improved, a wide community contributes to the Neo4j open source project
- Neo4j is open-source, while SQL Server graph technology is not
- Cypher is more straightforward than the extended graph functionalities in SQL Server
- In SQL Server, graphs cannot be visualized without an external tool
- Most importantly, Microsoft is a customer for Neo4j!
Figure 1 – Neo4j customers
When are SQL Server graphs useful?
One question comes to mind: “Are SQL Server graph functionalities useless?” In my opinion, there is no general answer to this question. Suppose your data is stored within SQL Server databases. You need to implement some graphs within your ecosystem and integrate them with relational data. In that case, SQL Server graph database is preferable. Moreover, you may need all functionalities provided by Neo4j in several cases.
You can also direct to this article, An introduction to a SQL Server 2017 graph database, to gain a quick understanding of SQL Server graph databases.
Migrating SQL Server graphs to Neo4j
Even if I prefer using SSIS for data transfer operations, Neo4j doesn’t have any official or stable (free) SSIS component. While searching, I found a third-party component that is still in the beta version.
Another approach for migrating SQL Server graphs to Neo4j is to export data into flat files and then import them into Neo4j.
The third approach is to develop a small application using C# to migrate Nodes and Edges created in SQL Server to a Neo4j database. This approach is explained in detail in this section.
Note that in this tutorial, we will use Visual Studio 2019. And the source SQL Server database is built based on this article published previously on SQL Shack: Understanding graph databases in SQL Server.
Create a new console application project
First of all, we should open Visual Studio and create a new console application, as shown in the images below:
Figure 2 – Adding a console application
Figure 3 – Naming project
Installing NuGet packages
To build this project, we have to install two NuGet packages:
- Neo4jClient: A third party class library that is used to connect with Neo4j from .Net applications
- System.Data.SqlClient: A library developed by Microsoft, previously it was a part of the .Net framework. Since we are using .Net core to build the application, we need to install it using NuGet packages
To install NuGet packages, in the Visual Studio menu strip, go to Tools > NuGet package Manager > Manage NuGet packages for solution…
Figure 4 – Opening Nuget packages manager
When the Nuget packages manager window appears, go to Browse, and search for Neo4jClient. Then, as shown in the image below, click on the package, select the project, and click “Install”.
Figure 5 – Install Neo4jClient NuGet package
Once installed, repeat the same steps with the System.Data.SqlClient NuGet package.
Adding Column, Node, and Edge classes
In order to read and write columns, nodes, and edges information, we will create three classes as follows:
Column.cs class
1 2 3 4 5 6 7 8 9 10 11 12 13 |
namespace SQLToNeo4j { public class Column { public string Name { get; set; } public string DataType { get; set; } public Column(string name, string type) { Name = name; DataType = type; } } } |
Node.cs Class
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
using System.Collections.Generic; namespace SQLToNeo4j { public class Node { public string ID { get; set; } public string Label { get; set; } public Dictionary<string,object> Properties { get; set; } public Node(string id, string label, Dictionary<string, object> properties = null) { ID = id; Label = label; if (properties != null) Properties = properties; else Properties = new Dictionary<string, object>(); } } } |
Edge.cs
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
using System.Collections.Generic; namespace SQLToNeo4j { public class Edge { public string ID { get; set; } public string Label { get; set; } public Dictionary<string, object> Properties { get; set; } public string FromNode { get; set; } public string FromNodeID { get; set; } public string ToNode { get; set; } public string ToNodeID { get; set; } public Edge(string id, string label, string from, string fromID, string to, string toID, Dictionary<string, object> properties = null) { ID = id; Label = label; FromNode = from; FromNodeID = fromID; ToNodeID = toID; ToNode = to; if (properties != null) Properties = properties; else Properties = new Dictionary<string, object>(); } } } |
Reading data from SQL Server
To read data from the SQL Server database, I create a class called SQLReader.cs. The only parameter the user needs to pass in this class constructor is the connection string (SQL Server instance + database + authentication).
I will define two functions GetNodes() and GetEdges() to read nodes and edges from the SQL Server dabatase.
SQLReader.cs Class
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 |
using System; using System.Collections.Generic; using System.Linq; using System.Data; using System.Data.SqlClient; namespace SQLToNeo4j { public class SQLReader : IDisposable { public List<Node> Nodes { get; set; } public List<Edge> Edges { get; set; } public string ConnectionString { get; set; } public SQLReader(string connection) { ConnectionString = connection; } public void GetNodes() { Nodes = new List<Node>(); //initialize connection using (SqlConnection sqlcon = new SqlConnection(ConnectionString)) { sqlcon.Open(); //retrieving nodes tables using (SqlCommand sqlnodes = new SqlCommand("select name from sys.tables where is_node = 1", sqlcon)) { SqlDataReader tablesreader; tablesreader = sqlnodes.ExecuteReader(); List<string> tables = new List<string>(); //get nodes tables while (tablesreader.Read()) tables.Add(tablesreader[0].ToString()); tablesreader.Close(); foreach (string tablename in tables) { //Get columns with data types string cmdColumns = String.Format("select column_name,data_type from information_schema.columns columns where table_name = '{0}'" + "and exists(select 1 from information_schema.columns temp where temp.column_name like 'graph_id_%'" + "and temp.TABLE_SCHEMA = columns.table_schema and temp.TABLE_NAME = columns.TABLE_NAME)", tablename); List<Column> columns = new List<Column>(); using (SqlCommand sqlcmd = new SqlCommand(cmdColumns, sqlcon)) { SqlDataReader columnsreader = sqlcmd.ExecuteReader(); while (columnsreader.Read()) { columns.Add(new Column(columnsreader[0].ToString(), columnsreader[1].ToString())); } columnsreader.Close(); } string idcolumn = columns.Where(x => x.Name.StartsWith("$node_id")).FirstOrDefault().Name; string propColumns = string.Join(",", columns.Select(x => x.Name).Where(y => !y.StartsWith("$") && !y.StartsWith("graph_id_"))); string cmdNodes = "select JSON_VALUE([" + idcolumn + "],'$.id') as node_id " + (propColumns == "" ? "" : "," + propColumns); cmdNodes = cmdNodes + string.Format(" from {0}", tablename); //get nodes using (SqlCommand sqlcmd = new SqlCommand(cmdNodes, sqlcon)) { SqlDataReader nodesreader = sqlcmd.ExecuteReader(); //Get properties while (nodesreader.Read()) { Dictionary<string, object> properties = new Dictionary<string, object>(); foreach (Column col in columns.Where(x => !x.Name.StartsWith("$") && !x.Name.StartsWith("graph_id_"))) { properties.Add(col.Name, nodesreader[col.Name]); } properties.Add("node_id", nodesreader["node_id"].ToString()); Nodes.Add(new Node(nodesreader["node_id"].ToString(), tablename, properties)); } nodesreader.Close(); } } } } } public void GetEdges() { Edges = new List<Edge>(); //initialize connection using (SqlConnection sqlcon = new SqlConnection(ConnectionString)) { sqlcon.Open(); //retrieving nodes tables using (SqlCommand sqlnodes = new SqlCommand("select name from sys.tables where is_edge = 1", sqlcon)) { SqlDataReader tablesreader; tablesreader = sqlnodes.ExecuteReader(); List<string> tables = new List<string>(); //get edges tables while (tablesreader.Read()) tables.Add(tablesreader[0].ToString()); tablesreader.Close(); foreach (string tablename in tables) { //Get columns with data types string cmdColumns = String.Format("select column_name,data_type from information_schema.columns columns where table_name = '{0}'" + "and exists(select 1 from information_schema.columns temp where temp.column_name like 'graph_id_%'" + "and temp.TABLE_SCHEMA = columns.table_schema and temp.TABLE_NAME = columns.TABLE_NAME)", tablename); List<Column> columns = new List<Column>(); using (SqlCommand sqlcmd = new SqlCommand(cmdColumns, sqlcon)) { SqlDataReader columnsreader = sqlcmd.ExecuteReader(); while (columnsreader.Read()) { columns.Add(new Column(columnsreader[0].ToString(), columnsreader[1].ToString())); } columnsreader.Close(); } string idcolumn = columns.Where(x => x.Name.StartsWith("$edge_id")).FirstOrDefault().Name; string fromid = columns.Where(x => x.Name.StartsWith("$from_id")).FirstOrDefault().Name; string toid = columns.Where(x => x.Name.StartsWith("$to_id")).FirstOrDefault().Name; string propColumns = string.Join(",", columns.Select(x => x.Name).Where(y => !y.StartsWith("$") && !y.StartsWith("graph_id_") && !y.StartsWith("from_") && !y.StartsWith("to_"))); string cmdNodes = "select JSON_VALUE([" + idcolumn + "],'$.id') as edge_id " + ",JSON_VALUE([" + fromid + "],'$.id') as from_id " + ",JSON_VALUE([" + fromid + "],'$.table') as from_table " + ",JSON_VALUE([" + toid + "],'$.id') as to_id " + ",JSON_VALUE([" + toid + "],'$.table') as to_table" + (propColumns == "" ? "" : "," + propColumns); cmdNodes = cmdNodes + string.Format(" from {0}", tablename); using (SqlCommand sqlcmd = new SqlCommand(cmdNodes, sqlcon)) { SqlDataReader edgesreader = sqlcmd.ExecuteReader(); //Get properties while (edgesreader.Read()) { Dictionary<string, object> properties = new Dictionary<string, object>(); foreach (Column col in columns.Where(y => !y.Name.StartsWith("$") && !y.Name.StartsWith("graph_id_") && !y.Name.StartsWith("from_") && !y.Name.StartsWith("to_"))) { properties.Add(col.Name, edgesreader[col.Name]); } Edges.Add(new Edge(edgesreader["edge_id"].ToString(), tablename, edgesreader["from_table"].ToString(), edgesreader["from_id"].ToString(), edgesreader["to_table"].ToString(), edgesreader["to_id"].ToString(), properties)); } edgesreader.Close(); } } } } } public void Dispose() { Nodes = null; Edges = null; ConnectionString = null; } } } |
Writing nodes and edges to Neo4j
To write exported nodes and edges from SQL Server into the Neo4j database, I am going to create Neo4jWriter.cs class. This class takes the Neo4j database URI as a parameter. I am also going to define two functions ImportNodes() and ImportEdges() to import nodes and edges into the Neo4j database.
Neo4jWriter.cs Class
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 |
using System; using System.Collections.Generic; using Neo4jClient; namespace SQLToNeo4j { public class Neo4jWriter : IDisposable { private GraphClient client; public Neo4jWriter(Uri uri) { client = new GraphClient(uri); client.ConnectAsync().Wait(); } public void ImportNodes(List<Node> nodes) { foreach(Node nd in nodes) { string cypher = "(" + nd.Label + "_" + nd.ID.ToString() + ":" + nd.Label + " $prop)"; client.Cypher.Create(cypher).WithParam("prop", nd.Properties).ExecuteWithoutResultsAsync().Wait(); ; } } public void ImportEdges(List<Edge> edges) { foreach (Edge edg in edges) { Dictionary<string, object> param = new Dictionary<string, object>(); param.Add("prop", edg.Properties); param.Add("id1", edg.FromNodeID); param.Add("id2", edg.ToNodeID); string cypher = "(" + edg.FromNode.ToLower() + "1" + ") -[:" + edg.Label + " $prop]->(" + edg.ToNode.ToLower() + "2" + ")"; client.Cypher .Match("(" + edg.FromNode.ToLower() + "1:" + edg.FromNode + ")", "(" + edg.ToNode.ToLower() + "2:" + edg.ToNode + ")") .Where(edg.FromNode.ToLower() + "1.node_id = $id1") .AndWhere(edg.ToNode.ToLower() + "2.node_id = $id2") .Create(cypher) .WithParams(param).ExecuteWithoutResultsAsync().Wait(); ; } } public void Dispose() { client.Dispose(); } } } |
Importing data into Neo4j
After creating the classes we mentioned above, we will use the following code in the Program.cs class to import data from an SQL Server graph database into Neo4j database.
Note that you should first start the Neo4j database, as explained in this series’s previously published article.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
using System; namespace SQLToNeo4j { class Program { static void Main(string[] args) { using (SQLReader reader = new SQLReader("Server=<machine name>\\<instance name>;Database=GraphPL;Trusted_Connection=yes;")) { reader.GetNodes(); reader.GetEdges(); using (Neo4jWriter importer = new Neo4jWriter(new Uri("http://<user>:<password>@localhost:7474"))) { importer.ImportNodes(reader.Nodes); importer.ImportEdges(reader.Edges); } } } } } |
To ensure that data is imported successfully, you can open the Neo4j desktop (or from a web browser) and run the MATCH(n) RETURN n Cypher command from the database console to retrieve all data in the database.
As shown in the image below, the data is migrated successfully.
Figure – Visualizing migrated data
Conclusion
In this article, we first explained some reasons to migrate graph databases from SQL Server into Neo4j. Then we provided a step-by-step guide to migrate Nodes and edges using C#. The code provided may or may not be optimal. I published it on GitHub, so feel free to use it or improve it.
Table of contents
- An overview of SQL Server monitoring tools - December 12, 2023
- Different methods for monitoring MongoDB databases - June 14, 2023
- Learn SQL: Insert multiple rows commands - March 6, 2023