As data warehouse developers, we often have to extract data from a myriad of source systems. Thus, whilst some source systems readily integrate with our ETL tools there are instances whereby we need to install additional drivers and software addons in order to successfully connect and extract data from other source systems. Microsoft SharePoint Online is one such source system that I recently had to extract data from and its connectors are by default not part of the standard SQL Server Integration Services (SSIS) package template. As SSIS developers we often don’t have solid background on environments such as SharePoint, thus figuring out which version of SharePoint Software Development Kit to install in order to enable successful connection from SSIS can sometimes be a frustrating experience. In this article, I try to alleviate some of that frustration by sharing some of my recent experiences relating to getting data out of a SharePoint list using SSIS.
Part 1: SharePoint List as a Data Source
We begin by looking at sample SharePoint environment that will be used as a data source and identifying a list of URL addresses that we will need in our SSIS package. Figure 1 gives a preview of the data contained within our sample ApexSQLFree SharePoint list – basically a list containing SQL addons produced by ApexSQL and licensed to the public for free.
The first URL address to take note of, is the link to your SharePoint list, in my case that will be the web address to the ApexSQLFree SharePoint list, which is as follows:
https://mydomain.sharepoint.com/sites/SPSDemo/Lists/ApexSQLFree/sample.aspx
Obviously, assuming that you have a basic understanding of SharePoint, by looking at the URL address itself you would have noted the following:
- mydomain is a placeholder for an actual domain;
- SPSDemo is the name of the SharePoint site that this list is stored; and
- ApexSQLFree is the name of the list.
We can use the SharePoint URL to derive a REST API web address that will be suffixed with listdata.svc. It is important that we get a *listdata.svc URL as we will need it for establishing connection to the SharePoint list using the REST API in an SSIS package. What worked for me in terms of deriving the *listdata.svc URL was appending /_vti_bin/listdata.svc just after the site name in the above SharePoint list URL such that it becomes the following:
https://mydomain.sharepoint.com/sites/SPSDemo/_vti_bin/listdata.svc/
Obviously, you can always test the generated *listdata.svc URL by running it into a browser as shown in Figure 2.
Part 2: SSIS Development – Configuring OData Source for SharePoint Online
Having identified the SharePoint list from which data will be sourced, we switch to SSIS and configure the necessary components for SharePoint data extraction.
-
Configuring the OData Source Connection Manager
The first SSIS component that ought to be configured is the OData Connection Manager. This connection type is available under New Connection… option in SSIS’s Connection Managers tab as shown in Figure 3.
Figure 3 By clicking on the New Connection… option, the OData Connection Manager Editor will popup.
Fill in the Service document location box with your *listdata.svc URL and on the Authentication Type dropdown, choose “Microsoft Online Services”. The Microsoft Online Services will further require that you specify a username and password. Finally, if everything has been captured correctly, clicking the Test Connection button at the bottom left of the editor should return a “Test connection succeeded” message as shown in Figure 4.
Figure 4 -
Configuring the Data Flow Task
Having successfully created and tested our OData connection, we are now ready to begin data extraction and SSIS facilitates such an exercise through its Data Flow Task component. Within the Data Flow Task, we start off by configuring our data source component – which in this case will be OData Source as indicated in Figure 5.
Figure 5 Once the OData Source has been added into the Data Flow Task pane and its editor opened, under the OData connection manager label we select the OData connection we configured above. As indicated in Figure 6, such a selection will result in the rest of the boxes being auto-populated (except for Query options).
Figure 6 You can use several query options in your OData source connection as outlined here. For instance, if you want to return a subset of data from your SharePoint list, you can use the $top query option as indicated in Figure 7.
Figure 7 For the purposes of showing you how the filtering works, I duplicated the components in the data flow tasks such that one section applies the $top query option and the other doesn’t. As indicated in Figure 8, the first flow retrieves only 2 rows compared to the other section that returns all (7) rows.
Figure 8
Part 3: Some Issues to Look Out For
In the preceding section of this article, I demonstrated an end-to-end seamless configuration and extraction of SharePoint data using the OData source in SSIS without any errors. However, in real world it this hardly happens – you are likely to run into an error message or two. In this section, I have put down some of the errors that you should look out for when working with SharePoint Online and how you can go about fixing such errors.
-
Issue #1: Failed to Load SharePoint Assemblies
When configuring an OData connection, choosing the Microsoft Online Services authentication type can return the error message shown in Figure 9. To get around such an error, you will need to download and install SharePoint Online Client Components SDK – which is freely available from the Microsoft Download Center page.
Figure 9 -
Issue #2: Incompatible SharePoint Assembly Version
Despite having installed the SharePoint Online Client Components SDK, you may still run into another error message when attempting to test you newly configured OData connection. This usually occurs when you have installed SDKs for SharePoint Online instead of SharePoint on-prem, thus make sure that when you google/bing the SharePoint SDKs client components, that you install the correct ones.
Figure 10 Table 1 gives a breakdown of common on-prem SharePoint servers and their respective version numbers.
SharePoint Version Version Number SharePoint Server 2010 14.0.XXXX.XXXX SharePoint Server 2013 15.0.XXXX.XXXX SharePoint Server 2016 16.0.XXXX.XXXX Table 1 Furthermore, you can always verify the version installed by looking at the path: C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\%version number%\ISAP where %version number% is a parameter for version of SharePoint, thus for SharePoint Server 2016 you should have a path as follows: C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAP.
-
Issue #3: Data Conversion Error
Finally, one of the frustrating error message you may encounter when extracting using SSIS to extract data out of a SharePoint list is a series of data conversion errors. This is because all string related fields from the OData Source component will be defaulted Unicode text stream data types (i.e. DT_NTEXT) as shown in Figure 11.
Figure 11 There are several ways to get around this error; you can add a data conversion transformation component and convert data as suggested in the error message. Alternatively, you can simply convert the destination output to a Unicode string by ticking the Unicode box as indicated in Figure 12.
Figure 12
Summary
SharePoint list is one of many data sources that as data warehouse developers can be required to extract data from. Unlike your typical data sources such as SQL Server or Excel, required drivers are often not readily installed in SSIS package template thus requiring a bit of work from your side in terms of identifying, downloading and ultimately installing the correct software addons. In this article we have demonstrated how data can be sourced from a SharePoint list and how you can go about dealing with several common issues.
References
- OData Connection Manager
- SharePoint Server build numbers
- Using the SharePoint REST Interface
- OData system query options using the OData endpoint