Introduction
There are plenty of scripts to compute the date of various holidays given the year. Let’s look at the problem in reverse: Given a date, determine if it is a given holiday or not.
Date Tables
Many dimensional models make use of date tables — often called DimDate. They’re pretty handy! A well-designed date table saves you from coding up things like computing fiscal vs calendar intervals, converting between formats or selecting parts of a date. When building a date table, we will likely need to flag some days as holidays.
For this article we’ll work with a simple date table. It looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
CREATE TABLE [dbo].[DimDate]( [DimDateID] [int] NOT NULL PRIMARY KEY, [DateValue] [date] NOT NULL UNIQUE, [Day] AS DAY(DateValue), [Week] AS DATEPART(WEEK, DateValue), [Month] AS MONTH(DateValue), [Quarter] AS DATEPART(QUARTER, DateValue), [Year] AS YEAR(DateValue), [DayOfWeek] AS DATEPART(WEEKDAY, DateValue), [IsCanadianHoliday] [bit] NOT NULL DEFAULT ((0)), [IsUSHoliday] [bit] NOT NULL DEFAULT ((0)), ) |
Notice that I use several computed columns. This makes my life easier (at least for this article!). If you prefer, you can compute and persist the values when you create the table, of course.
To populate this table, I’m using an in-line tally table approach:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
DECLARE @StartDate date = '20000101', @EndDate date = '20501231'; WITH N10(n) AS (SELECT 1 FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n)) , N100(n) AS (SELECT 1 FROM N10, N10 n) , N10000(n) AS (SELECT 1 FROM N100, N100 n) , N100000(n) AS (SELECT 1 FROM N10, N10000 n) , N AS (SELECT TOP (DATEDIFF(DAY, @startdate, @enddate) + 1) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1 FROM N100000) INSERT INTO [dbo].[DimDate](DimDateID, DateValue) SELECT CAST(CONVERT(CHAR(8),InsertDate, 112) AS INT) , InsertDate FROM N CROSS APPLY (SELECT DATEADD(DAY, n, @Startdate)) d(InsertDate); |
If you haven’t used tally tables like this before, they are well-worth learning. The basic idea is to start with some set (I chose the integers from 0 to 9), then take the Cartesian product of that set and then do it again and again until you get at least as many items as you need. Note that to do that, I’m using the old-style join syntax rather than explicitly writing CROSS JOIN. That keeps each CTE on a single line, which I think improves readability. The final CTE, which I just call “N” (named after the double-struck symbol which is often used to denote the set of natural numbers), produces just the right number of integers required, starting at 0. In other implementations, you might see code where people have used WHILE loops or even (shudder!) cursors to do this sort of thing. Using a tally table is easy and set-based and rocket fast.
You might be curious what kind of execution plan the database engine generates for this. It’s a little long, so I’ve split it up to make it easier to see:
Are you worried about all those red X’s? You needn’t be. You get those whenever the query includes a join without a predicate. Hovering over one of them, I see the message:
In this case, though, we actually want a cross join operation, so we can safely ignore those warnings.
Notice also that I use CROSS APPLY as an expression evaluator. This keeps the code a little DRY-er. In OOP-speak, I’ve encapsulated what varies.
Now that I’ve got a basic date table, let’s see about updating those holiday columns. I’ll do it with an UPDATE command here, though it is possible to make them computed also — it just gets a little messy since there are so many different holidays with different calculations. Let’s start with Thanksgiving. In Canada, Thanksgiving Day is the second Monday in October. In the US, it’s the fourth Thursday in November. I’ve seen some interesting approaches to solving the problem. One (for US Thanksgiving) looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
UPDATE [dbo].[DimDate] SET IsUSHoliday = 1 FROM [dbo].[DimDate] AS c1 WHERE [Month] = 11 AND [DayOfWeek] = 5 AND ( SELECT count(0) FROM [dbo].[DimDate] AS c2 WHERE [Month] = 11 AND [DayOfWeek] = 5 AND c2.[Year] = c1.[Year] AND c2.[DateValue] < c1.[DateValue] ) = 3; |
Does it work? Well, yes, it does. It counts the number of Thursdays before the date being updated and ensures that there are three. Thinking about it another way though, the earliest Thursday in any month would be the 1st. The 3rd Thursday after that, the 22nd, would then be the earliest possible date for US Thanksgiving. The latest date would of course be 6 days after that (7 days after that is too much (why?)) which is the 28th. That means I can simplify this update to:
1 2 3 4 5 6 7 8 9 |
UPDATE [dbo].[DimDate] SET IsUSHoliday = 1 FROM [dbo].[DimDate] AS c1 WHERE c1.[Month] = 11 AND c1.[DayOfWeek] = 5 AND c1.[Day] between 22 and 28 |
Of course that’s just one pass through the table. The previous example may require two logical passes (although the optimizer may do it better than that), because of the sub query. I use the same technique for Canadian Thanksgiving:
1 2 3 4 5 6 7 8 9 |
UPDATE [dbo].[DimDate] SET IsCanadianHoliday = 1 FROM [dbo].[DimDate] AS c1 WHERE c1.[Month] = 10 AND c1.[DayOfWeek] = 2 AND c1.[Day] between 8 and 14 |
Another approach
If you’re not crazy about (or if you go crazy computing) the first and last possible days for a holiday, here’s another easy way, at least for holidays that are defined using some week number in the month. The way these work (Thanksgiving is an excellent example) is that they are relative to a specific week number. For Canadian and U.S. Thanksgivings, that would be week numbers 2 and 4, respectively. Now, given a day number in any month, the week number is simply:
Where “d” is the day number in the month. (We’re using integer division here). We have to subtract 1 from the day number before dividing by 7 since days are numbered from 1, not 0. Then, we have to add 1 at the end to get the week number for the same basic reason: weeks are numbered from 1. To see that this works, take the minimum and maximum days for Canadian Thanksgiving that we figured out in the last section:
Again, this is integer division here. It is easy to see that day numbers less than 8 return a week number of 1 and days greater than 14 yields a week number of 3 or more. Now, we could simply rework the expression in the WHERE clause like this:
1 2 3 4 5 |
WHERE c1.[Month] = 10 AND c1.[DayOfWeek] = 2 AND (c1.[Day] – 1) / 7 + 1 = 2 |
But why not be smarter, since we’ll likely need this for other holidays? Let’s add a new computed column instead:
1 2 3 |
[MonthWeek] AS (DAY(DateValue) – 1) / 7 + 1 |
Then, the WHERE clause reduces to:
1 2 3 4 5 |
WHERE c1.[Month] = 10 AND c1.[DayOfWeek] = 2 AND c1.MonthWeek = 2 |
Easy and an explicit implementation of the holiday’s specification.
Easter Sunday?
What about Easter Sunday? That is one wild formula! However, I’ll use CROSS APPLY once again, to avoid repeating myself. This algorithm uses the one from the US Navy.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
UPDATE dimdate SET IsCanadianHoliday = 1 , IsUSHoliday = 1 FROM [dbo].[DimDate] AS dimdate CROSS APPLY (SELECT dimdate.year AS y) _y CROSS APPLY (SELECT y / 100 as c, y - 19 * (y / 19) AS n) _nc CROSS APPLY (SELECT (c - 17) / 25 AS k) _k CROSS APPLY (SELECT c - c/4 - (c - k)/3 + 19 *n + 15 AS i1) _i1 CROSS APPLY (SELECT i1 - 30 * (i1 / 30) AS i2) _i2 CROSS APPLY (SELECT i2 - (i2 / 28) * (1 - (i2 / 28) * (29 / (i2 + 1)) * ((21 - n) / 11)) AS i) _i CROSS APPLY (SELECT y + y / 4 + i + 2 - c + c / 4 AS j1) _j1 CROSS APPLY (SELECT j1 - 7*(j1 / 7) AS j) _j CROSS APPLY (SELECT i - j AS el) _el CROSS APPLY (SELECT 3 + (el + 40) / 44 AS m) _m CROSS APPLY (SELECT el + 28 - 31*(m / 4) AS d) _d CROSS APPLY (SELECT DATEFROMPARTS(y, m, d) AS EasterSunday) _Easter WHERE dimdate.DateValue = EasterSunday |
Maybe you’re wondering how this performs. Take a look at the execution plan:
The SQL compiler has collapsed all those CROSS APPLYs into two compute scalar operations. The plan is a straight line. You can’t get much better than that!
A minor point here is that the subquery aliases in the CROSS APPLY clauses are not used in this example. (Of course, you could use them, but that would reduce the readability, in the writer’s opinion.) When I’m not going to use an alias, I usually prefix it with an underscore. It’s a habit I picked up from years of programming in Python. In fact, if there is just one CROSS APPLY, a single underscore is all you need, if you’re not going to use the alias.
Summary
Computing the dates of holidays is never my favorite thing to do. That’s why I like to push the work into a date table that’s built once and referenced by all. Also, remember the simple methods used for computing holidays that fall on certain weekdays. Just figure out the earliest and latest days of the month these can be and build that into your script. Alternatively, for holidays dependent on week numbers, use the simple method to compute the week number.
We’ve also shown how to use an inline tally table. These are very handy. Some DBAs like to put this into a view or function or even persist it to the database. However, you decide to use then, use them! You might even like to search your code for WHILE loops or CURSORs and see if you can eliminate them with tally tables.
Finally, we showed how you can use CROSS APPLY to encapsulate expressions. This is a fantastic and somewhat underutilized technique for factoring code and making it easier to read and maintain. Always keep in mind that your code will be read – and probably altered – more times than it is written.
- Snapshot Isolation in SQL Server - August 5, 2019
- Shrinking your database using DBCC SHRINKFILE - August 16, 2018
- Partial stored procedures in SQL Server - June 8, 2018