In the previous article we’ve practiced SQL, and today, we’ll continue with a few more SQL examples. The goal of this article is to start with a fairly simple query and move towards more complex queries. We’ll examine queries you could need at the job interview, but also some you would need in real-life situations. So, buckle up, we’re taking off!
Data Model
As always, let’s first take a quick look at the data model we’ll use. This is the same model we’re using in this series, so you should be familiar by now. In case, you’re not, just take a quick look at the tables, and how are they related.
We’ll analyze 6 SQL examples, starting from a pretty simple one. Each example will add something new, and we’ll discuss the learning goal behind each query. I’ll use the same approach covered in the article Learn SQL: How to Write a Complex SELECT Query? Let’s start.
#1 SQL Example – SELECT
We want to examine what is in the call table in our model. Therefore, we need to select all attributes, and we’ll sort them first by employee_id and then by start_time.
1 2 3 4 5 6 |
-- A list of all calls (sorted by employee and start time) SELECT * FROM call ORDER BY call.employee_id ASC, call.start_time ASC; |
This is a pretty simple query and you should understand it without any problem. The only thing I would like to point here is that we’ve ordered our result first by the id of the employee (call.employee_id ASC) and then by the call start time (call.start_time). In real-life situations, this is something you would do if you want to perform analytics during the time on the given criteria (all data for the same employee are ordered one after another).
#2 SQL Example – DATEDIFF Function
We need a query that shall return all call data, but also the duration of each call, in seconds. We’ll use the previous query as the starting point.
1 2 3 4 5 6 7 8 |
-- A list of all calls together with the call duration SELECT call.*, DATEDIFF("SECOND", call.start_time, call.end_time) AS call_duration FROM call ORDER BY call.employee_id ASC, call.start_time ASC; |
The result returned is almost the same as in the previous query (same columns & order) except for one column added. We’ve named this column call_duration. To get the call duration, we’ve used the SQL Server DATEDIFF function. It takes 3 arguments, the unit for the difference (we need seconds), first date-time value (start time, lower value), second date-time value (end time, higher value). The function returns the time difference in the given unit.
- Note: SQL Server has a number of (date & time) functions and we’ll cover the most important ones in upcoming articles.
#3 SQL Example – DATEDIFF + Aggregate Function
Now we want to return the total duration of all calls for each employee. So, we want to have 1 row for each employee and the sum of the duration of all calls he ever made. We’ll continue from where we stopped with the previous query.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
-- SUM of call duration per each employee SELECT employee.id, employee.first_name, employee.last_name, SUM(DATEDIFF("SECOND", call.start_time, call.end_time)) AS call_duration_sum FROM call INNER JOIN employee ON call.employee_id = employee.id GROUP BY employee.id, employee.first_name, employee.last_name ORDER BY employee.id ASC; |
There is nothing special to add regarding the result – we got exactly what we wanted. But let’s comment on how we achieved that. Few things I would like to emphasize here are:
- We’ve joined tables call and employee because we need data from both tables (employee details and call duration)
- We’ve used the aggregate function SUM(…) around the previously calculated call duration for each employee
- Since we’ve grouped everything on the employee level, we have exactly 1 row per employee
- Note: There are no special rules when you combine the result returned by any function and aggregate function. In our case, you can use combine the SUM function with DATEDIFF without any problem.
#4 SQL Example – Calculating Ratio
For each employee, we need to return all his calls with their duration. We also want to know the percentage of time an employee spent on this call, compared to the total call time of all his calls.
- Hint: We need to combine value calculated for one row with the aggregated value. To do that, we’ll use a subquery to calculate that aggregated value and then join into the related row.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
-- % of call duration per each employee compared to the duration of all his calls SELECT employee.id, employee.first_name, employee.last_name, call.start_time, call.end_time, DATEDIFF("SECOND", call.start_time, call.end_time) AS call_duration, duration_sum.call_duration_sum, CAST( CAST(DATEDIFF("SECOND", call.start_time, call.end_time) AS DECIMAL(7,2)) / CAST(duration_sum.call_duration_sum AS DECIMAL(7,2)) AS DECIMAL(4,4)) AS call_percentage FROM call INNER JOIN employee ON call.employee_id = employee.id INNER JOIN ( SELECT employee.id, SUM(DATEDIFF("SECOND", call.start_time, call.end_time)) AS call_duration_sum FROM call INNER JOIN employee ON call.employee_id = employee.id GROUP BY employee.id ) AS duration_sum ON employee.id = duration_sum.id ORDER BY employee.id ASC, call.start_time ASC; |
You can notice that we’ve achieved in combining row values with aggregated value. This is very useful because you could put such calculations inside the SQL query and avoid additional work later. This query contains a few more interesting concepts that should be mentioned:
- The most important is that we’ve placed the entire query returning the aggregated value in the subquery (the part starting from the 2nd INNER JOIN (INNER JOIN () and ending with ) AS duration_sum ON employee.id = duration_sum.id. Between these brackets, we’ve placed the slightly modified query from part #2 SQL Example – DATEDIFF Function. This subquery returns the id of each employee and the SUM of all his calls durations. Just think of it as a single table with these two values
- We’ve joined the “table” from the previous bullet to tables call and employee because we need values from these two tables
- We’ve already analyzed the DATEDIFF(…) function used to calculate the duration of a single call in part #2 SQL Example – DATEDIFF Function
- This part CAST( CAST(DATEDIFF(“SECOND”, call.start_time, call.end_time) AS DECIMAL(7,2)) / CAST(duration_sum.call_duration_sum AS DECIMAL(7,2)) AS DECIMAL(4,4)) AS call_percentage is pretty important. First we’ve casted both dividend (CAST(DATEDIFF(“SECOND”, call.start_time, call.end_time) AS DECIMAL(7,2))) and divisor (CAST(duration_sum.call_duration_sum AS DECIMAL(7,2)) as decimal numbers. While they are whole numbers, the expected result is a decimal number, and we have to “tell” that to SQL Server. In case, we haven’t CAST-ed them, SQL Server would perform division of whole numbers. We’ve also cast the result as a decimal number. This wasn’t needed because we’ve previously defined that when casting dividend and divisor, but I wanted to format the result to have 4 numeric values, and all 4 of them will be decimal places (this is a percentage in decimal format)
From this example, we should remember that we can use subqueries to return additional values we need. Returning the aggregated value using a subquery and combining that value with the original row is one good example where we could do exactly that.
#5 SQL Example – Average (AVG)
We need two queries. First shall return the average call duration per employee, while the second shall return average call duration for all calls.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
-- average call duration per employee SELECT employee.id, employee.first_name, employee.last_name, AVG(DATEDIFF("SECOND", call.start_time, call.end_time)) AS call_duration_avg FROM call INNER JOIN employee ON call.employee_id = employee.id GROUP BY employee.id, employee.first_name, employee.last_name ORDER BY employee.id ASC; -- average call duration - all calls SELECT AVG(DATEDIFF("SECOND", call.start_time, call.end_time)) AS call_duration_avg FROM call; |
There is no need to explain this in more detail. Calculating the average call duration per employee is the same as calculating the SUM of call durations per employee (#3 SQL Example – DATEDIFF + Aggregate Function). We’ve just replaced the aggregate function SUM with AVG.
The second query returns the AVG call duration of all calls. Notice that we haven’t used GROUP BY. We simply don’t need it, because all rows go into this group. This is one of the cases when aggregate function could be used without the GROUP BY clause.
#6 SQL Example – Compare AVG Values
We need to calculate the difference between the average call duration for each employee and the average call duration for all calls.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
-- the difference between AVG call duration per employee and AVG call duration SELECT single_employee.id, single_employee.first_name, single_employee.last_name, single_employee.call_duration_avg, single_employee.call_duration_avg - avg_all.call_duration_avg AS avg_difference FROM ( SELECT 1 AS join_id, employee.id, employee.first_name, employee.last_name, AVG(DATEDIFF("SECOND", call.start_time, call.end_time)) AS call_duration_avg FROM call INNER JOIN employee ON call.employee_id = employee.id GROUP BY employee.id, employee.first_name, employee.last_name ) single_employee INNER JOIN ( SELECT 1 AS join_id, AVG(DATEDIFF("SECOND", call.start_time, call.end_time)) AS call_duration_avg FROM call ) avg_all ON avg_all.join_id = single_employee.join_id; |
This query is really complex, so lets’ comment on the result first. We have exactly 1 row per employee with an average call duration per employee, and the difference between this average and average duration of all calls.
So, what we did to achieve this. Let’s mention the most important parts of this query:
- We’ve again used a subquery to return the aggregated value – average duration of all calls
- Besides that, we’ve added this – 1 AS join_id. It serves the purpose to join these two queries using the id. We’ll “generate” the same value in the main subquery too
- The “main” subquery returns data grouped on the employee level. Once more we’ve “generated” artificial key, we’ll use to join these two subqueries – 1 AS join_id
- We’ve joined subqueries using the artificial key (join_id) and calculated the difference between average values
Conclusion
I hope you’ve learned a lot in today’s article. The main thing I would like you to remember after this one is that you can perform many statistical computations directly in SQL, and then use the web form or Excel to present results using shiny tables and graphs. We’ll continue practicing in the next article, so stay tuned.
Table of contents
- Learn SQL: How to prevent SQL Injection attacks - May 17, 2021
- Learn SQL: Dynamic SQL - March 3, 2021
- Learn SQL: SQL Injection - November 2, 2020