SQL has many useful features, and aggregate functions are definitely among them. While they are not specific to SQL, they are used often. They are part of the SELECT statement, which lets us combine all the benefits of SELECT (joining tables, filtering only the rows and columns we need) with the power of these functions.
The Model
Before we start talking about aggregate functions, we’ll briefly comment on the data model we’ll be using.
This is the same model we’ve been using in a few past articles. I won’t go into details, but rather mention that all 6 tables in the model contain data. Some of the records in tables are referenced in others, while some are not. E.g., we have countries without any related city, and we have cities without any related customers. We’ll point this out wherever it matters in the article.
The Simplest Aggregate Function
We’ll, of course, start with the simplest possible aggregate function. But before we do, let’s check the contents of the two tables we’ll use throughout this article: country and city. We’ll use the following statements:
```sql
SELECT *
FROM country;

SELECT *
FROM city;
```
You can see the result in the picture below:
This is nothing new or unexpected. We’ve just listed everything that is in our tables (“*” in the query returns all columns/attributes, while the lack of any condition/WHERE part returns all rows).
The only thing I would like to point out is that the country table has 7 rows and that the city table has 6 rows. Now, let’s examine the following queries and their result:
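The two queries are simple enough to sketch. This is a minimal version, assuming the country and city tables from the model; the alias number_of_rows is an assumption, borrowed from the JOIN examples later in the article:

```sql
-- Each query returns a single row holding the row count of one table.
-- With the article's data: 7 for country, 6 for city.
SELECT COUNT(*) AS number_of_rows
FROM country;

SELECT COUNT(*) AS number_of_rows
FROM city;
```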
We can notice that each query returned one row, and the number returned is the number of rows in the respective table. That’s what the aggregate function COUNT does. It takes what the query without COUNT would return, and then returns the number of rows in that result. One more important thing you should be aware of is that only COUNT can be used with “*”. All other functions require an attribute (or expression) between the brackets. We’ll see that later.
Aggregate Functions & JOINs
Now let’s try two more things. First, we’ll test how COUNT works when we’re joining tables. To do that, we’ll use the following queries:
```sql
SELECT *
FROM country
INNER JOIN city ON city.country_id = country.id;

SELECT COUNT(*) AS number_of_rows
FROM country
INNER JOIN city ON city.country_id = country.id;
```
While the first query is not needed, I’ve used it to show what it returns, because this is what the second query counts. When two tables are joined, you can think of the result as an intermediate table that can be used like any other table (e.g. for calculations using aggregate functions, in subqueries).
- Tip: Whenever you’re writing a complex query, you can check what its parts return. That way you’ll be sure your query works as expected.
We should also notice one more thing. We’ve used INNER JOIN while joining tables country and city. This eliminates countries without any cities from the result, because INNER JOIN keeps only rows that have a match in both tables. Now we’ll run 3 more queries where tables are joined using LEFT JOIN:
```sql
SELECT *
FROM country
LEFT JOIN city ON city.country_id = country.id;

SELECT COUNT(*) AS number_of_rows
FROM country
LEFT JOIN city ON city.country_id = country.id;

SELECT COUNT(country.country_name) AS countries, COUNT(city.city_name) AS cities
FROM country
LEFT JOIN city ON city.country_id = country.id;
```
We can notice a few things:
- 1st query returned 8 rows. These are the same 6 rows as in a query using INNER JOIN and 2 more rows for countries that don’t have any related city (Russia & Spain)
- 2nd query counts the number of rows 1st query returns, so this number is 8
- 3rd query has two important things to comment on. The first one is that we’ve used an aggregate function (COUNT) twice in the SELECT part of the query. This will usually be the case because you’re interested in more details about the group you want to analyze (number of records, average values, etc.). The second important thing is that these 2 counts used column names instead of “*” and they returned different values. That is how COUNT is defined: if you put a column name between the brackets, COUNT counts the non-NULL values in that column. All our records had a value for country_name, so the 1st COUNT returned 8. On the other hand, city_name was NULL in 2 rows (the countries without cities), so the 2nd COUNT returned 6 (8 - 2 = 6)
- Note: This holds for other aggregate functions as well. If they run into NULL values, they simply ignore them and calculate as if those values didn’t exist.
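The NULL handling described above can be checked on a tiny, self-contained example. This is only a sketch: the three rows below are made up for illustration and are not the article's data, and the derived table name demo is hypothetical:

```sql
-- Hypothetical rows: two with a lat value, one where lat is NULL.
SELECT
    COUNT(*)        AS all_rows,    -- 3: every row is counted
    COUNT(demo.lat) AS lat_values,  -- 2: the NULL is skipped
    SUM(demo.lat)   AS lat_sum,     -- 30: 10 + 20
    AVG(demo.lat)   AS lat_avg      -- 15: 30 / 2, not 30 / 3
FROM (VALUES
    ('A', 10.0),
    ('B', 20.0),
    ('C', NULL)
) AS demo(city_name, lat);
```

Note that AVG divides by the number of non-NULL values, so a NULL is not treated as 0.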
SQL Aggregate Functions
Now it’s time to mention all T-SQL aggregate functions. The most commonly used are:
- COUNT – counts the number of elements in the group defined
- SUM – calculates the sum of the given attribute/expression in the group defined
- AVG – calculates the average value of the given attribute/expression in the group defined
- MIN – finds the minimum in the group defined
- MAX – finds the maximum in the group defined
These 5 are most commonly used and they are standardized so you’ll need them not only in SQL Server but also in other DBMSs. The remaining aggregate functions are:
- APPROX_COUNT_DISTINCT
- CHECKSUM_AGG
- COUNT_BIG
- GROUPING
- GROUPING_ID
- STDEV
- STDEVP
- STRING_AGG
- VAR
- VARP
While most aggregate functions can be used without the GROUP BY clause, they show their real value together with it. That clause is the place where you define how rows are put into groups. Once the groups are created, the aggregated values are calculated for each group.
- Example: Imagine that you have a list of professional athletes and you know which sport each one of them plays. You could ask yourself something like – From my list, return the minimal, maximal and average height of players, grouped by the sport they play. The result would be, of course, MIN, MAX, and AVG height for groups – “football players”, “basketball players”, etc.
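The athletes example could be sketched as follows. Everything here is hypothetical: the athlete rows, the column names (athlete_name, sport, height_cm) and the heights are invented only to show how GROUP BY forms the groups:

```sql
-- Hypothetical athletes; names, sports and heights are made up.
SELECT
    athlete.sport,
    MIN(athlete.height_cm) AS min_height,
    MAX(athlete.height_cm) AS max_height,
    AVG(athlete.height_cm) AS avg_height
FROM (VALUES
    ('Athlete A', 'football',   180.0),
    ('Athlete B', 'football',   190.0),
    ('Athlete C', 'basketball', 200.0),
    ('Athlete D', 'basketball', 210.0)
) AS athlete(athlete_name, sport, height_cm)
GROUP BY athlete.sport      -- one group, and one result row, per sport
ORDER BY athlete.sport;
```

With these made-up rows, the basketball group yields 200, 210 and 205, and the football group 180, 190 and 185.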
Aggregate Functions – Examples
Now, let’s take a look at how these functions work on a single table. They are rarely used this way, but it’s good to see it, at least for educational purposes:
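A query of this kind might look like the sketch below. It assumes the city table has a numeric lat (latitude) column, as the grouped example further down suggests; the column aliases are illustrative:

```sql
-- No GROUP BY: the whole city table is treated as a single group,
-- so the query returns exactly one row.
SELECT
    COUNT(*)      AS number_of_cities,
    SUM(city.lat) AS lat_sum,
    AVG(city.lat) AS lat_avg,
    MIN(city.lat) AS lat_min,
    MAX(city.lat) AS lat_max
FROM city;
```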
The query returned aggregated values for all cities. While these values don’t have any practical use, this shows the power of aggregate functions.
Now we’ll do something smarter. We’ll use these functions in a way much closer to what you could expect in real-life situations:
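A sketch of such a query, under a few assumptions: LEFT JOIN is used so that countries without cities still appear in the list, the cities are counted through city.city_name so that those countries get 0, and the column aliases are illustrative:

```sql
SELECT
    country.country_name,
    COUNT(city.city_name) AS number_of_cities,  -- 0 for countries without cities
    SUM(city.lat)         AS lat_sum,           -- NULL for countries without cities
    AVG(city.lat)         AS lat_avg,
    MIN(city.lat)         AS lat_min,
    MAX(city.lat)         AS lat_max
FROM country
LEFT JOIN city ON city.country_id = country.id
GROUP BY country.id, country.country_name;
```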
This is a much “smarter” query than the previous one. It returned the list of all countries, with the number of cities in each of them, as well as the SUM, AVG, MIN, and MAX of their lat values.
Please notice that we’ve used the GROUP BY clause. By placing country.id and country.country_name in it, we’ve defined a group. All cities belonging to the same country will be in the same group. After the groups are created, aggregated values are calculated.
- Note: The GROUP BY clause must contain all attributes that are outside aggregate functions (in our case that was country.country_name). You could also include other attributes. We’ve included country.id because we’re sure it uniquely defines each country.
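To illustrate the rule from the note, here is a minimal sketch against the article's country and city tables (the alias number_of_cities is illustrative). The second variant is left commented out because SQL Server rejects it:

```sql
-- Works: country_name appears outside an aggregate function,
-- and it is listed in GROUP BY (country.id is an allowed extra).
SELECT country.country_name, COUNT(city.city_name) AS number_of_cities
FROM country
LEFT JOIN city ON city.country_id = country.id
GROUP BY country.id, country.country_name;

-- Fails in SQL Server: country_name is neither aggregated nor grouped.
-- SELECT country.country_name, COUNT(city.city_name) AS number_of_cities
-- FROM country
-- LEFT JOIN city ON city.country_id = country.id
-- GROUP BY country.id;
```

For the second variant, SQL Server reports that the column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.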
Conclusion
Aggregate functions are a very powerful tool in databases. They serve the same purpose as their equivalents in MS Excel, but the magic is that you can query data and apply functions in the same statement. Today, we’ve seen basic examples. Later in this series, we’ll use them to solve more complicated problems (with more complicated queries), so stay tuned.