This article will provide an understanding of how to effectively use group concatenation in MySQL.
Introduction
MySQL is an open-source relational database management system. It is a widely used relational database management system in the entire world. In recent years, it has gained more support as it is supported by multiple clouds including Azure and AWS. MySQL can store a large amount of data in the database. And as one of the key common ask of all data operations, MySQL also provides the capability to perform aggregation. There are many different ways to do aggregation. Majorly in a database or more specifically in a table, aggregation can occur at two levels:
- Column level
- Row Level
In this article, we are going to discuss how to aggregate the article at a row level. MySQL provides a function called “group_concat” to perform row-level concatenation. Before we go ahead and learn how to do so, let us first understand:
- Why do we need it?
- Where do we need it?
- When to use it?
Why do we need it?
Data aggregation is a way of combining huge data into a single summary. Aggregation enables users to examine a large number of datasets and examine the values in a short amount of time. Imagine a scenario where there are data scientists who want to examine inputs recorded every second with millions of inputs every single day. Aggregation can help to summarize the data and feed it as a dataset to the visual consumption layer like reporting and others.
Where do we need it?
Data aggregation is mostly required in database operations with a large number of inputs to be summarized. Some of the common examples are live stream data, financial organizations, highly accessible applications or websites, and more
When to use it?
Row-level aggregation should only be used when the aggregation needs to be summarized. But if the summarized information is too extensive, then it needs to be bifurcated further. For example, if you want to get all the departments of an employee, he is part of, it is a good example to aggregate the information. However, if you are to aggregate all the employees of a particular department in a single row, it won’t make much help visually. It would be better to bifurcate the employees of a particular by their joining year or experience or state or any other group characteristics.
In MySQL, row-level aggregation can be achieved using “Group_Concat()”. Following is the group_concat() syntax,
1 2 3 4 5 6 |
GROUP_CONCAT([DISTINCT] expr [,expr ...] [ORDER BY {unsigned_integer | col_name | expr} [ASC | DESC] [,col_name ...]] [SEPARATOR str_val]) |
Where,
The DISTINCT clause allows you to get unique values excluding duplicate values
ORDER BY clause allows sorting the results based on a specific order. It could be based on a certain column order which can be alphabetical or numeric. One can even decide the order in which the results are to be sorted i.e., ascending or descending.
The SEPARATOR clause is used to change the default separator separating the results. By default, the “,” a comma is the separator that separates the results.
To use the above cloud, we need to use with “Group by” clause. As it is a group concatenation, it needs to be indicated for two places, one is by which rows need to be concatenated, and another column by which the records need to be concatenated.
We will use this different clause to see different usages of group concatenation. Let’s start by creating a quick table named Customer. This table will represent several records that capture transactions frequently. We will try to aggregate different types of data to show a different kind of summary. To create a table, please use below code:
1 2 3 4 5 6 7 8 9 10 11 12 |
CREATE TABLE customers ( FirstName varchar(255), LastName varchar(255), Age int, JoinedMonth varchar(255), Transactions int, TransactionDate date, State varchar(255), Country varchar(255) ); |
Once it is successfully executed, the table should look something like below. To see the table, use the below query as mentioned:
1 2 3 |
Select * from customers |
Now that we have created the table, let’s start aggregating data.
First, let us aggregate all the customer names as per the state they reside from. Please use the below query to get the data:
1 2 3 |
Select state,group_concat(FirstName) from customers group by state |
Once you successfully execute the query, you will get the below results.
In the above example you can see that in the left column, we have different state names. And in the right column, all the names of customers are aggregated.
Let us explore another example of aggregation. Let us gather all the customer transactions they did. To get such data, please explore the below query:
1 2 3 |
Select FirstName,group_concat(Transactions) as Transactions from customers group by FirstName |
Once the above query is successfully executed, it will give the below results.
In the above example, all the records of everyone with spends are aggregated however this doesn’t much clearer picture. Let’s group by each day even further, so that it gives a clearer picture. Please add “Transaction date” to the group by clause as follows.
1 2 3 |
Select TransactionDate,group_concat(Transactions) as Transactions from customers group by TransactionDate |
Once you execute the above query, it will show the below results.
Now we can see the data in a more meaningful manner. Let’s see another example. Let’s see another example. Many customer transactions have happened in a given month. Let’s say I would like to get all the customer names which done a transaction in a given month. To do this, use the below query as mentioned:
1 2 3 |
Select JoinedMonth,group_concat(distinct FirstName) as FirstName from customers group by JoinedMonth |
Once the query is successfully executed you will be able to get all the unique customer names as shown below.
In the above results, we can see all the customers’ unique names as per a given month. A distinct clause helps to automatically remove all the duplicate values and only state the unique values. But as you see the records are not properly sorted. We can still introduce one level of more sorting which will be by customer names. So, let’s sort the data in ascending order of names. Please use the below query.
1 2 3 |
Select JoinedMonth,group_concat(distinct FirstName order by FirstName) as FirstName from customers group by JoinedMonth |
Once correctly executed, we will get the below results,
In the above results, we can see the names of all customers in a given month sorted in ascending order of their names.
Let’s say we want to have to reverse the sort order, then all we have to do is to use “Desc” keywords and it will sort as below.
We can also the same option to sort the data for numbers from highest to lowest or even for the data in their chronological order. Let’s continue with this example further. As you can see the results combined in the aggregated data are separated by a comma. However, in case you want to change the separator, you have to use the “Separator” keyword as below:
1 2 3 |
Select JoinedMonth,group_concat(distinct FirstName order by FirstName separator '/') as FirstName from customers group by JoinedMonth |
Once you execute the above query, you will be able to see the below results.
In the above example, the results are separated by “/“rather than “,”. In the above example, we have used all three clauses along with Group_concat to get the data. In addition to this, we can also use different column-level aggregation functions to achieve further concatenation. One of the examples is Concat_WS() function which aggregates multiple column values. Let’s continue with our previous example. Let’s say we concatenate to columns Name and age and then show it in aggregated values. To do so, we need to use the query below:
1 2 3 |
Select JoinedMonth,group_concat(distinct concat_ws(',',FirstName,LastName) separator '/') as FullName from customers group by JoinedMonth |
Once the above query is successfully executed, we will have the below results.
The above results show a few points:
- To get unique values from a table
- To get unique values in a certain order
- To get unique values with a specific separator
- To get unique values by combining different column values
We can also use the group-concat function for the aggregation in nested select queries, inner joins, and much more.
Conclusion
- Group aggregation helps summarize information.
- Group aggregation should be used in conjunction with other aggregators to amplify the effect of summary.
- Single Group aggregation should not be used in highly complex operations, rather use nested aggregation to simplify the results.
- Finding Duplicates in SQL - February 7, 2024
- MySQL substring uses with examples - October 14, 2023
- MySQL group_concat() function overview - March 28, 2023