Excel GROUPBY function to group rows and aggregate values
Discover the power of Excel’s GROUPBY function for advanced data analysis. Group, summarize, sort and filter, calculate totals and subtotals, all with one formula!
Microsoft Excel has always been great for sorting through lots of data and making sense of it. You might have used features like outlines, subtotals, and pivot tables to structure and analyze your information. But now, there’s an even simpler way to do it all with just a formula. Introducing the GROUPBY function – a handy and efficient tool to group and aggregate data without leaving the comfort of your formula bar.
Excel GROUPBY function
The GROUPBY function in Excel is used to group and aggregate data in rows based on the values in one or more columns. Additionally, it can perform sorting and filtering on grouped data.
GROUPBY is part of Excel’s dynamic array functions, which means it can return multiple results that spill over to adjacent cells. The output is similar to that of a pivot table but without the formatting. The spill range returned by the GROUPBY function is fully dynamic and automatically recalculates when source data changes.
This function is particularly useful when dealing with large datasets where you need to summarize data by performing calculations like sum, average, or count.
Note. Excel’s GROUPBY is quite similar to the PIVOTBY function. The key difference is that while PIVOTBY can group data both in rows and columns, GROUPBY is purposed for grouping in rows only.
Syntax
The syntax of the GROUPBY function is as follows:
The function can accept seven arguments, but only the first three are required.
- Row_fields (required) – the range of values you want to group.
- Values (required) – the values to aggregate.
- Function (required) – the function to summarize the grouped data, such as SUM, AVERAGE, COUNT, MIN, MAX, etc.
- Field_headers [optional] – specifies whether your dataset has headers and whether you want to include them in the results. If omitted, headers are not shown.
- 0 – No headers
- 1 – Yes, but don’t show headers
- 2 – No headers, but generate them
- 3 – Yes, and show headers
- Total_depth [optional] – determines whether to show totals and subtotals. For subtotals, the row_fields argument must have at least 2 columns.
- 0 – No totals
- 1 (default) – Grand totals at the bottom
- 2 – Grand totals and subtotals at the bottom
- -1 – Grand totals at the top
- -2 – Grand totals and subtotals at the top
- Sort_order [optional] – sorting by column index number. Numbers correspond to columns in row_fields left-to-right, followed by columns in values. For ascending sorting use positive numbers, for descending sorting – negative numbers. To sort based on multiple row_fields, a one-dimensional array of numbers can be provided. Default is ascending sorting.
- Filter_array [optional] – filtering out specific rows. Can be provided as a logical expression that produces a 1D array of Boolean values matching the length of the row_fields array.
Usage notes
To help you get up to speed with the GROUPBY function and avoid common pitfalls, here are some handy tips:
- Dynamic formula behavior. The formula is dynamic and will update automatically if you make changes within the current dataset. However, if you add new rows to the original dataset, the formula won’t update unless you include additional currently empty rows in the row_fields and values arguments. Alternatively, you can convert your dataset into an Excel table. In this case, the formula will be auto-expandable and fully dynamic.
- Headers. If the field_headers argument is not set, Excel automatically determines whether your dataset has headers based on the values argument. Specifically, if the 1st value is text and the 2nd value is a number, Excel interprets the first row as headers.
- Consistent range lengths. It’s crucial to supply ranges of equal length for the row_fields and values arguments to prevent a #VALUE! error.
- Auto-formatting results. The GROUPBY output can be difficult to interpret, especially with grand totals and subtotals included. Conditional formatting can help visually distinguish different levels of aggregation and make your data more readable.
Note. The Excel GROUPBY function is still under development. Therefore, it’s not recommended to rely on it for critical workbooks just yet. Make sure to test thoroughly and keep on top of the latest Excel updates to ensure compatibility and functionality.
Which Excel version has GROUPBY function?
The GROUPBY is only available in Excel for Microsoft 365 (currently in the Insider: Beta Channel).
Basic Excel GROUPBY formula
To get started with the GROUPBY function, we’ll keep things simple in this example. Suppose you have a dataset like the one shown below where column A contains project names, column B – project types, and column C – revenues. Your goal is to summarize the revenues by project type. For this, you supply the project types (B2:B32) for the row_fields argument and revenues (C2:C32) for values. For the aggregation function, you choose SUM from the list of available functions.
The complete formula would look like this:
=GROUPBY(B2:B32, C2:C32, SUM)
As a result, you’ll get a list of unique project types where each type is listed only once, along with the sum of revenues for each project type.
Since the optional arguments are not set, Excel uses their default settings – no headers, ascending sort, and no filtering. However, notice that it has added a total row at the bottom, which is the default behavior for the total_depth argument.
This simple example demonstrates the GROUPBY function’s ability to efficiently group and summarize data, providing a succinct overview without the need for complex formulas or additional tools. However, it lacks some handy features such as column headers and formatting, which makes it a little less user-friendly at first glance. Further examples will show how to construct more advanced formulas and employ conditional formats to make the data presentation more intuitive and engaging.
How to use GROUPBY function in Excel – formula examples
Now that you’re familiar with the basics of the GROUPBY function, let’s dive into some more advanced examples that will showcase its versatility and power. We’ll explore how to choose an appropriate aggregation function and incorporate additional arguments like field_headers for clarity, total_depth for depth of analysis, and sort_order for optimal data presentation.
Choose aggregation function
Currently, the Excel GROUPBY function lets you choose from 16 predefined aggregation operations:
Here you’ve got classic summary functions such as SUM, AVERAGE, MEDIAN, MIN, MAX, COUNT, etc.
Also, there are a couple of brand-new functions such as:
- PERCENTOF – calculates the percentage of a value compared to the total.
- ARRAYTOTEXT – converts an array of values into a text representation.
The practical use of these two functions will be demonstrated in further examples.
Eta Lambdas
Interestingly, all the functions in the drop-down list only resemble regular worksheet functions; in fact, they are eta-reduced lambdas. This design allows you to use these functions without parentheses and arguments, which might seem a bit unusual at first. However, this approach simplifies the process. Instead of writing a full form such as LAMBDA(x, SUM(x)), you can simply put SUM.
Besides the built-in aggregation functions, you can use your own lambda function as described in this tutorial:Â Using custom lambdas with GROUPBY function.
Multiple aggregations
To perform multiple aggregations, a vector of lambdas can be provided. The orientation of the vector (vertical or horizontal) will determine how the aggregations are applied and displayed. A vertical vector will stack the results in columns, while a horizontal vector will arrange the results in rows. For more details, see this example.
Show headers
To include field headers in your results, use the field_headers argument of the GROUPBY function.
- If your dataset has column headers, set this argument to 3 (Yes, show).
- If your dataset does not have headers, you can opt for 2 (No, but generate). Though, this might not be very practical as the generated headers usually end up as generic labels like Row Field 1, Value 1, and so on.
For instance, this is how you can modify our basic formula to include headers:
=GROUPBY(B2:B32, C2:C32, SUM, 3)
By doing this, your output will now include field headers, enhancing the readability and clarity of your grouped data. It’s a simple tweak that can have a substantial impact on how you and others interpret the results.
Group rows based on multiple columns
To group rows by multiple columns, you can include a multi-column range in the row_fields argument.
For example, to group rows first by project type (column B) and then by status (column C), you would use the range B2:C32 for the row_fields argument.
To find out how many projects there are in each group, choose COUNT for the function argument. As the COUNT function counts only numeric values, be sure to define a numeric column for the values argument – the Revenue column (D2:D32) suits nicely.
Here’s how the finalized formula looks:
=GROUPBY(B2:C32, D2:D32, COUNT)
With this formula, you’ll get the data rows grouped by the values in two columns (project type and status), along with the project count for each group.
Pay attention that, in this example, we do not show headings automatically with a formula. Instead, we have manually entered the appropriate column headers in cells F2:H2.
This method is handy when you’re analyzing information across multiple categories and want a detailed drill down into your data.
Group rows based on non-adjacent columns
In large datasets, the key columns may not always be adjacent. To include all the desired columns in multi-level grouping, you can nest the CHOOSECOLS function within the row_fields argument of GROUPBY.
Here’s how it works: in the first parameter of CHOOSECOLS, you include the entire data array (A2:D32 in this example). For subsequent arguments, supply integers specifying which columns to include – in our case, it’s 2 (project type) and 4 (status):
CHOOSECOLS(A2:D32, 2, 4)
Nest the above formula in GROUPBY, and you’ll get this result:
=GROUPBY(CHOOSECOLS(A2:D32, 2, 4), C2:C32, COUNT)
This approach allows you to group data based on non-contiguous columns effectively, no matter how your data is structured.
Include or exclude grand total and subtotals
The inclusion of totals and subtotals in the output of the GROUPBY function is controlled by the total_depth argument. Here’s a quick recap of what each setting does:
- 0 – No totals are shown.
- 1 (default) – Displays the grand total at the bottom.
- 2 – Shows both the grand total at the bottom and subtotals below each group.
- -1 – Places the grand total at the top.
- -2 – Puts both the grand total at the top and subtotals above each group.
For instance, if you want to display grand totals at the bottom and subtotals under each data category, then set total_depth to 2:
=GROUPBY(B2:C32, D2:D32, SUM, 3, 2)
Tip. In the example screenshot, subtotals are highlighted in light green for better visualization. In your actual worksheets, you can apply your preferred formatting automatically using Excel’s Conditional Formatting features.
Sort grouped rows
By default, a GROUPBY formula sorts the results in ascending order from A to Z based on the row_fields values. When grouping by multiple columns, each column is sorted ascending beginning with the first (rightmost) one.
To sort grouped rows differently, you can customize the order with the sort_order argument:
- Ascending sort – use positive column numbers.
- Descending sort –Â use negative column numbers.
- Sorting by multiple columns – supply a one-dimensional array of numbers corresponding to columns in row_fields followed by the columns in values. Different columns can be sorted in a different order.
For instance, to sort the results by values instead of row_fields, you’d use either 2 (sort from lowest to highest) or -2 (sort from highest to lowest) for the 6th argument:
=GROUPBY(B2:B32, C2:C32, SUM, , , 2)
To perform multiple column sort, first by Project type (Z to A), and then by Status (A to Z), use the array {-1, 2} for the sort_order argument:
=GROUPBY(B2:C32, D2:D32, SUM, , , {-1,2})
These flexible sorting options help you present information in the most meaningful way possible, making it easier to navigate through your grouped data and find the information you need.
Filter out specific rows
To filter out any specific results, you can utilize the 7th argument named filter_array. This argument accepts a one-dimensional array of Boolean values matching the length of the row_fields array, where TRUE indicates the rows to include and FALSE indicates the rows to exclude.
For example, to exclude the Design projects from the output, you can use the “not equal to” logical expression B2:B32<>”Design”:
=GROUPBY(B2:B32, C2:C32, SUM, , , , B2:B32<>"Design")
As you can see, our GROUPBY formula effectively filters out rows based on the condition specified – any row where a value in column B is “Design” is excluded from the grouped results.
Aggregate multiple columns
When you need to summarize values across multiple numeric columns in your dataset, the GROUPBY function can be a real lifesaver. It lets you see how different data points relate to each other in a clear and organized way.
Aggregating adjacent columns
To aggregate adjacent columns, you just need to provide the appropriate range reference for the values argument.
For example, to get the average revenue and profit by project type, the formula is:
=GROUPBY(B2:B32, D2:E32, AVERAGE, 3)
Notice that we’re using D2:E32 for the values argument, which tells the GROUPBY function to aggregate the values in columns D (Revenue) and E (Profit).
Summarizing non-adjacent columns
If you’re dealing with non-adjacent columns and need to aggregate those, the CHOOSECOLS function comes in handy, just like it does for grouping.
Say, you want to find the average of Budget (column C) and Profit (column E). To get GROUPBY to calculate values in these columns, you serve CHOOSECOLS(C2:E32, 1, 3) for the values argument. This tells Excel to take columns 1 and 3 from the range C2:E32:
=GROUPBY(B2:B32, CHOOSECOLS(C2:E32, 1, 3), AVERAGE, 3)
The result will be a neatly organized set of data that shows the average of Budget and Profit by project type:
Perform multiple aggregations on the same data at once
If you’re looking to analyze the same metric with different aggregations, you can, of course, create a separate GROUPBY formula for each summary function. However, a more convenient approach is to place those aggregations in neighboring columns or rows, so that you can view all relevant summaries side by side. This can be achieved with the help of HSTACK or VSTACK, depending on whether you want the results arranged vertically or horizontally.
For example, to get the sum, average and percentage of total for Revenue and place the results across columns, use the HSTACK function:
=GROUPBY(B3:B32, C3:C32, HSTACK(SUM, AVERAGE, PERCENTOF))
To find the sum, average, min and max value for each project type and place the results across rows, use the VSTACK function:
=GROUPBY(B3:B32, C3:C32, VSTACK(SUM, AVERAGE, MIN, MAX),, 0)
This method allows you to create a side-by-side comparison of different aggregations for the same data, providing a multi-faceted view of your dataset at a glance. Such comprehensive snapshots are invaluable for financial or sales reports, helping you detect trends, measure performance against goals, and make informed business decisions.
Group comma-separated text values
The example shows how you can leverage the ARRAYTOTEXT function to turn an array of values into comma-separated text strings.
From the dataset below, suppose you wish to get a list of project names that each type encompasses. To achieve this, you use B3:B23 (project types) for row_fields and A3:A23 (project names) for the values argument. For function, you choose ARRAYTOTEXT. As you don’t want either headers or totals, the next two arguments are set to 0:
=GROUPBY(B3:B23, A3:A23, ARRAYTOTEXT, 0, 0)
As a result, you’ll get a clear, concise text representation of the projects grouped by their types, making it easy to see which projects fall under each category. This can be incredibly useful for reporting, or simply getting a better grasp of the data distribution within your dataset.
Format GROUPBY results with conditional formatting
As you have seen, the GROUPBY function can easily group and aggregate data, but the results may seem plain and unstructured due to a lack of formatting. For better readability and presentation, you may want to make certain elements stand out, such as headers, subtotals and totals. This can be done using Excel conditional formatting formulas.
The screenshot below provides an example of how the GROUPBY output may appear in your worksheets:
For more insights, keep an eye out for the upcoming article that will detail this approach:Â Conditionally format GROUPBY results.
Excel GROUPBY function not working
If you cannot find the GROUPBY function in your worksheets or your formula results in an error, here are some common issues to help you troubleshoot.
GROUPBY function not available in my Excel
Currently, the GROUPBY function is a beta feature available only to Office Insiders in the Beta channel. But Microsoft is gradually rolling it out to all Microsoft 365 users. So, if you have a Microsoft 365 subscription, it’s a good idea to stay updated with the latest releases.
However, it’s important to note that GROUPBY is not backward compatible and won’t be available in earlier versions of Excel.
#VALUE! error
If you’re encountering a #VALUE! error, it’s likely because the row_fields and values arguments refer to ranges of different lengths. For instance, if the former contains 10 rows and the latter has 11, you’ll get a #VALUE! error. Similarly, if the filter_array does not match the length of the row_fields array, it will also result in an error.
Another possible reason is that your formula is configured to display subtotals (total_depth set to 2 or -2) while you’re only grouping by a single column (row_fields has just one column).
Views: 8
No responses yet