SQL Date_Bucket
As promissed I would get back at you with the new SQL 2022 Functions once I had found a good use for it.
Well, date_bucket() is such a Function that was not real clear to me at first glance.
Not one that was specifically on my Bucketlist, so to say.
So in this post, let's see what's in the bucket, and what we can get out of it.
It is a Function that takes a bit to master, but once you get the hang of it, it is pretty useful.
Basic Syntax
Required Arguments
To understand the date_bucket() Function, you need to understand the basic Syntax first.
The date_bucket() Function requires 3 arguments.
First one is the Bucket wich is like a Timeslot, which could be a Day, Month, Hour et cetera like you would use in a dateadd() Function.
Better said, Buckets because if you do a Query on a Table, each Date should end up in the corresponding Bucket.
Hang on, this will be clear in a moment.
The second one it the width of this so called Bucket.
Together with the first argument you can specify you want a Timeslot of 4 Hours, 3 Months or 13 Weeks.
The third one is the Date you want to be placed in that Bucket.
Together this will return the Startdate of the Bucket you wanted.
Here is an example:
Now this will give you the first day of the current month.
Nice, but not really impressive.
We already had other ways to get to that Date like with DateFromParts.
What you need to understand is WHY it returns the first day of the month.
In this case, you specified Buckets for 1 Month.
As I am writing this Post in January 2024, the Bucket ranges from January 1st to January 31.
And since date_bucket() returns the Startdate of the Bucket, it returns January 1st for the current Date.
Optional Argument
It might become a bit more clear when we talk about the fourth, optional Argument.
That one is the Origin date.
If you leave that one out (like we did just now) it defaults back to 1900-01-01.
But let's specify the Origin to be tomorrow:
At first the result might seem like a random Date in the past.
So let's have a closer look at what we asked for.
Again we wanted to have Buckets for 1 Month but as the Origin we specified tomorrow.
So today does not go into that Bucket.
It has to go in the previous Bucket that started on tomorrow minus 1 month!
That is why you got that date in the past.
A practical example
For this example we are going to create a dummy Table with WebSales. In this Table we will insert some dummy data, a WebSale every few (random) minnutes with a random amount:
ID int identity,
SalesDate datetime,
SalesAmount float
)
GO
declare @TheDateTime datetime = '2024-01-01'
declare @Interval int
while @TheDateTime < '2024-07-01'
begin
select @Interval = abs(checksum(newid())) % 60
set @TheDateTime = dateadd(minute, @Interval * 5, @TheDateTime)
insert into WebSales(SalesDate, SalesAmount)
values(@TheDateTime, @Interval * 7.5)
end
Now let's say we want to show the Sales per Month. Traditionally, we could do it like this:
datepart(Month, SalesDate) as Month,
count(*) as NrOfSales,
sum(SalesAmount) as TotalSales
from WebSales
group by datepart(Year, SalesDate), datepart(Month, SalesDate)
order by datepart(Year, SalesDate), datepart(Month, SalesDate)
Using date_bucket() we could now do it like this:
count(*) as NrOfSales,
sum(SalesAmount) as TotalSales
from WebSales
group by date_bucket(Month, 1, SalesDate)
order by date_bucket(Month, 1, SalesDate)
Looks nice, and it is a bit less code to write so easier to maintain.
But is that why we really needed this?
Actually... Yes!
Imagine we wanted to list the Sales per Week.
In the traditional way, you had to add datepart(Week, SalesDate) several times to the Query.
With the new date_bucket() we only need to switch from Month to Week:
dateadd(Day, 6, date_bucket(Week, 1, SalesDate)) as ToDate,
count(*) as NrOfSales,
sum(SalesAmount) as TotalSales
from WebSales
group by date_bucket(Week, 1, SalesDate)
order by date_bucket(Week, 1, SalesDate)
That was easy, huh?
Now let's say we don't want to use the traditional Week numbers.
Instead we want to go for Weeks from Saturday to Saturday.
For that we only need to specify a Saturday as the origin.
For more clarity in the results I added the ToDate as to show the start and end of the Bucket:
select date_bucket(Week, 1, SalesDate, @FirstDate) as FromDate,
dateadd(Day, 6, date_bucket(Week, 1, SalesDate, @FirstDate)) as ToDate,
count(*) as NrOfSales,
sum(SalesAmount) as TotalSales
from WebSales
group by date_bucket(Week, 1, SalesDate, @FirstDate)
order by date_bucket(Week, 1, SalesDate, @FirstDate)
Once I had a Customer who was in the Retail business.
Typically, their best day of the week was Saturday which accounted for about 50% of the weekly Revenue.
Now comparing Months or Quarters might gave them the wrong impression, since some Months have more Saturdays than others.
An increase or decrease in Revenue could mean it was a good or a bad Month.
But it could also simply mean the Month had more or less Saturdays.
Instead of comparing let's say Quarters, with date_bucket() we can now easily go to comparing 13 weeks:
select date_bucket(Week, 13, SalesDate, @FirstDate) as FromDate,
dateadd(Day, 90, date_bucket(Week, 13, SalesDate, @FirstDate)) as ToDate,
count(*) as NrOfSales,
sum(SalesAmount) as TotalSales
from WebSales
group by date_bucket(Week, 13, SalesDate, @FirstDate)
order by date_bucket(Week, 13, SalesDate, @FirstDate)
Or if you have a 24/7 Production Facility that runs 3x an 8 Hour shift with the first shift starting at 09:00.
Do you want to see the results per shift?
It is a breeze with the date_bucket():
select date_bucket(Hour, 8, SalesDate, @FirstDate) as FromDate,
dateadd(Hour, 8, date_bucket(Hour, 8, SalesDate, @FirstDate)) as ToDate,
count(*) as NrOfSales,
sum(SalesAmount) as TotalSales
from WebSales
group by date_bucket(Hour, 8, SalesDate, @FirstDate)
order by date_bucket(Hour, 8, SalesDate, @FirstDate)
There are just so much possibilities when you understand how the date_bucket() works. It might seem confusing when you use it for a simple Select statement with just a getdate(). But once you have seen it in action on a DataSet, I'm sure the usefulness becomes pretty obvious!
date_bucket() and generate_series()
Remember the generate_series() from the What's new in SQL 2022 post? It goes like this:
from generate_series(0, datediff(day, '2024-01-01', '2024-06-30'))
Combined with date_bucket() you can now also generate a serie of Timeslots, for example 4 Hour Slots:
declare @EndDate datetime = '2024-07-01'
declare @TimeSlot int = 4
select date_bucket(Hour, @TimeSlot, dateadd(Hour, gs.value, @StartDate)) as SalesHour
from generate_series
(0, datediff(Hour, @StartDate, @EndDate)) as gs
group by date_bucket(Hour, @TimeSlot, dateadd(Hour, gs.value, @StartDate))
Do not forget to use the Group By at the end, otherwise you will have (in this case 4) duplicate results.
That would cause problems if you want to join this series later on to another Table.
Speaking of which,,,
Now let's bring all of this together in a Query that will have Buckets of 8 Hours.
For each Bucket I will display the Sales Amount for that Rimeslot.
An I will add a Running Total for each Day and Month.
For the last two I will use the Window Clause from the What's new in SQL 2022 post.
declare @EndDate datetime = '2024-07-01'
declare @TimeSlot int = 8
;with TimeSlots as
(
-- Generate Series using Date Bucket
select
date_bucket(Hour, @TimeSlot, dateadd(Hour, gs.value, @StartDate)) as SalesHour
from generate_series
(0, datediff(Hour, @StartDate, @EndDate)) as gs
group by
date_bucket(Hour, @TimeSlot,
dateadd(Hour, gs.value, @StartDate))
)
select
ts.SalesHour as SlotStart,
dateadd(Hour, @TimeSlot, ts.SalesHour) as SlotEnd,
sum(isnull(ws.SalesAmount, 0)) as SalesAmount,
row_number() over DailyWindow as Shift,
sum(sum(isnull(ws.SalesAmount, 0)))
over DailyWindow as DailyTotal,
sum(sum(isnull(ws.SalesAmount, 0)))
over MonthlyWindow as MonthlyTotal
from TimeSlots ts
-- Join Generated Series with the WebSales Table
left join WebSales ws
on ts.SalesHour = date_bucket(Hour, @TimeSlot, ws.SalesDate)
group by ts.SalesHour
-- Use Window Clause for Named Windows
window
DailyWindow as
(
partition by
datepart(Year, ts.SalesHour),
datepart(Month, ts.SalesHour),
datepart(Day, ts.SalesHour)
order by ts.SalesHour
),
MonthlyWindow as
(
partition by
datepart(Year, ts.SalesHour),
datepart(Month, ts.SalesHour)
order by ts.SalesHour
)
order by ts.SalesHour
Now if you would change the @TimeSlot from 8 to 1, you will see that there are some Hours when there was no Sale at all.
Without the generate_series() those records would not be shown.
And using that, you can now easily identify the best WeekDays or Hours for Sales.
This opens up a whole new world of possibilities.
Not bad for a new SQL 2022 Function that seemed a bit useless at first glance, huh?
That's it for today!
As you can see, the date_bucket() is a bit difficult to understand at first.
But it is a very usefull adition in SQL Server 2022.
That is of course also its limitation: it is SQL Server 2022 only.
Most of my Customers are not yet ready to jump over from their current SQL Server version.
Better performance and better stability are good reasons to consider upgrading.
And each new Function might by itself not a game changer, toghether they provide more food for thought.
And if you run into the new SQL 2022 in the wild, be sure to make use of these new Functions.
Until then, get the new SQL 2022 up and running on your local machine and play around with them.
It is only a matter of time, so be prepared!
You can leave a comment or send a question to andre@andrespeek.com. Let me know if you got inspired and stay safe and healthy always!