SQL Lag & Lead
Let's say you ran a query on a table containing appointments and you ordered them by the appointment date. In the result set, you can easily see the date of the previous appointment because it is on the previous line. But what if you want that previous appointment date on the same line? You could do that with a sub-query, but that would really affect the performace. Wouldn't it be nice if you could just grab it from the previous record from the result set? Well, you can by using Lag and Lead.
How to use Lag and Lead
As David Post pointed out after publishing this post,
let me point out that these Window Functions work from SQL 2012 and up.
So if you can't get them to work, do check the version of your SQL Server first.
With that said, using the Lag function you can grab any column you want from the previous record.
Same with the Lead function but this takes it from the next record.
Straight from the result set, so you don't have to query the same table over and over again.
So basicaly using Lag and Lead looks really simple:
or
Lead(StartDate) // StartDate from next record
Is it really that simple?
No, there is a bit more to it. You must specify an order by so SQL knows what the previous and next record should be. You can do that using the keyword "over". So if you want the previous and next appointment date, you should order by the StartDate:
Lag(StartDate) over (order by StartDate) as PreviousAppointment,
Lead(StartDate) over (order by StartDate) as NextAppointment
from tblAppointments
order by StartDate
This one gives the following results:
As you can see the first record has no PreviousAppointment, which makes sense because there was no previous appointment.
Likewise the last record has no NextAppointment.
But wait! There's more!
You don't have to go back to exactly the previous record or forward to exactly the next record.
For example, the appointments table could also store the person linked to the appointment.
And maybe you don't want to ga back to the previous record, but the previous record from that person.
To do that, the "over" statement should not just have an order by.
It should also have a partition by which specifies which columns should be used.
So for this example, it would look like this:
Lag(StartDate) over (partition by a.PersonID order by StartDate) as PreviousAppointment,
Lead(StartDate) over (partition by a.PersonID order by StartDate) as NextAppointment
from Appointments a
inner join Persons p on p.PersonID = a.PersonID
order by StartDate
And this one will give these results:
Since there are 2 persons in the result set, there are 2 records that don't have a PreviousAppointment.
And 2 records without a NextAppointment.
One thing to be aware of though
As explained the values are pulled from the result set. Hence it is not a performance killer because there is no need for additional querying the database.
But that also means Lag and Lead cannot be used in the constraints, the order by or group by conditions for the query.
Those are used to get the result set, so pulling things from the result set is not possible since the result set is not there yet.
Of course, you can create a view with Lag and Lead and then query that view with constraints, order by or group by as much as you like.
And finally some tips and tricks
In the Lag() you can only specify one column. Or better said, on result value. You can concat multiple columns, or use functions like datediff.
I am not sure why you would do that, but hey, now you know it possible when you run into a case for this.
More interesting is that you can use them for functions in your query results.
For example, if you would like to show the number of days between the previous and next appointment.
Lag(StartDate) over (partition by a.PersonID order by StartDate) as PreviousAppointment,
datediff(Day, Lag(StartDate) over (partition by a.PersonID order by StartDate), StartDate) as DaysSincePrevious,
Lead(StartDate) over (partition by a.PersonID order by StartDate) as NextAppointment,
datediff(Day, StartDate, Lead(StartDate) over (partition by a.PersonID order by StartDate)) as DaysToNext
from Appointments a
inner join Persons p on p.PersonID = a.PersonID
order by StartDate
So is this really faster?
You betcha! To wrap this up I rewrote the last query using sub-queries to get the previous and next startdate. And then ran the execution plan for both at the same time.
- Lag and Lead took 5% of the total time.
- Sub-queries took the other 95%.
You don't need to be a math genius to decide which one is faster!
Enjoy this trick!
Using Lag and Lead is easy but nevertheless gives you some great options. Without killing the performance of your queries. So enjoy these tricks and let me know how you used them in your projects!
You can leave a comment or send a question to andre@andrespeek.com. Let me know if you got inspired and stay safe and healthy always!