SQL Implicit Conversion

This BlogPost was written by Andre Speek.

As a Database Consultant, many of the questions I get are about performance. Or rather: about queries that lack performance.

Of course there are many reasons why a query can run slow. But there is one issue I see regularly, and it is easy to avoid. In this post we will have a look at it, and I'll explain why it is bad for performance.

What is an Implicit Conversion?

Let's talk Explicit Conversion first

In a SQL Database, each Column has a specific DataType like a Varchar (string), Integer, DateTime or Float. If you want to have a different DataType in your results, you can change it in your query using the Cast or Convert Functions.

For example, when you have an Integer in your Table and you want to put some text in front of it, you should Cast it as a Varchar DataType:

select 'Option ' + cast(1 as nvarchar(5))

If you would not do the Cast, the query would fail because you cannot add a number to a string.
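To see why, try the same statement without the Cast (a small sketch, not from the original post). The Integer DataType has a higher precedence than Varchar, so SQL Server tries to convert the text to a number and fails:

```sql
select 'Option ' + 1
-- Fails with an error like:
-- Conversion failed when converting the varchar value 'Option '
-- to data type int.
```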

By the way, you could do this one more easily by using the Concat Function:

select concat('Option ', 1)

The Concat Function knows that it should output a string so it will automatically do the casting for you. But that is a different topic, so back to the topic at hand.

When you use a Cast, Convert or Concat Function, you change the DataType intentionally. This is what is called an Explicit Conversion.

So what is an Implicit Conversion?

Implicit Conversions happen in the conditions (the Constraints) of your Query, when the value you supply has a different DataType than the Column. Now in many cases you will get an error when you supply the wrong DataType. For example, you cannot search on a DateTime Column with the value 'ABC'.

That is a no-brainer, but sometimes SQL Server can solve the error by changing the DataType. For example, when you search on a Varchar Column with the value 1 instead of '1'. Without the quotes, 1 is an Integer and not a Varchar.

Now rather than throw an error at you, SQL Server plays nice and resolves the difference without asking. And because the Integer DataType has a higher precedence than Varchar, it converts the Column values to an Integer, not your Integer to a Varchar. This seems like a nice thing to do, but this is where bad things might happen.
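You can see this in action with a small sketch (not one of the original examples). The Varchar side is silently converted to an Integer before the comparison is done:

```sql
-- The string '0700' is implicitly converted to the Integer 700,
-- so this comparison is true and the query returns a row:
select 'They match' where '0700' = 700
```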

Why can an Implicit Conversion be bad?

First of all, SQL Server needs to check if it can do a conversion of the DataType. If that is possible, it will do so. This is an extra step but in most cases you will hardly notice it. That in itself does not slow down your query significantly.

Bad things will happen though if this happens on a Column that is part of an Index.

If you want SQL Server to use the Index, it is important that you supply the correct DataType. If you don't, it might not seem like a problem because SQL Server can find the results you want without that Index. But even though your query will give you the results you want, you just effectively killed the performance.

I have oftentimes seen queries from people who overlooked this issue because they wrote the query on a small Database with just some test data. With little data, the drop in performance is hardly noticeable. But then they ran the same query on a Production Database and could not understand why it was running so slow.

Let's look at an example

For this example we will create a Table with some dummy General Ledger Entries. Yeah, I know these are not correct or complete Entries, but it is just an example, and making it 100% correct would only distract from what is happening with the Implicit Conversion.

So let's create a Table for it with this one:

create table GLEntries (
  GLEntryID uniqueidentifier
    not null
    default newid()
    primary key clustered,
  EntryNumber nvarchar(15),
  GLAccountNumber nvarchar(10),
  EntryDate datetime,
  Amount float
)

And then fill that Table with some Test Data, in this case 20,000 rows:

declare @Counter int = 10000
declare @EntryDate datetime = getdate()
while @Counter > 0
begin
  insert into GLEntries (EntryNumber, GLAccountNumber, EntryDate, Amount)
  values (
    concat(format(@EntryDate, 'yyyyMMdd'), @Counter),
    ' 8000',
    @EntryDate,
    round(rand() * 1000, 2)  -- a random Amount, just for the Test Data
  ), (
    concat(format(@EntryDate, 'yyyyMMdd'), @Counter),
    ' 1300',
    @EntryDate,
    round(rand() * 1000, 2)
  )
  set @Counter = @Counter - 1
  set @EntryDate = dateadd(minute, -7, @EntryDate)
end

Now let's do some easy queries on this Test Data. To have a better understanding of what is happening, I set the IO statistics on to see what SQL Server is reading from disk. And to see how SQL Server gets the results, I switched the Actual Execution Plan on with the button on top.

Now in the first Query, I search on the Account Number 8000. Since the Column is an NVarchar DataType, SQL Server will do an Implicit Conversion in the background. I avoid that in the second Query by specifically looking for the actual Value in the correct DataType:

set statistics IO on

select Amount
from GLEntries
where GLAccountNumber = 8000

select Amount
from GLEntries
where GLAccountNumber = ' 8000'

Now if you run these Queries, you will not see any difference in performance. And if you look at the Actual Execution Plan, they are exactly the same: without an Index on the Column, both Queries have to do a Clustered Index Scan.

As I told you, an Implicit Conversion is not always a bad thing. But it will be once your Column is part of an Index.

So let's add an Index to our Table:

create index IX_MainColumns
  on GLEntries
  (GLAccountNumber, EntryNumber, EntryDate, Amount)

Now we can run the Queries again and have a look at the Actual Execution Plan to see what has changed in performance.

There you go! By not using the correct DataType you just doubled the time needed to get the results!

This is why you should always check the Actual Execution Plan. The difference might hardly be visible on such a small Data Set. But the Actual Execution Plan shows you what to expect when you run that same Query on a large Production Database.

But wait! There is more!

As you can see, with the first Query SQL Server has to do an Index Scan. It goes through all the Data and filters out the rows you want.

With the second Query, SQL Server can do an Index Seek which means SQL Server can just get the rows you want. This means SQL Server does not have to read through all the Data, hence the better performance.
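You can also spot the Implicit Conversion itself in the plan. If you hover over the Index Scan, its Predicate shows a CONVERT_IMPLICIT on the Column, something like the sketch below (the exact text may differ per version):

```sql
-- Predicate of the Index Scan (first Query): the Column itself is
-- converted, so the Index order on GLAccountNumber is useless:
--   CONVERT_IMPLICIT(int, [GLEntries].[GLAccountNumber], 0) = (8000)

-- Seek Predicate of the Index Seek (second Query): the Column is
-- untouched, so SQL Server can navigate the Index directly:
--   [GLEntries].[GLAccountNumber] = N' 8000'
```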

And since we set the IO Statistics on, we can verify this in the Messages tab.

As you can see, SQL Server had to do only half the reads from Disk with the second Query. So can we say for sure the second Query is twice as fast as the first one?

I hate to tell you, but it is not that simple. With the Index Seek, SQL Server needs to read far less data, in this case half of it. But that is because we only used 2 GLAccountNumbers with an equal number of rows each. The real difference on Production will vary depending on the number of GLAccountNumbers that are used and the number of rows per GLAccountNumber. It will also depend on how much SQL Server can keep in Internal Memory and on the speed of the actual Disks in that Server.

But can we conclude the second Query will perform much faster than the first one? Definitely yes!

Just by switching to the correct DataType, I have managed to make queries run much faster. Typically about 5 to 10 times faster, but occasionally more than 100 times faster. The most extreme case was a complex Query that went from 17 minutes down to just 2 seconds. So yes, this is definitely one you should check when a Query is running slow.

Why are Numbers not always Numbers?

There are a few reasons people often overlook these Implicit Conversions. First reason is that it just works. No errors at all. And on a small Test Database it goes unnoticed.

The second reason is that Column Names throw them off. In my example of GLAccountNumber, the name suggests the Column contains a number, and that would imply it is an Integer.

So why is that Column a Varchar instead of an Integer?

The fact is that many of what people call numbers aren't actually numbers at all. Take for example this phonenumber: +31 (0)6-123 45 678. It is a valid phonenumber, but an Integer cannot contain the +, the brackets, the dash in that position, or the spaces. We call it a phonenumber, but it should actually be called a phonestring.

The same goes for bank account numbers, VAT numbers, or Chamber of Commerce numbers, just to name a few. All of them are numbers that can contain special characters and spaces, so not an Integer DataType.

And I picked the GL Account Number as an example because here it is less obvious. But the fact is that people in Finance prefer to keep the same length for all Account Numbers. Here I used 1300 and 8000, so a length of 4 positions. For numbers below 1000, they would not use for example 700, but instead 0700.

The leading 0 would disappear if you would store it in the Database as an Integer. Don't believe me? Check it out with this one:

select cast('0700' as int)
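And if you need those leading zeros back, you have to turn the Integer into a string again anyway, which shows the value really belongs in a Varchar. A small sketch:

```sql
select cast('0700' as int)   -- returns 700: the leading zero is gone
select format(700, '0000')   -- returns '0700', but as a string again
```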

That's it for Today!

Changing the DataType in the Where conditions (the constraints) of a Query is something you should always avoid if possible. Especially on Columns that are part of an Index.
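Sometimes you cannot control the DataType of the value you receive, for example a parameter coming from an application. In that case, as a general pattern (a sketch with a made-up Column Name, not tied to the Table above): Convert the value, never the Column:

```sql
declare @SuppliedValue int = 8000

-- Bad: the Varchar Column gets an Implicit Conversion on every row,
-- so an Index on SomeVarcharColumn cannot be used for a Seek:
--   where SomeVarcharColumn = @SuppliedValue

-- Good: explicitly Cast the value once; the Column stays untouched
-- and the Index can be used:
--   where SomeVarcharColumn = cast(@SuppliedValue as nvarchar(10))
```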

But even if you have avoided that, SQL Server might still do it for you. Not because you asked for it, but because you threw it off by supplying the wrong DataType.

So now that you know why that is, and that the impact on performance can be dramatic, you should avoid that too.

If not, well... then you did ask for it! 😁


You can leave a comment or send a question to andre@andrespeek.com. Let me know if you got inspired and stay safe and healthy always!
