SQL Varchar Length

This BlogPost was written by Andre Speek.

Someone asked me a question about the ideal length for Varchar Columns in SQL. He had read that for disk space it does not matter whether you use varchar(10) or varchar(1000): SQL Server only stores the value as is, so the disk space used is exactly the same. The logical conclusion would be that using varchar(8000) for each and every column is a smart way to avoid truncation errors.

And while what he read is absolutely true and the conclusion seems to make sense, there are several reasons not to do this. Here is why.

Varchar(10) versus Varchar(8000)

So let's see how this works with a simple example. For testing purposes, you can create this table in a test database first:

create table VarCharTests (
  ID int identity primary key clustered,
  Code varchar(10),
  LongCode varchar(8000),
  Description varchar(250)
)

As you can see I used a Column named Code, which is a varchar(10). That should be sufficient in most cases. But you never know, so I added a Column LongCode and made that a varchar(8000). No way someone will come up with a Code that long.

Now if someone were to add a Code with 11 characters, the Code Column would protest and tell you it cannot insert that value because the data would be truncated. With the LongCode Column, that would never be a problem. And since the disk space needed is the same, it looks like his logical conclusion was right!
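
You can see this for yourself. A quick sketch, assuming the VarCharTests table from above; the same 11-character value is rejected by the Code Column but accepted by LongCode:

-- Fails with "String or binary data would be truncated":
-- 11 characters do not fit in varchar(10)
insert into VarCharTests (Code) values ('ABCDEFGHIJK')

-- Succeeds without complaint: varchar(8000) has room to spare
insert into VarCharTests (LongCode) values ('ABCDEFGHIJK')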

But do you want this?

Typically some input validation should be done in the application. But that is something the developer has to add explicitly; it is not there by default.

Now when a developer forgets to add that specific validation, you could end up with messy data in your database. For example, a user might accidentally type the Description (which can be 250 characters) into the Code field. Without validation in the application, it would fall through without anyone noticing.
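
A minimal sketch of that scenario, using replicate to fake a 250-character value ending up in the wrong field:

-- A 250-character "Description" pasted into the wrong Column:
-- LongCode swallows it silently, while Code would have raised an error
insert into VarCharTests (LongCode) values (replicate('X', 250))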

That is why I prefer to use specific lengths for Varchar Columns. I prefer an ugly error message over messy or even corrupted data.

Let's add some indexes

Ok, so we have our simple Table to do some testing. Let's add some indexes for better performance, shall we? Here is how you can add them:

create index IX_Code on VarCharTests (Code)
create index IX_LongCode on VarCharTests (LongCode)

First one goes without any problem. But we get a warning on the second one:

Warning! The maximum key length for a nonclustered index is 1700 bytes.
The index 'IX_LongCode' has maximum length of 8000 bytes.
For some combination of large values, the insert/update operation will fail.


Now this tells you that you can store 8000 characters in the Column itself, but the maximum key length for the index is 1700 bytes. A LongCode of up to 1700 characters can be indexed, but the moment someone inserts or updates a value longer than that, the operation fails, just like the warning says. So the index gets created, everything looks fine, and then months later a long value suddenly triggers an error. Good luck explaining that one!
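
You can trigger that failure on purpose. A sketch, again assuming the table and indexes from above:

-- Fits in the Column (max 8000) but not in the index key (max 1700),
-- so this insert fails instead of being stored
insert into VarCharTests (Code, LongCode) values ('A', replicate('B', 1701))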

So, varchar(8000) for every Column is not such a good idea when you want to index that Column. Then why not use varchar(8000) for most of the Columns, and varchar(1700) for the Columns we want to index?

Let's add some data and see

Let's put some records in our Table: just some dummy rows, using the query below. The GO 100 at the end is an SSMS trick that repeats the batch 100 times, so we end up with 100 rows.

insert into VarCharTests(Code, LongCode)
values ('A', 'A')
GO 100
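
By the way, now that there is data in the Table, you can verify the disk space claim from the introduction. DATALENGTH returns the number of bytes actually stored, and it is identical for both Columns:

-- Both return 1: a one-character value takes one byte,
-- whether the Column is varchar(10) or varchar(8000)
select datalength(Code) as CodeBytes, datalength(LongCode) as LongCodeBytes
from VarCharTests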

Now that we have some data, we can check whether there is a difference between the two. Be sure to hit the Include Actual Execution Plan button first:

select Code from VarCharTests
select LongCode from VarCharTests

Now let's have a look at the Execution Plan.



Both queries take 50% of the batch cost, so they are equally fast. No problem there.

But look at the details of the Index Scans. For the Code Column, SQL Server estimates a row size of 16 bytes. For the LongCode Column, that jumps to 4011 bytes. The reason is that SQL Server has no idea how long the stored values really are, so it estimates a variable-length column at half its declared length: half of 10 plus some fixed overhead gives 16 bytes, half of 8000 plus the same overhead gives 4011 bytes.

Why is this a problem?

With our Test Table of just 100 rows, this is not much of a problem. But if you had 1,000,000 rows, SQL Server would estimate that about 15 MB is needed to run the query for the varchar(10) version and reserve that amount of memory. For the varchar(8000) version, SQL Server estimates that 3.8 GB (yes, gigabytes) is needed, and grabs as much of that as it possibly can.
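
The arithmetic is simply the estimated row size times the number of rows: 16 bytes × 1,000,000 ≈ 15 MB, versus 4011 bytes × 1,000,000 ≈ 3.7 GB. If you want to watch this on a live system, you can peek at the memory grants of running queries that need workspace memory (a sort or a hash join, for example); a sketch, assuming you have VIEW SERVER STATE permission:

-- Shows how much workspace memory each running query asked for and received
select session_id, requested_memory_kb, granted_memory_kb
from sys.dm_exec_query_memory_grants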

In other words, making Columns much longer than actually needed throws SQL Server off while creating an Execution Plan. It will allocate far more memory to execute a Query than is actually needed.

You are simply wasting one of the most valuable resources of your SQL Server.

Just don't do it!

As you can see, the devil is, as usual, in the details. While it is not obvious at first glance, there is a serious disadvantage to giving Columns much more length than they need. Just don't do it.

When in doubt, just give it a little bit of extra length. For example, in the Netherlands a phone number can be 10 digits. Some users might throw in a dash and a few spaces, so don't make it a varchar(10). A varchar(15) will do, or a varchar(20) at most. Keep your Columns lean and mean!

That's it for today!

As you can see, there is a very good reason why in most databases the Column length is capped at what would practically go into that Column. Feel free to add some extra room to be prepared for future changes. I don't want to go through the whole application when at some point it is decided that a phone number can be 11 digits long. Add 50 to 100% if that makes you feel more comfortable.

Just don't multiply it by 100 or more to be on the safe side, because then you are simply wasting valuable resources.




You can leave a comment or send a question to andre@andrespeek.com. Let me know if you got inspired and stay safe and healthy always!



