Collations are one of the most often overlooked properties in database deployment, usually taken for granted. Collation is a set of rules used for sorting and comparing data. It determines different properties, like case sensitivity, accent sensitivity, and other local rules. It also defines the consistency of the way data is stored and retrieved in an RDBMS.

There are exactly three levels of collation definition in the SQL Server:

  • server
  • database
  • column

If you want a deep dive and learn more about collation, visit a dedicated MS learn page.

Why collation is important

I’ll try to demonstrate the importance of a collation setting through an example. As mentioned before, we can set up a collation on a column level, so I will use that option to explain.

Let’s create a simple table. It has two string columns, one with a case-insensitive collation and the other with a case-sensitive counterpart.

DROP TABLE IF EXISTS #t;
 
CREATE TABLE #t
(
    Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
    TokInsensitive VARCHAR(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL /*Case Insensitive Collation*/,
    TokSensitive VARCHAR(10) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL /*Case Sensitive Collation*/,
);

Obviously, I’ve watched a few TikToks while writing this post.

Now, let’s insert some data in the new table we’ve just created.

INSERT INTO #t
(
    TokInsensitive,
    TokSensitive
)
VALUES
('TikTok', 'TikTok'),
('Tiktok', 'Tiktok'),
('tiktok', 'tiktok');

And yes, as you can see from the code snippet above, we did play around with letter cases in our entries.

Now, let’s see how the behavior of our outputs changes depending on the collation.

SELECT

We’ll now try to retrieve some rows from our table to see what happens. Let’s first try to get data regardless of letter cases, both minuscule and capital letters included. To do this, we’ll need to make sure our columns are “case insensitive”.

SELECT 'InsensitiveSelect' AS Descr,
       *
FROM #t
WHERE TokInsensitive = 'TikTok';

As you can see from the results below, the table is retrieved with all the instances of letter cases.

select-sensitive results

If, however, you wish to retrieve specific data, which in our case would be case-sensitive, you’d need to specify that with the SensitiveSelect flag, like in the example below.

SELECT 'SensitiveSelect' AS Descr,
       *
FROM #t
WHERE TokSensitive = 'TikTok';

In this case, your results will show the first instance that matched the query.

select-insensitive results

GROUP BY

Things can get interesting when trying to group your data with the help of collation. If you try to organize data over our insensitive column, all results will be tallied up.

Let’s examine the SQL code sample below to see what I’m talking about.

SELECT 'InsensitiveGrouping' AS Descr,
       TokInsensitive,
       COUNT(*) cnt
FROM #t
GROUP BY TokInsensitive;

As you can see, all letter case instances were considered and aggregated as one result.

group-insensitive results

Now, let’s see how the results differ if you try to group over the sensitive counterpart. In this instance, the code sample will look like this.

SELECT 'SensitiveGrouping' AS Descr,
       TokSensitive,
       COUNT(*) cnt
FROM #t
GROUP BY TokSensitive;

The data fetched will now differ greatly from the first example. All instances of letter cases will be interpreted and displayed as separate results.

group-sensitive results

Moral of the story

As you can see, this humble property can lead to a variety of amusing results during tests. Still, it can also cause absolute mayhem when it gets forgotten or taken for granted during the deployment of production instances.

So take good care of your collations, folks!

thats-it-folks-looney-tunes