I stumbled upon an interesting little nuance of SQL Server’s behaviour over the past few days that I was not aware of and I figure its worth talking about it here so others are aware. It concerns the handling of the german character “ß” which I understand to be german shorthand for “ss” (I’m no expert on language or linguists so don’t beat me up if I’m wrong about that).
In short, two NVARCHAR values that differ only by one using “ß” and the other using “ss” will be treated as the same. This is easily demonstrated using the following code snippet:
(Collation on my database is set to ‘SQL_Latin1_General_CP1_CI_AS’)
Notice that casting the values as NVARCHAR (which is what the N'' notation does) causes SQL Server to treat them as the same. Obviously this is going to cause a problem if you need to treat those as distinct values (such as inserting into a column with a unique key upon it – which is the problem I encountered that caused me to stumble across this)
There is a bug submission to Connect regarding this issue at 'ß' and 'ss' are NOT equal in which a member of the SQL Server team says:
Our current behavior follows the SQL/ISO standard and unless those standards are updated with the latest changes we don't intend to change the behavior in SQL Server. Changing existing SQL Server behavior has lot of implications and today we rely on Windows for all of our windows collation sorting capabilities. If and when in the future Windows adopts these new rules / Unicode standard we will incorporate it in SQL Server.
In other words, SQL Server is merely following the behaviour as defined by the International Standards Organisation so its not a bug in SQL Server as such, just a nuance that one needs to be aware of. And now you are.
For more unicode fun and games check out Dark corners of Unicode