THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Splitting a string of unlimited length

There are many techniques for splitting a string in T-SQL (in other words, taking a character-delimited string and producing a table of the values), the best of which are encapsulated in Erland Sommarskog's famous article. My favorite of his string splitting techniques is adapted from a previous example that was created by Anith Sen.

Both of these resources are excellent, but sometimes you just need a little bit more. None of their solutions will split a string larger than 8000 characters in length. So I've produced my own modified version. If you're splitting smaller strings then by all means, use theirs! This is certainly slower, but I do believe it's the fastest way in SQL Server 2000 to split a very long string. You will require a numbers table, so make sure you have it on hand...

Anyway, the code:

CREATE FUNCTION dbo.SplitString
(
@List TEXT,
@Delimiter CHAR(1)
)
RETURNS @ReturnTbl TABLE
(
OutParam VARCHAR(20)
)
WITH SCHEMABINDING
AS
BEGIN
DECLARE @LeftSplit VARCHAR(7998)
DECLARE @SplitStart INT SET @SplitStart = 0
DECLARE @SplitEnd INT
SET @SplitEnd = 7997

SELECT
@SplitEnd = MAX(Number)
FROM dbo.Numbers
WHERE
(
REPLACE(SUBSTRING(@List, Number, 1), ' ', CHAR(255)) =
REPLACE(@Delimiter, ' ', CHAR(255))
OR Number = DATALENGTH(@List) + 1
)
AND Number BETWEEN @SplitStart AND @SplitEnd

WHILE @SplitStart < @SplitEnd
BEGIN
SET @LeftSplit =
@Delimiter +
SUBSTRING(@List, @SplitStart, @SplitEnd - @SplitStart) +
@Delimiter

INSERT @ReturnTbl
(
OutParam
)
SELECT
LTRIM
(
RTRIM
(
SUBSTRING
(
@LeftSplit,
Number + 1,
CHARINDEX(@Delimiter, @LeftSplit, Number + 1) - Number - 1
)
)
) AS Value
FROM dbo.Numbers
WHERE
Number <= LEN(@LeftSplit) - 1
AND REPLACE(SUBSTRING(@LeftSplit, Number, 1), ' ', CHAR(255)) =
REPLACE(@Delimiter, ' ', CHAR(255))
AND '' <>
SUBSTRING
(
@LeftSplit,
Number + 1,
CHARINDEX(@Delimiter, @LeftSplit, Number + 1) - Number - 1
)

SET @SplitStart = @SplitEnd + 1
SET @SplitEnd = @SplitEnd + 7997

SELECT
@SplitEnd = MAX(Number) + @SplitStart
FROM dbo.Numbers
WHERE
(
REPLACE(SUBSTRING(@List, Number + @SplitStart, 1), ' ', CHAR(255)) =
REPLACE(@Delimiter, ' ', CHAR(255))
OR Number + @SplitStart = DATALENGTH(@List) + 1
)
AND Number BETWEEN 1 AND @SplitEnd - @SplitStart
END

RETURN
END
GO

This code does nothing more than chunk up the string, using the delimiter as boundries, into small enough pieces (< 8000 characters) that CHARINDEX can be used on them. Then the same algorithm from Erland's split string is applied.

Note that this function does not gracefully handle situations such as passing in the wrong delimiter. So please carefully test your code before deploying this!

Using the function is quite simple:

 

SELECT * 
FROM dbo.SplitString('something, something else, etc, etc, etc...', ',')

 


Update, February 15, 2005: Fixed so that only 8000 numbers are needed in the Numbers table for this to work. Previously required as many numbers as were present in the string to be split.

Update, February 23, 2009: Four years since the last update and a new bug is discovered by the insightful Craig Hathaway! Fixed the function so that it now handles space as a delimiter. Thanks, Craig!


Published Wednesday, July 12, 2006 10:20 PM by Adam Machanic

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Tiraj said:

Well it says "Using the function is quite simple"

But im having problems (errors) even while tring to create the function...

March 14, 2007 2:15 AM
 

Adam Machanic said:

What error are you seeing?  It works fine on this end...

March 18, 2007 12:13 PM
 

Ali said:

It works fine. but when I use it in a Stored Procedure it gives error:

Incorrect syntax near 'SplitString'.

April 19, 2007 2:22 AM
 

Asghar said:

Hi Adam this is a great function but I have got string with such as ("hello","sdfsfs",....)

My problem is how can I pass two parremeters such as , and "".

and how can I insert the values into a table

Thanks

November 28, 2007 11:55 PM
 

David said:

Had to change the last section of the code to get it to work beyond 8,000 characters.  Great approach though.

_________________________________

--Start next chunk

SET @SplitStart = @SplitEnd + 1

SET @SplitEnd = @SplitEnd + 7998

IF ( @SplitEnd >= DATALENGTH(@List)   )

  SET @SplitEnd = DATALENGTH(@List) +1

ELSE

BEGIN

 SELECT @SplitEnd = @SplitStart + MAX(Number)

 FROM dbo.Numbers

 WHERE (SUBSTRING(@List, Number, 1) = @Delimiter

 OR Number = DATALENGTH(@List) + 1

  )

AND Number BETWEEN @SplitStart AND @SplitEnd

END

February 5, 2008 4:07 PM
 

David said:

Please ignore the previous post - that didn't fix it.  The issue seemed to be with parsing the last number in the last chunk.  I added

    AND CHARINDEX(',', @LeftSplit, Number + 1) > 0

to the key SELECT statement.

February 5, 2008 4:36 PM
 

Adam Machanic said:

"hickymanz" asked in the SQL Server Central forums for a method of counting unique words in a text column.

January 25, 2009 2:52 PM
 

Adam Machanic said:

Yes, another string splitting UDF from a guy who's obvioiusly become obsessed with TSQL string splitting.

January 25, 2009 2:54 PM
 

Craig Hathaway said:

Hi, this function times out when the delimiter is the <space> character - for splitting sentences into words (etc). I suspect thsi has to do with trimming...

E.g.

SELECT *

FROM dbo.[SplitString]( 'test one, two, three',' ')

February 23, 2009 12:15 AM
 

Adam Machanic said:

Hi Craig,

Busy day at the office?  Nothing better to do than rip apart my old blog posts? <g>

The function has been fixed... Thanks for reporting this!

February 23, 2009 11:55 AM
 

techvslife said:

Thanks.  I notice Erland has a new version, & it will split nvarchar(max), i.e. more than 8000 chars: (long article but search for "Chunking Multi-Statement Function")

http://www.sommarskog.se/arrays-in-sql-2005.html  

(& btw, isn't text type deprecated now.)

(haven't tested both yet.)

July 19, 2009 2:04 AM
 

TechVsLife said:

I altered Erland's function to include wordposition and separator position (maybe in a highly inefficient way, but it seems to work correctly).  

To test it on large strings, you can run something like:

SELECT * FROM dbo.tfnSplitString(REPLICATE(CAST(',AB,CD,EFG,HI,MNZ,QR,UVW,YZ1' AS nvarchar(max)), 6000), DEFAULT) ORDER BY WordPos;

(Note that my numbers table is called "Number" and the number column in it is "Num.")

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE FUNCTION [dbo].[tfnSplitString] (

@StringToSplit NVARCHAR(MAX),

@Separator NCHAR(1) = N','

)

RETURNS @t TABLE (Word nvarchar(4000) NOT NULL, WordPos int not null, SepPos int not null) AS

BEGIN

  DECLARE @slices TABLE (

slice nvarchar(4000) NOT NULL,

slicenum int NOT NULL)

  DECLARE @slice nvarchar(4000),

          @textpos int = 1,

          @maxlen  int = 4000-2,

          @stoppos int,

          @slicenum int = 0

  WHILE datalength(@StringToSplit) / 2 - (@textpos - 1) >= @maxlen

  BEGIN;

 SET @slice = substring(@StringToSplit, @textpos, @maxlen);

 SET @stoppos = @maxlen -

charindex(@Separator COLLATE Latin1_General_100_BIN2,

reverse(@slice));

 INSERT @slices (slice, slicenum)

VALUES (@Separator + left(@slice, @stoppos) + @Separator, @slicenum);

 SET @textpos = @textpos - 1 + @stoppos + 2;

 SET @slicenum +=1;

 -- On the other side of the comma.

  END;

  INSERT @slices (slice, slicenum)

      VALUES (@Separator + substring(@StringToSplit, @textpos, @maxlen) + @Separator, @slicenum);

  WITH wordget (Word, SepPos) AS (

     SELECT ltrim(rtrim(substring(s.slice, N.Num + 1,

               charindex(@Separator COLLATE Latin1_General_100_BIN2,

                         s.slice, N.Num + 1) -

               N.Num - 1))),                

               charindex(@Separator COLLATE Latin1_General_100_BIN2,

                         s.slice, N.Num + 1)+(slicenum*@maxlen)-1

      FROM  Number N

      JOIN  @slices s

        ON  N.Num <= len(s.slice) - 1

       AND  substring(s.slice, N.Num, 1) = @Separator COLLATE Latin1_General_100_BIN2

  )

  INSERT @t (Word, WordPos, SepPos)

     SELECT Word, ROW_NUMBER() OVER (ORDER BY SepPos) AS WordPos, SepPos FROM wordget;

  RETURN;

END

July 19, 2009 12:22 PM
 

TechVsLife said:

I made a one-off error (--not the first time):

replace "(slicenum*@maxlen)-1"

with

"(slicenum*(@maxlen-1))-1"

(the mistake only occurs once, or only the 'zeroth' time, depending on how you count.)

July 19, 2009 4:12 PM
 

TechVsLife said:

Sorry, recorrected (one hopes for the last time)--it wasn't a one-off error, though it looked like it.  (--also, there may be a more efficient approach to adding separator position.)

NOTE: assumes the existence of a numbers table called 'Number' with an integer primary key column 'Num' that has consecutive integers (requires 4,000).

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE FUNCTION [dbo].[tfnSplitString] (

@StringToSplit NVARCHAR(MAX),

@Separator NCHAR(1) = N','

)

RETURNS @t TABLE (Word nvarchar(4000) NOT NULL, WordPos int not null, SepPos int not null) AS

BEGIN

/*

assumes the existence of a numbers table called 'Number' with an integer primary key column 'Num'.

*/

  DECLARE @slices TABLE (

slice nvarchar(4000) NOT NULL,

RunningSliceLen int NOT NULL)

  DECLARE @slice nvarchar(4000),

          @textpos int = 1,

          @maxlen  int = 4000-2,

          @stoppos int,

          @RunningSliceLen int = -1

  WHILE datalength(@StringToSplit) / 2 - (@textpos - 1) >= @maxlen

  BEGIN;

 SET @slice = substring(@StringToSplit, @textpos, @maxlen);

 SET @stoppos = @maxlen -

charindex(@Separator COLLATE Latin1_General_100_BIN2,

reverse(@slice));

 INSERT @slices (slice, RunningSliceLen)

VALUES (@Separator + left(@slice, @stoppos) + @Separator, @RunningSliceLen);

 SET @textpos = @textpos - 1 + @stoppos + 2;

 SET @RunningSliceLen = @RunningSliceLen + DATALENGTH(@Separator + left(@slice, @stoppos))/2;

 -- On the other side of the comma.

  END;

  INSERT @slices (slice, RunningSliceLen)

      VALUES (@Separator + substring(@StringToSplit, @textpos, @maxlen) + @Separator, @RunningSliceLen);

  WITH wordget (Word, SepPos) AS (

     SELECT ltrim(rtrim(substring(s.slice, N.Num + 1,

               charindex(@Separator COLLATE Latin1_General_100_BIN2,

                         s.slice, N.Num + 1) -

               N.Num - 1))),                

               charindex(@Separator COLLATE Latin1_General_100_BIN2,

                         s.slice, N.Num+1)+(s.RunningSliceLen)

      FROM  Number N

      JOIN  @slices s

        ON  N.Num <= len(s.slice) - 1

       AND  substring(s.slice, N.Num, 1) = @Separator COLLATE Latin1_General_100_BIN2

  )

  INSERT @t (Word, WordPos, SepPos)

     SELECT Word, ROW_NUMBER() OVER (ORDER BY SepPos) AS WordPos, SepPos FROM wordget;

  RETURN;

END

July 19, 2009 7:18 PM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement