Splitting Strings: How to Work with Consistent Data in SQL Server

How do you provide a consistent pattern of data when it's in different formats? Today, we write a Split function to make data consistent in our tables.

Written by Jonathan "JD" Danylko • Last Updated: October 16^th, 2017 • Develop •

If you're a developer, you always find sanctuary in code, but there are times when you're presented with data in an unusual format and things can get...strange.

For example, I'm working with a client where they provide me with an Excel spreadsheet of rows and want to import the data into SQL Server.

So until I get these maintenance screens in place, I continue to import the data. I already wrote one of the data screens based on a previous post I wrote called Import a Data Hierarchy From Excel into SQL Server.

Now I know what you're thinking. Why not use DTS (Data Transformation Services) of SQL Server?

When importing the data, you have an option to write a SQL Statement or define the columns and their sizes. While this provides a robust front-end for importing data, there are more inconsistencies to this particular approach.

I started down this path and found it way too time-consuming so I decided to look for another way.

Back to the spreadsheet.

In this Excel spreadsheet, the data wasn't consistent at all.

Names are in one cell ("John Doe") instead of in separate cells (First Name, John, Last Name Doe).
Phone Numbers are in different formats ('(999) 999-9999', '999-999-9999', '999.999.9999', '(999)999-9999 Ext.99', etc.)

Since we want our initial load of data to be consistent in the database, we need to "massage" the data into a usable format.

In today's post, since I've been wrestling with inconsistent data this weekend, I'll show a quick and simple splitter to help with manipulating strings which will save you hours of work.

Creating the Function

Since this is just for an initial load of data, this code is not meant for a production environment.

However, I've been using this function for a long time. It's one of the SQL Server functions I use in my arsenal and I find it quite useful.

The function, when created, is stored in the Table-valued Functions (<database> > Programmability > Functions > Table-valued Functions) and is always available.

CREATE FUNCTION [dbo].[Split] (
    @delimited NVARCHAR(MAX),
    @delimiter NVARCHAR(100))
RETURNS @t TABLE (
    id INT IDENTITY (1, 1)
   ,val NVARCHAR(MAX)
)
AS
BEGIN
    DECLARE @xml XML
    SET @xml = 
        N'<t>' + 
            REPLACE(@delimited, @delimiter, '</t><t>') 
        + '</t>'
    INSERT INTO @t (val)
        SELECT
            r.value('.', 'varchar(MAX)') AS item
        FROM @xml.nodes('/t') AS records (r)
    RETURN
END

Very quick and easy. The dbo.Split function works like a table.

SELECT * FROM dbo.Split('John Doe',' ')

Execute this statement and you receive the following output.

id	val
1	John
2	Doe

As you can see, this makes splitting names very easy. If you want to save the values into the fields FirstName and LastName, it's as easy as:

DECLARE @FirstName VARCHAR(100);
DECLARE @LastName VARCHAR(100);
SELECT @FirstName=val FROM dbo.Split(@Person,' ') WHERE id=1

SELECT @LastName=val FROM dbo.Split(@Person,' ') WHERE id=2

Now, I've ran into a couple issues where the name in the Excel spreadsheet is "John A. Doe". If you have a middle initial, it will still split it out, but you'll have 3 records instead of two.

Perform a count on the resulting split to determine the best approach for your data.

For my needs, I only added the middle name variable to the first name variable and save the FirstName to the table.

What about the Phone Numbers?

The phone numbers required additional massaging to get the format we need.

The client's database requires all numbers in the field. No dashes, periods, or parentheses.

We need to remove any extraneous characters and perform the same split.

-- (In Excel spreadsheet, possible phone values are '(xxx) xxx-xxxx', 
-- 'xxx-xxx-xxxx', 'xxx.xxx.xxxx', '(xxx)xxx-xxxx Ext.99'...or anything else).
DECLARE @PhoneSplitter VARCHAR(30);

--Test
SELECT @PhoneSplitter='999.999.9999';
-- Remove the area code parentheses '()'
SELECT @PhoneSplitter = REPLACE(@PhoneSplitter,')',' ');
SELECT @PhoneSplitter = REPLACE(@PhoneSplitter,'(',' ');
-- Remove the periods
SELECT @PhoneSplitter = REPLACE(@PhoneSplitter,'.',' ');
-- Remove the dashes
SELECT @PhoneSplitter = REPLACE(@PhoneSplitter,'-',' ');
-- Remove extra spaces
SELECT @PhoneSplitter = REPLACE(@PhoneSplitter,'  ',' ');
SELECT * FROM dbo.Split(@PhoneSplitter,' ');

Your output will consist of the following:

id	val
1	999
2	999
3	9999

Concatenate, cast as an integer, and save them to your table.

Again, you may have a phone number which includes an extension ("ext. 2044" or "x64") meaning you'll have 4 records instead of 3.

Adjust as necessary for your data needs.

Conclusion

I find it ironic working with databases where I have problems manipulating data to my needs.

Maybe it's because I'm not a DBA or maybe I've been spoiled by C#'s string manipulation methods? ;-)

This particular function is one of my tried-and-true functions which I use every time I'm in SQL Server when I have to massage or manipulate data.

In today's post, I demonstrated how to use XML to perform string manipulation and how it's meant to be a starting point for full-stack developers to discover additional ways in writing their own database "functions" to speed up the unusual ways of data manipulation.

Was there a better way to perform this? Do you have an existing library of database functions? Post your comments below and let's discuss.

ASP.NET 8 Best Practices by Jonathan Danylko

Reviewed as a "comprehensive guide" and a "roadmap to excellence" with over 120 Best Practices for ASP.NET Core 8, Jonathan's first book by Packt Publishing explores proven techniques for every phase of the SDLC.

Learn industry-standard concepts to improve your coding, debugging, and deployment of ASP.NET Core websites.

Jonathan "JD" Danylko is an author, web architect, and entrepreneur who's been programming for over 30 years. He's developed websites for small, medium, and Fortune 500 companies since 1996.

He currently works at Insight Enterprises as an Architect.

When asked what he likes to do in his spare time, he replies, "I like to write and I like to code. I also like to write about code."

comments powered by Disqus

What's New

October 2^nd, 2024

Created a presentation on Improving Website SEO (Search Engine Optimization) using ASP.NET.
September 9^th, 2024

Updated the About JD page by adding my book and other tidbits
January 10^th, 2024

Added more reviews for my latest book, "ASP.NET 8 Best Practices."
June 22^nd, 2023

Amazon posted the pre-order page for my book (and yes, I'm freaking out a bit).
May 26^th, 2023

Added the Stir Trek 2023 videos link to the post.
November 3^rd, 2022

Updated Technology Trends for Developers by adding the Web Almanac (State of the Web Report)
July 4^th, 2022

Taking it easy today...Happy 4th of July everyone!
June 15^th, 2022

Yikes! My first vidcast EVER! Thanks Ed for the opportunity.
March 14^th, 2022

Updated Collection:Git Resources for Beginners w/ 20 Git Commands for Beginners
March 14^th, 2022

Updated Collection: Web API Best Practices w/ How to design REST APIs and The 10 REST Commandments.

Welcome

Jonathan Danylko Hi! Welcome to DanylkoWeb. This is the PERSONAL web site of Jonathan "JD" Danylko where I focus on Microsoft web technologies including ASP.NET using C#, Web Performance, Code Exorcisms (refactorings), and business lessons learned over a period of 30 years of programming.

I've also collected the best ways to learn C#.

If you have any questions, feel free to contact me.

Search

Morning Coffee Link

June 30^th, 2025Six Foot Coder: Blazor Server Website does not work with AT&T Mobile Data

View All

Be Social

Follow my blog with Bloglovin

Latest Book

Amazon Unbound

By Brad Stone

ISBN: 1398500976 / 978-1398500976

See the latest books I've read at the Reading Corner.