How to Split a Comma-separated Value (CSV) into Columns in SQL?

When a table stores many delimited strings, splitting them into separate columns keeps the data organized and makes each value much easier to retrieve. In this blog, we will walk through several methods to split comma-separated values into columns in SQL, with an example for each.

Techniques for Dividing a Comma-separated Value (CSV) into Columns in SQL

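Functions such as STRING_SPLIT, ROW_NUMBER, and OPENJSON with a WITH clause are useful for transforming comma-separated values into columns. Most of these functions are specific to one database system, so check which system a technique targets before using it.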

Technique 1: STRING_AGG with CTE in SQL

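This technique uses STRING_SPLIT to break a CSV into rows, a CTE (Common Table Expression) that assigns row numbers to those rows, and STRING_AGG to join the values back together with a new separator. It applies to SQL Server 2017 and later, where STRING_AGG is available.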

Example:

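DECLARE @csv NVARCHAR(MAX) = 'Pebble,Stone,Rock';

-- Split the CSV into rows and number them.
-- ORDER BY (SELECT NULL) does not guarantee the original order.
WITH CTE AS (
    SELECT value, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM STRING_SPLIT(@csv, ',')
)
-- Join the values back into one string, ordered by row number
SELECT STRING_AGG(value, ' | ') WITHIN GROUP (ORDER BY rn) AS Result
FROM CTE;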

Output:

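[Image: output of STRING_AGG with CTE in SQL]

Explanation: STRING_SPLIT breaks the string into three rows, and STRING_AGG joins them back into a single value with ' | ' as the separator. The result is one column named Result, not one column per value, so this technique is best suited to changing a delimiter or reorganizing a list.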
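
To return each value in its own column rather than a single joined string, one option is conditional aggregation over the split rows. The sketch below assumes SQL Server 2022 or Azure SQL Database, where STRING_SPLIT accepts a third enable_ordinal argument and returns an ordinal column. The fixed count of three columns is also an assumption made for illustration.

```sql
DECLARE @csv NVARCHAR(MAX) = N'Pebble,Stone,Rock';

-- Passing 1 as the third argument adds an "ordinal" column (1, 2, 3, ...)
-- that records each value's position in the original string.
SELECT
    MAX(CASE WHEN ordinal = 1 THEN value END) AS Col1,
    MAX(CASE WHEN ordinal = 2 THEN value END) AS Col2,
    MAX(CASE WHEN ordinal = 3 THEN value END) AS Col3
FROM STRING_SPLIT(@csv, ',', 1);
```

Because the position comes from ordinal instead of ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), the column assignment does not depend on the order in which rows happen to be returned.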

Technique 2: Using OPENJSON() in SQL

OPENJSON is a built-in table-valued function in SQL Server (2016 and later) that parses JSON text and returns its objects and array elements as rows and columns. With a WITH clause, you define the output columns and their types, which simplifies further processing.

Example:

DECLARE @json NVARCHAR(MAX) = '[{"Col1":"Apple", "Col2":"Red", "Col3":"Sweet"},
                               {"Col1":"Banana", "Col2":"Yellow", "Col3":"Sweet"},
                               {"Col1":"Lemon", "Col2":"Yellow", "Col3":"Sour"}]';
SELECT *
FROM OPENJSON(@json)
WITH (
    Col1 NVARCHAR(50) '$.Col1',
    Col2 NVARCHAR(50) '$.Col2',
    Col3 NVARCHAR(50) '$.Col3'
);

Output:

[Image: output of Using OPENJSON() in SQL]

Explanation: The input is a JSON array of three objects. OPENJSON returns one row per object, and the WITH clause maps the Col1, Col2, and Col3 properties to columns. For example, the first row holds Apple, Red, and Sweet.
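
The example above starts from JSON rather than from a CSV string. One way to bridge the two, sketched here under the assumption that the values contain no commas of their own, is to rewrite the CSV as a JSON array nested inside an outer array and then address each element by position in the WITH clause. The variable names are illustrative.

```sql
DECLARE @csv NVARCHAR(MAX) = N'Apple,Red,Sweet';

-- Turn  Apple,Red,Sweet  into  [["Apple","Red","Sweet"]]
-- STRING_ESCAPE protects quotes and backslashes inside the values.
DECLARE @json NVARCHAR(MAX) =
    N'[["' + REPLACE(STRING_ESCAPE(@csv, 'json'), N',', N'","') + N'"]]';

-- The outer array yields one row; '$[n]' picks the n-th element of the inner array
SELECT Col1, Col2, Col3
FROM OPENJSON(@json)
WITH (
    Col1 NVARCHAR(50) '$[0]',
    Col2 NVARCHAR(50) '$[1]',
    Col3 NVARCHAR(50) '$[2]'
);
```

JSON array positions are part of the data, so this approach keeps the original order of the values.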

Technique 3: Using Dynamic SQL with STRING_SPLIT & PIVOT

This method uses STRING_AGG to collect the names into one CSV string, STRING_SPLIT to convert that CSV into rows, and a PIVOT operator inside dynamic SQL to turn the rows into columns. Dynamic SQL is needed because the number of columns is not known in advance.

Example:

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    EmployeeName NVARCHAR(100),
    ManagerID INT,
    FOREIGN KEY (ManagerID) REFERENCES Employees(EmployeeID)
);
INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID)
VALUES 
    (1, 'Alice', NULL), 
    (2, 'Bob', 1),     
    (3, 'Charlie', 1),   
    (4, 'David', 2),       
    (5, 'Eve', 2);
DECLARE @csv NVARCHAR(MAX);

-- Retrieve CSV of Employee Names
SELECT @csv = STRING_AGG(EmployeeName, ',') FROM Employees;

-- Set Up Temporary Table for Splitting
IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL DROP TABLE #TempTable;
CREATE TABLE #TempTable (
    ColIndex INT,
    Value NVARCHAR(100)
);

-- Insert Data into Temporary Table
INSERT INTO #TempTable (ColIndex, Value)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ColIndex, value
FROM STRING_SPLIT(@csv, ',');

-- Create Dynamic Column Names, e.g. [Col1], [Col2], ...
DECLARE @cols NVARCHAR(MAX);
SELECT @cols = STRING_AGG(QUOTENAME('Col' + CAST(ColIndex AS NVARCHAR(10))), ', ')
               WITHIN GROUP (ORDER BY ColIndex)
FROM #TempTable;

-- Prepare Dynamic Pivot Query
DECLARE @query NVARCHAR(MAX) = '
SELECT ' + @cols + ' FROM 
(SELECT Value, ''Col'' + CAST(ColIndex AS NVARCHAR) AS ColName FROM #TempTable) AS SourceData
PIVOT (MAX(Value) FOR ColName IN (' + @cols + ')) AS PivotTable;';

-- Execute Dynamic SQL
EXEC sp_executesql @query;

-- Cleanup
DROP TABLE #TempTable;

Output:

[Image: output of Using Dynamic SQL with STRING_SPLIT & PIVOT]

Explanation: We created an Employees table with manager IDs, gathered the employee names into a single CSV string, split it back into rows with STRING_SPLIT, and then used a dynamic PIVOT to return each name in its own column (Col1 through Col5).

Technique 4: Using SUBSTRING_INDEX() in SQL

SUBSTRING_INDEX is a MySQL string function that returns the part of a string before (or after) a given number of delimiter occurrences. Nesting two calls extracts a single element, which can then be returned as its own column. The downside is that everything is manual: you must know how many columns are required and write one expression per element.

Example:

SELECT
    -- Retrieves 'Apple'
    SUBSTRING_INDEX(csv, ',', 1) AS Col1,

    -- Retrieves 'Banana'
    SUBSTRING_INDEX(SUBSTRING_INDEX(csv, ',', 2), ',', -1) AS Col2,

    -- Retrieves 'Cherry'
    SUBSTRING_INDEX(SUBSTRING_INDEX(csv, ',', 3), ',', -1) AS Col3
FROM (SELECT 'Apple,Banana,Cherry' AS csv) AS t;

Output:

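[Image: output of Using SUBSTRING_INDEX() in SQL]

Explanation: The first call returns everything before the first comma. Each nested call keeps the first n elements and then takes the last one of them, so every element ends up in its own column.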
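
One pitfall of the nested calls is that a string with fewer elements than expected repeats its last element in the extra columns. The MySQL sketch below guards against that by counting the commas first; the second sample row ('Kiwi,Lime') is added here purely for illustration.

```sql
SELECT
    csv,
    SUBSTRING_INDEX(csv, ',', 1) AS Col1,
    -- Return the 2nd element only if the string has at least one comma
    CASE WHEN CHAR_LENGTH(csv) - CHAR_LENGTH(REPLACE(csv, ',', '')) >= 1
         THEN SUBSTRING_INDEX(SUBSTRING_INDEX(csv, ',', 2), ',', -1)
    END AS Col2,
    -- Return the 3rd element only if the string has at least two commas
    CASE WHEN CHAR_LENGTH(csv) - CHAR_LENGTH(REPLACE(csv, ',', '')) >= 2
         THEN SUBSTRING_INDEX(SUBSTRING_INDEX(csv, ',', 3), ',', -1)
    END AS Col3
FROM (
    SELECT 'Apple,Banana,Cherry' AS csv
    UNION ALL
    SELECT 'Kiwi,Lime'
) AS t;
```

Without the CASE guard, Col3 for 'Kiwi,Lime' would be 'Lime' again; with it, Col3 is NULL.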

Technique 5: Using UNNEST() with STRING_TO_ARRAY() in SQL

In PostgreSQL, the STRING_TO_ARRAY function splits a string into an array, and UNNEST expands the array elements into rows. To get columns instead of rows, index the array directly, as the example below does.

Example:

CREATE TABLE AnimalOffspring (
    Animal_Offspring TEXT
);

INSERT INTO AnimalOffspring (Animal_Offspring) VALUES
('Lion,Cub'),
('Elephant,Calf'),
('Dog,Puppy'),
('Cat,Kitten'),
('Cow,Calf');
SELECT 
    (STRING_TO_ARRAY(Animal_Offspring, ','))[1] AS Animal,
    (STRING_TO_ARRAY(Animal_Offspring, ','))[2] AS Offspring 
FROM AnimalOffspring;

Output:

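[Image: output of Using UNNEST() with STRING_TO_ARRAY() in SQL]

Explanation: STRING_TO_ARRAY converts each string into an array, and the subscripts [1] and [2] pick the first and second elements as the Animal and Offspring columns. UNNEST is only needed when you want one row per element.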
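
For the row-oriented form, a minimal PostgreSQL sketch is shown here. It reuses the AnimalOffspring table from the example and adds WITH ORDINALITY so that each element keeps its position in the list; the alias names t, part, and idx are arbitrary.

```sql
-- One row per element, with its position in the original string
SELECT a.Animal_Offspring,
       t.idx,
       t.part
FROM AnimalOffspring AS a
CROSS JOIN LATERAL UNNEST(STRING_TO_ARRAY(a.Animal_Offspring, ','))
     WITH ORDINALITY AS t(part, idx)
ORDER BY a.Animal_Offspring, t.idx;
```

Keeping idx in the result gives an explicit sort key, since UNNEST by itself does not attach an ordering column to the rows it produces.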

Technique 6: Using REGEXP_MATCHES() in SQL

The REGEXP_MATCHES function extracts substrings from comma-separated values (CSV) using a regular expression, and each captured substring can be returned as a column. REGEXP_MATCHES is a powerful tool available in PostgreSQL.

Example:

CREATE TABLE StudentSubjects (
    Student_Info VARCHAR(100)
);
INSERT INTO StudentSubjects (Student_Info) VALUES ('John,Math,85');
INSERT INTO StudentSubjects (Student_Info) VALUES ('Alice,English,90');
INSERT INTO StudentSubjects (Student_Info) VALUES ('Bob,Science,75');
SELECT 
    (REGEXP_MATCHES(Student_Info, '^([^,]+)', 'g'))[1] AS Student_Name,
    (REGEXP_MATCHES(Student_Info, ',([^,]+),', 'g'))[1] AS Subject,
    (REGEXP_MATCHES(Student_Info, '([^,]+)$', 'g'))[1] AS Marks
FROM StudentSubjects;

Output:

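[Image: output of Using REGEXP_MATCHES() in SQL]

Explanation: Each REGEXP_MATCHES call captures one element: the text before the first comma, the text between the two commas, and the text after the last comma. The [1] subscript takes the captured group from the returned array, and each result becomes its own column.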
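
When every string has the same number of fields, a single pattern with three capture groups can do the same work in one pass. The sketch below assumes PostgreSQL 10 or later, which provides REGEXP_MATCH (singular); it returns only the first match as a text array, or NULL when the pattern does not match. The alias parts is illustrative.

```sql
SELECT parts[1] AS Student_Name,
       parts[2] AS Subject,
       parts[3] AS Marks
FROM (
    -- One regex call per row; each capture group becomes one array element
    SELECT REGEXP_MATCH(Student_Info, '^([^,]+),([^,]+),([^,]+)$') AS parts
    FROM StudentSubjects
) AS s;
```

Rows that do not consist of exactly three comma-separated fields produce NULL in all three columns instead of being dropped, which makes malformed input easier to spot.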

Technique 7: Using STRING_SPLIT() with CROSS APPLY in SQL Server

The STRING_SPLIT() function in SQL Server splits comma-separated values into rows, and CROSS APPLY runs it once for every row of a table, so it works on CSV data stored in a column.

Example:

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    EmployeeName NVARCHAR(50),
    Skills NVARCHAR(MAX) -- CSV formatted skills
);
INSERT INTO Employees (EmployeeID, EmployeeName, Skills) VALUES
(1, 'Karan', 'SQL,Python, Excel'),
(2, 'Sara', 'Java,SQL,C#'),
(3, 'Charan', 'Python ,PowerBI');
-- The sample data has stray spaces around some skills, so trim each value
SELECT EmployeeID, EmployeeName, LTRIM(RTRIM(value)) AS Skill
FROM Employees
CROSS APPLY STRING_SPLIT(Skills, ',');

Output:

[Image: output of Using STRING_SPLIT() with CROSS APPLY in SQL Server]

Explanation: STRING_SPLIT() with CROSS APPLY returns one row per skill, with the EmployeeID and EmployeeName repeated on each row. The result has a single Skill column rather than one column per skill.

Performance Considerations

| Method | Performance | Best Use Case | Limitations |
|---|---|---|---|
| STRING_AGG with CTE | Good for small and medium datasets. | Aggregating values and reorganizing data. | Not well suited to large datasets. |
| OPENJSON() | Works only with JSON-formatted input. | Structured or nested data. | Requires the input to be valid JSON. |
| Dynamic SQL with STRING_SPLIT & PIVOT | Performs well with small to medium datasets. | Converting CSV values into columns and dynamic reporting. | Limited index support, and STRING_SPLIT does not guarantee ordered output. |
| SUBSTRING_INDEX() | Fast with a predetermined number of elements. | When the exact split positions are known. | Works well only when the element count is known. |
| UNNEST() with STRING_TO_ARRAY() | Very efficient. | Simple CSV-to-row transformation. | Lacks built-in ordering. |
| REGEXP_MATCHES() | Slower because of the regex overhead. | Intricate patterns. | Not optimal for large datasets. |
| STRING_SPLIT() with CROSS APPLY | Fast for straightforward splits. | Simple CSV-to-row conversions. | No ordering unless it is specified explicitly. |

Real-world Use Cases

Case 1: Using string_to_array and split_part to retrieve order details

Example:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    OrderDetails TEXT -- Storing CSV as text
);
INSERT INTO Orders (OrderID, OrderDetails) VALUES
(101, 'Laptop, Mouse, Bag'),
(102, 'Phone, Charger, Gun'),
(103, 'Keyboard, Piano');
SELECT OrderID,
       TRIM(unnest(string_to_array(OrderDetails, ','))) AS Item, -- one row per item
       TRIM(split_part(OrderDetails, ',', 1)) AS Item1,
       TRIM(split_part(OrderDetails, ',', 2)) AS Item2,
       TRIM(split_part(OrderDetails, ',', 3)) AS Item3
FROM Orders;

Output:

[Image: output of Real-world Use Case 1]

Explanation: unnest with string_to_array returns one row per item, while split_part returns the item at a given position as its own column, so Item1, Item2, and Item3 follow the order of the list. For instance, Bag and Gun are in the third position and appear in Item3. Order 103 has only two items, so its Item3 is an empty string. TRIM removes the space that follows each comma in the sample data.

Case 2: Using CROSS APPLY to fetch event timestamps

Example:

CREATE TABLE Logs (
    LogID INT PRIMARY KEY,
    Timestamps NVARCHAR(MAX) -- Storing CSV timestamps as text
);

INSERT INTO Logs (LogID, Timestamps) VALUES
(1, '2024-02-19,2024-02-20,2024-02-21'),
(2, '2024-02-22,2024-02-23');
SELECT LogID, value AS EventTimestamp
FROM Logs
CROSS APPLY STRING_SPLIT(Timestamps, ',');

Output:

[Image: output of Real-world Use Case 2]

Explanation: CROSS APPLY with STRING_SPLIT turns each comma-separated list of timestamps into individual rows, one per timestamp, alongside its LogID.
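
STRING_SPLIT returns its values as text, so date comparisons need a conversion first. The sketch below, which assumes the Logs table from this example and an arbitrary cutoff date, uses TRY_CAST so that a malformed entry becomes NULL rather than raising an error.

```sql
SELECT l.LogID,
       TRY_CAST(s.value AS DATE) AS EventDate
FROM Logs AS l
CROSS APPLY STRING_SPLIT(l.Timestamps, ',') AS s
-- Keep only events on or after the (illustrative) cutoff date
WHERE TRY_CAST(s.value AS DATE) >= '2024-02-21';
```

Entries that cannot be converted yield NULL and are filtered out by the WHERE clause.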

Summary

Splitting a comma-separated value (CSV) into columns in SQL can be done in several ways, depending on the database system. In SQL Server, STRING_SPLIT, ROW_NUMBER, and OPENJSON are used to parse CSV data; MySQL offers SUBSTRING_INDEX, and PostgreSQL offers STRING_TO_ARRAY, UNNEST, and REGEXP_MATCHES. Choose among them based on performance requirements and the complexity of the data.

The article How to Split a Comma-separated Value (CSV) into Columns in SQL? was first published on Intellipaat Blog.


