jump to navigation

Split Strings in Oracle 4 November 2009

Posted by David Alejo Marcos in SQL - PL/SQL.
Tags:
trackback

I received an email from one of our developers asking the following question:

Do you know how to transform a horrid legacy column consisting of semicolon-separated strings into a view containing one row per (originally semicolon-separated) value? For example:

Existing table:

Row1: Value1; Value2;

Row2: Value3; Value4; Value5;

Desired view based on the above:

Row1: Value1

Row2: Value2

Row3: Value3

Row4: Value4

Row5: Value5

I thought it was quite an interesting question, so I created a simple test:

davidmarcos@devdb>  create table mytest (pos number, description varchar2(1000));

Table created.

Elapsed: 00:00:00.40

davidmarcos@devdb> insert into mytest values (1,’value1;value2;value3′);

1 row created.

Elapsed: 00:00:00.01

davidmarcos@devdb> insert into mytest values (2,’value4;value5′);

1 row created.

Elapsed: 00:00:00.03

davidmarcos@devdb> insert into mytest values (3,’value6′);

1 row created.

Elapsed: 00:00:00.04

davidmarcos@devdb> insert into mytest values (4,’value7;value8;value9;value10;value11′);

1 row created.

Elapsed: 00:00:00.28

davidmarcos@devdb> commit;

Commit complete.

Elapsed: 00:00:00.03

davidmarcos@devdb> select * from mytest;

POS      DESCRIPTION

——- ————————————

1        value1;value2;value3

2       value4;value5

3       value6

4       value7;value8;value9;value10;value11
Elapsed: 00:00:00.06

The first thing I noticed is that the number of values are variable and the column can be null. The easiest way I found to find out how many different values we had per row was counting the number of “;” and adding 1 (last value does not have semicolon):

davidmarcos@devdb>select description,
2    pos,
3    (length(description)-length(replace(description,’;’,”)))+1 cnt
4    from mytest
5   where description is not null;

DESCRIPTION                                          POS    CNT

—————————————— ——- ———-

value1;value2;value3                1          3

value4;value5                        2          2

value6                                    3          1

value7;value8;value9;value10;value11           4          5

Elapsed: 00:00:00.03

This query will become the core of my select.

The next step was to split the string. I decide to use regular expressions (regexp_substr to be more specific).

To split the string I used the following:

regexp_substr(description,'[^;]+’,1,level)

Where:

description is the field.

^; indicates the string beginning with “;”

1 indicates where the search should start (in this case, the first string after “;”)

Level tells oracle which occurrence to search for.

The query looks like:

with data
as
(
select description,
pos,
(length(description)-length(replace(description,’;’,”)))+1 cnt
from mytest
where description is not null
)
select distinct ltrim(regexp_substr(description,'[^;]+’,1,level)) result, pos
from data
connect by level <= cnt
order by pos, result

Note:

– ltrim was used to remove any blanks at the beginning of the string.

davidmarcos@devdb > with data
2  as
3  (
4  select description,
5  pos,
6  (length(description)-length(replace(description,’;’,”)))+1 cnt
7  from mytest
8  where description is not null
9  )
10  select distinct ltrim(regexp_substr(description,'[^;]+’,1,level)) result, pos
11  from data
12  connect by level <= cnt
13  order by pos, result
14  /

RESULT                                                  POS
————————————— ———-
value1                                                      1
value2                                                      1
value3                                                      1
value4                                                      2
value5                                                      2
value6                                                      3
value10                                                    4
value11                                                    4
value7                                                      4
value8                                                      4
value9                                                      4

11 rows selected.

Elapsed: 00:00:00.37

So far, so good. The only missing detail is to order the columns in a proper manner. I decided to order by level instead of description and hide it from the user (he is not really interested on the level, but on having the information ordered in the same way it was inserted on the column):

davidmarcos@devdb> with data

2      as
3      (
4      select description,
5             pos,
6             (length(description)-length(replace(description,’;’,”)))+1 cnt
7        from mytest
8         where description is not null
9  ) select result, pos
10  from (
11      select distinct ltrim(regexp_substr(description,'[^;]+’,1,level)) result, pos, level
12        from data
13      connect by level <= cnt
14        order by pos, level
15  )
16  /

RESULT                                                    POS
—————————————- ———-
value1                                                      1
value2                                                      1
value3                                                      1
value4                                                      2
value5                                                      2
value6                                                      3
value7                                                      4
value8                                                      4
value9                                                      4
value10                                                    4
value11                                                    4

11 rows selected.

Elapsed: 00:00:00.17

Advertisements

Comments»

1. georgeh12 - 23 February 2012

You have the best solution to this problem! Simple and elegant, employing a Connect By function. I have also used the length function as a crude counting tool, very nice. The only critique I have is the ltrim function seems unnecessary.

Great work!
Thanks


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: