Monday, 26 March 2007

Flatten Out a Heirarchy using SYS_CONNECT_BY_PATH

In a recent application, we needed to model the Organisational Hierarchy which at its most complex ran to 7 levels deep. This was achieved using a self referencing foreign key (Pigs Ear) similar to that of the EMP table in the Scott schema. In essence, it simply stores the parent / child relationship for each entry in the Hierarchy.

This approach serviced the application very well in that a simply tree walk (CONNECT BY PRIOR) allowed us to construct the Hierarchical tree and bounce our requests off that.

During the production of the Discoverer End User Layer, it became evident that this Hierarchy needed to be flattened out (un-normalized) in order for it to reported on. This is because Discoverer (or any other BI product) does not support the CONNECT BY and START WITH clause. Discoverer needs to know how many levels exist within a Hierarchy and that every thread in the Hierarchy has the same number of levels in order to build a folder structure that can be reported on.

After a bit of investigation, I happened upon the: SYS_CONNECT_BY_PATH new in 9i

This can be used to construct the "Directory Path" for all values in the EMP table i.e. we can construct the reporting path for all employees in the EMP table

For Example:

SELECT empno, ename, job, SYS_CONNECT_BY_PATH (ename, '/') || '/' chain
FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr;


EMPNO ENAME JOB CHAIN
----- ------ ----- ---------------------
7839 KING PRESIDENT /KING/
7566 JONES MANAGER /KING/JONES/
7788 SCOTT ANALYST /KING/JONES/SCOTT/
7876 ADAMS CLERK /KING/JONES/SCOTT/ADAMS/
7902 FORD ANALYST /KING/JONES/FORD/
7369 SMITH CLERK /KING/JONES/FORD/SMITH/
7698 BLAKE MANAGER /KING/BLAKE/
7499 ALLEN SALESMAN /KING/BLAKE/ALLEN/
7521 WARD SALESMAN /KING/BLAKE/WARD/
7654 MARTIN SALESMAN /KING/BLAKE/MARTIN/
7844 TURNER SALESMAN /KING/BLAKE/TURNER/
7900 JAMES CLERK /KING/BLAKE/JAMES/
7782 CLARK MANAGER /KING/CLARK/
7934 MILLER CLERK /KING/CLARK/MILLER/

14 rows selected.



From here, we can simply parse the CHAIN string to work out what our Hierarchy would look like to any number of levels. The following function can be used to construct out flat Hierarchy

CREATE OR REPLACE FUNCTION parse_string (
pv_str_i IN VARCHAR2 DEFAULT NULL,
pv_first_occ IN NUMBER DEFAULT NULL,
pv_last_occ IN NUMBER DEFAULT NULL
)
RETURN VARCHAR2
IS
lv_retval VARCHAR2 (4000);
lv_str VARCHAR2 (4000);
lv_tester VARCHAR2 (4000);
lv_substr_pos1 NUMBER;
lv_substr_pos2 NUMBER;
BEGIN
SELECT INSTR (pv_str_i, '/', 1, pv_first_occ)
INTO lv_substr_pos1
FROM DUAL;

SELECT INSTR (pv_str_i, '/', 1, pv_last_occ)
INTO lv_substr_pos2
FROM DUAL;

lv_tester :=
SUBSTR (pv_str_i, lv_substr_pos1 + 1,
(lv_substr_pos2 - lv_substr_pos1));

lv_str := REPLACE (lv_tester, '/', NULL);

RETURN lv_str;
END parse_string;
/

If we wanted to then create a flat hierarchy to 3 levels deep, we can simply then write a query like follows:

SELECT chain,
NVL (parse_string (chain, 1, 2), '<- ' || ename) level1,
NVL (parse_string (chain, 2, 3), '<- ' || ename) level2,
NVL (parse_string (chain, 3, 4), '<- ' || ename) level3
FROM (SELECT empno, ename, job,
SYS_CONNECT_BY_PATH (ename, '/') || '/' chain
FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr);

CHAIN LEV1 LEV2 LEV3
---------------------- ---- ---- ----
/KING/ KING <- KING <- KING
/KING/JONES/ KING JONES <- JONES
/KING/JONES/SCOTT/ KING JONES SCOTT
/KING/JONES/SCOTT/ADAMS/ KING JONES SCOTT
/KING/JONES/FORD/ KING JONES FORD
/KING/JONES/FORD/SMITH/ KING JONES FORD
/KING/BLAKE/ KING BLAKE <- BLAKE
/KING/BLAKE/ALLEN/ KING BLAKE ALLEN
/KING/BLAKE/WARD/ KING BLAKE WARD
/KING/BLAKE/MARTIN/ KING BLAKE MARTIN
/KING/BLAKE/TURNER/ KING BLAKE TURNER
/KING/BLAKE/JAMES/ KING BLAKE JAMES
/KING/CLARK/ KING CLARK <- CLARK
/KING/CLARK/MILLER/ KING CLARK MILLER

14 rows selected.


And now we have a table that can be used to report on with BI products such as Discoverer

4 comments:

Anonymous said...

Thanks for this, it was very helpful!!

Anonymous said...

Just hunting for a similar solution myself and happened on the code. Gives me some good ideas so thanks..

That said I thought I'd leave you a performance related note about the two queries inside your function:

SELECT INSTR (pv_str_i, '/', 1, pv_first_occ)
INTO lv_substr_pos1
FROM DUAL;

SELECT INSTR (pv_str_i, '/', 1, pv_last_occ)
INTO lv_substr_pos2
FROM DUAL;

Using "select from dual" to reference INSTR is not really necessary as INSTR is a native PL/SQL function and can be called directly within the block.

This will save two parses for each call to the function. Not much if you're running it on EMP but quite a bit if you're referencing large tables and calling it multiple times per record.. :-)

(Plus it's a good habit not to select from dual unless you have to :-> )

Cheers,
Steve.

Duncan Mein said...

Thanks for the comments Steve. I see what you are saying and your are correct. This example was knocked together fairly quickly so I don't doubt that there is room for improvement. I will get it changed once I get a couple of mins.

Regards

Duncan

Kevin Woodard said...

Fantastic solution to a problem that I have been using PhP to resolve for ages.