Tuesday, 14 June 2011

Native String Aggregation in 11gR2

A fairly recent requirement meant that we had to send a bulk email to all users of each department from within our APEX application. We have 5000 records in our users table and the last thing we wanted to do was send 5000 distinct emails (one email per user) for both performance and to be kind on the mail queue / server.

In essence, I wanted to to perform a type of string aggregation where I could group by department and produce a comma delimited sting of all email address of users within that department. With a firm understanding of the requirement, so began the hunt for a solution. Depending on what version of the database you are running, the desired result can be achieved in a couple of ways.

Firstly, the example objects.

CREATE TABLE app_user
(id NUMBER
,dept VARCHAR2 (255)
,username VARCHAR2(255)
,email VARCHAR2(255)
);

INSERT INTO app_user (id, dept, username, email)
VALUES (1,'IT','FRED','fred@mycompany.com');

INSERT INTO app_user (id, dept, username, email)
VALUES (2,'IT','JOE','joe@mycompany.com');

INSERT INTO app_user (id, dept, username, email)
VALUES (3,'SALES','GILL','gill@mycompany.com');

INSERT INTO app_user (id, dept, username, email)
VALUES (4,'HR','EMILY','emily@mycompany.com');

INSERT INTO app_user (id, dept, username, email)
VALUES (5,'HR','BILL','bill@mycompany.com');

INSERT INTO app_user (id, dept, username, email)
VALUES (6,'HR','GUS','gus@mycompany.com');

COMMIT;

If you are using 11gR2, you can expose the new LISTAGG function as follows to perform your string aggregation natively:

SELECT dept
,LISTAGG(email,',') WITHIN GROUP (ORDER BY dept) email_list
FROM app_user
GROUP BY dept;

DEPT EMAIL_LIST
-----------------------------------------------------------------
HR emily@mycompany.com,bill@mycompany.com,gus@mycompany.com
IT fred@mycompany.com,joe@mycompany.com
SALES gill@mycompany.com

If running 11g or earlier, you can achieve the same result using XMLAGG as follows:

SELECT au.dept
,LTRIM
(EXTRACT
(XMLAGG
(XMLELEMENT
("EMAIL",',' || email)),'/EMAIL/text()'), ','
) email_list
FROM app_user au
GROUP BY au.dept;

DEPT EMAIL_LIST
-----------------------------------------------------------------
HR emily@mycompany.com,bill@mycompany.com,gus@mycompany.com
IT fred@mycompany.com,joe@mycompany.com
SALES gill@mycompany.com

The introduction of native string aggregation into 11gR2 is a real bonus and a function that has already proved to have had huge utility within our applications.

Wednesday, 8 June 2011

A Right Pig's Ear of a Circular Reference

If you have ever used a self referencing table within Oracle to store hierarchical data (e.g. an organisations structure), you will have undoubtedly used CONNECT BY PRIOR to build your results tree. This is something we use on pretty much every project as the organisation is very hierarchy based.

Recently, the support cell sent the details of a recent call they received asking me to take a look. Looking down the call, I noticed that the following Oracle Error Message was logged:

"ORA-01436: CONNECT BY loop in user data"

A quick look at the explanation of -01436 and it was clear that there was a circular reference in the organisation table i.e. ORG_UNIT1 was the PARENT of ORG_UNIT2 and ORG_UNIT2 was the PARENT of ORG_UNIT1. In this example, both ORG_UNITS were the child and parent of each other. Clearly this was an issue which was quickly resolved by the addition of a application and server side validation to prevent this from re-occurring.

The outcome of this fix was a useful script that I keep to identify if there are any circular references within a self referencing table. The example below shows this script in action:
CREATE TABLE ORGANISATIONS (ORG_ID NUMBER
, PARENT_ORG_ID NUMBER
, NAME VARCHAR2(100));

INSERT INTO ORGANISATIONS VALUES (1, NULL, 'HQ');
INSERT INTO ORGANISATIONS VALUES (2, 1, 'SALES');
INSERT INTO ORGANISATIONS VALUES (3, 1, 'RESEARCH');
INSERT INTO ORGANISATIONS VALUES (4, 1, 'IT');
INSERT INTO ORGANISATIONS VALUES (5, 2, 'EUROPE');
INSERT INTO ORGANISATIONS VALUES (6, 2, 'ASIA');
INSERT INTO ORGANISATIONS VALUES (7, 2, 'AMERICAS');

COMMIT;

A quick tree walk query shows the visual representation of the hierarchy with no errors:
SELECT LPAD ('*', LEVEL, '*') || name tree
,LEVEL lev
FROM organisations
START WITH org_id = 1
CONNECT BY PRIOR org_id = parent_org_id;

TREE LEV
---------------------
*HQ 1
**SALES 2
***EUROPE 3
***ASIA 3
***AMERICAS 3
**RESEARCH 2
**IT 2

Now lets create a circular reference so that EUROPE is the parent of SALES:
UPDATE ORGANISATIONS SET parent_org_id = 5 WHERE NAME = 'SALES';
COMMIT;

Re-running the query from the very top of the tree completes but gives an incorrect
and incomplete result set:
TREE         LEV
---------------------
*HQ 1
**RESEARCH 2
**IT 2

If you running Oracle Database 9i and backwards, this Experts Exchange article provides a nice procedural solution and a quick mod to the PL/SQL gave me exactly the information I needed:
SET serveroutput on size 20000

DECLARE
l_n NUMBER;
BEGIN
FOR rec IN (SELECT org_id FROM organisations)
LOOP
BEGIN
SELECT COUNT ( * )
INTO l_n
FROM ( SELECT LPAD ('*', LEVEL, '*') || name tree
, LEVEL lev
FROM organisations
START WITH org_id = rec.org_id
CONNECT BY PRIOR org_id = parent_org_id);
EXCEPTION
WHEN OTHERS
THEN
IF SQLCODE = -1436
THEN
DBMS_OUTPUT.put_line
(
rec.org_id
|| ' is part of a Circular Reference'
);
END IF;
END;
END LOOP;
END;
/
As Buzz Killington pointed out in the comments section, Oracle Database 10g onwards introduces CONNECT BY NOCYCLE which will instruct Oracle to return rows even if it is involved in a self referencing loop. When used with the CONNECT_BY_ISCYCLE pseudocolumn, you can easily identify erroneous relationships via SQL without the need to switch to PL/SQL. An example of this can be seen by executing the following query:

WITH
my_data
AS
( SELECT ORG.PARENT_ORG_ID
,ORG.ORG_ID
,ORG.NAME org_name
FROM organisations org
)
SELECT
SYS_CONNECT_BY_PATH (org_name,'/') tree
,PARENT_ORG_ID
,ORG_ID
,CONNECT_BY_ISCYCLE err
FROM
my_data
CONNECT BY NOCYCLE PRIOR org_id = parent_org_id
ORDER BY 4 desc;

TREE PARENT_ORG_ID ORG_ID ERR
------------------------------------------------
/SALES/EUROPE 2 5 1
/EUROPE/SALES 5 2 1
/EUROPE 2 5 0
/EUROPE/SALES/ASIA 2 6 0
/EUROPE/SALES/AMERICAS 2 7 0
/ASIA 2 6 0
/AMERICAS 2 7 0
/SALES 5 2 0
/SALES/ASIA 2 6 0
/SALES/AMERICAS 2 7 0
/HQ 1 0
/HQ/RESEARCH 1 3 0
/HQ/IT 1 4 0
/RESEARCH 1 3 0
/IT 1 4 0

Any value > 0 in the ERR column indicates that the row is involved in a self referencing join. This is a much neater and more performant way to achieve to desired result. Thanks to Buzz for pointing out a much better way to original PL/SQL routine.

Pen Test Tool for APEX

Just a quick plug for a cool Penetration Test tool that we have been using on-site for a few months now. The application is called: Application Express Security Console and developed by a company called Recx Ltd

This can be used to identify areas of you APEX applications that are vulnerable to:
SQL Injection, XSS as well as inadequate access control etc. It kindly suggests ways in which the vulnerability can be addressed as well.

We have built the use of this into our formal release process now and has definitely proved value for money to organisation.