Deleting Duplicate Rows in Oracle

by Rakesh on January 16, 2013

p>Deleting duplicate rows from tables is one of the common task to oracle developers. The data get duplicate because of missing primary/unique key on the table or batch file getting loaded multiple times. Here I have tried to summarize different ways of deleting this duplicated data. Please note that this is not an extensive list of all available methods but the ones I was able to figure out.

CREATE TABLE tbl_test(
        SER_NO NUMBER,
        FST_NM VARCHAR2(30),
        DEPTID NUMBER,
        CMNT   VARCHAR2(30));

INSERT INTO tbl_test VALUES(1, 'aaaaa', 2004, 'xxx');
INSERT INTO tbl_test VALUES(2, 'bbbbb', 2005, 'yyy');
INSERT INTO tbl_test VALUES(1, 'aaaaa', 2004, 'xxx');
INSERT INTO tbl_test VALUES(1, 'aaaaa', 2004, 'xxx');
INSERT INTO tbl_test VALUES(3, 'ccccc', 2005, 'zzz');
INSERT INTO tbl_test VALUES(2, 'bbbbb', 2005, 'yyy');

1. Using MIN(rowid) : The most common method of removing duplicate rows.

DELETE FROM tbl_test
           WHERE ROWID NOT IN (SELECT   MIN (ROWID)
                                          FROM tbl_test
                                    GROUP BY ser_no, fst_nm, deptid, cmnt);

Comment: This method will take hours & hours if the many record in table (records in million). 2. Using MIN(rowid) & Join: More or less the same as first one

DELETE FROM tbl_test t
   WHERE t.ROWID NOT IN (SELECT MIN (b.ROWID)
                         FROM tbl_test b
                         WHERE b.ser_no = t.ser_no
                         AND b.fst_nm = t.fst_nm
                         AND b.deptid = t.deptid
                         AND b.cmnt   = t.cmnt);

3. Using Subquery: This is an interesting one

DELETE FROM tbl_test a
   WHERE (a.ser_no, a.fst_nm, a.deptid, a.cmnt) IN (SELECT b.ser_no, b.fst_nm, b.deptid, b.cmnt
                                                   FROM tbl_test b
                                                   WHERE a.ser_no = b.ser_no
                                                     AND a.fst_nm = b.fst_nm
                                                     AND a.deptid = b.deptid
                                                     AND a.cmnt   = b.cmnt
                                                     AND a.ROWID  > b.ROWID);

Comment: Will work but for large tables, this is not efficient. 4. Using Analytic Fucntions:

DELETE FROM tbl_test
   WHERE ROWID IN (SELECT rid
         FROM (SELECT ROWID rid,
              ROW_NUMBER () OVER (PARTITION BY ser_no, fst_nm, deptid, cmnt ORDER BY ROWID) rn
              FROM tbl_test)
              WHERE rn <> 1);

Comments: This is by far one of the best solutions if the table is really really large. Using the invaluable power of Analytics. Enjoy!!!!

zp8497586rq

Comments on this entry are closed.

Previous post:

Next post:

Güncel Haberler