Oracle Text Index -- Real time implementation

Problem:

We needed to create a CONTEXT index on a table with around 100 million records, about 120 GB in size. Index creation was taking more than 20 days to complete. On top of the 100 million records, around 50 million more were due to be added, growing the table to about 180 GB, so index creation was expected to take even longer.

The following steps were being used to create the index:

set timing on time on

exec Ctx_Ddl.Create_Preference('SCB', 'BASIC_WORDLIST');
exec ctx_ddl.set_attribute('SCB', 'wildcard_maxterms', 15000);
exec ctx_ddl.set_attribute('SCB', 'substring_index', 'TRUE');

execute CTXSYS.CTX_ADM.SET_PARAMETER ('LOG_DIRECTORY','/oat_lg/rrotmedb/archive/ctx/');
execute CTXSYS.CTX_OUTPUT.START_LOG('g_SCB_IDX_gmis.LOG');
exec ctx_output.add_event(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);

drop index NORKOM56.SCB_TRANS_IDX force;

CREATE INDEX NORKOM56.scb_trans_idx ON NORKOM56.scb_all_transactions
(CONCAT_SEARCH_INFO)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('Sync (on commit) MEMORY 1073741823 wordlist SCBWCP');

Solution:

We changed a few database parameters, added a few preferences, and removed one preference from the list above. The steps are as follows.

1. Add the following db parameters to pfile,

db_writer_processes=4
_log_parallelism_max=6
_log_parallelism=4
log_buffer=26214400
sort_area_size=68157440

Bounce the database.
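
After the restart it can be worth confirming that the instance actually picked up the new values. The query below is a minimal sketch, assuming a session with access to V$PARAMETER; it only checks the documented parameters from the list above and leaves the underscore parameters out.

```sql
-- Sketch: confirm the pfile settings after the bounce.
-- ISDEFAULT = 'FALSE' indicates the value was set explicitly.
SELECT name, value, isdefault
FROM   v$parameter
WHERE  name IN ('db_writer_processes', 'log_buffer', 'sort_area_size')
ORDER  BY name;
```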

2. Create Preferences as below,

a. EXEC Ctx_Ddl.Create_Preference('USE', 'BASIC_WORDLIST');
EXEC ctx_ddl.set_attribute('USE', 'wildcard_maxterms',15000) ;


b. EXEC ctx_ddl.drop_preference('SCB_LEXER');
EXEC ctx_ddl.create_preference('SCB_LEXER','basic_lexer');
EXEC ctx_ddl.set_attribute('SCB_LEXER','printjoins','-.,&/');


c. EXEC ctx_ddl.create_preference('dimp_USE', 'BASIC_STORAGE');
EXEC ctx_ddl.set_attribute('dimp_USE', 'I_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'K_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'N_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'I_INDEX_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
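
Before starting a build that runs for hours, it may help to check that the preferences were created with the intended attributes. This is a sketch only, assuming the preferences were created by the current user; UPPER() is used because the storage preference above is written in mixed case.

```sql
-- Sketch: list the preferences created in step 2 and their attribute values.
SELECT pre_name, pre_class, pre_object
FROM   ctx_user_preferences
WHERE  UPPER(pre_name) IN ('USE', 'SCB_LEXER', 'DIMP_USE');

SELECT prv_preference, prv_attribute, prv_value
FROM   ctx_user_preference_values
WHERE  UPPER(prv_preference) IN ('USE', 'SCB_LEXER', 'DIMP_USE')
ORDER  BY prv_preference, prv_attribute;
```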


3. Enable logging (optional) to monitor index creation. Leaving logging disabled gives a further performance improvement.

EXEC CTXSYS.CTX_ADM.SET_PARAMETER ('LOG_DIRECTORY','/oat_lg/rrotmedb/archive/mithun/');
EXEC CTXSYS.CTX_OUTPUT.START_LOG('g_SCB_IDX_mYth1.LOG');
EXEC CTX_OUTPUT.ADD_EVENT(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);

4. Create index.

CREATE INDEX SCB_TRANS_IDX ON NORKOM56.transactions
(CONCAT_SEARCH_INFO)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('Sync (on commit) WORDLIST USE STORAGE dimp_USE STOPLIST CTXSYS.EMPTY_STOPLIST LEXER SCB_LEXER');
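
Once CREATE INDEX returns, a few follow-up checks can confirm that the build is usable. The statements below are a hedged sketch, not part of the original procedure: they assume the index lives in the current schema, that logging was started in the same session as in step 3, and the search term 'PAYMENT' is purely hypothetical.

```sql
-- Stop the Oracle Text log started in step 3 (same session as START_LOG).
EXEC CTX_OUTPUT.END_LOG;

-- Index status and number of documents indexed.
SELECT idx_name, idx_status, idx_docid_count
FROM   ctx_user_indexes
WHERE  idx_name = 'SCB_TRANS_IDX';

-- Rows that failed to index, if any.
SELECT err_timestamp, err_textkey, err_text
FROM   ctx_user_index_errors
WHERE  err_index_name = 'SCB_TRANS_IDX'
ORDER  BY err_timestamp DESC;

-- Smoke test: a CONTAINS query against the indexed column.
-- 'PAYMENT' is a made-up term; substitute one known to exist in the data.
SELECT COUNT(*)
FROM   NORKOM56.transactions
WHERE  CONTAINS(CONCAT_SEARCH_INFO, 'PAYMENT') > 0;
```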



As already explained and demonstrated to your team, there were a lot of waits on redo log generation and on the database writer, so I increased the related parameters on the database. You can keep these settings to get better performance, not only for index creation but also for your day-to-day processing. Please test these parameters thoroughly if you want to continue using them in production.
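
To see whether redo and database writer waits still dominate on your own instance, the cumulative wait statistics can be inspected. This is a sketch under the assumption that these standard wait events are the relevant ones; the post itself does not list which events were observed.

```sql
-- Sketch: cumulative waits since instance startup for redo and DBWR related events.
-- TIME_WAITED is reported in hundredths of a second.
SELECT event, total_waits, time_waited
FROM   v$system_event
WHERE  event IN ('log file sync',
                 'log file parallel write',
                 'log buffer space',
                 'log file switch completion',
                 'db file parallel write',
                 'free buffer waits')
ORDER  BY time_waited DESC;
```

Comparing the output before and after an index build gives a rough picture of where the time went.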

We removed the substring_index setting from the wordlist preference, since it adds to the bottleneck of any DML operation on the table. We also introduced an empty stoplist: by default, index creation was using the Oracle default stoplist, which creates an unnecessary bottleneck during index creation. As per your new requirement, I have also added the LEXER properties.
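
One way to confirm that the rebuilt index picked up the intended preferences and no longer carries the substring table is to look at the Oracle Text dictionary views. This is a sketch only; it assumes the index was created in the current schema under the name SCB_TRANS_IDX, as in step 4.

```sql
-- Preference objects and attribute values recorded for the index.
SELECT ixv_class, ixv_object, ixv_attribute, ixv_value
FROM   ctx_user_index_values
WHERE  ixv_index_name = 'SCB_TRANS_IDX'
ORDER  BY ixv_class, ixv_attribute;

-- Internal DR$ tables of the index.
-- A $P table is present only when substring_index is TRUE.
SELECT table_name
FROM   user_tables
WHERE  table_name LIKE 'DR$SCB_TRANS_IDX$%'
ORDER  BY table_name;
```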


With all these changes, index creation for 100 million rows now takes around 10 hours, and for 150 million records around 18 hours. This was tested on 2 test instances with hardware equivalent to production: 8 dual-core CPUs and 32 GB RAM.

I was able to reduce the time from >20 days to less than 18 hours with 150 million records.
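
For a rough sense of scale, the figures above can be turned into indexing rates. The arithmetic below is a sketch using the rounded numbers quoted in this post; the "before" rate is an upper bound, since the original build took more than 20 days.

```sql
-- Approximate indexing throughput, in rows per second, from the timings above.
SELECT ROUND(100000000 / (10 * 3600))      AS rows_per_sec_100m_after,   -- about 2778
       ROUND(150000000 / (18 * 3600))      AS rows_per_sec_150m_after,   -- about 2315
       ROUND(100000000 / (20 * 24 * 3600)) AS rows_per_sec_100m_before,  -- about 58
       ROUND((20 * 24) / 10)               AS speedup_100m               -- 48x
FROM   dual;
```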

Search performance is still an ongoing issue; I will be posting details about Oracle Text and related performance issues shortly.


Comments

Anonymous said…
"Create a context index on a table which has around 100 million records adds upto a size of 120GB. Index creation takes more than 20 days to complete"

Are you kidding?

Anyway the information in your site
is useful. Keep up the good work.

Regards,
Boris
Mithun Ashok said…
Hi Boris,

Truth is greater than fiction: it was indeed taking 20 days to complete, or rather it never reached completion, and I am not kidding.

And thanks for your comments.
Mithun
