
Oracle Text Index -- Real time implementation

Problem:

The task was to create a CONTEXT index on a table with around 100 million records, adding up to about 120GB. Index creation took more than 20 days to complete. On top of the 100 million records, around 50 million more were due to be added, growing the table to about 180GB, at which point index creation was expected to take even longer.

The following steps were being used to create the index:

set timing on time on

exec Ctx_Ddl.Create_Preference('SCB', 'BASIC_WORDLIST');

exec ctx_ddl.set_attribute('SCB', 'wildcard_maxterms', 15000) ;

exec ctx_ddl.set_attribute('SCB', 'substring_index', 'TRUE') ;

execute CTXSYS.CTX_ADM.SET_PARAMETER ('LOG_DIRECTORY','/oat_lg/rrotmedb/archive/ctx/');

execute CTXSYS.CTX_OUTPUT.START_LOG('g_SCB_IDX_gmis.LOG');

exec ctx_output.add_event(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);

drop index NORKOM56.SCB_TRANS_IDX force;

CREATE INDEX NORKOM56.scb_trans_idx ON NORKOM56.scb_all_transactions

(CONCAT_SEARCH_INFO)

INDEXTYPE IS CTXSYS.CONTEXT

PARAMETERS('Sync (on commit) MEMORY 1073741823 wordlist SCBWCP');

Solution:

We changed a few database parameters, added a few preferences, and removed one preference from the list above. The steps are as follows:

1. Add the following db parameters to pfile,

db_writer_processes=4
_log_parallelism_max=6
_log_parallelism=4
log_buffer=26214400
sort_area_size=68157440

Bounce the database.
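
After the restart, it can be worth confirming that the new values were actually picked up. A minimal sketch, assuming you have SELECT access to V$PARAMETER; it only covers the three documented parameters from the list above and leaves the underscore parameters out:

```sql
-- Sketch: confirm the documented parameters from step 1 after the bounce.
-- The underscore (hidden) parameters are deliberately not covered here.
SELECT name, value
  FROM v$parameter
 WHERE name IN ('db_writer_processes', 'log_buffer', 'sort_area_size')
 ORDER BY name;
```

If a value differs from what was put in the pfile, the instance was probably started from a different parameter file.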

2. Create Preferences as below,

a. EXEC Ctx_Ddl.Create_Preference('USE', 'BASIC_WORDLIST');
EXEC ctx_ddl.set_attribute('USE', 'wildcard_maxterms',15000) ;


b. EXEC ctx_ddl.drop_preference('SCB_LEXER');
EXEC ctx_ddl.create_preference('SCB_LEXER','basic_lexer');
EXEC ctx_ddl.set_attribute('SCB_LEXER','printjoins','-.,&/');


c. EXEC ctx_ddl.create_preference('dimp_USE', 'BASIC_STORAGE');
EXEC ctx_ddl.set_attribute('dimp_USE', 'I_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'K_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'N_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'I_INDEX_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');


3. Enable logging (optional) to monitor index creation. Skipping this step improves performance even further.

EXEC CTXSYS.CTX_ADM.SET_PARAMETER ('LOG_DIRECTORY','/oat_lg/rrotmedb/archive/mithun/');
EXEC CTXSYS.CTX_OUTPUT.START_LOG('g_SCB_IDX_mYth1.LOG');
EXEC CTX_OUTPUT.ADD_EVENT(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);

4. Create index.

CREATE INDEX SCB_TRANS_IDX ON NORKOM56.transactions
(CONCAT_SEARCH_INFO)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('Sync (on commit) WORDLIST USE STORAGE dimp_USE STOPLIST CTXSYS.EMPTY_STOPLIST LEXER SCB_LEXER');
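
Once the CREATE INDEX in step 4 returns, a quick sanity check might look like the sketch below. It assumes the index is owned by the current user (it was created without a schema prefix) and that logging from step 3 was enabled. The search term 'payment' is purely hypothetical; substitute a word you know exists in your data.

```sql
-- Sketch: post-build checks for the index created in step 4.

-- 1. Close the log started in step 3 (skip if logging was not enabled).
EXEC CTX_OUTPUT.END_LOG;

-- 2. Look for rows that failed to index.
SELECT err_index_name, err_timestamp, err_textkey, err_text
  FROM ctx_user_index_errors
 WHERE err_index_name = 'SCB_TRANS_IDX'
 ORDER BY err_timestamp DESC;

-- 3. Run a test search; 'payment' is a hypothetical term.
SELECT COUNT(*)
  FROM NORKOM56.transactions
 WHERE CONTAINS(CONCAT_SEARCH_INFO, 'payment') > 0;
```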



As already explained and demonstrated to your team, there were many waits on redo log generation and the database writer, so I have increased the related parameters on the database. You can continue to use these settings to get better performance not only for index creation but also for your day-to-day processing. Please test these parameters thoroughly if you want to keep using them in production.
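
To see whether redo and database writer waits are still significant on your own instance, something along these lines can be run before and after the parameter change. This is a sketch rather than the exact diagnosis used here; the list of events is my assumption about which waits are relevant, and the figures are cumulative since instance startup.

```sql
-- Sketch: cumulative redo / DB writer related waits since instance startup.
-- The event list is an assumption; adjust it to what your instance shows.
SELECT event, total_waits, time_waited
  FROM v$system_event
 WHERE event IN ('log file sync',
                 'log file parallel write',
                 'log buffer space',
                 'db file parallel write',
                 'free buffer waits')
 ORDER BY time_waited DESC;
```

Comparing two snapshots taken around an index build gives a rough idea of how much time went to these waits.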

We removed the substring_index preference setting from index creation, as it adds to the bottleneck of any DML operation on the table. We also introduced an empty stoplist: by default, index creation was using the Oracle default stoplist, which created an unnecessary bottleneck during index creation. As per your new requirement, I have also added LEXER properties.
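
To confirm which wordlist, lexer, stoplist and storage settings the index actually ended up with, the Oracle Text dictionary views can be queried. A minimal sketch, assuming the index is owned by the current user:

```sql
-- Sketch: list the attribute values recorded for the index.
-- Look at the wordlist, lexer and storage rows to confirm the
-- settings from step 2 were applied.
SELECT ixv_class, ixv_object, ixv_attribute, ixv_value
  FROM ctx_user_index_values
 WHERE ixv_index_name = 'SCB_TRANS_IDX'
 ORDER BY ixv_class, ixv_attribute;
```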


With all these changes, index creation for 100 million rows now takes around 10 hours, and for 150 million records it takes around 18 hours. This was tested on 2 test instances with hardware resources equivalent to production: 8 dual-core CPUs and 32 GB RAM.

I was able to reduce the time from >20 days to less than 18 hours with 150 million records.

Search performance is still an ongoing issue; I will be posting details about Oracle Text and other related performance issues shortly.


Comments

Anonymous said…
"Create a context index on a table which has around 100 million records adds upto a size of 120GB. Index creation takes more than 20 days to complete"

Are you kidding?

Anyway the information in your site
is useful. Keep up the good work.

Regards,
Boris
Mithun Ashok said…
Hi Boris,

Truth is greater than fiction. Indeed, it was taking 20 days to complete, or rather it never reached completion, and I am not kidding.

And thanks for your comments.
Mithun
