Problem:
Create a context index on a table which has around 100 million records adds upto a size of 120GB. Index creation takes more than 20 days to complete. Along with 100 million records there are around 50 million more records which will be added to this table which makes the table grow to a size of 180GB and with this data index creation is expected to take even more time.
Following steps were being used to create the index.
Solution:
We have made changes to few db parameters and also added few preferences and removed a preference from above list. Steps are as below,
1. Add the following db parameters to pfile,
db_writer_processes=4
_log_parallelism_max=6
_log_parallelism=4
log_buffer=26214400
sort_area_zise=68157440
Bounce the database.
2. Create Preferences as below,
a. EXEC Ctx_Ddl.Create_Preference('USE', 'BASIC_WORDLIST');
EXEC ctx_ddl.set_attribute('USE', 'wildcard_maxterms',15000) ;
b. EXEC ctx_ddl.drop_preference('SCB_LEXER');
EXEC ctx_ddl.create_preference('SCB_LEXER','basic_lexer');
EXEC ctx_ddl.set_attribute('SCB_LEXER','printjoins','-.,&/');
c. EXEC ctx_ddl.create_preference('dimp_USE', 'BASIC_STORAGE');
EXEC ctx_ddl.set_attribute('dimp_USE', 'I_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'K_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'N_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'I_INDEX_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
3. Enable log (Optional) to monitor index creation. Disabling this will give even more performance improvement.
EXEC CTXSYS.CTX_ADM.SET_PARAMETER ('LOG_DIRECTORY','/oat_lg/rrotmedb/archive/mithun/');
EXEC CTXSYS.CTX_OUTPUT.START_LOG('g_SCB_IDX_mYth1.LOG');
EXEC CTX_OUTPUT.ADD_EVENT(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);
4. Create index.
CREATE INDEX SCB_TRANS_IDX ON NORKOM56.transactions
(CONCAT_SEARCH_INFO)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('Sync (on commit) WORDLIST USE STORAGE dimp_USE STOPLIST CTXSYS.EMPTY_STOPLIST LEXER SCB_LEXER');
As already explained and demonstrated to your team, There were lots of wait on redo log creation and dbwrite so have increased the parameters on db. You can continue to use this to get better performance not only for index creation but for your day to day processing. Please test these parameters throughly if you want to continue using these parameters in production.
We have removed preference setting substring_index while index creations as this adds up to the bottleneck of any DML operation on the table. Also have introduce and empty stop list, by default index creation was using oracle stop list which creates unnecessary bottleneck while index creation. Also as per your new requirement i have added LEXER properties as well.
With all the changes, index creation for 100 million rows now takes around 10 hours and on 150 million records it takes around 18 hours. These have been tested on 2 test instances which have equivalent hardware resources as production 8 dual core CPU's and 32 GB RAM.
I was able to reduce the time from >20 days to less than 18 hours with 150 million records.
This is still an ongoing issue with search performance, will be posting details about Oracle Text and other performance issues related to Oracle Text shortly.
Create a context index on a table which has around 100 million records adds upto a size of 120GB. Index creation takes more than 20 days to complete. Along with 100 million records there are around 50 million more records which will be added to this table which makes the table grow to a size of 180GB and with this data index creation is expected to take even more time.
Following steps were being used to create the index.
set timing on time on
exec Ctx_Ddl.Create_Preference('SCB', 'BASIC_WORDLIST');
exec ctx_ddl.set_attribute('SCB', 'wildcard_maxterms',1 5000) ;
exec ctx_ddl.set_attribute('SCB', 'substring_index', 'TRUE') ;
execute CTXSYS.CTX_ADM.SET_PARAMETER ('LOG_DIRECTORY','/oat_lg/rrotmedb/archive/ctx/');
execute CTXSYS.CTX_OUTPUT.START_LOG('g_SCB_IDX_gmis.LOG');
exec ctx_output.add_event(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);
drop index NORKOM56.SCB_TRANS_IDX force;
CREATE INDEX NORKOM56.scb_trans_idx ON NORKOM56.scb_all_transactions
(CONCAT_SEARCH_INFO)
INDEXTYPE IS CTXSYS.CONTEXT
Solution:
We have made changes to few db parameters and also added few preferences and removed a preference from above list. Steps are as below,
1. Add the following db parameters to pfile,
db_writer_processes=4
_log_parallelism_max=6
_log_parallelism=4
log_buffer=26214400
sort_area_zise=68157440
Bounce the database.
2. Create Preferences as below,
a. EXEC Ctx_Ddl.Create_Preference('USE', 'BASIC_WORDLIST');
EXEC ctx_ddl.set_attribute('USE', 'wildcard_maxterms',15000) ;
b. EXEC ctx_ddl.drop_preference('SCB_LEXER');
EXEC ctx_ddl.create_preference('SCB_LEXER','basic_lexer');
EXEC ctx_ddl.set_attribute('SCB_LEXER','printjoins','-.,&/');
c. EXEC ctx_ddl.create_preference('dimp_USE', 'BASIC_STORAGE');
EXEC ctx_ddl.set_attribute('dimp_USE', 'I_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'K_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'N_TABLE_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
EXEC ctx_ddl.set_attribute('dimp_USE', 'I_INDEX_CLAUSE','tablespace TME_SGHK_CM_DATA01 STORAGE (INITIAL 10M)');
3. Enable log (Optional) to monitor index creation. Disabling this will give even more performance improvement.
EXEC CTXSYS.CTX_ADM.SET_PARAMETER ('LOG_DIRECTORY','/oat_lg/rrotmedb/archive/mithun/');
EXEC CTXSYS.CTX_OUTPUT.START_LOG('g_SCB_IDX_mYth1.LOG');
EXEC CTX_OUTPUT.ADD_EVENT(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);
4. Create index.
CREATE INDEX SCB_TRANS_IDX ON NORKOM56.transactions
(CONCAT_SEARCH_INFO)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('Sync (on commit) WORDLIST USE STORAGE dimp_USE STOPLIST CTXSYS.EMPTY_STOPLIST LEXER SCB_LEXER');
As already explained and demonstrated to your team, There were lots of wait on redo log creation and dbwrite so have increased the parameters on db. You can continue to use this to get better performance not only for index creation but for your day to day processing. Please test these parameters throughly if you want to continue using these parameters in production.
We have removed preference setting substring_index while index creations as this adds up to the bottleneck of any DML operation on the table. Also have introduce and empty stop list, by default index creation was using oracle stop list which creates unnecessary bottleneck while index creation. Also as per your new requirement i have added LEXER properties as well.
With all the changes, index creation for 100 million rows now takes around 10 hours and on 150 million records it takes around 18 hours. These have been tested on 2 test instances which have equivalent hardware resources as production 8 dual core CPU's and 32 GB RAM.
I was able to reduce the time from >20 days to less than 18 hours with 150 million records.
This is still an ongoing issue with search performance, will be posting details about Oracle Text and other performance issues related to Oracle Text shortly.
Comments
Are you kidding?
Anyway the information in your site
is useful. Keep up the good work.
Regards,
Boris
Truth is greater than fiction, indeed it was taking 20 days to complete rather it never reached completion and I am not kidding.
And thanks for your comments.
Mithun