
Text Index -- Printjoins

I am putting together a test case for one of my clients who is having issues with an Oracle Text index.

A Text index comes with a lot of options for language, lexer and so on, so it is not as simple as creating a normal index. In my experience I have seen multiple issues with Text indexes, and one of them involves printjoins.

Details of Text index and its options can be found at,

http://download.oracle.com/docs/cd/B19306_01/text.102/b14218.pdf

and look at my other post on text index,

http://mithunashok.blogspot.com/2009/10/oracle-text-index-real-time-experience.html

The user complains that when they search for a string containing the . character, the search does not treat . as part of the text. Look at the example below (tested on 10.2.0.1).


create table mith( n number primary key, l varchar2(60));

insert into mith values( 1, 'nisha.mithun');
insert into mith values( 2, 'nisha mithun');
insert into mith values( 3, 'nisha mithun nisha mithun');
insert into mith values( 4, 'hello-world');
insert into mith values( 5, 'hello@world');

insert into mith values( 6, 'hello world. world');


create index ctx_mith on mith(l)
indextype is ctxsys.context;


select * from mith where contains( l, 'nisha.mithun') > 0;

N L
---------- ------------------------------------------------------------
1 nisha.mithun
2 nisha mithun
3 nisha mithun nisha mithun



Ideally it should have returned only nisha.mithun, but it has returned every occurrence of nisha mithun. Let's look at the index description. You can use the following SQL to describe an index.


SELECT CTX_REPORT.DESCRIBE_INDEX ('CTX_MITH') FROM DUAL;

===========================================================================
INDEX DESCRIPTION
===========================================================================
index name: "TESTINDEX"."CTX_MITH"
index id: 1060
index type: context

base table: "TESTINDEX"."MITH"
primary key column: N
text column: L
text column type: VARCHAR2(60)

CTX_REPORT.DESCRIBE_INDEX('CTX_MITH')
--------------------------------------------------------------------------------
language column:
format column:
charset column:


===========================================================================
INDEX OBJECTS
===========================================================================
datastore: DIRECT_DATASTORE

filter: NULL_FILTER

CTX_REPORT.DESCRIBE_INDEX('CTX_MITH')
--------------------------------------------------------------------------------

section group: NULL_SECTION_GROUP

lexer: BASIC_LEXER

wordlist: BASIC_WORDLIST
stemmer: ENGLISH
fuzzy_match: GENERIC

stoplist: BASIC_STOPLIST
stop_word: Mr



Note that the lexer is BASIC_LEXER by default, and the wordlist is BASIC_WORDLIST with fuzzy_match set to GENERIC.

Now let's change a few options and check whether the search returns the correct result.
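
CTX_REPORT.DESCRIBE_INDEX returns a CLOB, so SQL*Plus may truncate the report or break it across pages (the column heading repeated in the middle of the output above is such a page break). A minimal sketch of the session settings I would try, assuming SQL*Plus as the client and the index name CTX_MITH shown in the report; the numeric values are arbitrary:

```sql
-- Show up to 100000 characters of a CLOB column instead of the default 80.
SET LONG 100000
SET LONGCHUNKSIZE 100000
-- PAGESIZE 0 suppresses page breaks and repeated column headings.
SET PAGESIZE 0
SET LINESIZE 200

SELECT CTX_REPORT.DESCRIBE_INDEX('CTX_MITH') FROM DUAL;
```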



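-- Drop the first index: a column cannot carry two CONTEXT indexes (ORA-29879).
DROP INDEX ctx_mith;

-- drop_preference reports an error if the preference does not exist yet; on a first run that can be ignored.
EXEC Ctx_Ddl.drop_preference('SSUSE');
EXEC Ctx_Ddl.Create_Preference('SSUSE', 'BASIC_WORDLIST');

EXEC ctx_ddl.drop_preference('SS_SCB_LEXER');
EXEC ctx_ddl.create_preference('SS_SCB_LEXER','basic_lexer');
-- Index these characters as part of the token instead of treating them as separators.
EXEC ctx_ddl.set_attribute('SS_SCB_LEXER','printjoins','-.,&/');

CREATE INDEX ctx_mith1 ON mith (l) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('WORDLIST SSUSE STOPLIST CTXSYS.EMPTY_STOPLIST LEXER SS_SCB_LEXER');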


SQL> select * from mith where contains( l, 'nisha.mithun') > 0;

N L
---------- ------------------------------------------------------------
1 nisha.mithun



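Now it returns the result as expected. By default basic_lexer treats . as a word separator and not as part of the text, so nisha.mithun is indexed (and searched) as the two words nisha and mithun. If you want such characters to be treated as part of the text, use the printjoins attribute of basic_lexer.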
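
One way to see what printjoins actually changed is to look at the tokens the index stored. Oracle Text keeps them in a table named DR$<index name>$I in the index owner's schema, so for the index above that would be DR$CTX_MITH1$I. A minimal sketch, assuming you are connected as the owner of ctx_mith1:

```sql
-- List the tokens stored for ctx_mith1.
-- With '.' in printjoins, the text 'nisha.mithun' should show up as a single
-- token that still contains the dot, rather than as two separate tokens.
SELECT token_text, token_count
FROM   dr$ctx_mith1$i
ORDER  BY token_text;
```

Tokens are stored in upper case unless the lexer's mixed_case attribute is set, so expect the values in upper case.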


Now since this works, let me insert one more row,


SQL> insert into mith values( 7, 'nisha. mithun');

1 row created.
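
Note that ctx_mith1 was created without a SYNC clause, and in that case a CONTEXT index does not pick up new rows on its own: the row becomes searchable only after the index is synchronized. A minimal sketch of that step, assuming the index name ctx_mith1 from above:

```sql
COMMIT;

-- Rows waiting to be synchronized are listed in this view.
SELECT pnd_index_name, pnd_rowid FROM ctx_user_pending;

-- Process the pending rows so that row 7 becomes visible to CONTAINS.
EXEC ctx_ddl.sync_index('ctx_mith1');
```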

SQL> select * from mith where contains( l, 'nisha. mithun') > 0;

N L
---------- ------------------------------------------------------------
2 nisha mithun
3 nisha mithun nisha mithun good
7 nisha. mithun



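Surprising! The . at the end of 'nisha.' is dropped: . is also one of the lexer's default punctuation characters, and a character that is both a punctuation and a printjoin is removed when it is the last character of a token. So 'nisha. mithun' is indexed and searched as the phrase nisha mithun.

And there is more. The same rule applies to whitespace as to the '.' character. Look at the following query,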


SQL> select * from mith where contains( l, 'nisha mithun') > 0;

N L
---------- ------------------------------------------------------------
2 nisha mithun
3 nisha mithun nisha mithun good
7 nisha. mithun
8 nisha mithun
9 nisha mithun

SQL> select * from mith where contains( l, 'nisha mithun') > 0;

N L
---------- ------------------------------------------------------------
2 nisha mithun
3 nisha mithun nisha mithun good
7 nisha. mithun
8 nisha mithun
9 nisha mithun



To handle this, include ' ' (a space) in printjoins, recreate the index, and see the result below.


EXEC ctx_ddl.drop_preference('SS_SCB_LEXER');
EXEC ctx_ddl.create_preference('SS_SCB_LEXER','basic_lexer');
EXEC ctx_ddl.set_attribute('SS_SCB_LEXER','printjoins','-.,&/ ');

drop index ctx_mith1;


CREATE INDEX ctx_mith1 ON mith (l) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('WORDLIST SSUSE STOPLIST CTXSYS.EMPTY_STOPLIST LEXER SS_SCB_LEXER SYNC (ON COMMIT)');


SQL> select * from mith where contains( l, 'nisha.\ mithun') > 0;

N L
---------- ------------------------------------------------------------
7 nisha. mithun
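
To double-check which characters a lexer preference treats as printjoins, the attribute values can be read back from the CTX_USER_PREFERENCE_VALUES view. A minimal sketch, assuming the preference SS_SCB_LEXER created above is owned by the current user:

```sql
-- Show every attribute that was explicitly set on the lexer preference.
SELECT prv_attribute, prv_value
FROM   ctx_user_preference_values
WHERE  prv_preference = 'SS_SCB_LEXER';
```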
