Problem:
Select distinct in version 5.x doesn't return the correct row count.
Here is an example demonstrating the problem using employee.gdb:
connect "c:tempdistinctemployee.gdb" user "sysdba" password
"masterkey";
set count;
set plan;
/* This is the select statement that returns duplicate rows */
select
Customer
from
sales, customer
where
sales.cust_no=customer.cust_no and total_value 10000;
PLAN JOIN (CUSTOMER NATURAL,SALES INDEX (RDB$FOREIGN25))
CUSTOMER
=========================
Signature Design
Signature Design
Signature Design
Dallas Technologies
Dallas Technologies
Buttle, Griffith and Co.
Buttle, Griffith and Co.
Central Bank
DataServe International
Anini Vacation Rentals
MPM Corporation
Dynamic Intelligence Corp
3D-Pad Corp.
Records affected: 13/*
======================================================*/
/*This is the select distinct statement that returns incorrect row
count. 9 rows should be returned but 15 are returned instead.*/
select DISTINCT
Customer
from
sales, customer
where
sales.cust_no=customer.cust_no and total_value 10000;
PLAN JOIN (CUSTOMER ORDER CUSTNAMEX,SALES INDEX (RDB$FOREIGN25))
CUSTOMER
=========================
3D-Pad Corp.
Anini Vacation Rentals
Buttle, Griffith and Co.
Central Bank
DT Systems, LTD.
Dallas Technologies
DataServe International
Dynamic Intelligence Corp
Dyno Consulting
GeoTech Inc.
Lorenzi Export, Ltd.
MPM Corporation
Max
Mrs. Beauvais
Signature Design
Records affected: 15
Solution:
This is a bug in version 5.x.
Try either one of the following suggestions as workaround for the above
example:
1) Add index on the sales table. Something like this:
create index myindex on sales(cust_no);
commit;
This forces our optimizer to use a different plan thus return correct
output.
2) Force a differnt plan to the select query. Something like this:
select DISTINCT
Customer
from
sales, customer
where
sales.cust_no=customer.cust_no and total_value 10000
PLAN SORT (JOIN (SALES NATURAL,CUSTOMER INDEX (RDB$PRIMARY22)));
This is fixed in InterBase 6.