Select distinct doesn't return correct row count

By: Borland Staff

Abstract: Using a plan or creating an index fix the problem

Problem:
Select distinct in version 5.x doesn't return the correct row count.
Here is an example demonstrating the problem using employee.gdb:

connect "c:tempdistinctemployee.gdb" user "sysdba" password 
"masterkey";
set count;
set plan;
/*  This is the select statement that returns duplicate rows */

select 
  Customer
 from
  sales, customer
 where
  sales.cust_no=customer.cust_no and total_value   10000;

PLAN JOIN (CUSTOMER NATURAL,SALES INDEX (RDB$FOREIGN25))
CUSTOMER                  
========================= 

Signature Design          
Signature Design          
Signature Design          
Dallas Technologies       
Dallas Technologies       
Buttle, Griffith and Co.  
Buttle, Griffith and Co.  
Central Bank              
DataServe International   
Anini Vacation Rentals    
MPM Corporation           
Dynamic Intelligence Corp 
3D-Pad Corp.              

Records affected: 13/* 
======================================================*/
/*This is the select distinct statement that returns incorrect row 
count. 9 rows should be returned but 15 are returned instead.*/

select DISTINCT
  Customer
 from
  sales, customer
 where
  sales.cust_no=customer.cust_no and total_value   10000;

PLAN JOIN (CUSTOMER ORDER CUSTNAMEX,SALES INDEX (RDB$FOREIGN25))
CUSTOMER                  
========================= 

3D-Pad Corp.              
Anini Vacation Rentals    
Buttle, Griffith and Co.  
Central Bank              
DT Systems, LTD.          
Dallas Technologies       
DataServe International   
Dynamic Intelligence Corp 
Dyno Consulting           
GeoTech Inc.              
Lorenzi Export, Ltd.      
MPM Corporation           
Max                       
Mrs. Beauvais             
Signature Design          

Records affected: 15

Solution:
This is a bug in version 5.x.  

Try either one of the following suggestions as workaround for the above 
example:

 1)  Add index on the sales table.  Something like this:
 
 create index myindex on sales(cust_no);
 commit;
 
 This forces our optimizer to use a different plan thus return correct
output.
 
 2)  Force a differnt plan to the select query.  Something like this:
 
 select DISTINCT
   Customer
  from
   sales, customer
  where
   sales.cust_no=customer.cust_no and total_value   10000
  PLAN SORT (JOIN (SALES NATURAL,CUSTOMER INDEX (RDB$PRIMARY22)));

This is fixed in InterBase 6.

Server Response from: ETNASC02