This is the technical paper from a talk given at the 12th Annual Borland Developer's Conference
by Charlie Caro, Borland Software Corporation.
Charlie Caro is a software engineer with Borland InterBase Research and Development. Prior to his 13 years of database development, Charlie developed high-performance file systems and operating systems for transaction-based environments. Charlie holds a bachelor's degree in engineering/computer science from Harvard University, Cambridge, Massachusetts. He has also studied in the master's program in computer science at Boston University.
Note: No portion of this paper may be distributed or reproduced by any means, or in any form, without the author's and/or Borland's prior written permission.
Abstract
This paper describes a methodology for adding SQL functions to the InterBase
engine. SQL functions
can be scalar functions or aggregate functions. Scalar functions take
a list of scalar arguments and
evaluate to a scalar value. A scalar value belongs to a basic, atomic
datatype as defined by the SQL
standard. Aggregate functions reduce a set of scalar values to a single
scalar value. SQL scalar
expressions apply scalar operators to scalar operands to produce a
scalar value. Therefore, SQL
function references can be used to form arbitrary SQL scalar expressions.
There are two different approaches to this goal, involving tradeoffs in performance,
footprint and the engineering effort required to implement an SQL function. The more
fundamental of the two approaches involves the definition of new verbs in BLR (Binary
Language Representation), a low-level, byte-coded relational algebra.
The other approach uses System Defined Functions, or SDFs, which are built-in, preloaded
functions layered on top of the UDF (User Defined Function) call mechanism. This mechanism
is not currently supported in the engine, but a prototype approach is suggested in this paper.
Depending on the characteristics of the added SQL functionality, either
of the approaches may be
a viable choice. The author explains some of the criteria that should
be addressed to decide which
technique may be more suitable to the task at hand.
Overview of the SQL Compilation Process
Before either of the two techniques can be discussed, it's necessary to
have an understanding of the
process by which SQL statements are compiled into a form that is suitable
for execution by the
InterBase database engine. The compilation process involves a translation
of the source SQL
language into an equivalent request expressed in the target BLR language.
To that end, the following compilation steps need to be reviewed:

SQL Analysis phase:
- Linear analysis of the SQL statement (lexical)
- Hierarchical analysis of the SQL statement (syntax or parsing)
- Semantic analysis of the SQL parse tree (correctness and annotation)

SQL Synthesis phase:
- Code generation of BLR from the annotated parse tree

BLR Compilation passes:
- BLR recursive-descent parsing (generating the execution node tree)
- Semantic execution tree pass (view substitution, resolving defaults and computed fields)
- Storage allocation pass (impure space reservation)
- Query optimization pass (join optimization and matching streams to access methods)
- Security checking (authorization and access control)
SQL Analysis Phase
The SQL analysis phase is responsible for revealing the underlying grammatical structure of
the passed-in SQL statement. The heart of the process is InterBase's SQL parser, which is
embodied in the module DSQL/PARSE.Y. The file is a notational specification of an SQL
grammar which is input to the YACC compiler, a parser generator tool. The output of the
YACC compiler is a C source code file which can parse an input SQL statement and generate
a parse tree for it.
SQL Lexical Analysis
The lexical analyzer is responsible for generating a stream of tokens
from the sequence of
input characters comprising the SQL statement. It performs a simple
left-to-right scan of the
input and groups characters which have a cohesive, collective meaning.
These groups of
characters are called tokens. Tokens include identifiers, keywords,
numbers, operators
and miscellaneous punctuation symbols.
The lexical analyzer performs other ancillary actions to simplify the
work of the parser.
Rather than complicate the grammar (and hence the parser) with the
complexity of white
space, the lexical analyzer silently strips unnecessary white space
and SQL comments
from the input character stream.
InterBase's lexical analyzer is built into its parser and implemented as a simple co-routine
which is called on demand by the parser to generate the next token. In this sense, the lexical
scanner is a passive participant.
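To make the co-routine style concrete, here is a minimal, self-contained sketch of an
on-demand tokenizer. It is illustrative only, not InterBase's actual yylex(); all of the
names in it are hypothetical:

#include <ctype.h>

typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT } TOKEN;

/* Each call consumes just enough input to return one token, silently
   stripping white space -- the passive, called-on-demand style
   described above. */
static TOKEN get_token (const char **cursor, char *lexeme)
{
    const char *p = *cursor;

    while (isspace ((unsigned char) *p))        /* strip white space */
        p++;

    if (!*p)
    {
        *cursor = p;
        return TOK_EOF;
    }

    if (isalpha ((unsigned char) *p))           /* identifier or keyword */
    {
        while (isalnum ((unsigned char) *p) || *p == '_')
            *lexeme++ = *p++;
        *lexeme = '\0';
        *cursor = p;
        return TOK_IDENT;
    }

    if (isdigit ((unsigned char) *p))           /* number */
    {
        while (isdigit ((unsigned char) *p))
            *lexeme++ = *p++;
        *lexeme = '\0';
        *cursor = p;
        return TOK_NUMBER;
    }

    *lexeme++ = *p++;                           /* operator or punctuation */
    *lexeme = '\0';
    *cursor = p;
    return TOK_PUNCT;
}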
SQL Syntax Analysis
As indicated above, the parser drives the analysis phase by calling
the lexical scanner when
it needs another token. Thus, the parser sees a partitioned token stream
as input that is very
much different from the input character sequence of the lexical scanner.
The parser's job is
to determine whether a parse tree can be derived from the token stream
given the underlying
grammar for the SQL language. If such a tree can be generated then
we can have some
assurance that the input statement has at least the superficial form
of a legal SQL statement.
The parser needs to employ more powerful, hierarchical methods than the simple, linear
scanning of lexical analysis. Although tools exist to generate lexical analyzers from
regular-expression specifications, a close look at YYLEX() in DSQL/PARSE.Y will show
that it is a trivial hand-coded implementation. The parser, on the other hand, employs
more rigorous, bottom-up, shift-reduce LALR (Look-Ahead LR) algorithms to handle the
complexity of nested grammatical structures. Fortunately, YACC hides the nasty details
behind the grammar specification.
The production rules of the grammar are specified using a widely known notation called
BNF (Backus-Naur Form). The basic structure of a production has a non-terminal grammar
construct on the left side, a colon ":", and a string of non-terminal and terminal symbols
on the right side of the production. Each production rule is followed by a
pseudo-programming action which indicates how to construct the parse tree node for the
language construct defined by its production. The parse tree node has a basic structure of
a slot for the operator implied by the language construct and a slot for each child node
implied by the right side of the production rule.
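For illustration, a production for an addition expression might be written as follows.
This is a schematic example in the style of DSQL/PARSE.Y, not an excerpt from it; the
non-terminal and node type names are assumed:

value_expr  : value_expr '+' value_expr
        { $$ = make_node (nod_add, 2, $1, $3); }
    | '(' value_expr ')'
        { $$ = $2; }
    ;

The first action builds a two-child parse tree node whose operator slot holds nod_add and
whose child slots hold the operand subtrees; the second simply passes the nested subtree up.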
The parser specification includes a section to denote keyword
tokens for the SQL language. These
are identifiers which have been reserved by the SQL language and which
may not be used by the
user to define identifiers for SQL objects such as tables or columns.
The final output of the parser is a syntax tree which is used as input to the next phase,
semantic analysis. While the grammatical structure of the SQL statement has been verified,
there is no guarantee that the objects referenced by the SQL statement correspond to any
of the elements of the InterBase SQL database. Even if the references are used in a legal
way as defined by the SQL language specification, there is no assurance yet that they are
employed meaningfully in the input SQL statement.
SQL Semantic Analysis
The job of SQL semantic analysis is to ensure that the SQL grammatical constructs of the
statement are used correctly and to annotate the parse tree with information that will be
useful for BLR code generation in the synthesis phase. There is no ideal notational
convention for specifying this behavior, so the rules and conventions are embedded directly
in the module DSQL/PASS1.C, which forms the bulk of the semantic analysis code.
Some of the actions include:

- Alias substitution, which includes both column and table aliases. In the case of a table
alias, it is necessary to create an association between the alias reference and the base
table object. This associative link annotates all parse tree nodes for which there is an
alias node reference, and is used during code generation to generate BLR for the base table.

- View validation checks that any views used in an SQL statement actually correspond to
a defined view in the SQL schema for the InterBase database.

- Type checking validates that all operators are applied to operands that have the correct
types as defined by the language. It also validates stored procedure references, checking
that their arguments match in both type and number. Additionally, it guarantees that the
procedure call method is appropriate for either EXECUTE or SELECT usage.

- Aggregate handling ensures that column references used in an SQL statement containing
a GROUP BY clause are either aggregate functions over the group or define the columns
comprising the group.

- Data descriptions need to annotate all expression and field parse tree nodes (a sketch
of such a descriptor follows this list). This is especially important for generating the
BLR message declarations which define all interaction between data requests and the
InterBase engine. The BLR messages loosely correspond to what an SQL programmer would
associate with the SELECT list references in a direct SELECT or as the basis for a
CURSOR definition.
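For reference, the data descriptor used in these annotations has roughly the following
shape. This is a sketch from memory of the engine's descriptor structure; treat the exact
field names and widths as assumptions rather than a verbatim excerpt:

typedef struct dsc {
    UCHAR   dsc_dtype;      /* datatype: text, short, long, double, ... */
    SCHAR   dsc_scale;      /* decimal scale for exact numeric types */
    USHORT  dsc_length;     /* length of the data in bytes */
    SSHORT  dsc_sub_type;   /* subtype, e.g. character set for text */
    USHORT  dsc_flags;
    UCHAR   *dsc_address;   /* where the value lives at run-time */
} DSC;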
SQL Synthesis Phase
At this point, the annotated SQL parse tree has everything necessary to
generate an equivalent
translation to a BLR program. Recall that BLR is a byte-code language
for specifying the actions
of a relational statement to the InterBase engine.
BLR Code Generation
The general structure of a BLR program is something like the following:

    input message declarations;
    output message declarations;
    input message receive;
    for loop
        record selection expression
        output message send
            assignment of value expressions to output message parameters
    end loop
    terminal message send
From an SQL programmer's perspective, you can think of the "input message declarations" as
the optional placeholders ("?") in a dynamic SQL statement. The "output message declarations"
are the scalar values returned by an SQL SELECT.

The BLR request performs an "input message receive" which conceptually receives the elements
transmitted from the input XSQLDA variables.

The "for loop" corresponds to the SQL WHERE clause and other ancillary clauses (ORDER BY/
GROUP BY/HAVING) and specifies the target tables and Boolean search predicates. This
expression also implicitly defines joins by naming multiple tables in the record selection
expression. Hence the language forms a relational algebra but not a calculus: it does not
specify how the query will be executed.

The "assignment" assigns arbitrary value expressions, usually involving column scalar
values, to the parameters making up the output message. The output message is then sent off
to the client, which is conceptually the DSQL layer. The DSQL layer transforms the received
BLR messages into the expected InterBase DSQL wire protocol. The InterBase client in the
user's application then converts this to the output XSQLDA as expected by the programmer.
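Rendered symbolically, the skeleton above might come out of the code generator looking
roughly like this. The verbs named (blr_message, blr_receive, blr_for, blr_rse, blr_send,
blr_assignment) are actual BLR verbs, but the layout is a schematic sketch, not verbatim
engine output:

blr_message 0 ...               /* input message declaration */
blr_message 1 ...               /* output message declaration */
blr_receive 0,                  /* input message receive */
    blr_for
        blr_rse ...             /* record selection expression */
        blr_send 1,
            blr_begin
                blr_assignment ...   /* value expression -> output parameter */
            blr_end
blr_send 1 ...                  /* terminal message send */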
The BLR code generation is handled in DSQL/GEN.C. By this point, the code generation is
fairly straightforward, as the annotated parse tree lends itself to an easy translation.
It basically follows the outline given above.
BLR Request Compilation
BLR request compilation is the process by which a BLR program is compiled into an executable
request tree that can be directly interpreted by the InterBase relational engine. Each node
in the execution tree is visited and the actions implied by its children nodes are carried
out. This includes the retrieval of candidate rows from underlying base tables and join
processing.
BLR Recursive-descent parsing
BLR is very cheap to parse. The language design is unambiguous and clean enough that little
or no lookahead processing is required to parse a BLR statement. As no backtracking on the
input BLR is required, it's only a slight exaggeration to say that no intermediate parse
tree is generated. For the most part, a non-optimized execution tree is produced during the
initial pass over the BLR string. Because of the foregoing properties, a simple
recursive-descent parsing algorithm translates the input BLR into a non-final execution
request tree, with the sequence of parsing functions called serving as an implicit parse
tree.
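A schematic sketch of that pattern follows: read one verb byte, build the execution node
for it, then recurse once per operand. There is no lookahead and no backtracking; the chain
of recursive calls is the implicit parse tree. The node type and helper names are
hypothetical, not the actual ones in JRD/PAR.C:

static NOD parse_expr (const UCHAR **blr)
{
    UCHAR verb = *(*blr)++;             /* consume the next BLR byte code */
    NOD node = make_exec_node (verb);   /* hypothetical node allocator */

    switch (verb)
    {
    case blr_add:                       /* binary verb: two operand subtrees */
        node->nod_arg[0] = parse_expr (blr);
        node->nod_arg[1] = parse_expr (blr);
        break;

    case blr_literal:                   /* leaf verb: descriptor and data follow inline */
        /* ... copy the descriptor and literal data from the byte stream ... */
        break;

    /* ... one case per BLR verb ... */
    }

    return node;
}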
It's instructive to peruse JRD/PAR.C and JRD/CMP.C to see the details of processing.
Unfortunately, there is no documented notational specification in source code form to which
I can direct you. There are very good hard-copy reference guides to the OSRI (Open Standard
Relational Interface) architecture, which include the BLR specification and semantics;
however, these are in short supply, held mostly by InterBase old-timers. As Compaq/Digital
did with the ancestor DSRI specifications, Borland stopped supporting the BLR interface
layers for customer usage, which means its documentation ceased many years ago as well.
Semantic execution tree pass
Since the BLR request interface was at one time a supported customer interface, the request
tree cannot be trusted to form a meaningful operation. Legacy customers built custom BLR
translators for 4GL data sublanguages, so it's necessary that the InterBase engine perform
essentially the identical checks that the DSQL layer performed as described earlier.

Over and above the type checking operations described for SQL processing:

- The semantic pass will fold in view substitutions from the database schema and remap
underlying stream references (record contexts) to the base relations.

- In addition, computed field references and default value clauses are appended to the
execution tree to handle these operations transparently to the user.

- Stored procedure references are transformed by metadata templates to bind the reference
to a compiled form of the procedure in the server's memory.

- User-defined function references are checked and loaded into the server's memory. Similar
to procedures, a binding generates assignment nodes to assign function arguments to their
formal parameters.

- Note that trigger execution is not bound to the user's data request. Instead, trigger
requests are compiled as separate system requests and control is transferred to them during
normal user request execution of the respective DML operation.
Storage Allocation Pass
An execution request is divided into pure and impure sections. The "pure" section of the
request is the execution node tree, meaning that no memory writes will occur to the tree
after the request has been compiled. During the execution of the request, the node tree is
walked and the operations implied by its children are carried out.
However, there must be some place where the intermediate results of those operations are
stored, and this is the "impure" section of the request. During the storage allocation pass,
the nodes of the execution tree are revisited and storage is assigned for the data
descriptors which describe value nodes in the tree. The impure storage space is assigned as
one contiguous chunk of memory, and relative offsets into this space are written back into
the execution nodes.

Buffer space for the retrieval of row contents is deferred until run-time. Since InterBase
allows multiple formats of record structures to be extant, the dynamic case of reallocating
a larger buffer has to be handled anyway; it might as well default to run-time, as the code
to handle it is already present.
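A minimal sketch of the pass, using hypothetical names throughout, might look like the
following: each value node records the relative offset of its slot, while the request
accumulates the total impure size to allocate:

static void allocate_impure (REQ request, NOD node)
{
    USHORT i;

    /* Record the relative offset in the (henceforth read-only)
       execution node, then grow the request's impure size by one
       intermediate-value slot. */
    node->nod_impure = request->req_impure_size;
    request->req_impure_size += sizeof (struct value);

    /* Revisit the children so the whole tree gets its slots. */
    for (i = 0; i < node->nod_count; i++)
        allocate_impure (request, node->nod_arg[i]);
}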
Query Optimization Pass
The operations implied by the BLR record selection expression are resolved during query
optimization. Explicit access methods are determined in the form of a sequential chain of
record source blocks. This includes the mapping of search predicates to indices defined for
the table, the order of join processing, and the creation of sort blocks for explicit
ORDER BY or implicit sort/merge joins. This is implemented in module JRD/OPT.C.
Record source blocks are defined for sequential, index, navigational, sort and join access
methods. The retrieval operations for each type can be found in JRD/RSE.C.

As the treatment of query optimization is worthy of a paper of its own and beyond the scope
of this paper, nothing more will be said of this component.
Security Checking and Authorization
During request generation and optimization, an access list of objects
explicitly or implicitly
referenced by the request is maintained. The access list includes both
the object and the
type of access requested (SELECT, INSERT, UPDATE, DELETE, EXECUTE and
REFERENCE).
Although InterBase presents an SQL model of security in the form of the GRANT/REVOKE
syntax, InterBase maintains underlying webs of access control lists (ACLs) which are stored
as blobs within the database. For each access implied by the request, the user's
authentication identifier and intended access are checked against the ACL stored for the
object.
Choices For Extending SQL Functions
With the introduction of InterBase's SQL statement processing behind us,
we can now address
various methods of adding functionality to InterBase's relational engine.
Based on the presented
material, there are two immediate methods that come to mind: BLR verb
support and built-in,
system defined functions (SDF's).
BLR verb support involves "hard-wiring" the behavior for the SQL function into the execution
engine. New BLR byte codes are assigned to the new function, as well as a syntax for the
acceptable arguments taken by the new BLR verb. This approach:

- offers potentially better performance than built-in SDFs
- is more labor-intensive than adding built-in SDFs
- has better character set and collation support than built-in SDFs
Built-in, system-defined functions (SDFs) are a combination of UDFs and static definition of
that functionality in the base engine. The subtle difference is that the functionality
associated with built-in SDFs becomes available as a default behavior of the engine.
Additionally:

- There is no need to repeatedly make the same DDL/UDF declarations for each database that
uses them.
- There is no need to manipulate, port and carry along supporting libraries.
- The built-in behavior can be overridden by a user's UDF, if so desired.
- It is much easier to add the functionality than with the BLR approach.
Let's take a quick tour of what it would take to add an SQL function with
each method to
gain an understanding of the benefits and tradeoffs of each approach.
Currently, the engine provides BLR support for the UPPER function which
returns the
uppercased version of an input string. However, there is no corresponding
BLR support
for the counterpart LOWER function. Instead, InterBase comes with a
collection of
common UDF functions of which one is the LOWER function.
There are several benefits associated with reimplementing the LOWER
function in BLR.
The chief reason is that the common UDF does not support international
character sets.
It only assumes ASCII.
BLR support of LOWER function
The steps required to add this support are extensive and touch almost every part of the
DSQL component and the InterBase engine component. An experienced InterBase engineer can do
this work, but it's not for an InterBase neophyte.
First, the DSQL component:

- Modify the DSQL parser to add a token for the keyword LOWER and a production rule for the
LOWER language construct:

%token LOWER

LOWER '(' value ')'
    { $$ = make_node (nod_lwcase, 1, $3); }

- Add a case for BLR generation of the new BLR verb blr_lwcase in dsql/gen.c
- Add the DSQL node type nod_lwcase to dsql/node.h
- Add descriptor generation for the node to dsql/make.c
- Add semantic support for lower case to all value cases in dsql/pass1.c
Then, for the JRD engine component:

- Add the new BLR verb, blr_lwcase, to jrd/blrtable.c and blr.h
- Add a case for blr_lwcase to jrd/par.c and jrd/cmp.c
- Add descriptor support to jrd/cmp.c/CMP_get_desc()
- Add impure space allocation
- Add lower case support to the expression evaluator jrd/evl.c
- Add lower case hooks to the international language driver jrd/intl.c
- Add a string-to-lower function to map characters from upper to lower case for each major
class of language (a sketch follows below)

Not bad, but the total number of files modified comes out to about 18 files.
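As a sketch of that last item: for single-byte character sets the helper could be
table-driven, with one mapping table supplied per character set by the drivers in
jrd/intl.c. The names below are illustrative, not existing InterBase entry points, and
multi-byte character sets would need a driver routine instead of a flat table:

#include <stddef.h>

typedef unsigned char TOLOWER_TABLE[256];

/* Map a string through a character set's lower-case table. Entries
   default to the identity mapping; letter positions are overridden
   for each character set. */
static void str_to_lower (const TOLOWER_TABLE table,
                          const unsigned char *src,
                          unsigned char *dst,
                          size_t length)
{
    while (length--)
        *dst++ = table[*src++];
}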
SDF support of LOWER function
Since the concept of a system-defined function does not currently exist,
it should probably
be explained. There is currently a module named jrd/builtin.c which
is used to statically
link functions used by the QA department for white-box testing. This
allows the simulation
of catastrophic server failures or insertion of code for timing-related
tests that are difficult
to simulate externally.
If a function reference cannot be resolved to a UDF declaration stored in RDB$FUNCTIONS and
RDB$FUNCTION_ARGUMENTS, then an internal lookup table of these built-in functions is
checked. This support currently exists. Because of the order of function lookup, the user
has the ability to override the behavior of an SDF by defining an equivalent UDF with
different behavior.
The SDF concept constructs an internal binary representation of RDB$FUNCTIONS and
RDB$FUNCTION_ARGUMENTS. This internal form makes it unnecessary to declare the functions to
each database, thus providing universal access to all databases on the server. The
table-driven structure enables function dispatch and provides a means to marshal function
arguments.
The following data structures would be defined in a new include file
JRD/BLTN.H:
typedef struct bltn {
    UCHAR   *bltn_function;
    int     (*bltn_entrypoint)();
    USHORT  bltn_position;
    USHORT  bltn_mechanism;
    USHORT  bltn_fld_type;
    USHORT  bltn_fld_scale;
    USHORT  bltn_fld_length;
    USHORT  bltn_fld_subtype;
    USHORT  bltn_charset_id;
} *BLTN;

static struct bltn builtin_functions[] = {
    "ACOS",      BLTN_acos,      0, FUN_value,     blr_double,  0, 8,  0, 0,
    "ACOS",      BLTN_acos,      1, FUN_reference, blr_double,  0, 8,  0, 0,
    "FACTORIAL", BLTN_factorial, 0, FUN_value,     blr_int64,   0, 8,  0, 0,
    "FACTORIAL", BLTN_factorial, 1, FUN_reference, blr_short,   0, 2,  0, 0,
    "SUBSTR",    BLTN_substr,    0, FUN_reference, blr_cstring, 0, 80, 0, 0,
    "SUBSTR",    BLTN_substr,    1, FUN_reference, blr_cstring, 0, 80, 0, 0,
    "SUBSTR",    BLTN_substr,    2, FUN_reference, blr_short,   0, 2,  0, 0,
    "SUBSTR",    BLTN_substr,    3, FUN_reference, blr_short,   0, 2,  0, 0,
    .
    .
    .
    0, 0, 0, 0, 0, 0, 0, 0, 0
};
And here's some source code for a factorial function:
UINT64 BLTN_factorial (USHORT n)
{
UINT64 factorial = 1;
while (n > 1)
factorial *= n--;
return factorial;
}
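Tying the pieces together, the lookup order described earlier might be sketched as follows.
FUN_lookup_udf() and FUN_make_from_bltn() are hypothetical helpers, and FUN stands in for
the engine's function block type:

#include <string.h>

/* Resolve a function name: user-declared UDFs win, so a UDF can
   override an SDF; only on failure is the static table consulted. */
FUN lookup_function (const char *name)
{
    FUN function;
    BLTN entry;

    /* Stage 1: UDF metadata in RDB$FUNCTIONS/RDB$FUNCTION_ARGUMENTS. */
    if (function = FUN_lookup_udf (name))
        return function;

    /* Stage 2: the compiled-in SDF table, terminated by a null name. */
    for (entry = builtin_functions; entry->bltn_function; entry++)
        if (!strcmp ((const char*) entry->bltn_function, name))
            return FUN_make_from_bltn (entry);

    return NULL;
}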
The source code for each of the internal SDFs is statically compiled inside the engine.
This allows access to internal support routines for international language support in the
same way that BLR execution has access.

The performance overhead of an SDF/UDF mainly derives from the invocation preamble of
JRD/FUN.E/FUN_evaluate(). The appendix to this paper has a memo which discusses some of the
ways that this overhead can be addressed. However, the final result is still that BLR
execution ends up having a performance advantage over an SDF/UDF.
As an example of improving UDFs, look at the current LOWER UDF shipped with InterBase:

char* EXPORT IB_UDF_lower (char *s)
{
    char *buf;
    char *p;

    buf = (char*) ib_util_malloc (strlen (s) + 1);
    p = buf;
    while (*s)
        if (*s >= 'A' && *s <= 'Z')
        {
            *p++ = *s++ - 'A' + 'a';
        }
        else
            *p++ = *s++;
    *p = '\0';
    return buf;
}
Since the UDF function invocation performs a copy-by-value for string arguments, there is no
reason to malloc() another buffer for the LOWER output (followed by a free() to release the
buffer). Thus the memory allocation, string copy and deallocation overhead can be removed.
The resulting lowered string will be copied automatically by the function return process to
an impure section of the request area.
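A minimal sketch of the slimmed-down UDF, assuming the copy-by-value behavior just
described, would simply lower the engine's private copy in place and return it:

char* EXPORT IB_UDF_lower (char *s)
{
    char *p;

    /* The engine passed a private copy of the string, so it is safe
       to modify in place; no malloc/free round trip is needed. */
    for (p = s; *p; p++)
        if (*p >= 'A' && *p <= 'Z')
            *p += 'a' - 'A';

    return s;
}

As with the original, this still assumes ASCII; an international version would go through
the character set mapping discussed earlier.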
The cost of implementation for a user to add SQL functions with the SDF methodology comes to
two files: adding an entry to the built-in structure for dispatch, and the source code for
the function itself.
Summary
In general, there is a very large, finite number of functions that could be added to enhance
InterBase's SQL processing capability, ranging from string handling to transcendental
mathematical functions. Therefore, there should be some general guidelines to help decide
when a new function is added via BLR support and when it might be added as an SDF or UDF.
Some guiding criteria might be:

An SQL function defined by the SQL standard might benefit from a BLR implementation. It
can't be overridden, and there is a slight performance gain associated with the BLR support.
Presumably such an SQL function is commonly used enough to justify the implementation.

Esoteric functions that are used infrequently by a smaller population of users might benefit
from either a UDF or SDF implementation. This preserves the core interpretive execution
engine's memory footprint: the case statements for node dispatch and evaluation will not
bloat to hundreds or thousands of cases. There are BLR escape sequence conventions which
will forestall the exhaustion of the range of possible BLR byte codes, but this is not a
reason to add new BLR verbs without justification.
Overloaded functions might benefit from the flexibility of a UDF/SDF implementation.
Trigonometric math functions could take either radians or degrees as input arguments without
needing separate function entry points. The user can define the UDF external to the engine
or define the SDF internal to the engine.

The developer's implementation skills and familiarity with InterBase internals might point
toward the simpler SDF/UDF method; similarly, a competent, experienced InterBase engineer
might favor the BLR approach.
Appendix
Here's some of the raw footage that went into writing this paper. These
are excerpts from internal
memos and emails on the subject. It's a lively debate between engineers
about the merits of both
approaches. You can come to your own conclusions. Some of the names
have been changed to
protect the innocent.
Subject: Re: proposal for converting IB UDF to SQL functions
Resent-From: ib_engineering@inprise.com
Date: Wed, 24 Nov 1999 16:25:58 -0500
From: ************************************
To: ************************************
Bill,
At the risk of stating the obvious, builtin.c is for built-in functions. It is currently
searched after the user defined modules for function entrypoints.

I think you could save a great deal of work in this project if you took advantage of
builtin.c's untapped capability. That's why it's there in the first place -- waiting for
someone like you to come along and exploit its existence.

Why would you consider any other alternative to implementing built-in functions? Your
proposal seems like a lot of work, but that's OK because work can be fun ... to a certain
extent.
The advantages of builtin.c:

1) No need to modify the parser, looper or introduce new blr. Generate the current blr and
let the current function lookup mechanism search builtin.c before user defined UDF modules.
(It currently searches after. Changing the search order would prevent your standards-defined
SQL UDFs from being overridden by the user.)

2) Builtin.c automatically solves the problem of not having to declare them. Of course, if
statically linking these functions into the engine is too old-fashioned, you can always
change it to dynamically load the ib_udf and ib_util shared libraries for better
maintenance. In either case, there's no need to declare them for each database -- they are
server-wide.

My advice would be to scrap your current proposal or, at the very least, investigate where
a new proposal based on builtin.c leads you.
Regards,
Charlie
Bill wrote:
>
> Hi Folks:
>
> The objective of this project is to convert all IB UDFs to SQL functions.
> There are two requirements for the implementation of the SQL functions:
>
> 1. The SQL functions should not require any external DLL
>
> 2. The SQL functions should not require DECLARE EXTERNAL FUNCTION to
> enable access to the SQL functions
>
> There are 35 IB UDFs in total, and they are:
>
> math functions: --- 23
>
>     abs, acos, asin, atan, atan2, ceiling, cos, cosh, cot, div,
>     floor, ln, log, log10, mod, pi, rand, sign, sin, sinh,
>     sqrt, tan, tanh
>
> binary functions: --- 3
>
>     bin_and, bin_or, bin_xor
>
> character functions: --- 9
>
>     ascii_char, ascii_val, lower, strlen, ltrim, rtrim, substr
>
> note: the TRIM built-in function has to be rewritten to conform to ANSI/SQL;
> the STRLEN built-in function has to be rewritten to conform to ANSI/SQL
>
> ANSI/SQL string length functions are of the form:
>
>     CHAR_LENGTH ( <string value> ) or CHARACTER_LENGTH ( <string value> )
>
>     OCTET_LENGTH ( <string value> )
>
>     BIT_LENGTH ( <string value> )
>
> ANSI/SQL TRIM functions are of the form:
>
>     TRIM ( [ [ LEADING | TRAILING | BOTH ] [ <trim character> ] FROM ]
>     <character value> )
>
>     The default is BOTH and the default <trim character> is blank
>
> Currently the IB character UDF (lower) does not support international
> character sets.
>
> IB will continue to support its current UDFs. If the user declared the
> UDF, then the IB server will honor the UDF. Otherwise the IB server will
> process the function as an SQL function. For example:
>
>     select lower( c1 ) from t1;
>
> if the user declared the LOWER external function in his/her application,
> then the LOWER function is a UDF function. Otherwise, the LOWER function
> is an SQL function.
>
> The following are the tasks needed for converting the 35 IB UDFs to SQL
> functions:
>
> 1. define new parse tree nodes for each SQL function
>
> 2. new logic to build the 1st new parse trees for each SQL function
>
> 3. define new blr for each SQL function from the 1st new parse trees
>
> 4. new logic to build the new blr string for each SQL function
>
> 5. new logic for processing the new blr strings and building the 2nd
> new parse trees from the new blr strings
>
> 6. new logic in pass1 and pass2 to process the 2nd new parse trees
>
> 7. new logic in looper for executing the 2nd new parse trees
>
> 8. the new code (perhaps the udf code can be recycled if possible)
> for the SQL functions
>
> 9. new international character set support for each new character
> string SQL function (currently the character string UDFs that IB has
> do not support international character sets)
>
> There is a builtin.c in the jrd component; does anybody know what it
> is used for?
>
> If you have any comments or feedback, please let me know by 12/1/99.
> Otherwise this is it and I will start writing the functional spec based
> on this.
>
> Thanks,
>
> Bill
Subject: Re: proposal for converting IB UDF to SQL functions
Resent-From: ib_engineering@inprise.com
Date: Wed, 24 Nov 1999 19:59:24 -0500
From: ************************************
To: ************************************
Bill,

Bill wrote:
>
> Hi Charlie:
>
> If it is so easy as what you said, why, when Dean implemented his
> EXTRACT SQL function, didn't he use the builtin.c approach?
>

I don't know, why don't you ask Dean?

But if you multiply his efforts implementing the single EXTRACT by 35 more UDF functions,
it's going to be one expensive project. If it takes a week to implement "arc sine, arc
cosine and arc tangent", it will cost about $10,000 for functions that nobody uses.
I view BLR byte code and InterBase's execution engine the same way that Java byte code and
the JVM are viewed by Java's architects. There has to be a compelling reason to alter the
virtual machine. I don't think that transcendental functions and string trimming functions
merit that elevated status. Possibly COBOL programmers would disagree with me.

It makes the engine more complicated (exe.c/looper and evl.c/expression case statements
jump by 35) for functions which are rarely executed (not to mention parsing and blr
generation code). It potentially makes all request execution more expensive just by the
presence of their footprint.

UDFs are an extensibility mechanism which were designed not just for our customers but for
us as well. The function invocation mechanism (jrd/fun.e) and the function implementation
bodies are isolated and separated from InterBase's core execution engine.
> Do you have any examples how to execute a builtin function
> thru builtin.c? I need an example. Perhaps you can give me
> one or point me to it.
>

We link jrd/qatest.c into the engine so that QA can run tests to internally generate
catastrophic server faults in places that would be difficult from an external application.
You can see some of its entrypoints labeled in jrd/builtin.c. I can't name the TCS tests
but it has something to do with shadowing. Perhaps an enlightened co-reader of this mail
can name the test and win a prize.
> Thanks,
>
> Bill
>
Regards,
Charlie
Subject: Re: adding built-in functions
Date: Wed, 08 Dec 1999 13:17:44 -0800
From: ************************************
To: ************************************
Bill & Michael,

To get a sense of how easy it would be to add new built-in functions, imagine there is a
new include file, jrd/bltn.h. It declares a new structure "bltn" and a static array of all
built-in functions. The static array can be used to populate RDB$FUNCTIONS and
RDB$FUNCTION_ARGUMENTS, or to silently initialize the run-time symbol and function blocks
with no system metadata. Whatever we decide is better for the customer.

In the example below, I have added a factorial function. It is a simple two-step process
for the engineer:

1) Add the factorial function parameters to the static array of builtin_functions;
2) Add the factorial source code to ib_udf.c or whatever file we call it.
Here's jrd/bltn.h:

typedef struct bltn {
    UCHAR   *bltn_function;
    int     (*bltn_entrypoint)();
    USHORT  bltn_position;
    USHORT  bltn_mechanism;
    USHORT  bltn_fld_type;
    USHORT  bltn_fld_scale;
    USHORT  bltn_fld_length;
    USHORT  bltn_fld_subtype;
    USHORT  bltn_charset_id;
} *BLTN;

static struct bltn builtin_functions[] = {
    "ACOS",      BLTN_acos,      0, FUN_value,     blr_double,  0, 8,     0, 0,
    "ACOS",      BLTN_acos,      1, FUN_reference, blr_double,  0, 8,     0, 0,
    "FACTORIAL", BLTN_factorial, 0, FUN_value,     blr_int64,   0, 8,     0, 0,
    "FACTORIAL", BLTN_factorial, 1, FUN_reference, blr_short,   0, 2,     0, 0,
    "STRLEN",    BLTN_strlen,    0, FUN_value,     blr_short,   0, 2,     0, 0,
    "STRLEN",    BLTN_strlen,    1, FUN_reference, blr_cstring, 0, 32767, 0, 0,
    "SUBSTR",    BLTN_substr,    0, FUN_reference, blr_cstring, 0, 80,    0, 0,
    "SUBSTR",    BLTN_substr,    1, FUN_reference, blr_cstring, 0, 80,    0, 0,
    "SUBSTR",    BLTN_substr,    2, FUN_reference, blr_short,   0, 2,     0, 0,
    "SUBSTR",    BLTN_substr,    3, FUN_reference, blr_short,   0, 2,     0, 0,
    .
    .
    .
    0, 0, 0, 0, 0, 0, 0, 0, 0
};
And here's some source code for a factorial function:
UINT64 BLTN_factorial (USHORT n)
{
UINT64 factorial = 1;
while (n > 1)
factorial *= n--;
return factorial;
}
Regards,
Charlie
Subject: bench mark result of BLR acos VS UDF acos
Date: Wed, 08 Dec 1999 13:16:22 -0800
From: ************************************
To: ************************************
Hi Folks:

I have implemented the BLR acos. It did not take a long time to add one SQL function the
BLR way, as I expected. The UDF acos runs 37% slower than BLR acos and uses 63488 more
bytes of memory. The interesting thing is that UDF acos uses 6% less CPU time than BLR
acos. You can find the benchmark result in the attached file:

show table t3;

NUM_1    NUMERIC(5, 3) Nullable

select * from t3;

NUM_1
============
0.500
...
0.500

total number of rows: 15449

select acos (num_1) from t3;
acos (num_1): BLR             acos (num_1): UDF
Current memory = 9144320      Current memory = 9155584    (13312 more)
Delta memory   = 207872       Delta memory   = 219136     (11264 more)
Max memory     = 9242712      Max memory     = 9281624    (38912 more)
Elapsed time   = 8.55 sec     Elapsed time   = 11.75 sec  (37.43% slower)
Cpu            = 2.20 sec     Cpu            = 2.07 sec   (6.28% faster)
Buffers        = 2048         Buffers        = 2048       (same)
Reads          = 223          Reads          = 229        (6 more Reads)
Writes         = 0            Writes         = 4          (4 more Writes)
Fetches        = 31635        Fetches        = 31716      (81 more Fetches)
Bill
Subject: bench mark result with Charlie approach
Date: Fri, 10 Dec 1999 10:05:54 -0800
From: Bill Taylor <btaylor@inprise.com>
To: ************************************
Hi Folks:

I ran the benchmark between the BLR approach and Charlie's approach, and the following is
the result. What Charlie has done is the following:

1. passing TDBB down to FUN_evaluate
2. cleaning up some code in the FUN_evaluate function
3. removing THREAD_ENTER/THREAD_EXIT
4. removing START_CHECK_FOR_EXCEPTIONS/END_CHECK_FOR_EXCEPTIONS

Total number of records in table T1 is 15448
Total number of records in table T3 is 15449

The benchmark result of the BLR approach versus Charlie's new code is as follows:
LOWER function:

BLR lower:                    Charlie's UDF lower:
==========                    ====================
Current memory = 9145344      Current memory = 9154560    (9216 more)
Delta memory   = 208896       Delta memory   = 218112     (9216 more)
Max memory     = 9242712      Max memory     = 9280600    (37888 more)
Elapsed time   = 7.99 sec     Elapsed time   = 23.31 sec  (191.74% slower)
Cpu            = 1.77 sec     Cpu            = 2.62 sec   (48.02% slower)
Buffers        = 2048         Buffers        = 2048       (same)
Reads          = 255          Reads          = 259        (4 more Reads)
Writes         = 4            Writes         = 0          (4 fewer Writes)
Fetches        = 31728        Fetches        = 31732      (4 more Fetches)
Charlie's UDF lower:          Base code UDF lower:
====================          ====================
Current memory = 9154560      Current memory = 9155584    (1024 more)
Delta memory   = 218112       Delta memory   = 219136     (1024 more)
Max memory     = 9280600      Max memory     = 9281624    (1024 more)
Elapsed time   = 23.31 sec    Elapsed time   = 24.62 sec  (6% slower)
Cpu            = 2.62 sec     Cpu            = 2.57 sec   (2% faster)
Buffers        = 2048         Buffers        = 2048       (same)
Reads          = 259          Reads          = 259        (same)
Writes         = 0            Writes         = 4          (4 more Writes)
Fetches        = 31732        Fetches        = 31774      (42 more Fetches)
BLR lower:                    Base code UDF lower:
==========                    ====================
Current memory = 9145344      Current memory = 9155584    (10240 more)
Delta memory   = 208896       Delta memory   = 219136     (10240 more)
Max memory     = 9242712      Max memory     = 9281624    (38912 more)
Elapsed time   = 7.99 sec     Elapsed time   = 24.62 sec  (208.14% slower)
Cpu            = 1.77 sec     Cpu            = 2.57 sec   (45.20% slower)
Buffers        = 2048         Buffers        = 2048       (same)
Reads          = 255          Reads          = 259        (4 more Reads)
Writes         = 4            Writes         = 4          (same)
Fetches        = 31728        Fetches        = 31774      (46 more Fetches)
ACOS function:

BLR acos (num_1):             UDF/Charlie acos (num_1):
Current memory = 9144320      Current memory = 9154556    (10236 more)
Delta memory   = 207872       Delta memory   = 218108     (10236 more)
Max memory     = 9242712      Max memory     = 9280596    (37884 more)
Elapsed time   = 8.79 sec     Elapsed time   = 8.70 sec   (about same)
Cpu            = 2.20 sec     Cpu            = 2.21 sec   (about same)
Buffers        = 2048         Buffers        = 2048       (same)
Reads          = 223          Reads          = 229        (6 more Reads)
Writes         = 0            Writes         = 0          (same)
Fetches        = 31635        Fetches        = 31660      (25 more Fetches)
From this result we can see that, for numeric UDFs, Charlie's new code was able to improve
the performance to a level as fast as the BLR approach, but not for character functions.

The biggest performance gain was due to removing
START_CHECK_FOR_EXCEPTIONS/END_CHECK_FOR_EXCEPTIONS. Without these two instructions, the
function will run 33% faster than the original execution. With the implementation of the
preload approach, there will be some new code to be written to skip the execution of
CHECK_FOR_EXCEPTIONS when IB is executing an IB UDF, and not to skip the execution of
CHECK_FOR_EXCEPTIONS when IB is executing users' UDFs.

To improve general UDF performance, we could add a new ISQL option which will tell the IB
server to skip the execution of the above two instructions.

Since I am running locally with no multiple threads, removing the THREAD_ENTER/THREAD_EXIT
does not mean anything and does not improve the performance at all. With the implementation
of the preload approach, in a multithreaded environment, the performance may not be as good
as the BLR approach. There is nothing that we can improve about this since IB is a
multithreaded engine. In order to safeguard each thread, IB has to do
THREAD_ENTER/THREAD_EXIT.

The big advantage of the BLR approach is that it does not have to do any of the
THREAD_ENTER/THREAD_EXIT and START_CHECK_FOR_EXCEPTIONS/END_CHECK_FOR_EXCEPTIONS.

Charlie's code does not improve the performance of LOWER at all. Charlie and I suspect the
LOWER UDF was written very badly. Both of us concluded to implement the LOWER function via
the BLR approach and leave the LOWER UDF as it is within the shared library. If the user
declares the LOWER function, then the IB server will execute the LOWER UDF; otherwise, it
will execute the BLR LOWER function the same way as UPPER.

Do we want to implement ltrim, rtrim, and substr as BLR? It is up to you all. I will replace
ltrim/rtrim with the SQL TRIM function and substr with SQL SUBSTRING. Remember, though,
there is no performance gain by bringing them in. If users ask for performance, we can tell
them to use TRIM/SUBSTRING.

Passing down the TDBB and massaging the FUN_evaluate function only shows a 1 - 2%
performance gain. It does not make any noticeable performance gain, but it makes the code
cleaner.

From this exercise, we all learned a great deal about the performance of UDFs. Now we are
ready to:

1. improve the general performance of UDFs
2. bring the IB UDF SQL functions into the engine.

Since there is a way to improve the general execution performance of UDFs, I will not
oppose the blended approach. But I will reserve my concern about the performance of
THREAD_ENTER/THREAD_EXIT. Without any hard numbers to show, there is nothing for me to
argue about. Is there anybody that can show me how to measure the performance of
THREAD_ENTER/THREAD_EXIT in a multithreaded environment? Your help is greatly appreciated.

Now we have almost all the facts (except the THREAD_ENTER/THREAD_EXIT) and I am ready to
implement whichever approach we as a team decide. Either one (BLR or blended) will be fine
with me. Implementing preload only, I think, is out of the question. The blended approach
may be the way to go.

There have been some heated discussions about the implementation approaches; as
engineering, I believe that is not a bad thing and it is not personal. We all want to find
the optimal solution for our customer (at least that is what I had in mind). So let it be
in peace and move forward.

Thanks everybody for all of your ideas, and special thanks to Charlie for doing the
prototyping which made the benchmark much more meaningful.

Bill
Subject: Re: The fact of the 2 approaches (blr VS UDF approach)
Date: Wed, 01 Dec 1999 15:45:50 -0500
From: ************************************
To: ************************************
Bill,

Perhaps a blended approach would be the optimal design. The SQL functions should use the
blr approach since they are likely to be the most heavily used. The more esoteric non-SQL
functions should use the current UDF approach, since they are an added service to the
customer but unlikely to be used frequently enough.

The list of possible functions that have a finite utility to the customer is infinite. To
try to codify increasing numbers of those functions in the future would be a mistake in my
opinion. If we were hardware designers, I'd be in the RISC camp while you preached the
virtues of the CISC camp. I believe in a simpler design to keep the heavily used functions
running quicker and augment extensible functionality thru software libraries at a cost to
performance.

CISC designs every possible function in silicon with no need for software extensibility.
But CISC designers don't specify instructions for atan, sin, cos in the machine
architecture even though they would execute faster. They just have such a limited
applicability and low frequency of usage.

The fact that a function has membership in the current IB_UDF library should not be the
only criterion for blr built-in support. Frequency of use should be a heavy consideration,
as well as whether the function qualifies as a SQL2/3 built-in function.
I agree with most of the points made in your rebuttal and found the
performance studies very enlightening -- very sound engineering.
However, I still find myself in stark disagreement with the universality
of your proposal. I think there has to be a dividing line between
built-in blr support and UDF extensibility for your listed functions.
I propose the following:

1) Implement the blr approach for the actual SQL standard functions as defined by SQL2/3:
ABS, POSITION, CHAR_LENGTH, OCTET_LENGTH, BIT_LENGTH, MOD, SUBSTRING, TRIM, TRANSLATE,
UPPER, LOWER etc.

2) Keep the non-SQL standard-specified functions as UDFs but preload the RDB$FUNCTIONS and
RDB$FUNCTION_ARGUMENTS metadata for the user at database creation time. I still maintain
that most of the 23 mathematical functions in your list are a waste of space in the engine.
I can come up with scalar functions like FACTORIAL(), as well as aggregate functions such
as MEDIAN() and STANDARD_DEVIATION(), which would be more useful in statistical
applications. This is a fertile application area with the analysis of clickstream data at
Web sites and the emphasis on customer relationship management.

But tanh, log, cot, I just don't buy their usefulness ...

The effort to preload this UDF metadata should take minimal engineering time; I have
jrd/ini.e/INI_format() in mind for this. So the user would not need to DECLARE EXTERNAL ...
as you suggested. It would already be predeclared at database creation time.

3) I wouldn't modify the behavior of SUBSTR, LTRIM, STRLEN. These are not SQL functions
defined by the standard and don't purport to be. They are UDF functions that happen to have
similar names and semantics to the SQL functions with keywords of SUBSTRING, TRIM and
{BIT | CHAR | OCTET}_LENGTH. You are also likely to break existing and future customer
applications. My understanding is that most of these functions were ported from dBASE to
help migration from desktop database products. Might be wise to leave these as is unless
the UDF function name matches an SQL keyword for a standard function.
Regards,
Charlie
Subject: Re: The fact of the 2 approaches (blr VS UDF approach)
Date: Wed, 01 Dec 1999 22:15:07 -0500
From: Charlie Caro <ccaro@interbase.com>
To: ************************************
Bill,

The logical conclusion to draw is that UDFs should be dropped from the product because they
don't perform well enough. That's really what you are saying. Instead, customers should
lobby and pay us to add BLR support in the engine for esoteric functions that are of little
use to anyone else but themselves.

Or could the conclusion be that UDF function invocation could benefit from better attention
to implementation details?

To wit:

1) Every CALL_UDF is bracketed by the THREAD_EXIT/THREAD_ENTER macros because we don't know
how long an arbitrary UDF will execute. However, if the UDF were in support of SQL
functions, which we know to be computationally brief, then one could write:

if (function->fun_flags & (FUN_sql_function | FUN_fast_function))
    CALL_UDF (...);
else
{
    THREAD_EXIT;
    CALL_UDF (...);
    THREAD_ENTER;
}
This would eliminate the substantial code overhead (relative to UDF function invocation) of
our thread scheduler.

2) It's somewhat misleading to imply that the memory overhead of 64KB was due to some gross
inefficiency of UDF function execution. Most of that memory is probably due to the nested
system requests that query RDB$FUNCTIONS and RDB$FUNCTION_ARGUMENTS during function lookup.
However, once compiled, these requests are reused for all subsequent function lookups. It's
an indictment against all system metadata, because that metadata must be queried to be
useful.
3) You haven't made any mention of what your memory footprint will be with the blr support.
Since you are proposing to have blr built-in support for everything that looks, sounds or
smells like a function, the memory footprint will always be present whether the UDF
functions are used or not, because they will be commingled with every other useful part of
the execution engine.

4) FUN_evaluate() has many gratuitous MOVE_CLEAR macro calls. Here's one of my favorites:

temp_string = (STR) ALLOCDV (type_str, function->fun_temp_length, ...);
MOVE_CLEAR (temp_string->str_data, temp_string->str_length);

Since our ALL_alloc memory allocator zeros its data structures, the MOVE_CLEAR zeros a
zeroed piece of memory. Many engineers clear memory when they don't need to because it
seems like a good idea. There are times when that's needed and there are times when it's
unnecessary and costs performance. There's also a MOVE_CLEAR of an 800 byte temp buffer
which looks unnecessary.
6) You are comparing the complete UDF infrastructure against a couple of existing BLR
functions and a prototype. COUNT isn't even a function call but an autoincrement of a
memory location. You need to compare UDFs against real function calls to the math or clib
library for ASIN, unless you intend to inline the Taylor series expansion for it.

I don't dispute your proven claim about the overhead of UDF function calls for simple
functions. That's why macros and inline compilation exist for regular compilers. But your
proposal begs the question: Do you really intend to add BLR support for every function that
comes your way?
Looks like you've already made up your mind. If nothing else, the
performance should be great!
Regards,
Charlie
Bill wrote:
>
> Hi Folks:
>
> I have run another bench mark (the blr implementation of the COUNT SQL
> function VS the UDF implementation of the ASIN function). The conclusion
> is that even for the less-used ASIN function, the UDF
> implementation takes:
>
> Elapsed time: 1334% slower
> Cpu time: 25100% slower
> total extra memory usage: 64512
>
> If you execute the function more than once, the UDF
> implementation takes:
>
> Elapsed time: 1442% slower
> Cpu time: 26200% slower
> memory usage is the same since IB is smart enough
> to recycle the usage of memory.
>
> I think the bench mark result speaks for itself.
>
> Bill
>
> attached is the bench mark result:
>
> show table t3;
>
> NUM_1    NUMERIC(5, 3) Nullable
>
> select * from t3;
>
> NUM_1
> ============
> 0.500
> ...
> 0.500
>
> total number of rows: 15449
>
> count (num_1):                asin (num_1):
>
> Current memory = 9140224      Current memory = 9153536    (13312 more)
> Delta memory   = 203776       Delta memory   = 217088     (13312 more)
> Max memory     = 9242712      Max memory     = 9280600    (37888 more)
> Elapsed time   = 0.86 sec     Elapsed time   = 12.34 sec  (1334% slower)
> Cpu            = 0.01 sec     Cpu            = 2.52 sec   (25100% slower)
> Buffers        = 2048         Buffers        = 2048       (same)
> Reads          = 221          Reads          = 229        (8 more Reads)
> Writes         = 0            Writes         = 0          (same)
> Fetches        = 31628        Fetches        = 31660      (32 more Fetches)
>
> subsequent execution:
>
> Current memory = 9172992      Current memory = 9172992    (same)
> Delta memory   = 0            Delta memory   = 0          (same)
> Max memory     = 9280600      Max memory     = 9280600    (same)
> Elapsed time   = 0.80 sec     Elapsed time   = 12.34 sec  (1442% slower)
> Cpu            = 0.01 sec     Cpu            = 2.63 sec   (26200% slower)
> Buffers        = 2048         Buffers        = 2048       (same)
> Reads          = 0            Reads           = 0         (same)
> Writes         = 0            Writes         = 0          (same)
> Fetches        = 31281        Fetches        = 31281      (same)
>
Subject: Re: The fact of the 2 approaches (blr VS UDF approach)
Date: Fri, 03 Dec 1999 16:16:59 -0500
From: ************************************
To: ************************************
My summary (see points below): We won't be able to compare the two proposals until we do
some work to improve UDF function invocation.

Bill wrote:
>
> Hi Folks:
>
> From all of the responses I see 3 different issues that we need
> to address:
>
> 1. The performance of IB UDFs in general is very bad and needs to be
> addressed. As Charlie has suggested, there is code in many areas of the
> engine that can be improved to make UDFs run faster. This should be
> dealt with by another project all by itself called general IB UDF
> performance enhancement.
>

Agreed, but it has a bearing on this discussion. If the performance wasn't so bad, there
would be no need for you to take the approach that you are proposing. You propose an
approach that has a cost of development that is proportional to the number of functions
added. This cost goes forward every time we add a new function.

If the UDF function invocation performance becomes equal (unlikely) there will be no need
for your proposal. If the performance is not quite as good, you can go forward with support
for the standard SQL functions, but there will be no need to add BLR support for
non-standard SQL UDF functions.
> 2. To remove the limitation of IB BLR from one-byte BLR to 2-byte BLR
> or even 4-byte BLR. This is another IB limitation that should be dealt
> with by another project all by itself called up-scaling IB BLR. This is
> similar to a problem that IB has with versioning. The limit is also
> 256, which is a 1-byte field.
>

Original K&R "C" had 29 keywords. Bjarne Stroustrup struggled to keep "C++" to 42 keywords.
James Gosling wanted to improve on C++ and winced at the 50 keywords required for Java. A
multibillion dollar Java industry from embedded to enterprise applications has been
generated on top of fewer than 256 Java byte codes.

You don't need 2 and 4 byte BLR codes. The sum total of words in the East/West
civilizations, as well as the Third World countries, could fit in 4 bytes.

BLR is a low-level database programming language, not a dictionary for every possible
function name we can think of. It's cheap to parse and network transportable. If I were
tasked with the project of supporting cross-database joins for InterBase databases, I would
want to machine-generate BLR in the bowels of the engine and pump it across the network
without ever hitting the overhead of SQL processing. I wouldn't want the other side to have
to deal with byte-ordering and big/little endianness just to pass it the BLR.

It should be some time before we need anything this drastic if we pay proper attention to
the language design. Even then I would argue to keep it a single byte encoding with an
"escape" BLR verb to denote the extended BLR verbs.
> 3. The SQL functions project itself.
>
> I am personally against the blended approach and the following are my
> rationales:
>
> 1. The argument for the blended approach is to separate the heavily
> used functions from the functions unlikely to be used frequently.
> Perhaps those functions have just been used by 1% or less of IB
> customers. Since we do not care about those 1% or less than 1% of
> customers, then why bother to spend extra money to implement the
> blended approach for them?
>
>
What extra money! The blended approach is nothing new. I'm just
proposing that we pre-declare the IB UDFs for the user. It's the same
result as "isql <dbname> ib_udf.sql".
The built-in dispatch mechanism is already there in builtin.c. It's
a
simple edit procedure to add the function names there and a simple
edit
procedure to add the function attributes in jrd/ini.e to initialize
the
metadata into RDB$FUNCTIONS RDB$FUNCTION_ARGUMENTS. Adding new function
would be as easy as adding new pararmeters to ibconfig.
In that regard, I think your proposal is more costly and repetitive.
It
duplicates prior work but doesn't leverage that prior work. It's like
doing the same thing over and over versus writing a script to do it
automatically.
The litmus test is whether UDF function invocation can improve enough
to
make the performance difference neglible. So you can quote the
performance advantages of BLR today but it's just beating a dead horse
until we try to improve UDF function invocation. This will benefit
user
UDFs and SQL functions at the same time.
If the performance is still bad, I would be the first one to defend
your
proposal for the true SQL functions. But even then, I would still not
support the miscellaneous and sundry functions going into BLR. That
would be a never ending resource drain unless someone could promise
that
we would never add another IB UDF function again. And I don't think
anyone is prescient enough to make that promise.
> 2. If IB really cares about those 1% or fewer customers, then why not
> give them a sound solution? Remember what Murphy's law says: once you
> think a function is not being used heavily, the next day some customers
> will use it.
>
> 3. What do the customers gain by IB implementing the blended approach?
> Does it make the functions run faster? Does it make the functions run
> with less memory? No. The functions' performance is as bad as it used
> to be. The only gain for the customers is that he/she does not have to
> declare the function before using it and he/she does not have to
> install an extra DLL. Since those customers are less than or equal to
> 1%, who cares? It is just a costly cosmetic improvement. It is a
> typical and classical case of fixing the symptom and not the source.
>
> 4. If we spend some money to improve the general performance of IB
> UDFs, those so-called unlikely-used functions can benefit much more
> than if we spend the money on doing the blended approach.
>
> 5. If we combine the BLR approach with Sriram's suggestion, then the
> issue of taking too many BLRs is solved. We do not have to worry about
> the BLR implementation of SQL functions taking too many BLRs. But we
> still have to address the issue of expanding the BLR from 1 byte to
> multiple bytes for future IB features.
>
On points 2, 3, 4, and 5: If customers can come up with another 100 useful UDF functions,
do we really want to have to commit valuable engineering resources, like yours, to adding
those functions with your proposal? I just don't see that return, because the coding to add
them is so labor-intensive.

The table-driven approach could add these functions in a day (assuming the source code for
the UDF functions was available). It would probably take 100 days to do it the BLR way. If
an engineer other than BT had to do the work, it would take even longer.
> 6. By doing the blended approach, we now have 3 different approaches
> for SQL functions (the BLR implementation, the blended implementation,
> and the true UDF implementation). By doing so, does it make the IB code
> easy to maintain or otherwise? I leave that as food for thought for all
> of you to think about.
>

You're right! But the blended approach and the UDF approach are more closely related than
the BLR implementation. The blended approach is not even a new approach: it just
predeclares the IB UDFs; it doesn't change the way the UDF mechanism works. You wouldn't be
able to tell the difference from running the IB_UDF.SQL script. We can keep IB_UDF as a
shared library for better replacement maintenance or bring it into the engine with the
static reference capability that is already present in builtin.c.
Regards,
Charlie