Extending SQL Functionality in InterBase

By: Charlie Caro

Abstract: The Chief Architect of InterBase covers how to add additional SQL functions and internal UDFs to your database.

This is the technical paper from a talk given at the 12th Annual Borland Developer's Conference
By Charlie Caro – Borland Software Corp.

Charlie Caro is a software engineer with Borland InterBase Research and Development. Prior to his 13 years of database development, Charlie developed high-performance file systems and operating systems for transaction-based environments. Charlie holds a bachelor's degree in engineering/computer science from Harvard University, Cambridge, Massachusetts. He has also studied in the master's program in computer science at Boston University.

Note: No portion of this paper may be distributed or reproduced by any means, or in any form, without the author's and/or Borland's prior written permission.


Abstract

Overview of SQL Compilation Process

SQL Analysis Phase

SQL Lexical Analysis

SQL Syntax Analysis

SQL Semantic Analysis

SQL Synthesis Phase

BLR Code Generation

BLR Request Compilation

Recursive-Descent Parsing

Semantic Execution Tree Pass

Storage Allocation Pass

Query Optimization Pass

Security Checking

Extending SQL Functions

BLR Support of LOWER

SDF Support of LOWER

Summary

Appendix


Abstract

This paper describes a methodology for adding SQL functions to the InterBase engine. SQL functions
can be scalar functions or aggregate functions. Scalar functions take a list of scalar arguments and
evaluate to a scalar value. A scalar value belongs to a basic, atomic datatype as defined by the SQL
standard. Aggregate functions reduce a set of scalar values to a single scalar value. SQL scalar
expressions apply scalar operators to scalar operands to produce a scalar value. Therefore, SQL
function references can be used to form arbitrary SQL scalar expressions.

There are a couple of different approaches to this outcome which involve tradeoffs in performance,
footprint and engineering effort required to implement an SQL function. The most fundamental of the
two approaches involves the definition of BLR (Binary Language Representation), a low level, byte
coded relational algebra.

The other approach uses System Defined Functions or SDF's which are built-in, preloaded functions
on top of the UDF (User Defined Function) call mechanism. This mechanism is not currently supported
in the engine but a prototype approach is suggested in the paper.

Depending on the characteristics of the added SQL functionality, either of the approaches may be
a viable choice. The author explains some of the criteria that should be addressed to decide which
technique may be more suitable to the task at hand.

Overview of the SQL Compilation Process

Before either of the two techniques can be discussed, it's necessary to have an understanding of the
process by which SQL statements are compiled into a form that is suitable for execution by the
InterBase database engine. The compilation process involves a translation of the source SQL
language into an equivalent request expressed in the target BLR language.

To that end, the following compilation steps need to be reviewed:

SQL Analysis phase:

  • Linear analysis of  SQL statements (lexical)
  • Hierarchical analysis of the SQL statements (syntax or parsing)
  • Semantic analysis of SQL parse tree (correctness and annotation)
SQL Synthesis phase:
  • Code generation of BLR from annotated parse tree
  • BLR Compilation passes:
    • BLR recursive-descent parsing (generating the execution node tree)
    • Semantic execution tree pass (view substitution, resolving defaults and computed fields)
    • Storage allocation pass (impure space reservation)
    • Query optimization pass (join optimization and matching streams to access methods)
    • Security checking (authorization and access control)
    SQL Analysis Phase

    The SQL analysis phase is responsible for revealing the underlying grammatical structure of
    the passed in SQL statement. The heart of the process involves InterBase's SQL parser
    which is embodied in the module DSQL/PARSE.Y. The file is a notational specification of
    an SQL grammar which is input into the YACC compiler -  a parser generator tool. The
    output of the YACC compiler is a C source code file which can parse and generate a
    parse tree for the input SQL statement.

    SQL Lexical Analysis

    The lexical analyzer is responsible for generating a stream of tokens from the sequence of
    input characters comprising the SQL statement. It performs a simple left-to-right scan of the
    input and groups characters which have a cohesive, collective meaning. These groups of
    characters are called tokens. Tokens include identifiers, keywords, numbers,  operators
    and miscellaneous punctuation symbols.

    The lexical analyzer performs other ancillary actions to simplify the work of the parser.
    Rather than complicate the grammar (and hence the parser) with the complexity of white
    space, the lexical analyzer silently strips unnecessary white space and SQL comments
    from the input character stream.

    InterBase's lexical analyzer is built-in to its parser and implemented as a simple co-routine
    which is called on demand by the parser to generate the next token. In this sense, the lexical
    scanner is a passive participant.

    SQL Syntax Analysis

    As indicated above, the parser drives the analysis phase by calling the lexical scanner when
    it needs another token. Thus, the parser sees a partitioned token stream as input that is very
    much different from the input character sequence of the lexical scanner. The parser's job is
    to determine whether a parse tree can be derived from the token stream given the underlying
    grammar for the SQL language. If such a tree can be generated then we can have some
    assurance that the input statement has at least the superficial form of a legal SQL statement.

    The parser needs to employ more powerful, hierarchical methods than the simple, linear
    scanning of lexical analysis. Although tools exist to generate lexical analyzers from regular
    expression notation specifications, a close look at YYLEX() in DSQL/PARSE.Y will show
    that it is a trivial handcuffed implementation. On the other hand, the parser employs more
    rigorous, bottom-up, shift-reduce LALR (Look Ahead, Left Right) algorithms to handle
    the complexity of nested grammatical structures. Fortunately, YACC hides the nasty details
    in the grammar specifications.

    The production rules of the grammar for the parser is specified using a widely known notation
    called BNF (Bakus-Naur Form). The basic structure of a production has a non-terminal grammar
    construct on the left side, a colon ":" and a string of non-terminal and terminal symbols on the right
    side of the production. The production rule is followed by pseudo-programming action which
    indicates how to construct the parse tree node for the language construct defined by its
    production. The parse tree node has a basic structure of a slot for the operator implied by the
    language construct and a slot for each child node implied by the righted of the production rule.

    The parser specification includes a section to denote keyword tokens for the SQL language. These
    are identifiers which have been reserved by the SQL language and which may not be used by the
    user to define identifiers for SQL objects such as tables or columns.

    The final output of the parser is a parsed, syntax tree which is used as input to the next phase of
    semantic analysis. While the grammatical structure of the SQL statement has be verified, there is
    no guarantee that the objects referenced by the SQL statement correspond to any of the elements
    of the InterBase SQL database. Even if the references are used in a legal way as defined by the
    SQL language specification, are they employed meaningfully in the input SQL statement.

    SQL Semantic Analysis

    The job of  SQL semantic analysis is to insure that the SQL grammatical constructs of the
    statement are used correctly and to annotate the parse tree with information that will be useful
    for BLR code generation of the synthesis phase. There is no ideal notational convention for
    specifying this behavior so the rules and conventions are directly embedded in the module
    DSQL/PASS1. which forms the bulk of the semantic analysis code.

    Some of the actions include:
     

    • Alias substitution which includes both column and table aliases. In the case of the table

    • alias, it is necessary to create an association between the alias reference and base table
      object. This associative link annotates all parse tree nodes for which there is an alias
      node reference. This is used during code generation to generate BLR for the base table.
       
    • View validation checks that any views used in an SQL statement actually correspond to

    • a defined view in the SQL schema for the InterBase database.
       
    • Type checking validates that all operators are applied to operands that have the correct

    • types as defined by the language. It also validates stored procedure references and that
      their arguments match in both type and number. Additionally, it guarantees that procedure
      call method is used appropriately for either execute of select usage.
       
    • Aggregate handling insures that column references used in an SQL statement including

    • a GROUP BY clause are aggregate functions over the group or define the columns
      comprising the group.
       
    • Data descriptions need to annotate all expressions and field parse tree nodes. This is

    • especially important for generating the BLR message declarations which define all interaction
      between data requests and the InterBase engine. The BLR message loosely correspond
      to what an SQL programmer would associate with the SELECT list references in a
      direct SELECT or as the basis for a CURSOR definition.

    SQL Synthesis Phase

    At this point, the annotated SQL parse tree has everything necessary to generate an equivalent
    translation to a BLR program. Recall that BLR is a byte-code language for specifying the actions
    of a relational statement to the InterBase engine.

    BLR Code Generation

    The general structure of a BLR program is something like the following:

    input message declarations;
    output message declarations;
    input message receive;
    for loop
        record selection expression
        output message send
            assignment of value expressions to output message parameters
    end loop
    terminal message send

    From an SQL programmer's perspective, you can think of the "input message declaration" as
    the optional placeholders ("?") in a dynamic SQL statement. The "output message declaration"
    are the scalar values returned by an SQL SELECT.

    The BLR request performs an "input message receive" which conceptually receives the elements
    transmitted from input XSQLDA variables.

    The "for loop" corresponds to the SQL WHERE clause and other ancillary clauses (ORDER/GROUP
    BY/HAVING) and specifies the target tables and Boolean search predicates. This expression also
    implicitly defines joins by naming multiple tables in the record selection expression. Hence the language
    forms a relational algebra but not a calculus as it doesn't specify how the query will be executed.

    The "assignment" assigns arbitrary value expressions usually involving column scalar values to the
    parameters making up the output message. The output  message is then sent off to the client which
    is conceptually the DSQL layer. The DSQL layer performs transforms BLR message receipt to the
    expected InterBase DSQL wire protocol. The InterBase client at the user's application then converts
    this to the output XSQLDA as expected by the programmer.

    The BLR code generation is handled in DSQL/GEN.C. By this point, the code generation is fairly
    straight forward as the parse tree lends itself to an easier translation. It basically follows the outline
    as given above.

    BLR Request Compilation

    BLR request compilation is the process by which a BLR program is compiled into an executable
    request tree that can be directly interpreted by the InterBase relational engine. Each node in the
    execution tree is visited an the actions implied by its children nodes are carried out. This includes
    the retrieval of candidate rows from underlying base tables and join processing.

    BLR Recursive-descent parsing

    BLR is very cheap to parse. The language design is non ambiguous and clean enough that very little
    or no lookahead processing is required to parse a BLR statement. As no backtracking on the input
    BLR is required, it's only a slight exaggeration to say that no intermediate parse tree is generated. For
    the most part, a non-optimized execution tree is produced during the initial pass over the BLR string.

    Because of the foregoing properties, a simple recursive-descent parsing algorithm translates the input
    BLR into a non-final execution request tree with the sequence of parsing functions called serving as an]
    implicit parse tree.

    It's instructive to peruse JRD/PAR.C and JRD/CMP.C to see the details of processing. Unfortunately
    there is no documented notational specification in source code form which I can direct you toward.
    There are very good hard copy reference guides to the OSRI (Open Standard Relational Interface)
    Architecture which includes BLR specification and semantics. However, these are in short supply by
    InterBase old-timers. As Compaq/Digital did with ancestor DSRI specifications, Borland stopped
    supporting the BLR interface layers for customer usage which means its documentation ceased as
    well many years ago.

    Semantic execution tree pass

    Since the BLR request interface was at one time a supported customer interface, the request tree
    cannot be trusted to form a meaningful operation. Legacy customers built custom BLR translators
    for 4GL data sublanguages so it's necessary that the InterBase engine pretty much perform the
    identical checks that the DSQL layer performed as described earlier.

    Over an above those type checking operations described by the SQL processing:

    •  the semantic pass will fold in view substitutions from the database schema and remap

    • underlying stream references (record contexts) to the base relations.
    • In addition, computed field references and default value clauses are appended to the

    • execution tree to handle these operations transparently to the user.
    • Stored procedure references are transformed by metadata templates to bind the reference

    • to a compiled forms of the procedure in the server's memory.
    • User-defined function references are checked and loaded into the server's memory.

    • Similar to procedures, a binding generates assignment nodes to assign function
      arguments to their formal parameters.
    • Note that trigger execution is not bound to the users data request. Instead trigger

    • requests are compiled as separate system requests and control is transferred to
      them during normal user request execution of the respective DML operation.
    Storage Allocation Pass

    An execution request is divided into  pure and an impure sections. The "pure" section of
    the request is the execution node tree which means that no memory writes will occur to
    the tree after the request has been compiled. During the execution of the request, the node
    tree is walked and the operations implied by its children are carried out.

    However, there must be some place where the intermediate results of those operations are
    stored and this is the "impure" section of the request. During the storage allocation pass, the
    nodes of the execution tree are revisited and storage is assigned for the data descriptors which
    describe value nodes in the tree. The impure storage space  is assigned as a contiguous allocation
    chunk of memory and relative offsets into this space is written back into the execution nodes.

    Buffer space for the retrieval of row contents is deferred until run-time. Since InterBase allows
    multiple formats of record structures to be extant, the dynamic case of reallocating a larger
    buffer has to be handled anyway. Might as well default this to run-time as the code to handle
    it is already present.

    Query Optimization Pass

    The operations implied by the BLR record selection expression are resolved during query
    optimization. Explicit access methods are determined in the form of a sequential chain of
    record source blocks. This includes the mapping of search predicates to indices defined for
    table, the order of join processing, and the creation of sort blocks for explicit ORDER BY
    or implicit sort/merge joins. This is module JRD/OPT.C

    Record source blocks are defined for sequential, index, navigational, sort and join access
    methods. The retrieval operations for each of type can be found in JRD/RSE.C.

    As the treatment of query optimization is worthy of a paper of its own and beyond the scope
    of this paper nothing more will be said of this component.

    Security Checking and Authorization

    During request generation and optimization, an access list of objects explicitly or implicitly
    referenced by the request is maintained. The access list includes both the object and the
    type of access requested (SELECT, INSERT, UPDATE, DELETE, EXECUTE and
    REFERENCE).

    Although InterBase presents a SQL model of security in the form of the GRANT/REVOKE
    syntax, InterBase maintains underlying webs of access control lists (ACL's) which are stored
    as blobs within the database. For each access implied by the request, the user's authentication
    identifier and intended access is checked against the ACL stored for the object.

    Choices For Extending SQL Functions

    With the introduction of InterBase's SQL statement processing behind us, we can now address
    various methods of adding functionality to InterBase's relational engine. Based on the presented
    material, there are two immediate methods that come to mind: BLR verb support and built-in,
    system defined functions (SDF's).

    BLR verb support involve "hard-wiring" the behavior for the SQL function into the execution
    engine. New BLR byte codes are assigned to the new function as well as a syntax for acceptable
    arguments take by the new BLR verb. This approach has:

    • Potentially better performance than built-in SDF's
    • More labor-intensive to add functions than built-in SDFs
    • Has better character set and collation support than built-in SDF's
    Builtin, system-defined functions (SDF'S) is a combination of UDFs and static definition of
    that functionality into the base engine. The subtle difference is that the functionality associated
    with built-in SDFs becomes available as a default behavior of the engine. Additionally:
    • There is no need to repeatedly make the same DDL/UDF declarations for each database

    • that uses them.
    • There is no need to manipulate, port and carry along supporting libraries.
    • The built-in behavior can be overridden by a user's UDF, if so desired
    • Much easier to add the functionality than the BLR approach.
    Let's take a quick tour of what it would take to add an SQL function with each method to
    gain an understanding of the benefits and tradeoffs of each approach.

    Currently, the engine provides BLR support for the UPPER function which returns the
    uppercased version of an  input string. However, there is no corresponding BLR support
    for the counterpart LOWER function. Instead, InterBase comes with a collection of
    common UDF functions of which one is the LOWER function.

    There are several benefits associated with reimplementing the LOWER function in BLR.
    The chief reason is that the common UDF does not support international character sets.
    It only assumes ASCII.

    BLR support of LOWER function

    The steps required to add this support are extensive and touch almost every part
    of the DSQL component and the InterBase engine component. An experienced InterBase
    engineer can do this work but it's not for InterBase neophyte.

    First the DSQL component:

    • Modify the DSQL parser to add a token for the keyword LOWER and a production rule

    • for the LOWER language construct.
             %token LOWER
             LOWER '(' value ')'   { $$ = make_node (nod_upcase, 1, $3); }
    • Add case for BLR generation in dsql/gen.c of new blr verb blr_lwcase
    • Add DSQL node type for nod_lwcase to dsql/node.h
    • Add descriptor generation for node to dsql/make.c
    • Add semantic support for lower case to all value cases in dsql/pass1.

    •  
    Then for the JRD engine component:
    • Add new BLR verb, blr_lwcase to jrd/blrtable.c, blr.h
    • Add case for blr_lwcase to jrd/par.c and jrd/cmp.c
    • Add descriptor support to jrd/cmp.c/CMP_get_desc()
    • Add impure space allocation
    • Add lower case support to expression evaluator jrd/evl.c
    • Add lower case hooks to international language driver jrd/intl.c
    • Add string-to-lower function to map characters from lower to upper

    • for each major class of language
    Not bad but the total number of files modified comes out to about 18 files.

    SDF support of LOWER function

    Since the concept of a system-defined function does not currently exist, it should probably
    be explained. There is currently a module named jrd/builtin.c which is used to statically
    link functions used by the QA department for white-box testing. This allows the simulation
    of catastrophic server failures or insertion of code for timing-related tests that are difficult
    to simulate externally.

    If a function reference cannot be resolved to a UDF declaration as stored in
    RDB$FUNCTIONS and RDB$FUNCTION_ARGUMENTS then an internal lookup
    table to these built-in functions is checked. This support currently exists. Because of the order
    of function lookup, the user has the ability to override the behavior of an SDF by defining
    an equivalent UDF for a different behavior

    The SDF concept constructs an internal binary representation of RDB$FUNCTIONS and
    RDB$FUNCTION_ARGUMENTS. This internal form makes it unnecessary to declare
    the functions to each database thus providing universal access to all databases on the server.
    The table-driven structure enables function dispatch and a means to marshall function arguments.
    The following data structures would be defined in a new include file JRD/BLTN.H:

    typedef struct bltn {
        UCHAR       *bltn_function;
        int         (*bltn_entrypoint)();
        USHORT      bltn_position;
        USHORT      bltn_mechanism;
        USHORT      bltn_fld_type;
        USHORT      bltn_fld_scale;
        USHORT      bltn_fld_length;
        USHORT      bltn_fld_subtype;
        USHORT      bltn_charset_id;
    } *BLTN;

    static bltn builtin_functions[] {

    "ACOS",         BLTN_acos,      0, FUN_value,       blr_double,   0,     8, 0, 0,
    "ACOS",         BLTN_acos,      1, FUN_reference,   blr_double,   0,     8, 0, 0,

    "FACTORIAL",    BLTN_factorial, 0, FUN_value,       blr_int64,    0,     8, 0, 0,
    "FACTORIAL",    BLTN_factorial, 1, FUN_reference,   blr_short,    0,     2, 0, 0,

    "SUBSTR",       BLTN_substr,    0, FUN_reference,   blr_cstring,  0,    80, 0, 0,
    "SUBSTR",       BLTN_substr,    1, FUN_reference,   blr_cstring,  0,    80, 0, 0,
    "SUBSTR",       BLTN_substr,    2, FUN_reference,   blr_short,    0,     2, 0, 0,
    "SUBSTR",       BLTN_substr,    3, FUN_reference,   blr_short,    0,     2, 0, 0,
    .
    .
    .
    0,              0,              0, 0,                0,           0,     0, 0, 0
    };

    And here's some source code for a factorial function:

    UINT64 BLTN_factorial (USHORT n)
    {
    UINT64  factorial = 1;

    while (n > 1)
        factorial *= n--;

    return factorial;
    }

    The source code for each of the internal SDFs are statically compiled inside the engine. This allows
    access to internal support routines for international language support in the same way that BLR
    execution has access.

    The performance overhead of a SDF/UDF mainly derives from the invocation preamble of
    JRD/FUN.E/FUN_evaluate(). The appendix to this paper has a memo which discusses some
    of the ways that this overhead can be addressed. However, the final result is still that BLR
    execution ends up having a positive performance advantage over a SDF/UDF.

    As an example of improving UDF's look at the current LOWER UDF shipped with InterBase:

    char* EXPORT IB_UDF_lower(
        char* s)
    {
        char *buf;
        char *p;

        buf = (char*) ib_util_malloc (strlen (s) + 1);
        p = buf;
        while (*s)
     if (*s >= 'A' && *s <= 'Z')
         {
         *p++ = *s++ - 'A' + 'a';
         }
     else
         *p++ = *s++;
        *p = '0';

        return buf;
    }

    Since the UDF function invocation performs a copy-by-value for string arguments, there is no reason
    to malloc() another buffer for the LOWER output (followed by a free() to release the buffer. Thus the
    memory allocation, string copy and deallocation overhead can be removed. The result LOWERed
    string will be copied automatically by the function return process to an impure section of the request area.

    The cost of implemention for a user to add SQL functions with the SDF methodology would be 2
    files. Adding an entry to the built-in structure for dispatch and the source code for the function itself.

    Summary

    In general, there is a very, large finite number of functions that can be added to enhance InterBase's
    SQL processing capability. These range from string handling to transcendental mathematical functions.
    Therefore, there should be some general guidelines to help decide when a new function is added
    via BLR support and when it might be added as an SDF or UDF.

    Some guiding criteria might be:

    An SQL function defined by the SQL standard might benefit from BLR implementation. It can't
    be overridden and there is a slight performance gain associated with the BLR support. Presumably
    the SQL function is commonly used to benefit such an implementation.

    Esoteric functions that are used infrequently by a smaller population of users might benefit from
    either a UDF or SDF implementation. This preserves the core interpretive execution engine's
    memory footprint. The case statements for node dispatch and evaluation will not bloat to hundreds
    or thousands of cases. There are BLR escape sequence conventions which will forestall the
    range of possible BLR from becoming exhausted but this isn't cause to abandon the justification
    of new BLR language.

    Overloaded functions might benefit from the flexibility of a UDF/SDF implementation. Trigonometric
    math functions could take either radians or degrees as input arguments without having separate
    function address spaces. The user can define the UDF external to the engine or define SDF
    internal to the engine.

    The developer's implementation skills and familiarity with InterBase internals might guide the simpler
    method of SDF/UDF selection.  Similarly, a competent, experienced InterBase engineer might favor
    the BLR approach.

    Appendix

    Here's some of the raw footage that went into writing this paper. These are excerpts from internal
    memos and emails on the subject. It's a lively debate between engineers about the merits of both
    approaches. You can come to your own conclusions. Some of the names have been changed to
    protect the innocent.



    Subject:            Re: proposal for converting IB UDF to SQL functions
    Resent-From:    ib_engineering@inprise.com
    Date:                Wed, 24 Nov 1999 16:25:58 -0500
    From:               ************************************
    To:                   ************************************

    Bill,

    At the risk of stating the obvious, builtin.c is for built-in functions.
    It is currently searched after the user defined modules for function
    entrypoints.

    I think you could save a great deal of work in this project if you took
    advantage of builtin.c's untapped capability. That's why it's there in
    the first place -- waiting for someone like you to come along and
    exploit its existance.

    Why would you consider any other alternative to implementing built-in
    functions? Your proposal seems like alot of work but that's OK because
    work can be fun ... to a certain extent.

    The advantages of builtin.c:

    1) No need to modify the parser, looper or introduce new blr. Generate
    the current blr and let the current function lookup mechanism search
    builtin.c before user defined UDF modules. (It currently searches after.
    Changing the search order would prevent your standards-defined SQL UDFs
    from being overriden by the user.)

    2) Builtin.c automatically solves the problem of not having to declare
    them. Of course, if statically linking these functions into the engine
    is too old-fashioned, you can always change it to dynamically load the
    ib_udf and ib_util shared libraries for better maintenance. In either
    case, there's no need to declare them for each database -- they are
    server-wide.

    My advice would be to scrap your current proposal or, at the very least,
    investigate where a new proposal based on builtin.c leads you.

    Regards,
     Charlie

    Bill wrote:
    >
    > Hi Folks:
    >
    > The objective of this project is to convert all IB UDF to SQL functions.
    > There are two requirements for
    > the implementation of the SQL functions:
    >
    > 1. The SQL functions should not require any external DLL
    >
    > 2. The SQL functions should not require DECLARE EXTERNAL FUNCTION to
    > enable access to the SQL functions
    >
    > There are totally 35 IB UDF and there are:
    >
    > math functions: --- 23
    >
    >     abs,   acos, asin, atan,  atan2, ceiling, cos,  cosh, cot, div,
    >     floor, ln,   log,  log10, mod,   pi,      rand, sign, sin, sinh,
    >     sqrt,  tan,  tanh
    >
    > binary functions; --- 3
    >
    >     bin_and, bin_or, bin_xor
    >
    > character functions; --- 9
    >
    >     ascii_char, ascii_val, lower, strlen, ltrim, ltrim, substr
    >
    >         note: TRIM   build-in function has to be rewritten to conform to ANSI/SQL
    >                STRLEN build-in function has to be rewritten to conform to ANSI/SQL
    >
    > ANSI/SQL string length functions are in the form as:
    >
    >     CHAR_LENGTH ( <string value> ) or CHARACTER_LENGTH ( <string value>
    > )
    >
    >     OCTET_LENGTH ( <string value> )
    >
    >     BIT_LENGTH ( <string value> )
    >
    > ANSI/SQL TRIM functions are in the form as:
    >
    >     TRIM ( [ [ LEADING | TRAILING | BOTH ] [ <trim character> ] FROM ]
    > <character value> )
    >
    >     The default is BOTH and the default <trim character> is blank
    >
    > Currently IB character UDF (lower) does not support international
    > character set.
    >
    > IB will continue support it current UDF. If user declared the UDF, then
    > IB server will honor the UDF. Otherwise IB server will process the
    > function as SQL function.
    > For example:
    >
    > select lower( c1 ) from t1;
    >
    > if the user declared the LOWER external function in his/her application,
    > then the LOWER function
    > is an UDF function. Otherwise, the LOWER function is a SQL function.
    >
    > The following are the tasks needed for converting 35 IB udf to SQL
    > functions:
    >
    >     1. define new parse tree nodes for each SQL functions
    >
    >     2. new logic to build the 1st new parse trees for each SQL functions
    >
    >     3. define new blr for each SQL functions from 1st new parse trees
    >
    >     4. new logic to build the new blr string for each SQL functions
    >
    >     5. new logic for processing the new blr strings and build the 2nd
    >         new parse trees from the new blr strings
    >
    >     6. new logic in pass1 and pass2 to process the 2nd new parse trees
    >
    >     7. new logic in looper for executing the 2nd new parse trees
    >
    >     8. the new code (perhaps the udf code can be recycled if possible)
    >         for the SQL functions
    >
    >     9. new international character sets support for each new character
    >         string SQL functions (currently the character string udf that IB has
    >         does not support international character sets)
    >
    > There is a buildin.c in jrd component, does there any body know what it
    > use for?
    >
    > If you have any comments or feed backs, please let me know by 12/1/99.
    > Otherwise this is it and I will start writing the functional spec based on this.
    >
    > Thanks,
    >
    > Bill


    Subject:              Re: proposal for converting IB UDF to SQL functions
    Resent-From:      ib_engineering@inprise.com
    Date:                  Wed, 24 Nov 1999 19:59:24 -0500
    From:                 ************************************
    To:                     ************************************

    Bill,

    Bill wrote:
    >
    > Hi Charlie:
    >
    >         If it is so easy as what you said, why when Dean implemted his
    >         EXTRACT SQL function did use the buildin.c approach?
    >

    I don't know, why don't you ask Dean?

    But if you multiply his efforts implementing the single EXTRACT by 35
    more UDF functions, it's going to be one expensive project. If it takes a weeks
    to implement "arc sine, arc cosine and arc tangent", it will cost about $10,000 for
    functions that nobody uses.

    I view BLR byte code and InterBase's execution engine the same way that
    Java byte code and the JVM is viewed by Java's architects. There has to
    be a compelling reason to alter the virtual machine. I don't think that
    transcendental functions and string triming functions merit that
    elevated status. Possibly COBOL programmers would disagree with me.

    It makes the engine more complicated (exe.c/looper and evl.c/expression
    case statements jump by 35) for functions which are rarely executed (not
    to mention parsing and blr generation code.) It potentially makes all
    request execution more expensive just by the presence of their
    footprint.

    UDFs are an extensibility mechanism which were designed not just for our
    customers but for us as well. The function invocation mechanism
    (jrd/fun.e) and the function implementation bodies are isolated and
    separated from Interbase's core execution engine.

    >         Do you have any examples how to execute a buildin function
    >         thru buildin.c? I need an example. Perhaps you can give me
    >         one or point me to it.
    >

    We link jrd/qatest.c into the engine so that QA can run tests to
    internally generate catastrophic server faults in places that would be
    difficult from an external application. You can see some its entrypoints
    labeled in jrd/builtin.c. I can't name the TCS tests but it has
    something to do with shadowing. Perhaps an enlightened co-reader of this
    mail can name the test and win a prize.

    >         Thanks,
    >
    >         Bill
    >

    Regards,
     Charlie


    Subject:             Re: adding built-in functions
    Date:                 Wed, 08 Dec 1999 13:17:44 -0800
    From:                ************************************
    To:                    ************************************
     

     Bill & Michael,

     To get a sense of how easy it would be to add new built-in functions, imagine
    there is an new include file, jrd/bltn.h. It declares a new structure "bltn" and
    a static array of all built-in functions. The static array can be used to populate
    RDB$FUNCTIONS and RDB$FUNCTION_ARGUMENTS or silently
    initialize the run-time symbol and functions blocks with no system metadata.
    Whatever we decide is better for the customer.

     In the example below, I have added a factorial function. It is a simple two
    step process for the engineer:

    1) Add the factorial function parameters to the static array of
    builtin_functions;

    2) Add the factorial source code to ib_udf.c or whatever file we call it.

    Here's jrd/bltn.h:

    typedef struct bltn {
        UCHAR        *bltn_function;
        int                  (*bltn_entrypoint)();
        USHORT      bltn_position;
        USHORT      bltn_mechanism;
        USHORT      bltn_fld_type;
        USHORT      bltn_fld_scale;
        USHORT      bltn_fld_length;
        USHORT      bltn_fld_subtype;
        USHORT      bltn_charset_id;
    } *BLTN;

    static bltn builtin_functions[] {

    "ACOS",               BLTN_acos,      0, FUN_value,       blr_double,   0,     8, 0, 0,
    "ACOS",               BLTN_acos,      1, FUN_reference, blr_double,   0,     8, 0, 0,

    "FACTORIAL",    BLTN_factorial, 0, FUN_value,       blr_int64,      0,     8, 0, 0,
    "FACTORIAL",    BLTN_factorial, 1, FUN_reference, blr_short,      0,     2, 0, 0,

    "STRLEN",            BLTN_strlen,    0, FUN_value,       blr_short,      0,     2, 0, 0,
    "STRLEN",            BLTN_strlen,    1, FUN_reference, blr_cstring,    0,    32767, 0, 0,

    "SUBSTR",            BLTN_substr,    0, FUN_reference, blr_cstring,  0,     80, 0, 0,
    "SUBSTR",            BLTN_substr,    1, FUN_reference, blr_cstring,  0,     80, 0, 0,
    "SUBSTR",            BLTN_substr,    2, FUN_reference, blr_short,    0,     2, 0, 0,
    "SUBSTR",            BLTN_substr,    3, FUN_reference, blr_short,    0,     2, 0, 0,
    .
    .
    .
    0,                          0,                        0, 0,                       0,               0,      0, 0, 0
    };

    And here's some source code for a factorial function:

    UINT64 BLTN_factorial (USHORT n)
    {
    UINT64  factorial = 1;

    while (n > 1)
        factorial *= n--;

    return factorial;
    }

    Regards,
     Charlie



    Subject:              bench mark result of BLR acos VS UDF acos
    Date:                  Wed, 08 Dec 1999 13:16:22 -0800
    From:                 ************************************
    To:                     ************************************

    Hi Folks:

    I have implemented the BLR acos. It did not take a long time to add
    one SQL function the BLR way as I expected. The UDF acos takes 37%
    slower than BLR acos and takes 63488 more memories. The interesting
    thing is UDF acos run 6% faster than BLR acos. You can find the bench
    mark result in the attached file:

    show table t3;

        NUM_1                           NUMERIC(5, 3) Nullable

    select * from t3;

               NUM_1
        ============
               0.500
               ...
               0.500

    total number of rows: 15449
    select acos (num_1) from t3;

    acos (num_1): BLR            acos (num_1): UDF

    Current memory = 9144320     Current memory = 9155584,    13312 more
    Delta memory = 207872           Delta memory = 219136,         11264 more
    Max memory = 9242712          Max memory = 9281624.        38912 more
    Elapsed time = 8.55 sec            Elapsed time = 11.75 sec,        37.43% slower
    Cpu = 2.20 sec                         Cpu = 2.07 sec,                       6.28% faster
    Buffers = 2048                          Buffers = 2048,                        same
    Reads = 223                             Reads = 229                            6 more Reads
    Writes = 0                                Writes = 4,                               4 more Writes
    Fetches = 31635                       Fetches = 31716,                    81 more Fetches

    Bill



    Subject:              bench mark result with Charlie approach
    Date:                  Fri, 10 Dec 1999 10:05:54 -0800
    From:                 Bill Taylor <btaylor@inprise.com>
    To:                     ************************************

    Hi Folks:

    I ran the bench mark between the BLR approach VS Charlie approach and
    the following is the result. What Charlie has done is as the following:

    1. passing TDBB down to FUN_evaluation
    2. cleaning up some code in the Fun_evaluate function
    3. removing THREAD_ENTER/THREAD_EXIT
    4. removing START_CHECK_FOR_EXCEPTIONS/END_CHECK_FOR_EXCEPTIONS.

    Total number of records in table T1 is 15448
    Total number of records in table T3 is 15449

    The bench mark result of BLR approach VS Charlie new code is as the
    following:

    LOWER function:

    BLR lower:                       Charlie's UDF lower:
    ==========                   ====================

    Current memory = 9145344     Current memory = 9154560,      9216 more
    Delta memory = 208896           Delta memory = 218112,           9216 more
    Max memory = 9242712          Max memory = 9280600,          37888 more
    Elapsed time= 7.99 sec             Elapsed time= 23.31 sec,           191.74 slower
    Cpu = 1.77 sec                         Cpu = 2.62 sec,                         48.02 slower
    Buffers = 2048                          Buffers = 2048,                          same
    Reads = 255                             Reads = 259,                             4 more Reads
    Writes = 4                                Writes = 0,                                 3 less Writes
    Fetches = 31728                      Fetches = 31732,                        4 more Fetches

    Charlie's UDF lower:                      Base code UDF lower:
    ====================         ====================

    Current memory = 9154560     Current memory = 9155584,      1024 more
    Delta memory = 218112           Delta memory = 219136,           1024 more
    Max memory = 9280600          Max memory = 9281624,          1024 more
    Elapsed time= 23.31 sec           Elapsed time= 24.62 sec,           6% slower
    Cpu = 2.62 sec                         Cpu = 2.57 sec,                         2% faster
    Buffers = 2048                          Buffers = 2048,                          same
    Reads = 259                             Reads = 259,                             same
    Writes = 0                                Writes = 4,                                 4 more Writes
    Fetches = 31732                       Fetches = 31774,                      42 more Fetches

    BLR lower:                        Base code UDF lower:
    ==========                   ====================

    Current memory = 9145344     Current memory = 9155584,     10240 more
    Delta memory = 208896           Delta memory = 219136,          10240 more
    Max memory = 9242712          Max memory = 9281624,         38912
    Elapsed time= 7.99 sec             Elapsed time= 24.62 sec,          208.14% slower
    Cpu = 1.77 sec                         Cpu = 2.57 sec,                        45.20% slower
    Buffers = 2048                          Buffers = 2048,                         same
    Reads = 255                             Reads = 259                             4 more Reads
    Writes = 4                                Writes = 4,                                same
    Fetches = 31728                       Fetches = 31774                      46 more Fetches
     

    ACOS function:

    BLR acos (num_1):             UDF/Charlie acos (num_1):

    Current memory = 9144320      Current memory = 9154556,  10236 more
    Delta memory = 207872            Delta memory = 218108,       10236 more
    Max memory = 9242712           Max memory = 9280596,      37884 more
    Elapsed time= 8.79 sec              Elapsed time= 8.70 sec,         about same
    Cpu = 2.20 sec                          Cpu = 2.21 sec,                     about same
    Buffers = 2048                           Buffers = 2048,                      same
    Reads = 223                              Reads = 229,                         6 more Reads
    Writes = 0                                 Writes = 0,                             same
    Fetches = 31635                        Fetches = 31660,                   25 more Fetches

    >From this result, we can see, for numeric UDF, Charlie new code was
    able to improve the performance to the level as fast as BLR approach
    but not to character function.

    But the biggest performance gain was due to get it off the:

         START_CHECK_FOR_EXCEPTIONS/END_CHECK_FOR_EXCEPTIONS

    Without these two instructions, the function will run 33% faster
    than the original execution. With the implementation of the
    preload approach, there will be some new code to be written to
    skip the execution of CHECK_FOR_EXCEPTIONS when IB is executing
    IB UDF and not to skip the execution of CHECK_FOR_EXCEPTIONS
    when IB is executing users' UDF.

    To improve the general UDF performance, we can add a new ISQL
    option which will tell IB server to skip the execution of the
    above two instructions.

    Since I am running locally with no multy threads, removing the
    THREAD_ENTER/THREAD_EXIT does not mean any thing and does not
    improve the performance at all. With the implementation of the
    preload approach, in the multy threaded environment, the
    performance may not be as good as the BLR approach. There is
    nothing that we can improve about this since IB is a multy
    threaded engine. In order to save guard each thread, IB
    has to do THREAD_ENTER/THREAD_EXIT.

    The big advantage of the BLR approach is it does not have to
    do any of the THREAD_ENTER/THREAD_EXIT and
    START_CHECK_FOR_EXCEPTIONS/END_CHECK_FOR_EXCEPTIONS.

    Charlie's code does not improve the performance of LOWER at
    all. Charlie and I suspect the LOWER UDF was written very
    badly. Both of us concluded to implement LOWER function as
    BLR approach and leave the LOWER UDF as it within the share
    library. If the user declare the LOWER function, then IB
    server will execute the LOWER UDF, otherwise, it will
    execute BLR LOWER function the same way as UPPER.

    Do we wand to implement ltrim, ltrim, and substr as BLR?
    It is up to you all. I will replace ltrim/rtrim with SQL
    TRIM function and substr with SQL SUBSTRING. Remember though,
    there is not performance gain by bring them in. If the users
    ask for performance, we can tell them to use TRIM/SUBSTRING.

    Passing down the TDBB and massaging the FUN_evaluation
    function only show 1 - 2% performance gain. It does not
    make any noticeable performance gain. But it makes the code
    cleaner.

    >From this exercise, we all learned a great deal about the
    performance of UDF. Now we are ready to do:

    1. improve the general performance of UDF
    2. bring IB UDF SQL functions into the engine.

    Since there is a way to improve the general execution of
    UDF performance, I will not oppose the blended approach.
    But I will reserve the concern about the performance of
    THREAD_ENTER/THREAD_EXIT. Without any hard number to
    show, there is nothing for me to argue about it. Is there
    any body that can show me how to measure the performance
    of THREAD_ENTER/THREAD_EXIT in the multy threaded
    environment? Your help is greatly appreciated.

    Now we almost have all the facts (except the
    THREAD_ENTER/THREAD_EXIT) and I am ready to implement
    any approaches that we as a team decide. Either one
    (BLR or blended) will be fine to me. Implement as
    preload only I think is out of the question. The
    blended approach may be the way to go.

    There have been some heated discussions about the
    implementation approaches as engineering, I believe
    it is not a bad thing and it is not personal.
    We all wand to find the best optimal solution for
    our customer (at least that what I had in mind).
    So let it be in peace and move forward.

    Thanks every body for all of your ideas and special
    thanks to Charlie for doing the prototyping which
    made the bench mark much more meaningful.

    Bill


    Subject:              Re: The fact of the 2 approaches (blr VS UDF approach)
    Date:                  Wed, 01 Dec 1999 15:45:50 -0500
    From:                 ************************************
    To:                     ************************************

    Bill,

    Perhaps a blended approach would be the optimal design. The SQL
    functions should use the blr approach since they are likely to be the
    most heavily used. The more esoteric non-SQL functions should use the
    current UDF approach since they are an added service to the customer but
    unlikely to be used frequently enough.

    The list of possible functions that have a finite utility to the
    customer is infinite. To try to codify increasing numbers of those
    functions in the future would be a mistake in my opinion. If we were
    hardware designers, I'd be in the RISC camp while you preached the
    virtues of the CISC camp. I believe in a simpler design to keep the
    heavily used functions running quicker and augment extensible
    functionality thru software libraries at a cost to performance.

    CISC designs every possible function in silicon with no need for
    software extensibility. But CISC designers don't specify instructions
    for atan, sin, cos in the machine architecture even though they would
    execute faster. They just have such a limited applicability and low
    frequency of usage.

    The fact that a function has membership in the current IB_UDF library
    should not be the only criteria for blr built-in support. Frequency of
    use should be a heavy consideration, as well as, whether the function
    qualifies as a SQL2/3 built-in function.

    I agree with most of the points made in your rebuttal and found the
    performance studies very enlightening -- very sound engineering.
    However, I still find myself in stark disagreement with the universality
    of your proposal. I think there has to be a dividing line between
    built-in blr support and UDF extensibility for your listed functions.

    I propose the following:

    1) Implement the blr approach for the actual SQL standard functions as
    defined by SQL2/3: ABS, POSITION, CHAR_LENGTH, OCTET_LENGTH, BIT_LENGTH,
    MOD, SUBSTRING, TRIM, TRANSLATE, UPPER, LOWER etc.

    2) Keep the non-SQL standard-specified functions as UDFs but preload the
    RDB$FUNCTION and RDB$FUNCTION_ARGUMENTS metadata for the user at
    database creation time. I still maintain that most of the 23
    mathematical functions in your list are a waste of space in the engine.

    I can come up with scalar functions like FACTORIAL(), as well as,
    aggregate functions such as MEDIAN() and STANDARD_DEVIATION() which
    would be more useful in statistical applications. This is a fertile
    application area with the analysis of clickstream data at Web sites and
    the emphasis on customer relationship management.

    But tanh, log, cot, I just don't buy their usefulness ...

    The effort to preload this UDF metadata should take minimal engineering
    time and I mentioned this for jrd/ini.e/INI_format(). So the user would
    not need to DECLARE EXTERNAL ... as you suggested. It would already be
    predeclared at database creation time.

    3) I wouldn't modify the behavior SUBSTR, LTRIM, STRLEN. These are not
    SQL functions defined by the standard and don't purport to be. They are
    UDF functions that happen to have similar names and semantics for the
    SQL functions with keywords of SUBSTRING, TRIM and {BIT | CHAR |
    OCTET}_LENGTH. You are also likely to break existing and future customer
    applications. My understanding is that most of these functions were
    ported from dBASE to help migration from desktop database products.
    Might be wise to leave these as is unless the UDF function name matches
    an SQL keyword for a standard function.

    Regards,
     Charlie


    Subject:              Re: The fact of the 2 approaches (blr VS UDF approach)
    Date:                  Wed, 01 Dec 1999 22:15:07 -0500
    From:                 Charlie Caro <ccaro@interbase.com>
    To:                     ************************************

    Bill,

    The logical conclusion to draw is that UDFs should be dropped from the
    product because they don't perform well enough. That's really what you
    are saying. Instead, customers should lobby and pay us to add BLR
    support in the engine for esoteric functions that are of little use to
    anyone else but themselves.

    Or could the conclusion be that UDF function invocation could benefit
    from better attention to implementation details?

    To wit:

    1) Every CALL_UDF is bracketed by the THREAD_EXIT/THREAD_ENTER macros
    because we don't know how long an arbitrary UDF will execute. However,
    if the UDF were in support of SQL functions which we know to be
    computationally brief then one could write:

        if (function->fun_flags & (FUN_sql_function | FUN_fast_function))
            CALL_UDF (...);
        else
            {
            THREAD_EXIT;
            CALL_UDF (...);
            THREAD_ENTER;
            }

    This would eliminate the substantial code overhead (relative to UDF
    function invocation) of our thread scheduler.

    2) It's somewhat misleading to imply that the memory overhead of 64KB
    was due to some gross inefficiency of UDF function execution. Most of
    that memory is probably due to the nested system requests to query
    RDB$FUNCTIONS and RDB$FUNCTION_ARGUMENTS during function lookup.
    However, once compiled, these requests are reused for all subsequent
    function lookups. It's an indictment against all system metadata because
    that metadata must be queried to be useful.

    3) You haven't made any mention of what your memory footprint will be
    with the blr support. Since you are proposing to have blr built-in
    support for everything that looks, sounds or smells like a function, the
    memory footprint will always be present whether the UDF functions are
    used or not because they will be comingled with every other useful part
    of the execution engine.

    4) FUN_evaluate() has many gratuitous MOVE_CLEAR macro calls. Here's one
    of my favorites:

      temp_string = (STR) ALLOCDV (type_str, function->fun_temp_length, ...)
      MOVE_CLEAR (temp_string->str_data, temp_string->str_length);

    Since our ALL_alloc memory allocator zeros its data structures, the
    MOVE_CLEAR zeros a zeroed piece of memory. Many engineers clear memory
    when they don't need to because it seems like a good idea. There are
    times when that's needed and there are times when it's unnecessary and
    costs performance. There's also a MOVE_CLEAR of an 800 byte temp buffer
    which looks unnecessary.

    6) You are comparing the complete UDF infrastructure against a couple of
    existing BLR functions and a prototype. COUNT isn't even a function call
    but an autoincrement of a memory location. You need to compare UDFs
    against real function calls to the math or clib library for ASIN unless
    you intend to inline the Taylor series expansion for it.

    I don't dispute your proven claim about the overhead of UDF function
    calls for simple functions. That's why macros and inline compilation
    exist for regular compilers. But your proposal begs the question: Do you
    really intend to add BLR support for every function that comes your way?

    Looks like you've already made up your mind. If nothing else, the
    performance should be great!

    Regards,
     Charlie

    Bill wrote:
    >
    > Hi Folks:
    >
    > I have run another bench mark (blr implementation of COUNT SQL
    > function VS UDF implementation of ASIN function).  The conclusion
    > is even for less possibly used ASIN function, The UDF
    > implementation takes:
    >
    > Elapsed time:              1334% slower
    > Cpu     time:                25100% slower
    > total extra memory usage: 64512
    >
    > If you executed the function more than one, The UDF
    > implementation takes:
    >
    > Elapsed time:  1442% slower
    > Cpu     time:    26200% slower
    > memory usage is same since IB is smart enought
    >     to recycle the usage of memory.
    >
    > I think the bench mark result speaks for itself.
    >
    > Bill
    >
    > attached is the bench mark result:
    >
    > show table t3;
    >
    >     NUM_1                           NUMERIC(5, 3) Nullable
    >
    > select * from t3;
    >
    >            NUM_1
    >     ============
    >            0.500
    >            ...
    >            0.500
    >
    > total number of rows: 15449
    >
    > count (num_1):                    asin (num_1):
    >
    >     Current memory = 9140224          Current memory = 9153536,    13312 more
    >     Delta memory = 203776                Delta memory = 217088,         13312 more
    >     Max memory = 9242712               Max memory = 9280600,        37888 more
    >     Elapsed time = 0.86 sec                 Elapsed time = 12.34 sec,        1334% slower
    >     Cpu = 0.01 sec                              Cpu = 2.52 sec,                       25100% slower
    >     Buffers = 2048                               Buffers = 2048,                        same
    >     Reads = 221                                  Reads = 229,                           8 more Reads
    >     Writes = 0                                     Writes = 0,                               same
    >     Fetches = 31628                            Fetches = 31660,                     32 more Fetches
    >
    > subsequent execution:
    >
    >     Current memory = 9172992         Current memory = 9172992,  same
    >     Delta memory = 0                         Delta memory = 0,                 same
    >     Max memory = 9280600              Max memory = 9280600,      same
    >     Elapsed time = 0.80 sec                Elapsed time = 12.34 sec,      1442% slower
    >     Cpu = 0.01 sec                            Cpu = 2.63 sec,                     26200% slower
    >     Buffers = 2048                             Buffers = 2048,                     same
    >     Reads = 0                                    Reads = 0,                            same
    >     Writes = 0                                   Writes = 0,                            same
    >     Fetches = 31281                          Fetches = 31281,                  same
    >


    Subject:              Re: The fact of the 2 approaches (blr VS UDF approach)
    Date:                  Fri, 03 Dec 1999 16:16:59 -0500
    From:                 ************************************
    To:                     ************************************
     

    My summary (see points below): We won't be able to compare the two
    proposals until we do some work to improve UDF function invocation.

    Bill wrote:
    >
    > Hi Folks:
    >
    > From all of the responses I see 3 different issues that we need
    > to address:
    >
    > 1. The performance of IB UDF in general is very
    >      bad and need to be addressed. As Charlie has suggested,
    >      there is code in many areas of the engine can be improved
    >      to make UDF runs faster. This should be dealt by another
    >      project all by it self called general IB UDF performance
    >      enhancement.
    >

    Agreed, but it has a bearing on this discussion. If the performance
    wasn't so bad, there would be no need for you to take the approach that
    you are proposing. You propose an approach that has a cost of
    development that is proportional to the number of functions added. This
    cost goes forward everytime we add a new function.

    If the UDF function invocation performance is equal (unlikely) there
    will be no need for your proposal. If the performance is not quite as
    good, you can go forward with support for the standart SQL functions but
    there will be no need to add BLR support for non-standard SQL UDF
    functions.

    > 2. To remove the limitation of IB BLR from an one byte BLR to 2
    >      bytes BLR or even 4 bytes BLR. This is another IB limitation
    >      that should be dealt by another project all by it self called
    >      up scaling IB BLR. This is a similar problem that IB has on
    >      versioning. The limit is also 256 with is a 1 byte field.
    >

    Original K&R "C" had 29 keywords. Bjarne Stroustrup struggled to keep
    "C++" to 42 keywords. James Gosling wanted to improve on C++ and winced
    at the 50 keywords required for Java. A multibillion dollar Java
    industry from embedded to enterprise application has been generated on
    top of less than 256 Java byte codes.

    You don't need 2 and 4 byte BLR codes. The sum total of words in the
    East/West civilization, as well as, the Third World countries could fit
    in 4 bytes.

    BLR is a low-level database programming language; not a dictionary for
    every possible function name we can think of. It's cheap to parse and
    network transportable. If I was tasked with the project of supporting
    cross database joins for InterBase databases, I would want to
    machine-generate BLR in the bowels of the engine and pump it across the
    network without ever hitting the overhead of SQL processing. I wouldn't
    want the other side to have to deal with byte-ordering and big/little
    endianness just to pass it the BLR.

    It should be sometime before we need anything this drastic if we pay
    proper attention to the language design. Even then I would argue to keep
    it a single byte encoding with an "escape" BLR verb to denote the
    extended BLR verbs.

    > 3. The SQL functions project it self.
    >
    > I personally against the blended approach and the following is my
    > rationales:
    >
    > 1. The argument for the blended approach is to separate the heavily
    >      used functions from the unlikely to be used frequently functions.
    >      Perhaps those functions just have been used by 1% or less of
    >      IB customers. Since we do not care about those 1% or less then 1%
    >      customers, then why bother to spend extra money to implement
    >      the blended approach for them?
    >

    What extra money! The blended approach is nothing new. I'm just
    proposing that we pre-declare the IB UDFs for the user. It's the same
    result as "isql <dbname> ib_udf.sql".

    The built-in dispatch mechanism is already there in builtin.c. It's a
    simple edit procedure to add the function names there and a simple edit
    procedure to add the function attributes in jrd/ini.e to initialize the
    metadata into RDB$FUNCTIONS RDB$FUNCTION_ARGUMENTS. Adding new function
    would be as easy as adding new pararmeters to ibconfig.

    In that regard, I think your proposal is more costly and repetitive. It
    duplicates prior work but doesn't leverage that prior work. It's like
    doing the same thing over and over versus writing a script to do it
    automatically.

    The litmus test is whether UDF function invocation can improve enough to
    make the performance difference neglible. So you can quote the
    performance advantages of BLR today but it's just beating a dead horse
    until we try to improve UDF function invocation. This will benefit user
    UDFs and SQL functions at the same time.

    If the performance is still bad, I would be the first one to defend your
    proposal for the true SQL functions. But even then, I would still not
    support the miscellaneous and sundry functions going into BLR. That
    would be a never ending resource drain unless someone could promise that
    we would never add another IB UDF function again. And I don't think
    anyone is prescient enough to make that promise.

    > 2. If IB really care about those 1% or less customers, then why do
    >      not give them a sound solution? Remember what Murphy's
    >      law has said, once you think a function that is not being
    >      used heavily, the next day, some customers will use it.
    >
    > 3. What do the customers gain by IB implement the blended approach?
    >      Does it make the functions run faster? Does it make the
    >      functions run with less memories? No. The functions performance
    >      is as bad as it used to be. The only gain for the customers is
    >      he/she does not have to declare the function before using
    >      it and he/she does not have to install an extra DLL.
    >      Since those customers are less then or equal
    >      to 1%, who cares? It is just a costly cosmetic
    >      improvement. It is a typical and classical fixing the
    >      symptom and not the source.
    >
    > 4. If we spend some money to improve the general performance
    >      of IB UDF, those so call unlikely used functions can
    >      be benefited much more than if we spend the money on
    >      doing the blended approach.
    >
    > 5. If we combine the BLR approach with Sriram's suggestion,
    >      then the issue of taking too many BLRs is solved. We
    >      do not have to worry about the BLR implementation of
    >      SQL functions were taking too many BLRs. But we still
    >      have to address the issue of expending the BLR from 1
    >      byte to multiple bytes for future IB features.
    >

    On points 2, 3, 4, and 5. If customers can come up with another 100
    useful UDF functions, do we really want to have to commit valuable
    engineering resources, like yours, adding those functions with your
    proposal. I just don't see that return because the coding to add them is
    so labor-intensive.

    The table-driven approach could add these functions in a day (assuming
    the source code for the UDF functions was available). It would probably
    take 100 days to do it the BLR way. If another engineer other than BT
    had to do the work, it would take even longer.

    > 6. By doing the blended approach, now we have 3 different
    >      approaches for SQL functions (the BLR implementation,
    >      the blended implementation, and the true UDF
    >      implementation), By doing so, does it make IB code
    >      easy to maintain or otherwise, I leave it as a
    >      food of thought for all of you to think about.
    >

    You're right! But the blended approach and UDF approach are closer
    related than the BLR implementation. The blended approach is not even a
    new approach. it just predeclares the IB UDFs, it doesn't change the way
    the UDF mechanism works. You wouldn't be able to tell the difference
    from running the IB_UDF.SQL script. We can keep the IB_UDF as a shared
    library for better replacement maintenance or bring it into the engine
    with the static reference capability that is already present in
    builtin.c.

    Regards,
     Charlie

    Server Response from: ETNASC03