A New Generation of Database Engine

By: Dmitry Seriy

Abstract: The object of TDataSet type of a new BDE, having formed the update query, should define the area of data that is to be updated...

Dear Sirs!

I would like to present you the results of my work in the field of data modification, retrieved as the result of the query containing the instructions LEFT JOIN.

Let’s view the SQL-expression as the data source of the tree structure of the comprised base tables. Let’s copy the result of the query fulfilled into the array of the table structure. Let’s call the array of table structure the conglomerate, and the result of semantic analysis of the SQL-expression - the data of the conglomerate structure.

Then, let’s define the data of the base tables’ structure. Some data about the conglomerate structure will contain the references to the base tables’ data, some operations will require this data. We will put the conglomerate in accordance with its own data, i.e. the accordance of column indexes with the table fields from the conglomerate content.

So, the following hierarchy is defined:

  • Base tables structure data;
  • Conglomerate structure data;
  • Conglomerate own data;
  • Conglomerate as the array of table structure.

The conglomerate table’s tree

Let’s define the notion of the tree of the conglomerate tables. Let’s view the relations among the tables in the following way: we put each conglomerate table in accordance with the principal table, the former being in slave position to the latter one. The only table which is not put in accordance with the principal one in the conglomerate is the exception: let’s call it Root. Each record in the principal table should correspond to only one record in the slave tables from the conglomerate content. Let’s view the conditions of realizing this rule:

  1. The slave table has the candidate key which in its turn has the foreign key in relation to the principal table. For each field from the content of this candidate key, except the foreign one, we define the value (parameter) to which all the values of the pointed fields of records of conglomerate slave table should correspond. Thus, each record of the principal table will correspond to the only record of slave table restricted by the set of parameters.

Classifying the types of conglomerate tables’ relations, we will define subordinate tables relations as type Single, and for the root table – Root.

  1. The principal table has the candidate key which in its turn has the foreign one in relation to the slave table. This type (Multi) is common for the reference tables and slave tables for which the principal table in conglomerate has the candidate key containing in its content the foreign key in relation to the slave table (as used in the conglomerate structure).

Let’s set up a correspondence between each table and a set of parameters (the name of the field – the value of a parameter), defined by the content of the used candidate key. The tables can be included into the conglomerate more than once with different values of parameters, as well as with different types of relations, causing the differences of sets of parameters by content.

Procedure of data base refreshing

Let’s view the information array about the following structure updating:

  • the type of updating;
  • the physical name or alias database;
  • the table name;
  • the name and value of primary key field;
  • the names array and the value of fields, corresponding the set of parameters determining the above mentioned values of candidate key, used by the tables tree of the conglomerate;
  • the name and modified value of the updated field.

In fact, the procedure of data refreshing is as follows:

  • performing the task of modifying the conglomerate: cell editing, paste/deletion of the conglomerate row;
  • preparation of the updating structure data;
  • support of business-regulations; the realization of support leads essentially to the generation of updating structures that correspond to the updating produced by business-regulations;
  • formation of SQL-expression and performance of corresponding update query;
  • data transmission of update structure to the constructed conglomerates.

The type of update:

Update

Using the above data hierarchy, by column index containing modified cell, let’s define the field name of the corresponding base table, and bring the value of modified cell to the type of the field.

Then, we will get the number of conglomerate column, containing the values of the primary key of the corresponding base table in the conglomerate.

Let’s analyze the value of the conglomerate cell content that correspond to the primary key column in the row where the update have been performed.

If the cell is empty there is no record, so let’s define the update type as Insert, it is being viewed farther.

So, the specified cell is not empty.

Let’s do the construction of SQL-expression:

  • the instruction SET contains the name and the value of the field, corresponding to the value of the modified cell;
  • the instruction WHERE contains the name and the value of the primary key field.

In accordance with the constructed SQL-expression let’s perform the query.

Let’s perform the transmission of update structure data to the constructed conglomerates.

The base table can enter into the conglomerate content more than once; for every entry we define the columns index, corresponding to the primary and modified field.

Let’s define the number of the row, the cell of which, corresponding to the primary field of a base table of the respective entry, contains the value equaling the value of the primary key of the modification structure. If such a cell is detected, we put the modified value into the cell corresponding to the number of this row and the number of modified field column.

If the base table entry into the tree conglomerate content has the type Single or Root, the look-up of the rows can be stopped. Otherwise, (relation type – Multi) the record can enter into the content of data more than once and the complete look-up of the rows is necessary.

So, the procedure of data refreshing, in fact, consists in the point that during the modification the conglomerate cells (including deletion of rows), the data necessary for the construction of the corresponding query can be identified. This very data can be used for the refreshing of all conglomerates without referencing database (except in the situations connected with the necessity of obtaining additional data considered further.)

Let’s mention that the operations of updating do not require the refreshing on the set level. In other words, the operation of updating occurs not on the columns level but on the rows one.

The only remark on sorting: the data about the conglomerate structure can contain the data about the content of the instruction ORDER BY fields, fulfilling sorting on the client level.

Let’s view the case when the modified field serves as a foreign key. This case as a rule requires a query, returning the data of the record of the table that the foreign key refers to.

Here we need to examine the procedure of this activity:

  1. The event of refreshing is being analyzed, the structure of refreshing is being prepared.
  2. The structure of refreshing is being transmitted to the procedure of distribution of the refreshing structure with indication of the following parameters:
    • P1 = TRUE being the local distribution of the refreshing; the server DBMS does not receive queries, the rest applications do not receive any messages on refreshing. This situation occurs:
    • during the processing of messages on refreshing sent by other applications;
    • during the additional processing of the primary message on refreshing, related to the recursive generation of the messages on refreshing. Returning to the examined case of the modification of the foreign key, we should mention the necessity of modification of conglomerate cells, corresponding to the record of the table the above mentioned foreign key makes reference to, regardless of its subordination in the conglomerate tables tree.

So, the primary message is being processed; the event of foreign keys refreshing is being identified. For this particular type of refreshing additional messages are being generated (P1=TRUE):

  1. deletion of the record of the table, the above mentioned foreign key refers to;
  2. paste the record of the table, the above mentioned foreign key also refers to.

Of course, these messages must not be transmitted to DBMS, because the record of the table, the above mentioned foreign key refers must not be refreshed on this level. The refreshing must occur only on the level of conglomerates’ applications, which process the primary message. On the other hand, the local refreshing will require data about the record, a new value refers to, and it may turn out that the local conglomerates do not possess data about all field values of this record. Therefore, the above mentioned situations connected with the necessity of receiving additional data occur, but the character of refreshing does not change - the refreshing occurs on the rows level.

Further, P1=FALSE the refreshing occurred in native application and requires distribution of the refreshing to DBMS and other applications. For realization of business-rules the structure of the refreshing is transmitted to user libraries which in its turn distribute recursively their own refreshing structures.

  • P2 = TRUE message is distributed to the user libraries of business-rules realization. This parameter is enabled for the management of message distribution, generated by the libraries themselves, and is controlled by the creators of these libraries;
  • P3 = TRUE parameter is used by the hub of refreshing for the registration of recursive circle of call-ins ending.

Insert

The cell corresponding to the column of the primary field and the row of the modified cell is empty.

The problem connected with the receiving of the value of the primary key will be discussed later. For distinctness we shall consider that this value can be obtained before the insertion. Otherwise, if this value has been obtained after the insertion, the type of refreshing should be changed into Update.

The refreshing structure should contain the array of parameters, corresponding to the content of the set of parameters of the modified table. As expected above, the set is determined be the content of the used potential key having in its content a foreign key in relation to the principal table in the context of conglomerate, except the foreign one referring to the principal table.

Let’s add to the given set a parameter containing the name of the foreign key field and the value of the primary key of the principal table – thus, the entire potential key of the inserted record will be constructed.

Let’s add to the given set a parameter containing the name of the name of the field and the value of the primary key of the modified table.

Let’s add to the given set a parameter containing the name of the name of the field and the value of the modified cell.

Let’s do the construction of SQL-expression: the instruction INTO/VALUES should contain all the names and values, defined by the content of the obtained set.

Let’s examine the case when the array of parameters contains in its content in addition to the foreign key, in relation to the principal table in the context of conglomerate, the foreign keys referring to other tables. The update in this case is similar to the mentioned above case of modification of the foreign key for the type of refreshing Update.

The following procedure of potential keys value construction is proposed: the parameters’ values constitute a potential key, except one foreign one corresponding to the principal table which is equal.

Associating the given group with the values of the indicated foreign key, we receive the unique sets of values within a group, corresponding to the potential key. The rest groups correspond to other sets of parameters, since they can be considered as the result of subtraction from common set of the given group.

The change of the foreign key value, corresponding to the principal table, won’t derive the record from within the limits of the group and will save the integrity of the potential key.

The modification of foreign key, not corresponding to the principal table, as well as the modification of other parameters which are not part of the potential key, lead to the exit of the record from the content of the given group. This condition makes additional control of integrity on the table level necessary.

The constructed model presupposes the availability of no more than one potential key, containing foreign keys. Otherwise, the data preparation for the insertion demands additional activities beyond the frames of introduced technology.

Now, I would only mention that the examined type of updated queries seems to be the most common for the sets of data, supplied to applications for modification, owing to the completeness of information about the data of principal tables (from the point of view of subordination on the level of conglomerate). The greater part of the drawing queries of other types can be presented as filters of given queries of the examined type, thus expanding the area of application of the presented ideology.

Delete

The operation of deletion demands to include only the name of the primary field and its value for the given record into the content of the update structure. Let’s examine the deletion of a conglomerate row. For the queries of the mentioned type the deletion of a row unambiguously demands the deletion of the Root table record.

For the slave tables, the foreign keys descriptions of which contain references to the Root table, as well as for all the tables of the following levels of subordination, regulated by the description of the foreign keys, the cascading deletion should be done.

The admissibility of cascading deletion is being interpreted here as the following: the operation cannot be fulfilled if at least one table contains the description of foreign key, restricted by the option RESTRICTED and contains one or more records the foreign key of which corresponds to one of the cascade deleted records of the principal tables.

So, for the table Root the structure of update contains the name of the primary field and its value for the given record. The constructor of the SQL-expression will form the instruction WHERE, consisting of these values. For the tables subordinated (subordination on the base of foreign keys description) to the table Root, the instruction WHERE should contain a foreign key name and the value of the primary field of the deleted record of the table Root, i.e. the value from the preceding set of the update structure. In case of restrictions RESTRICTED of the foreign key, the query concerning the admissibility of the cascade deletion is necessary (the absence of the corresponding records); this query described above uses the same values from the preceding set of update structure.

So, the deletion and its cascade reflection can be fulfilled both on the level of application and on the level of DBMS. Before this operation the application should check up the admissibility of deletion in terms of any restrictions and provide the formation and distribution of update structures, corresponding to every table used in the operation.

To make the explanation simpler, we suppose that a foreign key of the subordinate table refers to the primary key of the principal table. Digressing from this assumption, let’s consider the update of a potential key that a foreign key refers to. The cascade of the update operation would end up by the refreshing of foreign keys of the subordinate tables (let’s call these tables the first level of the subordinate tables). The subordinate tables, having the subordinate tables of the first level as principal ones (and the second and the following levels), would not be included in the content of the cascade update, since they would refer to potential keys of the first level tables, in case these potential keys would not be foreign ones – otherwise, potential keys would contain foreign ones, and the second level tables would be included in the content of the first level tables.

Somehow or other, the update of foreign keys would be finished by the update of the first level tables.

The deletion will obviously touch the tables of all the levels of subordination.

The main conclusion is: for deletion in table of level n+1 all values of all primary (again within the frames of simplification) keys of the deleted records of the level n tables, somehow or other, whether the deletion will be fulfilled on DBMS level or not, it is a requirement related to the formation of the update structure data; in other words, it is a requirement for the update on row level.

Thus, the plan of cascade deletion in application should contain the array of elements that include identifiers of tables and the array of values of primary keys, corresponding to the records of identified tables being deleted, the order of occurrences should be identified by the order of subordinate levels.

After plan formation and approval of admissibility of deletion, there goes the preparation and distribution of update structures in the reverse order with the entry of elements – first, the update tables of the last level of subordination are distributed. As well as in the procedure of data extraction, described below, the performance of relational expressions on the server level should yield a better effect, the DBMS server will at a time form a similar ideological plan of deletion of the record of a primary table in the form of drawing of all deleted records. The deletion on the level of application DBMS will form as many plans of deletion as there will be queries of deletion fulfilled by application.

The solution, similar to the solution of the procedure of data extraction of heterogeneous data banks, can be used in considering the procedure of deletion. The tables’ records in reverse order of their entry into the deletion plan, corresponding to the same database, can be deleted by one query, the following portions by the other, etc.

The Procedure of Data Extraction from Base Tables

Conglomerate keeps the copies of field values of base tables, in other words, the observed processes of data refreshing are not connected with the references to the identifiers of queries that have returned corresponding data. Thus, immediately after data copying the cursors, connected with the corresponding active sets, are deactivated.

The conglomerate data can be extracted as a result of performing a singular SQL-query, and the conglomerate structure data can be obtained on the base of SQL-expression analysis. SQL-expression can be constructed in other way as a result of analysis of the conglomerate structure data.

The data can be extracted by sequential queries toward each of base tables, beginning with Root, and sequentially corresponding subordinate ones, filling the cells of conglomerate in accordance with the conglomerate structure data. Let’s describe this method of extraction in detail.

The framework of instruction WHERE for queries, returning the data of every table, should correspond to the content of the parameters described above, corresponding to a potential key (except a foreign key, being in its content, in relation to the principal table). For each record of the subordinate table in accordance with the conglomerate structure data, we perform the search of a row, corresponding to the record of the principal table. For the ties of Single type for every record of the subordinate table there is a single record of the principal table. For the ties of Multi type, the records of the subordinate table can correspond to a few records of the principal table, besides, the records of the subordinate table may not correspond to any record of the principal table.

As a matter of fact, this method of extraction by sequential queries is a result of performing relational expressions fulfilled in the application level.

Undoubtedly, the performance of relational expressions on the server level should yield a better effect both in terms of productivity of performing relational expressions and in terms of decreasing the volume of transmitted data, at least due to the availability of the table in conglomerate, which use the tie of Multi type with the principal tables.

On the other hand, the performance of heterogeneous queries can be restricted as well as the use of operations of transformation of field types, corresponding to the ties of heterogeneous tables.

We assume the following solution of the problem of data extraction: for Root and tables, related to the base Root of upper levels of conglomerate tree the SQL-expression is constructed, and on the base of its performance the data are extracted – the first method of extraction is realized.

In case the part of conglomerate tables do not relate to the base Root, the SQL-expression is constructed, and on the base of its performance the part of data is extracted corresponding to the other base. The result of this query performance can be included in the constructed conglomerate by the procedure of the second method of extraction.

Let’s settle only on the transformation of field types, corresponding to the ties of heterogeneous tables. As in the case of realization of business-rules by user libraries, the procedures of transformation of field types can be supplied by user libraries, it is only necessary to bring a user description of a foreign key into line with the library procedure.

The Work with Auto-Incremental Field Type

The best example, to my mind, is the work with the field of "counter" type MS Access, owing to the absence of triggers that could anyhow identify the inserted record, by the way, the identification by means of trigger can be supported on the level of user libraries performed by any kind of modification described below.

Let’s examine the procedure of distribution of update structure to other applications. Let’s assume that there is a certain table of special purpose that exists in every used database. The application during the construction of conglomerate, using the database tables, creates a record in this table, containing local and net data, permitting to undoubtedly identify the application. Another application, performing the update of the tables of this database, performs independently the distribution of update structure data, using the data of this special table.

In the issue under review, the next aspect is important – the value of a primary key field of the mentioned special table is the identifier of the application for a given database.

Let’s associate each table with the field from the table content, for which during the insertion of a record the constructor of the expression INSERT will expand the instruction VALUES, putting into it the name of this field and the value of the identifier of the application. The third query UPDATE will modify the value of the field, containing the identifier, with the necessary or empty value.

Data of DBMS Data

Let’s examine the procedure of obtaining the data about the structure of DBMS data not considering the aspects of work with the heterogeneous tables as well as with Dbase and other DBMS that do not contain full descriptions of data structure.

Let’s assume that the information about the content of tables, fields, keys etc. the application can receive directly from DBMS, having performed the queries to system tables of DBMS. As a matter of fact, the data of this kind can be used for the description of conglomerates as well – for example, as the data of the structure of corresponding presentations. This is, undoubtedly, the best variant – the application will not keep the data about the data that have been framed-up artificially and demand alterations every time the structure of data of DBMS itself is being altered.

Sincerely yours,

Dmitry B. Seriy

E-mail: son@mcom.dp.ua


Server Response from: ETNASC02