InterBase Replication Internals


Abstract: This session discusses more advanced replication topics, such as replicating subsets of data, taking control of replication from within your applications, and managing unusual replication requirements.

This is the technical paper from a talk given at the 11th Annual Inprise & Borland Developer's Conference
By Vince Duggan
Vince Duggan is the technical director of Synectics Software, an Inprise/Borland client/server business partner based in Sandton, South Africa. Over the past 15 years, he has worked with COBOL, BASIC, dBASE, Clipper, and Oracle on DOS, Windows, VMS, and UNIX platforms. Since late 1994, he has worked exclusively with Inprise/Borland products, particularly Delphi and InterBase.

Introduction

Replication Basics

Replication Data Structures and Triggers

Advanced Replication Methods

Controlling Replication from within Applications

Complex Replication Environments


Introduction

This paper is designed to provide some insight into the power of IBReplicator, which supplies replication capabilities to InterBase.

Because IBReplicator has been written as an application outside of the InterBase engine, it is easy to get at its internals, which gives developers and administrators a great deal of control.

There will be a brief overview of the definition of replication, but the bulk of this paper discusses the deeper aspects, such as the structures of the tables and triggers behind the replication architecture, and the more powerful features such as using stored procedures and InterBase events to gain greater control of the Replication Server.

Replication Basics.

Although this paper is meant to provide an in-depth look at replication, a basic overview is probably required. The fundamentals of InterBase replication are covered in the separate paper: "Interbase Replication: Fundamentals".

Brief overview of Replication.

Replication can mean different things to different people. Essentially it means that data and/or changes to data are replicated from a source database, to one or more target databases. Once replication has taken place, the data at all sites is in synchronization. This does not necessarily mean that the source and target databases are mirror copies, nor that the rows in question are identical, nor that the actual values of the data are the same.

In most cases, the source and target database will be identical, but in some cases, only some of the tables will be replicated, and within those tables, only part of the data may be replicated. As well as that, the actual field values may be modified on the way to the target database.

Replication is an architecture that allows an enterprise to maintain multiple copies of data. This is typically done to improve access to the data, by allowing various sites in the organization to have their own copies. Each site then works on its own copy, and the changes made by each must 'automagically' be reflected at any or all of the other sites.

Replication also supports Branch Office to Head Office consolidation scenarios, in which the branch offices periodically move their data to head office, either as raw data or in summary form.

There are quite a few different architectures which can implement replication:

  • Simple copy of the database.
  • Export/Import model.
  • Synchronous model.
  • Asynchronous model.
A very simple option is to copy the database to the different target database servers. This is a very manual approach, and the multiple copies will diverge during the period between copies.

One could write a very specific export/import system. In this case, either all the relevant rows or just the changed rows are exported to a file, and the file is sent to the target sites, where another specific import routine imports the data. This option tends not to be very portable, but can work well in certain situations.

The synchronous model is quite widely used. In this model, a change to a row in the source database is propagated to the target databases synchronously. This means that as changes occur in the source database, they appear simultaneously in all target databases. The disadvantage of this method is that if one or more of the targets is unavailable, things quickly get very complex, and usually one has to fall back to an asynchronous method.

The asynchronous model is very popular, and is the model used by IBReplicator. In this model, changes are 'stored up', and every so often (every few seconds, minutes, hours, or days) the Replication Server wakes up and sends the changes to the targets. The advantage of this method is that it does not matter if one or more of the target databases is off-line, since each will catch up the next time it is on-line. The disadvantage is that there is a 'lag' time during which the source is out of sync with its various targets, but this can normally be kept within acceptable levels.

Replication Data Structures and Triggers

All configuration parameters are stored by the Configuration Tool in a database called the Configuration database. The Configuration Tool also makes some changes to each source database, specifically by creating three new tables and adding Insert, Update, and Delete triggers to each table that is replicated.

The Configuration Database

The IBReplicator Configuration Tool is used to provide the Replication Server with the information it needs to carry out replication operations. These configuration parameters are stored in a separate database called the Configuration Database. This is typically a database kept apart from the production databases, but it can just as easily be a database that is itself a source or target database.

Most of the tables, triggers and stored procedures in this database are of little interest to developers and administrators, since they should not be modified directly, but some of them can be used in various ways to enhance control of replication.

RELATIONS table.

This table contains the bulk of the information regarding each table that is replicated. Note that a Configuration database can contain many schemas, each of which describes one source database with its target databases. Therefore a particular table can appear multiple times in the RELATIONS table, but with different REPLNO/TGTDBNO values.

There are entries which hold the SQL Insert/Update/Delete and Select statements used during replication, and other entries which keep a count of the number of operations that have been replicated for each table.

CREATE TABLE RELATIONS
(
REPLNO                           INTEGER  NOT NULL,
DBNO                             INTEGER  NOT NULL,
RELATIONNO                       INTEGER  NOT NULL,
RELATIONNAME                     VARCHAR(100),
COMMENTS                         VARCHAR(1000),
SELECTSQL                        VARCHAR(8192),
DELETESQL                        VARCHAR(8192),
UPDATESQL                        VARCHAR(8192),
INSERTSQL                        VARCHAR(8192),
DISABLED                         CHAR(1) DEFAULT 'N',
TIMEFIELDNAME                    VARCHAR(20),
CONDITION                        VARCHAR(255) DEFAULT ''  NOT NULL,
TGTDBNO                          INTEGER  NOT NULL,
TARGETNAME                       VARCHAR(100),
TARGETTYPE                       CHAR(1),
INSERTS                          INTEGER DEFAULT 0  NOT NULL,
UPDATES                          INTEGER DEFAULT 0  NOT NULL,
DELETES                          INTEGER DEFAULT 0  NOT NULL,
SECONDS                          INTEGER DEFAULT 0  NOT NULL,
KEEPSTATS                        INTEGER DEFAULT 1  NOT NULL,
ERRORS                           INTEGER DEFAULT 0  NOT NULL,
CONFLICTS                        INTEGER DEFAULT 0  NOT NULL,
PRIMARY KEY (REPLNO, DBNO, TGTDBNO, RELATIONNO)
);

REPLNO,DBNO,TGTDBNO,RELATIONNO fields:

These fields identify the table being replicated, via the schema, source database, target database, and relation.

SELECTSQL,INSERTSQL,UPDATESQL,DELETESQL fields:

These 4 fields contain the SQL statements which are used during replication. They are generated by triggers, and so are automatically kept in sync with the RELATIONREPLFIELDS and RELATIONKEYFIELDS tables, which define the data fields and key fields of tables to be replicated.

INSERTS, UPDATES, DELETES, SECONDS, ERRORS, CONFLICTS fields:

Statistics are kept on all operations: the number of inserts, updates, deletes, errors, and conflicts, as well as the amount of time taken to complete each operation. Triggers cause these stats to cascade up to the PUBLISHEDDB, SUBSCRIBEDDB, SCHEMATA and CONFIG tables. This allows the Replication Monitor to drill down and display stats at any level.
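
Since these counters live in ordinary tables in the Configuration database, they can also be read with plain SQL. A minimal sketch, using only columns from the RELATIONS definition above (the REPLNO value is just an example):

/* Sketch: per-table replication statistics for schema number 1. */
SELECT RELATIONNAME, TARGETNAME, INSERTS, UPDATES, DELETES,
       ERRORS, CONFLICTS, SECONDS
  FROM RELATIONS
 WHERE REPLNO = 1
 ORDER BY RELATIONNAME;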

TIMEFIELDNAME and CONDITION fields:

The TIMEFIELDNAME field is used when time-stamped conflict resolution is in effect. When updating a row on the target database, this field is used to ensure that only older rows are updated, and not rows that were changed after the source row was last modified.

The CONDITION field contains a portion of an 'IF' statement. This puts a constraint on which rows are replicated. For example, if CONDITION contains the value 'DEPTNO = 10', then only rows with a DEPTNO of 10 would be replicated. This field is used when creating the triggers which control replication on each source table.

DATABASES table.

This table describes the properties of each source and target database involved in replication. Since it holds a description and the full path of each database, it can also serve as a central repository of database paths for other applications. It also holds RAS dial-up information where this is required, which can likewise be used by other applications.

CREATE TABLE DATABASES
(
DBNO                             INTEGER  NOT NULL,
DBPATH                           VARCHAR(254),
DBNAME                           VARCHAR(100),
ADMINUSER                        VARCHAR(50),
ADMINPASSWORD                    VARCHAR(50),
COMMENTS                         VARCHAR(1000),
PRIORITY                         INTEGER DEFAULT 0  NOT NULL,
TIMEFIELDNAME                    VARCHAR(20),
CHARSET                          VARCHAR(20) DEFAULT ''  NOT NULL,
DBTYPE                           INTEGER DEFAULT 0  NOT NULL,
X                                INTEGER DEFAULT -1,
Y                                INTEGER DEFAULT -1,
USERROLE                         VARCHAR(50),
DIALECT                          INTEGER DEFAULT 1,
DIALUSERNAME                     VARCHAR(20),
DIALPASSWORD                     VARCHAR(20),
RASNAME                          VARCHAR(20),
KEEPCONNECTION                   INTEGER DEFAULT 0  NOT NULL,
PRIMARY KEY (DBNO)
);

All values in this table (besides the primary key) can safely be modified at any time by any application. The changes will only be used by the Replication Server once it is restarted or told to re-load all configuration parameters; see the discussion of the NOTIFYSERVER stored procedure below to see how to do this.

The admin user and password are only used by the Replication Manager in order to query the structure of tables, so as to create the relevant SQL statements for replication. The 'X' and 'Y' fields are used by the Replication Manager when displaying the 'Schema View' map; they are saved when a database is moved on the map.
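
For example, an application that relocates a database could update its path directly; the DBNO value and path below are purely hypothetical. The change only takes effect once the server is asked to re-load, e.g. via the NOTIFYSERVER procedure described next.

/* Sketch: point database number 3 at its new location. The Replication
   Server will only pick this up after a restart or a 'reload' event. */
UPDATE DATABASES
   SET DBPATH = 'server2:/data/sales.gdb'
 WHERE DBNO = 3;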
 
 

NOTIFYSERVER Stored Procedure.

This stored procedure is likely to be the most widely used of the internal replication structures. Its sole function is to post InterBase events which cause the Replication Server to initiate certain actions.

The actions are:

  • Reload configuration parameters.
  • Replicate now.
  • Synchronize Databases now.
  • Exit.
The structure of the Procedure is as follows:
CREATE PROCEDURE NOTIFYSERVER 
(THETYPE INTEGER) 
AS 
      DECLARE VARIABLE EVENT VARCHAR(20); 
BEGIN 
      IF (THETYPE = 1) THEN 
          POST_EVENT 'RELOAD'; 
      ELSE IF (THETYPE = 2) THEN 
          POST_EVENT 'REPLNOW'; 
      ELSE IF (THETYPE = 3) THEN 
          POST_EVENT 'SYNCNOW'; 
      ELSE IF (THETYPE = 4) THEN 
          POST_EVENT 'EXITNOW'; 
END
The replication server listens for these events, and acts on them when it receives one. Using these events is the only reliable way to stop the Linux/Unix versions of the replication server.

The procedure exists in the 'Configuration' database, and not in either the source or target databases. This is because the Replication Server is permanently connected to the Configuration database, and will therefore receive events posted in that database. There are two main reasons for using this procedure (a usage sketch follows the list):

  • An application may need to ensure that rows it has modified get replicated immediately.
  • An application may have changed some configuration parameters, and needs to notify the server to load the changed values.
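
From isql, or from any application connected to the Configuration database, the procedure can be invoked directly. A minimal sketch (the parameter value follows the action list above; note that InterBase only delivers posted events when the posting transaction commits):

/* 1 = reload configuration, 2 = replicate now, 3 = synchronize now, 4 = exit */
EXECUTE PROCEDURE NOTIFYSERVER (2);
COMMIT;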

 

Structures added to each Source Database.

Each Source database involved in replication has some modifications applied to it by the Configuration Tool. Target databases are not modified in any way, unless they are also Source databases.

There are 3 tables added:

REPL_LOG Table:

This table is used to store information on each row modified in each replicated table in the database. The information here consists of the following:
  • The values required to look up the SQL statements in the RELATIONS table in the configuration database (REPLNO, PUBDBNO, SUBDBNO, RELATIONNO).
  • The operation performed on the row: Insert, Update or Delete (REPTYPE).
  • The value of the primary key of the row (OLDKEY and NEWKEY).
  • The SEQNO field, which controls the order in which operations are replicated, so that changes are applied to the target database in the same order they were applied in the source database.
CREATE TABLE REPL_LOG
(
REPLNO                           INTEGER  NOT NULL,
PUBDBNO                          INTEGER  NOT NULL,
SUBDBNO                          INTEGER  NOT NULL,
SEQNO                            INTEGER  NOT NULL,
RELATIONNO                       INTEGER  NOT NULL,
REPTYPE                          CHAR(1),
OLDKEY                           VARCHAR(256),
NEWKEY                           VARCHAR(256),
PRIMARY KEY (REPLNO, PUBDBNO, SUBDBNO, SEQNO)
);

 

MANUAL_LOG table.

This table has a very similar structure to the REPL_LOG table. It contains those rows from REPL_LOG which could not be replicated due to conflict resolution problems. For example, if the replication server could not update a row in the target database because of timestamp inconsistencies, the row is moved from REPL_LOG to MANUAL_LOG, and the Manual Conflict Resolution Tool is used to resolve the problem.
 

REPL_SEPARATOR table.

Since a particular table in a database may be replicated to a number of different targets, multiple entries need to appear in the REPL_LOG table whenever a change is made to a row. The REPL_SEPARATOR table contains one entry per table for each target it is replicated to, and the triggers described below use this information to place the correct number of entries into the REPL_LOG table.
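
Because the generated triggers simply select from this table (see the sample below), the fan-out for a given table can be checked with a query such as the following sketch; the key values are the same ones used in the sample trigger:

/* How many REPL_LOG entries will be written for each change to the
   table identified by REPLNO=1, PUBDBNO=1, RELATIONNO=2 (one per target). */
SELECT COUNT(*)
  FROM REPL_SEPARATOR
 WHERE REPLNO = 1
   AND PUBDBNO = 1
   AND RELATIONNO = 2;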
 

AFTER INSERT/UPDATE/DELETE triggers.

Each table that needs to be replicated has three triggers defined on it, one each AFTER Insert, Update and Delete. These triggers ensure that the correct entries get placed in the REPL_LOG table after each modification to a row. They are maintained by the Configuration Tool and should not be modified by hand, since any changes will be overwritten by the Configuration Tool.

Here is a sample:

CREATE TRIGGER REPL$T22_I_1_1 for T2 
POSITION 32767 
AFTER INSERT 
AS 
BEGIN 
IF (USER <> 'REPL') THEN 
   BEGIN 
      INSERT INTO REPL_LOG(REPLNO,PUBDBNO,SUBDBNO, 
                           RELATIONNO,REPTYPE,NEWKEY) 
            SELECT REPLNO,PUBDBNO,SUBDBNO,RELATIONNO,'I',NEW.K1 
              FROM REPL_SEPARATOR 
             WHERE REPLNO=1 
               AND PUBDBNO=1 
               AND RELATIONNO=2; 

END 
END 

The name of the trigger is a combination of the name of the table, the operation involved (I,U or D) and the values of REPLNO, PUBDBNO and RELATIONNO. This is done to ensure that the trigger name does not conflict with any other object in the database.
 

Advanced Replication Methods

The majority of sites will use replication in a fairly simple way. They will have a well-designed, normalized data structure, with enterprise-wide unique keys that do not change, and replication can run almost unattended.

Some sites will not be able to get replication to run in so simple or smooth a fashion. This will not necessarily be because the data schema has been badly designed, but rather because many companies have complex data requirements, and a simple normalized structure will not suffice.
 

Replicating to a Stored Procedure

Typically, when defining a replication schema, the source and target data structures will be identical. However in some cases the source and target tables will differ to some extent, perhaps quite substantially.

In the simple case, the Replication Server will read data from the source table, and write directly to the target table, using standard SQL Insert, Update and Delete statements. If the structures differ radically, then this is not possible. For example, a single field in the target table may require concatenation of two or more fields in the source database.

These complex environments are easily handled by use of Stored Procedures. The Stored Procedure takes as parameters all of the relevant fields from the source table, and does all the necessary work required to write the data correctly into the target database.

Suppose the Source and Target databases contain a table as follows:

CREATE TABLE T (
  K1 INTEGER NOT NULL,
  K2 VARCHAR(10) NOT NULL,
  K3 DATE NOT NULL,
  F1 VARCHAR(20),
  F2 DOUBLE PRECISION,
  F3 INTEGER,
  F4 VARCHAR(100),
  F5 NUMERIC(4,1),
  F6 NUMERIC(9,2),
  F7 NUMERIC(15,2),
  PRIMARY KEY (K1,K2,K3)
);

A stored procedure could be defined in the Target database as follows:
CREATE PROCEDURE REPLICATE_T (
  F1      VARCHAR(20),
  F2      DOUBLE PRECISION,
  F3      INTEGER,
  F4      VARCHAR(100),
  F5      NUMERIC(4, 1),
  F6      NUMERIC(9, 2),
  F7      NUMERIC(15, 2),
  K1      INTEGER,
  K2      VARCHAR(10),
  K3      DATE,
  TYPE    CHAR(1)
) RETURNS (RESULT INTEGER)
AS
  DECLARE VARIABLE COUNTER INTEGER;
BEGIN

  RESULT = 0;  /*default return value*/

  SELECT COUNT(*) FROM T
    WHERE K1 = :K1 AND K2 = :K2 AND K3 = :K3
      INTO :COUNTER;

/* Inserts: If the row already exists, then exit with result=1. This
 * causes the replication server to apply conflict rules, which will
 * probably cause the procedure to be called again, but with a TYPE
 * of "U". An alternative approach would simply change
 * TYPE to "U" and proceed to update the row instead.
 */
  IF (TYPE = 'I' AND COUNTER > 0) THEN
  BEGIN
    RESULT = 1;
    EXIT;
  END

/* Updates: If the row does not exist, then exit with result=1. This
 * causes the replication server to apply conflict rules, which will
 * probably cause the procedure to be called again, but with a TYPE
 * of "I". An alternative approach would simply change TYPE to "I"
 * and proceed to insert the row instead.
 */
  IF (TYPE = 'U' AND COUNTER = 0) THEN
  BEGIN
    RESULT = 1;
    EXIT;
  END

/* Deletes: If the row does not exist then exit with result=1. The
 * replication server will log the error, but otherwise ignore it. 
 */
  IF (TYPE = 'D' AND COUNTER = 0) THEN
  BEGIN
    RESULT = 1;
    EXIT;
  END

  IF (TYPE = 'I') THEN
    INSERT INTO T(K1,K2,K3,F1,F2,F3,F4,F5,F6,F7)
    VALUES (:K1,:K2,:K3,:F1,:F2,:F3,:F4,:F5,:F6,:F7);

  IF (TYPE = 'U') THEN
    UPDATE T 
    SET    F1 = :F1,
           F2 = :F2,
           F3 = :F3,
           F4 = :F4,
           F5 = :F5,
           F6 = :F6,
           F7 = :F7
    WHERE  K1 = :K1
      AND  K2 = :K2
      AND  K3 = :K3;

  IF (TYPE = 'D') THEN
    DELETE FROM T
    WHERE K1 = :K1
      AND K2 = :K2
      AND K3 = :K3;

  EXIT;
END

This stored procedure can do whatever is required to make sure the changed data is applied to the target database correctly. It can change inserts to updates, modify field values, concatenate fields together, and so on; only in the most unusual circumstances need it return an error to the Replication Server.

Replicating a View and 'Virtual' Data

Some applications depend on complex database Views to present data to applications. If these views are updatable (either because they are simple views, or because they have had the relevant triggers attached), then they can be replicated in the normal fashion, either to a target view, or to a target table, or to a target Stored Procedure. They will behave in exactly the same way as an ordinary replicated table.

If the view is not updated directly, but rather the underlying tables are updated, then it is the underlying tables which must be replicated.

However, often views are used to present consolidated or calculated data, and there is a requirement to replicate this calculated data. Since the data values do not actually exist in any single table, they cannot be directly replicated. There are a number of ways to overcome this problem:

  • Replicate the underlying tables if these tables exist on the target.
  • Replicate the underlying tables to a stored procedure which does the necessary consolidations.
  • Replicate the view directly, i.e. use the Configuration Tool to set up replication for the view, as if it was an ordinary table, and 'fake' the entries in the replication log (REPL_LOG).
The first option is self-explanatory, and the second may or may not work in a multi-user environment, depending on the exact nature of the consolidation/calculation.

The third option is quite easy to do, even though the view is never updated directly by any application. Since the view is never updated, the triggers will not fire, and no entries will appear in the REPL_LOG table.

The answer is to do one of two things:

  • Manually enter the rows into the replication log.
  • Update the view directly, which will cause entries to appear in the log.
The first can be done with code similar to that used by the triggers, but the second is much simpler. Since the view now has three triggers defined for it, it has become an updatable view. Therefore, if one were to 'update' a row in the view, the required entries would be placed in the REPL_LOG table, but nothing else would happen, i.e. no other tables would be updated. This requires that the applications using the view be modified to issue such updates, but it is a trivial modification. Care must be taken, though, to ensure that UPDATE statements against the view run in a reasonable time, since the view may have a complex 'WHERE' clause.
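
As a sketch of such a 'no-op' update (the view name SALES_SUMMARY and its key column INVOICENO are hypothetical): the statement changes no data, but it fires the replication triggers defined on the view, which log the affected key in REPL_LOG for the next replication cycle.

/* Hypothetical replicated view. Setting a column to itself changes nothing,
   but the replication triggers still fire and queue the row for replication. */
UPDATE SALES_SUMMARY
   SET INVOICENO = INVOICENO
 WHERE INVOICENO = 1001;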
 

Replicating via slow/unreliable Dial-Up lines.

On Windows platforms, the Replication Server is able to use RAS to dial out in order to connect to source or target databases. The chances of broken connections occurring are quite high, especially when hundreds of rows need to be replicated.

Normally, when a replication cycle is started, all rows in the replication log are replicated to an individual target within one transaction. This is because the replication server has no way of breaking the entries into their individual transactions.

If the connection is broken, InterBase will roll back all the work done so far. This means that when the replication server connects the next time, it has to re-do all the previous work as well as any new work.

It is possible to tell the replication server to issue a Commit Retaining every so often, so that if a break does occur in the connection, the server can continue where it left off.

The drawback to this is that incomplete transactions can be applied to the target database, which will only be completed some time in the future.
 

Controlling Replication from within Applications

Normally the replication server has full control over when to replicate. However there are occasions when more control is required.

Using Interbase Events to control Replication

Replication can be scheduled to execute in a number of different ways. The easiest is to run the server and tell it to execute every n seconds. Alternatively, a Windows scheduling tool is supplied which allows more complex schedules to be defined.

The third way is to invoke replication from within your own applications. This is done by way of the NOTIFYSERVER Stored Procedure described earlier in this paper. This uses events to notify the server that it should replicate.

This feature has been disabled in the Interbase 5.x version of IBReplicator, due to a problem in the event mechanism within Interbase itself.

This method allows developers, administrators and users to invoke replication at suitable times, especially if it should occur at sporadic intervals.

In Delphi, the code is very simple:

Procedure ReplicateNow;
Begin
   MyDatabase.StartTransaction;
   with TStoredProc.Create(nil) do   // nil owner: freed explicitly below
   try
      DatabaseName := 'SomeDB';
      StoredProcName := 'NotifyServer';
      Prepare;
      Params[0].AsInteger := 2;      // 2 = causes the server to replicate now
      ExecProc;
   finally
      Free;
   end;
   MyDatabase.Commit;                // the event is posted when this commits
End;

Getting 'Priority' Data to the head of the Queue

The replication log (REPL_LOG) has a field called SeqNo, which is populated by a generator, and this is used to make sure that changes are applied to target databases in exactly the same order as they were applied to the source database.

It is possible to subvert this process, and manually place entries at the head of the queue, so that when the replication server fires up, these entries get sent first. A simple example of this is a messaging (e-mail) system over a slow dial-up line. Since the messages need to get to the target as quickly as possible, it makes sense to put them at the front of the queue.

There is no mechanism built into IBReplicator to do this, so custom code must be created which writes directly to the REPL_LOG table.

It's not quite as simple as that, since if a table has been set up for replication, two things happen: firstly, an entry is placed in the REPL_SEPARATOR table, and secondly, three triggers are created on the table, one each for Insert, Update and Delete. The three triggers, in combination with the row in the REPL_SEPARATOR table, cause the correct rows to be inserted into the REPL_LOG table. However, we do not want this entry to appear in REPL_LOG, since it would go to the back of the queue, and we want it at the front.

We could simply remove the entry in REPL_SEPARATOR, and/or drop the three triggers, but these will be put back the next time 'Create System Objects' is invoked, so this is not a reliable method. The best way is to put in a 'ROW LEVEL REPLICATION CONDITION' clause for this table. Usually this is used to limit which rows are replicated; for example, if a condition of 'DEPTNO = 10' is entered, then only rows in department 10 are replicated. Here we want to prevent any row from being replicated automatically, so we could enter something like 'DEPTNO = -999'. This means that no rows get replicated by the standard triggers, which suits us, since we want to put the replication log entries in manually.

Now we need to write some code that populates REPL_LOG with the correct values. The structure of the table is as follows:

CREATE TABLE REPL_LOG
(
REPLNO                           INTEGER  NOT NULL,
PUBDBNO                          INTEGER  NOT NULL,
SUBDBNO                          INTEGER  NOT NULL,
SEQNO                            INTEGER  NOT NULL,
RELATIONNO                       INTEGER  NOT NULL,
REPTYPE                          CHAR(1),
OLDKEY                           VARCHAR(256),
NEWKEY                           VARCHAR(256),
PRIMARY KEY (REPLNO, PUBDBNO, SUBDBNO, SEQNO)
);
REPLNO, PUBDBNO, SUBDBNO and RELATIONNO are filled in from the fields of the same name in the REPL_SEPARATOR table, while SEQNO should get a negative value from a generator, so as to keep the entry at the front of the queue. Probably the best way is to create a generator and initialize it to -2,147,483,647. REPTYPE should be one of 'I', 'U' or 'D' for insert, update or delete. OLDKEY/NEWKEY must contain the values of the primary key fields concatenated together, separated by the SEP character found in REPL_SEPARATOR. Both OLDKEY and NEWKEY are filled in for updates, only NEWKEY for inserts, and only OLDKEY for deletes. This can all be done in a trigger on the table involved, as follows:
CREATE TRIGGER PUSH_TO_FRONT_OF_QUEUE
FOR EMPLOYEE
AFTER INSERT
AS
BEGIN
IF (USER <> 'REPL') THEN BEGIN
  INSERT INTO REPL_LOG(REPLNO,PUBDBNO,SUBDBNO,SEQNO,RELATIONNO,REPTYPE,NEWKEY)
  SELECT REPLNO,PUBDBNO,SUBDBNO,gen_id(GEN_NEGATIVE,1),RELATIONNO,'I',
         NEW.EMP_NO||SEP||NEW.DEPTNO
    FROM REPL_SEPARATOR
   WHERE REPLNO=2
     AND PUBDBNO=1
     AND RELATIONNO=2;

END
END
This trigger is almost identical to the triggers generated by the Replication Manager for normal replication. The only difference is that the SEQNO field is supplied with a negative value instead of a positive one.

Since this is an 'insert' trigger, the value 'I' is supplied for REPTYPE, and the NEWKEY field is filled in with the fields of the primary key, separated by the SEP character in the REPL_SEPARATOR table.

If this row needs to be replicated to more than one target database, then there will be more than one entry in the REPL_SEPARATOR table, and therefore the correct number of entries will be placed into the REPL_LOG, one for each target.
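
The trigger above assumes the GEN_NEGATIVE generator already exists. A minimal one-off setup, using the starting value suggested earlier:

/* Generator handing out large negative sequence numbers, so that manually
   queued entries sort ahead of the normal, positive SEQNO values. */
CREATE GENERATOR GEN_NEGATIVE;
SET GENERATOR GEN_NEGATIVE TO -2147483647;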

Complex Replication Environments

Replicating Subsets of data

In many cases there is a requirement to restrict which rows get replicated from one database to another. For example, in a Head Office/Branch Office scenario, only the data belonging to the Atlanta branch needs to be replicated there; the Atlanta office is not interested in Chicago data.

There are two ways to do this. The first, and simplest, is to place a 'ROW LEVEL REPLICATION CONDITION' on each table. This prevents data which does not match the condition from being sent to the target. The second method is to replicate all data to a stored procedure on the target, and let the target database decide whether it wants to keep the data or not. This places a greater load on the network, but may be required in very complex situations. See Replicating to a Stored Procedure for details on how to replicate to a stored procedure.

In order to describe the 'row level replication condition', I'll use the following example: a Head Office in New York, and two branch offices in Atlanta and Chicago. Sales data is replicated from the branch offices to head office, and other data from Head Office to the branches. Atlanta is DeptNo = 10, and Chicago is DeptNo = 20.
            All Data                      All Data
         -------------->               <--------------  
Chicago                   Head Office                   Atlanta
            DeptNo = 20                  DeptNo = 10
         <--------------               --------------->
On the Atlanta target database, we need to define a row level replication condition for each table, based on the DeptNo. This is done in the 'Table Settings' of the Configuration Tool; for this example the condition entered would be :DEPTNO = 10.

NOTE the colon (:) in front of the field name. The condition is then reflected in the triggers generated for each replicated table. Here is a sample trigger with the condition in place; the colon (:) has been changed to 'NEW.'

CREATE TRIGGER REPL$T22_I_1_1 for T2 
POSITION 32767 
AFTER INSERT 
AS 
BEGIN 
  IF (NEW.DEPTNO = 10) THEN
  BEGIN 
    IF (USER <> 'REPL') THEN 
      BEGIN 
        INSERT INTO REPL_LOG(REPLNO,PUBDBNO,SUBDBNO, 
                           RELATIONNO,REPTYPE,NEWKEY) 
            SELECT REPLNO,PUBDBNO,SUBDBNO,RELATIONNO,'I',NEW.INVOICENO 
              FROM REPL_SEPARATOR 
             WHERE REPLNO=1 
               AND PUBDBNO=1 
               AND RELATIONNO=2; 

      END 
    END
END 

The condition can be as complex as you like, as long as it can be transformed into an IF statement inside the trigger. Remember, you need to put a colon (:) in front of any field names.
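
As a hedged illustration (the field names are hypothetical), a condition entered as :DEPTNO = 10 AND :AMOUNT > 1000 would end up in each generated trigger roughly as:

/* Fragment only: the row level replication condition as it appears inside
   the generated trigger, with the colon prefix rewritten to NEW. */
IF (NEW.DEPTNO = 10 AND NEW.AMOUNT > 1000) THEN
BEGIN
  /* ... the usual INSERT INTO REPL_LOG ... SELECT ... FROM REPL_SEPARATOR ... */
END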

Preventing Rows from bouncing back in an N-Way model

In a simple two-way (bi-directional A <--> B) replication environment, it is necessary to have a method to prevent data that has been replicated from A to B from replicating straight back to A. This is done simply by having the Replication Server log in to each database as the user 'REPL'. You will see in the sample trigger code above that there is an IF statement which prevents data from replicating if the current USER is called REPL. This is built into all the triggers by default.

However, it is not always possible to log in as REPL, since you may want data to be replicated from A to B and then from B to C; if you use the user REPL, data from A will not be passed on to C. This gets more complex in a hub-and-spoke situation, where data is replicated to a hub machine, which in turn replicates the data to other machines.

In this example, let's assume that there are four locations, with the Head Office in New York and branches in Chicago, San Diego and Atlanta. Each site has a complete copy of the database, and all changes are replicated to all other sites. To make it easier, each of the branches replicates to New York, which in turn sends the changes on to the other branches. This is a hub-and-spoke configuration.

In this case we cannot use the username REPL all the time, since then data would (for example) replicate from Chicago to New York and not get sent on to the other branches. On the other hand, we do not want data to be replicated from Chicago to New York, and then from New York back to Chicago. So we need to use a combination of the user REPL and a ROW LEVEL REPLICATION CONDITION with some specific user names, as shown in the following table:
 
 

Source       Target       UserName        Condition
---------    ---------    -------------   ----------------------
Chicago      New York     REPL_CHICAGO    -
San Diego    New York     REPL_SANDIEGO   -
Atlanta      New York     REPL_ATLANTA    -
New York     Chicago      REPL            USER <> 'REPL_CHICAGO'
New York     San Diego    REPL            USER <> 'REPL_SANDIEGO'
New York     Atlanta      REPL            USER <> 'REPL_ATLANTA'

This ensures that data is replicated from one branch into the hub in New York. The hub then replicates this data to the other branches, but not back to the originating branch.
