This is the technical paper from a talk given at the 11th Annual Inprise & Borland Developer's Conference
By Vince Duggan
Vince Duggan is the technical director of Synectics Software, an Inprise/Borland client/server business partner based in Sandton, South Africa. Over the past 15 years, he has worked with COBOL, BASIC, dBASE, Clipper, and Oracle on DOS, Windows, VMS, and UNIX platforms. Since late 1994, he has worked exclusively with Inprise/Borland products, particularly Delphi and InterBase.
Introduction
This paper is designed to provide some insight into the power of IBReplicator
which supplies replication capabilities to Interbase.
Because IBreplicator has been written as an application outside of the
Interbase engine, it is easy to get at the internals, which provides developers
and administrators with a lot of control.
There will be a brief overview of the definition of replication, but
the bulk of this paper will discuss the deeper aspects, such as the structures
of the tables and triggers behind the Replication architecture, and the
more powerful features such as using Stored Procedures and Interbase Events
to gain greater control of the replication server.
Replication Basics.
Although this paper is meant to provide an in depth look at replication,
a basic overview is probably required. The fundamentals of Interbase replication
is provided in the separate paper: "Interbase Replication: Fundamentals".
Brief overview of Replication.
Replication can mean different things to different people. Essentially
it means that data and/or changes to data are replicated from a
source database, to one or more target databases. Once replication has
taken place, the data at all sites is in synchronization. This does not
necessarily mean that the source and target databases are mirror copies,
nor that the rows in question are identical, nor that the actual values
of the data are the same.
In most cases, the source and target database will be identical, but
in some cases, only some of the tables will be replicated, and within those
tables, only part of the data may be replicated. As well as that, the actual
field values may be modified on the way to the target database.
Replication is an architecture that allows an enterprise to maintain
multiple copies of data. This is typically done to improve access to the
data, by allowing various sites in the organization to have their own copies
of the data. Each site then works on their own copy, and the changes made
by each must 'automagically' be reflected at any or all of the other sites.
Replication also allows for Branch Office consolidation to Head Office
scenarios. The Branch offices will on a periodic basis move their data
to head office, either as raw data, or in summary form.
There are quite a few different architectures which can implement replication:
-
Simple copy of the database.
-
Export/Import model.
-
Synchronous model.
-
Asynchronous model
A very simple option is to copy the database to the different target database
servers. This is a very manual approach, and the multiple copies will diverge
during the period between copies.
One could write a very specific export/import system. In this case,
either all the relevant rows or just the changed rows are exported to a
file, and the file is sent to the target sites, where another specific
import routine imports the data. This option tends not to be very portable,
but can work well in certain situations.
The synchronous model is quite widely used. In this model, a change
to a row in the source database, is propagated to the target databases
synchronously. This means that as changes occur in the source database,
they appear simultaneously in all target databases. The disadvantage
to this method is that if one or more of the targets is unavailable, then
things quickly get very complex, and usually one has to fall back to an
asynchronous method.
The Asynchronous model is very popular, and is the model used by IBReplicator.
In this model, changes are 'stored up', and every so often, every few seconds/minutes/hours
or days the Replication Server wakes up, and sends the changes to the targets.
The advantages to this method is that if one or more of the target databases
is off-line, it does not matter, since it will catch up next time it is
on-line. There is a disadvantage in that there is a 'lag' time in which
the source is out of sync with it's various targets, but this normally
can be kept within acceptable levels.
Replication Data Structures and Triggers
All configuration parameters are stored by the Configuration Tool in a
database called the Configuration database. The Configuration tool also
makes some changes to each source database, specifically by creating three
new tables, and adding Insert/Update and Delete triggers to each table
that is replicated.
The Configuration Database
The IBReplicator Configuration tool is used to provide the Replication
Server with sufficient information in order to allow it to carry out replication
operations. These configuration parameters are stored in a separate database
called the Configuration Database. This database is typically a database
separate from other production databases, but can easily be any database
that is itself a source or target database.
Most of the tables, triggers and stored procedures in this database
are of little interest to developers and administrators, since they should
not be modified directly, but some of them can be used in various ways
to enhance control of replication.
RELATIONS table.
This table contains the bulk of the information regarding each table that
is replicated. Note that a replication database can contain many schemas,
each of which describes one source database with its target databases.
Therefore a particular table can appear multiple times in the RELATIONS
table, but with a different REPLNO/TGTDBNO values.
There are entries which hold the SQL Insert/Update/Delete and Select
statements used during replication, and other entries which keep a count
of the number of operations that have been replicated for each table.
CREATE TABLE RELATIONS
(
REPLNO INTEGER NOT NULL,
DBNO INTEGER NOT NULL,
RELATIONNO INTEGER NOT NULL,
RELATIONNAME VARCHAR(100),
COMMENTS VARCHAR(1000),
SELECTSQL VARCHAR(8192),
DELETESQL VARCHAR(8192),
UPDATESQL VARCHAR(8192),
INSERTSQL VARCHAR(8192),
DISABLED CHAR(1) DEFAULT 'N',
TIMEFIELDNAME VARCHAR(20),
CONDITION VARCHAR(255) DEFAULT '' NOT NULL,
TGTDBNO INTEGER NOT NULL,
TARGETNAME VARCHAR(100),
TARGETTYPE CHAR(1),
INSERTS INTEGER DEFAULT 0 NOT NULL,
UPDATES INTEGER DEFAULT 0 NOT NULL,
DELETES INTEGER DEFAULT 0 NOT NULL,
SECONDS INTEGER DEFAULT 0 NOT NULL,
KEEPSTATS INTEGER DEFAULT 1 NOT NULL,
ERRORS INTEGER DEFAULT 0 NOT NULL,
CONFLICTS INTEGER DEFAULT 0 NOT NULL,
PRIMARY KEY (REPLNO, DBNO, TGTDBNO, RELATIONNO)
);
REPLNO,DBNO,TGTDBNO,RELATIONNO fields:
These fields identify the table being replicated, via the Schema, Source
Database, Target database and relation.
SELECTSQL,INSERTSQL,UPDATESQL,DELETESQL fields:
These 4 fields contain the SQL statements which are used during replication.
They are generated by triggers, and so are automatically kept in sync with
the RELATIONREPLFIELDS and RELATIONKEYFIELDS tables, which define the data
fields and key fields of tables to be replicated.
INSERTS,UPDATES,DELETES SECONDS,ERRORS,CONFLICTS fields:
Stats are kept on all operations, so the number of inserts, updates, deletes,
errors and conflicts are kept as well as the amount of time taken to complete
each operation. Triggers cause these stats to cascade up to the PUBLISHEDDB,
SUBSCRIBEDDB, SCHEMATA and CONFIG tables. This allows the Replication Monitor
to drill down and display stats at any level.
TIMEFIELDNAME and CONDITION fields:
The TIMEFIELDNAME field is used when time stamped conflict resolution is
used. When updating a row on the target database, the field is used to
determine that only older rows are updated, and not rows that were changed
after the source row was last modified.
The CONDITION field contains a portion of an 'IF' statement. This puts
a constraint on which rows are replicated. For example if CONDITION contains
the value 'DEPTNO =10', then only rows with a DEPNO of 10 would be replicated.
This field is used when creating the triggers which control replication
on each source table.
DATABASES table.
This table describes the properties of each source and target database
involved in replication. Since it holds a description and full path of
each database, it can be used as a central repository of database paths
for other applications as well. It also holds information regarding RAS
dialup if this is required. This can also be used by other applications.
CREATE TABLE DATABASES
(
DBNO INTEGER NOT NULL,
DBPATH VARCHAR(254),
DBNAME VARCHAR(100),
ADMINUSER VARCHAR(50),
ADMINPASSWORD VARCHAR(50),
COMMENTS VARCHAR(1000),
PRIORITY INTEGER DEFAULT 0 NOT NULL,
TIMEFIELDNAME VARCHAR(20),
CHARSET VARCHAR(20) DEFAULT '' NOT NULL,
DBTYPE INTEGER DEFAULT 0 NOT NULL,
X INTEGER DEFAULT -1,
Y INTEGER DEFAULT -1,
USERROLE VARCHAR(50),
DIALECT INTEGER DEFAULT 1,
DIALUSERNAME VARCHAR(20),
DIALPASSWORD VARCHAR(20),
RASNAME VARCHAR(20),
KEEPCONNECTION INTEGER DEFAULT 0 NOT NULL,
PRIMARY KEY (DBNO)
);
All values in this table (besides the Primary Key) can safely be modified
at any time by any application. The changes will only be used by the Replication
Server if it is restarted or told to re-load all configuration parameters.
See the discussion on the Stored Procedure 'NotifyServer' below to see
how to do this. The Admin user and password are only used by the Replication
Manager in order to query the structure of tables, so as to create the
relevant SQL statements for replication. The 'X' and 'Y' fields are used
by the Replication Manager when displaying the 'Schema View' map. These
are saved when the database is moved on the map.
NOTIFYSERVER Stored Procedure.
This stored procedure is likely to be the most widely used of the internal
replication structures. It's sole function is to post Interbase Events
which will cause the Replication Server to initiate certain actions.
The actions are:
-
Reload configuration parameters.
-
Replicate now.
-
Synchronize Databases now.
-
Exit.
The structure of the Procedure is as follows:
CREATE PROCEDURE NOTIFYSERVER
(THETYPE INTEGER)
AS
DECLARE VARIABLE EVENT VARCHAR(20);
BEGIN
IF (THETYPE = 1) THEN
POST_EVENT 'RELOAD';
ELSE IF (THETYPE = 2) THEN
POST_EVENT 'REPLNOW';
ELSE IF (THETYPE = 3) THEN
POST_EVENT 'SYNCNOW';
ELSE IF (THETYPE = 4) THEN
POST_EVENT 'EXITNOW';
END
The replication server listens for these events, and acts on them when
it receives one. Using these events is the only reliable way to stop the
Linux/Unix versions of the replication server.
The procedure exists in the 'Configuration' database, and not on either
of the source or target databases. This is because the Replication Manager
is permanently connected to the Configuration database, and will therefore
receive events posted in that database. There will be two main reasons
for using this procedure:
-
An application may need to ensure that rows it has modified get replicated
immediately.
-
An application may have changed some configuration parameters, and needs
to notify the server to load the changed values.
Structures added to each Source Database.
Each Source database involved in replication has some modifications applied
to it by the Configuration Tool. Target databases are not modified in any
way, unless they are also Source databases.
There are 3 tables added:
REPL_LOG Table:
This table is used to store information on each row modified in each replicated
table in the database. The information here consists of the following:
-
Values required to look up SQL statements in the RELATIONS table in the
configuration database.(REPLNO,PUBDBNO,SUBDBNO,RELATIONNO)
-
Operation done on the row (Insert, Update or Delete)(REPTYPE)
-
Value of the Primary key of the row(OLDKEY and NEWKEY)
-
The SEQNO field is used to control the order operations are replicated,
so that changes are applied to the target database in the same order they
were applied in the source database.
CREATE TABLE REPL_LOG
(
REPLNO INTEGER NOT NULL,
PUBDBNO INTEGER NOT NULL,
SUBDBNO INTEGER NOT NULL,
SEQNO INTEGER NOT NULL,
RELATIONNO INTEGER NOT NULL,
REPTYPE CHAR(1),
OLDKEY VARCHAR(256),
NEWKEY VARCHAR(256),
PRIMARY KEY (REPLNO, PUBDBNO, SUBDBNO, SEQNO)
);
MANUAL_LOG table.
This table has a very similar structure to the REPL_LOG table. It contains
those rows from REPL_LOG which could not be replicated due to conflict
resolution problems. For example, if the replication server could not update
a row in the target database because of timestamp inconsistencies, the
row is moved from REPL_LOG to MANUAL_LOG, and the Manual Conflict Resolution
Tool is used to resolve the problem.
REPL_SEPARATOR table.
Since a particular table in a database may be replicated to a number of
different targets, multiple entries need to appear in the REPL_LOG table
whenever a change is made to a row. The REPL_SEPARATOR table contains one
entry for each reference per table, and the triggers described below use
this information to place the correct number of entries in to the REPL_LOG
table.
BEFORE INSERT/UPDATE/DELETE triggers.
Each table that needs to be replicated has 3 triggers defined on it, one
each BEFORE Insert, Update and Delete. These triggers ensure that the correct
entries get placed in the REPL_LOG table after each modification to a row.
These triggers are maintained by the Configuration Tool, and should not
be modified, since any changes will be overwritten by the Configuration
tool.
Here is a sample:
CREATE TRIGGER REPL$T22_I_1_1 for T2
POSITION 32767
AFTER INSERT
AS
BEGIN
IF (USER <> 'REPL') THEN
BEGIN
INSERT INTO REPL_LOG(REPLNO,PUBDBNO,SUBDBNO,
RELATIONNO,REPTYPE,NEWKEY)
SELECT REPLNO,PUBDBNO,SUBDBNO,RELATIONNO,'I',NEW.K1
FROM REPL_SEPARATOR
WHERE REPLNO=1
AND PUBDBNO=1
AND RELATIONNO=2;
END
END
The name of the trigger is a combination of the name of the table, the
operation involved (I,U or D) and the values of REPLNO, PUBDBNO and RELATIONNO.
This is done to ensure that the trigger name does not conflict with any
other object in the database.
Advance Replication Methods
The majority of sites will use Replication in a fairly simple way. They
will have a well designed normalized data structure, with enterprise wide
unique keys which do not get changed, and Replication can run almost unattended.
Some sites will not be able to get replication to run in so simple or
smooth a fashion. This will not necessarily be because the data schema
has been badly designed, but rather because many companies have complex
data requirements, and a simple normalized structure will not suffice.
Replicating to a Stored Procedure
Typically, when defining a replication schema, the source and target data
structures will be identical. However in some cases the source and target
tables will differ to some extent, perhaps quite substantially.
In the simple case, the Replication Server will read data from the source
table, and write directly to the target table, using standard SQL Insert,Update
and Delete statements. If the structures differ radically, then this is
not possible. For example, a single field in the target table may require
concatenation of two or more fields in the source database.
These complex environments are easily handled by use of Stored Procedures.
The Stored Procedure takes as parameters all of the relevant fields from
the source table, and does all the necessary work required to write the
data correctly into the target database.
Suppose the Source and Target databases contain a table as follows:
CREATE TABLE T (
K1 INTEGER NOT NULL,
K2 VARCHAR(10) NOT NULL,
K3 DATE NOT NULL,
F1 VARCHAR(20),
F2 DOUBLE PRECISION,
F3 INTEGER,
F4 VARCHAR(100),
F5 NUMERIC(4,1),
F6 NUMERIC(9,2),
F7 NUMERIC(15,2),
PRIMARY KEY (K1,K2,K3)
);
A stored procedure could be defined in the Target database as follows:
CREATE PROCEDURE REPLICATE_T (
F1 VARCHAR(20),
F2 DOUBLE PRECISION,
F3 INTEGER,
F4 VARCHAR(100),
F5 NUMERIC(4, 1),
F6 NUMERIC(9, 2),
F7 NUMERIC(15, 2),
K1 INTEGER,
K2 VARCHAR(10),
K3 DATE,
TYPE CHAR(1)
) RETURNS (RESULT INTEGER)
AS
DECLARE VARIABLE COUNTER INTEGER;
BEGIN
RESULT = 0; /*default return value*/
SELECT COUNT(*) FROM T
WHERE K1 = :K1 AND K2 = :K2 AND K3 = :K3
INTO :COUNTER;
/* Inserts: If the row already exists, then exit with result=1. This
* causes the replication server to apply conflict rules, which will
* probably cause the procedure to be called again, but with an
* with a TYPE of "U". An alternative approach would simply change
* TYPE to "U" and proceed to update the row instead.
*/
IF (TYPE = 'I' AND COUNTER > 0) THEN
BEGIN
RESULT = 1;
EXIT;
END
/* Updates: If the row does not exist, then exit with result=1. This
* causes the replication server to apply conflict rules, which will
* probably cause the procedure to be called again, but with a TYPE
* of "I". An alternative approach would simply change TYPE to "I"
* and proceed to insert the row instead.
*/
IF (TYPE = 'U' AND COUNTER = 0) THEN
BEGIN
RESULT = 1;
EXIT;
END
/* Deletes: If the row does not exist then exit with result=1. The
* replication server will log the error, but otherwise ignore it.
*/
IF (TYPE = 'D' AND COUNTER = 0) THEN
BEGIN
RESULT = 1;
EXIT;
END
IF (TYPE = 'I') THEN
INSERT INTO T(K1,K2,K3,F1,F2,F3,F4,F5,F6,F7)
VALUES (:K1,:K2,:K3,:F1,:F2,:F3,:F4,:F5,:F6,:F7);
IF (TYPE = 'U') THEN
UPDATE T
SET F1 = :F1,
F2 = :F2,
F3 = :F3,
F4 = :F4,
F5 = :F5,
F6 = :F6,
F7 = :F7
WHERE K1 = :K1
AND K2 = :K2
AND K3 = :K3;
IF (TYPE = 'D') THEN
DELETE FROM T
WHERE K1 = :K1
AND K2 = :K2
AND K3 = :K3;
EXIT;
END
This Stored procedure can do whatever is required to make sure that the
changed data is applied to the target database correctly. It can change
Inserts to updates, can modify values of fields, concatenate fields together
etc., and only in the most unusual circumstances need it return an error
to the Replication Server.
Replicating a View and 'Virtual' Data
Some applications depend on complex database Views to present data to applications.
If these views are updatable (either because they are simple views, or
because they have had the relevant triggers attached), then they can be
replicated in the normal fashion, either to a target view, or to a target
table, or to a target Stored Procedure. They will behave in exactly the
same way as an ordinary replicated table.
If the view is not updated directly, but rather the underlying tables
are updated, then it is the underlying tables which must be replicated.
However, often views are used to present consolidated or calculated
data, and there is a requirement to replicate this calculated data. Since
the data values do not actually exist in any single table, they cannot
be directly replicated. There are a number of ways to overcome this problem:
-
Replicate the underlying tables if these tables exist on the target.
-
Replicate the underlying tables to a stored procedure which does the necessary
consolidations.
-
Replicate the view directly, i.e. use the Configuration Tool to set up
replication for the view, as if it was an ordinary table, and 'fake' the
entries in the replication log (REPL_LOG).
The first option is self-explanatory, and the second may or may not work
in a multi-user environment, depending on the exact nature of the consolidation/calculation.
The third option is quite easy to do, even though the view is never
updated directly by any application. Since the view is never updated, the
triggers will not fire, and no entries will appear in the REPL_LOG table.
The answer is to do one of two things:
-
Manually enter the rows into the replication log.
-
Update the view directly, which will cause entries to appear in the log.
The first can be done with similar code to the way that the triggers do
it, but the second is much simpler. Since the view now has three triggers
defined for it, it is has become an updatable view. Therefore if one were
to 'update' a row in the view, then the required entries would be placed
in the REPL_LOG table, but nothing else would happen, i.e. no other tables
would be updated. This requires that the applications using the view be
modified to do this, but it is a trivial modification. Care must be taken
though to ensure that Update statements against the view run in a reasonable
time, since the view may have a complex 'WHERE' statement.
Replicating via slow/unreliable Dial-Up lines.
On Windows platforms, the Replication Server is able to use RAS to dial
out in order to connect to source or target databases. The chances of broken
connections occurring is quite high, especially when hundreds of rows need
to be replicated.
Normally, when a replication cycle is started, all rows in the replication
log are replicated to an individual target within one transaction. This
is because the replication server has no way of breaking the entries into
their individual transactions.
If a broken connection has occurred, Interbase will roll back all the
work done so far. This means that when the replication server connects
next time, it has to re-do all the previous work as well as do any new
work.
It is possible to tell the replication server to issue a Commit Retaining
every so often, so that if a break does occur in the connection, the server
can continue where it left off.
The drawback to this is that incomplete transactions can be applied
to the target database, which will only be completed some time in the future.
Controlling Replication from within Applications
Normally the replication server has full control over when to replicate.
However there are occasions when more control is required.
Using Interbase Events to control Replication
Replication can be scheduled to execute in a number of different ways.
The easiest is to run the server and to tell it to execute every n seconds.
Alternatively, a windows Scheduling tool is supplied which will allow more
complex schedules to be defined.
The third way is to invoke replication from within your own applications.
This is done by way of the NOTIFYSERVER Stored
Procedure described earlier in this paper. This uses events to notify
the server that it should replicate.
This feature has been disabled in the Interbase 5.x version of IBReplicator,
due to a problem in the event mechanism within Interbase itself.
This method allows developers, administrators and users to invoke replication
at suitable times, especially if it should occur at sporadic intervals.
In Delphi, the code is very simple:
Procedure ReplicateNow;
Begin
MyDatabase.StartTransaction;
with TStoredProc.Create(Self) do
Begin
DatabaseName := 'SomeDB';
StoredProcName := 'NotifyServer';
Prepare;
Params[0].AsInteger := 2; //Causes Server to replicate
ExecProc;
Free;
End;
MyDatabase.Commit;
End;
Getting 'Priority' Data to the head of the
Queue
The replication log (REPL_LOG) has a field called SeqNo, which is populated
by a generator, and this is used to make sure that changes are applied
to target databases in exactly the same order as they were applied to the
source database.
It is possible to subvert this process, and manually place entries at
the head of the queue, so that when the replication server fires up, these
entries get sent first. A simple example of this is a messaging(e-mail)
system, over a slow dial-up line. Since the messages need to get to the
target as quickly as possible, it makes sense to put them at the front
of the queue.
There is no mechanism built into IBReplicator to do this, so custom
code must be created which writes directly to the REPL_LOG table.
It's not quite as simple as that, since if a table has been set up for
replication, two things happen: Firstly an entry is placed in the table
REPL_SEPARATOR, and secondly, 3 triggers are created on the table, one
each for Insert, Update and Delete. The 3 triggers, in combination with
the row in the REPL_SEPARATOR table cause the correct rows to be inserted
into the REPL_LOG table. However, we do not want this entry to appear in
the REPL_LOG, since it will go to the back of the queue, and we want it
at the front of the queue. We could simply remove the entry in REPL_SEPARATOR,
and/or drop the 3 triggers. These will be put back again next time 'Create
System Objects' is invoked, and so is not a reliable method. The best way
would be to put in a 'ROW LEVEL REPLICATION CONDITION' clause for this
table. Usually this is used to limit which rows are replicated, for example
if a condition of 'DEPTNO=10' is entered, then only rows in department
10 are replicated. We want to prevent any row from being replicated, so
we could enter something like 'DEPTNO = -999'. This would mean that no
rows get replicated, which suits us, since we want to put in the replication
log entries manually. Now we need to write some code that populates the
REPL_LOG with the correct values: The structure of the table is as follows:
REPLNO INTEGER NOT NULL,
PUBDBNO INTEGER NOT NULL,
SUBDBNO INTEGER NOT NULL,
SEQNO INTEGER NOT NULL,
RELATIONNO INTEGER NOT NULL,
REPTYPE CHAR(1),
OLDKEY VARCHAR(256),
NEWKEY VARCHAR(256),
PRIMARY KEY (REPLNO, PUBDBNO, SUBDBNO, SEQNO)
);
The Replno,PubDBNo,SubDBNo,RelationNo are filled in from the fields of
the same name in the REPL_SEPARATOR table, while SEQNO should get a negative
value from a generator, so as to keep it at the front of the queue. Probably
the best way is to create a generator and initialize it to -2,147,483,647.
REPTYPE should be one of 'I','U','D' for Insert,update or delete. OLDKEY/NEWKEY
must contain the values of the primary key concatenated together, separated
by the SEP field found in REPL_SEPARATOR. OLDKEY and NEWKEY are filled
in for Updates, NEWKEY for inserts, and OLDKEY for deletes. This can all
be done in a trigger on the table involved as follows:
CREATE TRIGGER PUSH_TO_FRONT_OF_QUEUE
FOR EMPLOYEE
AFTER INSERT
AS
BEGIN
IF (USER <> 'REPL') THEN BEGIN
INSERT INTO REPL_LOG(REPLNO,PUBDBNO,SUBDBNO,SEQNO,RELATIONNO,REPTYPE,NEWKEY)
SELECT REPLNO,PUBDBNO,SUBDBNO,gen_id(GEN_NEGATIVE,1),RELATIONNO,'I',
EMP_NO||SEP||DEPTNO
FROM REPL_SEPARATOR
WHERE REPLNO=2
AND PUBDBNO=1
AND RELATIONNO=2;
END
END
This trigger is almost identical to the triggers generated by the Replication
Manager for normal replication. The only difference is the SEQNO field
gets supplied a negative value, instead of a positive value.
Since this is an 'insert' trigger, the value 'I' is supplied for REPTYPE,
and the NEWKEY field is filled in with the fields of the primary key, separated
by the SEP character in the REPL_SEPARATOR table.
If this row needs to be replicated to more than one target database,
then there will be more than one entry in the REPL_SEPARATOR table, and
therefore the correct number of entries will be placed into the REPL_LOG,
one for each target.
Complex Replication Environments
Replicating Subsets of data
In many cases there is a requirement to restrict which rows get replicated
from one database to another. For example in a Head Office/Branch Office
scenario, only the data belonging to the Atlanta branch needs to be replicated
there. The Atlanta office is not interested in Chicago data. There are
two ways to do this. The first, and simplest is to place a 'ROW LEVEL REPLICATION
CONDITION' on each table. This prevents data which does not match the condition
from being sent to the target. The second method is to replicate all data
to a Stored Procedure on the target, and let the target database decide
whether it wants to keep the data or not. This places a greater load on
the network, but may be required on very complex situations. See Replicating
to a Stored Procedure for details on how to replicate to a stored procedure.
In order to describe the 'row level replication condition', I'll use the
following example: Head Office in New York, and two branch offices in Atlanta
and Chicago. Replicate Sales data from the branch offices to had office,
and other data from Head Office to the branches. Atlanta is DeptNo=10,
and Chicago is Deptno=20.
All Data All Data
--------------> <--------------
Chicago Head Office Atlanta
DeptNo = 20 DeptNo = 10
<-------------- --------------->
On the Atlanta Target Database, we need to define a row level replication
condition for each table, based on the DeptNo. This is done in the 'Table
Settings' as shown in this screen shot:
NOTE: the colon(:) in front of the field name. This condition is then
reflected in the triggers that are generated for each table replicated.
Here is a sample trigger, with the condition reflected. The colon(:) has
been changed to 'NEW.'
CREATE TRIGGER REPL$T22_I_1_1 for T2
POSITION 32767
AFTER INSERT
AS
BEGIN
IF (NEW.DEPTNO = 10) THEN
BEGIN
IF (USER <> 'REPL') THEN
BEGIN
INSERT INTO REPL_LOG(REPLNO,PUBDBNO,SUBDBNO,
RELATIONNO,REPTYPE,NEWKEY)
SELECT REPLNO,PUBDBNO,SUBDBNO,RELATIONNO,'I',NEW.INVOICENO
FROM REPL_SEPARATOR
WHERE REPLNO=1
AND PUBDBNO=1
AND RELATIONNO=2;
END
END
END
The condition can be as complex as you like, as long as it can be transformed
into an IF statement inside the trigger. Remember, you need to put a colon(:)
in front of any field names.
Preventing Rows from bouncing back in an N-Way
model
In a simple 2-way (bi-directional A <--> B) replication environment,
it is necessary to have a method to prevent data that has been replicated
from A to B, from replicating straight back to A. This is simply done by
having the Replication Server log in to each database as the user 'REPL'.
You will see in the sample trigger code above, that there is an IF statement
which prevents data from replicating if the current USER is called REPL.
This is built in by default into all the triggers. However it is not always
possible to log in as REPL, since you may want data to be replicated from
A to B and then from B to C. If you use the user REPL then data from A
will not be passed on to C. This gets more complex in a hub-and-spoke situation,
where data is replicated to a hub machine, and it in turn replicated the
data to other machines. In this example, let's assume that there are 4
locations, with the Head Office in New York, and branches in Chicago, San
Diego and Atlanta. Each site has a complete copy of the database, and all
changes are replicated to all other sites. To make it easier, each if the
branches replicates to New York, and it in turn sends the changes on to
the other branches. This is a hub-and-spoke configuration.
In this case we cannot use the username REPL all the time, since then
data would (for example) replicate from Chicago to New York, and not get
sent on to the other branches. On the other hand, we do not want data to
be replicated from Chicago to New York, and then from New York back to
Chicago. So we need to use a combination of the use of the user REPL, and
a ROW LEVEL REPLICATION CONDITION, with some specific user names. This
can be shown in the following table:
| Source |
Target |
UserName |
Condition |
| Chicago |
New York |
REPL_CHICAGO |
- |
| San Diego |
New York |
REPL_SANDIEGO |
- |
| Atlanta |
New York |
REPL_ATLANTA |
- |
| New York |
Chicago |
REPL |
USER <> 'REPL_CHICAGO' |
| New York |
San Diego |
REPL |
USER <> 'REPL_SANDIEGO' |
| New York |
Atlanta |
REPL |
USER <> 'REPL_ATLANTA' |
This ensures that data is replicated from one branch into the hub in
New York. The hub then replicates this data to the other branches, but
not back to the originating branch.
Connect with Us