
Hilary Cotter

  • Merge Replication Stored Procedure Conflict Resolver

    A conflict in merge replication occurs when applying a replicated command to the subscriber and:

    •    there is a constraint violation
         •    foreign key violation
         •    primary key violation
    •    there is an error modifying data
         •    data domain violation
         •    missing dependencies (a trigger fires and references missing objects)
         •    security violation (the account the replication agent runs under does not have rights to apply the replicated command on the publisher/subscriber)
    •    you are updating a row which does not exist on the other side of your replication topology
    •    you are deleting a row which does not exist on the other side of the replication topology
    •    you are updating the same column on the same row on both sides of the replication topology and are using column-level tracking
    •    you are updating the same row on both sides of the replication topology and are using row-level tracking.

    SQL Server provides a rich set of resolvers for resolving conflicts. You can see the list by issuing a call to:

    sp_enumcustomresolvers

    On a machine with no custom COM or Business Logic Resolvers registered, this will return the list below.

    Microsoft SQL Server Additive Conflict Resolver

    Microsoft SQL Server Averaging Conflict Resolver

    Microsoft SQL Server DATETIME (Earlier Wins) Conflict Resolver

    Microsoft SQL Server DATETIME (Later Wins) Conflict Resolver

    Microsoft SQL Server Download Only Conflict Resolver

    Microsoft SQL Server Maximum Conflict Resolver

    Microsoft SQL Server Merge Text Columns Conflict Resolver

    Microsoft SQL Server Minimum Conflict Resolver

    Microsoft SQL Server Priority Column Resolver

    Microsoft SQL Server Subscriber Always Wins Conflict Resolver

    Microsoft SQL Server Upload Only Conflict Resolver

    Microsoft SQLServer Stored Procedure Resolver

    However, in some cases you may wish to extend the way a conflict is handled by using a stored procedure resolver. A stored procedure resolver allows you to fire a custom stored procedure when a conflict occurs. The custom stored procedure resolver can handle conflicts arising from a single article or from multiple articles.

    There is some overlap between what a Business Logic Resolver and a Stored Procedure Resolver can do for you. Here is a breakdown of the differences between them.

    Feature                             | Business Logic Resolver                                  | Stored Procedure Resolver
    ------------------------------------|----------------------------------------------------------|--------------------------
    Development language                | Any .NET language                                        | TSQL
    Can be modified on the fly          | Yes, but will require dll recompilation                  | Yes
    Speed                               | Slow                                                     | Faster than a BLR
    Replication events handled          | Conflicts, errors, DML                                   | Conflicts
    Can detect which column has changed | No                                                       | Yes
    Debugging                           | Somewhat complicated, but allows step-by-step processing | Can be done by logging

    The parameters you must pass to your stored procedure resolver are:

    Parameter                | Data type        | Description
    -------------------------|------------------|------------------------------------------------------------
    @tableowner              | sysname          | Name of the owner of the table for which a conflict is being resolved. This is the owner of the table in the publication database.
    @tablename               | sysname          | Name of the table for which a conflict is being resolved.
    @rowguid                 | uniqueidentifier | Unique identifier for the row having the conflict.
    @subscriber              | sysname          | Name of the server from which the conflicting change is being propagated.
    @subscriber_db           | sysname          | Name of the database from which the conflicting change is being propagated.
    @log_conflict OUTPUT     | int              | Whether the merge process should log a conflict for later resolution.
    @conflict_message OUTPUT | nvarchar(512)    | Message logged in the conflict tables.
    @destowner               | sysname          | The owner of the published table at the Subscriber.

    Table above snagged from ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10rp_1devconc/html/76bd8524-ebc1-4d80-b5a2-4169944d6ac0.htm

    Here is an example of how this works.

    use Master
    GO
    set nocount on
    GO
    --need to access cmdshell to create a directory called c:\temp
    --on most modern os's the program files directory is locked down
    --and you will be unable to create a snapshot in the default folder
    exec sp_configure 'show advanced options',1
    reconfigure with override
    exec sp_configure 'xp_cmdshell',1
    reconfigure with override
    exec xp_cmdshell 'md c:\temp'
    GO
    if exists(select * from sys.sysdatabases where name='StoredProcedureResolverPub')
    begin
    exec sp_replicationdboption 'StoredProcedureResolverPub','publish', false
    exec sp_replicationdboption 'StoredProcedureResolverPub','merge publish', false
    alter database StoredProcedureResolverPub set single_user with rollback immediate
    drop database StoredProcedureResolverPub
    end
    Create Database StoredProcedureResolverPub
    GO
    exec sp_replicationdboption 'StoredProcedureResolverPub','merge publish', true
    GO
    if exists(select * from sys.sysdatabases where name='StoredProcedureResolverSub1')
    begin
    exec sp_replicationdboption 'StoredProcedureResolverSub1','publish', false
    exec sp_replicationdboption 'StoredProcedureResolverSub1','merge publish', false
    alter database StoredProcedureResolverSub1 set single_user with rollback immediate
    drop database StoredProcedureResolverSub1
    end
    GO
    Create Database StoredProcedureResolverSub1
    GO
    if exists(select * from sys.sysdatabases where name='StoredProcedureResolverSub2')
    begin
    exec sp_replicationdboption 'StoredProcedureResolverSub2','publish', false
    exec sp_replicationdboption 'StoredProcedureResolverSub2','merge publish', false
    alter database StoredProcedureResolverSub2 set single_user with rollback immediate
    drop database StoredProcedureResolverSub2
    end
    Create database StoredProcedureResolverSub2
    GO
    use StoredProcedureResolverPub
    GO
    Create Table table1(PK int IDENTITY(1,1) NOT NULL PRIMARY KEY,
     charcol varchar(40) NULL,
     originating_db varchar(40) NULL DEFAULT ('StoredProcedureResolverPub') ,
     rowguid uniqueidentifier ROWGUIDCOL 
     NOT NULL UNIQUE CONSTRAINT table1_rowguid_Constraint DEFAULT (newsequentialid()) )
    GO
    declare @counter int=1
    while @counter<1000
    begin
    insert into table1( charcol) values('test')
    select @counter=@counter+1
    end
    GO
    --a log table which records when the stored procedure resolver was used.
    Create table LogTable(PK int identity primary key, dtstamp datetime)
    GO
    --here is our stored procedure conflict resolver
    CREATE PROC MyConflictResolver(@tableowner sysname, @tablename Sysname, @rowguid Uniqueidentifier, @subscriber sysname,  
    @subscriber_db sysname, @log_conflict int  OUTPUT, @conflict_message nvarchar(512)  OUTPUT, @destowner Sysname)
    AS
    set nocount on

    set @log_conflict = 1 --if set to NULL or 0, no conflicts are logged
    --just for kicks I am going to update a column from a row with the pk value of 11 on the publisher.
    --you will note that this change does not get replicated.
    --this causes a lack of consistency between your publisher and subscriber
    --this is generally not considered to be a good thing
    --but something to tuck away into your little gray cells
    --in case you ever have a req like this
    update table1 set originating_db='from resolver' where pk=11

    SELECT @conflict_message ='The same row was conflicted at both '+@@ServerName +'.'+db_name()+' and '+@subscriber+'.'+
    @subscriber_db+'. The resolver chose the update from '+@@ServerName +'.'+db_name()+' as the winner.'

    INSERT INTO LogTable(dtstamp) VALUES(getdate())

    --the single result set returned here supplies the winning row values
    SELECT PK, 'From Stored Procedure Conflict Resolver',
    originating_db, rowguid FROM table1 where rowguid=@rowguid
    GO
    exec sp_addmergepublication StoredProcedureResolver, @snapshot_in_defaultfolder=false,
    @alt_snapshot_folder='c:\temp', @publication_compatibility_level='100RTM',  @conflict_logging =both
    GO
    exec sp_addpublication_snapshot StoredProcedureResolver
    GO
    exec sp_addmergearticle StoredProcedureResolver, table1, @source_object=table1, @article_resolver='Microsoft SQLServer Stored Procedure Resolver',
    @resolver_info=MyConflictResolver
    Go
    sp_addmergesubscription StoredProcedureResolver, @@Servername, 'StoredProcedureResolverSub1'
    GO
    sp_addmergesubscription StoredProcedureResolver, @@Servername, 'StoredProcedureResolverSub2'
    GO
    exec sp_startpublication_snapshot StoredProcedureResolver
    --waiting for 1 minute for the snapshot to be generated
    WAITFOR DELAY '00:01'
    exec sp_startmergepushsubscription_agent StoredProcedureResolver, @@ServerName, StoredProcedureResolverSub1
    GO
    exec sp_startmergepushsubscription_agent StoredProcedureResolver, @@ServerName, StoredProcedureResolverSub2
    GO
    --waiting for 1 minute for the snapshot to be applied
    WAITFOR DELAY '00:01'
    GO
    use StoredProcedureResolverSub1
    GO
    update Table1 set charcol='StoredProcedureResolverSub1' where pk<10
    GO
    use StoredProcedureResolverSub2
    GO
    update Table1 set charcol='StoredProcedureResolverSub2' where pk<10
    GO
    use StoredProcedureResolverPub
    GO

    exec sp_startmergepushsubscription_agent StoredProcedureResolver, @@ServerName, StoredProcedureResolverSub1
    GO
    exec sp_startmergepushsubscription_agent StoredProcedureResolver, @@ServerName, StoredProcedureResolverSub2
    GO
    WAITFOR DELAY '00:01'
    GO
    select Reason_text from MSmerge_conflicts_info 
    GO
    select * from MSmerge_conflict_StoredProcedureResolver_table1
    GO
    select * from table1
    GO
    select * from Logtable
    GO
    Here is a conflict within the Conflict Viewer:

    [screenshot: the conflict as displayed in the Conflict Viewer]
    Now look for the row less traveled. This is the row we modified inside our conflict resolver. Notice how it does not travel:

    --the row less traveled!
    --this was the one we updated in the stored procedure conflict resolver
    --but it does not move to the subscribers!
    GO
    select * from StoredProcedureResolverPub.dbo.table1 where pk=11
    GO
    select * from StoredProcedureResolverSub2.dbo.table1 where pk=11
    GO
    select * from StoredProcedureResolverSub1.dbo.table1 where pk=11
    GO


    You may notice that the Publisher db has won the conflict, even though the rows were updated on both subscriber dbs. This is because the default subscription type is Client, which means the first change written to the publisher is the one which will persist.
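
    If you instead want a particular subscriber to win conflicts based on an assigned weight, you can re-create its subscription as a "server" (global) subscription. A minimal sketch, reusing the publication from the example above; the priority value is an illustrative assumption:

    --register the subscription as global so it resolves conflicts with its
    --own priority (here 75 out of a possible 100)
    exec sp_addmergesubscription 'StoredProcedureResolver', @@ServerName, 'StoredProcedureResolverSub1',
        @subscriber_type = 'global', @subscription_priority = 75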

    It is worth pointing out that the Stored Procedure Conflict Resolver not only returns the result set of data to be applied on the publisher and subscriber, but can also send out alerts, write data to replicated tables (changes which will not be replicated out), write data to non-replicated tables, log data - in short, do anything your skill in TSQL allows, within the security context your replication agent runs under.

    It can’t, however, access Web Sync Subscribers, heterogeneous databases, or SQLCE subscribers.

    Here is an example of using a stored procedure resolver to access data on the subscriber. Note we are using dynamic SQL for this.

    alter PROC MyConflictResolver
       (
         @tableowner SYSNAME ,
         @tablename SYSNAME ,
         @rowguid UNIQUEIDENTIFIER ,
         @subscriber SYSNAME ,
         @subscriber_db SYSNAME ,
         @log_conflict INT OUTPUT ,
         @conflict_message NVARCHAR(512) OUTPUT ,
         @destowner SYSNAME
       )
    AS
       DECLARE @str NVARCHAR(MAX)
       DECLARE @newstr NVARCHAR(MAX)
       DECLARE @params NVARCHAR(MAX)

       DECLARE @originating_db varchar(40)
       DECLARE @PK int 
       DECLARE @charcol varchar(40)
      
       DECLARE @table1 Table (PK int, charcol varchar(40), originating_db varchar(40),rowguid uniqueidentifier)  
       SET @str = 'SELECT @originating_db = originating_db, @PK = PK, @charcol = charcol
       FROM [@subscriber].@subscriber_db.dbo.Table1 WHERE rowguid=''@rowguid'''
       SELECT  @newstr = REPLACE(@str, '@subscriber_db', @subscriber_db)
       SELECT  @newstr = REPLACE(@newstr, '@subscriber', @subscriber)
       SELECT  @newstr = REPLACE(@newstr, '@rowguid', @rowguid)

       SELECT  @params = N'@subscriber sysname, @subscriber_db sysname, @rowguid uniqueidentifier,
         @originating_db varchar(40) OUTPUT, @PK int OUTPUT, @charcol varchar(40) OUTPUT'

   --debug output: uncomment to inspect the generated statement
   --(remember a resolver must return exactly one result set)
   --select @newstr
       EXEC sp_executesql @newstr, @params, @subscriber, @subscriber_db, @rowguid,
           @originating_db = @originating_db OUTPUT, @PK = @PK OUTPUT,
           @charcol = @charcol OUTPUT
      
       insert into @table1(PK, originating_db, charcol, rowguid)
       select @PK, @originating_db, @charcol, @rowguid
      
       select * From @table1

    GO

    Using custom stored procedures to resolve conflicts is a great method to extend the basic functionality of merge replication.


  • Implementing Bi-Directional Transactional Replication

    Bi-Directional replication is copying data from one server to another where:

    •    either server can be the source or destination, and
    •     the data being copied can be in the same tables on both sides.


    For example, I want to replicate a database between Server A and Server B, so that:

    •    the data which originates on Server A will be replicated to Server B, and
    •    the data which originates on Server B will be replicated to Server A.

    Essentially, I want writeable copies of the data in two different databases - which might be on the same or different servers - and I want the data to be consistent (i.e. the same on both sides).


    Most implementations on which I consult have a “real time” requirement, i.e. at any one time the data in both databases is identical. Though not really achievable using bi-directional replication, this may be achieved by doing some sort of split write (open a transaction, write on Server A, write on Server B, commit the transaction, and then return to the application so it can move on and do different work).


    Of course, using a split write has significant application latency considerations, and the link between the two servers must be very stable. Application latency refers to how long it takes for the application to commit a transaction to the database(s). The two servers also must be very close to each other; the greater the distance between the two servers, the greater the application latency. Basically, if you go down the split write route, you do get identical point in time synchronization of the two databases, but the cost is transferred to application latency, where transactions which originally took 10 ms may now take 100 ms or more.
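
    To make the split write concrete, here is a minimal sketch using a distributed transaction across a linked server. The linked server, database, and table names are illustrative assumptions:

    --both writes commit or neither does; ServerB is an assumed linked server
    set xact_abort on --required for distributed transactions
    begin distributed transaction
    insert into dbo.Orders(OrderRef) values('A-1001')
    insert into ServerB.SalesDb.dbo.Orders(OrderRef) values('A-1001')
    commit transaction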


    Normally, I have some education process where I have the client reduce the real-time expectations to something of the order of several seconds or even minutes.

    The most common usage is for load balancing or fault tolerance/disaster recovery. Load balancing normally means the load is balanced between one or more servers - as one server cannot handle the load, the load spills over to servers which have a lighter load. It can also mean that clients connecting to one server will continue to connect to that server for the lifetime of their sessions - sometimes called “sticky sessions” or persistence. Fault tolerance means the ability of a server to withstand a localized failure and continue to serve clients with minimal disruption of services and minimal data loss. Disaster recovery means the ability to withstand regional failures and continue business operations in another location. I should point out that bi-directional transactional replication is a poor fit for all of these cases since:


    1)    Replication does not offer predictable latency, so at one instant my latency can be several seconds, while half a day later it can be several minutes, and sometimes several hours. Using replication for disaster recovery means that your exposure to data loss is unpredictable. Using replication for load balancing is also a poor choice, as a client may do some work on one node, then return seconds later and hit the other node expecting to see the data that was just entered on the first node. Due to replication latency, this data may not yet have made its way to the second node. (Tracer tokens, sketched after this list, let you measure the actual latency.)
    2)    There is no automatic client redirection as found in mirroring or clustering.
    3)    There is no load balancing or workload distribution; in our case, as the workload on Server A increases, clients will not be automatically redirected to Server B - as in the case of NLB.
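
    As an aside, transactional replication lets you measure your actual latency at any moment by posting a tracer token. A minimal sketch, run in the publication database and reusing the publication name from the example later in this post:

    --post a tracer token into the log; the agents carry it through the topology
    exec sp_posttracertoken @publication = 'BIDINodeA'
    --list the tokens posted for this publication and their ids
    exec sp_helptracertokens @publication = 'BIDINodeA'
    --show publisher-to-distributor and distributor-to-subscriber latency
    --for one token (substitute an id returned by the previous call)
    exec sp_helptracertokenhistory @publication = 'BIDINodeA', @tracer_id = 1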


    With the above points noted, the majority of my bi-directional transactional replication implementations have been meant to address both fault tolerance/disaster recovery and load balancing goals, and they do work well.


    The goal of some implementations of bi-directional transactional replication is to reduce contention on one server by scaling out the load over a multitude of servers. 

    This can be a good fit in some cases. For example, with read intensive workloads, you will find that the read activity can be halved by redirecting half of your clients to the second node. In other words, instead of having 100 clients banging away at Server A, you can distribute the load by having 50 of them bang away at Server B (the remaining 50 clients will bang away at Server A).


    However, if you are trying to reduce write contention, the dynamics are different. If I have 1000 inserts per second on Server A coming from 100 clients, and I decide to load balance by having 50 clients work on Server A and 50 on Server B, I will still end up with 1000 inserts per second on Server A – 500 from the 50 clients on Server A, and 500 from the replication process carrying over the 500 inserts per second from the clients banging away at Server B.  The end result is the same level of writes and twice the licensing/hardware and maintenance costs.


    But the transactional footprint can be quite different. If the transactional context of the client operations is such that it causes contention due to the transaction length or footprint, you may find that replicating the transactions to the other node will result in far less resource consumption than the transactional footprint on the originating server.


    For instance, consider a case where an insert statement which occurs on the publisher is the result of a 5 table join, as follows:

    Insert into tableName
    Select col1, col2, col3, col4 from tableName1
    Join tablename2 on TableName2.pk=TableName1.pk
    Left join tablename3 on TableName3.pk=TableName2.pk
    Left join tablename4 on TableName4.pk=TableName3.pk
    Left join tablename5 on TableName5.pk=TableName4.pk


    This will be replicated as:

    Insert into tableName(col1, col2, col3, col4) values(@col1, @col2, @col3, @col4)


    Of course, the above would be wrapped in a replication stored procedure; I have stripped everything away except the essential insert for clarity.


    This replicated insert statement will just be the final state of the row and will not have to reference the 5 table join on the subscriber. The consequence of this is a much smaller resource footprint on the subscriber, and the locks will be held for a shorter time period.


    Also, consider this transaction occurring on the publisher:


    Begin Tran
    Exec Proc1 -- inserts 5 rows into table1 as the result of some lengthy calculation
    Exec Proc2 -- inserts 5 rows into table2 as the result of some lengthy calculation
    Exec Proc3 -- inserts 5 rows into table3 as the result of some lengthy calculation
    Exec Proc4 -- inserts 5 rows into table4 as the result of some lengthy calculation
    .
    .
    .
    Exec Proc20 -- inserts 5 rows into table20 as the result of some lengthy calculation
    Commit tran


    As you can imagine, the transaction footprint on the publisher would be far more severe than the simple 100 row insert which would occur on the subscriber - even though the transaction would be held for the entire 100 row insert on both the publisher and the subscriber. On the subscriber, the transaction is merely 100 insert statements – no lookups, no 5 table joins, etc. So a transaction that takes several seconds on the publisher will:


    •    take far less time on the subscriber,
    •    have a minimal IO hit,
    •    and cause significantly less blocking/deadlocking there.  


    One last item I wish to cover before diving into an example is why bi-directional transactional replication is normally a better choice than the other contenders listed below.

    Merge Replication

    Merge replication can be used, but is generally a poor fit since the tracking triggers add considerable application latency, and the replication latency of the synchronization tends to be much higher than the alternatives.  There is also the added complexity of a merge replication solution and the guid column requirement.

    Peer to Peer Replication

    Peer to Peer replication can be used, but it is much slower than bi-directional transactional replication and has a much smaller feature set. For example, in Peer to Peer replication I cannot use custom stored procedures, and there is no row or column filtering and no automatic identity range management. The Enterprise Edition requirement normally makes Peer to Peer replication a costly choice.


    Here is an illustration of just how much slower Peer to Peer replication can be: replicating 10,000 inserts with a CommitBatchSize of 100 and CommitBatchThreshold of 1000 (the defaults) takes an average of 6.325 s over 10 runs using bi-directional transactional replication, and an average of 12.939 s over 10 runs using Peer to Peer replication. Keep in mind that we are looking at an edge case and YMMV, but, in general, bi-directional transactional replication is significantly faster than Peer to Peer. On the downside, you cannot practically scale bi-directional replication to more than 2 nodes – it can be done, but performance degrades significantly the more nodes you add to your topology. As an aside, Peer to Peer can’t easily be scaled beyond 10 nodes either.


    Updatable Subscriptions

    Updatable Subscriptions are another option; however, they are intended for topologies where the majority of the transactions originate on the publisher. Immediate updating is a split write, and performance degrades as a function of the link speed/bandwidth for subscriber-side originating transactions. Should the link go down between the publisher and subscriber, the published tables will be read-only on the subscriber. Queued updating is an option, but again, it is tuned for the majority of the DML originating on the Publisher, not the Subscriber. Also keep in mind that Updatable Subscriptions are deprecated.


    Prepwork

    Setting up bi-directional transactional replication requires two things:
    1)    All identity columns must be
    a.    set to NOT FOR REPLICATION,
    b.    odd on one side and even on the other,
    c.    have their increment adjusted to 2, and
    d.    be reseeded to hand out an odd value on the odd side and an even value on the other.
    2)  All triggers and constraints should be set to NOT FOR REPLICATION. You don’t want a trigger to fire twice, as it has already done its work on the publisher (auditing may be an exception here).


    This can involve considerable work. Marking an identity column NOT FOR REPLICATION is easy using the proc sp_identitycolumnforreplication. Changing the current identity value to odd or even is again simple using DBCC CHECKIDENT. However, changing the increment involves recreating the table.
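
    Here is a minimal sketch of those two easy steps; the table name and seed are illustrative assumptions (recreating the table to change the increment is not shown):

    --mark the identity column NOT FOR REPLICATION
    declare @objid int = object_id('dbo.OriginalTable')
    exec sys.sp_identitycolumnforreplication @objid, 1
    --reseed so this side hands out odd values; seed the other side even
    dbcc checkident('dbo.OriginalTable', reseed, 1)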

    Normally, I would do as follows:


    •    Script all triggers, constraints, and indexes.
    •    Script out the tables and recreate the new ones with a _temp extension.
    •    Move the table data from the original table to the new table using bcp.
    •    Drop the original table and rename the new table to the old name, i.e. OriginalTable's data is copied to OriginalTable_temp, OriginalTable is dropped, and OriginalTable_temp is renamed to OriginalTable.
    •    Apply the triggers, constraints, and indexes.


    You will likely need to figure out which are your largest tables, and then group them into batches you can do simultaneously. For databases with large numbers of tables, doing this sequentially can be very time consuming and will involve considerable downtime. You may want to group your tables by size and then run multiple bcp scripts at one time, each dedicated to different tables, to minimize your downtime.


    Doing this on a live production system, however, requires a different approach:
    •    Create the first publication/subscription pair (Server A and Server B in our case). Ensure you use the no-sync subscription option.
    •    Ensure that the distribution agent is disabled.
    •    Back up the database and restore it on the Subscriber (Server B).


    Now you have all sorts of time to get your subscriber setup as the transactions occurring on the Publisher (Server A) will pool in its distribution database.


    Set all identity columns on Server B to have an island of ranges - several orders of magnitude above what is being currently assigned on the original source.  For example, on Server A you may be currently assigning identity ranges in the millions. Then, set your range on the subscriber to 100 million. This gives you a lot of wiggle room before primary key collisions occur when the same identity range value is assigned on both sides. Naturally, you want some alerting job that will send out email when you are exhausting a range, or when one range is likely to collide with another.
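
    The alerting job can be as simple as comparing the current identity value against a threshold. A sketch, with an assumed table, threshold, and Database Mail profile:

    --warn when the identity value approaches the island assigned to the other side
    if ident_current('dbo.OriginalTable') > 90000000
    exec msdb.dbo.sp_send_dbmail
        @profile_name = 'DBA', --assumed Database Mail profile
        @recipients = 'dba@example.com', --assumed recipient
        @subject = 'Identity range on OriginalTable is nearing a collision'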

    Setup

    Setting up bi-directional transactional replication is simple. Once you have prepped your tables, all you need to do is ensure that the @loopback_detection parameter of sp_addsubscription is set to “true”.

    Here is an end to end example:

    USE master
    GO
    IF EXISTS(SELECT * FROM sys.databases WHERE name='BIDINodeA')
    BEGIN
    EXEC sp_removedbreplication 'BIDINodeA'
    ALTER DATABASE BIDINodeA SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE BIDINodeA
    END
    GO
    IF EXISTS(SELECT * FROM sys.databases WHERE name='BIDINodeB')
    BEGIN
    EXEC sp_removedbreplication 'BIDINodeB'
    ALTER DATABASE BIDINodeB SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE BIDINodeB
    END
    CREATE DATABASE BIDINodeA
    GO
    EXEC sp_replicationdboption 'BIDINodeA', 'publish', true
    GO
    CREATE DATABASE BIDINodeB
    GO
    EXEC sp_replicationdboption 'BIDINodeB', 'publish', true
    GO
    USE BIDiNodeA
    GO
    IF exists(SELECT * FROM sys.objects WHERE type='u' AND name='Table1')
    DROP TABLE Table1
    GO
    CREATE TABLE Table1(PK INT IDENTITY (1,2) NOT FOR REPLICATION CONSTRAINT Table1PK PRIMARY KEY, Charcol VARCHAR(20), originatingDB sysname DEFAULT db_name(), dtstamp DATETIME DEFAULT GETDATE())
    GO
    EXEC sp_addpublication 'BIDINodeA', @Status=ACTIVE
    GO
    EXEC sp_addarticle 'BIDINodeA', @article='Table1',@source_object='Table1' ,@identityrangemanagementoption='manual'
    GO
    EXEC sp_addsubscription 'BIDINodeA','ALL', @@ServerName, 'BIDINodeB', 'Replication support only', @loopback_detection='true'
    GO
    USE BIDiNodeB
    GO
    IF exists(SELECT * FROM sys.objects WHERE type='u' AND name='Table1')
    DROP TABLE Table1
    GO
    CREATE TABLE Table1(PK INT IDENTITY (2,2) NOT FOR REPLICATION CONSTRAINT Table1PK PRIMARY KEY, Charcol VARCHAR(20), originatingDB sysname DEFAULT db_name(), dtstamp DATETIME DEFAULT GETDATE())
    GO
    EXEC sp_addpublication 'BIDINodeB', @Status=ACTIVE
    GO
    EXEC sp_addarticle 'BIDINodeB', @article='Table1',@source_object='Table1', @identityrangemanagementoption='manual'
    GO
    EXEC sp_addsubscription 'BIDINodeB','ALL', @@ServerName, 'BIDINodeA', 'Replication support only', @loopback_detection='true'
    GO
    USE BIDINodeA
    GO
    DECLARE @counter INT
    SET @counter=1
    WHILE @counter<=10
    BEGIN
    INSERT INTO BIDINodeA.dbo.Table1(Charcol) VALUES('test')
    SELECT @counter=@counter+1
    END
    GO
    USE BIDINodeB
    GO
    DECLARE @counter INT
    SET @counter=1
    WHILE @counter<=10
    BEGIN
    INSERT INTO BIDINodeB.dbo.Table1(Charcol) VALUES('test')
    SELECT @counter=@counter+1
    END
    GO

    SELECT * FROM BIDINodeA.dbo.Table1
    SELECT * FROM BIDINodeB.dbo.Table1

    Note that you may need to wait a couple of seconds for the above two queries to return 20 rows each. Rows originating from BIDINodeA will have a value for the OriginatingDB column of BIDINodeA, and rows originating from BIDINodeB will have a value for the OriginatingDB column of BIDINodeB.

    Caveats

    There are some caveats with Bi-Directional Transactional Replication.

    Non-Convergence

    The first is that should a failure event occur which might cause a lack of data consistency (i.e. different data on both sides), the distribution agent will fail and stop until the administrator can step in and clear the event which is causing the failure. This is normally a good thing, as it prevents your databases from losing consistency (i.e. having different data). This event will likely be a conflict – an attempt to:


    •    insert a primary key value on a node which causes a primary key violation;
    •    update a row which is deleted;
    •    delete a row which is already deleted.

    Most frequently the goal in a bi-directional replication topology is to present two data sets to your clients which are within seconds of synchronization with each other.

    When a data consistency error occurs and the agents shut down, the two data sets will get progressively more and more out of sync; in other words, they will become disconnected islands of data, as each set of clients is entering/modifying data on its own island. An insert occurring on Node A will stay on Node A. If the client is redirected to Node B and does not see their data, chances are that they will try to re-enter their data on Node B – which, when the synchronization process resumes, will cause a conflict.


    It becomes critical to clear any data consistency errors as quickly as possible to prevent further conflicts. For some implementations this can be difficult; for others it is very simple. I normally have email alerts going out to the team when these events occur. Some implementations never raise alerts, which makes me check them to make sure they are working. Others are very noisy. For example, one company feels the need to replicate the ASP.NET state tables. This results in a lot of noise – so far, all of it false positives.


    Another factor to consider is that there is apparently a bug in bi-directional transactional replication which will raise “row not found” error messages. I have only heard about it anecdotally.


    Quite often it is more important to keep the agents running and then run a validation to determine where the error is, as opposed to letting the agents stop while you investigate and clear the error. The reason is that the longer the agents are not running, the greater the chances of future conflicts occurring.


    The best way is to skip the error(s) using the Continue On Data Consistency Errors agent profile. However, this means that if errors are skipped, there is a chance your databases will drift progressively further out of sync.
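
    A sketch of assigning that profile to a distribution agent; the id values are illustrative, so look them up with the first two queries:

    --list the canned distribution agent profiles (agent_type 3 = distribution)
    exec sp_help_agent_profile @agent_type = 3
    --find the agent's id in the distribution database
    select id, name from distribution.dbo.MSdistribution_agents
    --assign the "Continue on data consistency errors" profile to the agent
    exec sp_update_agent_profile @agent_type = 3, @agent_id = 1, @profile_id = 4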


    It then becomes critical to run publication validations to find out what tables are different.


    I normally schedule validations at least daily and run each validation twice to see if the validation error occurs on the same tables in both runs. My reasoning is that the first time a validation error occurs, it could be due to in-flight data. If the table shows up with a validation error twice in a row, it is likely a true out-of-sync condition as opposed to in-flight data.
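
    A minimal validation sketch, reusing the publication from the example above (run it in the published database and check the distribution agent history for the outcome):

    --request a rowcount and binary checksum validation of all articles
    exec sp_publication_validation @publication = 'BIDINodeA',
        @rowcount_only = 2 --2 = rowcount and binary checksum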

    Non-Convergence and swapping

    Another factor to consider is that if I update the same column on the same row on both sides of a bi-directional replication topology, my values will swap.


    Consider these updates:


    Update TableName1 set Col1='NodeA' where PK=1 -- on NodeA
    Update TableName1 set Col1='NodeB' where PK=1 -- on NodeB


    Before my synchronization the state of the row will look like this:


    Select * from TableName1 where PK=1  -- on NodeA
    1, NodeA
    Select * from TableName1 where PK=1  -- on NodeB
    1, NodeB

    After synchronization it will look like this:


    Select * from TableName1 where PK=1  -- on NodeA
    1, NodeB
    Select * from TableName1 where PK=1  -- on NodeB
    1, NodeA


    The values have swapped. Granted, the chances of such an update happening are slim, but it can happen. Ideally, with your data perfectly partitioned, only one client will be active on one node at a time and will only work with data he/she owns, so the chances of swapping are slim.


    DDL Changes


    Bi-Directional Transactional Replication can replicate DDL (schema) changes. However, there is a caveat. A schema change made on Node A will be replicated to Node B, and the replication stored procedures Node A uses to apply commands to Node B will be updated for the schema change on Node B. But the replication stored procedures that Node B uses to replicate changes occurring on Node B back to Node A will not reflect this change.


    It then becomes necessary to regenerate these stored procedures, either through a database trigger or a SQL Server job that runs at periodic intervals. For this I use:
    •    sp_scriptinsproc
    •    sp_scriptupdproc

    I don’t need to worry about the delete procs, as they only need to account for the PK.


    Here is an example of such a job.


    --regenerate the replication procs for every article in the publication database
    DECLARE TEST CURSOR
    FOR
        SELECT  DEST_TABLE
        FROM    SYSARTICLES
        ORDER BY DEST_TABLE
    OPEN TEST
    DECLARE @TABLENAME SYSNAME
    FETCH NEXT FROM TEST INTO @TABLENAME
    WHILE @@FETCH_STATUS = 0
        BEGIN
            EXEC UPDATEPROCS @TABLENAME
            FETCH NEXT FROM TEST INTO @TABLENAME
        END
    CLOSE TEST
    DEALLOCATE TEST


    --UpdateProcs looks like this:
    CREATE PROCEDURE [dbo].[UPDATEPROCS] ( @table SYSNAME ) AS
        DECLARE @articleNumber INT
        DECLARE @insertType INT
        DECLARE @updateType INT
        DECLARE @rowcount INT
        DECLARE @Statement VARCHAR(MAX)
        DECLARE @end VARCHAR(MAX)
        DECLARE @top VARCHAR(MAX)
        DECLARE @middle VARCHAR(MAX)
        CREATE TABLE #table1
            (
              pk INT IDENTITY ,
              varcharcol VARCHAR(MAX)
            )
        SELECT  @articleNumber = artid ,
                @insertType = CASE WHEN ins_cmd LIKE 'C%' THEN 1
                                   ELSE 0
                              END ,
                @updateType = CASE WHEN upd_cmd LIKE 'C%' THEN 1
                                   ELSE 0
                              END
        FROM    sysarticles
        WHERE   dest_table = @table
        IF @insertType = 1
            BEGIN
                SET @statement = ' '
                SET @end = ' '
                TRUNCATE TABLE #table1
                INSERT  INTO #table1
                        EXEC sp_scriptinsproc @articleNumber
                SELECT  @rowcount = MAX(pk)
                FROM    #table1
                SELECT  @top = varcharcol
                FROM    #table1
                WHERE   pk = 1
                SELECT  @middle = varcharcol
                FROM    #table1
                WHERE   pk = 3
                SELECT  @statement = @statement + ' ' + varcharcol
                FROM    #table1
                WHERE   pk > 4
                        AND pk < @rowcount - 2
                SELECT  @end = @end + ' ' + varcharcol
                FROM    #table1
                WHERE   pk >= @rowcount - 1
                EXEC(@top)
                EXEC(@middle)
                EXEC(@statement)
                EXEC(@end)
            END
        ELSE
            BEGIN
            SET @statement = ' '
                SET @end = ' '
                TRUNCATE TABLE #table1
                INSERT  INTO #table1
                        EXEC sp_scriptsinsproc @articleNumber
                SELECT  @rowcount = MAX(pk)
                FROM    #table1
                SELECT  @top = varcharcol
                FROM    #table1
                WHERE   pk = 1
                SELECT  @middle = varcharcol
                FROM    #table1
                WHERE   pk = 3
                SELECT  @statement = @statement + ' ' + varcharcol
                FROM    #table1
                WHERE   pk > 4
                        AND pk < @rowcount - 2
                SELECT  @end = @end + ' ' + varcharcol
                FROM    #table1
                WHERE   pk >= @rowcount - 1
                EXEC(@top)
                EXEC(@middle)
                EXEC(@statement)
                EXEC(@end)
            END
    --this is the update statement
        IF @UpdateType = 1
            BEGIN
                SET @statement = ' '
                SET @end = ' '
                TRUNCATE TABLE #table1
                SET @statement = ''
                INSERT  INTO #table1
                        EXEC sp_scriptupdproc @articleNumber
                SELECT  @rowcount = MAX(pk)
                FROM    #table1
                SELECT  @top = varcharcol
                FROM    #table1
                WHERE   pk = 1
                SELECT  @middle = varcharcol
                FROM    #table1
                WHERE   pk = 3
                SELECT  @statement = @statement + ' ' + varcharcol
                FROM    #table1
                WHERE   pk > 4
                        AND pk < @rowcount - 2
                SELECT  @end = @end + ' ' + varcharcol
                FROM    #table1
                WHERE   pk >= @rowcount - 1
                EXEC(@top)
                EXEC(@middle)
                EXEC(@statement)
                EXEC(@end)
            END
        ELSE
            BEGIN
                SET @statement = ' '
                SET @end = ' '
                TRUNCATE TABLE #table1
                SET @statement = ''
                INSERT  INTO #table1
                        EXEC sp_scriptsupdproc @articleNumber
                SELECT  @rowcount = MAX(pk)
                FROM    #table1
                SELECT  @top = varcharcol
                FROM    #table1
                WHERE   pk = 1
                SELECT  @middle = varcharcol
                FROM    #table1
                WHERE   pk = 3
                SELECT  @statement = @statement + ' ' + varcharcol
                FROM    #table1
                WHERE   pk > 4
                        AND pk < @rowcount - 2
                SELECT  @end = @end + ' ' + varcharcol
                FROM    #table1
                WHERE   pk >= @rowcount - 1
                EXEC(@top)
                EXEC(@middle)
                EXEC(@statement)
                EXEC(@end)
            END


    As you can see, bi-directional transactional replication can be a good fit for some scenarios. Before you implement a bi-directional topology, ensure you test with a representative workload before deploying into production.

  • SQL Full-text search anomalies

    On the newsgroups recently I have encountered some posters who are perplexed by the results of their full-text searches.

    The first poster was puzzled as to why searches on f.b.i. would not find F.B.I. in his content. SQL FTS and Microsoft Search products index upper-cased tokens containing periods differently than other lower- or upper-cased tokens. During indexing, most language word breakers will index an upper-cased token with periods in it as the token itself, and then as the token with all the periods stripped out. So F.B.I. is indexed as F.B.I. and FBI. All tokens are stored upper-cased in the index, with the exception of path names (for cosmetic reasons - e.g. in Indexing Services a path like c:\TeMp is stored as c:\temp, as lowercase was considered to look better than uppercase).

    There is no length cut off as far as I am aware. For example, C.R.E.E.P. (Committee to Re-Elect the President - the unfortunately and what we can only hope was unintentional acronym for President Richard Nixon's second election campaign) is indexed as CREEP and C.R.E.E.P. I've tried up to 10 letter combinations.

    Here is an example:

    Create table testAcronymTable(pk int not null identity constraint testAcronymTablePK primary key, charcol char(20))
    GO
    insert into testAcronymTable(charcol) values('F.B.I.')
    insert into testAcronymTable(charcol) values('f.b.i.')
    insert into testAcronymTable(charcol) values('fbi')
    insert into testAcronymTable(charcol) values('FBI')
    insert into testAcronymTable(charcol) values('F B I')
    insert into testAcronymTable(charcol) values('f b i')
    GO
    create fulltext catalog testacronymcatalog
    GO
    --indexing using the neutral word breaker
    create fulltext index on testAcronymTable(charcol language 0) key index testAcronymTablePK on testAcronymCatalog
    GO
    select * from testAcronymTable where contains(*,'f.b.i.')
    GO
    --returns f.b.i., F.B.I., F B I and f b i  - expected; SQL FTS
    --throws away the .'s
    --you get the same results if f, b, and i are removed from the noise list
    drop fulltext index on testAcronymTable
    GO
    --indexing using the American English word breaker
    create fulltext index on testAcronymTable(charcol language 1033) key index testAcronymTablePK on testAcronymCatalog
    GO
    select * from  testAcronymTable where contains(*,'fbi')
    GO
    --returns fbi F.B.I, and FBI - fbi, and FBI are expected, unexpected is
    --F.B.I, F.B.I is indexed as F.B.I. and FBI.
    select * from  testAcronymTable where contains(*,'FBI')
    GO
    --returns fbi F.B.I, and FBI - fbi, and FBI are expected, unexpected is
    --F.B.I, F.B.I is indexed as F.B.I. and FBI.
    select * from  testAcronymTable where contains(*,'F.B.I.')
    GO
    --returns fbi F.B.I, and FBI - F.B.I is expected, unexpected is
    --FBI and fbi, F.B.I is indexed as F.B.I. and FBI.
    select * from  testAcronymTable where contains(*,'f.b.i.')
    GO
    --returns f.b.i., F B I, and f b i -if your noise word list does not contain
    --f, b, and i
    --returns nothing -if your noise word list does contain
    --f, b, and i

    Note that this works for most languages, but does NOT work for the neutral, Chinese (both types), Japanese, and Thai word breakers. They will index F.B.I. as F, B, and I - in other words, as three separate letters. Korean will index F.B.I. as F.B.I.

    The next post will cover some of the unusual behaviors encountered when searching on C#, C+, C++, F#, etc.


  • handy replication proc

    A friend of mine called me up last night. He had a problem replicating 300,000,000 transactions per day, or was it per hour. Can't recall, it was a late night.

    Anyhowze, with volumes like this, replication might not be the best tool. His problem was that the distribution agent for two of his subscribers was hanging on initializing. From experience I know this is caused by depleted buffers on the publisher, and the only solution I know of is a reboot. Naturally this was not an option for him.

    He has his max distribution retention setting at 72 hours and was under the impression that this would decrease the amount of time required to purge distributed transactions. I told him that the min distribution retention setting controls this, and that his problem with expired subscribers was likely caused by this low max distribution retention setting or the history retention setting. Subscribers are expired based on the lesser of these two settings.
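
    For reference, here is a sketch of how those retention settings are changed at the distributor; the values are illustrative:

    --run at the distributor; retention values are in hours
    exec sp_changedistributiondb @database = 'distribution', @property = 'min_distretention', @value = '24'
    exec sp_changedistributiondb @database = 'distribution', @property = 'max_distretention', @value = '96'
    exec sp_changedistributiondb @database = 'distribution', @property = 'history_retention', @value = '96'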

     Anyhowze, on with my story - so he asks me what he can do to improve performance, and I recommended he use sp_scriptdynamicupdproc to create a more efficient update proc (as his app did heavy updating), or its SQL 2005 analogue sp_scriptsupdproc.
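
    A minimal sketch of generating such a proc; the article id is an illustrative assumption, so look it up first:

    --run in the published database
    select artid, name from sysarticles --find the article id
    exec sp_scriptdynamicupdproc @artid = 1 --scripts out the new update proc
    --apply the generated script on the subscriber to replace the existing proc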

    I expect another midnight call from him soon as he continues in his losing battle.   

  • transaction marks

    There seems to be some confusion out there in the internet about how to implement transaction marks. Or perhaps I was merely very confused about how to implement them.


    Transaction marks allow you to restore two or more databases to a consistent point in time. For example, if you have two databases and your app writes to both, how do you restore them to a consistent point in time without kicking all your users off the databases? This is complicated by many things - for example, backup speed.

    I googled looking for a working example but could not find one, so here is a script I wrote to demonstrate it.


    create database MarkedTransaction
    GO
    create database MarkedTransaction1
    GO
    use MarkedTransaction
    go
    create table test1(pk int)
    GO
    use MarkedTransaction1
    GO
    create table test1(pk int)
    GO
    backup database MarkedTransaction to disk='c:\MarkedTransaction.bak' with init
    GO
    backup database MarkedTransaction1 to disk='c:\MarkedTransaction1.bak' with init
    GO
    begin transaction
    insert into MarkedTransaction.dbo.test1(pk) values(1)
    insert into MarkedTransaction1.dbo.test1(pk) values(1)
    commit tran
    GO
    begin transaction mark6 with mark 'marked transaction'
    insert into MarkedTransaction.dbo.test1(pk) values(2)
    insert into MarkedTransaction1.dbo.test1(pk) values(2)
    commit transaction mark6
    GO
    begin transaction mark7 with mark 'marked transaction - 1'
    insert into MarkedTransaction.dbo.test1(pk) values(3)
    insert into MarkedTransaction1.dbo.test1(pk) values(3)
    commit tran mark7
    GO
    backup log MarkedTransaction to disk='c:\MarkedTransactionlog.bak' with init
    backup log MarkedTransaction1 to disk='c:\MarkedTransactionlog1.bak' with init
    GO

    use master
    GO
    --backing up the tail
    backup log MarkedTransaction to disk='c:\MarkedTransactionlogTail.bak' with init, norecovery
    backup log MarkedTransaction1 to disk='c:\MarkedTransactionlog1Tail.bak' with init, norecovery
    GO
    --restoring the database
    restore database markedtransaction from disk='c:\markedtransaction.bak' with norecovery
    restore database markedtransaction1 from disk='c:\markedtransaction1.bak' with norecovery
    GO
    --restoring the log
    RESTORE LOG MarkedTransaction FROM 
    DISK = 'c:\MarkedTransactionlog.bak'
    WITH  STOPATMARK = 'mark6', recovery
    GO

    RESTORE LOG MarkedTransaction1 FROM 
    DISK = 'c:\MarkedTransactionlog1.bak'
    WITH  STOPATMARK = 'mark6', recovery
    GO

    select * from MarkedTransaction.dbo.test1
    select * from MarkedTransaction1.dbo.test1
    --both 1, 2 are there, 3 is not
    --repeating, but this time we will stop before the mark (STOPBEFOREMARK)
    backup database MarkedTransaction to disk='c:\MarkedTransaction.bak' with init
    GO
    backup database MarkedTransaction1 to disk='c:\MarkedTransaction1.bak' with init
    GO
    begin transaction
    insert into MarkedTransaction.dbo.test1(pk) values(6)
    insert into MarkedTransaction1.dbo.test1(pk) values(6)
    commit transaction
    GO
    begin transaction mark8 with mark 'marked transaction -2'
    insert into MarkedTransaction.dbo.test1(pk) values(7)
    insert into MarkedTransaction1.dbo.test1(pk) values(7)
    commit transaction mark8
    GO
    begin transaction
    insert into MarkedTransaction.dbo.test1(pk) values(8)
    insert into MarkedTransaction1.dbo.test1(pk) values(8)
    commit transaction
    GO
    backup log MarkedTransaction to disk='c:\MarkedTransactionlog.bak' with init
    backup log MarkedTransaction1 to disk='c:\MarkedTransactionlog1.bak' with init
    GO

    use master
    GO
    backup log MarkedTransaction to disk='c:\MarkedTransactionlogTail.bak' with init, norecovery
    backup log MarkedTransaction1 to disk='c:\MarkedTransactionlog1Tail.bak' with init, norecovery
    GO
    RESTORE Database MarkedTransaction  from disk='c:\markedtransaction.bak' with norecovery
    RESTORE Database MarkedTransaction1  from disk='c:\markedtransaction1.bak' with norecovery
    GO
    RESTORE LOG MarkedTransaction FROM 
    DISK = N'c:\MarkedTransactionlog.bak'
    WITH  STOPBEFOREMARK = N'mark8' , recovery
    GO
    RESTORE LOG MarkedTransaction1 FROM 
    DISK = N'c:\MarkedTransactionlog1.bak'
    WITH  STOPBEFOREMARK = N'mark8' , recovery
    GO
    select * from MarkedTransaction.dbo.test1
    select * from MarkedTransaction1.dbo.test1
    --both 1, 2, and 6 are there, 7 is not

