The log scan number (620023:3702:1) passed to log scan in database ‘xxxx’ is not valid

 

sp_repldone updates the record that
identifies the last distributed transaction of the server.

To set this option, run this
command:
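In this case the option is the database publish option; the command (the same sp_replicationdboption call shown later in this post) is:

exec sys.sp_replicationdboption
        @dbname = N'testdb', 
        @optname = N'publish', 
        @value = N'true'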

 

Unable to execute
procedure. The database is not published. Execute the procedure in a
database that is published for replication.

So the question becomes: which is more important, faster replication or
fewer locks? The ideal size of the CommitBatchSize parameter depends on
your specific situation. For example, if there is a large amount of
activity in the distribution database coming from both the publisher and
many subscribers, then reducing the batch size can be beneficial because
of reduced locking (improved concurrency) in the distribution database.
By increasing concurrency, transactions that have to wait on locks don’t
have to wait as long, and overall performance is boosted. For example, if
there are 100 subscribers, each subscriber needs to be updated from the
distribution database. This can create a lot of activity in the
distribution database, which can be slowed down if there are many locks
occurring.

 The log scan
number (620023:3702:1) passed to log scan in database ‘xxxx’ is not
valid. This error may indicate data corruption or that the log file
(.ldf) does not match the data file (.mdf). If this error occurred
during replication, re-create the publication. Otherwise, restore from
backup if the problem results in a failure during startup.

exec sys.sp_repltrans

While the option is effective, it is
also a resource hog. Because of this, you should only use
Immediate-Updating when it is absolutely necessary. As an alternative to
Immediate-Updating, consider only replicating changes from a Subscriber
to a Publisher at regular intervals, such as once an hour, or once a
day. This will significantly reduce server overhead. [7.0, 2000,
2005] Updated 1-6-2006

How to fix this

 

  • Ensure database consistency by running DBCC CHECKDB on the
    database. 

  • If an old backup was restored on top of the published database, then
    use sp_replrestart.

  • If going back to the most recent transaction log backup is not an
    option, then execute sp_replrestart on the Publisher in the published
    database (a minimal call is sketched after this list). This stored
    procedure is used when the highest log sequence number (LSN) value at
    the Distributor does not match the highest LSN value at the Publisher.

  • This stored procedure inserts compensating (No Operation) LSNs into
    the publisher database log file until one of the compensating LSNs
    becomes higher than the highest distributed LSN recorded in the
    distribution database for this published database. After that, it
    inserts this new high LSN into the msrepl_transactions table in the
    distribution database and executes sp_repldone on the published
    database to update the internal structures and mark a new starting
    point for the Log Reader Agent.

  • Ensure that the Log Reader Agent is stopped and that there are no
    incoming transactions on the published database when this stored
    procedure is executed.

  • Since transactions may have been lost, we recommend reinitializing
    the subscriber(s) and/or recreating the publication/subscription(s).
    For large databases, consider using “Initialize from Backup” as
    discussed in SQL Server Books Online.
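
A minimal sketch of the sp_replrestart step described above: run it at the Publisher, in the published database, with the Log Reader Agent stopped (the database name below is the one used elsewhere in this post):

USE testdb
GO
-- sp_replrestart takes no parameters; it writes compensating (No Operation) LSNs
-- until the publisher log catches up with the highest distributed LSN
EXEC sp_replrestart
GO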

 

In this case, however, the database was neither corrupted nor restored. The only remaining explanation is that something went wrong with replication, yet no related error information could be found in SQL Server replication itself. The replication in question was generated automatically by AWS DMS, so many of its internals are unclear (for example, whether an exception occurred), and I could not find detailed official documentation for this error either. I am recording it here for reference.

 

 

 

 

References:

 

-- Run in the published database: with @xactid and @xact_segno NULL and @reset = 1,
-- every replicated transaction still in the log is marked as distributed
EXEC sys.sp_repldone 
        @xactid = NULL, 
        @xact_segno = NULL, 
        @numtrans = 0,     
        @time = 0, 
        @reset = 1  
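
After sp_repldone completes, a quick sanity check (run in the published database; a hedged sketch reusing objects referenced in this post) is to confirm that the log is no longer being held by replication:

-- Should return no rows once every replicated transaction is marked as distributed
EXEC sys.sp_repltrans

-- log_reuse_wait_desc should no longer report REPLICATION for this database
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'testdb'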

If you decide to increase the
PollingInterval, you will have to experiment with different values to
come up with the optimum value for your particular situation. If you do
this, be sure you have a good way to determine performance before and
after your experimentation to ensure that you get good results. [7.0,
2000] Added 12-26-2001

Yesterday, shortly after 5 AM, a SQL Server 2008 R2 database threw the following alert:

-- Enable the publish option so the database is registered as a publication database
exec sys.sp_replicationdboption
        @dbname = N'testdb', 
        @optname = N'publish', 
        @value = N'true' 

In transactional replication, the transaction log of the database
involved in replication is both written to and read from. Without
replication, a transaction log is almost always written to, and rarely
read from. Because a transaction log is both written to and read from
when using transactional replication, this can cause I/O performance
issues on databases that experience large numbers of transactions.

 

In the publisher database, use sp_repltrans to view the transactions that have not yet been marked as distributed by the Log Reader.

*****

Common Causes

 

  • The last LSN in the transaction log is lower than the LSN the Log
    Reader is trying to find. An old backup may have been restored on top
    of the published database; after the restore, the new transaction log
    no longer contains the data that the distributor and subscriber(s)
    now have.

  • Database corruption.

 

-- Count the log records in the published database that are still marked for replication
select count(0)
from sys.fn_dblog(null, null) f
where f.[Description] = 'REPLICATE'

By increasing the PollingInterval, reads occur less often, which reduces
the interference with sequential writes and helps to boost performance.
For this performance bonus to work, the drive that holds the transaction
log must be devoted to transaction logs. If it is not, then that drive is
most likely already experiencing random writes, and increasing the
PollingInterval won’t help performance.

  
At first glance this looked like data corruption, but after running DBCC CHECKDB the database turned out to be perfectly intact, so the problem had to be related to replication. After searching for related material, the only similar report I found was in the blog post “The process could not execute ‘sp_repldone/sp_replcounters’”:

-- Check log file size and the percentage of log space in use for each database
DBCC SQLPERF(LOGSPACE)

The counters for these two processes
are the SQL Server Replication LogReader: Delivery Latency counter and
the SQL Server Replication Dist.: Delivery Latency counter.
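
If you prefer reading these counters from T-SQL instead of Performance Monitor, a minimal sketch is below; the exact object and counter names can vary slightly between instances, so the query filters broadly on the counter name:

-- Replication delivery-latency counters, read from the DMV instead of PerfMon
SELECT [object_name], counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE '%Delivery Latency%'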

 

sp_repltrans (Transact-SQL)

*****

After the command completed, the log file was still more than 300 GB, while the percentage of log space actually in use was extremely low: 0.000428%.

[7.0, 2000] Updated
1-6-2006

-- Shrink the log file testdb_log_5 to a target size of 10240 MB
dbcc shrinkfile(testdb_log_5, 10240, notruncate)
dbcc shrinkfile(testdb_log_5, 10240, truncateonly)

On the other hand, if most of the
activity in the Distribution database is caused by the Log Reader Agent
that is running often (or even continuously), and not other sources
(such as subscribers), then increasing the size of the CommitBatchSize
parameter makes more sense, as larger batch sizes mean that the Log
Reader Agent doesn’t have to work as often, which in turn reduces
overhead, boosting performance.

Since the recovery model of testdb is SIMPLE and there are no active users, the most likely explanation is that the database's transaction log is being held by records marked for replication. Counting with the following function shows that a large number of log records have not yet been read by the Log Reader.

If you are using transactional replication, you may want to monitor the
latency it takes the Log Reader to move transactions from a database’s
transaction log into the distribution database, and also the latency it
takes the Distribution Agent to move transactions from the distribution
database to the Subscriber database. The total of these two figures is
the amount of time it takes a transaction to get from the publication
database to the subscriber database.

Reference docs:

Transactional replication performance can be boosted by changing the
ReadBatchSize parameter of the Log Reader Agent in cases where a large
number of transactions are being written to a published database, but
only a small portion of them will be replicated to subscribers. The
default value is 500. You will probably have to experiment with different
values until you find the optimal one for your particular configuration.

sp_repldone (Transact-SQL)

If you see a significant increase in
the latency for either of these processes, this should be a signal to
you to find out what new or different action has happened to cause the
increased latency. [6.5, 7.0, 2000, 2005] Updated 1-6-2006

Since testdb is a test database restored from a backup and is not registered in master as a publisher database, the database must first be enabled for publishing by setting the publish option (the sp_replicationdboption call shown earlier in this post), which means “Database can be used for other types of publications.”

The distributor has a setting called
the CommitBatchSize, which is configured from the “Replication Agent
Profile Details” dialog box. This parameter determines how many
replication transactions are committed in the distribution database in a
single batch. The default value is 100.
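
For reference, the same profile parameters can also be inspected and changed from T-SQL at the Distributor. The sketch below is hedged: the agent_type code (3 = Distribution Agent) follows the documented values, the @profile_id shown is only a placeholder you must look up in the first result set, and the parameter name is assumed to match the name shown in the profile dialog. The same mechanism applies to other agent parameters such as PollingInterval.

-- List the Distribution Agent profiles and their IDs (run at the Distributor)
EXEC sp_help_agent_profile @agent_type = 3

-- Change CommitBatchSize in the profile the Distribution Agent uses;
-- @profile_id = 1 is only a placeholder taken from the result set above
EXEC sp_change_agent_parameter
        @profile_id = 1,
        @parameter_name = N'CommitBatchSize',
        @parameter_value = N'100'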

 

Transactional replication offers an option called Immediate-Updating
Subscribers. This feature provides transactional consistency for all of
the various Subscribers of a Publisher. By making use of the Microsoft
Distributed Transaction Coordinator (MSDTC), Immediate-Updating
Subscribers allows a Subscriber to update its local copy of the data
while the Publisher’s data is updated simultaneously. This change is then
replicated from the Publisher to the other Subscribers, so they all have
consistent data.

When xactid is NULL, xact_seqno is NULL, and reset is 1, all replicated
transactions in the log are marked as distributed. This is useful when
there are replicated transactions in the transaction log that are no
longer valid and you want to truncate the log.

The reason performance is boosted is
because normally the I/O activity on transactions logs is limited to
sequential writes. When the Log Reader Agent polls the transaction log,
it has to read the log, which in effect causes random reads to occur on
the device with the transaction log, which prevents writes from being
purely sequential in nature. If the device has to perform both reads and
writes, as is the case in this instance, then sequential writes, in
effect, turn into random writes, which hurts performance.

Finally, the DBCC SHRINKFILE command completed and the transaction log file was shrunk successfully.

While increasing the CommitBatchSize
sounds like a good idea, there is also a corresponding downside to
increasing it. The problem occurs because larger batch sizes mean that
the actual commit takes longer to occur, which in turn causes locks in
the distribution database to be held longer than if the commits were to
take less time. Locks, as you know, can reduce concurrency in a
database, reducing performance.

After creating the test environment from a backup, I found that the log file of testdb was far too large, about 400 GB. Since the test environment does not actually need that much log space, tying up 400 GB of disk was a waste of disk resources, so I used DBCC SHRINKFILE to shrink the log file:

To help reduce this problem, locate
the transaction log of databases involved in transactional replication
on its own dedicated RAID 1 or RAID 10 disk array, and connected to its
own SCSI or fiber channel. [6.0, 7.0, 2000, 2005] Updated
1-6-2006

sp_repltrans returns a result set of all
the transactions in the publication database transaction log that are
marked for replication but have not been marked as distributed.

So what should you do? Unless you are
aware of specific issues with transactional replication performance, you
should leave the CommitBatchSize option alone. If you are experiencing
performance problems, you may want to experiment with different
CommitBatchSize settings until you can find one that helps boost
performance. If you do this, be sure you have a good way to determine
performance before and after your experimentation to ensure that you get
good results. [7.0, 2000] Added 12-26-2001

sp_replicationdboption (Transact-SQL)

One way around this overhead is to
employ DTS custom partitions (available in SQL Server 2000) to perform
the horizontal partitioning for you. This reduces the log reader’s
overhead, boosting performance. [2000] Updated 1-6-2006

After the database is registered for publishing, use sp_repldone (the EXEC sys.sp_repldone call shown earlier) to mark all transactions in the transaction log as distributed.

This setting can affect performance
two different ways. First, the larger the CommitBatchSize parameter is,
the fewer commits that have to be made to the distribution database,
which results in reduced overhead. This is because this process is
resource intensive, and the fewer times that a commit occurs, the less
overhead that is incurred.

*****

Although horizontal filtering can reduce the amount of data that is
replicated from one database to another, using it also incurs high
overhead when used with transactional replication. The log reader agent
is used to perform horizontal filtering, and every row that is updated on
the Publisher causes extra work for the log reader.

logread -ReadBatchSize number_of_transactions

Transactional replication performance can sometimes be boosted by
changing the PollingInterval parameter of the Log Reader Agent. By
default, the Log Reader Agent polls the published database’s transaction
log every 10 seconds. If the transaction log of the published database is
very busy, then performance may be boosted if the PollingInterval is
increased.

*****
