Rsync
2012-05-18 11:11:59 阿炯

Data synchronization - rsync
rsync, short for "remote sync" as the name suggests, is a data mirroring and backup tool for Unix-like systems.

rsync - a fast, versatile, remote (and local) file-copying tool.

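rsync is a powerful tool whose command line offers many options. Its main features are:
 1. It can mirror entire directory trees and filesystems.
 2. It can easily preserve the original permissions, timestamps, symbolic links, hard links and so on.
 3. It requires no special privileges to install.
 4. Its transfer pipeline is optimized for efficiency: the first synchronization copies everything, but later runs transfer only the files that have changed. rsync can also compress and decompress data in transit, which reduces bandwidth use.
 5. It can transfer files over rsh or ssh, or over a direct socket connection.
 6. It supports anonymous transfers, which is convenient for mirroring websites.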
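
As a rough illustration of how these features map onto the command line, the sketch below assembles an rsync invocation in Python without running it. The flags used (-a, -H, -z, --delete, --dry-run) are standard rsync options; the helper name build_rsync_command and the example paths are hypothetical.

```python
import shlex

def build_rsync_command(src, dest, compress=True, delete=False, dry_run=False):
    """Assemble an rsync argument list for mirroring src into dest."""
    # -a: archive mode (recursion, permissions, times, symbolic links, ...)
    # -H: also preserve hard links, which -a alone does not cover
    cmd = ["rsync", "-a", "-H"]
    if compress:
        cmd.append("-z")         # compress data in transit
    if delete:
        cmd.append("--delete")   # remove files from dest that are gone from src
    if dry_run:
        cmd.append("--dry-run")  # show what would be done without changing anything
    cmd += [src, dest]
    return cmd

# Hypothetical paths, for illustration only.
example = build_rsync_command("/srv/www/", "backup@example.com:/backup/www/", dry_run=True)
print(shlex.join(example))
```

The list form can be passed to subprocess.run as is; starting with dry_run=True is a cautious way to check a mirroring job before letting it modify the destination.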

rsync is an open source utility that provides fast incremental file transfer. rsync is freely available under the GNU General Public License and is currently being maintained by Wayne Davison.



rsync is a file transfer program for Unix systems. rsync uses the "rsync algorithm" which provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand.
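
The part of the rsync algorithm that makes finding matching blocks cheap is a weak checksum that can be "rolled" from one window position to the next in constant time. The following is a minimal sketch of that idea based on the published description of the algorithm, not rsync's actual source code; the function names are illustrative.

```python
M = 1 << 16  # both checksum halves are kept modulo 2^16

def weak_checksum(block):
    """Return (a, b): a is the byte sum, b the position-weighted byte sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one position: drop out_byte, append in_byte."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new

data = bytes(range(256)) * 2
n = 16
a, b = weak_checksum(data[:n])
for k in range(len(data) - n):
    a, b = roll(a, b, data[k], data[k + n], n)
    # The rolled value always equals a checksum computed from scratch.
    assert (a, b) == weak_checksum(data[k + 1:k + 1 + n])
print("rolling checksum matches recomputation at every offset")
```

Because the weak checksum can collide, a match is only a candidate; a strong checksum over the same block confirms it before the block is treated as identical.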

Features
 can update whole directory trees and filesystems
 optionally preserves symbolic links, hard links, file ownership, permissions, devices and times
 requires no special privileges to install
 internal pipelining reduces latency for multiple files
 can use rsh, ssh or direct sockets as the transport
 supports anonymous rsync which is ideal for mirroring

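Latest version: 3.1
This release brings many performance improvements (such as a rewritten I/O layer), minor feature enhancements and bug fixes; upgrading is recommended.

Official homepage: http://rsync.samba.org/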

Network data synchronization library - libsync
libsync is a development library for synchronizing data over a network.


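Suppose there are two computers, A and B. Computer A has access to file A and computer B has access to file B; the two files are very similar, and the computers are connected by a slow network. A dedupe-based synchronization algorithm follows roughly the same flow as rsync:
 1. B splits file B into chunks of equal or varying size using a chunking algorithm such as FSP (fixed-size partition) or CDC (content-defined chunking).
 2. For each chunk, B computes a weak checksum similar to rsync's and a strong MD5 checksum, and records the chunk's length len and its offset in file B.
 3. B sends this chunk information to A.
 4. A splits file A with the same chunking technique, searches for matches against the chunk information received from B, and generates the delta encoding.
 5. A sends the delta encoding to B, together with the instructions for reconstructing file A.
 6. B reconstructs file A from the delta encoding and file B.
 This description leaves several key problems to solve: file chunking, the chunk description format, delta encoding, the delta description format, and file synchronization.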
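
The difference between the two chunking strategies named in step 1 can be sketched as follows. This is a toy illustration, not libsync's implementation: the fingerprint is a plain byte sum over a small window (real systems typically use a rolling hash such as a Rabin fingerprint), and the function names and parameter values are assumptions.

```python
def chunk_fsp(data, size=8):
    """Fixed-size partition: every chunk has `size` bytes except possibly the last."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunk_cdc(data, window=4, mask=0x07, min_size=2, max_size=32):
    """Content-defined chunking: cut where a fingerprint of the last `window`
    bytes has its low bits equal to zero, so cut points follow the content."""
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < min_size:
            continue
        fingerprint = sum(data[max(0, i - window + 1):i + 1])  # toy hash
        if (fingerprint & mask) == 0 or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # whatever remains after the last cut
    return chunks

sample = b"the quick brown fox jumps over the lazy dog " * 4
fsp = chunk_fsp(sample)
cdc = chunk_cdc(sample)
print(len(fsp), "fixed-size chunks,", len(cdc), "content-defined chunks")
```

With FSP, inserting a single byte near the start of a file shifts every later chunk boundary. With CDC the boundaries depend on the bytes themselves, so they tend to realign shortly after the insertion, which is why CDC usually finds more duplicate chunks between similar files.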

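The libsync library provides three APIs, with the following prototypes:
1. int file_chunk(char *src_filename, char *chunk_filename, int chunk_algo)
Purpose: splits a file into chunks and generates a chunk description file.
Parameters: src_filename is the source file, chunk_filename is the chunk description file to generate, and chunk_algo is the chunking algorithm; FSP, CDC and SB are currently supported.

2. int file_delta(char *src_filename, char *chunk_filename, char *delta_filename, int chunk_algo)
Purpose: delta-encodes a file using the generated chunk description.
Parameters: src_filename is the file to encode, chunk_filename is the chunk description file generated by file_chunk, delta_filename is the delta file to generate, and chunk_algo is the chunking algorithm.

3. int file_sync(char *src_filename, char *delta_filename)
Purpose: uses the delta file to synchronize the source file to the target file.
Parameters: src_filename is the base file and delta_filename is the delta file generated by file_delta.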

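Data synchronization has two application modes, PULL and PUSH. PULL synchronizes remote data to the local machine, while PUSH synchronizes local data to the remote one. In terms of the algorithm, the main difference is where chunking and delta encoding take place. The steps of each mode are as follows.
 PULL mode:
 1. The local side chunks file A and generates the chunk description file chunk.
 2. The chunk file is uploaded to the remote server.
 3. The remote server delta-encodes file B and generates the delta file delta.
 4. The delta file is downloaded to the local side.
 5. The local side synchronizes file A to file B, which is equivalent to downloading file B over local file A.

PUSH mode:
 1. The remote server chunks file B and generates the chunk description file chunk.
 2. The chunk file is downloaded to the local side.
 3. The local side delta-encodes file A and generates the delta file delta.
 4. The delta file is uploaded to the remote server.
 5. The remote side synchronizes file B to file A, which is equivalent to uploading file A over remote file B.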
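
The PULL flow can be traced end to end with in-memory stand-ins for the three functions. This is only a sketch under stated assumptions: the Python functions below borrow the names file_chunk, file_delta and file_sync but are not the libsync C API, they work on bytes rather than files, they use fixed-size chunks, and zlib.adler32 stands in for the weak checksum.

```python
import hashlib
import zlib

BLOCK = 4  # tiny block size so the example is easy to trace by hand

def file_chunk(data):
    """Chunk description: (weak, strong) checksum pair -> (offset, length)."""
    table = {}
    for off in range(0, len(data), BLOCK):
        blk = data[off:off + BLOCK]
        key = (zlib.adler32(blk), hashlib.md5(blk).hexdigest())
        table.setdefault(key, (off, len(blk)))
    return table

def file_delta(data, chunks):
    """Encode data as copies from the chunked file plus literal bytes."""
    weak_seen = {weak for weak, _ in chunks}
    delta, literal, i = [], bytearray(), 0
    while i < len(data):
        blk = data[i:i + BLOCK]
        weak = zlib.adler32(blk)
        hit = None
        if weak in weak_seen:  # cheap filter before the strong checksum
            hit = chunks.get((weak, hashlib.md5(blk).hexdigest()))
        if hit is not None:
            if literal:
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", hit[0], hit[1]))
            i += len(blk)
        else:
            literal.append(data[i])
            i += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta

def file_sync(base, delta):
    """Rebuild the target from the base file and the delta instructions."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += base[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)

# PULL: the local side holds the old file A, the remote side the new file B.
local_a = b"the quick brown fox jumps"
remote_b = b"the quick red fox jumps high"
chunks = file_chunk(local_a)          # step 1 (local); step 2 uploads it
delta = file_delta(remote_b, chunks)  # step 3 (remote); step 4 downloads it
result = file_sync(local_a, delta)    # step 5 (local)
print(result == remote_b, delta)
```

PUSH uses the same three calls with the roles swapped: the remote side chunks file B, the local side delta-encodes file A against that description, and the remote side applies the delta to file B.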

Latest version:

Project homepage: http://code.google.com/p/libsync/

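Data synchronization tool - cwRsync
cwRsync provides rsync-style data synchronization on Windows; in effect it is the rsync solution for the Windows platform. cwRsync packages rsync together with Cygwin.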

cwRsync is yet another packaging of Rsync and Cygwin for Windows with a client GUI. You can use cwRsync for fast remote file backup and synchronization. Rsync uses the Rsync algorithm which provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand. At first glance this may seem impossible because the calculation of diffs between two files normally requires local access to both files.

Rsync normally uses ssh for communication. It requires no special privileges for installation. You must, however, have a working ssh system.

Alternatively, rsync can run in `daemon' mode, listening on a socket. This is generally used for public file distribution, although authentication and access control are available. Cygwin is a Linux-like environment for Windows. It consists of a DLL (cygwin1.dll), which emulates substantial Linux API functionality, and a collection of tools.

Latest version: 4.0

Project homepage: https://www.itefix.no/i2/cwrsync

HTTP-based file synchronization tool - zsync
zsync is an rsync-like file synchronization tool built on HTTP; it fetches file changes from a remote web server.

zsync is a file transfer program. It allows you to download a file from a remote server, where you have a copy of an older version of the file on your computer already. zsync downloads only the new parts of the file. It uses the same algorithm as rsync. However, where rsync is designed for synchronising data from one computer to another within an organisation, zsync is designed for file distribution, with one file on a server to be distributed to thousands of downloaders. zsync requires no special server software — just a web server to host the files — and imposes no extra load on the server, making it ideal for large scale file distribution.

zsync is open source, distributed under version 2 of the Artistic License. Feedback, bug reports and patches are welcome.

Features
zsync fills a gap in the technology available for large-scale file distribution. Three key points explain why zsync provides a genuinely new technique for file distribution:
Client-side rsync — zsync uses the rsync algorithm, but runs it on the client side, thus avoiding the high server load associated with rsync.
Rsync over HTTP — zsync provides transfers that are nearly as efficient as rsync -z or cvsup, without the need to run a special server application. All that is needed is an HTTP/1.1-compliant web server. So it works through firewalls and on shared hosting accounts, and gives less security worries.
Handling for compressed files — rsync is ineffective on compressed files, unless they are compressed with a patched version of gzip. zsync has special handling for gzipped files, which enables update transfers of files which are distributed in compressed form.
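
Since only a plain HTTP/1.1 server is involved, a client of this kind has to fetch the parts it lacks with byte-range requests. The sketch below shows one way to turn a list of missing block indices into a Range header value. It illustrates HTTP range syntax only and is not taken from zsync's code; the function names, block size and file size are assumptions.

```python
def block_ranges(missing_blocks, block_size, file_size):
    """Merge missing block indices into inclusive (first_byte, last_byte) ranges."""
    ranges = []
    for idx in sorted(set(missing_blocks)):
        first = idx * block_size
        last = min(first + block_size, file_size) - 1  # clamp the final block
        if ranges and ranges[-1][1] + 1 == first:
            ranges[-1] = (ranges[-1][0], last)  # extend the adjacent range
        else:
            ranges.append((first, last))
    return ranges

def range_header(ranges):
    """Format ranges as the value of an HTTP Range request header."""
    return "bytes=" + ",".join(f"{first}-{last}" for first, last in ranges)

needed = block_ranges([7, 2, 3], block_size=1024, file_size=7500)
print(range_header(needed))  # bytes=2048-4095,7168-7499
```

Merging adjacent blocks keeps the header short, and every block the client already has locally is simply never requested.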

Latest version: 0.6

Project homepage: http://zsync.moria.org.uk/

Graphical front end - grsync
Grsync is a graphical front end for rsync, the data mirroring and backup tool for Unix-like systems.

Grsync is a rsync GUI (Graphical User Interface). Rsync is the well-known and powerful command line directory and file synchronization tool. Grsync makes use of the GTK libraries and is released under the GPL license, so it is opensource. It doesn't need the gnome libraries to run, but can of course run under gnome pretty fine. It can be effectively used to synchronize local directories and it supports remote targets as well (even though it doesn't support browsing the remote folder). Sample uses of grsync include: synchronize a music collection with removable devices, backup personal files to a networked drive, replication of a partition to another one, mirroring of files, etc.

Features
Most commonly used rsync options available, additional options may be specified by command line switches
Saves multiple settings with customized names (no limit on number of "sessions")
Session sets can be created: run multiple sessions at once!
Can do simulation or normal execution
Captures and prints rsync output nicely on a own window and log to a file
Parses rsync output to display progress bars and other information
Highlights errors and show them on a separate window, for better and faster control over rsync runs
Can pause rsync execution
A good number of translations available
Can run custom commands before (and stop in case of failure) and after rsync
Shell script for batch, crontab use etc. provided (grsync-batch)
Can import and export sessions on file; i.e. share your settings with people!
Can minimize to system tray (status icon)
Can run specific sessions with superuser privileges
Rsync backup made easy!
Needs rsync installed on the system (command line tool only, no need for server-side daemon) and GTK
Available for free and with sources!
Works on many linux distributions (including Nokia Maemo), Mac OS X and windows!

Latest version: 1.2

Project homepage: http://www.opbyte.it/grsync/

Perl file synchronization script - fsync
Fsync is a Perl script for synchronizing files with remote hosts, with functionality similar to the rsync and CVS packages.

Fsync is a Perl script which allows for file synchronization between remote hosts, containing functionality similar to that of the rsync and CVS packages. Since fsync is a single Perl script, setting up file synchronization on a new machine is relatively simple. Communication between the hosts is via a socket mechanism or over an rsh (or ssh) connection, with the remote server started by rsh, by ssh or manually. The program was written with slow modem connections in mind. Fsync supports the concept of merging differences from local/remote hosts with hooks for tools to merge the trees. Fsync requires perl 5.004 or newer. This program is licensed under the GNU Public License.

Latest version: 2.1

Project homepage: http://schwieters.org/fsync/


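Real-time synchronization tool - lsyncd

Lsyncd is written in Lua and wraps inotify and rsync. It relies on the inotify notification mechanism of the Linux kernel (2.6.13 and later): it watches a local directory for change events through inotify or fsevents and then synchronizes the changes with rsync. Its main strengths are simple, efficient transfer of large volumes of data and support for several working modes: lsyncd.conf can contain multiple sync blocks, each with its own mode, and they do not affect one another.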

Lsyncd -- Live Syncing (Mirror) Daemon.

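Reference information

Operating modes: rsync, rsyncssh and direct.
1). bwlimit: bandwidth limit in KB/s, same as in rsync.
2). compress: compressed transfer, true by default. It trades bandwidth against CPU load; for synchronization between local directories, consider setting it to false.
3). perms: file permissions are preserved by default.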

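Configuration settings
1). pidfile: the file that records the process ID of the running lsyncd. Without a pid file, several lsyncd processes can be started, and having them run at the same time corrupts the synchronized data. With a pid file, only one lsyncd process can run at a time, so always set it.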

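2). inotifyMode: the events lsyncd watches. Possible values are CloseWrite, Modify and CloseWrite or Modify; the default is CloseWrite. The official documentation calls these inotify events, which is slightly inaccurate: lsyncd events wrap inotify events. CloseWrite is an lsyncd event that stands for a whole set of inotify events rather than a single one. CloseWrite covers the following inotify events:
IN_ATTRIB: file attributes were changed, e.g. by chmod, chown or touch
IN_CLOSE_WRITE: a file opened for writing was closed
IN_CREATE: a new file was created
IN_DELETE: a file or directory was deleted from the watched directory
IN_DELETE_SELF: the watched item itself was deleted
IN_MOVED_FROM: a file was moved out of the watched directory, e.g. by mv
IN_MOVED_TO: a file was moved into the watched directory, e.g. by mv or cp
IN_DONT_FOLLOW: do not follow symbolic links to their real path
IN_ONLYDIR: watch directories only
Modify is CloseWrite with the IN_MODIFY event (a file was modified) added and the IN_CLOSE_WRITE event removed.
CloseWrite or Modify is CloseWrite with the IN_MODIFY event added.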
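
The relationship between the three modes is easiest to see as set arithmetic. The sketch below only restates the event lists given above as Python sets; it does not read anything from lsyncd itself, and the variable names are illustrative.

```python
CLOSE_WRITE = frozenset({
    "IN_ATTRIB", "IN_CLOSE_WRITE", "IN_CREATE", "IN_DELETE", "IN_DELETE_SELF",
    "IN_MOVED_FROM", "IN_MOVED_TO", "IN_DONT_FOLLOW", "IN_ONLYDIR",
})

# Modify: add IN_MODIFY and drop IN_CLOSE_WRITE.
MODIFY = (CLOSE_WRITE - {"IN_CLOSE_WRITE"}) | {"IN_MODIFY"}

# CloseWrite or Modify: keep everything and add IN_MODIFY.
CLOSE_WRITE_OR_MODIFY = CLOSE_WRITE | {"IN_MODIFY"}

INOTIFY_MODES = {
    "CloseWrite": CLOSE_WRITE,
    "Modify": MODIFY,
    "CloseWrite or Modify": CLOSE_WRITE_OR_MODIFY,
}

for name, events in INOTIFY_MODES.items():
    print(f"{name}: {len(events)} events")
```

In practice the choice decides when a written file is picked up: with CloseWrite a sync is triggered once the writer closes the file, while Modify reacts to every write, which can matter for files that are kept open for a long time.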

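3). insist: by default, when lsyncd fails to start, it exits with an error message. With this fault-tolerant mode enabled, a single faulty configuration entry does not make startup fail; lsyncd logs the error, ignores the faulty entry and keeps running.

4). maxProcesses: lsyncd spawns a child process to run each sync task. With multiple sync blocks it runs several child processes concurrently, but never more than maxProcesses.

5). maxDelays: the number of accumulated watched events that triggers a synchronization, even if the delay configured in the sync block has not yet elapsed.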
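
The interplay between delay and maxDelays can be modelled with a small batching class. This is a simplified model of the behaviour described above, not lsyncd's implementation; the class name EventBatcher and its methods are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
class EventBatcher:
    """Collect events and release them when either trigger fires."""

    def __init__(self, delay, max_delays):
        self.delay = delay            # seconds to wait after the first queued event
        self.max_delays = max_delays  # event count that forces an early sync
        self.pending = []
        self.first_event_at = None

    def add(self, now, path):
        """Queue an event; return the batch to sync if a trigger fired, else None."""
        if not self.pending:
            self.first_event_at = now
        self.pending.append(path)
        return self._flush_if_due(now)

    def tick(self, now):
        """Called periodically so the delay can expire without new events."""
        return self._flush_if_due(now)

    def _flush_if_due(self, now):
        if not self.pending:
            return None
        too_many = len(self.pending) >= self.max_delays
        too_old = now - self.first_event_at >= self.delay
        if too_many or too_old:
            batch, self.pending = self.pending, []
            return batch
        return None

batcher = EventBatcher(delay=15, max_delays=3)
print(batcher.add(0, "a"), batcher.add(1, "b"), batcher.add(2, "c"))
print(batcher.add(3, "d"), batcher.tick(10), batcher.tick(18))
```

The third event releases the batch after only two seconds because the count limit is reached; the lone fourth event has to wait until the 15-second delay has passed.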

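Global settings

Lines starting with -- are comments. The most common options are:
logfile: the log file
statusFile: the status file
statusInterval: how often lsyncd writes its status to the statusFile above, 10 seconds by default
nodaemon=true: do not run as a daemon
inotifyMode: the events inotify watches; the default is CloseWrite, and Modify or CloseWrite or Modify are also possible
maxProcesses: the maximum number of synchronization processes. If 20 files need to be synchronized at once and maxProcesses = 8, at most 8 rsync processes are visible
maxDelays: the number of accumulated watched events that triggers a synchronization, even if the delay has not yet elapsed.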
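
The effect of maxProcesses in the 20-files example can be imitated with a bounded worker pool. In this sketch threads stand in for rsync child processes and a short sleep stands in for the transfer; it illustrates the concurrency cap only and says nothing about how lsyncd schedules its children. The function name run_transfers is hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_transfers(paths, max_processes):
    """Run one fake transfer per path, never more than max_processes at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def transfer(path):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)  # remember the highest concurrency seen
        time.sleep(0.01)               # stand-in for the actual rsync run
        with lock:
            running -= 1
        return path

    with ThreadPoolExecutor(max_workers=max_processes) as pool:
        done = list(pool.map(transfer, paths))
    return done, peak

files = [f"file{i}" for i in range(20)]
done, peak = run_transfers(files, max_processes=8)
print(f"{len(done)} transfers finished, at most {peak} ran at the same time")
```

All 20 transfers complete, but the observed peak never exceeds 8; the remaining work simply queues until a slot becomes free.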

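sync
Defines the synchronization parameters; maxDelays can be set here again to override the global value from settings.

Operating modes
default.rsync: synchronization between local directories using rsync; it can also act like remote rsync over ssh, or connect to a remote rsyncd process in daemon mode.
default.direct: synchronization between local directories, using commands such as cp and rm to copy the differing files.
default.rsyncssh: synchronization to a directory on a remote host using rsync over ssh; key-based authentication is required.

Directory settings
source: the source directory to synchronize, as an absolute path.
target: the destination. The form depends on the mode:
/tmp/dest: a local directory, usable in direct and rsync modes
ipaddr:/tmp/dest: a directory on a remote server, usable in rsync and rsyncssh modes
excludeFrom: exclusion option, followed by a file listing what to exclude, e.g. excludeFrom = "/etc/lsyncd.exclude"; for simple exclusions, exclude = LIST can be used instead.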

Other rsync options
There are also options specific to the rsyncssh mode, such as host, targetdir, rsync_path and password_file.
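
To show how the settings and sync options fit together, the sketch below renders a small lsyncd.conf from Python dictionaries. The option names come from the text above; the log and status file paths are hypothetical, and placing bwlimit, compress and perms inside an rsync sub-table is an assumption about the configuration layout that should be checked against the lsyncd manual for the installed version.

```python
def lua_value(value):
    """Render a Python value as a Lua literal (bool, number, table or string)."""
    if isinstance(value, bool):  # must come before the number check
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, dict):
        inner = ", ".join(f"{k} = {lua_value(v)}" for k, v in value.items())
        return "{ " + inner + " }"
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def render_block(name, fields, first=None):
    """Render `name { ... }`, optionally starting with a bare entry such as a mode."""
    lines = [f"{name} {{"]
    if first:
        lines.append(f"    {first},")
    for key, value in fields.items():
        lines.append(f"    {key} = {lua_value(value)},")
    lines.append("}")
    return "\n".join(lines)

config = "\n\n".join([
    render_block("settings", {
        "logfile": "/var/log/lsyncd.log",        # hypothetical path
        "statusFile": "/var/log/lsyncd.status",  # hypothetical path
        "statusInterval": 10,
        "inotifyMode": "CloseWrite",
        "maxProcesses": 8,
    }),
    render_block("sync", {
        "source": "/tmp/src",
        "target": "/tmp/dest",
        "excludeFrom": "/etc/lsyncd.exclude",
        "rsync": {"bwlimit": 2000, "compress": True, "perms": True},
    }, first="default.rsync"),
])
print(config)
```

The printed text has one settings block followed by one sync block; further sync blocks with other modes can be appended in the same way, since each block is independent of the others.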

Latest version: 3.5


Project homepage: https://github.com/lsyncd/lsyncd