Open-source data compression tool: bzip2
2013-05-24 10:16:27

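bzip2 is a data compression algorithm and program developed by Julian Seward and released under a free/open-source software license. Seward first released it publicly in July 1996; over the following years the tool became more stable and increasingly popular. It is a file compressor based on a block-sorting algorithm and has gradually gained ground as an alternative to gzip. It produces considerably smaller compressed files, especially for source code and other structured text, at a cost of up to four times the memory and processor time. It is distributed under a BSD-style license. Tar archives compressed with bzip2 conventionally carry the .tar.bz2 extension.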
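The size claim is easy to check on your own data. The following is a minimal sketch using Python's standard bz2 and gzip modules, not the bzip2 command-line program itself. The sample text and the helper name compare_sizes are made up for illustration; results depend heavily on the input, so try it on real files before drawing conclusions.

```python
import bz2
import gzip


def compare_sizes(data: bytes) -> dict:
    """Compress the same bytes with gzip and bzip2 at their highest levels."""
    gz = gzip.compress(data, compresslevel=9)
    bz = bz2.compress(data, compresslevel=9)
    # Both formats are lossless, so decompression must restore the input.
    if gzip.decompress(gz) != data or bz2.decompress(bz) != data:
        raise ValueError("round trip failed")
    return {"original": len(data), "gzip": len(gz), "bzip2": len(bz)}


# Hypothetical sample: repetitive, source-code-like text (not from the article).
sample = "".join(
    f'int field_{i} = compute_value({i}, "label_{i % 7}");\n' for i in range(20000)
).encode("utf-8")

for name, size in compare_sizes(sample).items():
    print(f"{name:>8}: {size} bytes")
```

The compresslevel argument (1 to 9) plays the same role as the -1 to -9 flags of the command-line tools.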


bzip2 is a freely available, patent free, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.

Why would I want to use it?

Because it compresses well. So it packs more stuff into your overfull disk drives, distribution CDs, backup tapes, USB sticks, etc. And/or it reduces your customer download times, long distance network traffic, etc. It's not the world's fastest compressor, but it's still fast enough to be very useful.

Because it's open-source (BSD-style license), and, as far as I know, patent-free. (To the best of my knowledge. I can't afford to do a full patent search, so I can't guarantee this. Caveat emptor). So you can use it for whatever you like. Naturally, the source code is part of the distribution.

Because it supports (limited) recovery from media errors. If you are trying to restore compressed data from a backup tape or disk, and that data contains some errors, bzip2 may still be able to decompress those parts of the file which are undamaged.

Because you already know how to use it. bzip2's command line flags are similar to those of GNU Gzip, so if you know how to use gzip, you know how to use bzip2.

Because it's very portable. It should run on any 32 or 64-bit machine with an ANSI C compiler. The distribution should compile unmodified on Unix and Win32 systems. Earlier versions have been ported with little difficulty to a large number of weird and wonderful systems.

Compression efficiency

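bzip2 compresses more effectively than the traditional gzip or ZIP, but it is slower. In this respect it resembles other recent compression algorithms. Unlike RAR or ZIP, bzip2 is only a data compressor, not an archiver; in this it is like gzip. The program itself has no facilities for handling multiple files, encryption, or archive splitting. Following the UNIX tradition, those tasks are left to external tools such as tar and GnuPG.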
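The division of labour between tar (bundling) and bzip2 (compression) can be sketched with Python's standard tarfile module, which writes a tar stream through a bzip2 compressor when opened in "w:bz2" mode. The helper names make_tar_bz2 and list_tar_bz2 and the demo file names are hypothetical; this is an illustration, not a replacement for the tar and bzip2 programs.

```python
import os
import tarfile
import tempfile


def make_tar_bz2(archive_path, file_paths):
    """Bundle several files into one tar stream compressed with bzip2."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        for path in file_paths:
            # Store only the base name so the archive has no absolute paths.
            tar.add(path, arcname=os.path.basename(path))


def list_tar_bz2(archive_path):
    """Return the sorted member names of a .tar.bz2 archive."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        return sorted(tar.getnames())


# Demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    paths = []
    for name in ("a.txt", "b.txt"):
        path = os.path.join(workdir, name)
        with open(path, "w") as handle:
            handle.write(name * 100)
        paths.append(path)
    archive = os.path.join(workdir, "demo.tar.bz2")
    make_tar_bz2(archive, paths)
    print(list_tar_bz2(archive))
```

Encryption is likewise out of scope here, just as it is for bzip2 itself; a separate tool such as GnuPG would be applied to the finished archive.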

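In some cases bzip2 falls short of the 7z and RAR formats in absolute compression efficiency. As Moore's law continues to make computing time cheaper and less of a concern, compression methods of this kind have become increasingly popular. According to the author, bzip2 typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), while being roughly twice as fast at compression and six times faster at decompression.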

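Originally, bzip, the predecessor of bzip2, applied arithmetic coding after the block sort. Arithmetic coding was later dropped because of software patent restrictions.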
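The block sort in question is the Burrows-Wheeler transform, which rearranges a block so that identical characters tend to cluster, making the later entropy-coding stage (Huffman coding in bzip2) more effective. Below is a deliberately naive sketch of the transform and its inverse. It is not bzip2's implementation: the real program works on byte blocks of 100 to 900 kB with a far more efficient sort, and the sentinel character used here is an assumption made to keep the example short.

```python
def bwt(text, sentinel="\0"):
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    if sentinel in text:
        raise ValueError("input must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="\0"):
    """Invert the transform by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana")
print(repr(transformed))
print(inverse_bwt(transformed))
```

Both functions are quadratic or worse in memory and time, which is acceptable for a demonstration but not for real blocks.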


Parallel compression based on bz2: lbzip2

Parallel compression based on bz2: pbzip2


Latest version: 1.0
Removes a potential security vulnerability, CVE-2010-0405, so all users are recommended to upgrade immediately.

Official homepage: http://bzip.org/