Exploring Block Sizes on Linux Block Devices - FreeOA

2010-08-04 22:56:17

阿炯

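The term "block size" comes up constantly in Linux administration, but it does not always mean the same thing: both the meaning and the value vary with context. The notes below summarize the cases. Traditionally a Linux "block" is 1024 bytes, the basic unit of the kernel buffer cache. Filesystem blocks are a different matter: ext3, for example, typically uses a block size of 4096. For a partition that holds a filesystem, tune2fs or dumpe2fs prints the filesystem parameters, including the block size. For example (this particular partition was formatted with 1024-byte blocks):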
# tune2fs -l /dev/sda1 |grep "Block"
Block count:131072
Block size:1024
Blocks per group:8192

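None of these concepts is difficult in itself; the confusion comes from the fact that they all go by the same name, "block size".
1). The block size of the disk itself is properly called the "sector size"; a sector is traditionally 512 bytes.
2). The block size of a partition that holds a filesystem is the filesystem "block size", sometimes shown as bsize. It varies and can be inspected with the filesystem tools.
3). For a partition without a filesystem, "block size" refers to 1024-byte units.
4). The block size of the kernel buffer cache is also just called "block size"; on most PCs it is 1024 bytes.
5). The "cylinder size" of a partitioned disk, which fdisk reports as a number of sectors (see item 1).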

The fdisk output below helps to tell these concepts apart:
Disk /dev/hda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot    Start       End    Blocks   Id System
/dev/hda1   *         1      1305 10482381   83 Linux
/dev/hda2          1306      1566   2096482+ 82 Linux swap
/dev/hda3          1567     30401 231617137+ 83 Linux

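8225280 is the cylinder size in bytes (255 heads * 63 sectors/track * 512 bytes), and the disk has 30401 cylinders. Start and End give the first and last cylinder of each partition. The Blocks column is the partition size in 1024-byte blocks. Why does "2096482+" carry a "+"? Because the partition size is not an exact multiple of 1024 bytes: the partition has an odd number of 512-byte sectors, so the figure is rounded down and the "+" means "2096482 and a bit".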
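The Blocks column can be reproduced from the cylinder numbers. The following is a minimal sketch, not fdisk's own code: the function name fdisk_blocks is hypothetical, and the hda1 line assumes the first partition starts at sector 63 (after the first track), which was the usual convention for this kind of layout.

```shell
# Hypothetical helper: print a size given in 512-byte sectors the way the
# Blocks column does (1024-byte blocks, "+" if half a block is left over).
fdisk_blocks() {
    sectors=$1
    blocks=$((sectors / 2))
    if [ $((sectors % 2)) -eq 1 ]; then
        echo "${blocks}+"
    else
        echo "${blocks}"
    fi
}

# /dev/hda2 spans cylinders 1306..1566; each cylinder is 16065 sectors.
hda2_sectors=$(( (1566 - 1306 + 1) * 16065 ))
fdisk_blocks "$hda2_sectors"    # prints 2096482+

# /dev/hda1 spans cylinders 1..1305, minus the assumed 63-sector first track.
hda1_sectors=$(( 1305 * 16065 - 63 ))
fdisk_blocks "$hda1_sectors"    # prints 10482381
```

Both results match the listing above, which supports the reading that "+" marks an odd sector count.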

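A block is an abstraction of the filesystem, not a property of the disk, and is normally a multiple of the sector size. The sector size is a physical property of the disk: it is the smallest unit the device can address. The kernel additionally requires Block_Size = Sector_Size * 2^n and Block_Size <= Page_Size (the memory page size).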
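The two constraints can be written down as a small check. This is only a sketch of the rule as stated here, not kernel code; the function name is_valid_block_size is hypothetical, and the sector and page sizes are passed in explicitly rather than queried from the system.

```shell
# Hypothetical check: succeed if block = sector * 2^n and block <= page.
is_valid_block_size() {
    block=$1
    sector=$2
    page=$3
    [ "$block" -le "$page" ] || return 1
    [ $((block % sector)) -eq 0 ] || return 1
    ratio=$((block / sector))
    # A power of two has exactly one bit set, so ratio & (ratio - 1) is 0.
    [ "$ratio" -ge 1 ] && [ $((ratio & (ratio - 1))) -eq 0 ]
}

# Try a few candidates against 512-byte sectors and a 4096-byte page.
for size in 512 1024 1536 4096 8192; do
    if is_valid_block_size "$size" 512 4096; then
        echo "$size: ok"
    else
        echo "$size: rejected"
    fi
done
```

Here 1536 is rejected because it is 3 sectors (not a power of two), and 8192 because it exceeds the assumed page size.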

Block and Inode

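On Unix/Linux, directories and files have permissions, attributes, and data of their own. The metadata (permissions and attributes) is stored in the inode. Every file has exactly one inode, which records the numbers of the blocks that hold the file's data, while the data itself is stored in blocks. The superblock records information about the filesystem as a whole, including the total, used, and free counts of inodes and blocks. Once a file's inode number is known, the blocks it uses can be found, so the inode serves as the file's index.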

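About inodes:
The inode size is fixed when the filesystem is created: 128 bytes by default on ext2/ext3, 256 bytes by default on ext4
Each file occupies exactly one inode
The number of files a filesystem can hold is bounded by the number of inodes
To read a file, the system first locates its inode and checks the permissions recorded there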

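About blocks:
With 512-byte sectors and the common 4 KiB block size, eight consecutive sectors make up one block
The block is the smallest unit of file access: 512*8 = 4096 bytes = 4 KiB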

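File access flow

When a user accesses a file, the system looks up the corresponding inode and uses its contents to decide whether access is permitted. If it is, the data blocks are accessed directly; if not, access is denied.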

Here is an English-language explanation of block size on Linux.

Block size on Linux

In theory, any block size could be used. Most devices use 512-byte blocks, and some, particularly large HDDs, use 4096-byte blocks. Optical media typically use 2048-byte blocks.

The important point is that the block device controller knows nothing about the filesystem on the device. It can only read and write blocks, in its own block size, to its medium. The block device driver builds on this to present the device to the kernel as essentially a single, large byte array. How it is partitioned, or which filesystem uses it, does not matter.

The filesystem block size is the block size in which the filesystem's data structures are organized. It is an internal feature of the filesystem; there is not even a requirement to use block-oriented data structures, and some filesystems do not.

Pages on Intel x86 are 4 KiB in size and 4 KiB aligned, while some other architectures, such as IA-64, can use 8 KiB pages. Data on disk is aligned to 512 bytes (in the common case of 512-byte sectors). These alignments exist to speed up loading files and mapping them into memory.

Ext4 most typically uses 4096-byte blocks, while the disk in most cases has 512-byte sectors.

Furthermore, disk I/O data is typically not handled directly by processes but goes through the virtual memory subsystem of the OS, which makes extensive use of paging. The VM page size is typically 4096 bytes (it may differ on non-x86 CPUs) and is determined by the CPU architecture. (For example, newer amd64 CPUs can handle 2 MB pages, and DEC Alpha used 8192-byte pages.)

To optimize data I/O, it is best if all these sizes are multiples of each other, and better still if they are equal. In practice this means: use 4096-byte filesystem blocks.

Alignment matters as well: if the block device is partitioned, the partitions should begin and end on page-size boundaries. Otherwise, for example if sda1 starts on sector 17 of sda, every page read or write has to be split into two read/write commands, because the filesystem blocks straddle the physical blocks.

In the most common scenario this means that every partition should start and end on a sector number divisible by 8 (4096/512 = 8). The page size will likely be a multiple of the block size.

Note that low-level block I/O typically does not happen one block at a time; multiple blocks are sent or received in a single command. Reorganizing data is also usually cheap, because memory I/O is much faster than block device I/O. So ignoring these rules does not cost very much.

If the explanation above was hard to follow, do not worry: in September 2022 the author revisited Blocks and Block size on Linux from several other angles.

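How many bytes a character occupies depends on the encoding, and different characters can take different numbers of bytes.

ASCII: an English letter (upper or lower case) takes one byte. ASCII cannot represent Chinese characters; in GB2312/GBK a Chinese character takes two bytes. A byte is a sequence of binary digits handled as one unit, normally 8 bits, with a minimum value of 0 and a maximum of 255 in decimal. One ASCII character is one byte.
UTF-8: an English character takes one byte; a Chinese character (simplified or traditional) usually takes three bytes.
Unicode (UTF-16): an English character takes two bytes; a Chinese character (simplified or traditional) usually takes two bytes.

Punctuation: English punctuation takes one byte, while Chinese punctuation takes two bytes in GBK and three in UTF-8. For example, the English full stop "." is 1 byte and the Chinese full stop "。" is 2 bytes in GBK.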

First, a roundup of the Linux commands that report the size and usage of disks and partitions.

df is the obvious first choice:

df --total -TH --exclude-type=tmpfs | awk '{print $3}' | tail -n 1
468G

df -k | grep /dev/sda
This reports the size in KB, together with the used and available space.

On ext2 and later filesystems, dumpe2fs can be used:
dumpe2fs /dev/sdxN|grep '^Free blocks'

or the faster tune2fs:
tune2fs -l /dev/sdxN|grep '^Free blocks:'

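hdparm with the -I switch displays identification information requested directly from the drive. This includes the size of the drive and almost everything else of interest. To show only the size of the drive: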
hdparm -I /dev/sdX | grep "device size"

device size with M = 1024*1024:     1907729 MBytes
device size with M = 1000*1000:     2000398 MBytes (2000 GB)

Warning: hdparm requires root privileges and is a potentially destructive tool. The -I switch is quite safe, but other switches are not necessarily so.

Several other commands provide this information. Not all of them are installed by default, but all are readily available on any GNU/Linux distribution:

lsblk -do NAME,SIZE /dev/sd?
NAME      SIZE
sda       75G
sdb       200G

lsblk --bytes --list
This produces informative, unambiguous, and parseable output.

lshw | grep -A 15 disk | grep size
size: 465GiB (500GB)

fdisk -l | grep "^Disk /" | gawk '{print $3,$4}'
500.1 GB,

cfdisk /dev/sda

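Or the stat command. Note that stat -f reports on the filesystem that contains the named file, so for a device node such as /dev/sda it describes the tmpfs mounted on /dev (see the Type field), not the disk itself: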

stat -f /dev/sda
File: "/dev/sda"
    ID: 0        Namelen: 255     Type: tmpfs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 2048       Free: 2048       Available: 2048
Inodes: Total: 16384      Free: 16082

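Reading from the runtime virtual filesystems (/proc or /sys)

/sys/block/sda/size
This file contains a number such as 312581808. Multiply it by the standard block size of 512 to get the size in bytes as a 64-bit integer, then convert to GiB or TiB as needed. Another source is /proc/partitions: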
more /proc/partitions
major minor #blocks name
   8     0 120627360 sda
   8     1 120624021 sda1
   8    16 120627360 sdb
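The two kernel-reported units above (512-byte sectors in /sys/block/sda/size, 1 KiB blocks in /proc/partitions) convert to bytes as sketched below. The helper names are hypothetical, the input numbers are the ones quoted in this article, and the arithmetic assumes a shell with 64-bit integers (bash, or dash on a 64-bit system).

```shell
# Hypothetical helpers for the two kernel-reported units.
sectors_to_bytes() { echo $(( $1 * 512 )); }     # /sys/block/<disk>/size
kblocks_to_bytes() { echo $(( $1 * 1024 )); }    # /proc/partitions #blocks
bytes_to_gib()     { echo $(( $1 / 1024 / 1024 / 1024 )); }   # rounds down

# The example value from /sys/block/sda/size.
bytes=$(sectors_to_bytes 312581808)
echo "$bytes bytes, about $(bytes_to_gib "$bytes") GiB"

# The #blocks value of sda from /proc/partitions.
bytes=$(kblocks_to_bytes 120627360)
echo "$bytes bytes, about $(bytes_to_gib "$bytes") GiB"
```

On a live system the literal numbers would be replaced by the contents of the corresponding files.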

The blockdev command reports the block size of a given partition:
blockdev --getbsz /dev/sda6
4096

How many KB is one block on Linux?

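A block is the smallest allocation unit of the filesystem; note that it belongs to the operating system, not the hardware. Its size is whatever was chosen when the filesystem was created, much like a cluster on Windows. It is set at format time and depends on the filesystem; the usual default is 4 KiB, which is at least the default for ext3. The block size also limits the maximum filesystem size: with 4 KiB blocks ext3 tops out at 16 TiB, that is 4 KiB x 2^32, and with 1 KiB blocks the limit shrinks to 4 TiB.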
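The limit is simply the block size multiplied by the number of blocks a 32-bit block number can address. A minimal sketch of that arithmetic follows; the function name max_fs_tib is hypothetical, and the calculation needs a shell with 64-bit integer arithmetic.

```shell
# Hypothetical helper: largest filesystem, in TiB, addressable with
# 32-bit block numbers at a given block size (in bytes).
max_fs_tib() {
    block_size=$1
    max_bytes=$(( block_size * 4294967296 ))     # block size * 2^32
    echo $(( max_bytes / 1099511627776 ))        # bytes per TiB = 2^40
}

for bs in 1024 2048 4096; do
    echo "block size $bs: $(max_fs_tib "$bs") TiB"
done
```

This prints 4, 8, and 16 TiB for the three block sizes, matching the 4 TiB and 16 TiB figures quoted for ext3.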

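In other contexts a Linux "block" defaults to 512 bytes: this is the unit of the kernel's sector counts and of the block counts that stat reports for a file.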

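Every filesystem divides its partition into blocks in which files and parts of files are stored, which is why different filesystems show different block sizes, as seen here (stat -f on /dev/sda1 describes the tmpfs holding the device node, while /boot is an ext2/ext3 filesystem):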

stat -f /dev/sda1
File: "/dev/sda1"
    ID: 0        Namelen: 255     Type: tmpfs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 128237     Free: 128237     Available: 128237
Inodes: Total: 128237     Free: 127912

stat -f /boot
File: "/boot"
    ID: db28d01327503824 Namelen: 255     Type: ext2/ext3
Block size: 1024       Fundamental block size: 1024
Blocks: Total: 198337     Free: 68998      Available: 58758
Inodes: Total: 51200      Free: 50853

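1. The difference between the Free and Available counts comes from the blocks reserved for the root user.
2. Use env to make sure the shell's builtin stat command, which may or may not support all the options used, is not invoked.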

root@crux:/tmp# stat -f .
File: "."
    ID: 80200000000 Namelen: 255     Type: xfs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 4032000    Free: 2003084    Available: 2003084
Inodes: Total: 8069120    Free: 7914521

So a file stored on this filesystem is stored in 4096-byte blocks: even if the file contains only 5 bytes, 4096 bytes are deducted from the disk capacity.

root@crux:/tmp# df .
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root       16128000 8115664   8012336 51% /

root@crux:/tmp# echo '123' > a.txt
root@crux:/tmp# df .
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root       16128000 8115688   8012312 51% /

# du -csh a.txt
4.0K   a.txt
4.0K   total

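1. The Blocks value that stat prints for a file is normally in units of 512 bytes, the size of one sector.
2. A file of only a few bytes still occupies one whole filesystem block (Block size), usually 4096 bytes.
3. The system normally reads a whole Block size at a time, not a single sector.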
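The rounding behind point 2 can be sketched as follows. The function name allocated_bytes is hypothetical, and the sketch ignores filesystem metadata and any feature that stores very small files inline; it only shows the round-up to whole blocks.

```shell
# Hypothetical helper: space consumed by a file of a given size when
# the filesystem allocates whole blocks.
allocated_bytes() {
    size=$1
    block_size=$2
    blocks=$(( (size + block_size - 1) / block_size ))   # round up
    echo $(( blocks * block_size ))
}

allocated_bytes 4 4096       # the 4-byte a.txt above: prints 4096
allocated_bytes 4096 4096    # exactly one block: prints 4096
allocated_bytes 4097 4096    # one byte more needs two blocks: prints 8192
```

The first result corresponds to the 4.0K that du reports for a.txt.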

stat -f -c %a /var   # Find the number of free blocks on /var
stat -f -c %S /var   # Determine the block size

A custom stat output format (-c) can therefore be used to get the free space (%a), counted in blocks, on /:
env stat -f -c %a /
1711744

Back to the output of df:
df
Filesystem      1K-blocks       Used Available Use% Mounted on
/dev/sda3      1950359084 1333202556 617156528  69% /

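The 1K-blocks column is the total space, in units of 1 KiB. According to the POSIX standard, df should report space in 512-byte blocks; older versions of Unix used 512-byte blocks in their filesystems.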

The 1K block in GNU coreutils df(1) means 1024 bytes. Confirmed by taking a quick look at GNU coreutils, version 8.13, source code:
  if (human_output_opts == -1)
    {
      if (posix_format)
        {
          human_output_opts = 0;
          output_block_size = (getenv ("POSIXLY_CORRECT") ? 512 : 1024);
        }
      else
        human_options (getenv ("DF_BLOCK_SIZE"),
                       &human_output_opts, &output_block_size);
    }

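As shown, in GNU coreutils 8.13 the default output block size of df is 1024 bytes; only when the environment variable POSIXLY_CORRECT is set does it use 512-byte blocks (DF_BLOCK_SIZE can also override the unit). This is purely a difference in how the figures are displayed.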

1K-blocks – the size of the filesystem, measured in 1K blocks.
Used – the amount of space used in 1K blocks.
Available – the amount of available space in 1K blocks.
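Since the unit is only a matter of display, a 1K-blocks figure can be re-expressed in 512-byte POSIX blocks or in bytes with plain arithmetic. The helper names below are hypothetical and the input is the /dev/sda3 total from the df output above; a shell with 64-bit integers is assumed.

```shell
# Hypothetical helpers: re-express a df "1K-blocks" figure in other units.
kblocks_to_posix_blocks() { echo $(( $1 * 2 )); }     # 512-byte blocks
kblocks_to_bytes()        { echo $(( $1 * 1024 )); }

total=1950359084    # the 1K-blocks value of /dev/sda3 above
echo "512-byte blocks: $(kblocks_to_posix_blocks "$total")"
echo "bytes:           $(kblocks_to_bytes "$total")"
```

The first line is the number df would show for the same filesystem when reporting in 512-byte blocks.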

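The block size is the basic unit of the filesystem: every read and write is done in whole multiples of it. It is also the smallest unit in which a file is allocated on disk. If the block size were 16 bytes, a 16-byte file would occupy exactly one whole block.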

The book "Practical file system design" states:
Block: The smallest unit writable by a disk or file system. Everything a file system does is composed of operations done on blocks. A file system block is always the same size as or larger (in integer multiples) than the disk block size.

The filesystem block size determines how the disk surface is mapped: the smaller the individual block, the larger the number of blocks (and therefore of entries in the table that tracks file allocation). The stat command exposes the relevant fields:
%b   number of blocks allocated (see %B)
%B   the size in bytes of each block reported by %b
%s   total size, in bytes

The file "a.txt" created earlier contains 4 bytes: "123" and a newline character.

# stat -c "%b %B %s" a.txt
8 512 4

There are 8 blocks allocated, each 512 bytes in size, 4096 bytes in total: one filesystem block.
This is the smallest amount of space the filesystem tracks.

sysfs records the size of every device and partition the operating system has discovered, for example the size of the whole disk in /sys/block/sda/size:

more /sys/class/block/sda/size
This gives you its size in 512-byte blocks.

A block is a sequence of bits or bytes with a fixed length, for example 512 bytes, 4 kB, 8 kB, 16 kB, or 32 kB.

/proc/partitions gives similar information in 1K-sized blocks.

Excerpt from the blockdev man page:
--getsize64   Print device size in bytes.
--getsize   Print device size (32-bit!) in sectors. Deprecated in favor of the --getsz option.
--getsz   Get size in 512-byte sectors.

That is, the value of blockdev --getsize64 equals 512 * the value of blockdev --getsz.

Using blockdev to get the size of a block device or partition:
blockdev --getsize64 <dev> returns the device size in bytes;
blockdev --getsz <dev> returns the device size in 512-byte sectors;
blockdev --getsize <dev> (deprecated) returns the device size in sectors as a 32-bit value.

For example, blockdev --getsize64 /dev/sda returns the size of sda in bytes. Note that --getsz and the deprecated --getsize are not the same option.

BLKSSZGET (blockdev --getss) returns the logical sector size,
BLKPBSZGET (blockdev --getpbsz) returns the physical sector size, and
BLKBSZGET (blockdev --getbsz) returns the block size the kernel uses for the device.

Getting the size of the sda block device in logical sectors:
echo $(($(blockdev --getsize64 /dev/sda)/$(blockdev --getss /dev/sda)))
3907029168

Using bc for the calculation:
echo "`cat /sys/class/block/sda2/size`*512" | bc

With bash or any other POSIX-like shell whose arithmetic handles 64-bit integers, bc is not even needed:
echo "$((512*$(cat /sys/class/block/sda2/size)))"

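This gives the size in bytes. The fork for cat can be optimized away (except in bash, which still forks) with a form that also works in ksh93 and zsh: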
echo "$((512*$(</sys/class/block/sda2/size)))"

When the logical sector size is 512 bytes, $(($(blockdev --getsize64 /dev/sda)/$(blockdev --getss /dev/sda))) equals $(blockdev --getsz /dev/sda).
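The relationship between the byte size and the two sector counts can be checked with plain arithmetic, without touching a real device. In this sketch the function name sector_count is hypothetical, and the byte size is derived from the sda sector count shown above; the 4096-byte case is an assumed device with 4 KiB logical sectors, included only to show where the two counts diverge.

```shell
# Hypothetical helper: number of sectors of a given size in a device.
sector_count() { echo $(( $1 / $2 )); }

bytes=$(( 3907029168 * 512 ))    # byte size implied by the sda example above
sector_count "$bytes" 512        # prints 3907029168, the --getsz value
sector_count "$bytes" 4096       # prints 488378646, one eighth of that
```

With 4096-byte logical sectors, dividing --getsize64 by --getss would give the second number, while --getsz would still report the first.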

blockdev --getbsz partition
blockdev --getbsz /dev/sda1
4096

So the block size for this device is 4 kB. The ioctl BLKGETSIZE has the same ambiguity, because it counts in units of 512 bytes rather than in units of BLKSSZGET. BLKGETSIZE64 removes the ambiguity: the real number of logical sectors is BLKGETSIZE64/BLKSSZGET.

Block size usually refers to the filesystem block size. In general, Linux filesystems default to a block size of 4096 bytes (4 KB). Even a file of just 10 bytes occupies one block, that is, a whole 4096-byte block.

How the Linux kernel reports partition and sector sizes

Sector information is defined in the kernel document block/stat.txt.

With the patch:
diff --git a/Documentation/ABI/testing/sysfs-block
+What:      /sys/block/<disk>/<partition>/size
+Date:      October 2002
+Contact:   linux-block@vger.kernel.org
+Kernel Version:    2.5.43
+Description:
+       Size of the partition in standard UNIX 512-byte sectors
+       (not a device-specific block size).

The comment in the Linux source makes it clear: Linux always treats a sector as 512 bytes, regardless of the device's actual block size.

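Querying the size of a block device from a developer's point of view
No ioctl is needed in C. Seek to the end of the device file and read the resulting offset to get the size in bytes:
/* define this before any #includes when dealing with large files: */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(void)
{
    /* Opening the device node usually requires root privileges. */
    int fd = open("/dev/sda", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    /* size is the size of the device in bytes, or -1 on error. */
    off_t size = lseek(fd, 0, SEEK_END);
    if (size < 0) { perror("lseek"); close(fd); return 1; }
    printf("%lld bytes\n", (long long)size);
    /* lseek(fd, 0, SEEK_SET) would return to the start of the device. */
    close(fd);
    return 0;
}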
