Memory Usage Diagnostic Tool: Valgrind


2013-12-10 17:10:56

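Valgrind is a runtime diagnostic tool. It monitors the activity of a program you specify and reports the various memory management problems that may exist in your code. It resembles the older Electric Fence tool (which replaces the standard memory allocation functions with its own to improve diagnostics), but it is generally considered easier to use and offers richer functionality in several respects. Most mainstream Linux distributions now ship it, so getting started takes little time: you only need to install its package.

It is developed in C/C++ and licensed under the GPLv2.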

Valgrind is an instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. You can also use Valgrind to build new tools.

The Valgrind distribution currently includes six production-quality tools: a memory error detector, two thread error detectors, a cache and branch-prediction profiler, a call-graph generating cache and branch-prediction profiler, and a heap profiler. It also includes three experimental tools: a stack/global array overrun detector, a second heap profiler that examines how heap blocks are used, and a SimPoint basic block vector generator. It runs on the following platforms: X86/Linux, AMD64/Linux, ARM/Linux, PPC32/Linux, PPC64/Linux, S390X/Linux, MIPS32/Linux, MIPS64/Linux, ARM/Android (2.3.x and later), X86/Android (4.0 and later), X86/Darwin and AMD64/Darwin (Mac OS X 10.7, with limited support for 10.8).

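One of the biggest headaches in C/C++ development is memory management. Tracking down a single memory leak or out-of-bounds access can sometimes take days, so a tool that helps with this job is very welcome, and Valgrind is exactly such a tool. It is a simulation-based suite of program debugging and profiling tools for Linux, running on architectures such as x86, amd64, and ppc32. valgrind consists of a core, which provides a virtual CPU on which the program runs, and a set of tools that perform debugging, profiling, and similar tasks. valgrind is highly modular, so developers or users can add new tools without damaging the existing structure.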

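valgrind includes several standard tools:

1. memcheck
memcheck detects memory management problems in a program. It checks every read and write of memory and intercepts all calls to malloc/new/free/delete. As a result, memcheck can detect the following problems:
1) Use of uninitialised memory
2) Reading/writing memory that has already been freed
3) Reading/writing past the end of an allocated block
4) Reading/writing inappropriate areas of the stack
5) Memory leaks
6) Mismatched use of malloc/new/new[] and free/delete/delete[].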

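2. cachegrind
cachegrind is a cache profiler. It simulates the I1, D1, and L2 caches of the CPU, so it can pinpoint cache misses in your code quite precisely. On request it prints the number of cache misses and memory references, together with a summary for each line of code, each function, each module, and the whole program. If you need finer detail, it can print the miss count for each machine instruction. On x86 and amd64, cachegrind detects the machine's cache configuration automatically via CPUID, so in most cases no further configuration is needed.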

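3. helgrind
helgrind looks for data races in multithreaded programs. It searches for memory locations that are accessed by more than one thread without consistent use of a lock. Such locations are not synchronised between threads and are likely to cause timing-dependent bugs that are very hard to track down.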

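What valgrind does to your program
valgrind is designed to be non-intrusive. It works directly on the executable, so you do not need to recompile, relink, or modify your program before checking it. Checking a program is simple: just run the following command.

The command looks like this:
valgrind --tool=tool_name program_name

For example, to run a memory check on the command ls -l, just run:
valgrind --tool=memcheck ls -l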

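Whichever tool is used, valgrind always takes control of the program before it starts and reads debugging information from the executable and its associated libraries. The program then runs on the virtual CPU provided by the valgrind core: the core hands the code to the selected tool, the tool adds instrumentation to it and returns the result to the core, and the core finally runs the instrumented code. To check for memory leaks, just add --leak-check=yes, as in the following command: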
valgrind --tool=memcheck --leak-check=yes ls -l

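The amount of code added varies greatly from tool to tool. At one extreme, memcheck adds code to check every memory access and every computed value, which increases the code size at least 12 times and makes the program run 25 to 50 times slower than normal.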

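valgrind simulates every instruction the program executes, so the checking and profiling tools apply not only to your application but also to shared libraries, the GNU C library, and the X client libraries. The workflow is roughly as follows: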

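First, compile the program with debugging information enabled (the -g option of gcc). Without debugging information, even the best valgrind tool can only guess which function a particular piece of code belongs to. After compiling with debugging enabled, run the program under valgrind and it will give you a detailed report, for example showing which line of code leaked memory.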

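When checking a C++ program, consider one more option: -fno-inline. It keeps the function call chain clear, which reduces confusion when you browse a large C++ program. With this option, for example, checking OpenOffice with memcheck becomes much easier. You may never do that particular job, but the option still helps valgrind produce more accurate error reports with less clutter.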

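Some optimisation options (such as -O2 or higher) may cause memcheck to report uninitialised-value errors that do not really exist. To keep valgrind's reports accurate, it is best to compile without optimisation.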

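If the program is launched by a script, you can either modify the line in the script that starts the program or run the script with the --trace-children=yes option. Below is the report produced by checking ls -l with memcheck. Run the following command in a terminal: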
valgrind --tool=memcheck ls -l

The program prints the output of ls -l, followed by valgrind's report:

==4187==
==4187== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 19 from 2)
==4187== malloc/free: in use at exit: 15,154 bytes in 105 blocks.
==4187== malloc/free: 310 allocs, 205 frees, 60,093 bytes allocated.
==4187== For counts of detected errors, rerun with: -v
==4187== searching for pointers to 105 not-freed blocks.
==4187== checked 145,292 bytes.
==4187==
==4187== LEAK SUMMARY:
==4187== definitely lost: 0 bytes in 0 blocks.
==4187== possibly lost: 0 bytes in 0 blocks.
==4187== still reachable: 15,154 bytes in 105 blocks.
==4187== suppressed: 0 bytes in 0 blocks.
==4187== Reachable blocks (those to which a pointer was found) are not shown.
==4187== To see them, rerun with: --show-reachable=yes

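Here "4187" is the process ID of ls -l, which helps distinguish the reports of different processes. memcheck reports how much memory was allocated and freed, how much was leaked, how much is still reachable, and how many bytes it checked.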

Here are two more examples of memory checking with valgrind.

Example 1 (test.c):
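#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]){
    char *ptr;
    /* Allocate only 10 bytes. */
    ptr = (char*) malloc(10);
    strcpy(ptr, "01234567890"); /* deliberate bug: copies 12 bytes (11 chars + NUL) */

    return 0; /* deliberate bug: ptr is never freed */
}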

Compile the program:
gcc -g -o test test.c

Run it under valgrind:
valgrind --tool=memcheck --leak-check=yes ./test

The report is as follows:
==7015== Memcheck, a memory error detector.
==7015== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==7015== Using LibVEX rev 1606, a library for dynamic binary translation.
==7015== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==7015== Using valgrind-3.2.0, a dynamic binary instrumentation framework.
==7015== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==7015== For more details, rerun with: -v
==7015==
==7015== Invalid write of size 1
==7015== at 0x4006190: strcpy (mc_replace_strmem.c:271)
==7015== by 0x80483DB: main (test.c:8)
==7015== Address 0x4023032 is 0 bytes after a block of size 10 alloc'd
==7015== at 0x40044F6: malloc (vg_replace_malloc.c:149)
==7015== by 0x80483C5: main (test.c:7)
==7015==
==7015== Invalid write of size 1
==7015== at 0x400619C: strcpy (mc_replace_strmem.c:271)
==7015== by 0x80483DB: main (test.c:8)
==7015== Address 0x4023033 is 1 bytes after a block of size 10 alloc'd
==7015== at 0x40044F6: malloc (vg_replace_malloc.c:149)
==7015== by 0x80483C5: main (test.c:7)
==7015==
==7015== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 12 from 1)
==7015== malloc/free: in use at exit: 10 bytes in 1 blocks.
==7015== malloc/free: 1 allocs, 0 frees, 10 bytes allocated.
==7015== For counts of detected errors, rerun with: -v
==7015== searching for pointers to 1 not-freed blocks.
==7015== checked 51,496 bytes.
==7015==
==7015==
==7015== 10 bytes in 1 blocks are definitely lost in loss record 1 of 1
==7015== at 0x40044F6: malloc (vg_replace_malloc.c:149)
==7015== by 0x80483C5: main (test.c:7)
==7015==
==7015== LEAK SUMMARY:
==7015== definitely lost: 10 bytes in 1 blocks.
==7015== possibly lost: 0 bytes in 0 blocks.
==7015== still reachable: 0 bytes in 0 blocks.
==7015== suppressed: 0 bytes in 0 blocks.
==7015== Reachable blocks (those to which a pointer was found) are not shown.
==7015== To see them, rerun with: --show-reachable=yes

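From this report we can see that the process ID is 7015, that line 8 of test.c writes past the end of the allocated block (the invalid write happens inside strcpy), and that the 10 bytes allocated by malloc on line 7 are never freed and are therefore leaked.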

Example 2 (test2.c):
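#include <stdio.h>

int foo(int x){
    /* The branch below reads x. */
    if (x < 0) {
        printf("%d ", x);
    }
    return 0;
}

int main(int argc, char *argv[]){
    int x;
    /* Deliberate bug: x is never initialised, so the
       branch in foo() depends on an undefined value. */

    foo(x);
    return 0;
}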

Compile the program:
gcc -g -o test2 test2.c

Run a memory check with valgrind:
valgrind --tool=memcheck ./test2

The report is as follows:
==5619== Memcheck, a memory error detector.
==5619== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==5619== Using LibVEX rev 1606, a library for dynamic binary translation.
==5619== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==5619== Using valgrind-3.2.0, a dynamic binary instrumentation framework.
==5619== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==5619== For more details, rerun with: -v
==5619==
==5619== Conditional jump or move depends on uninitialised value(s)
==5619== at 0x8048372: foo (test2.c:5)
==5619== by 0x80483B4: main (test2.c:16)
==5619==
==5619== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 12 from 1)
==5619== malloc/free: in use at exit: 0 bytes in 0 blocks.
==5619== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==5619== For counts of detected errors, rerun with: -v
==5619== All heap blocks were freed -- no leaks are possible.

From this report we can see that the process ID is 5619, that foo is called from line 16 of test2.c, and that on line 5 of test2.c foo uses an uninitialised variable.

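valgrind has many more options. For details, see the valgrind man page and the online documentation on the official valgrind website.

Latest version: 3.9

Official homepage: http://valgrind.org/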