Gumbo: an HTML5 parsing library implemented in pure C
2013-08-26 09:44:48 阿炯

Gumbo is an HTML5 parsing library from Google, implemented in C with no external dependencies. It is released under the Apache license.

Gumbo - A pure-C HTML5 parser.

Gumbo is an implementation of the HTML5 parsing algorithm implemented as a pure C99 library with no outside dependencies. It's designed to serve as a building block for other tools and libraries such as linters, validators, templating languages, and refactoring and analysis tools.

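Features
Conforms to the HTML5 specification
Robust: tolerates malformed HTML markup
Simple API
Supports source locations and pointers back into the original text
Lightweight, with no external dependencies
Passes the html5lib-0.95 compatibility tests
Tested on more than 2.5 billion pages from Google's index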

Sample code:
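#include <stdio.h>
#include "gumbo.h"
int main(int argc, char** argv) {
  if (argc < 2) {  // argv[1] must hold the HTML string to parse
    fprintf(stderr, "Usage: %s <html string>\n", argv[0]);
    return 1;
  }
  GumboOutput* output = gumbo_parse(argv[1]);  // build the parse tree
  // Do stuff with output->root
  gumbo_destroy_output(&kGumboDefaultOptions, output);  // free the tree
  return 0;
}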

Latest version:


Project homepage: https://github.com/google/gumbo-parser