Regular Expression C Library: TRE - FreeOA

2024-09-26 11:29:28

阿炯

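Regular expressions are a powerful tool for text processing and pattern matching. TRE (The Regular Expression Library) is a feature-rich, POSIX-compliant regular expression library written in C and released under a 2-clause BSD license.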

The free and portable approximate regex matching library.

TRE is a lightweight, robust, and efficient POSIX compliant regexp matching library with some exciting features such as approximate (fuzzy) matching.

The matching algorithm used in TRE uses linear worst-case time in the length of the text being searched, and quadratic worst-case time in the length of the used regular expression.

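TRE has several strengths. First, its POSIX compliance gives consistent behavior across platforms and environments, so developers can use it in many kinds of projects without worrying about compatibility. Second, it is feature-rich: it supports a broad regular expression syntax and feature set, and handles everything from simple string matching to complex pattern recognition efficiently.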

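To meet the needs of a particular project, TRE can be extended with additional code, for example by adding custom regular expression functions or by tuning the matching algorithm for speed and accuracy. It can also be combined with other libraries and tools to build more complex text-processing pipelines. Bindings are available for Perl, Python, Haskell, and R. PCRE is another regular expression library.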

The TRE source code is structured roughly as follows:

xmalloc.c:
- Wrappers for the malloc() functions, for error generation and memory leak checking purposes.

tre-mem.c:
- A simple and efficient memory allocator.

tre-stack.c:
- Implements a simple stack data structure.

tre-ast.c:
- Abstract syntax tree (AST) definitions.

tre-parse.c:
- Regexp parser. Parses a POSIX regexp (with TRE extensions) into an abstract syntax tree (AST).

tre-compile.c:
- Compiles ASTs to ready-to-use regex objects. Comprised of two parts:
* Routine to convert an AST to a tagged AST. A tagged AST has appropriate minimized or maximized tags added to keep track of submatches.
* Routine to convert tagged ASTs to tagged nondeterministic state machines (TNFAs) without epsilon transitions (transitions on empty strings).

tre-match-parallel.c:
- Parallel TNFA matcher.
* The matcher basically takes a string and a TNFA and finds the leftmost longest match and submatches in one pass over the input string. Only the beginning of the input string is scanned until a leftmost match and longest match is found.
* The matcher cannot handle back references, but the worst case time consumption is O(l) where l is the length of the input string. The space consumption is constant.
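
The idea of following all NFA states in parallel during a single pass can be illustrated with a very small sketch. This is not TRE's implementation: the transition table below is a hypothetical hand-built, epsilon-free NFA for the pattern "ab*c", the names `transition`, `nfa` and `nfa_contains_match` are made up for this example, and the sketch only reports whether a match exists, without the tags TRE uses to track submatches and the leftmost longest match.

```c
#include <stddef.h>

/* One transition of a tiny epsilon-free NFA: from --ch--> to. */
struct transition { int from; char ch; int to; };

/* Hypothetical hand-built NFA for the pattern "ab*c":
   state 0 --a--> 1, state 1 --b--> 1, state 1 --c--> 2 (accepting). */
static const struct transition nfa[] = {
    {0, 'a', 1}, {1, 'b', 1}, {1, 'c', 2}
};
enum { NFA_START = 0, NFA_ACCEPT = 2 };

/* Returns 1 if some substring of text matches "ab*c", 0 otherwise.
   Every input character is examined exactly once. */
int nfa_contains_match(const char *text)
{
    unsigned active = 0;  /* bit i set means state i is currently active */
    size_t n = sizeof nfa / sizeof nfa[0];

    for (; *text != '\0'; text++) {
        unsigned next = 0;
        size_t i;

        active |= 1u << NFA_START;  /* a match may begin at any position */
        for (i = 0; i < n; i++) {
            /* Advance every active state that has a transition on *text. */
            if ((active & (1u << nfa[i].from)) && nfa[i].ch == *text)
                next |= 1u << nfa[i].to;
        }
        if (next & (1u << NFA_ACCEPT))
            return 1;
        active = next;
    }
    return 0;
}
```

The work per input character depends only on the size of the automaton, so the running time grows linearly with the input length, and the only state kept between characters is one bitmask.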

tre-match-backtrack.c:
- A traditional backtracking matcher.
* Like the parallel matcher, takes a string and a TNFA and finds the leftmost longest match and submatches. Portions of the input string may be (and usually are) scanned multiple times.
* Can handle back references. The worst case time consumption, however, is O(k^l) where k is some constant and l is the length of the input string. The worst case space consumption is O(l).
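
As a rough illustration of a pattern that needs back references, the sketch below uses the standard POSIX `regcomp()`/`regexec()` interface from `<regex.h>`, so it can be built without TRE installed; the header and link flags needed to build against TRE itself depend on the installation and are not shown. The helper name `bre_matches` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches pattern (POSIX basic regex syntax),
   0 if it does not, and -1 if the pattern fails to compile. */
int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, with the pattern `^\(ab*\)c\1$` (written `"^\\(ab*\\)c\\1$"` in C source), `\1` must repeat exactly what the first group matched, so "abbcabb" matches and "abbcab" does not.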

tre-match-approx.c:
- Approximate parallel TNFA matcher.
* Finds the leftmost and longest match and submatches in one pass over the input string. The match may contain errors. Each missing, substituted, or extra character in the match increases the cost of the match. A maximum cost for the returned match can be given. The cost of the found match is returned.
* Cannot handle back references. The space and time consumption bounds are the same as for the parallel exact matcher, but in general this matcher is slower than the exact matcher.
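
To make the notion of match cost concrete, here is a minimal sketch that counts missing, substituted, and extra characters between two plain strings using the classic edit-distance recurrence. It is not TRE's algorithm and does not involve regular expressions at all; the names `edit_cost` and `MAX_LEN` are hypothetical, and every edit is assumed to cost 1.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN 255  /* assumed upper bound on the length of b */

/* Minimum number of single-character insertions, deletions and
   substitutions needed to turn a into b, each costing 1.
   Returns -1 if b is longer than MAX_LEN. */
int edit_cost(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    int prev[MAX_LEN + 1], cur[MAX_LEN + 1];
    size_t i, j;

    if (lb > MAX_LEN)
        return -1;
    for (j = 0; j <= lb; j++)
        prev[j] = (int)j;              /* empty prefix of a: j insertions */
    for (i = 1; i <= la; i++) {
        cur[0] = (int)i;               /* empty prefix of b: i deletions */
        for (j = 1; j <= lb; j++) {
            int subst = prev[j - 1] + (a[i - 1] != b[j - 1]);
            int del = prev[j] + 1;
            int ins = cur[j - 1] + 1;
            int best = subst < del ? subst : del;
            cur[j] = best < ins ? best : ins;
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}
```

A caller that accepts a match only when `edit_cost(a, b) <= max_cost` mirrors, in a very simplified way, the maximum cost limit described above.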

regcomp.c:
- Implementation of the regcomp() family of functions as simple wrappers for tre_compile().

regexec.c:
- Implementation of the regexec() family of functions.
* The appropriate matcher is dispatched according to the features used in the compiled regex object.

regerror.c:
- Implements the regerror() function.
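
The way `regcomp()` and `regerror()` fit together can be sketched as follows. The example assumes the standard POSIX interface from `<regex.h>`, which TRE is described above as implementing; the helper name `compile_or_report` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Compiles pattern as a POSIX extended regex.
   Returns 0 on success. On failure, returns the nonzero regcomp() error
   code and writes a human-readable message into errbuf. */
int compile_or_report(regex_t *re, const char *pattern,
                      char *errbuf, size_t errbuf_size)
{
    int rc = regcomp(re, pattern, REG_EXTENDED);

    if (rc != 0)
        regerror(rc, re, errbuf, errbuf_size);
    return rc;
}
```

On success the caller owns the compiled regex and should release it with `regfree()` once it is no longer needed.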

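In practice, TRE has been used in many kinds of software projects, from text editors and network servers to database management systems and programming language interpreters. It is a powerful, POSIX-compliant regular expression library, and extending it with additional code where needed lets a project take full advantage of it and adapt it to specific requirements.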

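/* regex must already have been compiled with regcomp() */
const char *str = "your_string_here";
int ret = regexec(&regex, str, 0, NULL, 0);  /* 0 on a match, REG_NOMATCH otherwise */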
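
The fragment above leaves out compilation, submatch extraction, and cleanup. A fuller sketch of the same call sequence, again assuming the standard POSIX `<regex.h>` interface, might look like this; the function name `find_key_value` and the pattern are made up for illustration.

```c
#include <regex.h>
#include <stdio.h>

/* Finds the first "key=value" pair in text, where key is lowercase
   letters and value is digits, and copies both parts into the buffers.
   Returns 1 if a pair was found, 0 if not, -1 if compilation failed. */
int find_key_value(const char *text,
                   char *key, size_t key_size,
                   char *value, size_t value_size)
{
    regex_t re;
    regmatch_t m[3];  /* m[0]: whole match, m[1]: key, m[2]: value */
    int rc;

    if (regcomp(&re, "([a-z]+)=([0-9]+)", REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 3, m, 0);
    if (rc == 0) {
        /* rm_so and rm_eo are byte offsets into text. */
        snprintf(key, key_size, "%.*s",
                 (int)(m[1].rm_eo - m[1].rm_so), text + m[1].rm_so);
        snprintf(value, value_size, "%.*s",
                 (int)(m[2].rm_eo - m[2].rm_so), text + m[2].rm_so);
    }
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Called on the text "set width=120 now", this copies "width" and "120" into the two buffers. Passing 0 and NULL for the match array, as in the fragment above, only asks whether a match exists.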

Latest version: 0.9

Project home page:
https://laurikari.net/tre/

https://github.com/laurikari/tre