Onigmo: a C library regular expression engine


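Regular expressions are a powerful tool in software development for matching and processing text. Onigmo is a fork of Oniguruma that supports more advanced regular expression features. Oniguruma, the upstream project, was first released in January 2002, with development led by K. Kosako. It is distributed under the BSD-2-Clause license.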
Oniguruma is a modern and flexible regular expressions library. It encompasses features from different regular expression implementations that traditionally exist in different languages.
Character encoding can be specified per regular expression object.
Supported character encodings:
ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE, EUC-JP, EUC-TW, EUC-KR, EUC-CN, Shift_JIS, Big5, GB18030, KOI8-R, CP1251, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16
GB18030: contributed by KUBO Takehiro
CP1251: contributed by Byte
doc/SYNTAX.md: contributed by seanofw
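Onigmo has a number of strengths. It handles complex regular expression patterns efficiently and matches text quickly and accurately. It is useful in text editors, programming language interpreters, and any other application that needs to process text. Its flexibility lets developers tailor regular expression behavior to their needs, for instance through the options, encoding, and syntax chosen when a pattern is compiled. Its API also makes it straightforward to integrate with a variety of programming languages. Developers can extend Onigmo further with additional code, for example by adding support for a particular character set or encoding to meet the needs of different languages and regions, or by writing plugins that broaden the scenarios in which Onigmo can be used.
Example code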
#include <stdio.h>
#include <string.h>   /* strlen */
#include <onigmo.h>

int main(void) {
    OnigRegex regex;
    OnigErrorInfo einfo;
    OnigRegion* region;
    const char* pattern = "hello.*world";
    const char* text = "This is a test. hello everyone world!";
    const OnigUChar* str = (const OnigUChar*)text;
    const OnigUChar* end = str + strlen(text);
    int r;

    onig_init();

    /* Compile the pattern as UTF-8 with the default options and syntax. */
    r = onig_new(&regex, (const OnigUChar*)pattern,
                 (const OnigUChar*)pattern + strlen(pattern),
                 ONIG_OPTION_DEFAULT, ONIG_ENCODING_UTF8,
                 ONIG_SYNTAX_DEFAULT, &einfo);
    if (r != ONIG_NORMAL) {
        OnigUChar s[ONIG_MAX_ERROR_MESSAGE_LEN];
        onig_error_code_to_str(s, r, &einfo);
        fprintf(stderr, "ERROR: %s\n", (char*)s);
        onig_end();
        return 1;
    }

    /* Search the whole string, scanning forward from its start. */
    region = onig_region_new();
    r = (int)onig_search(regex, str, end, str, end, region, ONIG_OPTION_NONE);
    if (r >= 0) {
        /* beg[0] is the byte offset at which the whole match starts. */
        printf("Matched at position %d\n", (int)region->beg[0]);
    } else {
        printf("No match\n");
    }

    onig_region_free(region, 1);  /* 1: also free the region object itself */
    onig_free(regex);
    onig_end();
    return 0;
}
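The position printed by the example above is a byte offset into the subject string, not a character index. With a UTF-8 subject that contains non-ASCII text, the two can differ. The helper below is a minimal sketch of converting one to the other; `utf8_char_index` is a hypothetical name that is not part of Onigmo, and the sketch assumes the input is valid UTF-8 and that the offset falls on a character boundary.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Onigmo API.
   Counts the UTF-8 characters that start before byte_offset in s.
   UTF-8 continuation bytes have the bit pattern 10xxxxxx, so every
   byte that does not match that pattern begins a new character.
   Assumes s is valid UTF-8 and byte_offset is on a character boundary. */
size_t utf8_char_index(const char* s, size_t byte_offset) {
    size_t chars = 0;
    for (size_t i = 0; i < byte_offset; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            chars++;
        }
    }
    return chars;
}
```

After a successful search, a call such as `utf8_char_index(text, (size_t)region->beg[0])` would give the character index of the match. For pure ASCII text the result equals the byte offset.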
A second example, this time with capture groups:
#include <stdio.h>
#include <string.h>   /* strlen */
#include <onigmo.h>

int main(void) {
    OnigRegex regex;
    OnigErrorInfo einfo;
    OnigRegion* region;
    const char* pattern = "(hello) (world)";
    const char* text = "hello world";
    const OnigUChar* str = (const OnigUChar*)text;
    const OnigUChar* end = str + strlen(text);
    int r;

    onig_init();

    /* Compile the pattern as UTF-8 with the default options and syntax. */
    r = onig_new(&regex, (const OnigUChar*)pattern,
                 (const OnigUChar*)pattern + strlen(pattern),
                 ONIG_OPTION_DEFAULT, ONIG_ENCODING_UTF8,
                 ONIG_SYNTAX_DEFAULT, &einfo);
    if (r != ONIG_NORMAL) {
        OnigUChar s[ONIG_MAX_ERROR_MESSAGE_LEN];
        onig_error_code_to_str(s, r, &einfo);
        fprintf(stderr, "ERROR: %s\n", (char*)s);
        onig_end();
        return 1;
    }

    region = onig_region_new();
    r = (int)onig_search(regex, str, end, str, end, region, ONIG_OPTION_NONE);
    if (r >= 0) {
        printf("Matched at position %d\n", (int)region->beg[0]);
        /* Print the captured groups; beg[i]/end[i] are byte offsets of
           group i, and "%.*s" expects an int length. */
        printf("First captured group: %.*s\n",
               (int)(region->end[1] - region->beg[1]), text + region->beg[1]);
        printf("Second captured group: %.*s\n",
               (int)(region->end[2] - region->beg[2]), text + region->beg[2]);
    } else {
        printf("No match\n");
    }

    onig_region_free(region, 1);  /* 1: also free the region object itself */
    onig_free(regex);
    onig_end();
    return 0;
}
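The capture-group example above prints groups directly from the subject string using their begin and end offsets. When a group needs to be kept as its own string, it can be copied out instead. The following is a minimal sketch under stated assumptions: `copy_group` is a hypothetical helper, not an Onigmo function, and it takes plain `long` offsets so that it compiles without the library. It also treats a negative offset as "group did not participate in the match", which is how the region arrays mark unmatched groups (`ONIG_REGION_NOTPOS`).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Onigmo API.
   Copies the bytes text[beg, end) into out as a NUL-terminated string.
   beg and end are byte offsets such as region->beg[i] and region->end[i].
   Returns the number of bytes copied, or -1 if the group is unset
   (negative offset), the range is invalid, or out is too small. */
long copy_group(const char* text, long beg, long end,
                char* out, size_t out_size) {
    if (beg < 0 || end < beg) {
        return -1;
    }
    size_t len = (size_t)(end - beg);
    if (len + 1 > out_size) {
        return -1;
    }
    memcpy(out, text + beg, len);
    out[len] = '\0';
    return (long)len;
}
```

With the example above, `copy_group(text, (long)region->beg[1], (long)region->end[1], buf, sizeof buf)` would place the first group in `buf`. Checking the return value avoids reading from an unset group or overflowing the buffer.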
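In practice, Onigmo is used in many projects, where it makes text processing more efficient and accurate and saves development time and effort. As a capable regular expression engine, it gives software projects solid support, and extending it with additional code where needed lets it cover a wide range of complex text-processing requirements.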
Latest version (Oniguruma): 6.9
v6.9.9 was released in October 2023.
Project homepage (Oniguruma):
https://github.com/kkos/oniguruma