Perl Thrift 操作HBase入门
2015-10-16 11:55:16 阿炯
-----------------------------
Apache Thrift - 可伸缩的跨语言服务开发框架,它是 Facebook 实现的一种高效的、支持多种编程语言的远程服务调用的框架。
目前流行的服务调用方式有很多种,例如基于 SOAP 消息格式的 Web Service,基于 JSON 消息格式的 RESTful 服务等。其中所用到的数据传输方式包括 XML,JSON 等,然而 XML 相对体积太大,传输效率低,JSON 体积较小,新颖,但还不够完善。由 Facebook 开发的远程服务调用框架 Apache Thrift,它采用接口描述语言定义并创建服务,支持可扩展的跨语言服务开发,所包含的代码生成引擎可以在多种语言中,如 C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk 等创建高效的、无缝的服务,其传输数据采用二进制格式,相对 XML 和 JSON 体积更小,对于高并发、大数据量和多语言的环境更有优势。
本文简单地介绍在perl中使用thrift来操作hbase的简单方法。
-----------------------------
Perl环境下安装Thrift过程中的问题
提示缺少yacc、flex命令
编译时抛出错误信息如下:
/bin/sh ../../ylwrap `test -f ¨src/thrifty.yy¨ || echo ¨./¨`src/thrifty.yy y.tab.c thrifty.cc y.tab.h thrifty.h y.output thrifty.output -- yacc -d
../../ylwrap: line 109: yacc: command not found
直接通过yum命令安装byacc、flex
检查configure的输出日志,发现与在perl有关的部分提示:
checking for perl... /usr/bin/perl
checking for perl module Bit::Vector... no
这是缺少模块,安装Bit::Vector模块:
可通过cpanm脚本来快速安装缺少的模块:
Bit::Vector
Class::Accessor
出现上类错误,安装软件包后如果编译仍然出错,那么就删除MakeFile和config.log文件,而后重新执行./configure命令编译安装。
Installing /usr/local/lib/perl5/Thrift.pm
Installing /usr/local/lib/perl5/Thrift/HttpClient.pm
Installing /usr/local/lib/perl5/Thrift/BufferedTransport.pm
Installing /usr/local/lib/perl5/Thrift/MemoryBuffer.pm
Installing /usr/local/lib/perl5/Thrift/Socket.pm
Installing /usr/local/lib/perl5/Thrift/Protocol.pm
Installing /usr/local/lib/perl5/Thrift/BinaryProtocol.pm
Installing /usr/local/lib/perl5/Thrift/Server.pm
Installing /usr/local/lib/perl5/Thrift/Transport.pm
Installing /usr/local/lib/perl5/Thrift/FramedTransport.pm
首先要生成perl的映射,这段要在装有HBase的环境上操作:而后将生成的gen-perl目录中的Hbase目录,复制到具有perl环境的服务器适当路径下:
mkdir perl-src/packages -p
cp -a gen-perl/Hbase perl-src/packages
cp -rf sf/thrift-0.9.2/lib/perl pthrift
Hbase文件中的*.pm就是我们要用到的连接模板。
下面创建一段perl脚本,通过thrift连接HBase,输出当前所有表名:
use lib "./pthrift/lib";
use lib "./pthrift/packages";
use Data::Dumper;
use Thrift;
use Thrift::BinaryProtocol;
use Thrift::Socket;
use Thrift::BufferedTransport;
use Hbase::Hbase;
my $host="192.168.0.8";
my $port="9090";
my $socket=new Thrift::Socket($host,$port);
$socket->setRecvTimeout(10000);
my $transport = new Thrift::BufferedTransport($socket,1024,1024);
my $protocol = new Thrift::BinaryProtocol($transport);
my $client= new Hbase::HbaseClient($protocol);
eval {$transport->open()};
my $tables = $client->getTableNames();
foreach my $table (sort @{$tables}){
print " found {$table}\n";
}
查询表中的记录的示例:
use v5.20;
use Data::Dumper;
use lib "./pthrift/lib";
use lib "./pthrift/packages";
use Thrift;
use Thrift::BinaryProtocol;
use Thrift::Socket;
use Thrift::BufferedTransport;
use Hbase::Hbase;
my $host = '192.168.0.8';
my $port = '9090';
my $table = 'freeoa';
my $filter ="PrefixFilter('')";
my $socket= new Thrift::Socket($host,$port);
my $transport = new Thrift::BufferedTransport($socket,1024,1024);
my $protocol= new Thrift::BinaryProtocol($transport);
my $client= new Hbase::HbaseClient($protocol);
#my $rows=$client->getRow($table,"column=col1");
#say Dumper($rows);
eval{
$transport->open();
my $scan = new Hbase::TScan();
$scan->{filterString} = $filter;
my $scanner = $client->scannerOpenWithScan($table,$scan);
#my $row=$client->scannerGet($scanner);
#say Dumper(@$row);
#say $row->getRowsWithColumns();
#get row
#my $grow=$client->getRow($table,'row2');
sub scan_row{
my $rowlst=$client->scannerGetList($scanner,3);
say Dumper($rowlst);
foreach my $row (@$rowlst){
say Dumper($row->{columns});
foreach my $cols (keys %$row->{columns}){
say qq[$cols:$row->{columns}->{$cols}->{value}];
#say qq[$row->{row}:$row->{columns}->{value}];
}
}
}
for(1..100) {
print "============ $_ ========\n";
my $get_arr = $client->scannerGetList($scanner,1);
last unless @$get_arr;
print Dumper ($get_arr);
foreach my $rowresult (@$get_arr) {
#foreach my $k (keys %$rowresult) {
print "row: ",$rowresult->{row}, $/;
print "cols: ";
print Dumper $rowresult->{columns};
#foreach (keys %$rowresult->{columns}){
#say $_;
#say $rowresult->{columns}->{$_}->{value};
#say $rowresult->{columns}->get(columnname).value;
#}
print "----------\n";
#}
}
}
$client->scannerClose($scan);
$transport->close();
};
if($@){
warn(Dumper($@));
}
-----------------------------
可供参考的链接
How-to: Use the Apache HBase REST Interface, Part 1
How-to: Use the Apache HBase REST Interface, Part 2
How-to: Use the Apache HBase REST Interface, Part 3
How-to: Use the HBase Thrift Interface, Part 1
How-to: Use the HBase Thrift Interface, Part 2: Inserting/Getting Rows
How-to: Use the HBase Thrift Interface, Part 3 – Using Scans
php/perl/python , 通过thrift 连接 hbase,进行条件过滤选择
-----------------------------
总结
这个接口很不好用,太缺少参考文档了,本身的代码写的也不太好(基于perl 5.6)。实不敢用,上面的从hb中读取记录的api是从同事的python脚本中猜测出来,从外界能获取的帮助实在太少了,个人转投REST方法了。
-------------------------------
项目主页:http://thrift.apache.org/
Apache Thrift - 可伸缩的跨语言服务开发框架,它是 Facebook 实现的一种高效的、支持多种编程语言的远程服务调用的框架。
目前流行的服务调用方式有很多种,例如基于 SOAP 消息格式的 Web Service,基于 JSON 消息格式的 RESTful 服务等。其中所用到的数据传输方式包括 XML,JSON 等,然而 XML 相对体积太大,传输效率低,JSON 体积较小,新颖,但还不够完善。由 Facebook 开发的远程服务调用框架 Apache Thrift,它采用接口描述语言定义并创建服务,支持可扩展的跨语言服务开发,所包含的代码生成引擎可以在多种语言中,如 C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk 等创建高效的、无缝的服务,其传输数据采用二进制格式,相对 XML 和 JSON 体积更小,对于高并发、大数据量和多语言的环境更有优势。
本文简单地介绍在perl中使用thrift来操作hbase的简单方法。
-----------------------------
Perl环境下安装Thrift过程中的问题
提示缺少yacc、flex命令
编译时抛出错误信息如下:
/bin/sh ../../ylwrap `test -f ¨src/thrifty.yy¨ || echo ¨./¨`src/thrifty.yy y.tab.c thrifty.cc y.tab.h thrifty.h y.output thrifty.output -- yacc -d
../../ylwrap: line 109: yacc: command not found
直接通过yum命令安装byacc、flex
检查configure的输出日志,发现与在perl有关的部分提示:
checking for perl... /usr/bin/perl
checking for perl module Bit::Vector... no
这是缺少模块,安装Bit::Vector模块:
可通过cpanm脚本来快速安装缺少的模块:
Bit::Vector
Class::Accessor
出现上类错误,安装软件包后如果编译仍然出错,那么就删除MakeFile和config.log文件,而后重新执行./configure命令编译安装。
Installing /usr/local/lib/perl5/Thrift.pm
Installing /usr/local/lib/perl5/Thrift/HttpClient.pm
Installing /usr/local/lib/perl5/Thrift/BufferedTransport.pm
Installing /usr/local/lib/perl5/Thrift/MemoryBuffer.pm
Installing /usr/local/lib/perl5/Thrift/Socket.pm
Installing /usr/local/lib/perl5/Thrift/Protocol.pm
Installing /usr/local/lib/perl5/Thrift/BinaryProtocol.pm
Installing /usr/local/lib/perl5/Thrift/Server.pm
Installing /usr/local/lib/perl5/Thrift/Transport.pm
Installing /usr/local/lib/perl5/Thrift/FramedTransport.pm
首先要生成perl的映射,这段要在装有HBase的环境上操作:而后将生成的gen-perl目录中的Hbase目录,复制到具有perl环境的服务器适当路径下:
mkdir perl-src/packages -p
cp -a gen-perl/Hbase perl-src/packages
cp -rf sf/thrift-0.9.2/lib/perl pthrift
Hbase文件中的*.pm就是我们要用到的连接模板。
下面创建一段perl脚本,通过thrift连接HBase,输出当前所有表名:
use lib "./pthrift/lib";
use lib "./pthrift/packages";
use Data::Dumper;
use Thrift;
use Thrift::BinaryProtocol;
use Thrift::Socket;
use Thrift::BufferedTransport;
use Hbase::Hbase;
my $host="192.168.0.8";
my $port="9090";
my $socket=new Thrift::Socket($host,$port);
$socket->setRecvTimeout(10000);
my $transport = new Thrift::BufferedTransport($socket,1024,1024);
my $protocol = new Thrift::BinaryProtocol($transport);
my $client= new Hbase::HbaseClient($protocol);
eval {$transport->open()};
my $tables = $client->getTableNames();
foreach my $table (sort @{$tables}){
print " found {$table}\n";
}
查询表中的记录的示例:
use v5.20;
use Data::Dumper;
use lib "./pthrift/lib";
use lib "./pthrift/packages";
use Thrift;
use Thrift::BinaryProtocol;
use Thrift::Socket;
use Thrift::BufferedTransport;
use Hbase::Hbase;
my $host = '192.168.0.8';
my $port = '9090';
my $table = 'freeoa';
my $filter ="PrefixFilter('')";
my $socket= new Thrift::Socket($host,$port);
my $transport = new Thrift::BufferedTransport($socket,1024,1024);
my $protocol= new Thrift::BinaryProtocol($transport);
my $client= new Hbase::HbaseClient($protocol);
#my $rows=$client->getRow($table,"column=col1");
#say Dumper($rows);
eval{
$transport->open();
my $scan = new Hbase::TScan();
$scan->{filterString} = $filter;
my $scanner = $client->scannerOpenWithScan($table,$scan);
#my $row=$client->scannerGet($scanner);
#say Dumper(@$row);
#say $row->getRowsWithColumns();
#get row
#my $grow=$client->getRow($table,'row2');
sub scan_row{
my $rowlst=$client->scannerGetList($scanner,3);
say Dumper($rowlst);
foreach my $row (@$rowlst){
say Dumper($row->{columns});
foreach my $cols (keys %$row->{columns}){
say qq[$cols:$row->{columns}->{$cols}->{value}];
#say qq[$row->{row}:$row->{columns}->{value}];
}
}
}
for(1..100) {
print "============ $_ ========\n";
my $get_arr = $client->scannerGetList($scanner,1);
last unless @$get_arr;
print Dumper ($get_arr);
foreach my $rowresult (@$get_arr) {
#foreach my $k (keys %$rowresult) {
print "row: ",$rowresult->{row}, $/;
print "cols: ";
print Dumper $rowresult->{columns};
#foreach (keys %$rowresult->{columns}){
#say $_;
#say $rowresult->{columns}->{$_}->{value};
#say $rowresult->{columns}->get(columnname).value;
#}
print "----------\n";
#}
}
}
$client->scannerClose($scan);
$transport->close();
};
if($@){
warn(Dumper($@));
}
-----------------------------
可供参考的链接
How-to: Use the Apache HBase REST Interface, Part 1
How-to: Use the Apache HBase REST Interface, Part 2
How-to: Use the Apache HBase REST Interface, Part 3
How-to: Use the HBase Thrift Interface, Part 1
How-to: Use the HBase Thrift Interface, Part 2: Inserting/Getting Rows
How-to: Use the HBase Thrift Interface, Part 3 – Using Scans
php/perl/python , 通过thrift 连接 hbase,进行条件过滤选择
-----------------------------
总结
这个接口很不好用,太缺少参考文档了,本身的代码写的也不太好(基于perl 5.6)。实不敢用,上面的从hb中读取记录的api是从同事的python脚本中猜测出来,从外界能获取的帮助实在太少了,个人转投REST方法了。
-------------------------------
项目主页:http://thrift.apache.org/