Perl HBase REST client module: HBase::JSONRest
2015-10-13 14:46:50 阿炯

HBase::JSONRest - Simple REST client for HBase

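At the time of writing, this is the only module on MetaCPAN for working with HBase's REST interface. HBase offers several ways for external clients to talk to it: the native Java API, Thrift, Avro, Protocol Buffers, REST, and so on. Unfortunately, apart from the REST interface, none of them has particularly friendly Perl support.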


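HBase::JSONRest is a simple, practical module for working with HBase. It provides the core operations:
new (create and initialize the client object)
get (query records)
multiget (query several records in one call)
put (write data)
delete (delete records)
version (server version information)
list (list of tables)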

Usage reference:


Methods

new
Creates and initializes an HBase client object (handle) that is used to talk to HBase.

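# connect to the REST server by hostname
my $hostname = "hbase-restserver.freeoa.net";
my $hbase = HBase::JSONRest->new(host => $hostname);

# alternatively: connect by host:port and keep the table name for later calls
my ($hostname, $hbtab) = ('192.168.0.9:8000', 'webpage');
my $hbase = HBase::JSONRest->new(host => $hostname);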

get
Scans a table by key prefix or by exact key match, depending on the options passed, and returns the result:

# scan by key prefix:
my $records = $hbase->get({
    table => $table_name,
    where => {
        key_begins_with => "$key_prefix"
    },
});

# exact key match: get the whole row
my $record = $hbase->get({
    table => $table_name,
    where => {
        key_equals => "$key"
    },
});
 
# exact key match: get only specific columns (given the row key, return just the listed columns)
my $record = $hbase->get({
    table   => $table_name,
    where   => {key_equals => $key},
    columns => [
        'd:some_column_name',
        'd:some_other_column_name',
    ],
});
 
# exact key match: get last $N cell versions (as above, limited to the given number of versions)
my $records = $hbase->get({
    table   => $table_name,
    where   => {key_equals => $key},
    columns => [
        'd:some_column_name',
        'd:some_other_column_name',
    ],
    versions => $N,
});
 
# exact key match: get cell versions created within a timestamp range (return the columns that fall inside the given timestamps)
my $records = $hbase->get({
    table   => $table_name,
    where   => {key_equals => $key},
    columns => [
        'd:some_column_name',
        'd:some_other_column_name',
    ],
    timestamp_range => {
        from  => $timestamp_from,
        until => $timestamp_until,
    },
});

multiget
Sends a multiget request to HBase, so that multiple keys can be matched in one HTTP request. It also makes sure that the request URL is no longer than 2000 characters: if the number of keys passed would produce a longer URL, the request is split into several smaller requests, each shorter than 2000 characters.

# multiget: get only last cell version from matched rows
my @keys = ($key1,...,$keyN);
my $records = $hbase->multiget({
    table => $table_name,
    where => {key_in => \@keys},
});
 
# multiget: get last $N cell versions from matched rows
my @keys = ($key1,...,$keyN);
my $records = $hbase->multiget({
    table    => $table_name,
    where    => {key_in => \@keys},
    versions => $N,
});
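
The URL-splitting behaviour described for multiget can be pictured with the sketch below. It is only an illustration of the idea, not the module's actual code: the helper name split_keys_by_url_length, the fixed base length of 60 characters, and the per-key cost of "&row=" plus the key are all assumptions, and URL-escaping of keys is ignored.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HBase::JSONRest): group keys into
# batches so that each request URL stays under $max_len characters.
# $base_len is the assumed length of the fixed part of the URL.
sub split_keys_by_url_length {
    my ($keys, $base_len, $max_len) = @_;
    my (@batches, @current);
    my $len = $base_len;

    for my $key (@$keys) {
        # Assumed cost per key: "&row=" (5 characters) plus the key itself.
        my $cost = 5 + length $key;

        # Start a new batch when adding this key would reach the limit.
        if (@current && $len + $cost >= $max_len) {
            push @batches, [@current];
            @current = ();
            $len     = $base_len;
        }
        push @current, $key;
        $len += $cost;
    }
    push @batches, [@current] if @current;
    return \@batches;
}

# Made-up keys, purely for demonstration.
my @keys    = map { sprintf 'row-%05d', $_ } 1 .. 500;
my $batches = split_keys_by_url_length(\@keys, 60, 2000);
printf "%d keys split into %d batches\n", scalar @keys, scalar @$batches;
```

Since multiget is described as doing this splitting itself, callers should not need such a helper; the sketch only shows why a large key list turns into several HTTP requests.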

put
Inserts one or more rows. If a key already exists, the result depends on whether HBase versioning is enabled for that table: the record is updated (versioning off) or a new version is inserted (versioning on).

# multiple rows
my $rows = [
    ...
    {
        row_key => "$row_key",

        # cells: array of hashes where each hash is one cell
        row_cells => [
            {
                column    => "$family_name1:$colum_name1",
                value     => "$value1",
                timestamp => "$timestamp1", # <- optional (override HBase timestamp)
            },
            ...,
            {
                column    => "$family_nameN:$colum_nameN",
                value     => "$valueN",
                timestamp => "$timestampN", # <- optional (override HBase timestamp)
            },
        ],
    },
    ...
];
 
my $res = $hbase->put({
    table   => $table_name,
    changes => $rows
});

# single row - basically the same as multiple rows, but
# the rows array has just one element
my $rows = [
    {
        row_key => "$row_key",
 
        # cells: array of hashes where each hash is one cell
        row_cells => [
            { column => "$family_name:$colum_name", value => "$value" },
        ],
    },
];
 
my $res = $hbase->put({
    table   => $table_name,
    changes => $rows
});
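
Writing the nested row_key/row_cells structure by hand gets tedious with real data, so a small helper along the following lines may be convenient. This is a sketch under assumptions: build_changes is a hypothetical name and not part of HBase::JSONRest, the input layout (row key to a hash of 'family:column' => value) is chosen just for illustration, and the optional timestamp field is left out.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HBase::JSONRest): convert
#   { row_key => { 'family:column' => value, ... }, ... }
# into the array of row_key/row_cells hashes that put() takes as "changes".
sub build_changes {
    my ($data) = @_;
    my @rows;
    for my $row_key (sort keys %$data) {
        my $cols  = $data->{$row_key};
        # One cell hash per column, in a stable (sorted) order.
        my @cells = map { +{ column => $_, value => $cols->{$_} } }
                    sort keys %$cols;
        push @rows, { row_key => $row_key, row_cells => \@cells };
    }
    return \@rows;
}

# Made-up sample data.
my $rows = build_changes({
    'user#1' => { 'd:name' => 'alice', 'd:city' => 'Oslo' },
    'user#2' => { 'd:name' => 'bob' },
});
printf "%d rows, first row has %d cells\n",
    scalar @$rows, scalar @{ $rows->[0]{row_cells} };
```

The resulting $rows can then be passed on in the same way as above, for example $hbase->put({ table => $table_name, changes => $rows }).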

delete
Deletes an entire record or selected columns of it.

my $success = $hbase->delete({
    table  => 'table',
    key    => 'key',
    family => 'family', # optional, unless column is given
    column => 'column', # optional
});

version
Returns a hashref with server version information.

use Data::Dumper; # provides Dumper()

my $version_info = $hbase->version;
print Dumper($version_info);
 
Sample output:
---------------
$VAR1 = {
'REST' => '0.0.3',
'Server' => 'jetty/6.1.26.cloudera.4',
'OS' => 'Linux 3.2.0-4-amd64 amd64',
'Jersey' => '1.9',
'JVM' => 'Oracle Corporation 1.7.0_79-24.79-b02'
};

list
Returns a list of the tables available in HBase.

use feature 'say'; # enables say(); Dumper() comes from Data::Dumper

say 'HB Table List:';
say Dumper($hbase->list);

Sample output:
---------------
$VAR1 = [
{
  'name' => 'freeoa'
},
{
  'name' => 'freeoarticle3'
},
{
  'name' => 'webpage'
}];


ERROR HANDLING
Information about an error is stored in the hbase object under the key last_error:

my $records = $hbase->get({
    table => $table_name,
    where => {
        key_begins_with => "$key_prefix"
    },
});

if ($hbase->{last_error}) {
    # handle error
}
else {
    # process records
}
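
When many calls need the same check, the last_error test can be wrapped once. The sketch below is one possible approach rather than anything the module ships: result_or_die is a hypothetical name, the contents of last_error are treated as opaque and simply dumped, and the two "client" hashes are stand-ins so the example runs without a REST server. It also assumes last_error is empty after a successful call.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical wrapper (not part of HBase::JSONRest): return the result,
# or die with a dump of last_error when the client object reports one.
sub result_or_die {
    my ($hbase, $result) = @_;
    if ($hbase->{last_error}) {
        local $Data::Dumper::Terse  = 1;  # no "$VAR1 =" prefix
        local $Data::Dumper::Indent = 0;  # single-line dump
        die 'HBase request failed: ' . Dumper($hbase->{last_error}) . "\n";
    }
    return $result;
}

# Stand-in objects with made-up contents, used instead of a real client.
my $ok_client     = { last_error => undef };
my $failed_client = { last_error => { info => 'made-up error' } };

my $records = result_or_die($ok_client, [ { row => 'r1' } ]);
print scalar(@$records), " record(s)\n";

my $ok = eval { result_or_die($failed_client, undef); 1 };
print "caught: $@" unless $ok;
```

With a real client this would be used as my $records = result_or_die($hbase, $hbase->get({ ... })); the get call runs first, and the wrapper then inspects last_error.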

Reference:

HBase::JSONRest