HBase Cluster Replication Notes
Prerequisites: two fully configured HBase clusters with network connectivity between them. There are many reasons to set up replication; they are not repeated here. These notes focus on the configuration process, the problems that can come up along the way, and how to synchronize pre-existing data. In our case the master cluster had been running for over two years before the slave was added, the master kept serving writes after the slave was set up, and no snapshot of the master had been restored on the slave beforehand.
Environment: hbase 0.98.12.1-hadoop2
Also, leave the whole system plenty of memory; once memory gets tight, processes fail easily. A typical question: why does my MapReduce job always fail at some stage with the following error, when the same job used to run fine?
INFO mapred.JobClient: Task Id : attempt_201701061331_0002_m_000027_0, Status : FAILED
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:498)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
This is an Out Of Memory (OOM) error.
Such errors are often not a logic bug (though inefficient code with unreasonable memory consumption can certainly cause them): the same job simply uses different amounts of memory as the data volume and the data itself change. In Hadoop's MapReduce model, after the JobTracker receives a job submission from the client it distributes tasks across the TaskTrackers in the cluster; each task runs in its own Java process on a TaskTracker, with dedicated ports and networking for the map-to-reduce data transfer, so these OOM errors are raised inside those per-task Java processes. Once the cause is clear the fix is simple: MapReduce tasks read their JVM settings from the job configuration (mapred-site.xml), so raising the max heap size (-Xmx) for each task JVM in that file solves it:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>
This option defaults to 200 MB.
In newer versions the change apparently goes in conf/hadoop-env.sh (default 2000 MB); if that does not take effect, set it explicitly in the job's own jobConf.
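When editing cluster config files is impractical, the same property can usually be passed per job on the command line; a sketch, assuming the MR tool parses generic options (the bundled Export/Import tools do) and noting that on newer Hadoop the map-task property is mapreduce.map.java.opts:

```shell
# Per-job heap override, passed as a generic option before the tool's
# own arguments (table and HDFS path here are illustrative).
HEAP_OPT="-Dmapred.child.java.opts=-Xmx1024m"
CMD="bin/hbase org.apache.hadoop.hbase.mapreduce.Export $HEAP_OPT con hdfs://xds:9000/import/con"
echo "$CMD"
```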
--------------- Master ---------------
Hadoop configuration
Start not only HDFS but also the MapReduce framework, YARN.
Add to the HBase configuration file (hbase-site.xml):
<property>
<name>hbase.replication</name>
<value>true</value>
</property>
Once Hadoop and HBase are configured, stop the application's writes to HBase; after writes have fully stopped, take a snapshot.
Snapshot statement: snapshot '<table>','<snapshot name>'
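Snapshots can also be taken non-interactively, which helps when many tables are involved; a sketch that feeds a script file to the hbase shell (the snapshot name is hypothetical):

```shell
# Build a shell script that snapshots the table and lists snapshots.
cat > /tmp/snap.rb <<'EOF'
snapshot 'freeoa_hbt', 'snapshot_freeoa_hbt_20170713'
list_snapshots
exit
EOF
# Would be run against the cluster as: bin/hbase shell /tmp/snap.rb
echo "bin/hbase shell /tmp/snap.rb"
```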
In the hbase shell, add the slave cluster as a peer under an id of your choosing (the peer spec lists the slave's ZooKeeper quorum hosts, client port, and znode):
add_peer '12','xds:2181:/hbase'
add_peer '10','alonemaster,alonemaster2,alonemaster3:2181:/hbase'
Check that it worked: list_peers
Now alter the table schema to enable replication on its column families:
describe 'freeoa_hbt'
disable 'freeoa_hbt'
alter 'freeoa_hbt', {NAME => '<column family>', REPLICATION_SCOPE => '1'}
alter 'freeoa_hbt', {NAME => 'content',REPLICATION_SCOPE => '1'}
Here REPLICATION_SCOPE could not be specified at table-creation time and had to be set by altering the table afterwards; this statement turns the attribute back off:
alter 'freeoa_hbt', {NAME => 'cf',REPLICATION_SCOPE => '0'}
enable 'freeoa_hbt'
describe 'freeoa_hbt'
The snapshot on the master can now be exported to the slave HBase:
bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -overwrite -snapshot snapshot_wz_20170713 -copy-to hdfs://xds:9000/hbase
--------------- Slave ---------------
Create the matching table on the slave first; otherwise even edits made on the master will not be replicated over.
Restore from the snapshot:
disable 'wz'
restore_snapshot 'snapshot_wz_20170713'
enable 'wz'
No problem there, except for one catch: records the master had updated after the snapshot was taken were already visible on the slave before the restore (only the updated columns; other, untouched columns were absent), and the restore then overwrote them.
For that case, export the records modified on the master in the window after the snapshot and bring them over to the slave.
On the master:
bin/hbase org.apache.hadoop.hbase.mapreduce.Export con hdfs://xds:9000/import/con 12 1500022950830 1500023160200
Argument breakdown:
bin/hbase org.apache.hadoop.hbase.mapreduce.Export <table> hdfs://<slave_hdfs>:9000/import/<table> <versions> <start ms timestamp> <end ms timestamp>
(The third argument is the number of versions to export, not a peer id.)
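The start/end arguments are epoch milliseconds; an easy way to produce them from a wall-clock time (assumes GNU date; the -d syntax is not portable to BSD date):

```shell
# Epoch seconds for a given wall-clock instant, scaled to milliseconds.
# TZ is pinned here so the result is reproducible; use your local zone in practice.
start_ms=$(( $(TZ=UTC date -d '2017-07-14 08:22:30' +%s) * 1000 ))
echo "$start_ms"    # 1500020550000
```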
On the slave (import from the staging directory into the live table):
bin/hbase org.apache.hadoop.hbase.mapreduce.Import con hdfs://xds:9000/import/con
2017-07-14 17:21:27,321 INFO [main] mapreduce.Job: The url to track the job: http://hbak:8088/proxy/application_1500019039263_0001/
2017-07-14 17:21:27,321 INFO [main] mapreduce.Job: Running job: job_1500019039263_0001
2017-07-14 17:21:35,405 INFO [main] mapreduce.Job: Job job_1500019039263_0001 running in uber mode : false
2017-07-14 17:21:35,406 INFO [main] mapreduce.Job: map 0% reduce 0%
2017-07-14 17:21:40,458 INFO [main] mapreduce.Job: map 100% reduce 0%
2017-07-14 17:21:40,465 INFO [main] mapreduce.Job: Job job_1500019039263_0001 completed successfully
Exception in thread "main" java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
at java.lang.Enum.valueOf(Enum.java:236)
at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148)
at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182)
at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370)
at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511)
at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756)
at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753)
at java.security.AccessController.doPrivileged(Native Method)
An error, and no telling whether it succeeded. A check showed it had: the new data had arrived. A fix for this exception comes later.
An example of writing on the master and reading on the slave
scan 'con', {LIMIT =>10,COLUMNS => ['cf:id', 'cf:issue_dt'], CACHE_BLOCKS => false,REVERSED => true,STARTROW => '20110902042646'}
scan 'wz', {LIMIT =>10,REVERSED => true,STARTROW => '20150902042646'}
get 'con','2015082615145741838'
get 'con','2015082615145741838','cf:hits'
get 'con','2015082615145741838',{LIMIT =>10,COLUMNS => ['cf:hits', 'cf:title']}
put 'con','2015082615145741838','cf:hits','700'
put 'con','2011082908441625317','cf:title','EnterpriseDB base on Postgresql'
get 'con','2011082908441625317',{COLUMNS => ['cf:hits', 'cf:title']}
Scan on the slave:
---
hbase(main):009:0> scan 'con'
ROW COLUMN+CELL
2011082908441625317 column=cf:title, timestamp=1500023160180, value=EnterpriseDB base on Postgresql
2015082615145741838 column=cf:hits, timestamp=1500022950832, value=700
2015082615145741838 column=cf:title, timestamp=1500023123231, value=EnterpriseDB base on Postgresql
---
scan 'con', {LIMIT =>2,REVERSED => true,COLUMNS => ['cf:id','cf:issue_dt','cf:title','cf:hits']}
Even if the matching table has not been created on the slave cluster yet, edits made before it is created are still synced over once the table exists.
put 'con','2017051420434877614','cf:hits','700'
After the table was created on the slave:
put 'con','2017051420434877614','cf:title','postgres的逻辑备份还原:pg_dump和pg_restore的使用'
I expected the earlier put that set hits to 700 not to be synced, but it came across anyway.
put 'con','2017052118082332250','cf:title','PostgreSQL与MySQL两大开源数据库比较'
More data was written, but with a wrong rowkey, so the table had to be restored from an earlier snapshot:
disable 'con'
restore_snapshot 'snapshot_con_20170726'
enable 'con'
The slave, however, did not replay this operation: a restore_snapshot on the master is not propagated to the slave. What about exporting the master's snapshot to the slave, then:
bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -overwrite -snapshot snapshot_con_20170726 -copy-to hdfs://xds:9000/hbase -mappers 2
It failed for lack of system resources. Next, trying an export to a local path:
bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot_con_20170726 -copy-to /tmp/cons
Also a failure. Whether due to the version or something else, neither the mapper-count (-mappers) nor the bandwidth-limit (-bandwidth) option appeared to have any effect. (Note that a local destination likely needs a scheme, e.g. file:///tmp/cons; a bare path is resolved against the default filesystem.)
##############################
Once master/slave replication is in place, there are two ways to bring the pre-existing data over to the slave cluster:
1. Stop writes and export snapshots to the slave. But production holds 56 TB of data across 77 tables; even with multiple bonded gigabit NICs the transfer takes far too long to be acceptable to the application.
2. Export each table in time slices (up to the snapshot point) and import each slice on the slave, repeating until caught up. This does not block application writes, but takes a lot of time and effort.
##############################
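Approach 2 is easy to script: one Export job per table over the chosen window, each writing to its own staging directory on the slave HDFS. A sketch (table names and window are hypothetical; drop the echo to actually run the jobs):

```shell
# Echoes one Export command per table.
TABLES="con wz wnm"
START=1495700438000   # window start, ms
STOP=1501059257000    # window end, ms
for t in $TABLES; do
  echo bin/hbase org.apache.hadoop.hbase.mapreduce.Export \
    "$t" "hdfs://xds:9000/import/$t" 12 "$START" "$STOP"
done
```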
[Data verification]
On the master:
bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime=1501059600000 --stoptime=1501060200000 12 con -mappers 2
2017-07-26 17:12:26,218 ERROR [main] replication.ReplicationPeersZKImpl: Could not get configuration for peer because it doesn't exist. peerId=-mappers
2017-07-26 17:12:26,224 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-07-26 17:12:26,224 INFO [main] zookeeper.ZooKeeper: Session: 0x5d7e01208a0005 closed
Exception in thread "main" java.io.IOException: Couldn't get peer conf!
With the mappers and bandwidth arguments removed, the parameter error went away, but then came: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
A fix for that was found, and the verification finally passed:
org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
GOODROWS=17
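Besides GOODROWS, VerifyReplication also reports a BADROWS counter for mismatched rows; grepping both out of the captured job output makes a quick pass/fail check (the file and its contents below are illustrative; in practice redirect the job's output there):

```shell
# Simulated tail of a VerifyReplication run.
cat > /tmp/verify_out.txt <<'EOF'
GOODROWS=17
BADROWS=0
EOF
bad=$(grep -o 'BADROWS=[0-9]*' /tmp/verify_out.txt | cut -d= -f2)
[ "$bad" = "0" ] && echo "replication verified"
```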
Now to sync the data that predates replication. First, the timestamp of the oldest record on the master:
scan 'con', {LIMIT =>2,REVERSED => false}
rowkey:ust
2009030815340389100:1495700438602
2009032915265550408:1495700439848
And the timestamp of the oldest record on the slave:
rowkey:ust
2017051420434877614:1501059257359
2017052118082332250:1501059611854
So the data between the master's oldest timestamp and the slave's oldest timestamp needs to be pulled over:
bin/hbase org.apache.hadoop.hbase.mapreduce.Export con hdfs://xds:9000/import/con 12 1495700438000 1501059257000 -mappers 2 -bandwidth 10
The job succeeded, but no data was synced. Does the time embedded in the records have to be used instead?
20090308153403=1236497643
now=1501062386
bin/hbase org.apache.hadoop.hbase.mapreduce.Export con hdfs://xds:9000/import/con 12 1236497643000 1501062386000 -mappers 2 -bandwidth 10
It finished quickly:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://xds:9000/import/con already exists
So the earlier run had in fact exported the data; it still had to be imported on the slave:
bin/hbase org.apache.hadoop.hbase.mapreduce.Import con hdfs://xds:9000/import/con
2017-07-26 17:51:12,714 INFO [main] mapreduce.Job: map 0% reduce 0%
2017-07-26 17:51:18,782 INFO [main] mapreduce.Job: map 100% reduce 0%
2017-07-26 17:51:18,794 INFO [main] mapreduce.Job: Job job_1501058627844_0001 completed successfully
Exception in thread "main" java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
After fixing that exception the job did succeed, yet the slave's data volume still did not change. The problem was in the method: the reference point should be the cell timestamp of the earliest record, not a datetime parsed out of the record content (and what would we do if records carried no datetime field at all?).
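That takeaway can be scripted: pull the earliest cell timestamp straight out of a scan with LIMIT => 1 rather than deriving it from the rowkey. A sketch over a captured scan line (the file stands in for real hbase shell output):

```shell
# Simulated first line of `scan 'con', {LIMIT => 1}` output.
cat > /tmp/scan_first.txt <<'EOF'
 2009030815340389100  column=cf:title, timestamp=1495700438602, value=...
EOF
# Extract the cell timestamp; this is the value Export's window must cover.
ts=$(grep -o 'timestamp=[0-9]*' /tmp/scan_first.txt | head -1 | cut -d= -f2)
echo "$ts"    # 1495700438602
```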
Master-to-slave sync of a specified time range:
1. On the master cluster:
bin/hbase org.apache.hadoop.hbase.mapreduce.Export con hdfs://xds:9000/import/con 12 1495700438000 1501059257000
The versions argument (12) matches production.
The start and end timestamps are 1495700438000 and 1501059257000.
Because an earlier run already produced the output directory, it reports:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://xds:9000/import/con already exists
Manually delete the table's directory under /import on the slave:
hadoop/bin/hdfs dfs -rm -r /import/con/
Pull again:
2017-07-26 18:05:25,184 INFO [main] util.RegionSizeCalculator: Calculating region sizes for table "con".
2017-07-26 18:05:25,462 DEBUG [main] util.RegionSizeCalculator: Region con,,1495700417514.b3c7ec6db740711373ccedc0105d57ae. has size 26214400
2017-07-26 18:05:25,468 DEBUG [main] util.RegionSizeCalculator: Region sizes calculated
2017-07-26 18:05:25,518 DEBUG [main] mapreduce.TableInputFormatBase: getSplits: split -> 0 -> HBase table split(table name: con, scan: , start row: , end row: , region location: xd0)
2017-07-26 18:05:25,730 INFO [main] mapreduce.JobSubmitter: number of splits:1
2017-07-26 18:05:25,771 INFO [main] Configuration.deprecation: dfs.socket.timeout is deprecated. Instead, use dfs.client.socket-timeout
2017-07-26 18:05:25,771 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2017-07-26 18:05:25,941 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1501057582035_0004
2017-07-26 18:05:26,160 INFO [main] impl.YarnClientImpl: Submitted application application_1501057582035_0004
2017-07-26 18:05:26,195 INFO [main] mapreduce.Job: The url to track the job: http://htcom:8088/proxy/application_1501057582035_0004/
2017-07-26 18:05:26,196 INFO [main] mapreduce.Job: Running job: job_1501057582035_0004
2017-07-26 18:06:03,469 INFO [main] mapreduce.Job: Job job_1501057582035_0004 running in uber mode : false
2017-07-26 18:06:03,472 INFO [main] mapreduce.Job: map 0% reduce 0%
2017-07-26 18:06:43,813 INFO [main] mapreduce.Job: map 100% reduce 0%
2017-07-26 18:06:43,829 INFO [main] mapreduce.Job: Job job_1501057582035_0004 completed successfully
2017-07-26 18:06:43,941 INFO [main] mapreduce.Job: Counters: 40
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=136134
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=62
HDFS: Number of bytes written=26163577
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=37800
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=37800
Total vcore-milliseconds taken by all map tasks=37800
Total megabyte-milliseconds taken by all map tasks=38707200
Map-Reduce Framework
Map input records=3117
Map output records=3117
Input split bytes=62
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=60
CPU time spent (ms)=2870
Physical memory (bytes) snapshot=217321472
Virtual memory (bytes) snapshot=1660932096
Total committed heap usage (bytes)=183500800
HBase Counters
BYTES_IN_REMOTE_RESULTS=0
BYTES_IN_RESULTS=26040693
MILLIS_BETWEEN_NEXTS=2247
NOT_SERVING_REGION_EXCEPTION=0
NUM_SCANNER_RESTARTS=0
REGIONS_SCANNED=1
REMOTE_RPC_CALLS=0
REMOTE_RPC_RETRIES=0
RPC_CALLS=34
RPC_RETRIES=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=26163577
The counts look right. Check the size on the slave:
hadoop/bin/hdfs dfs -du -s -h /import/con
25.0 M /import/con
2. On the slave cluster:
bin/hbase org.apache.hadoop.hbase.mapreduce.Import con hdfs://xds:9000/import/con
2017-07-26 18:09:23,851 INFO [main] mapreduce.TableOutputFormat: Created table instance for con
2017-07-26 18:09:25,279 INFO [main] input.FileInputFormat: Total input paths to process : 1
2017-07-26 18:09:25,374 INFO [main] mapreduce.JobSubmitter: number of splits:1
2017-07-26 18:09:25,607 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1501058627844_0003
2017-07-26 18:09:25,989 INFO [main] impl.YarnClientImpl: Submitted application application_1501058627844_0003
2017-07-26 18:09:26,040 INFO [main] mapreduce.Job: The url to track the job: http://xds:8088/proxy/application_1501058627844_0003/
2017-07-26 18:09:26,041 INFO [main] mapreduce.Job: Running job: job_1501058627844_0003
2017-07-26 18:09:33,253 INFO [main] mapreduce.Job: Job job_1501058627844_0003 running in uber mode : false
2017-07-26 18:09:33,255 INFO [main] mapreduce.Job: map 0% reduce 0%
2017-07-26 18:09:43,458 INFO [main] mapreduce.Job: map 100% reduce 0%
2017-07-26 18:09:43,468 INFO [main] mapreduce.Job: Job job_1501058627844_0003 completed successfully
2017-07-26 18:09:43,591 INFO [main] mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=135678
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=26163681
HDFS: Number of bytes written=0
HDFS: Number of read operations=3
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=7354
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=7354
Total vcore-milliseconds taken by all map tasks=7354
Total megabyte-milliseconds taken by all map tasks=7530496
Map-Reduce Framework
Map input records=3117
Map output records=3117
Input split bytes=104
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=109
CPU time spent (ms)=3430
Physical memory (bytes) snapshot=235057152
Virtual memory (bytes) snapshot=1779216384
Total committed heap usage (bytes)=273678336
File Input Format Counters
Bytes Read=26163577
File Output Format Counters
Bytes Written=0
Comparing row counts, master and slave now match. To check whether the data is fully identical, run the verification from above, with stoptime set to the current timestamp:
bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime=1495700438000 --stoptime=1501064274000 12 con
org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
GOODROWS=3132
Next, migrate a table of about 2.7 GB; first get its oldest and newest record timestamps:
scan 'wz', {LIMIT =>1,REVERSED => false}
2015060723264517045:1489744357327
scan 'wz', {LIMIT =>1,REVERSED => true}
2015070309135227806:1489745908349
bin/hbase org.apache.hadoop.hbase.mapreduce.Export wz hdfs://xds:9000/import/wz 12 1489744357000 1489745909000
[Slave]
bin/hbase org.apache.hadoop.hbase.mapreduce.Import wz hdfs://xds:9000/import/wz
The slave's load climbs noticeably while the import runs.
wz count: 3959850 row(s) in 418.4610 seconds
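For multi-million-row tables the shell's count (about 7 minutes above) is slow; the bundled RowCounter MapReduce job is the usual way to compare row counts across clusters. A sketch (echoed rather than executed):

```shell
# Run on both master and slave, then compare the ROWS counter in the
# job output. In the shell, `count 'wz', CACHE => 10000` also speeds
# counting by enlarging the scanner cache.
CMD="bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter wz"
echo "$CMD"
```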
bin/hbase org.apache.hadoop.hbase.mapreduce.Export wz hdfs://xds:9000/import/wz 12 1489744357000 1501064274000 -mappers 10 -bandwidth 80
mapreduce.Job: Task Id : attempt_1500974684883_0003_m_000000_0, Status : FAILED Error: Java heap space
Regions in Transition
Region State RIT time (ms)
1588230740 hbase:meta,,1.1588230740 state=PENDING_OPEN, ts=Wed Jul 26 14:57:12 CST 2017 (3s ago), server=xd1,60020,1501052156299 3062
Total number of Regions in Transition for more than 60000 milliseconds 0
Total number of Regions in Transition 1
Run: bin/hbase hbck
...
Summary:
Table hbase:meta is okay.
Number of regions: 1
Deployed on: xd1,60020,1501052156299
Table wz is okay.
Number of regions: 34
Deployed on: xd0,60020,1501052155967 xd1,60020,1501052156299 xd3,60020,1501052156041
Table con is okay.
Number of regions: 1
Deployed on: xd0,60020,1501052155967
Table wnm is okay.
Number of regions: 271
Deployed on: xd0,60020,1501052155967 xd1,60020,1501052156299 xd3,60020,1501052156041
Table hbase:namespace is okay.
Number of regions: 1
Deployed on: xd0,60020,1501052155967
Table wbanalysis_contentid is okay.
Number of regions: 1
Deployed on: xd1,60020,1501052156299
0 inconsistencies detected.
Status: OK
2017-07-26 15:04:58,679 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2017-07-26 15:04:58,679 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x25d7dadfa580004
2017-07-26 15:04:58,686 INFO [main] zookeeper.ZooKeeper: Session: 0x25d7dadfa580004 closed
2017-07-26 15:04:58,686 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down