HBase is a NoSQL key-value database that runs on top of HDFS. It is inspired by Google's BigTable. In CAP terms, HBase is consistent and partition tolerant.
Features
- Strongly consistent
- Linearly (i.e. horizontally) scalable: add a region server whenever you need more capacity.
- Column-family-oriented database
- Schema-less: only column families are fixed up front; qualifiers can be added on the fly.
- Built-in recovery: it uses a Write-Ahead Log (WAL) for recovery.
Data storage model
create 'profiles', 'name', 'address'
The above command creates a table profiles with two column families, name and address.
Column family: the physical division of data in a table. The data of a column family is stored in its own files, called HFiles. A column family can only be added or altered after disabling the table; that is, any change to column families is treated as a schema change of the table and needs downtime.
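For example, adding a hypothetical phone column family would look roughly like this in the shell (a sketch; phone is a made-up family name):
disable 'profiles'
alter 'profiles', NAME => 'phone'   # add the new column family
enable 'profiles'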
Qualifier (i.e. column): the logical division of data inside a column family. You can add qualifiers at runtime, i.e. directly in a put command.
For example:
put 'profiles', 'K1', 'name:first', 'mahendra'
put 'profiles', 'K1', 'name:middle', 'Singh'
put 'profiles', 'K1', 'name:last', 'Dhoni'
Here first, middle and last are qualifiers under the name column family.
So the data will be stored as follows (just a representation) in the HFile of the name column family:
K1#name:first#mahendra
K1#name:middle#Singh
K1#name:last#Dhoni
Similarly, there will be a separate HFile for address.
These HFiles for a range of row keys are grouped into a Region.
HBase Region: a region consists of all the rows between the start key and the end key assigned to that region. The regions are assigned to nodes in the HBase cluster, called "Region Servers" (region servers are explained in a later part). Data is sorted by row key.
Default region size = 256 MB (this is configurable).
Data replication:
- The first copy is local, on the region server's node.
- Two copies go to other data nodes (of which one is kept on the same rack as the local node).
Namespaces in HBase
A namespace is a logical grouping of tables. By default there are two namespaces:
hbase(main):002:0> list_namespace
NAMESPACE
default
hbase
When a namespace is not specified at table creation, the table goes into the default namespace.
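To place a table in your own namespace instead, create the namespace first and qualify the table name with it (a quick sketch; myapp is a made-up name):
create_namespace 'myapp'
create 'myapp:profiles', 'name', 'address'   # namespace:table
list_namespace_tables 'myapp'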
All system tables are in the hbase namespace.
hbase(main):003:0> list_namespace_tables 'hbase'
TABLE
meta
namespace
The meta table is a special table which stores region info.
key = <namespace (empty for default)>:<table>:<start region key>
value = info:regioninfo, info:seqnumDuringOpen, info:server, info:serverStartCode
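You can inspect these rows yourself from the shell, for example:
scan 'hbase:meta', {COLUMNS => ['info:regioninfo', 'info:server']}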
The namespace table is a special table which stores metadata about namespaces.
Architecture
HBase comprises the HMaster, region servers and ZooKeeper.
HMaster
The HMaster does all DDL operations on tables:
- create — Creates a table.
- list — Lists all the tables in HBase.
- disable — Disables a table.
- is_disabled — Verifies whether a table is disabled.
- enable — Enables a table.
- is_enabled — Verifies whether a table is enabled.
- describe — Provides the description of a table.
- alter — Alters a table.
- exists — Verifies whether a table exists.
- drop — Drops a table from HBase.
- drop_all — Drops the tables matching the 'regex' given in the command.
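A typical DDL round trip through these commands looks like this (the events table and its d column family are made-up names):
create 'events', 'd'
exists 'events'
describe 'events'
disable 'events'
is_disabled 'events'
drop 'events'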
The HMaster also does region assignment and re-assignment, and monitors the region servers.
Region server
A region server runs on an HDFS data node and has the following components:
WAL (Write-Ahead Log): a file on the distributed file system that stores new data which hasn't yet been persisted to permanent storage. More importantly, it is also used for recovery in case of any failure.
Edits are maintained at the qualifier level in the WAL. Here is an extract from the WAL of the table above:
➜ ~ hdfs dfs -text /var/folders/vr/ttjxspk92gbffy5qy41r0121g3b_7q/T/hbase-mahendra.kumar/hbase/WALs/*
040741fe3288ec1c0eeab3c1717eafbprofiles �����082namefirst�Ț#�mahendra7
040741fe3288ec1c0eeab3c1717eafbprofiles �����081namemiddle�ȟfYSingh%
MemStore (write cache): an on-heap cache that stores new data which has not yet been written to disk. The data is sorted before being written to disk.
BlockCache (read LRU cache): a read cache that keeps frequently read data in memory; the least recently used data gets evicted when the cache is full. Its size is configurable.
HFile : These files store the rows as sorted KeyValues on disk.
The interactions of these components are described in the HBase read and write section below.
Zookeeper
- Listens to heartbeats of region servers and the HMaster.
- Handles HMaster re-assignment on failure of the active HMaster.
- Maintains the following metadata:
[zk: localhost:2181(CONNECTED) 0] ls /hbase
[backup-masters, draining, flush-table-proc, hbaseid, master, meta-region-server, namespace, online-snapshot, recovering-regions, region-in-transition, replication, rs, running, splitWAL, switch, table, table-lock]
Explore these znodes on your own.
HBase read and write
- On the first call, the HBase client gets the meta region's location from the meta-region-server znode in ZooKeeper.
- From that region server it reads the region info of the table from the hbase:meta table and caches it for further calls. (This is important: whenever a region server is added or removed, the stale cache affects the latencies of R/W calls.)
- From there it directs its R/W call to the particular region server.
All the steps so far are common to reads and writes and are handled by the HBase client.
Write
- Write to WAL
- The edit is then written to the MemStore, and an ACK is sent to the client for a successful write.
- When the configured limit is reached, the MemStore data gets flushed to an HFile in sorted key order. Importantly, HFiles are separate for each column family.
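Flushes happen automatically once the MemStore hits the configured limit, but you can also force one from the shell to observe the write path (profiles is the table from earlier):
flush 'profiles'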
Read
- Check for the data in the BlockCache.
- If not found, check the MemStore.
- If still not found, fetch it from the HFiles (HDFS).
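Reading back the rows written earlier exercises exactly this path; get fetches a single row, scan a range:
get 'profiles', 'K1'
get 'profiles', 'K1', 'name:first'   # single qualifier
scan 'profiles', {COLUMNS => ['name']}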
HBase compaction
As we all know, small files are a headache in big-data systems for the following reasons:
- More small files mean more metadata to store.
- The bigger problem is read latency: a scan (read) call has to go through lots of small files, which requires a lot of disk head movement, a mechanical operation inside the disk, and this increases read latencies.
To solve this problem, most of these systems compact files, i.e. the system regularly tries to combine small files into a single file.
Minor compaction
Minor compaction is triggered automatically in the background; it picks a configured number of small HFiles and merges them into a single HFile using merge sort.
Major compaction
Major compaction is triggered automatically or manually. It picks all the files belonging to a single column family in a region and merges them into a single file, deleting deleted and expired cells in the process.
However, disk I/O and network traffic might get congested during this process. Hence it is generally scheduled manually during off-peak hours.
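Triggering it by hand is a one-liner in the shell; the second form restricts the compaction to one column family:
major_compact 'profiles'
major_compact 'profiles', 'name'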
Scaling HBase
To scale up or down we can add or remove region servers in the cluster. During this process, existing regions are re-assigned to the new nodes (in case of addition), or the removed node's regions are re-assigned to the remaining nodes (in case of removal), so that regions stay homogeneously distributed.
- Due to region re-assignment you will see a spike in R/W latencies, because the HBase client caches region info; after re-assignment some regions will live on other nodes, so the client has to fetch fresh info via ZooKeeper again. You will not see exceptions in your service, though; this is handled internally by the HBase client.
- Data locality is reduced, and recovers only after a major compaction.
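After adding or removing nodes, you can check that the balancer is on and nudge it to redistribute regions (standard shell commands; balancer returns true if a balancing run started):
balance_switch true
balancer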
Region Split
- When a region becomes too large, it is subdivided into two child regions.
- Each child region represents exactly half of the parent region.
- The split is reported to the HMaster.
- Until the HMaster allocates the children to a new region server, they are served by the present region server itself.
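Splits normally happen automatically as described above, but the shell can also force one; the optional second argument is an explicit split point (the row key 'K5' here is just an example):
split 'profiles'
split 'profiles', 'K5'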
Cross cluster replication
For disaster recovery, most big services run in a multi-DC environment. In such cases we need to replicate our HBase data to another cluster. HBase comes with built-in replication: you just need to define a replication peer on both sides, and HBase takes care of the replication.
list_peers
Use the above command to see the current peers of the cluster.
add_peer '1', CLUSTER_KEY => '<comma separated zookeeper quorum>:2181:/hbase', TABLE_CFS => {}
The above command enables replication of all tables, since TABLE_CFS is left empty.
add_peer '1', CLUSTER_KEY => '<comma separated zookeeper quorum>:2181:/hbase', TABLE_CFS => {"profiles"=>[]}
To enable replication of only the profiles table, use the above command.
add_peer '1', CLUSTER_KEY => '<comma separated zookeeper quorum>:2181:/hbase', TABLE_CFS => {"profiles"=>["name"]}
To enable replication of only the profiles:name column family, use the above command.
Now, to enable replication, run
enable_peer '1'
This command enables one-way replication only, i.e. from the current cluster to the target cluster. If you want to replicate data from the target cluster back to the current cluster, run the same commands on the target cluster.
HBase does replication using the WAL. So once replication starts, it replicates from the current WAL file onward, not the older data. To replicate older data, you need to take a snapshot of the data and copy it to the target cluster.
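Taking the snapshot is a shell command; copying it to the other cluster is then typically done with HBase's ExportSnapshot tool (the snapshot name below is just an example):
snapshot 'profiles', 'profiles_snapshot'
list_snapshots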