Total number of rows: 746
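Because an algorithm has to be composed on top of this data later, and Spark currently supports this kind of algorithm poorly, I wrote the code by hand, so HBase is queried directly from Java.
To make the queries as fast as possible, I used filters wherever I could.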
1: Initial HBase rowkey layout: time + "_" + orderId
Query goals:
1: Fast retrieval with less GC pressure, by using filters
2: Support for time-range queries
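Based on these two points, I filter on time, for example startTime=201904010000 and endTime=201904180000| (note the trailing "|"), and use row filters with CompareFilter.CompareOp.GREATER_OR_EQUAL and CompareFilter.CompareOp.LESS_OR_EQUAL to compare each rowkey against the two bounds.
The "|" matters because rowkeys are compared byte by byte: "|" (0x7C) sorts after "_" (0x5F), so a rowkey such as 201904180000_<orderId> is still less than or equal to the end bound, whereas it would sort after a bare 201904180000.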
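A quick way to check how these bounds behave without an HBase cluster is to reproduce the comparison in plain Java. The sketch below assumes BinaryComparator compares rowkeys as unsigned bytes, lexicographically, with a shorter prefix sorting first (the order HBase uses for rowkeys); the class and method names are hypothetical and not part of any HBase API.

```java
import java.nio.charset.StandardCharsets;

// Pure-Java sketch (no HBase dependency) of the unsigned, lexicographic byte
// comparison applied to rowkeys. Names are hypothetical, for illustration only.
class RowkeyCompare {

    // Compare byte by byte as unsigned values; if one array is a prefix
    // of the other, the shorter one sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    static int compare(String a, String b) {
        return compareBytes(a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8));
    }

    // Mirrors the two RowFilters: keep the row if startTime <= rowkey <= endTime.
    static boolean inRange(String rowkey, String startTime, String endTime) {
        return compare(rowkey, startTime) >= 0 && compare(rowkey, endTime) <= 0;
    }

    public static void main(String[] args) {
        String start = "201904010000";
        // Without "|", a row stamped exactly at the end time sorts after the bound.
        System.out.println(inRange("201904180000_1001", start, "201904180000"));  // false
        // With "|", the same row falls inside the range.
        System.out.println(inRange("201904180000_1001", start, "201904180000|")); // true
        // A later timestamp is still excluded.
        System.out.println(inRange("201904180001_1001", start, "201904180000|")); // false
    }
}
```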
In the query code, I added the row filters:
FilterList filterList = new FilterList();
// rowkey: time + "_" + id
if (startTime != null) {
    RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
            new BinaryComparator(Bytes.toBytes(startTime)));
    filterList.addFilter(rf);
}
if (endTime != null) {
    RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
            new BinaryComparator(Bytes.toBytes(endTime)));
    filterList.addFilter(rf);
}
scan.setFilter(filterList);
Full code:
/**
 * Row-key filter (rowkey layout: time + "_" + orderId).
 * `config` is assumed to be a static HBase Configuration field of the enclosing class (not shown).
 */
public static List<Map<String, String>> rowFilter(String tableName, String startTime, String endTime) {
    Connection connection = null;
    Scan scan = new Scan();
    scan.setCacheBlocks(false); // don't fill the block cache with this scan's blocks
    ResultScanner rs = null;
    Table table = null;
    List<Map<String, String>> list = new ArrayList<Map<String, String>>();
    try {
        connection = ConnectionFactory.createConnection(config);
        table = connection.getTable(TableName.valueOf(tableName));
        // FilterList defaults to MUST_PASS_ALL: a row must pass every filter
        FilterList filterList = new FilterList();
        // rowkey: time + "_" + id
        if (startTime != null) {
            RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
                    new BinaryComparator(Bytes.toBytes(startTime)));
            filterList.addFilter(rf);
        }
        if (endTime != null) {
            RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
                    new BinaryComparator(Bytes.toBytes(endTime)));
            filterList.addFilter(rf);
        }
        scan.setFilter(filterList);
        rs = table.getScanner(scan);
        for (Result r : rs) {
            // one map per row: column qualifier -> value
            Map<String, String> map = new HashMap<String, String>();
            for (Cell cell : r.listCells()) {
                map.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),
                        Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
            }
            list.add(map);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (null != rs) {
            rs.close();
        }
        try {
            if (null != table) {
                table.close();
            }
            if (null != connection && !connection.isClosed()) {
                System.out.println("scan Result is closed");
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return list;
}
Initial full code (above)
This approach returned only 361 rows, while the HBase test table actually holds 746, so the row filters were clearly doing something wrong (I will look into why later).
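HBase keeps rows sorted by rowkey bytes, so with a time-first key the requested time range is one contiguous slice of the table. Besides RowFilters, that slice can also be expressed as start and stop rows on the Scan (Scan#withStartRow / withStopRow in HBase 2.x, setStartRow / setStopRow in 1.x), which lets the region servers skip rows outside the range instead of reading and filtering them. The sketch below does not use HBase: it simulates a sorted table with a TreeMap ordered by unsigned bytes, and the sample rowkeys and class name are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Pure-Java simulation, not HBase code: the "table" is a TreeMap keyed by
// rowkey bytes and ordered as unsigned bytes, like HBase rowkeys.
class RangeScanSketch {

    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned; // Java 9+

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Made-up sample rowkeys in the time + "_" + orderId layout.
    static TreeMap<byte[], String> sampleTable() {
        TreeMap<byte[], String> table = new TreeMap<>(UNSIGNED);
        for (String key : new String[] {
                "201903312359_5", "201904010000_1", "201904101230_2",
                "201904180000_3", "201904180001_4"}) {
            table.put(bytes(key), key);
        }
        return table;
    }

    // Start row inclusive, stop row exclusive, like a Scan's default start/stop rows.
    static List<String> scan(TreeMap<byte[], String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(bytes(startRow), true, bytes(stopRow), false).values());
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleTable(), "201904010000", "201904180000|"));
        // prints [201904010000_1, 201904101230_2, 201904180000_3]
    }
}
```

Only the rows between the two bounds are visited, which is the advantage of a contiguous, time-first key.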
Improvement:
Change the rowkey layout to orderId + "_" + time.
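With the order id first, rows are sorted by order id, so a given time range is scattered across the table and each row has to be checked against the time at the end of its key. The hypothetical helpers below sketch how such a rowkey can be built and how an inclusive time range can be tested on its suffix in plain Java; they assume times are fixed-width yyyyMMddHHmm strings, so ordinary string comparison is also chronological.

```java
// Hypothetical helpers for the orderId + "_" + time rowkey layout (not an HBase API).
class SuffixTimeKey {

    static String buildRowkey(String orderId, String time) {
        return orderId + "_" + time;
    }

    // Take the time after the LAST '_', so order ids containing '_' still work.
    static String timeSuffix(String rowkey) {
        int idx = rowkey.lastIndexOf('_');
        if (idx < 0) {
            throw new IllegalArgumentException("no '_' in rowkey: " + rowkey);
        }
        return rowkey.substring(idx + 1);
    }

    // Inclusive range check on the suffix; fixed-width timestamps compare chronologically.
    static boolean suffixInRange(String rowkey, String startTime, String endTime) {
        String t = timeSuffix(rowkey);
        return t.compareTo(startTime) >= 0 && t.compareTo(endTime) <= 0;
    }

    public static void main(String[] args) {
        String key = buildRowkey("1001", "201904101230");
        System.out.println(key);                                                                // 1001_201904101230
        System.out.println(suffixInRange(key, "201904010000", "201904180000"));                 // true
        System.out.println(suffixInRange("1002_201904200000", "201904010000", "201904180000")); // false
    }
}
```

Because the suffix is extracted exactly, no "|" trick is needed for the end bound here.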
Then update the filter code:
FilterList filterList = new FilterList();
// rowkey: id + "_" + time
if (startTime != null) {
    RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
            new RegexStringComparator(".*_" + startTime));
    filterList.addFilter(rf);
}
if (endTime != null) {
    RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
            new RegexStringComparator(".*_" + endTime));
    filterList.addFilter(rf);
}
scan.setFilter(filterList);
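The filters above use a regular expression to match the rowkey's suffix, so the time stored at the end of the key can be used for filtering.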
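One caveat worth checking: the HBase Javadoc for RegexStringComparator states that only EQUAL or NOT_EQUAL comparisons are valid with it. As far as we can tell from the HBase source, the comparator returns 0 when the pattern is found in the rowkey and 1 otherwise, and CompareFilter drops a row when the operator does not hold for that result. If that holds for your HBase version, the GREATER_OR_EQUAL filter above keeps only rowkeys containing "_" + startTime and the LESS_OR_EQUAL filter keeps every row, so the pair behaves as a match on startTime rather than a time range. The pure-Java simulation below (not HBase code; names are hypothetical) reproduces that logic so it can be tried on sample rowkeys; it may be worth verifying that the returned rows really follow the intended range.

```java
import java.util.regex.Pattern;

// Pure-Java simulation (not HBase code) of a RowFilter combining a compare
// operator with a regex comparator, under the assumptions in the lead-in.
class RegexFilterSimulation {

    enum Op { LESS_OR_EQUAL, EQUAL, GREATER_OR_EQUAL }

    // Assumed comparator result: 0 if the regex is found in the rowkey, 1 otherwise.
    static int regexCompare(String regex, String rowkey) {
        return Pattern.compile(regex).matcher(rowkey).find() ? 0 : 1;
    }

    // Returns true if the row is kept (i.e. NOT filtered out) for this operator.
    static boolean keeps(Op op, int compareResult) {
        switch (op) {
            case LESS_OR_EQUAL:    return compareResult >= 0; // dropped only when < 0: never for 0/1
            case EQUAL:            return compareResult == 0;
            case GREATER_OR_EQUAL: return compareResult <= 0; // dropped when > 0: regex not found
            default: throw new IllegalArgumentException(op.toString());
        }
    }

    // FilterList defaults to MUST_PASS_ALL, so both filters must keep the row.
    static boolean rowPasses(String rowkey, String startTime, String endTime) {
        return keeps(Op.GREATER_OR_EQUAL, regexCompare(".*_" + startTime, rowkey))
                && keeps(Op.LESS_OR_EQUAL, regexCompare(".*_" + endTime, rowkey));
    }

    public static void main(String[] args) {
        String start = "201904010000";
        String end = "201904180000";
        System.out.println(rowPasses("1001_201904010000", start, end)); // true: suffix equals startTime
        System.out.println(rowPasses("1002_201904101230", start, end)); // false: inside the range, yet dropped
    }
}
```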
Full code:
/**
 * Row-key filter on the time suffix (rowkey layout: orderId + "_" + time).
 * `config` is assumed to be a static HBase Configuration field of the enclosing class (not shown).
 */
public static List<Map<String, String>> rowEndFilter(String tableName, String startTime, String endTime) {
    Connection connection = null;
    Scan scan = new Scan();
    scan.setCacheBlocks(false); // don't fill the block cache with this scan's blocks
    ResultScanner rs = null;
    Table table = null;
    List<Map<String, String>> list = new ArrayList<Map<String, String>>();
    try {
        connection = ConnectionFactory.createConnection(config);
        table = connection.getTable(TableName.valueOf(tableName));
        // FilterList defaults to MUST_PASS_ALL: a row must pass every filter
        FilterList filterList = new FilterList();
        // rowkey: id + "_" + time
        if (startTime != null) {
            RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
                    new RegexStringComparator(".*_" + startTime));
            filterList.addFilter(rf);
        }
        if (endTime != null) {
            RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
                    new RegexStringComparator(".*_" + endTime));
            filterList.addFilter(rf);
        }
        scan.setFilter(filterList);
        rs = table.getScanner(scan);
        for (Result r : rs) {
            // one map per row: column qualifier -> value
            Map<String, String> map = new HashMap<String, String>();
            for (Cell cell : r.listCells()) {
                map.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),
                        Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
            }
            list.add(map);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (null != rs) {
            rs.close();
        }
        try {
            if (null != table) {
                table.close();
            }
            if (null != connection && !connection.isClosed()) {
                System.out.println("scan Result is closed");
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return list;
}
With these changes, the query returns the complete data.