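Almost every business needs to know how to influence customers quickly and effectively to buy its products, and how to recommend other related products to them. This can call for cloud-based recommendation, personalization, and network-analysis tools. Graphs are a natural fit for analytics use cases like these, such as recommending products or serving personalized ads based on user data and past behavior. Below we walk through how to use a graph for personalized recommendations.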
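Purchase the GraphDB service
- Log in to www.aliyun.com, open the HBase console, and choose to create an HBase cluster.
- Select the GraphDB (graph) sub-product, then choose the billing method, region, zone, network type, and VPC, as when purchasing any other product.
- Next, choose the master and core instance specifications, the number of core nodes, and the disk capacity according to your actual needs (the author chose 4 cores / 8 GB here), then click Buy Now.
- Agree to the service agreement and complete the payment. Back in the HBase console, the cluster status shows as initializing; wait until the cluster has been created.
- Once the cluster is created, the console shows the graph database address. Note it down; it replaces the $host variable in the commands below.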
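If you need to bulk-import data with the hgraphdb-loader tool, contact the Cloud HBase Q&A group (云hbase答疑) on DingTalk, and we will open the HBase service ports for you.
- Modify the access control settings and add the client ECS instance to the access whitelist.
- Run the following command on the client ECS instance. If a response body comes back, the client can reach GraphDB.
If an error occurs, first try telnet $host 8180; if the connection fails, add this ECS instance's IP address to the access whitelist.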
curl -XPOST -d '{"gremlin": "1+1" }' http://$host:8180
{"requestId":"9db39735-9b69-4425-b58c-2b8c230d2fb9","status":{"message":"","code":200,"attributes":{}},"result":{"data":[2],"meta":{}}}
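The same connectivity check can be scripted. The sketch below uses only the Python standard library and assumes the endpoint accepts the JSON body from the curl command and answers with a response shaped like the sample above. The helper names build_gremlin_request, parse_gremlin_response, and run_script are hypothetical and not part of any GraphDB SDK.

```python
import json
import urllib.request


def build_gremlin_request(host: str, script: str, port: int = 8180) -> urllib.request.Request:
    """Build the same POST as the curl example: body {"gremlin": script} sent to http://host:port."""
    body = json.dumps({"gremlin": script}).encode("utf-8")
    return urllib.request.Request(f"http://{host}:{port}", data=body, method="POST")


def parse_gremlin_response(text: str) -> list:
    """Return result.data from a response shaped like the sample; raise if status.code != 200."""
    payload = json.loads(text)
    status = payload["status"]
    if status["code"] != 200:
        raise RuntimeError(f"Gremlin request failed with code {status['code']}: {status['message']}")
    return payload["result"]["data"]


def run_script(host: str, script: str, timeout: float = 10.0) -> list:
    """Send a script to a live endpoint (needs network access and a whitelisted client)."""
    with urllib.request.urlopen(build_gremlin_request(host, script), timeout=timeout) as resp:
        return parse_gremlin_response(resp.read().decode("utf-8"))


# Offline demo using the sample response from the curl example.
SAMPLE = ('{"requestId":"9db39735-9b69-4425-b58c-2b8c230d2fb9","status":{"message":"","code":200,'
          '"attributes":{}},"result":{"data":[2],"meta":{}}}')
print(parse_gremlin_response(SAMPLE))  # [2]
```

Against a real cluster, run_script with your GraphDB address and "1+1" should return [2] when the client is whitelisted; if it times out, the telnet check above is the next step.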
Next, you can work through the GraphDB quick start for more practice.
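Build the example data model
Users on an e-commerce site build up an order history, and each order contains a number of products. Let's build the following graph data model for this.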
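Data modeling
We focus on three vertex labels: customer, order, and product.
- A customer placing an order is modeled by an ordered edge from the customer vertex to the order vertex.
- An order contains multiple products, so a contains edge connects the order vertex to each product vertex in it.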
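Vertex and edge properties
For this example, we assume the following (in the schema below, customerid, orderid, and productid are each stored in a property named id):
- customer vertices have only customerid and name
- order vertices have only orderid and ordertime
- product vertices have only productid and name
- ordered and contains edges need no extra properties for now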
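To make the model concrete, here is a minimal in-memory sketch. It encodes only what is described above: the three vertex labels with their properties, and the allowed direction of each edge label (customer to order for ordered, order to product for contains). The Graph class and its methods are hypothetical illustrations of these constraints, not GraphDB APIs.

```python
from dataclasses import dataclass, field

# Properties each vertex label may carry, as listed above.
VERTEX_PROPERTIES = {
    "customer": {"customerid", "name"},
    "order": {"orderid", "ordertime"},
    "product": {"productid", "name"},
}

# Allowed (out-vertex label, in-vertex label) for each edge label.
EDGE_CONNECTIONS = {
    "ordered": ("customer", "order"),
    "contains": ("order", "product"),
}


@dataclass
class Graph:
    labels: dict = field(default_factory=dict)  # vertex id -> label
    props: dict = field(default_factory=dict)   # vertex id -> property dict
    edges: list = field(default_factory=list)   # (out id, edge label, in id)

    def add_vertex(self, vid, label, **props):
        unknown = set(props) - VERTEX_PROPERTIES[label]
        if unknown:
            raise ValueError(f"{label} has no properties {sorted(unknown)}")
        self.labels[vid] = label
        self.props[vid] = props

    def add_edge(self, out_id, edge_label, in_id):
        expected = EDGE_CONNECTIONS[edge_label]
        actual = (self.labels[out_id], self.labels[in_id])
        if actual != expected:
            raise ValueError(f"{edge_label} must connect {expected}, got {actual}")
        self.edges.append((out_id, edge_label, in_id))


g = Graph()
g.add_vertex("1", "customer", customerid="1", name="oAECseuFIx")
g.add_vertex("10001", "order", orderid="10001", ordertime="2015-09-06")
g.add_vertex("110001", "product", productid="110001", name="ZDoVtEBlDq")
g.add_edge("1", "ordered", "10001")
g.add_edge("10001", "contains", "110001")
print(len(g.edges))  # 2
```

An edge such as contains from a customer straight to a product is rejected, which is the same rule the connections section of the schema expresses.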
Create the schema
We use hgraphdb-loader to import the graph schema and data. Download hgraphdb-loader from: http://public-hbase.oss-cn-hangzhou.aliyuncs.com/installpackage/hgraphdb-loader.tar.gz
Create a schema.json file with the following schema:
{
"vertexLabels" : [ {
"name" : "customer",
"properties" : [ {
"name" : "id",
"type" : "String"
}, {
"name" : "name",
"type" : "String"
} ],
"indexes" : [ {
"propertykey" : "name",
"unique" : false
} ]
}, {
"name" : "order",
"properties" : [ {
"name" : "id",
"type" : "String"
}, {
"name" : "ordertime",
"type" : "Date"
} ],
"indexes" : [ ]
}, {
"name" : "product",
"properties" : [ {
"name" : "id",
"type" : "String"
}, {
"name" : "name",
"type" : "String"
} ],
"indexes" : [ ]
} ],
"edgeLabels" : [ {
"name" : "ordered",
"properties" : [ ],
"indexes" : [ ],
"connections" : [ {
"outV" : "customer",
"inV" : "order"
} ]
}, {
"name" : "contains",
"properties" : [ ],
"indexes" : [ ],
"connections" : [ {
"outV" : "order",
"inV" : "product"
} ]
} ]
}
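Before running the import, it can help to sanity-check schema.json. This sketch assumes only the structure shown above (vertexLabels and edgeLabels, each with properties and indexes keyed by propertykey, plus connections with outV/inV). It checks that every index refers to a declared property and every connection refers to a declared vertex label. check_schema and check_schema_file are hypothetical helpers, not part of hgraphdb-loader, and the loader may enforce other rules this does not cover.

```python
import json


def check_schema(schema: dict) -> list:
    """Return human-readable problems found in a dict shaped like schema.json."""
    problems = []
    vertex_labels = {v["name"] for v in schema.get("vertexLabels", [])}
    for group in ("vertexLabels", "edgeLabels"):
        for label in schema.get(group, []):
            declared = {p["name"] for p in label.get("properties", [])}
            for index in label.get("indexes", []):
                key = index.get("propertykey")
                if key not in declared:
                    problems.append(f"{label['name']}: index on undeclared property {key!r}")
    for edge in schema.get("edgeLabels", []):
        for conn in edge.get("connections", []):
            for end in ("outV", "inV"):
                if conn.get(end) not in vertex_labels:
                    problems.append(f"{edge['name']}: {end} refers to unknown vertex label {conn.get(end)!r}")
    return problems


def check_schema_file(path: str) -> list:
    """Load a file such as demo/schema.json and check it."""
    with open(path, encoding="utf-8") as f:
        return check_schema(json.load(f))


if __name__ == "__main__":
    # Simplified copy of the schema above with a deliberate typo in the contains edge.
    demo = {
        "vertexLabels": [
            {"name": "order", "properties": [{"name": "id", "type": "String"}], "indexes": []},
            {"name": "product", "properties": [{"name": "id", "type": "String"},
                                               {"name": "name", "type": "String"}], "indexes": []},
        ],
        "edgeLabels": [
            {"name": "contains", "properties": [], "indexes": [],
             "connections": [{"outV": "order", "inV": "prodcut"}]},
        ],
    }
    print(check_schema(demo))  # ["contains: inV refers to unknown vertex label 'prodcut'"]
```

Running check_schema_file("demo/schema.json") on the schema above should return an empty list.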
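Prepare the vertex and edge datasets in CSV format
Put all vertices in a single vertex.csv file. Each row holds the vertex ID, the vertex label, and one property value (name for customers and products, ordertime for orders):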
1,customer,oAECseuFIx
2,customer,UpOqBuMQSG
3,customer,WTlnfKULti
10001,order,2015-09-06
10002,order,1998-06-08
110001,product,ZDoVtEBlDq
110002,product,GXsssxOJSq
...
Put all edges in a single edge.csv file. Each row holds the source vertex ID, the target vertex ID, and the edge label:
9610,58824,ordered
2069,12200,ordered
85589,113864,contains
50591,110945,contains
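A quick consistency check on the two files can catch dangling or mislabeled edges before the import. This sketch assumes the column layout implied by the samples (vertex.csv: ID, label, one property value; edge.csv: source ID, target ID, edge label) and no header row; check_csv is a hypothetical helper, not part of hgraphdb-loader.

```python
import csv
import io

# Allowed (out label, in label) per edge label, from the schema's connections.
EDGE_CONNECTIONS = {"ordered": ("customer", "order"), "contains": ("order", "product")}


def check_csv(vertex_lines, edge_lines) -> list:
    """Return problems such as edges pointing at missing vertices or wrong label pairs."""
    labels = {}
    for vid, label, _value in csv.reader(vertex_lines):
        labels[vid] = label
    problems = []
    for line_no, (out_id, in_id, edge_label) in enumerate(csv.reader(edge_lines), start=1):
        if out_id not in labels or in_id not in labels:
            problems.append(f"edge line {line_no}: unknown vertex in {out_id}->{in_id}")
        elif (labels[out_id], labels[in_id]) != EDGE_CONNECTIONS.get(edge_label):
            problems.append(f"edge line {line_no}: {edge_label} cannot connect "
                            f"{labels[out_id]} -> {labels[in_id]}")
    return problems


vertices = io.StringIO("1,customer,oAECseuFIx\n10001,order,2015-09-06\n110001,product,ZDoVtEBlDq\n")
edges = io.StringIO("1,10001,ordered\n10001,110001,contains\n1,110001,contains\n")
for problem in check_csv(vertices, edges):
    print(problem)  # edge line 3: contains cannot connect customer -> product
```

For the real files, pass open("demo/vertex.csv", newline="") and open("demo/edge.csv", newline="") instead of the StringIO objects.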
We have already prepared some test data that you can download.
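If you would rather generate your own data, the sketch below writes rows in the same layout. The ID ranges (customers from 1, orders from 10001, products from 110001), the random 10-letter names, the date range, and the default sizes mirror the samples above but are assumptions; generate and write_rows are hypothetical helpers, and this is not the tool that produced the downloadable test data.

```python
import random
import string
from datetime import date, timedelta


def random_name(rng: random.Random) -> str:
    """A random 10-letter name, like the sample names in vertex.csv."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(10))


def generate(n_customers=100, n_orders=500, n_products=200, max_items=5, seed=42):
    """Return (vertex_rows, edge_rows) as CSV lines without a header row.

    Assumed ID ranges: customers 1..9999, orders 10001..110000, products from 110001.
    """
    if n_customers > 9999 or n_orders > 100000:
        raise ValueError("ID ranges would overlap")
    rng = random.Random(seed)
    customers = [str(1 + i) for i in range(n_customers)]
    orders = [str(10001 + i) for i in range(n_orders)]
    products = [str(110001 + i) for i in range(n_products)]
    first_day = date(1995, 1, 1)

    vertex_rows = [f"{c},customer,{random_name(rng)}" for c in customers]
    vertex_rows += [f"{o},order,{first_day + timedelta(days=rng.randrange(8000))}" for o in orders]
    vertex_rows += [f"{p},product,{random_name(rng)}" for p in products]

    edge_rows = []
    for o in orders:
        edge_rows.append(f"{rng.choice(customers)},{o},ordered")  # one buyer per order
        k = rng.randint(1, min(max_items, len(products)))
        edge_rows += [f"{o},{p},contains" for p in rng.sample(products, k)]  # distinct items
    return vertex_rows, edge_rows


def write_rows(path, rows):
    """Write one CSV line per row, ending with a newline."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(rows) + "\n")


if __name__ == "__main__":
    vertices, edges = generate()
    print(vertices[0], vertices[100], edges[0], sep="\n")
    # write_rows("vertex.csv", vertices); write_rows("edge.csv", edges)
```

A fixed seed keeps the data reproducible, which makes it easier to compare recommendation results between runs.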
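Run the import with hgraphdb-loader
Download the tool (see the download address above), extract it, enter the extracted directory, and run the following import command: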
sh run.sh import emr-header-1,emr-header-2,emr-header-3 demo/schema.json demo/vertex.csv demo/edge.csv
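Traverse the graph with gremlin-console
Download the gremlin-console client and extract it.
In conf/remote.yaml, set the remote server address, which is shown in the console after you purchase GraphDB: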
hosts: [$host]
port: 8180
Start gremlin-console:
bin/gremlin.sh
Connect to Gremlin Server and have all scripts sent to the server automatically:
gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8180-[b1725987-7f5a-4a61-914a-d0ab39473105]
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8180]-[b1725987-7f5a-4a61-914a-d0ab39473105] - type ':remote console' to return to local mode
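Run the recommendation
Use the following Gremlin traversal to make real-time product recommendations for the customer with id 2.
Note: this form of the query is fine for small graphs without supernodes; in production, combine it with the dedup() step.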
gremlin> g.V("2").as("customer").out("ordered").out("contains").aggregate("products").in("contains").in("ordered").where(neq("customer")).out("ordered").out("contains").where(not(within("products"))).groupCount().by("name").order(local).by(values,decr).limit(local,5)
==>[SUKjnHCshw:13,pqiGHapYGW:12,ktkZSKsEdK:11,twLtDaYJmo:11,JAbhZdfhkO:10]
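To see what the traversal computes, the following pure-Python sketch replays it step by step on a toy in-memory graph: collect the target customer's products, find other customers who bought any of them, gather those customers' products, drop the ones the target already bought, count by name, and keep the top five. The toy data, product names, and the recommend function are made up for illustration. Counts are per traversal path, as in the query above, so a product reached through several orders or customers is counted once per path.

```python
from collections import Counter, defaultdict


def recommend(ordered, contains, names, customer, limit=5):
    """Replay the Gremlin recommendation traversal on plain Python data.

    ordered:  (customer id, order id) pairs, i.e. ordered edges
    contains: (order id, product id) pairs, i.e. contains edges
    names:    product id -> product name
    """
    orders_of, buyers_of = defaultdict(list), defaultdict(list)
    items_of, orders_with = defaultdict(list), defaultdict(list)
    for c, o in ordered:
        orders_of[c].append(o)    # out("ordered")
        buyers_of[o].append(c)    # in("ordered")
    for o, p in contains:
        items_of[o].append(p)     # out("contains")
        orders_with[p].append(o)  # in("contains")

    # out("ordered").out("contains").aggregate("products"): one traverser per path
    mine = [p for o in orders_of[customer] for p in items_of[o]]
    bought = set(mine)

    counts = Counter()
    for p in mine:
        for o in orders_with[p]:                    # in("contains")
            for other in buyers_of[o]:              # in("ordered")
                if other == customer:               # where(neq("customer"))
                    continue
                for o2 in orders_of[other]:         # out("ordered")
                    for p2 in items_of[o2]:         # out("contains")
                        if p2 not in bought:        # where(not(within("products")))
                            counts[names[p2]] += 1  # groupCount().by("name")
    # order(local).by(values, decr).limit(local, limit); ties keep first-seen order here
    return counts.most_common(limit)


ordered = [("2", "10001"), ("1", "10002"), ("3", "10003"), ("3", "10004")]
contains = [("10001", "110001"), ("10001", "110002"),
            ("10002", "110001"), ("10002", "110003"),
            ("10003", "110001"), ("10003", "110002"), ("10003", "110004"),
            ("10004", "110003")]
names = {"110001": "apple", "110002": "bread", "110003": "cheese", "110004": "dates"}
print(recommend(ordered, contains, names, "2"))  # [('cheese', 3), ('dates', 2)]
```

Customer 2 bought apple and bread; cheese scores 3 because it is reached through customer 1 once and through customer 3 twice (via both apple and bread). Adding dedup() as the note above suggests changes what is counted, depending on where the step is placed, so these numbers apply only to the query as written.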
With a relational database, the equivalent SQL looks like this:
select top (5) [t14].[productname]
from (select count(*) as [value], [t13].[productname]
from [customers] as [t0]
cross apply (select [t9].[productname]
from [orders] as [t1]
cross join [order details] as [t2]
inner join [products] as [t3]
on [t3].[productid] = [t2].[productid]
cross join [order details] as [t4]
inner join [orders] as [t5]
on [t5].[orderid] = [t4].[orderid]
inner join [customers] as [t6]
on [t6].[customerid] = [t5].[customerid]
cross join ([orders] as [t7]
cross join [order details] as [t8]
inner join [products] as [t9]
on [t9].[productid] = [t8].[productid])
where not exists(select NULL as [empty]
from [orders] as [t10]
cross join [order details] as [t11]
inner join [products] as [t12]
on [t12].[productid] = [t11].[productid]
where [t9].[productid] = [t12].[productid]
and [t10].[customerid] = [t0].[customerid]
and [t11].[orderid] = [t10].[orderid])
and [t6].[customerid] <> [t0].[customerid]
and [t1].[customerid] = [t0].[customerid]
and [t2].[orderid] = [t1].[orderid]
and [t4].[productid] = [t3].[productid]
and [t7].[customerid] = [t6].[customerid]
and [t8].[orderid] = [t7].[orderid]) as [t13]
where [t0].[customerid] = N'2' -- customer id
group by [t13].[productname]) as [t14]
order by [t14].[value] desc
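The SQL version is far more cumbersome: expressing the same pattern takes thirteen table references (t0 to t12) joined across customers, orders, order details, and products, plus a NOT EXISTS subquery, whereas the Gremlin traversal reads as a single chain of hops. Multi-way joins like this also tend to become expensive to execute as the data grows.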