有的时候需要在不同的数据库实例之间做集合操作,这就无法直接使用SQL语句的join,left join了。相同类型的数据库之间虽然也有类似于DBLINK和FEDERATED之类的东西,但一来这些东西不同的数据库之间是不能通用的,还有就是这些都要在数据库服务端上操作的,即使是dba要走这么一个申请流程也是非常麻烦的。所以就想自己写几个常用的:
inner_join:内连接操作,t1 join t2 on t1.col=t2.col,返回连接字段相等的行,方式包括下面的外连接用的都是最简单的nested loop join。
left_join:左连接操作,t1 left join t2 on t1.col=t2.col,返回左边所有的行以及右表中连接字段相等的行,注意的是右边没有的要以NULL填充。
right_join:右连接操作,t1 right join t2 on t1.col=t2.col,和左连接一样,就是左右的位置换了一下。
union_all:两个集合合并。
union:和union_all一样,就是要除去重复的。
代码如下:
#!/usr/local/bin/python
# -*- coding: utf8 -*- '''
Created on 2016年5月23日 @author: PaoloLiu
''' class sql_merge(object):
'''
classdocs
''' def __init__(self):
'''
Constructor
''' def inner_join(self, left_table, right_table, left_join_col, right_join_col): pass new_result = [] for left_row in left_table:
for right_row in right_table: if left_row[left_join_col] == right_row[right_join_col]:
new_row = left_row + right_row
new_result.append(new_row) return new_result def left_join(self, left_table, right_table, left_join_col, right_join_col): pass new_result = [] addnull = None if len(right_table) > 0:
for i in range (1, len(right_table[0])):
addnull = (addnull, None) for left_row in left_table: n = 0 for right_row in right_table: if left_row[left_join_col] == right_row[right_join_col]:
new_row = left_row + right_row
new_result.append(new_row)
else:
n = n + 1 if n == len(right_table):
new_result.append(left_row + addnull)
else:
new_result = left_table return new_result def right_join(self, left_table, right_table, left_join_col, right_join_col):
return self.left_join(right_table, left_table, right_join_col, left_join_col) def union_all(self, left_table, right_table):
return left_table + right_table def union(self, left_table, right_table):
result = self.union_all(left_table, right_table) new_result = [] for row in result:
if row not in new_result:
new_result.append(row) return new_result def order_asc(self, result, col):
result.sort(key=lambda x:x[col])
return result def order_desc(self, result, col):
result.sort(key=lambda x:x[col], reverse=True)
return result def test():
pass left_result = [(1, "aaa"), (2, "bbb"), (3, "ccc"), (6, None), (1, "")]
right_result = [(1, ""), (2, ""), (4, ""), (5, "abc")]
print "left_result=" + str(left_result)
print "right_result=" + str(right_result)
print "==========================================================" sqlmerge = sql_merge() print "inner_join=" + str(sqlmerge.inner_join(left_result, right_result, 0, 0))
print "left_join=" + str(sqlmerge.left_join(left_result, right_result, 0, 0))
print "right_join=" + str(sqlmerge.right_join(left_result, right_result, 0, 0))
print "union_all=" + str(sqlmerge.union_all(left_result, right_result))
print "union=" + str(sqlmerge.union(left_result, right_result))
print "order_asc=" + str(sqlmerge.order_asc(left_result, 0))
print "order_desc=" + str(sqlmerge.order_desc(left_result, 0)) if __name__ == "__main__":
test()
测试记录运行结果:
left_result=[(1, 'aaa'), (2, 'bbb'), (3, 'ccc'), (6, None), (1, '')]
right_result=[(1, ''), (2, ''), (4, ''), (5, 'abc')]
==========================================================
inner_join=[(1, 'aaa', 1, ''), (2, 'bbb', 2, ''), (1, '', 1, '')]
left_join=[(1, 'aaa', 1, ''), (2, 'bbb', 2, ''), (3, 'ccc', None, None), (6, None, None, None), (1, '', 1, '')]
right_join=[(1, '', 1, 'aaa'), (1, '', 1, ''), (2, '', 2, 'bbb'), (4, '', None, None), (5, 'abc', None, None)]
union_all=[(1, 'aaa'), (2, 'bbb'), (3, 'ccc'), (6, None), (1, ''), (1, ''), (2, ''), (4, ''), (5, 'abc')]
union=[(1, 'aaa'), (2, 'bbb'), (3, 'ccc'), (6, None), (1, ''), (2, ''), (4, ''), (5, 'abc')]
order_asc=[(1, 'aaa'), (1, ''), (2, 'bbb'), (3, 'ccc'), (6, None)]
order_desc=[(6, None), (3, 'ccc'), (2, 'bbb'), (1, 'aaa'), (1, '')]
最后貼一张我认为最能说明SQL JOIN关系的图。还有要注意的是not in子查询是可以用左边从上往下第二个left join所代替的。