Introduction to Kettle
http://www.cnblogs.com/limengqiang/archive/2013/01/16/KettleApply1.html
Introduction to Oozie
http://blog.csdn.net/john_f_lau/article/details/18972607
Camus
LinkedIn's previous-generation Kafka-to-HDFS pipeline.
https://github.com/linkedin/camus
Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google, etc.
https://github.com/linkedin/gobblin/wiki