在debug hive的问题时,我们经常需要在源码里面增加必要的日志信息,来跟踪变量的变化或者是方法执行的情况,性能等。。hive里面有下面几个日志的实现方式。
1种是常用的通过
1
|
org.apache.commons.logging.Log |
和
1
|
org.apache.commons.logging.LogFactory |
类实现的。
比如要打印类的info信息时,只需要下面这样就可以。
1
2
|
static final private Log LOG = LogFactory.getLog(CLASS_NAME)
LOG.info( "xxxxxx" );
|
如果要跟踪一个方法的执行时间,只需要在在方法调用前后各加一条日志信息记录当前时间戳,然后取差值即可。
另一种是LogHelper
1
|
org.apache.hadoop.hive.ql.session.SessionState.LogHelper |
LogHelper类是SessionState的内部类
输出日志最终调用的是LOG.xxx和对应日志输出的println方法(比如System.out.println/log4j等)
但是日志只有两个级别,info和error。
以printInfo方法为例,具体实现:
1
2
3
4
5
6
|
public void printInfo(String info, String detail) {
if (!getIsSilent()) {
getInfoStream().println(info);
}
LOG.info(info + StringUtils.defaultString(detail)); //调用LOG.info(即第一种方法)
}
|
使用的sample:
1
2
|
static final private LogHelper console = new LogHelper (LOG);
console.printError( "Authorization failed:" + authExp.getMessage() + ". Use SHOW GRANT to get more details." );
|
还有一种重要的类是PerfLogger类,可以实现记录方法执行时间的信息,在一些情况下可以取代第一种记录性能的方法。
PerfLogger类主要是用在Driver类的compile方法中,用来记录各个step使用的时间。
看下具体实现:
首先声明存储starttime和endtime的map:
1
2
|
protected final Map<String, Long> startTimes = new HashMap<String, Long>();
protected final Map<String, Long> endTimes = new HashMap<String, Long>();
|
主要方法:
1).构造方法可以传入一个boolean参数,控制是否reset(默认是false)
1
2
3
4
5
6
7
8
9
10
11
|
protected static final ThreadLocal<PerfLogger> perfLogger = new ThreadLocal<PerfLogger>();
public static PerfLogger getPerfLogger( boolean resetPerfLogger) {
if (SessionState.get() == null ) {
if (perfLogger.get() == null || resetPerfLogger) {
perfLogger.set( new PerfLogger());
}
return perfLogger.get();
} else {
return SessionState.get().getPerfLogger(resetPerfLogger);
}
}
|
2).PerfLogBegin 用来记录开始时间,主要是使用了LOG.info(第一种记录日志的方法)
1
2
3
4
5
|
public void PerfLogBegin(String callerName, String method) {
long startTime = System.currentTimeMillis();
LOG.info( "<PERFLOG method=" + method + " from=" + callerName + ">" );
startTimes.put(method, new Long(startTime)); //将方法和对应的startTime放在startTimes的map中
}
|
3).相应的PerfLogEnd用来记录结束时间
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
public long PerfLogEnd(String callerName, String method) {
Long startTime = startTimes.get(method);
long endTime = System. currentTimeMillis();
long duration = - 1 ;
endTimes.put(method, new Long(endTime));
StringBuilder sb = new StringBuilder( "</PERFLOG method=" ).append(method);
if (startTime != null ) {
sb.append( " start=" ).append(startTime);
}
sb.append( " end=" ).append(endTime);
if (startTime != null ) {
duration = endTime - startTime.longValue();
sb.append( " duration=" ).append(duration);
}
sb.append( " from=" ).append(callerName).append( ">" );
LOG.info(sb);
return duration;
}
|
getStartTime和getEndTime分别通过对应的map来获取starttime和endtime
4.是不是所有的method都可以用这种方法获取执行的性能?答案是no,在PerfLogger类中定义了部分属性,对应于method
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
|
public static final String ACQUIRE_READ_WRITE_LOCKS = "acquireReadWriteLocks" ;
public static final String COMPILE = "compile" ;
public static final String PARSE = "parse" ;
public static final String ANALYZE = "semanticAnalyze" ;
public static final String DO_AUTHORIZATION = "doAuthorization" ;
public static final String DRIVER_EXECUTE = "Driver.execute" ;
public static final String INPUT_SUMMARY = "getInputSummary" ;
public static final String GET_SPLITS = "getSplits" ;
public static final String RUN_TASKS = "runTasks" ;
public static final String SERIALIZE_PLAN = "serializePlan" ;
public static final String DESERIALIZE_PLAN = "deserializePlan" ;
public static final String CLONE_PLAN = "clonePlan" ;
public static final String TASK = "task." ;
public static final String RELEASE_LOCKS = "releaseLocks" ;
public static final String PRUNE_LISTING = "prune-listing" ;
public static final String PARTITION_RETRIEVING = "partition-retrieving" ;
public static final String PRE_HOOK = "PreHook." ;
public static final String POST_HOOK = "PostHook." ;
public static final String FAILURE_HOOK = "FailureHook." ;
public static final String DRIVER_RUN = "Driver.run" ;
public static final String TIME_TO_SUBMIT = "TimeToSubmit" ;
public static final String TEZ_SUBMIT_TO_RUNNING = "TezSubmitToRunningDag" ;
public static final String TEZ_BUILD_DAG = "TezBuildDag" ;
public static final String TEZ_SUBMIT_DAG = "TezSubmitDag" ;
public static final String TEZ_RUN_DAG = "TezRunDag" ;
public static final String TEZ_CREATE_VERTEX = "TezCreateVertex." ;
public static final String TEZ_RUN_VERTEX = "TezRunVertex." ;
public static final String TEZ_INITIALIZE_PROCESSOR = "TezInitializeProcessor" ;
public static final String TEZ_RUN_PROCESSOR = "TezRunProcessor" ;
public static final String TEZ_INIT_OPERATORS = "TezInitializeOperators" ;
public static final String LOAD_HASHTABLE = "LoadHashtable" ;
public static final String ORC_GET_SPLITS = "OrcGetSplits" ;
|
再来看一下其调用方式:
1
2
3
4
5
6
7
8
9
10
|
比如org.apache.hadoop.hive.ql.Driver类中的compile方法: static final private String CLASS_NAME = Driver. class .getName();
public int compile(String command, boolean resetTaskIds) {
PerfLogger perfLogger = PerfLogger.getPerfLogger(); //实例化一个PerfLogger的对象
....... perfLogger.PerfLogBegin( CLASS_NAME, PerfLogger.PARSE); //记录开始时间,传入本类的类名记录的方法(方法对应了PerfLogger类中对应的属性)
ParseDriver pd = new ParseDriver();
ASTNode tree = pd.parse(command, ctx);
tree = ParseUtils.findRootNonNullToken(tree);
perfLogger.PerfLogEnd( CLASS_NAME, PerfLogger.PARSE);
|
记录parse消耗时间的对应日志:
1
2
3
|
14 / 07 / 31 14 : 33 : 21 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
..... 14 / 07 / 31 14 : 33 : 21 INFO log.PerfLogger: </PERFLOG method=parse start= 1406788401914 end= 1406788401916 duration= 2 from=org.apache.hadoop.hive.ql.Driver>
|
3种方法可以根据需要选择使用。
本文转自菜菜光 51CTO博客,原文链接:http://blog.51cto.com/caiguangguang/1541771,如需转载请自行联系原作者