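My environment
DPM source version: voc-release3.1
VOC devkit version: VOC2007_devkit_08-Jun
Training dataset: VOC2007
Matlab version: Matlab R2012b
C++ compiler: VS2010
OS: Windows 7, 32-bit
Why not use voc-release4.01? Release 4 adds object detection grammars and asymmetric part arrangements, among other changes. Accuracy improves, but the source code becomes more complex and harder to analyze. Release 3 is considerably leaner and easier to read.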
First, download voc-release3.1 and the VOCdevkit development kit:
Deformable Part Model release 3 (voc-release3.1): http://cs.brown.edu/~pff/latent-release3/
PASCAL VOC 2007 dataset and devkit: http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/index.html
For background on the Deformable Part Model, see:
the Chinese translation of "A Discriminatively Trained, Multiscale, Deformable Part Model" [CVPR 2008]
the Chinese translation of "Object Detection with Discriminatively Trained Part Based Models" [PAMI 2010]
and "Some notes on the Deformable Part Model"
Pedro Felzenszwalb's home page: http://cs.brown.edu/~pff/
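1. Edit the global variables in globals.m (mainly directory settings)
cachedir = 'D:\DPMtrain\VOCCache\';
% directory for trained models and intermediate data
tmpdir = 'D:\DPMtrain\VOCtemp\';
% directory for temporary files used during training; these can be very large
VOCdevkit = ['H:\文档文件\工作\█计算机视觉█\数据集\PascalVOC\VOC2007\VOCdevkit'];
% PASCAL VOC devkit directory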
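2. Edit VOCinit.m in the VOCdevkit to set the dataset directory
We use the VOC2007 dataset, so the VOC2006 flag stays false (which is already the default).
If the extracted VOC2007 folder is placed under the VOCdevkit directory, the directory settings in VOCinit.m need no changes, because the code assumes this layout by default. To keep the dataset somewhere else, change VOCopts.datadir to point to the directory that contains it.
I placed the VOC2007 folder directly under VOCdevkit, which gives this directory structure: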
-VOCdevkit
    -local
        -VOC2006
        -VOC2007
    -results
        -VOC2006
        -VOC2007
    -VOC2007
        -Annotations
        -ImageSets
        -JPEGImages
        -SegmentationClass
        -SegmentationObject
    -VOCcode
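3. pascal_data.m: get training data for a given object class from the PASCAL dataset
To avoid loading the whole dataset while studying the code, I created two new image list files in VOCdevkit\VOC2007\ImageSets\Main:
trainval_smallset.txt and train_smallset.txt
Each contains only one or two hundred file names, copied from trainval.txt and train.txt respectively.
Correspondingly, pascal_data.m is changed to load these small sets.
The line that loads the positive example list becomes:
ids = textread(sprintf(VOCopts.imgsetpath, 'trainval_smallset'), '%s');
The negative example list is changed in the same way.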
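Note that several list files in VOCdevkit\VOC2007\ImageSets\Main are easy to confuse:
train.txt lists all image files used for training
trainval.txt lists all image files used for training and validation
val.txt lists all image files used for validation
These three sets cover training and validation for all object classes.
There are also train_train.txt, train_trainval.txt and train_val.txt.
These three are the training, training+validation and validation image sets for the "train" (railway train) class,
so take care to tell them apart.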
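4. Change the two system calls in rewritedat.m
Run pascal('person', 1) at the Matlab command line, which trains a person DPM with a single component, then see what errors come up and fix them one at a time.
The error is:
'mv' is not recognized as an internal or external command, operable program or batch file.
Error in rewritedat (line 38)
So edit rewritedat.m:
Replace unix(['mv ' datfile ' ' oldfile]); on line 13
with: system(['move ' datfile ' ' oldfile]);
and unix(['cp ' inffile ' ' oldfile]); on line 44
with: system(['copy ' inffile ' ' oldfile]);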
5. Edit train.m
After fixing rewritedat.m, call pascal('person', 1) again. This time the problem is:
executing: ./learn 0.0020 1.0000 D:\DPMtrain\VOCtemp\person.hdr D:\DPMtrain\VOCtemp\person.dat D:\DPMtrain\VOCtemp\person.mod D:\DPMtrain\VOCtemp\person.inf D:\DPMtrain\VOCtemp\person.lob
'.' is not recognized as an internal or external command, operable program or batch file.
This is another Linux-style system call, so train.m needs changing.
On line 123, in cmd = sprintf('./learn %.4f %.4f %s %s %s %s %s', ...
C, J, hdrfile, datfile, modfile, inffile, lobfile);
remove the ./ from the command string, giving:
cmd = sprintf('learn %.4f %.4f %s %s %s %s %s', ...
C, J, hdrfile, datfile, modfile, inffile, lobfile);
Replace status = unix(cmd); on line 128
with: status = system(cmd);
After this change Matlab still reports that 'learn' is not recognized as an internal or external command, because learn.cc has not been compiled yet.
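6. Compile learn.cc
First rename learn.cc to learn.cpp, create an empty console project in VS2010, and add the source file learn.cpp.
The first build attempt fails with: Cannot open include file: 'sys/time.h': No such file or directory (c:\test\dpm_learn\dpm_learn\learn.cpp, line 5)
That is a Linux header path. Changing #include <sys/time.h> on line 5 to #include <time.h> and rebuilding produces a long list of errors, which come down to the following points:
(1) time.h on Windows has no definition of struct timeval and no gettimeofday function, so I added a replacement found online to learn.cpp.
(2) The code added in the previous step needs the header windows.h, which brings in the system's own min and max macros, so the min and max functions defined in learn.cpp must be commented out or the build fails.
(3) Windows has no drand48 or srand48, so I added versions of these two functions written by another blogger.
(4) Windows has no definition of INFINITY, so define one yourself.
(5) int buf[labelsize+2]; in main fails because the VS C++ compiler does not allow a variable as an array length. Allocate the buffer dynamically with new instead:
int *buf = new int[labelsize+2]; and call delete [] buf; at the end of the same scope.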
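For changes (1) and (3), here is a minimal sketch of a self-contained 48-bit linear congruential generator and a time-based seed that needs neither gettimeofday nor windows.h. It is not the code used in this post: the names Rand48 and microsecond_seed are hypothetical, and it assumes a C++11 compiler with <chrono>, which is newer than VS2010. The constants are the ones POSIX specifies for drand48.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for drand48/srand48 (names chosen for this sketch).
// The state is 48 bits wide and is updated as X' = (a*X + c) mod 2^48.
struct Rand48 {
    std::uint64_t state;

    // POSIX srand48 puts the 32-bit seed in the high bits and 0x330E in the low 16.
    explicit Rand48(std::uint32_t seed_value)
        : state((static_cast<std::uint64_t>(seed_value) << 16) | 0x330EULL) {}

    // Returns a double in [0, 1).
    double next() {
        const std::uint64_t a = 0x5DEECE66DULL;
        const std::uint64_t c = 0xBULL;
        const std::uint64_t mask = (1ULL << 48) - 1;
        state = (a * state + c) & mask;
        double r = static_cast<double>(state) / static_cast<double>(1ULL << 48);
        assert(r >= 0.0 && r < 1.0);
        return r;
    }
};

// Seed from the microsecond part of the current time,
// similar to what seed_time() does with tv_usec.
inline std::uint32_t microsecond_seed() {
    using namespace std::chrono;
    auto us = duration_cast<microseconds>(
                  system_clock::now().time_since_epoch()).count();
    return static_cast<std::uint32_t>(us % 1000000);
}
```

In gd() this would be used as Rand48 rng(microsecond_seed()); followed by rng.next() wherever drand48() is called.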
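With these five changes the errors are gone. Some warnings remain, but they can be ignored. Build the project to produce learn.exe and copy it into the voc-release3.1 directory, where the Matlab code will run it through a system call during training.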
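For change (5), an alternative to new and delete [] is std::vector, which releases the buffer automatically on every exit path. The sketch below is only an illustration under that assumption; the function name read_label_block is hypothetical and does not appear in learn.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads the (labelsize + 2) leading ints of one example into a std::vector,
// replacing the variable-length array "int buf[labelsize+2]".
// Returns true only if every int was read.
bool read_label_block(std::FILE *f, int labelsize, std::vector<int> &buf) {
    assert(labelsize >= 0);
    buf.assign(static_cast<std::size_t>(labelsize) + 2, 0);
    // &buf[0] is valid because the vector always holds at least two elements.
    std::size_t count = std::fread(&buf[0], sizeof(int), buf.size(), f);
    return count == buf.size();
}
```

Inside the reading loop of main this could stand in for the fread into buf and the check on count, with buf[labelsize+1] read from the vector as before.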
The complete modified learn.cpp is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
//#include <sys/time.h>
#include <errno.h>
#include <time.h>    // used on Windows in place of sys/time.h
#include <windows.h> // used on Windows in place of sys/time.h

/*
 * Optimize LSVM objective function via gradient descent.
 *
 * We use an adaptive cache mechanism. After a negative example
 * scores beyond the margin multiple times it is removed from the
 * training set for a fixed number of iterations.
 */

// Data File Format
// EXAMPLE*
//
// EXAMPLE:
//  long label      ints
//  blocks          int
//  dim             int
//  DATA{blocks}
//
// DATA:
//  block label     float
//  block data      floats
//
// Internal Binary Format
//  len             int (byte length of EXAMPLE)
//  EXAMPLE         <see above>
//  unique flag     byte

// number of iterations
#define ITER 5000000

// small cache parameters
#define INCACHE 3
#define WAIT 10

// error checking
#define check(e) (e ? (void)0 : (printf("%s:%u error: %s\n%s\n", __FILE__, __LINE__, #e, strerror(errno)), exit(1)))

// number of non-zero blocks in example ex
#define NUM_NONZERO(ex) (((int *)ex)[labelsize+1])

// float pointer to data segment of example ex
#define EX_DATA(ex) ((float *)(ex + sizeof(int)*(labelsize+3)))

// class label (+1 or -1) for the example
#define LABEL(ex) (((int *)ex)[1])

// block label (converted to 0-based index)
#define BLOCK_IDX(data) (((int)data[0])-1)

// VS2010 has no INFINITY, so define one (HUGE_VAL from math.h is a floating-point infinity)
#ifndef INFINITY
#define INFINITY HUGE_VAL
#endif

int labelsize;
int dim;

// Windows has no gettimeofday; this is a substitute found online
int gettimeofday(struct timeval *tp, void *tzp) {
  time_t clock;
  struct tm tm;
  SYSTEMTIME wtm;
  GetLocalTime(&wtm);
  tm.tm_year = wtm.wYear - 1900;
  tm.tm_mon = wtm.wMonth - 1;
  tm.tm_mday = wtm.wDay;
  tm.tm_hour = wtm.wHour;
  tm.tm_min = wtm.wMinute;
  tm.tm_sec = wtm.wSecond;
  tm.tm_isdst = -1;
  clock = mktime(&tm);
  tp->tv_sec = clock;
  tp->tv_usec = wtm.wMilliseconds * 1000;
  return (0);
}

// drand48 and srand48, written following examples found online
#define MNWZ 0x100000000
#define ANWZ 0x5DEECE66D
#define CNWZ 0xB  // increment used by POSIX drand48 (must be odd for a full period)
static unsigned long long seed = 1;

double drand48(void) {
  seed = (ANWZ * seed + CNWZ) & 0xFFFFFFFFFFFFLL;
  unsigned int x = seed >> 16;
  return ((double)x / (double)MNWZ);
}

//static unsigned long long seed = 1;

void srand48(unsigned int i) {
  seed = (((long long int)i) << 16) | rand();
}

// comparison function for sorting examples
int comp(const void *a, const void *b) {
  // sort by extended label first, and whole example second...
  int c = memcmp(*((char **)a) + sizeof(int),
                 *((char **)b) + sizeof(int),
                 labelsize*sizeof(int));
  if (c)
    return c;

  // labels are the same
  int alen = **((int **)a);
  int blen = **((int **)b);
  if (alen == blen)
    return memcmp(*((char **)a) + sizeof(int),
                  *((char **)b) + sizeof(int),
                  alen);
  return ((alen < blen) ? -1 : 1);
}

// a collapsed example is a sequence of examples
struct collapsed {
  char **seq;
  int num;
};

// set of collapsed examples
struct data {
  collapsed *x;
  int num;
  int numblocks;
  int *blocksizes;
  float *regmult;
  float *learnmult;
};

// seed the random number generator with the current time
void seed_time() {
  struct timeval tp;
  check(gettimeofday(&tp, NULL) == 0);
  srand48((long)tp.tv_usec);
}

// windows.h already provides min and max, so the definitions below are commented out to avoid errors
//static inline double min(double x, double y) { return (x <= y ? x : y); }
//static inline double max(double x, double y) { return (x <= y ? y : x); }

// gradient descent
void gd(double C, double J, data X, double **w, double **lb) {
  int num = X.num;

  // state for random permutations
  int *perm = (int *)malloc(sizeof(int)*X.num);
  check(perm != NULL);

  // state for small cache
  int *W = (int *)malloc(sizeof(int)*num);
  check(W != NULL);
  for (int j = 0; j < num; j++)
    W[j] = 0;

  int t = 0;
  while (t < ITER) {
    // pick random permutation
    for (int i = 0; i < num; i++)
      perm[i] = i;
    for (int swapi = 0; swapi < num; swapi++) {
      int swapj = (int)(drand48()*(num-swapi)) + swapi;
      int tmp = perm[swapi];
      perm[swapi] = perm[swapj];
      perm[swapj] = tmp;
    }

    // count number of examples in the small cache
    int cnum = 0;
    for (int i = 0; i < num; i++) {
      if (W[i] <= INCACHE)
        cnum++;
    }

    for (int swapi = 0; swapi < num; swapi++) {
      // select example
      int i = perm[swapi];
      collapsed x = X.x[i];

      // skip if example is not in small cache
      if (W[i] > INCACHE) {
        W[i]--;
        continue;
      }

      // learning rate
      double T = t + 1000.0;
      double rateX = cnum * C / T;
      double rateR = 1.0 / T;

      if (t % 10000 == 0) {
        printf(".");
        fflush(stdout);
      }
      t++;

      // compute max over latent placements
      int M = -1;
      double V = 0;
      for (int m = 0; m < x.num; m++) {
        double val = 0;
        char *ptr = x.seq[m];
        float *data = EX_DATA(ptr);
        int blocks = NUM_NONZERO(ptr);
        for (int j = 0; j < blocks; j++) {
          int b = BLOCK_IDX(data);
          data++;
          for (int k = 0; k < X.blocksizes[b]; k++)
            val += w[b][k] * data[k];
          data += X.blocksizes[b];
        }
        if (M < 0 || val > V) {
          M = m;
          V = val;
        }
      }

      // update model
      for (int j = 0; j < X.numblocks; j++) {
        double mult = rateR * X.regmult[j] * X.learnmult[j];
        for (int k = 0; k < X.blocksizes[j]; k++) {
          w[j][k] -= mult * w[j][k];
        }
      }
      char *ptr = x.seq[M];
      int label = LABEL(ptr);
      if (label * V < 1.0) {
        W[i] = 0;
        float *data = EX_DATA(ptr);
        int blocks = NUM_NONZERO(ptr);
        for (int j = 0; j < blocks; j++) {
          int b = BLOCK_IDX(data);
          double mult = (label > 0 ? J : -1) * rateX * X.learnmult[b];
          data++;
          for (int k = 0; k < X.blocksizes[b]; k++)
            w[b][k] += mult * data[k];
          data += X.blocksizes[b];
        }
      } else if (label == -1) {
        if (W[i] == INCACHE)
          W[i] = WAIT;
        else
          W[i]++;
      }
    }

    // apply lowerbounds
    for (int j = 0; j < X.numblocks; j++) {
      for (int k = 0; k < X.blocksizes[j]; k++) {
        w[j][k] = max(w[j][k], lb[j][k]);
      }
    }
  }

  free(perm);
  free(W);
}

// score examples
double *score(data X, char **examples, int num, double **w) {
  double *s = (double *)malloc(sizeof(double)*num);
  check(s != NULL);
  for (int i = 0; i < num; i++) {
    s[i] = 0.0;
    float *data = EX_DATA(examples[i]);
    int blocks = NUM_NONZERO(examples[i]);
    for (int j = 0; j < blocks; j++) {
      int b = BLOCK_IDX(data);
      data++;
      for (int k = 0; k < X.blocksizes[b]; k++)
        s[i] += w[b][k] * data[k];
      data += X.blocksizes[b];
    }
  }
  return s;
}

// merge examples with identical labels
void collapse(data *X, char **examples, int num) {
  collapsed *x = (collapsed *)malloc(sizeof(collapsed)*num);
  check(x != NULL);
  int i = 0;
  x[0].seq = examples;
  x[0].num = 1;
  for (int j = 1; j < num; j++) {
    if (!memcmp(x[i].seq[0]+sizeof(int), examples[j]+sizeof(int),
                labelsize*sizeof(int))) {
      x[i].num++;
    } else {
      i++;
      x[i].seq = &(examples[j]);
      x[i].num = 1;
    }
  }
  X->x = x;
  X->num = i+1;
}

int main(int argc, char **argv) {
  seed_time();
  int count;
  data X;

  // command line arguments
  check(argc == 8);
  double C = atof(argv[1]);
  double J = atof(argv[2]);
  char *hdrfile = argv[3];
  char *datfile = argv[4];
  char *modfile = argv[5];
  char *inffile = argv[6];
  char *lobfile = argv[7];

  // read header file
  FILE *f = fopen(hdrfile, "rb");
  check(f != NULL);
  int header[3];
  count = fread(header, sizeof(int), 3, f);
  check(count == 3);
  int num = header[0];
  labelsize = header[1];
  X.numblocks = header[2];
  X.blocksizes = (int *)malloc(X.numblocks*sizeof(int));
  count = fread(X.blocksizes, sizeof(int), X.numblocks, f);
  check(count == X.numblocks);
  X.regmult = (float *)malloc(sizeof(float)*X.numblocks);
  check(X.regmult != NULL);
  count = fread(X.regmult, sizeof(float), X.numblocks, f);
  check(count == X.numblocks);
  X.learnmult = (float *)malloc(sizeof(float)*X.numblocks);
  check(X.learnmult != NULL);
  count = fread(X.learnmult, sizeof(float), X.numblocks, f);
  check(count == X.numblocks);
  check(num != 0);
  fclose(f);
  printf("%d examples with label size %d and %d blocks\n",
         num, labelsize, X.numblocks);
  printf("block size, regularization multiplier, learning rate multiplier\n");
  dim = 0;
  for (int i = 0; i < X.numblocks; i++) {
    dim += X.blocksizes[i];
    printf("%d, %.2f, %.2f\n", X.blocksizes[i], X.regmult[i], X.learnmult[i]);
  }

  // read examples
  f = fopen(datfile, "rb");
  check(f != NULL);
  printf("Reading examples\n");
  char **examples = (char **)malloc(num*sizeof(char *));
  check(examples != NULL);
  for (int i = 0; i < num; i++) {
    // we use an extra byte in the end of each example to mark unique
    // we use an extra int at the start of each example to store the
    // example's byte length (excluding unique flag and this int)
    //int buf[labelsize+2];  // not accepted by the VS compiler; allocate with new instead
    int *buf = new int[labelsize+2];  // dynamic allocation
    count = fread(buf, sizeof(int), labelsize+2, f);
    check(count == labelsize+2);
    // byte length of an example's data segment
    int len = sizeof(int)*(labelsize+2) + sizeof(float)*buf[labelsize+1];
    // memory for data, an initial integer, and a final byte
    examples[i] = (char *)malloc(sizeof(int)+len+1);
    check(examples[i] != NULL);
    // set data segment's byte length
    ((int *)examples[i])[0] = len;
    // set the unique flag to zero
    examples[i][sizeof(int)+len] = 0;
    // copy label data into example
    for (int j = 0; j < labelsize+2; j++)
      ((int *)examples[i])[j+1] = buf[j];
    // read the rest of the data segment into the example
    count = fread(examples[i]+sizeof(int)*(labelsize+3), 1,
                  len-sizeof(int)*(labelsize+2), f);
    check(count == len-sizeof(int)*(labelsize+2));
    delete [] buf;  // release buf
  }
  fclose(f);
  printf("done\n");

  // sort
  printf("Sorting examples\n");
  char **sorted = (char **)malloc(num*sizeof(char *));
  check(sorted != NULL);
  memcpy(sorted, examples, num*sizeof(char *));
  qsort(sorted, num, sizeof(char *), comp);
  printf("done\n");

  // find unique examples
  int i = 0;
  int len = *((int *)sorted[0]);
  sorted[0][sizeof(int)+len] = 1;
  for (int j = 1; j < num; j++) {
    int alen = *((int *)sorted[i]);
    int blen = *((int *)sorted[j]);
    if (alen != blen ||
        memcmp(sorted[i] + sizeof(int), sorted[j] + sizeof(int), alen)) {
      i++;
      sorted[i] = sorted[j];
      sorted[i][sizeof(int)+blen] = 1;
    }
  }
  int num_unique = i+1;
  printf("%d unique examples\n", num_unique);

  // collapse examples
  collapse(&X, sorted, num_unique);
  printf("%d collapsed examples\n", X.num);

  // initial model
  double **w = (double **)malloc(sizeof(double *)*X.numblocks);
  check(w != NULL);
  f = fopen(modfile, "rb");
  for (int i = 0; i < X.numblocks; i++) {
    w[i] = (double *)malloc(sizeof(double)*X.blocksizes[i]);
    check(w[i] != NULL);
    count = fread(w[i], sizeof(double), X.blocksizes[i], f);
    check(count == X.blocksizes[i]);
  }
  fclose(f);

  // lower bounds
  double **lb = (double **)malloc(sizeof(double *)*X.numblocks);
  check(lb != NULL);
  f = fopen(lobfile, "rb");
  for (int i = 0; i < X.numblocks; i++) {
    lb[i] = (double *)malloc(sizeof(double)*X.blocksizes[i]);
    check(lb[i] != NULL);
    count = fread(lb[i], sizeof(double), X.blocksizes[i], f);
    check(count == X.blocksizes[i]);
  }
  fclose(f);

  // train
  printf("Training");
  gd(C, J, X, w, lb);
  printf("done\n");

  // save model
  printf("Saving model\n");
  f = fopen(modfile, "wb");
  check(f != NULL);
  for (int i = 0; i < X.numblocks; i++) {
    count = fwrite(w[i], sizeof(double), X.blocksizes[i], f);
    check(count == X.blocksizes[i]);
  }
  fclose(f);

  // score examples
  printf("Scoring\n");
  double *s = score(X, examples, num, w);

  // Write info file
  printf("Writing info file\n");
  f = fopen(inffile, "w");
  check(f != NULL);
  for (int i = 0; i < num; i++) {
    int len = ((int *)examples[i])[0];
    // label, score, unique flag
    count = fprintf(f, "%d\t%f\t%d\n", ((int *)examples[i])[1], s[i],
                    (int)examples[i][sizeof(int)+len]);
    check(count > 0);
  }
  fclose(f);

  printf("Freeing memory\n");
  for (int i = 0; i < X.numblocks; i++) {
    free(w[i]);
    free(lb[i]);
  }
  free(w);
  free(lb);
  free(s);
  for (int i = 0; i < num; i++)
    free(examples[i]);
  free(examples);
  free(sorted);
  free(X.x);
  free(X.blocksizes);
  free(X.regmult);
  free(X.learnmult);

  return 0;
}
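For change (2) in step 6, a possible alternative to commenting out min and max is to define NOMINMAX before including windows.h, which asks that header not to define its min and max macros, and then to call std::max from <algorithm>. This is a sketch of that option rather than what the listing above does; the helper name apply_lower_bounds is hypothetical.

```cpp
#ifdef _WIN32
#define NOMINMAX       // ask windows.h not to define the min/max macros
#include <windows.h>
#endif
#include <algorithm>
#include <cassert>

// Clamps each weight in a block to its lower bound,
// mirroring the "apply lowerbounds" loop in gd().
inline void apply_lower_bounds(double *w, const double *lb, int n) {
    assert(n >= 0);
    for (int k = 0; k < n; k++)
        w[k] = std::max(w[k], lb[k]);
}
```

With this approach the original static inline min and max in learn.cc could also be left in place, since the macros no longer collide with them.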
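7. Array index out-of-bounds errors
There are several out-of-bounds errors. The ones I noticed are:
(1) In pascal_train.m,
in the part that merges the models and runs LSVM training,
that is, under the comment % merge models and train using latent detections & hard negatives:
model = train(cls, model, pos, neg(1:200), 0, 0, 2, 2, 2^28, true, 0.7);
The cause is that my negative set neg contains fewer than 200 examples, so this line becomes:
model = train(cls, model, pos, neg(1:min(length(neg),200)), 0, 0, 2, 2, 2^28, true, 0.7);
The same applies to the part that adds parts and updates the models,
that is, the two calls to train under the comment % add parts and update models using latent detections & hard negatives.
In both calls, change neg(1:200) to neg(1:min(length(neg),200)).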
(2) A possible out-of-bounds case in rewritedat.m, pointed out by the blogger pozen.
Around line 28, change dim = info(end); to:
if length(info) == 0
    dim = 0;
else
    dim = info(end);
end
Around line 38, change dim = y(end); to:
if length(y) == 0
    dim = 0;
else
    dim = y(end);
end
References:
http://blog.csdn.net/pozen/article/details/7103412
http://blog.csdn.net/dreamd1987/article/details/7399151
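I trained a single-component person model on a small amount of data, using the first 50 image file names from trainval.txt as positives and the first 300 from train.txt as negatives. After processing by the pascal_data function, this gave a neg array with 176 negative examples and a pos array with 45 positive examples. I did not change the iteration count in learn.cc, so each call to train still runs 5 million iterations. Most of the time seems to be spent there; for a quick test you can lower the ITER value in learn.cc. Training took roughly one hour. Evaluating the result with the evaluation functions in the PASCAL devkit gave precision and recall of 0 and an average precision (AP) of 0, which was expected.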
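To shorten test runs without recompiling each time, one option is to read the iteration count at startup instead of hard-coding ITER. The sketch below assumes a hypothetical environment variable LEARN_ITER and hypothetical helper names; none of them exist in learn.cc, and the fallback is the original 5,000,000.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Default matches "#define ITER 5000000" in learn.cc.
const long kDefaultIter = 5000000L;

// Parses an iteration count from text. Returns fallback when the text is
// missing, is not a whole number, or is not positive.
long parse_iterations(const char *text, long fallback) {
    assert(fallback > 0);
    if (text == NULL || *text == '\0') return fallback;
    errno = 0;
    char *end = NULL;
    long value = std::strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0) return fallback;
    return value;
}

// LEARN_ITER is a hypothetical variable name used only in this sketch.
long iterations_from_env() {
    return parse_iterations(std::getenv("LEARN_ITER"), kDefaultIter);
}
```

To use it, gd() would compare t against a variable initialized from iterations_from_env() rather than against the ITER macro; running set LEARN_ITER=100000 in the Windows console before starting Matlab would then give a much shorter run.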
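After training, the final results are in the cachedir directory, together with a lot of intermediate data.
Among these files, person_final.mat is the final trained model. The source code saves intermediate data at every stage of training, so even if one stage fails, the next run automatically loads the data saved earlier instead of recomputing it, which is very convenient.
[Figure: visualization of the trained model]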
Running Felzenszwalb's Deformable Part Model (DPM) source code voc-release3.1 on Windows to train your own model