[Notes] JSON: Reading the COCO dataset with the COCO API

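I. Structure of the COCO dataset

Assume dataDir has the following directory structure: annotations, test2014, train2014, val2014

Since the annotations file is a JSON file, we can use the json module to look at its basic structure.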

import json
dataDir=r'D:\data\coco\coco2014'
dataType='val2014'
annFile='{}/annotations/instances_{}.json'.format(dataDir,dataType)
with open(annFile,'r') as f:
    data=json.load(f)

First, look at the top-level structure:

for k in data:
    print(k)
-------------------------------
info
images
licenses
annotations
categories

The three most important keys are images, annotations, and categories.

1. Structure of images:

for k in data["images"][0]:
    print(k)
-----------------------------
license
file_name
coco_url
height
width
date_captured
flickr_url
id

2. Structure of annotations:

for k in data["annotations"][0]:
    print(k)
------------------------------------
segmentation
area
iscrowd
image_id
bbox
category_id
id

3. Structure of categories:

for k in data["categories"][0]:
    print(k)
-----------------------------------
supercategory
id
name
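
The three lists are linked by ids: each annotation points to an image through image_id and to a category through category_id. A minimal sketch of that join, using a tiny hand-made dictionary in place of the real file (all values below are made up for illustration):

```python
from collections import defaultdict

# Hypothetical miniature dataset with the same key names as the real file.
data = {
    "images": [
        {"id": 1, "file_name": "a.jpg", "height": 480, "width": 640},
        {"id": 2, "file_name": "b.jpg", "height": 480, "width": 640},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40], "iscrowd": 0},
        {"id": 11, "image_id": 1, "category_id": 2, "bbox": [50, 60, 70, 80], "iscrowd": 0},
        {"id": 12, "image_id": 2, "category_id": 1, "bbox": [5, 5, 10, 10], "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 2, "name": "dog", "supercategory": "animal"},
    ],
}

# Lookup tables keyed by id.
cat_name = {c["id"]: c["name"] for c in data["categories"]}
img_file = {im["id"]: im["file_name"] for im in data["images"]}

# Group annotations by the image they belong to.
anns_by_image = defaultdict(list)
for ann in data["annotations"]:
    anns_by_image[ann["image_id"]].append(ann)

def labels_for(image_id):
    """Return the category names annotated on one image."""
    return [cat_name[a["category_id"]] for a in anns_by_image[image_id]]

for image_id, file_name in img_file.items():
    print(file_name, labels_for(image_id))
```

The COCO class described in the next section builds similar lookup tables when it loads the annotation file.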

II. cocoapi:

pycocotools contains four modules: coco, cocoeval, mask, and _mask.

1. The coco module:

# The following API functions are defined:
#  COCO       - COCO api class that loads COCO annotation file and prepare data structures.
#  getAnnIds  - Get ann ids that satisfy given filter conditions.
#  getCatIds  - Get cat ids that satisfy given filter conditions.
#  getImgIds  - Get img ids that satisfy given filter conditions.
#  loadAnns   - Load anns with the specified ids.
#  loadCats   - Load cats with the specified ids.
#  loadImgs   - Load imgs with the specified ids.
#  annToMask  - Convert segmentation in an annotation to binary mask.
#  showAnns   - Display the specified annotations.
#  loadRes    - Load algorithm results and create API for accessing them.
#  download   - Download COCO images from mscoco.org server.
# Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
# Help on each functions can be accessed by: "help COCO>function".

The COCO class defines the following methods (11 are listed here):

(1) Get annotation ids:

def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None):
        """
        Get ann ids that satisfy given filter conditions. default skips that filter
        :param imgIds  (int array)     : get anns for given imgs
               catIds  (int array)     : get anns for given cats
               areaRng (float array)   : get anns for given area range (e.g. [0 inf])
               iscrowd (boolean)       : get anns for given crowd label (False or True)
        :return: ids (int array)       : integer array of ann ids
        """

(2) Get category ids:

def getCatIds(self, catNms=[], supNms=[], catIds=[]):
        """
        filtering parameters. default skips that filter.
        :param catNms (str array)  : get cats for given cat names
        :param supNms (str array)  : get cats for given supercategory names
        :param catIds (int array)  : get cats for given cat ids
        :return: ids (int array)   : integer array of cat ids
        """

(3) Get image ids:

def getImgIds(self, imgIds=[], catIds=[]):
        '''
        Get img ids that satisfy given filter conditions.
        :param imgIds (int array) : get imgs for given ids
        :param catIds (int array) : get imgs with all given cats
        :return: ids (int array)  : integer array of img ids
        '''
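
To make the filter semantics of getAnnIds and getImgIds concrete, here is a rough pure-Python sketch of what the two docstrings describe, written against a plain list of annotation dicts. The function names and sample data are assumptions for illustration; this is not the pycocotools implementation, and treating the area bounds as exclusive is an assumption as well.

```python
def filter_ann_ids(anns, img_ids=(), cat_ids=(), area_rng=(), iscrowd=None):
    """Return ids of annotations passing every filter; an empty filter is skipped."""
    ids = []
    for ann in anns:
        if img_ids and ann["image_id"] not in img_ids:
            continue
        if cat_ids and ann["category_id"] not in cat_ids:
            continue
        if area_rng and not (area_rng[0] < ann["area"] < area_rng[1]):
            continue
        if iscrowd is not None and ann["iscrowd"] != int(iscrowd):
            continue
        ids.append(ann["id"])
    return ids

def image_ids_with_all_cats(anns, cat_ids):
    """Return sorted ids of images that contain every category in cat_ids."""
    cats_per_image = {}
    for ann in anns:
        cats_per_image.setdefault(ann["image_id"], set()).add(ann["category_id"])
    wanted = set(cat_ids)
    return sorted(i for i, cats in cats_per_image.items() if wanted <= cats)

# Made-up annotations for the demonstration.
anns = [
    {"id": 1, "image_id": 7, "category_id": 1, "area": 50.0, "iscrowd": 0},
    {"id": 2, "image_id": 7, "category_id": 2, "area": 900.0, "iscrowd": 0},
    {"id": 3, "image_id": 8, "category_id": 1, "area": 4000.0, "iscrowd": 1},
]

print(filter_ann_ids(anns))                          # no filters: every id
print(filter_ann_ids(anns, img_ids=[7], cat_ids=[1]))
print(filter_ann_ids(anns, area_rng=[100, float("inf")], iscrowd=False))
print(image_ids_with_all_cats(anns, [1, 2]))         # images holding both categories
```

Note the "all given cats" wording in getImgIds: passing several category ids narrows the result to images that contain every one of them, which is why the last call returns only one image.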

(4) Load annotations:

def loadAnns(self, ids=[]):
        """
        Load anns with the specified ids.
        :param ids (int array)       : integer ids specifying anns
        :return: anns (object array) : loaded ann objects
        """

(5) Load categories:

def loadCats(self, ids=[]):
        """
        Load cats with the specified ids.
        :param ids (int array)       : integer ids specifying cats
        :return: cats (object array) : loaded cat objects
        """

(6) Load images:

def loadImgs(self, ids=[]):
        """
        Load anns with the specified ids.
        :param ids (int array)       : integer ids specifying img
        :return: imgs (object array) : loaded img objects
        """

(7) Display annotations on an image with matplotlib:

def showAnns(self, anns):
        """
        Display the specified annotations.
        :param anns (array of object): annotations to display
        :return: None
        """

(8) Load a results file:

def loadRes(self, resFile):
        """
        Load result file and return a result api object.
        :param   resFile (str)     : file name of result file
        :return: res (obj)         : result api object
        """

(9) Download the dataset (does this really work from within China? Baidu Netdisk is probably the better option):

def download(self, tarDir = None, imgIds = [] ):
        '''
        Download COCO images from mscoco.org server.
        :param tarDir (str): COCO results directory name
               imgIds (list): images to be downloaded
        :return:
        '''

(10) Convert an annotation to RLE format:

def annToRLE(self, ann):
        """
        Convert annotation which can be polygons, uncompressed RLE to RLE.
        :return: binary mask (numpy 2D array)
        """

(11) Get the mask:

def annToMask(self, ann):
        """
        Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask.
        :return: binary mask (numpy 2D array)
        """

2. The mask module defines four functions:

def encode(bimask):
def decode(rleObjs):
def area(rleObjs):
def toBbox(rleObjs):

3. The cocoeval module defines the COCOeval and Params classes:

    # The usage for CocoEval is as follows:
    #  cocoGt=..., cocoDt=...       # load dataset and results
    #  E = CocoEval(cocoGt,cocoDt); # initialize CocoEval object
    #  E.params.recThrs = ...;      # set parameters as desired
    #  E.evaluate();                # run per image evaluation
    #  E.accumulate();              # accumulate per image results
    #  E.summarize();               # display summary metrics of results
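
In the usage above, cocoDt is typically obtained from cocoGt.loadRes(resFile), where the results file is a JSON list with one entry per detection. A minimal sketch of writing such a file for bounding-box results, assuming the usual detection fields (image_id, category_id, bbox as [x, y, width, height], score); the values and the file name are made up:

```python
import json
import os
import tempfile

# Hypothetical detections; image_id and category_id must exist in the ground-truth file.
results = [
    {"image_id": 42, "category_id": 1, "bbox": [10.0, 20.0, 30.0, 40.0], "score": 0.91},
    {"image_id": 42, "category_id": 18, "bbox": [5.0, 5.0, 50.0, 25.0], "score": 0.40},
]

# Write the list to a temporary file (the name detections.json is arbitrary).
res_file = os.path.join(tempfile.mkdtemp(), "detections.json")
with open(res_file, "w") as f:
    json.dump(results, f)

# Read it back to confirm the structure is a plain list of dicts.
with open(res_file) as f:
    loaded = json.load(f)
print(len(loaded), sorted(loaded[0]))
```

With pycocotools installed and a matching annotation file, res_file could then be passed to loadRes to build the cocoDt object.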

4. The lower-level module _mask: (omitted)

III. Example (Jupyter notebook):

%matplotlib inline
from pycocotools.coco import COCO
from pycocotools.mask import encode,decode,area,toBbox

import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
import pylab
pylab.rcParams['figure.figsize'] = (8.0, 10.0)

dataDir=r'D:\data\coco\coco2014'
dataType='val2014'
annFile='{}/annotations/instances_{}.json'.format(dataDir,dataType)

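coco=COCO(annFile)

# all image ids and their image records
imgIds = coco.getImgIds()
imgs=coco.loadImgs(imgIds)

# take the first annotation as an example
annIds = coco.getAnnIds(imgIds=imgIds)
ann = coco.loadAnns(annIds)[0]

# annotation -> binary mask / RLE
mask=coco.annToMask(ann)
rle=coco.annToRLE(ann)

# mask <-> RLE round trip with the mask module
rle=encode(mask)
mask=decode(rle)

# area and bounding box computed from the RLE
area(rle)
toBbox(rle)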

IV. The two segmentation formats: RLE (run-length encoding) and polygon:

1. iscrowd=1 means the format is RLE; iscrowd=0 means the format is polygon:

polygon:

{"segmentation": [[499.71, 397.28,......342.71, 172.31]], 
"area": 43466.12825, 
"iscrowd": 0, 
"image_id": 182155, 
"bbox": [338.89, 51.69, 205.82, 367.61], 
"category_id": 1, 
"id": 1248258},

RLE:

{"segmentation": {"counts": [66916, 6, 587,..... 1, 114303], "size": [594, 640]}, 
"area": 6197, 
"iscrowd": 1, 
"image_id": 284445, 
"bbox": [112, 322, 335, 94], 
"category_id": 1, 
"id": 9.001002844e+11}

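For a discussion of these two formats, see The RLE or Polygon format of "segmentation".

Most annotations in the COCO dataset appear to use the polygon format (crowd regions with iscrowd=1, as in the example above, are the exception), whereas understanding_cloud_organization uses RLE.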
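
In the RLE example above, counts holds alternating run lengths of 0s and 1s, starting with a run of 0s, and the pixels are read column by column; size is [height, width]. A small pure-Python sketch of decoding such an uncompressed RLE (rle_to_rows is a hypothetical helper, not part of pycocotools, and the tiny input is made up):

```python
def rle_to_rows(rle):
    """Decode an uncompressed COCO-style RLE dict into a list of rows of 0/1."""
    height, width = rle["size"]
    flat = []
    value = 0  # runs alternate, beginning with zeros
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("counts do not add up to height * width")
    # flat is column-major: pixel (row r, column c) sits at index c * height + r
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]

# A 2x3 mask: one 0, then two 1s, then three 0s, read column by column.
tiny = {"size": [2, 3], "counts": [1, 2, 3]}
rows = rle_to_rows(tiny)
for row in rows:
    print(row)
print("area:", sum(map(sum, rows)))
```

For real data, annToMask or mask.decode does this work; the sketch only illustrates how counts and size relate.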

2. Converting between polygon and mask:

import cv2
import numpy as np

def mask2polygon(mask):
    # OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns (image, contours, hierarchy)
    contours, hierarchy = cv2.findContours(mask.astype(np.uint8), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    # mask_new, contours, hierarchy = cv2.findContours(mask.astype(np.uint8), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    segmentation = []
    for contour in contours:
        contour_list = contour.flatten().tolist()
        # keep contours with at least 3 points; optionally also require cv2.contourArea(contour) > 10000
        if len(contour_list) > 4:
            segmentation.append(contour_list)
    return segmentation

def polygons_to_mask(img_shape, polygons):
    mask = np.zeros(img_shape, dtype=np.uint8)
    # fillPoly requires int32 points and raises an error for other dtypes.
    # Each polygon is converted separately so polygons with different numbers of points also work.
    pts = [np.asarray(p, dtype=np.int32).reshape(-1, 2) for p in polygons]
    cv2.fillPoly(mask, pts, color=1)
    return mask

# test ------------------------------
mask = np.ones((100, 100))
mask[:10, :10] = 0  # clear the top-left 10x10 corner
mask2polygon(mask)
--------------------------
[[10, 0, 10, 9, 9, 10, 0, 10, 0, 99, 99, 99, 99, 0]]

Another approach is binary_mask_to_polygon (I have not tried it; listed for reference).
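
Each polygon in segmentation is a flat list [x1, y1, x2, y2, ...]. As a small illustration of working with that layout without OpenCV, the sketch below pairs up the coordinates and derives the polygon area (shoelace formula) and an [x, y, width, height] box. The helper names are made up, and the area is purely geometric, so it need not match the area value stored in an annotation.

```python
def to_points(flat):
    """Split [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    return list(zip(flat[0::2], flat[1::2]))

def polygon_area(flat):
    """Shoelace formula for a simple (non self-intersecting) polygon."""
    pts = to_points(flat)
    total = 0.0
    # Pair every vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_bbox(flat):
    """Axis-aligned box around the vertices as [x, y, width, height]."""
    xs, ys = flat[0::2], flat[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]

# A 4x3 rectangle given as a flat coordinate list.
square = [0, 0, 4, 0, 4, 3, 0, 3]
print(to_points(square))
print(polygon_area(square))
print(polygon_bbox(square))
```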

3. Converting between RLE and mask:

def mask2rle(img):
    '''
    img: numpy array, 1 - mask, 0 - background
    Returns the run lengths as a formatted string of "start length" pairs
    (starts are 1-based, pixels are numbered column by column)
    '''
    pixels = img.T.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def rle2mask(rle, input_shape):
    '''
    rle: string produced by mask2rle
    input_shape: (rows, cols) of the original mask
    '''
    rows, cols = input_shape[:2]

    mask = np.zeros(rows * cols, dtype=np.uint8)

    array = np.asarray([int(x) for x in rle.split()])
    starts = array[0::2] - 1  # mask2rle emits 1-based starts
    lengths = array[1::2]

    for start, length in zip(starts, lengths):
        mask[start:start + length] = 1
    return mask.reshape(cols, rows).T
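
The string produced by mask2rle is a sequence of "start length" pairs, where start is a 1-based pixel position in column-major order. This is a different layout from the counts list of the COCO RLE shown earlier. A short sketch that expands such a string into 0-based pixel positions (rle_positions is a hypothetical helper):

```python
def rle_positions(rle):
    """Expand 'start length start length ...' into 0-based pixel positions."""
    numbers = [int(x) for x in rle.split()]
    positions = []
    for start, length in zip(numbers[0::2], numbers[1::2]):
        first = start - 1  # starts are 1-based
        positions.extend(range(first, first + length))
    return positions

# Two runs: pixels 1-3 and pixels 10-11 (1-based).
print(rle_positions("1 3 10 2"))
```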

4. Computing the bbox of a mask:

def bounding_box(img):
    # return max and min of a mask to draw bounding box
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]

    return rmin, rmax, cmin, cmax
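
bounding_box returns inclusive row and column extents, while the bbox field in the annotations is [x, y, width, height]. A minimal sketch of the conversion, assuming x maps to columns and y to rows (extents_to_coco_bbox is a made-up name):

```python
def extents_to_coco_bbox(rmin, rmax, cmin, cmax):
    """Convert inclusive (rmin, rmax, cmin, cmax) to [x, y, width, height]."""
    return [int(cmin), int(rmin), int(cmax - cmin + 1), int(rmax - rmin + 1)]

# A mask whose foreground covers rows 2..4 and columns 5..9
print(extents_to_coco_bbox(2, 4, 5, 9))
```

The +1 accounts for both end pixels being part of the box; a single-pixel mask therefore has width and height 1.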

V. Converting datasets in other formats to COCO format

See this example: convert-dataset-to-coco-format-tools
