I. GTF File Format
Fields must be tab-separated. Also, all but the final field in each feature line must contain a value; "empty" columns should be denoted with a '.'
1. seqname - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix. Important note: the seqname must be one used within Ensembl, i.e. a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly. See the example GFF output below.
2. source - name of the program that generated this feature, or the data source (database or project name).
3. feature - feature type name, e.g. Gene, Variation, Similarity.
4. start - start position of the feature, with sequence numbering starting at 1.
5. end - end position of the feature, with sequence numbering starting at 1.
6. score - a floating point value.
7. strand - defined as + (forward) or - (reverse).
8. frame - one of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
9. attribute - a semicolon-separated list of tag-value pairs, providing additional information about each feature.
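As a minimal sketch of the nine-column layout above, the following Python splits one feature line on tabs and converts each field to a convenient type. The names `GtfRecord`, `parse_attributes`, and `parse_gtf_line` are hypothetical helpers written for this note, not part of any library, and the sample line is made-up data. The attribute parser assumes GTF-style `tag "value";` pairs and does not handle semicolons inside quoted values.

```python
from typing import Dict, NamedTuple, Optional


class GtfRecord(NamedTuple):
    seqname: str
    source: Optional[str]
    feature: str
    start: int                 # 1-based
    end: int                   # 1-based, inclusive
    score: Optional[float]     # None when the column is '.'
    strand: Optional[str]      # '+', '-', or None when '.'
    frame: Optional[int]       # 0, 1, 2, or None when '.'
    attributes: Dict[str, str]


def parse_attributes(text: str) -> Dict[str, str]:
    """Split 'tag "value"; tag "value";' into a dict (a later duplicate tag overwrites)."""
    attrs: Dict[str, str] = {}
    for item in text.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        tag, _, value = item.partition(" ")
        attrs[tag] = value.strip().strip('"')
    return attrs


def parse_gtf_line(line: str) -> GtfRecord:
    """Parse one tab-separated feature line into a GtfRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) == 8:
        # Only the final (attribute) field is allowed to be missing.
        cols.append("")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(cols)}")
    seqname, source, feature, start, end, score, strand, frame, attribute = cols
    return GtfRecord(
        seqname=seqname,
        source=None if source == "." else source,
        feature=feature,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=None if strand == "." else strand,
        frame=None if frame == "." else int(frame),
        attributes=parse_attributes(attribute),
    )


# Made-up example line, for illustration only.
example = '1\tdemo_source\tgene\t100\t200\t.\t+\t.\tgene_id "GENE0001"; gene_name "demo";'
record = parse_gtf_line(example)
print(record.feature, record.start, record.end, record.attributes["gene_id"])
```

Mapping '.' to `None` keeps "empty" columns distinct from real values such as a score of 0.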
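1. Chromosome name.
2. Source of the annotation, e.g. "Genescan" or "Genbank"; it may be empty, in which case a "." is used as a placeholder.
3. Type of the annotated feature, e.g. Gene, cDNA, mRNA, or the corresponding SO (Sequence Ontology) accession.
4, 5. Start and end positions.
7. Strand of the sequence: + for the forward (sense) strand, - for the reverse (antisense) strand, ? for unknown.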
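8. Reading frame: one of 0, 1, or 2. 0 means the first base of the feature is the first base of a codon, 1 means the second base of the feature is the first base of a codon, and 2 means the third base is.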
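One way to read the frame value is as the number of bases to skip from the 5' end of the feature before the first complete codon begins. The sketch below turns that into a coordinate. It assumes the frame is counted from the 5' end of the feature, which for a '-' strand feature is its `end` coordinate (the convention described in the GFF3 specification). The function name `first_codon_start` is a hypothetical helper for this note.

```python
def first_codon_start(start: int, end: int, strand: str, frame: int) -> int:
    """Return the 1-based coordinate of the first base of the first complete codon.

    Assumes `frame` counts the bases to skip from the feature's 5' end:
    `start` on the '+' strand, `end` on the '-' strand.
    """
    if frame not in (0, 1, 2):
        raise ValueError(f"frame must be 0, 1 or 2, got {frame!r}")
    if strand == "+":
        pos = start + frame
    elif strand == "-":
        pos = end - frame
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if not start <= pos <= end:
        raise ValueError("feature is too short for the given frame")
    return pos


# Illustrative coordinates only.
print(first_codon_start(100, 200, "+", 0))  # 100
print(first_codon_start(100, 200, "+", 1))  # 101
print(first_codon_start(100, 200, "-", 2))  # 198
```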