bedtools 每天都会用到的工具

详细的使用说明:http://bedtools.readthedocs.org/en/latest/

Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersectmergecountcomplement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.

Summary of available tools.

bedtools support a wide range of operations for interrogating and manipulating genomic features. The table below summarizes the tools available in the suite.

Utility Description
annotate Annotate coverage of features from multiple files.
bamtobed Convert BAM alignments to BED (& other) formats.
bamtofastq Convert BAM records to FASTQ records.
bed12tobed6 Breaks BED12 intervals into discrete BED6 intervals.
bedpetobam Convert BEDPE intervals to BAM records.
bedtobam Convert intervals to BAM records.
closest Find the closest, potentially non-overlapping interval.
cluster Cluster (but don’t merge) overlapping/nearby intervals.
complement Extract intervals _not_ represented by an interval file.
coverage Compute the coverage over defined intervals.
expand Replicate lines based on lists of values in columns.
flank Create new intervals from the flanks of existing intervals.
genomecov Compute the coverage over an entire genome.
getfasta Use intervals to extract sequences from a FASTA file.
groupby Group by common cols. & summarize oth. cols. (~ SQL “groupBy”)
igv Create an IGV snapshot batch script.
intersect Find overlapping intervals in various ways.
jaccard Calculate the Jaccard statistic b/w two sets of intervals.
links Create a HTML page of links to UCSC locations.
makewindows Make interval “windows” across a genome.
map Apply a function to a column for each overlapping interval.
maskfasta Use intervals to mask sequences from a FASTA file.
merge Combine overlapping/nearby intervals into a single interval.
multicov Counts coverage from multiple BAMs at specific intervals.
multiinter Identifies common intervals among multiple interval files.
nuc Profile the nucleotide content of intervals in a FASTA file.
overlap Computes the amount of overlap from two intervals.
pairtobed Find pairs that overlap intervals in various ways.
pairtopair Find pairs that overlap other pairs in various ways.
random Generate random intervals in a genome.
reldist Calculate the distribution of relative distances b/w two files.
shuffle Randomly redistribute intervals in a genome.
slop Adjust the size of intervals.
sort Order the intervals in a file.
subtract Remove intervals based on overlaps b/w two files.
tag Tag BAM alignments based on overlaps with interval files.
unionbedg Combines coverage intervals from multiple BEDGRAPH files.
window

Find overlapping intervals within a window around an interval.

安装: yum install BEDTools

1, 将bam文件(tophat得到的结果)转化为fastq

先将比对得到的accepted_hits.bam和unmapped.bam合并

samtools  merge RC6-1_ATTCCT_L005.bam accepted_hits.bam unmapped.bam

得到合并后的RC6-1_ATTCCT_L005.bam文件

将该bam文件按照reads名称排序:

samtools_0.1.18 sort -n RC6-1_ATTCCT_L005.bam RC6-1_ATTCCT_L005.sorted

得到RC6-1_ATTCCT_L005.sorted.bam文件

最后用bedtools转化

bedtools bamtofastq -i RC6-1_ATTCCT_L005.sorted.bam -fq RC6-1_ATTCCT_L005_R1.fastq -fq2 RC6-1_ATTCCT_L005_R2.fastq

得到双端的fastq文件。

上一篇:iOS 常见错误:CALayer position contains NaN: [14 nan]


下一篇:@ResponseBody返回中文乱码