ExomeCQA 1.0
This software assesses coverage in a whole-exome sequencing (WES) dataset generated from multiple samples. Specifically, ExomeCQA automatically calculates CCS scores and UE (unevenness) scores for any cohort WES dataset.
System Requirements
ExomeCQA runs on Mac OS or any Linux distribution. (Installation on Windows requires Windows Bash or another Linux-based emulator; however, this software has not been tested on Windows.)
ExomeCQA requires Bedtools (tested with version 2.X) to calculate genomic coverage for the desired target regions. Bedtools can be downloaded from the Bedtools project website.
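Before installing, it can help to confirm that the required tools are on your PATH. The snippet below is a minimal sketch, not part of the ExomeCQA package; the function names have_tool and check_requirements are hypothetical helpers introduced here for illustration.

```shell
# Return success if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report every missing tool on stderr; return non-zero if any is missing.
check_requirements() {
    missing=0
    for tool in "$@"; do
        if ! have_tool "$tool"; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (tool names taken from the commands used in this README):
#   check_requirements coverageBed make tar
```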
Installation
Download the source code: exomeCQA.tar.gz
Open a command line terminal, move to the directory where you downloaded the package, and enter the following commands:
$ tar -vxf exomeCQA.tar.gz
$ cd exomeCQA
$ make
Usage
Preprocessing and format requirements
ExomeCQA takes coverage files as input. Several preprocessing steps are therefore required before running exomeCQA:
- Generate a per-position coverage file for the target regions you would like to assess, using the Bedtools "coverageBed" command:
- $ coverageBed -abam xxx.bam -b xxx.bed -d > xxx_coverage.txt
*Note1: the syntax shown above for generating coverage files with BEDTools may change with BEDTools updates. Please check the BEDTools documentation for the syntax that corresponds to your version.
*Note2: the coverage files must have names ending in .txt
*Note3: when generating coverage files with Bedtools, please use bed files whose columns follow the order shown below:
#Chromosome_# #Start_pos #End_pos #Exon_index #Total_exon_numbers_in_gene #Strand #GeneName
- Generate an index file for each coverage file using the genIndex program included in the ExomeCQA package:
- $ genIndex xxx_coverage.txt > xxx_coverage.txt.idx
*Note: each index file must have the same name as its coverage file with .idx appended. For example, if the coverage file is named abc.txt, the index file must be named abc.txt.idx
- Please make sure that all coverage files and index files from all samples are in the same directory.
*Note: Because the program evaluates coverage across multiple samples, the number of input samples must be greater than 1. If you want to evaluate a single exome, please duplicate both the coverage file and the index file and put the copies into the same folder.
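The preprocessing steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not part of the ExomeCQA package: prepare_inputs is a hypothetical helper name, coverageBed and genIndex are assumed to be on your PATH, and the coverageBed syntax is the one shown above (it may differ in your BEDTools version).

```shell
# Hypothetical helper: build coverage and index files for every BAM in a directory.
# Usage: prepare_inputs <bam_dir> <targets.bed> <out_dir>
prepare_inputs() {
    bam_dir=$1
    bed=$2
    out_dir=$3
    mkdir -p "$out_dir" || return 1
    count=0
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue            # pattern matched no BAM files
        sample=$(basename "$bam" .bam)
        cov="$out_dir/${sample}_coverage.txt"
        # Per-position coverage over the target regions (name must end in .txt)
        coverageBed -abam "$bam" -b "$bed" -d > "$cov" || return 1
        # Index file must be named <coverage file>.idx
        genIndex "$cov" > "$cov.idx" || return 1
        count=$((count + 1))
    done
    # ExomeCQA expects more than one sample in the folder
    if [ "$count" -lt 2 ]; then
        echo "warning: found $count sample(s); ExomeCQA needs more than 1" >&2
        return 2
    fi
}
```

Writing every coverage file and its index into a single output directory keeps all samples together in one folder, as required above.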
Run program
Open a command line terminal and run exomeCQA, giving it the folder that contains all coverage files and index files as input. You also need to provide the names of the output files for the exon metrics and the gene metrics. The usage of exomeCQA is as follows:
$ exomeCQA <folder_name> <output_file_for_exon_data> <output_file_for_gene_data> [chromosome_name]
parameters:
- chromosome_name : the chromosome to be assessed, given in the format "chr1", "chr2"..."chr22", or "all". The default value is "all".
For example, to run exomeCQA for chromosome 1 with coverage files in the folder named "all_targets" and save the result for exons in exon.chr1 and for genes in gene.chr1, run:
- $ exomeCQA all_targets exon.chr1 gene.chr1 chr1
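If you want separate result files for each chromosome, the call above can be repeated in a loop. The following is a minimal sketch, assuming exomeCQA is on your PATH; run_per_chromosome is a hypothetical wrapper name, and the exon.chrN / gene.chrN naming simply follows the example above.

```shell
# Hypothetical wrapper: run exomeCQA once for each of chr1..chr22.
# Usage: run_per_chromosome <folder_name> [output_dir]
run_per_chromosome() {
    folder=$1
    out_dir=${2:-.}
    i=1
    while [ "$i" -le 22 ]; do
        chr="chr$i"
        # Exon metrics go to exon.chrN, gene metrics to gene.chrN
        exomeCQA "$folder" "$out_dir/exon.$chr" "$out_dir/gene.$chr" "$chr" || return 1
        i=$((i + 1))
    done
}

# Example call:
#   run_per_chromosome all_targets results
```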
Output
Each output file lists all exon regions or genes with their CCS scores and UE scores. The output columns, in order, are:
- #1: chromosome number
- #2: the start position of the exon
- #3: the end position of the exon
- #4: the name of the gene the exon belongs to
- #5: the total number of exons in the gene
- #6: strand
- #7: the index (order) of the exon region within the gene
- #8: the size of the exon (number of bases)
- #9: CCS score
- #10: the median average coverage of all samples of the exon region
- #11: the sample number with the highest average coverage of the exon
- #12: the sample number with the lowest average coverage of the exon
- #13: the number of peaks
- #14: the locations of peaks and troughs in the exon (a negative number marks a trough, a positive number marks a peak)
- #15: the features of each peak of the median coverage of all samples (height, normalized height, width)
- #16: Unevenness score
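To pull the scores out of the exon output file, a short awk filter is usually enough. This sketch assumes the exon file is tab-separated with the sixteen columns in the order listed above and has no header line; neither assumption is confirmed by this README, so check your own output first. The function names are hypothetical.

```shell
# Print chromosome, start, end, gene name, CCS score and UE score for each exon.
# Assumes tab-separated columns in the order listed above.
summarize_exons() {
    awk -F '\t' 'BEGIN { OFS = "\t" } NF >= 16 { print $1, $2, $3, $4, $9, $16 }' "$1"
}

# Keep only exons whose CCS score (column 9) is at least the given threshold.
# Usage: exons_with_ccs_at_least <exon_file> <threshold>
exons_with_ccs_at_least() {
    awk -F '\t' -v t="$2" 'NF >= 16 && ($9 + 0) >= (t + 0)' "$1"
}

# Example calls:
#   summarize_exons exon.chr1
#   exons_with_ccs_at_least exon.chr1 0.5
```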
More details on algorithm development, software implementation, and performance evaluation of ExomeCQA can be found in our paper.