Go into the directory exDay2LF

cd ~/Data/exDay2LF

There are 2 sets of paired-end reads, setA_1.fq & setA_2.fq and setB_1.fq & setB_2.fq, with an insert size of about 350 bp, plus a few scripts, reference files and config files.

Run quality control on the files with FastQC: type fastqc and open the file you want to analyse in the graphical interface, or run fastqc <filename> directly.

How many reads are there, and how long are they?
What is their quality encoding?
Do you see a GC content bias or signs of contamination?
What is the quality of the reads?
Do you see a difference between the two sets, or between the two reads of a pair?
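The first two questions can also be cross-checked on the command line. The function below is a minimal sketch, not part of the exercise scripts: fastq_summary is a hypothetical name, and it assumes plain 4-line FASTQ records. It guesses the encoding from the lowest quality character, since characters below ASCII 59 only occur with the Phred+33 offset.

```shell
# fastq_summary FILE: print read count, min/max read length and a guess at
# the quality encoding, assuming plain 4-line FASTQ records (no wrapped lines).
# (hypothetical helper, not one of the scripts shipped with the exercise)
fastq_summary() {
    awk '
    BEGIN {
        # Build a character -> ASCII code table; awk has no ord() built in.
        for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
        minq = 127
    }
    NR % 4 == 2 {                      # sequence line
        n++
        len = length($0)
        if (n == 1 || len < minlen) minlen = len
        if (len > maxlen) maxlen = len
    }
    NR % 4 == 0 {                      # quality line
        for (i = 1; i <= length($0); i++) {
            q = ord[substr($0, i, 1)]
            if (q < minq) minq = q
        }
    }
    END {
        # Characters below ASCII 59 only occur with the Phred+33 offset.
        enc = (minq < 59) ? "Phred+33 (Sanger / Illumina 1.8+)" : "probably Phred+64 (Illumina 1.3-1.7)"
        printf "reads=%d min_len=%d max_len=%d lowest_qual_char=%d encoding=%s\n", n, minlen, maxlen, minq, enc
    }' "$1"
}
```

For example, fastq_summary setA_1.fq and fastq_summary setA_2.fq should report the same number of reads if the pairs are intact. The encoding guess is only a heuristic; the FastQC report remains the reference.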

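If the quality is not good, trim the reads with sickle (-q is the quality threshold, -l the minimum length a read must keep after trimming to be retained), e.g.: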
sickle pe -f setA_1.fq -r setA_2.fq -t sanger -o setA_1qc.fq -p setA_2qc.fq -s setA_sqc.fq -q 20 -l 101 -x -n

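If contamination appears, remove it with fastx_clipper, e.g. to discard reads containing a poly-A stretch (-C drops the reads in which the adapter was found):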
fastx_clipper -Q 33 -a AAAAAAAAAAAAA -C -i setA_1.fq -o setA_1c.fq
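Because fastx_clipper works on one file at a time, reads dropped from one file leave their mates behind in the other, and paired-end assemblers expect both files to list the same reads in the same order. The function below is a minimal sketch of one way to restore the pairing. keep_mates is a hypothetical name; it assumes 4-line FASTQ records, read names that match up to an optional trailing /1 or /2, and a non-empty first file.

```shell
# keep_mates FILE_A FILE_B: print the records of FILE_B whose read name still
# occurs in FILE_A. Assumes 4-line FASTQ records and read names that match
# up to an optional trailing /1 or /2. FILE_A must not be empty.
# (hypothetical helper, not one of the scripts shipped with the exercise)
keep_mates() {
    awk '
    function read_name(header,    name) {
        name = header
        sub(/[ \t].*$/, "", name)      # drop the description after the name
        sub(/\/[12]$/, "", name)       # drop the /1 or /2 pair suffix
        return name
    }
    NR == FNR {                        # first file: remember surviving names
        if (FNR % 4 == 1) seen[read_name($0)] = 1
        next
    }
    FNR % 4 == 1 { keep = (read_name($0) in seen) }
    keep                               # second file: print kept records
    ' "$1" "$2"
}
```

If both files were clipped, run it in both directions, for example keep_mates setA_1c.fq setA_2c.fq > setA_2c.sync.fq and keep_mates setA_2c.fq setA_1c.fq > setA_1c.sync.fq (setA_2c.fq and the .sync.fq names are hypothetical). Both outputs keep the original read order, so they stay aligned.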

Then start the assembly with either Oases, SOAPdenovo-Trans or Trinity.

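Example with Oases (setAc_1.fq and setAc_2.fq stand for your cleaned read files; use the names of the files you actually produced):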
python ~/Application/oases_0.2.08/scripts/oases_pipeline.py -m 23 -M 35 -o pairedEnd -d " -shortPaired -separate -fastq setAc_1.fq setAc_2.fq"
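The merged transcripts are in the directory whose name is the -o prefix followed by Merged, which should be pairedEndMerged for the command above (file transcripts.fa).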

Example with SOAPdenovo-Trans:
SOAPdenovo-Trans-31mer all -s configsoapA -K 31 -p 3 -d 2 -o setA31
The transcripts are in the .scafSeq file (setA31.scafSeq for the command above).
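The configsoapA file passed with -s describes the read library. Its actual content is not reproduced here; the sketch below only illustrates what a SOAPdenovo-Trans configuration for setA might look like. The maximum read length of 101 is an assumption, and configsoapA.example is a hypothetical file name chosen so that the file shipped with the exercise is not overwritten.

```shell
# Write an example SOAPdenovo-Trans library configuration.
# configsoapA.example is a hypothetical name; the real configsoapA
# shipped with the exercise may differ.
cat > configsoapA.example <<'EOF'
#maximal read length (assumed to be 101 bp here)
max_rd_len=101
[LIB]
#average insert size of the library
avg_ins=350
#0 = forward-reverse paired-end library
reverse_seq=0
#3 = use the reads for both contig and scaffold assembly
asm_flags=3
#the two FASTQ files of the pair
q1=setA_1.fq
q2=setA_2.fq
EOF
```

Compare it with the provided file using cat configsoapA before changing anything. The avg_ins value should match the insert size of about 350 bp given above.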

Don't try Trinity on this VM; the results for setB are in this file:

Check the assembly statistics with abyss-fac, have a look at the files, and check one or two transcripts with BLAST:

abyss-fac <filename>
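To see where numbers such as N50 come from, the same kind of statistics can be recomputed by hand. The function below is a minimal sketch, not a replacement for abyss-fac. fasta_stats is a hypothetical name, and N50 is taken here as the length L such that sequences of length L or more contain at least half of the total bases. The optional second argument is a minimum length filter, because abyss-fac ignores short sequences by default, so counts over all sequences may differ from its output.

```shell
# fasta_stats FILE [MINLEN]: print the number of sequences, total length,
# shortest and longest sequence, and N50 for a FASTA file. Sequences may
# span several lines; sequences shorter than MINLEN (default 0) are ignored.
# (hypothetical helper, not one of the scripts shipped with the exercise)
fasta_stats() {
    awk -v min="${2:-0}" '
    function emit() { if (seen && len >= min) print len }
    /^>/ { emit(); seen = 1; len = 0; next }   # header: flush previous record
         { len += length($0) }                 # sequence line
    END  { emit() }
    ' "$1" | sort -rn | awk '
    { l[NR] = $1; sum += $1 }
    END {
        if (NR == 0) { print "n=0"; exit }
        # N50: walk from the longest sequence down until half of the
        # total length is covered.
        for (i = 1; i <= NR; i++) {
            acc += l[i]
            if (acc * 2 >= sum) { n50 = l[i]; break }
        }
        printf "n=%d sum=%d min=%d max=%d N50=%d\n", NR, sum, l[NR], l[1], n50
    }'
}
```

For example, fasta_stats setA31.scafSeq 200 summarises the SOAPdenovo-Trans transcripts of at least 200 bp (the 200 bp cut-off is an arbitrary choice for illustration).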

What are the differences between the Trinity, SOAPdenovo-Trans and Oases assemblies in terms of the number of transcripts, their length and their type?
Last modified: Friday, 18 January 2013, 4:41 PM