Combining 3 fastq files
Post 302730723 by ljk on Tuesday 13th of November 2012 01:43:38 PM

Hello,
I am working with next-gen short-read sequence data, which we receive in 3 fastq files. These are arranged in 4-line groups for each read:
line1: read identifier, beginning with, e.g., "@HWI-ST1342..."
line2: DNA sequence (101 characters in files 1 and 2, 7 characters in file 3)
line3: "+"
line4: quality score codes, equal in length to line 2

There are ~160 million reads per file, so the files are quite big.

I need to combine the data from all three files, which list the reads in the same order and use the same read identifiers. The output I need for each read is:

line1: identifier
line2: File1sequenceFile2sequenceFile3sequence
line3: "+"
line4: File1qualFile2qualFile3qual

Can anyone suggest an efficient way of doing this with shell commands?

Thanks a lot!
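
One possible approach, as a minimal sketch rather than a tested production solution: since the three files hold the reads in the same order, `paste` can join them line by line and `awk` can rebuild each 4-line record from the three columns. The function name `merge_fastq` is hypothetical. The sketch assumes that no line contains a tab character, that the identifier is the part of line 1 before the first space, and that the line 1 of file 1 is the one to keep. Both tools stream their input, so memory use should not grow with file size.

```shell
#!/bin/sh
# merge_fastq FILE1 FILE2 FILE3
# Hypothetical helper (the name is an assumption, not from the thread).
# Joins the three files line by line with tabs, then rebuilds each
# 4-line record from the three tab-separated columns.
# Assumption: no line in any input file contains a tab character.
merge_fastq() {
    paste "$1" "$2" "$3" | awk -F '\t' '
        NR % 4 == 1 {
            # Line 1: compare the identifier up to the first whitespace,
            # since anything after it may differ between the files.
            split($1, a, " "); split($2, b, " "); split($3, c, " ")
            if (a[1] != b[1] || a[1] != c[1]) {
                print "identifier mismatch at line " NR > "/dev/stderr"
                exit 1
            }
            # Keep the full line 1 from the first file.
            print $1
            next
        }
        # Line 3: always written as a bare "+".
        NR % 4 == 3 { print "+"; next }
        # Lines 2 and 4: concatenate sequence or quality from all files.
        { print $1 $2 $3 }
    '
}
```

It would be called as `merge_fastq file1.fastq file2.fastq file3.fastq > combined.fastq` (file names are placeholders). The function stops with a non-zero exit status at the first record whose identifiers disagree, which also catches files of different lengths.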
 

srf2fastq(1)                     Staden io_lib                     srf2fastq(1)

NAME
       srf2fastq - Converts SRF files to Sanger fastq format

SYNOPSIS
       srf2fastq [options] srf_archive ...

DESCRIPTION
       srf2fastq extracts sequences and qualities from one or more SRF
       archives and writes them in Sanger fastq format to stdout.

       Note that Illumina also have a fastq format (used in the GERALD
       directories) which differs slightly in the use of log-odds scores
       for the quality values. The format described here is using the
       traditional Phred style of quality encoding.

OPTIONS
       -c     Outputs calibrated confidence values using the ZTR CNF1 chunk
              type for a single quality per base. Without this use the
              original Illumina _prb.txt files consisting of four quality
              values per base, stored in the ZTR CNF4 chunks.

       -C     Masks out sequences tagged as bad quality.

       -s root
              Generates files on disk with filenames starting root, one
              file per non-explicit element in the SRF/ZTR region (REGN)
              chunk. Typically this results in two files for paired end
              runs. The filename suffixes come from the names listed in
              the SRF region chunks. This option conflicts with the -S
              parameter.

       -S     Splits sequences into regions, but sequentially lists each
              sequence region to stdout instead of splitting to separate
              files on disk. This option conflicts with the -s parameter.

       -n     When using -s the filename suffixes are simply numbered
              (starting with 1) instead of using the names listed in the
              SRF region chunks.

       -a     Appends region index to the sequence names. Ie generate
              "name/1" and "name/2" for a paired read.

       -e     Include any explicit sequence (ZTR region chunk of type 'E')
              in the sequence output. The explicit sequence is also
              included in the quality line too. Currently this is utilised
              by ABI SOLiD to store the last base of the primer.

       -r region list
              Reverse complements the sequence and reverses the quality
              values for all regions in the region list. This is a comma
              separated list of integer values enumerating the regions,
              starting from 1. Note that this option only works when
              either -s or -S are specified.

EXAMPLES
       To extract only the good quality sequences from all srf files in
       the current directory using calibrated confidence values (if
       available).

           srf2fastq -c -C *.srf > runX.fastq

       To extract a paired end run into two separate files with sequences
       named name/1 and name/2.

           srf2fastq -s runX -a -n runX.srf

       To extract a paired end run as a single file, alternating forward
       and reverse sequences, with the second read being reverse
       complemented.

           srf2fastq -S -r 2 runX.srf > runX.fastq

AUTHOR
       James Bonfield, Steven Leonard - Wellcome Trust Sanger Institute

                                 December 10                       srf2fastq(1)