Redirecting stdout inside a loop


 
# 8  
Old 08-11-2014
Yes. I need to integrate the two output files,
summary.fastqc.$date_formatted
stout.miRNA.bash.$date_formatted
so each iteration (sample info) gets printed to file, as in my example:
Code:
PASS	Basic Statistics	cutadapt_output_DEHP1_29B.fastq
PASS	Per base sequence quality	cutadapt_output_DEHP1_29B.fastq
PASS	Per sequence quality scores	cutadapt_output_DEHP1_29B.fastq
WARN	Per base sequence content	cutadapt_output_DEHP1_29B.fastq
WARN	Per base GC content	cutadapt_output_DEHP1_29B.fastq
WARN	Per sequence GC content	cutadapt_output_DEHP1_29B.fastq
PASS	Per base N content	cutadapt_output_DEHP1_29B.fastq
FAIL	Sequence Length Distribution	cutadapt_output_DEHP1_29B.fastq
FAIL	Sequence Duplication Levels	cutadapt_output_DEHP1_29B.fastq
FAIL	Overrepresented sequences	cutadapt_output_DEHP1_29B.fastq
FAIL	Kmer Content	cutadapt_output_DEHP1_29B.fastq
Processed reads:      2139267
Trimmed reads:      2075206 (97.0%)
PASS	Basic Statistics	cutadapt_output_PB1_82B.fastq
PASS	Per base sequence quality	cutadapt_output_PB1_82B.fastq
PASS	Per sequence quality scores	cutadapt_output_PB1_82B.fastq
FAIL	Per base sequence content	cutadapt_output_PB1_82B.fastq
FAIL	Per base GC content	cutadapt_output_PB1_82B.fastq
FAIL	Per sequence GC content	cutadapt_output_PB1_82B.fastq
PASS	Per base N content	cutadapt_output_PB1_82B.fastq
FAIL	Sequence Length Distribution	cutadapt_output_PB1_82B.fastq
FAIL	Sequence Duplication Levels	cutadapt_output_PB1_82B.fastq
FAIL	Overrepresented sequences	cutadapt_output_PB1_82B.fastq
FAIL	Kmer Content	cutadapt_output_PB1_82B.fastq
 Processed reads:      2159244
Trimmed reads:      254803954 (97.0%)....

---------- Post updated at 01:36 PM ---------- Previous update was at 01:34 PM ----------

This would effectively do the following:
Code:
print summary.fastqc.$date_formatted
print the /Processed reads/ line for that sample
print the /Trimmed reads/ line for that sample
repeat for N sample directories...

# 9  
Old 08-11-2014
Now we're getting somewhere.

Let me change your pseudocode a little bit. Do you really mean:

Code:
for VAR in "$LOCATION"/*_fastqc/summary.txt
do
        cat "$VAR"
        # print /Processed reads/ for that sample (which comes from where, exactly? You have given no input, so we cannot tell.)
        # print /Trimmed reads/ for that sample (which comes from where, exactly? You have given no input, so we cannot tell.)
done > combined.txt
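
The detail doing the work in that loop is the redirection placed after done: the output file is opened once, and every command inside the loop writes into it, so no per-iteration >> is needed. A minimal runnable sketch, using a throwaway directory and made-up sample names (sampleA, sampleB) purely for illustration:

```shell
#!/bin/sh
# Build a throwaway layout that mimics "$LOCATION"/*_fastqc/summary.txt.
# The sample names and file contents are invented for this demo.
LOCATION=$(mktemp -d)
for s in sampleA sampleB
do
        mkdir "$LOCATION/${s}_fastqc"
        printf 'PASS\tBasic Statistics\t%s.fastq\n' "$s" > "$LOCATION/${s}_fastqc/summary.txt"
done

# One redirection after "done" captures the stdout of every iteration.
for VAR in "$LOCATION"/*_fastqc/summary.txt
do
        cat "$VAR"
        echo "--- end of $(basename "$(dirname "$VAR")") ---"
done > "$LOCATION/combined.txt"

cat "$LOCATION/combined.txt"
```

Anything else printed inside the loop (for example the read counts, once their source is known) would land in the same file, in iteration order.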

# 10  
Old 08-11-2014
I am not familiar with VAR, but stout.miRNA.bash.$date_formatted is where the /Processed reads/ and /Trimmed reads/ lines come from. It was buried in my previous post. The first few lines of the file for the first sample are:
Code:
**testing** /illumina/runs/Runs/140513_H207_0249_AD2B9LACXX/Unaligned_Lane1/Project_DefaultProject/Sample_GRC270_DEHP2_67C/GRC270_DEHP2_67C_CGATGT_L001_R1_001.fastq.gz
/illumina/runs/Runs/140513_H207_0249_AD2B9LACXX/Unaligned_Lane1/Project_DefaultProject/Sample_GRC270_DEHP2_67C
GRC270_DEHP2_67C
cutadapt version 1.2.1
Command line parameters: -a AGATCGGAAGAGCACACGTCT -o /illumina/runs/Runs/140513_H207_0249_AD2B9LACXX/Unaligned_Lane1/Project_DefaultProject/Sample_GRC270_DEHP2_67C/GRC270_DEHP2_67C_cutadapt_AdapterRemoved.fastq.gz /illumina/runs/Runs/140513_H207_0249_AD2B9LACXX/Unaligned_Lane1/Project_DefaultProject/Sample_GRC270_DEHP2_67C/GRC270_DEHP2_67C_CGATGT_L001_R1_001.fastq.gz
Maximum error rate: 10.00%
   No. of adapters: 1
   Processed reads:      2139267
   Processed bases:    109102617 bp (109.1 Mbp)
     Trimmed reads:      2075206 (97.0%)
     Trimmed bases:     53680380 bp (53.7 Mbp) (49.20% of total)
   Too short reads:            0 (0.0% of processed reads)
    Too long reads:            0 (0.0% of processed reads)
        Total time:    155.78 s
     Time per read:      0.07 ms

=== Adapter 1 ===

Adapter 'AGATCGGAAGAGCACACGTCT', length 21, was trimmed 2075206 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2

Lengths of removed sequences
length	count	expected	max. errors
3	13563	33426.0	0
4	13593	8356.5	0
5	19992	2089.1	0
6	143974	522.3	0
7	48435	130.6	0
8	22001	32.6	0

So you can see I only need to grab a few lines using identifiers. It just needs to iterate so that we get the context of sample 1, sample 2, etc., concatenated with the summary.txt for each sample.
Thank you for your help :-)
# 11  
Old 08-11-2014
Quote:
Originally Posted by hmortens
The stout.miRNA.bash.$date_formatted is where /Processed reads/ and /Trimmed reads/ come from.
No, it is not. Those lines only end up there because your program writes them there in the first place. So that's no more use to me than it is to you -- less, actually, because you have some idea where they are being extracted from and I don't.

Okay, so you have two separate kinds of files -- the summaries and the samples. They come in matched pairs which you want to process together, extracting all of one and some of the other.

"$LOCATION"/Sample_* matches all sample files (not folders).

"$LOCATION"/*_fastqc matches all folders containing summaries, corresponding to the samples above.

What exactly do these paths / filenames have in common? You can't use * to match one and * to match the other when you want pairs, since they both expand to complete lists. Do you find the sample in Sample_abcde, and the summary in abcde_fastqc ?
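
If the naming does turn out to be Sample_abcde for the sample and abcde_fastqc for the summary folder (an assumption, not something confirmed in this thread), one way to get pairs is to loop over only one of the two lists and derive the other name with parameter expansion. A sketch on a throwaway directory, with placeholder names abcde and fghij:

```shell
#!/bin/sh
# Throwaway layout; the names "abcde" and "fghij" are placeholders.
LOCATION=$(mktemp -d)
mkdir "$LOCATION/Sample_abcde" "$LOCATION/abcde_fastqc" "$LOCATION/Sample_fghij"
echo "summary for abcde" > "$LOCATION/abcde_fastqc/summary.txt"

paired=""
for SAMPLE in "$LOCATION"/Sample_*
do
        NAME=${SAMPLE##*/Sample_}            # .../Sample_abcde -> abcde
        SUMMARY="$LOCATION/${NAME}_fastqc/summary.txt"
        if [ -f "$SUMMARY" ]
        then
                echo "$NAME: $SUMMARY"
                paired="$paired $NAME"
        else
                # A sample without a summary is reported, not silently skipped.
                echo "$NAME: no matching summary" >&2
        fi
done
```

Only one glob is expanded, so each iteration sees exactly one sample and the one summary that belongs to it.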
# 12  
Old 08-11-2014
Ah. The file stout.miRNA.bash.$date_formatted is capturing the stdout from all three programs. I am trying to extract lines from the Cutadapt stdout and the sRNAbench stdout here. The fastQC is writing the summary.txt.

Code:
#run CutAdapt to trim Wafergen adapter 
/usr/local/bin/python2.7 /illumina/runs/Runs/cutadapt-1.2.1/bin/cutadapt -a AGATCGGAAGAGCACACGTCT -o $i/${y#*/Sample_*}_cutadapt_AdapterRemoved.fastq.gz $infile 

#define cutadapt output as infile2  
	infile2=`ls $i/*cutadapt_AdapterRemoved.fastq.gz`
	echo "**testing**" $infile2

#run QC with FastQC
/illumina/runs/Runs/FastQC/fastqc --outdir=$LOCATION --quiet $infile2

#run sRNAbench w/o adapter trimming
java -jar /illumina/runs/Runs/miRanalyzer/sRNAbenchDB/sRNAbench.jar input=$infile2 output=${y#*/Sample_*}.sRNAbench.$date_formatted dbPath=/illumina/runs/Runs/miRanalyzer/sRNAbenchDB/ species=$speciesBuild microRNA=$speciesAbbrev p=14 noMM=0 alignType=v minRC=2

Quote:
"$LOCATION"/Sample_* matches all sample files (not folders).
This is actually a folder (containing the actual data being processed).

Quote:
What exactly do these paths / filenames have in common? You can't use * to match one and * to match the other when you want pairs, since they both expand to complete lists. Do you find the sample in Sample_abcde, and the summary in abcde_fastqc ?
The Location is the Project (a folder with many sample data folders); The Sample_* is the data folder for each sample; the *_fastqc is the output folder for each sample from one program, fastqc, where each folder contains a summary.txt.

Thank you for taking the time. I really appreciate your help.
# 13  
Old 08-11-2014
Quote:
Originally Posted by hmortens
Ah. The file stout.miRNA.bash.$date_formatted is capturing the stdout from all three programs. I am trying to extract lines from the Cutadapt stdout and the sRNAbench stdout here.
Great. Which is which? What do they look like individually? What do you want from each individually?

You have to organize it before it gets dumped into the same giant pile, not after, so we need to alter their output, not edit the great giant file, I think.
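
As a rough illustration of filtering at the source: each program's stdout can be piped through tee (to keep a full per-sample log) and grep (to pass on only the wanted lines) before anything reaches the combined file. In this sketch fake_cutadapt is a stand-in function, not the real tool, and sample1/sample2 are placeholder names:

```shell
#!/bin/sh
# Stand-in for the real cutadapt call; it only prints a few lines
# copied from the log excerpt in this thread.
fake_cutadapt() {
        echo "Maximum error rate: 10.00%"
        echo "   Processed reads:      2139267"
        echo "   Processed bases:    109102617 bp (109.1 Mbp)"
        echo "     Trimmed reads:      2075206 (97.0%)"
}

WORK=$(mktemp -d)
for NAME in sample1 sample2          # placeholder sample names
do
        echo "== $NAME =="
        # tee keeps the complete log per sample; grep lets through
        # only the two lines wanted in the report.
        fake_cutadapt | tee "$WORK/$NAME.cutadapt.log" |
                grep -E 'Processed reads:|Trimmed reads:'
done > "$WORK/report.txt"

cat "$WORK/report.txt"
```

The full output is still available in the per-sample logs if something else is needed from it later.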

Quote:
This is actually a folder (containing the actual data being processed).
Great. What files are in it that I need to worry about?

Is the fastqc inside the Sample, then?
# 14  
Old 08-11-2014
Quote:
You have to organize it before it gets dumped into the same giant pile, not after, so we need to alter their output, not edit the great giant file, I think.
Is it possible to organize it in real time, as it is being written to the file?

Here is the code for CutAdapt, where the /Processed reads/ and /Trimmed reads/ come from:
Code:
#run CutAdapt to trim Wafergen adapter 
#/usr/local/bin/python2.7 /illumina/runs/Runs/cutadapt-1.2.1/bin/cutadapt -a AGATCGGAAGAGCACACGTCT -o $i/${y#*/Sample_*}_cutadapt_AdapterRemoved.fastq.gz $infile

Here is the call to sRNAbench where the other bits come from.
Code:
#run sRNAbench w/o adapter trimming
#java -jar /illumina/runs/Runs/miRanalyzer/sRNAbenchDB/sRNAbench.jar input=$infile2 output=${y#*/Sample_*}.sRNAbench.$date_formatted dbPath=/illumina/runs/Runs/miRanalyzer/sRNAbenchDB/ species=$speciesBuild microRNA=$speciesAbbrev p=14 noMM=0 alignType=v minRC=2

I was not specific about this before because I thought it would be the same pattern search, but I need to pull the lines containing "No. input reads:", "No. reads in analysis:", "mapped...reads to genomes(s)", and "Detected:". The stdout for that portion looks like this:
Code:
           START WITH THE PRE-PROCESSING OF THE READS             

               Reading file: /illumina/runs/Runs/140513_H207_0249_AD2B9LACXX/Unaligned_Lane1/Project_DefaultProject/Sample_GRC270_DEHP2_67C/GRC270_DEHP2_67C_cutadapt_AdapterRemoved.fastq.gz

             Result of pre-processing

               No. input reads: 2139267
               No. cleaned input reads (adapter found and trimmed): 0
               No. input reads where the adapter was not found: 2139267
               No. length filtered input reads (min. Length): 192482
               No. length filtered input reads (max. Length): 0
               No. reads in analysis: 1570818
               No. unique reads in analysis: 78295
               Max. read length in analysis: 51
               Filtered reads (low quality or low read count): 375967
               Max. read length in input file: 51

           FINISHED PRE-PROCESSING

I have already shown you the summary.txt file.
The call to that program is:
Code:
#run QC with FastQC
#/illumina/runs/Runs/FastQC/fastqc --outdir=$LOCATION --quiet $infile2
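
For the sRNAbench lines, a fixed-string grep with several -e patterns may be enough. The sketch below runs on lines copied from the excerpt above; the "mapped...reads" pattern is left out because its exact wording is not shown here, and "Detected:" is included although no such line appears in the sample input:

```shell
#!/bin/sh
# Sample input: lines copied from the sRNAbench excerpt above.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
               No. input reads: 2139267
               No. cleaned input reads (adapter found and trimmed): 0
               No. input reads where the adapter was not found: 2139267
               No. reads in analysis: 1570818
               No. unique reads in analysis: 78295
EOF

# -F treats each pattern as a fixed string, so "." is not a wildcard.
# The trailing ":" keeps the longer "No. input reads where ..." line out.
# sed then strips the leading indentation.
picked=$(grep -F -e 'No. input reads:' -e 'No. reads in analysis:' -e 'Detected:' "$LOG" |
        sed 's/^[[:space:]]*//')
echo "$picked"
```

The same grep could sit at the end of the sRNAbench pipeline inside the loop instead of reading a saved log.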
 