Bash repeating lines for some files but not all

    #1  
Old Unix and Linux 1 Week Ago   -   Original Discussion by cmccabe
cmccabe

The bash below executes and seems to work fine on those files in which an additional CNV is detected. However, on those files where no additional CNV is detected, the "No Additional CNV Detected" line repeats multiple times instead of printing only once. I tried adding an END block, since all lines are printed, but that doesn't help. I cannot seem to solve this without encountering new issues. Thank you.



Code:
for f in /home/cmccabe/Desktop/oca/*.tsv ; do # loop through all files in directory and start processing
     echo "Start check for cnv creation: $(date) - file: $f" # log start
     bname=`basename $f` # strip off path
     pref=${bname%%_*.tsv} # strip off extension
     awk ' # call awk script
             # capture CNV gain and loss in 26 CDS genes
             NR==FNR { a[$1]; next }
               $2=="CNV" {
                 c=split($12, b, "[,:]")
               if (b[2]>=4.0 || (b[2]<=1.0 && b[c]<=1.9 && ($14 in a))) {
                  if (!wasfound) {
      print "Additional CNV Detected:"
      wasfound=1
    }
    print
  }
   END {
    if (!wasfound) { print "No Additional CNV Detected" }
    }
  }' /home/cmccabe/Desktop/oca/gene FS='\t' $f >> /home/cmccabe/Desktop/oca/${pref}_oca.txt
     echo "End check for CNV creation: $(date) - file: $f" # log end
done

file with CNV detected (correct)


Code:
5 Expression controls detected
13 NOCALL detected
2178 REF detected
3 ASSAYS_5P_3P absent controls detected
1 ASSAYS_5P_3P NoCall controls detected
No Oncomine Drivers Detected
No Additional Clinvar Detected
No Additional Function Detected
No Additional Fusion Detected
No Additional Hotspots Detected
Additional CNV Detected:
chr1:11184539	CNV		32772		1.0E-10					1p36.22(11184539-11217311)x2.03333	5%:4.52,95%:2.93		MTOR																						
chr16:68771250	CNV		96180		1.0E-10					16q22.1(68771250-68867430)x1.02222	5%:0.9,95%:1.16		CDH1																esv25425:esv29196:nsv817735:esv2714658:nsv833267:nsv103068:nsv457515:esv2661913

file where "No Additional CNV Detected" repeats (incorrect)


Code:
5 Expression controls detected
17 NOCALL detected
2174 REF detected
3 ASSAYS_5P_3P absent controls detected
1 ASSAYS_5P_3P NoCall controls detected
No Oncomine Drivers Detected
No Additional Clinvar Detected
No Additional Function Detected
No Additional Fusion Detected
No Additional Hotspots Detected
No Additional CNV Detected
No Additional CNV Detected
No Additional CNV Detected
No Additional CNV Detected


    #2  
Don Cragun
  1. The output you have shown us might have come from the code you have shown us as an output based on input files you have not shown us or it might be totally unrelated to the code you have shown us. And, we have no way to determine whether it is a product of this code or not.
  2. You definitely have not shown us the output produced by the echo statements in the code you have shown us.
  3. We have no idea what your input files look like.
  4. We have no idea what the names of the input files you are processing look like.
  5. We have no idea what operating system you're using.
  6. We don't know what output you're hoping to get.
Under these conditions, how do you expect us to help you?
    #3  
cmccabe
I apologize, and hopefully the below will help:

Each input file is a tsv of 40 columns; the below is an example of multiple lines (I only show 4 columns, as all 40 are the same or close to it and the script does produce the desired output). The problem I am having is that if there is an Additional CNV detected, as in output 1, then the script works, printing "Additional CNV detected" followed by the line or lines.
If there is No Additional CNV detected, as in output 2, that line prints multiple times (presumably 2800, because that is the total number of lines).
I am using Ubuntu 14.04 as my OS.

Post 1 is actual output produced with the complete files, but to keep the post easier to read I only used several lines.

The output is close as is; I just can't seem to solve why "No Additional CNV detected" repeats. Thank you very much.

file


Code:
chr1:11184539	CNV	5%:5.5,95%:2.68	Name
chr1:11184539	REF
chr1:11184539	SNV		A
chr1:11184555	CNV	5%:0.9,95%:1.9	BRCA1
chr1:11184539	FUSION
chr1:11184539	INDEL	G
chr1:11184555	CNV	5%:2.5,95%:2.68	Name2
chr1:11184555	CNV	5%:1.1,95%:1.8	BRCA2


desired output 1 ---- if detected


Code:
Additional CNV detected:
chr1:11184539	CNV	5%:5.5,95%:2.68	Name
chr1:11184555	CNV	5%:0.9,95%:1.9	BRCA1

desired output 2 --- if not detected


Code:
No Additional CNV detected


    #4  
Don Cragun
Yes, the code you have shown us is easy to understand. And, with or without sample input files, we can easily say that most of the output you have shown did not come from the code you have shown us.

We have no reason not to believe that the code you have not shown us is what is producing the extra output that you don't want.

I asked what files were being processed. You didn't answer. For all we know, there are hundreds of files being processed by your loop with many of them adding a line to the output you say you don't want.

I asked for sample input files and you showed us a sample with at most 4 input fields that is being fed into code you showed us that is evaluating data found in fields 12 and 14.

You have made it very clear that you want us to explain why code you won't show us won't work with data you won't show us using filenames you won't show us. I wish you luck, but I can't help you under these conditions.
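
The two open questions above (which files the loop actually touches, and whether fields 12 and 14 exist in the input) can be checked without posting any real data. The following is only a sketch: `show_mapping` and `field_counts` are hypothetical helper names, and the directory argument stands in for the one used in post #1. It reuses the `${bname%%_*.tsv}` expansion from the posted script to show which output file each input would be appended to.

```shell
#!/bin/bash
# Hypothetical helpers for diagnosing the loop in post #1; not part of the original script.

# Print "input -> output" for every .tsv the loop would process,
# using the same prefix rule as the posted script.
# The body runs in a subshell so its variables do not leak.
show_mapping() (
    dir=$1
    for f in "$dir"/*.tsv; do
        [ -e "$f" ] || continue            # the glob matched nothing
        bname=$(basename "$f")
        pref=${bname%%_*.tsv}              # same expansion as in post #1
        printf '%s -> %s_oca.txt\n' "$bname" "$pref"
    done
)

# Count how many lines have each number of tab-separated fields.
# Output format: "<field count> <number of lines>", in no particular order.
field_counts() {
    awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1"
}
```

Running `show_mapping /home/cmccabe/Desktop/oca` would show whether several inputs map to the same `_oca.txt` file, and `field_counts` on one input would show whether every line really has 40 fields. Because the posted script appends with `>>`, re-running it over the same files would also add its output again each time, which may be worth ruling out.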
    #5  
cmccabe
I am sorry, I did not fully understand what you were asking until now.

I am only testing on two tsv files that get converted and then processed by the loop, and the output is 2 text files.

I will try again tomorrow. I apologize that I can only post sample input, as the file is not fully usable. I work in healthcare and am somewhat limited. That being said, I do not mean to frustrate or be difficult. My posts are not always as clear as they should be, but I try to include the important pieces. Thank you.
    #6  
Corona688
Quote:
Originally Posted by cmccabe View Post
I apologize I can only post samples input as the file is not fully usable.
Don't apologize, just work with us. Create a stripped-down sample file and stripped-down code file that still show the same problem. Until then, good luck. Without that we can't help you.

(And if it doesn't show the same problem? That's a giant clue to whatever the problem was.)
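
A stripped-down test case along those lines might look like the sketch below. Everything in it is an assumption built from the 4-column sample in post #3: the temporary file names, the two-gene list, and the use of fields 3 and 4 in place of fields 12 and 14. The one structural difference from the code in post #1 is that `END` is written as its own top-level rule rather than nested inside the `$2=="CNV"` action; awk only accepts `END` as a pattern at the top level, and an `END` rule runs once after all input has been read.

```shell
#!/bin/bash
# Stripped-down test case. File names, gene list, and the 4-column layout
# (thresholds in field 3, gene in field 4) are assumptions based on post #3.
tmp=$(mktemp -d)
printf 'BRCA1\nBRCA2\n' > "$tmp/gene"

# One sample with qualifying CNV lines, one without.
printf 'chr1:11184539\tCNV\t5%%:5.5,95%%:2.68\tName\nchr1:11184539\tREF\nchr1:11184555\tCNV\t5%%:0.9,95%%:1.9\tBRCA1\nchr1:11184555\tCNV\t5%%:2.5,95%%:2.68\tName2\n' > "$tmp/hit.tsv"
printf 'chr1:11184539\tREF\nchr1:11184555\tCNV\t5%%:2.5,95%%:2.68\tName2\nchr1:11184555\tCNV\t5%%:1.1,95%%:1.8\tBRCA2\n' > "$tmp/miss.tsv"

# check_cnv GENEFILE TSVFILE
check_cnv() {
    awk '
        NR==FNR { a[$1]; next }              # first file: load the gene list
        $2=="CNV" {
            c = split($3, b, "[,:]")         # 5%:x,95%:y gives b[2]=x and b[c]=y
            if (b[2]>=4.0 || (b[2]<=1.0 && b[c]<=1.9 && ($4 in a))) {
                if (!wasfound) { print "Additional CNV Detected:"; wasfound=1 }
                print
            }
        }
        # top-level rule: runs once, after the last input line
        END { if (!wasfound) print "No Additional CNV Detected" }
    ' "$1" FS='\t' "$2"
}

check_cnv "$tmp/gene" "$tmp/hit.tsv"
check_cnv "$tmp/gene" "$tmp/miss.tsv"
```

With this sample data the first call prints the header followed by the two qualifying lines, and the second call prints the "No Additional CNV Detected" line once. If the real files still show repeats with this structure, the cause is more likely outside the awk program, for example in how often the loop runs or in what else appends to the output file.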