Bash repeating lines for some files but not all


 
# 1  
Old 01-09-2018

The bash below executes and seems to work fine on those files in which an additional CNV is detected. However, on those files where no additional CNV is detected, that line repeats multiple times instead of only once. I tried adding an END, as all lines are printed, but that doesn't help. I cannot seem to solve this without encountering new issues. Thank you.

Code:
for f in /home/cmccabe/Desktop/oca/*.tsv ; do # loop through all files in directory and start processing
     echo "Start check for cnv creation: $(date) - file: $f" # log start
     bname=`basename $f` # strip off path
     pref=${bname%%_*.tsv} # strip everything from the first underscore through .tsv
     awk ' # call awk script
             # capture CNV gain and loss in 26 CDS genes
             NR==FNR { a[$1]; next }
               $2=="CNV" {
                 c=split($12, b, "[,:]")
               if (b[2]>=4.0 || (b[2]<=1.0 && b[c]<=1.9 && ($14 in a))) {
                  if (!wasfound) {
      print "Additional CNV Detected:"
      wasfound=1
    }
    print
  }
   END {
    if (!wasfound) { print "No Additional CNV Detected" }
    }
  }' /home/cmccabe/Desktop/oca/gene FS='\t' $f >> /home/cmccabe/Desktop/oca/${pref}_oca.txt
     echo "End check for CNV creation: $(date) - file: $f" # log end
done
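
For comparison, here is a minimal sketch of the same check with `END` written as its own top-level rule rather than nested inside the `$2=="CNV"` action (awk expects `END` at the top level, where it runs once after the last input line). The gene list, the 14-column rows, the helper names `row` and `check_cnv`, and the array names are all invented for illustration; this is not a diagnosis of the real files.

```shell
#!/bin/bash
# Sketch only: the gene list and the 14-column rows below are invented test data.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# row TYPE CONFIDENCE GENE -> one tab-separated line with TYPE in $2,
# CONFIDENCE in $12 and GENE in $14 (all other fields left empty)
row() {
    printf 'chr1:1\t%s\t\t\t\t\t\t\t\t\t\t%s\t\t%s\n' "$1" "$2" "$3"
}

check_cnv() {   # check_cnv GENE_FILE TSV_FILE
    awk '
        NR == FNR { genes[$1]; next }        # first file: remember the gene names
        $2 == "CNV" {
            n = split($12, ci, "[,:]")       # "5%:0.9,95%:1.9" -> ci[2]=0.9, ci[n]=1.9
            if (ci[2] >= 4.0 || (ci[2] <= 1.0 && ci[n] <= 1.9 && ($14 in genes))) {
                if (!found) { print "Additional CNV Detected:"; found = 1 }
                print
            }
        }
        # END is a separate top-level rule, so it runs once after the last line
        END { if (!found) print "No Additional CNV Detected" }
    ' "$1" FS='\t' "$2"
}

printf 'BRCA1\nCDH1\n' > "$tmp/gene"    # hypothetical gene list

{ row CNV '5%:5.5,95%:2.68' Name; row REF '' ''; row CNV '5%:0.9,95%:1.9' BRCA1; } > "$tmp/hit.tsv"
{ row REF '' ''; row CNV '5%:2.5,95%:2.68' Name2; row CNV '5%:1.1,95%:1.8' BRCA2; } > "$tmp/none.tsv"

check_cnv "$tmp/gene" "$tmp/hit.tsv"     # header plus the two qualifying rows
check_cnv "$tmp/gene" "$tmp/none.tsv"    # the fallback message, printed once
```

With this layout the fallback message is tied to the end of input, not to any individual line, so a single awk run can print it at most once.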

file with CNV detected (correct)
Code:
5 Expression controls detected
13 NOCALL detected
2178 REF detected
3 ASSAYS_5P_3P absent controls detected
1 ASSAYS_5P_3P NoCall controls detected
No Oncomine Drivers Detected
No Additional Clinvar Detected
No Additional Function Detected
No Additional Fusion Detected
No Additional Hotspots Detected
Additional CNV Detected:
chr1:11184539	CNV		32772		1.0E-10					1p36.22(11184539-11217311)x2.03333	5%:4.52,95%:2.93		MTOR																						
chr16:68771250	CNV		96180		1.0E-10					16q22.1(68771250-68867430)x1.02222	5%:0.9,95%:1.16		CDH1																esv25425:esv29196:nsv817735:esv2714658:nsv833267:nsv103068:nsv457515:esv2661913

No additional CNV detected repeats
Code:
5 Expression controls detected
17 NOCALL detected
2174 REF detected
3 ASSAYS_5P_3P absent controls detected
1 ASSAYS_5P_3P NoCall controls detected
No Oncomine Drivers Detected
No Additional Clinvar Detected
No Additional Function Detected
No Additional Fusion Detected
No Additional Hotspots Detected
No Additional CNV Detected
No Additional CNV Detected
No Additional CNV Detected
No Additional CNV Detected


Last edited by cmccabe; 01-09-2018 at 03:20 PM.. Reason: fixed format
# 2  
Old 01-09-2018
  1. The output you have shown us might have come from the code you have shown us as an output based on input files you have not shown us or it might be totally unrelated to the code you have shown us. And, we have no way to determine whether it is a product of this code or not.
  2. You definitely have not shown us the output produced by the echo statements in the code you have shown us.
  3. We have no idea what your input files look like.
  4. We have no idea what the names of the input files you are processing look like.
  5. We have no idea what operating system you're using.
  6. We don't know what output you're hoping to get.
Under these conditions, how do you expect us to help you?
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 01-09-2018
I apologize and hopefully the below will help:

Each input file is a TSV of 40 columns; below is an example of multiple lines (I only show 4 columns, as all 40 are the same or close to it and the script does produce the desired output). The problem I am having is that if there is an Additional CNV detected, as in output 1, the script works, printing Additional CNV detected followed by the line or lines.
If there is No Additional CNV detected, as in output 2, that line prints multiple times (presumably 2800 times, because that is the total number of lines).
I am using Ubuntu 14.04 as my OS.

Post 1 is actual output produced with the complete files, but to keep the post easier to read I only used several lines.

The output is close as is; I just can't seem to work out why No Additional CNV detected repeats. Thank you very much.

file
Code:
chr1:11184539	CNV	5%:5.5,95%:2.68	Name
chr1:11184539	REF
chr1:11184539	SNV		A
chr1:11184555	CNV	5%:0.9,95%:1.9	BRCA1
chr1:11184539	FUSION
chr1:11184539	INDEL	G
chr1:11184555	CNV	5%:2.5,95%:2.68	Name2
chr1:11184555	CNV	5%:1.1,95%:1.8	BRCA2


desired output 1 ---- if detected
Code:
Additional CNV detected:
chr1:11184539	CNV	5%:5.5,95%:2.68	Name
chr1:11184555	CNV	5%:0.9,95%:1.9	BRCA1

desired output 2 --- if not detected
Code:
No Additional CNV detected


Last edited by cmccabe; 01-09-2018 at 07:57 PM.. Reason: fixed format
# 4  
Old 01-09-2018
Yes, the code you have shown us is easy to understand. And, with or without sample input files, we can easily say that most of the output you have shown did not come from the code you have shown us.

We have no reason not to believe that the code you have not shown us is what is producing the extra output that you don't want.

I asked what files were being processed. You didn't answer. For all we know, there are hundreds of files being processed by your loop with many of them adding a line to the output you say you don't want.

I asked for sample input files and you showed us a sample with at most 4 input fields that is being fed into code you showed us that is evaluating data found in fields 12 and 14.
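
The mismatch described here (a 4-field sample against code that reads fields 12 and 14) is easy to confirm mechanically. A small sketch, assuming tab-separated input; the helper name `short_rows` and the two sample rows are made up for illustration:

```shell
#!/bin/bash
# Sketch: list every row that has fewer tab-separated fields than the script needs.
short_rows() {  # short_rows FILE MIN_FIELDS
    awk -F'\t' -v min="$2" 'NF < min { printf "line %d: %d fields\n", NR, NF }' "$1"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

# Two rows in the shape of the posted 4-column sample
printf 'chr1:11184539\tCNV\t5%%:5.5,95%%:2.68\tName\n' > "$sample"
printf 'chr1:11184539\tREF\n' >> "$sample"

short_rows "$sample" 14
```

On this sample it reports `line 1: 4 fields` and `line 2: 2 fields`, so `$12` and `$14` would both be empty for every row.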

You have made it very clear that you want us to explain why code you won't show us won't work with data you won't show us using filenames you won't show us. I wish you luck, but I can't help you under these conditions.
This User Gave Thanks to Don Cragun For This Post:
# 5  
Old 01-09-2018
I am sorry I did not understand what you were asking fully until now.

I am only testing on two TSV files that get converted and then processed by the loop, and the output is 2 text files.

I will try again tomorrow. I apologize; I can only post sample input, as the file is not fully usable. I work in healthcare and am somewhat limited. That being said, I do not mean to frustrate or be difficult. My posts are not always as clear as they should be, but I try to include the important pieces. Thank you.
# 6  
Old 01-10-2018
Quote:
Originally Posted by cmccabe
I apologize; I can only post sample input, as the file is not fully usable.
Don't apologize, just work with us. Create a stripped-down sample file and stripped-down code file that still show the same problem. Until then, good luck. Without that we can't help you.

(And if it doesn't show the same problem? That's a giant clue to whatever the problem was.)
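
A stripped-down case along those lines might look like the sketch below. It is built on assumptions: the rows are modelled on the 4-column sample posted earlier (keeping only rows that should not qualify), the gene list is invented, and fields `$12`/`$14` are remapped to `$3`/`$4` to fit the shorter rows. It runs the check twice into the same file with `>>`, as the posted loop does, so the effect of appending across runs can be seen in isolation.

```shell
#!/bin/bash
# Stripped-down reproduction; every file name and the gene list are made up.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf 'BRCA1\nBRCA2\n' > "$work/gene"          # hypothetical gene list

# 4-column rows modelled on the posted sample; none of them should qualify
{
    printf 'chr1:11184539\tREF\n'
    printf 'chr1:11184555\tCNV\t5%%:2.5,95%%:2.68\tName2\n'
    printf 'chr1:11184555\tCNV\t5%%:1.1,95%%:1.8\tBRCA2\n'
} > "$work/sample.tsv"

run_check() {   # posted logic with $12/$14 remapped to $3/$4 and END at top level
    awk '
        NR == FNR { a[$1]; next }
        $2 == "CNV" {
            c = split($3, b, "[,:]")
            if (b[2] >= 4.0 || (b[2] <= 1.0 && b[c] <= 1.9 && ($4 in a))) {
                if (!wasfound) { print "Additional CNV Detected:"; wasfound = 1 }
                print
            }
        }
        END { if (!wasfound) print "No Additional CNV Detected" }
    ' "$work/gene" FS='\t' "$work/sample.tsv"
}

run_check >> "$work/out.txt"    # first run
run_check >> "$work/out.txt"    # second run appends to the same file
grep -c 'No Additional CNV Detected' "$work/out.txt"
```

The final count is 2: each run prints the message once, and `>>` keeps the earlier copy. Whether that is what happens with the real files cannot be told from the thread, but a test of this size makes it quick to rule in or out.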