awk command with a loop


 
# 1  
Old 08-11-2017

Dear all,

I would be grateful for your help with the following.

I have the following file (file.txt), which is about 10,000 lines long:
Code:
ID1  ID2  0  1  0.5  0.6
ID3  ID4  0  0  0.4  0.8
ID1  ID5  0  1  0.5  0.3
ID6  ID2  1  0  0.4  0.8

Each ID can occur between 1 and 10 times in the file, in either column 1 or column 2.

What I want to achieve:

I want to scan this file line by line, and print IDs to an ever-growing exclusion list if they meet the following criteria:
Code:
If $3 > $4, print $2 (ID2) > exclusionlist.txt
If $3 < $4, print $1 (ID1) > exclusionlist.txt
If $3==$4 && $5 < $6, print $2 (ID2) > exclusionlist.txt
If $3==$4 && $5 > $6, print $1 (ID1) > exclusionlist.txt

So applying this to row 1, either ID1 or ID2 should have been added to my exclusion list.
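The four criteria can be written as a single awk program. This is only a minimal sketch of the decision rule: it reports the ID each row would nominate if that row were considered on its own, and does not yet delete anything. The scratch directory and the output name picks.txt are assumptions made for the demo.

```shell
# Work in a scratch directory so nothing real is overwritten (demo assumption).
cd "$(mktemp -d)" || exit 1

# The four sample rows from the post.
cat > file.txt <<'EOF'
ID1  ID2  0  1  0.5  0.6
ID3  ID4  0  0  0.4  0.8
ID1  ID5  0  1  0.5  0.3
ID6  ID2  1  0  0.4  0.8
EOF

# One rule per criterion; "next" stops at the first rule that matches.
awk '
    $3 > $4 { print $2; next }
    $3 < $4 { print $1; next }
    $5 < $6 { print $2; next }   # only reached when $3 == $4
    $5 > $6 { print $1 }         # only reached when $3 == $4
' file.txt > picks.txt

cat picks.txt
```

For the sample rows this nominates ID1, ID4, ID1 and ID2, in that order. A row where $3 equals $4 and $5 equals $6 nominates nothing, because the stated criteria do not cover that case.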

I then want to delete all lines in the file where that ID from the exclusion list appears. This can be up to 10 rows.

Output for file.txt once row 1 has been scanned:

ID3 ID4 0 0 0.4 0.8
ID6 ID2 1 0 0.4 0.8

And exclusionlist.txt:
ID1

I then want to start again at the new row 1, and execute the same process, but keep adding my exclusion from the new row 1 to the same exclusion list.

The commands that I have at my disposal are:

Code:
awk 'NR==1{print;}' file.txt
awk '{if ($3>$4 || $3==$4 && $5<$6) print $2;}' file.txt > exclusionlist.txt
awk '{if ($3<$4 || $3==$4 && $5>$6) print $1;}' file.txt > exclusionlist.txt
grep -v -f exclusionlist.txt file.txt

But there are problems inherent in this:

The exclusionlist.txt does not 'keep growing'.
Also, how do I loop it back so that it starts again at line 1?

I would be grateful for any solutions.

Thank you,

A.B.

Last edited by aberg; 08-11-2017 at 01:15 PM.. Reason: extra code tags
# 2  
Old 08-11-2017
In the second code block, you never append to exclusionlist.txt.
I'm quite sure you end up with only one awk result there, the last one.
# 3  
Old 08-11-2017
Yes, I want to append to the same list (rather than over-writing), and I'm not sure how to do that.
Also, I want this to loop so that it starts again at the new line 1 once the original line 1 (plus any other lines containing the exclusion) has been removed.
# 4  
Old 08-11-2017
For example:
Code:
If $3 > $4, print $2 (ID2) > exclusionlist.txt    # > creates the file or, if it already exists, overwrites it
If $3 < $4, print $1 (ID1) >> exclusionlist.txt   # >> appends to the existing file
If $3==$4 && $5 < $6, print $2 (ID2) >> exclusionlist.txt
If $3==$4 && $5 > $6, print $1 (ID1) >> exclusionlist.txt
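A small sketch of the difference between the two shell redirections, using made-up file names (list.txt, out.txt) in a scratch directory. The last part shows a related point that is easy to miss: inside one awk program, `print > "file"` truncates the file only the first time it is opened, so later prints from the same run are kept.

```shell
# Work in a scratch directory so nothing real is overwritten (demo assumption).
cd "$(mktemp -d)" || exit 1

echo "ID1" >  list.txt    # creates list.txt (or empties it first if it exists)
echo "ID4" >> list.txt    # appends a second line
echo "ID2" >> list.txt    # appends a third line
cat list.txt              # three lines

echo "ID9" >  list.txt    # overwrites: the three earlier lines are gone
cat list.txt              # one line

# Inside a single awk run, > truncates once and then keeps adding lines.
printf 'a\nb\n' | awk '{ print > "out.txt" }'
cat out.txt               # both lines are present
```

So the overwriting seen in the thread comes from running several separate commands that each redirect with `>`, not from awk itself.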

# 5  
Old 08-11-2017
Thank you vbe.

So could I incorporate that into a bash script? Say I renamed my file.txt to 1.txt:

Code:
#!/bin/bash
for i in {1..10000}
do
awk 'NR==1{print;}' $i.txt
awk '{if ($3>$4 || $3==$4 && $5<$6) print $2;}' file.txt > exclusionlist.txt
awk '{if ($3<$4 || $3==$4 && $5>$6) print $1;}' file.txt >> exclusionlist.txt
grep -v -f exclusionlist.txt $i.txt > $((i+1)).txt
rm $i.txt
done

Would that help me to execute this function recursively?
# 6  
Old 08-11-2017
Let me paraphrase your request: you select one of the IDs in field 1 or field 2 depending on conditions in the rest of the line, and then remove all occurrences of the selected ID from the rest of the file. Do you HAVE to populate the exclusion file, i.e. do you need it afterwards? Or would a single-pass operation be sufficient, removing ALL the applicable IDs?
# 7  
Old 08-11-2017
Thank you RudiC.

Yes, your interpretation is correct, and it is vitally important that I populate an exclusion file. The original file itself should eventually grind itself down to 0 lines, and it is the exclusion file that I am interested in.
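One way to get the exclusion file in a single pass, sketched here under a few assumptions: an awk array (called `excluded` here, a made-up name) remembers every ID chosen so far, and any row containing a remembered ID is skipped, which has the same effect as having deleted it earlier. A row that survives to this point is exactly a row that would have become the new row 1. Rows where both comparisons tie are assumed to exclude nobody, since the stated criteria do not cover them.

```shell
# Work in a scratch directory so nothing real is overwritten (demo assumption).
cd "$(mktemp -d)" || exit 1

# The four sample rows from the first post.
cat > file.txt <<'EOF'
ID1  ID2  0  1  0.5  0.6
ID3  ID4  0  0  0.4  0.8
ID1  ID5  0  1  0.5  0.3
ID6  ID2  1  0  0.4  0.8
EOF

awk '
    # A row containing an already excluded ID would have been deleted: skip it.
    ($1 in excluded) || ($2 in excluded) { next }

    {
        id = ""
        if      ($3 > $4) id = $2
        else if ($3 < $4) id = $1
        else if ($5 < $6) id = $2     # only reached when $3 == $4
        else if ($5 > $6) id = $1     # only reached when $3 == $4

        # Assumption: a complete tie excludes nobody and the row is dropped.
        if (id != "") {
            excluded[id] = 1
            print id
        }
    }
' file.txt > exclusionlist.txt

cat exclusionlist.txt
```

On the sample data this writes ID1, ID4 and ID2 to exclusionlist.txt. The input file is read once and never modified, so no renaming of intermediate files is needed.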

My latest attempt is as follows (it involves having to rename file.txt to 1.txt):

Code:
#!/bin/bash
for i in {1..5000}
do
awk 'NR==1{print;}' $i.txt
awk '{if ($3>$4 || $3==$4 && $5<$6) print $2;}' $i.txt > exclusionlist_$i.txt
awk '{if ($3<$4 || $3==$4 && $5>$6) print $1;}' $i.txt >> exclusionlist_$i.txt
grep -v -f exclusionlist_$i.txt $i.txt > $((i+1)).txt
rm $i.txt
done

Due to my poor scripting skills, I (1) have to rename my file after each loop so that the script can keep running, and (2) end up with a new exclusion list per loop rather than a single 'master' exclusion list. I can easily concatenate them all at the end, so this is not a major problem, but it's messy.

The problem I have when I execute this script is that it seems to scan through the whole file on the first pass (rather than just line 1), creating a long exclusion list just from the first run.
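The whole file is scanned because only the first awk command carries the NR==1 condition; the two commands that write the exclusion list test every row. A hedged sketch of a loop that looks at row 1 only, appends to one master list, and reuses a single working copy instead of numbered files. The names work.txt and work.tmp are made up for the demo, and rows where both comparisons tie are assumed to be dropped without excluding anyone.

```shell
# Work in a scratch directory so nothing real is overwritten (demo assumption).
cd "$(mktemp -d)" || exit 1

# The four sample rows from the first post.
cat > file.txt <<'EOF'
ID1  ID2  0  1  0.5  0.6
ID3  ID4  0  0  0.4  0.8
ID1  ID5  0  1  0.5  0.3
ID6  ID2  1  0  0.4  0.8
EOF

cp file.txt work.txt         # shrink a copy, keep the original intact
: > exclusionlist.txt        # start with an empty master list

while [ -s work.txt ]; do    # repeat until no rows are left
    # Pick the ID to exclude from row 1 only, then stop reading.
    id=$(awk 'NR==1 {
        if      ($3 > $4) print $2
        else if ($3 < $4) print $1
        else if ($5 < $6) print $2
        else if ($5 > $6) print $1
        exit
    }' work.txt)

    if [ -n "$id" ]; then
        echo "$id" >> exclusionlist.txt
        # Keep only rows where neither ID column equals the excluded ID.
        awk -v id="$id" '$1 != id && $2 != id' work.txt > work.tmp
    else
        # Assumption: on a complete tie, drop row 1 and exclude nobody.
        tail -n +2 work.txt > work.tmp
    fi
    mv work.tmp work.txt
done

cat exclusionlist.txt
```

Comparing whole fields in awk, rather than using grep -v -f, avoids ID1 also matching an ID such as ID10. The loop rereads the working file on every iteration, which should still be acceptable for a file of about 10,000 lines.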

Any help/suggestions would be greatly appreciated.

Thank you.

AB