Help in awk/bash


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Help in awk/bash
# 1  
Old 12-28-2012
Linux Help in awk/bash

Hi, I am also a newbie in awk and trying to find solution of my problem.

I have one reference file 1.txt with 2 columns and I want to search other 10 files (a.txt, b.txt......h.txt each with 5 columns) corresponding to the values of 2nd column from 1.txt. If the value from 2nd column from 1.txt matches with the value of 4th column of 10 files, then print the row as well as file name.
Also, in 1.txt for eg. 1st value is -191.632 but originally in a.txt it is -191.6318, so I also want to print values same upto two decimal places and rest places can be any number.

1.txt:

Code:
1.35732	-191.632
1.36229	-190.8716
1.35503	-191.3254
1.35597	-191.2652

a.txt:

Code:
271640.000	 0.49000	 -0.0000036574 -191.6318 -183.82380	
271650.000	 0.49155	 0.0000033909	 -198.30111	 -198.73140	
271660.000	 0.48775	 0.0000014657	 -191.3254 -199.84910	
271670.000	 0.48212	 -0.0000004152 -195.48446	 -193.15580

Please guide.
Thanks

Last edited by joeyg; 12-28-2012 at 09:44 PM.. Reason: Please wrap scripts and data in CodeTags
# 2  
Old 12-28-2012
You can 'join' file 1.txt to each of the [a-h].txt in a 'for' loop, and process the 'for' output piped to shell 'while read'. The file name will be in the 'for' variable and the file columns will be all present in the 'read' variables. You have to 'sort' every file on the key column using a 'binary' sort (export LC_ALL=C, not a numeric sort). Hopefully the original line order is not critical, else number the lines in a new field. While you can join using a pile of awk or shell commands, this is cleaner.

Man Page for join (opensolaris Section 1) - The UNIX and Linux Forums

Man Page for sort (all Section 1) - The UNIX and Linux Forums
This User Gave Thanks to DGPickett For This Post:
# 3  
Old 12-28-2012
Thanks for the reply.
Can you please help in writing the code as I am not expert in awk. Smilie

Thanks again
# 4  
Old 12-28-2012
Quote:
Originally Posted by bioinfo
Hi, I am also a newbie in awk and trying to find solution of my problem.

I have one reference file 1.txt with 2 columns and I want to search other 10 files (a.txt, b.txt......h.txt each with 5 columns) corresponding to the values of 2nd column from 1.txt. If the value from 2nd column from 1.txt matches with the value of 4th column of 10 files, then print the row as well as file name.
Also, in 1.txt for eg. 1st value is -191.632 but originally in a.txt it is -191.6318, so I also want to print values same upto two decimal places and rest places can be any number.

1.txt:

1.35732 -191.632
1.36229 -190.8716
1.35503 -191.3254
1.35597 -191.2652

a.txt:

271640.000 0.49000 -0.0000036574 -191.6318 -183.82380
271650.000 0.49155 0.0000033909 -198.30111 -198.73140
271660.000 0.48775 0.0000014657 -191.3254 -199.84910
271670.000 0.48212 -0.0000004152 -195.48446 -193.15580

Please guide.
Thanks
I'm not sure if you want the values in 1.txt column 2 and a-h.txt column 4 truncated to two decimal places or rounded to two decimal places (with your sample input, the results are the same) and I'm not sure why DGPickett thinks join and sort would be easier than awk, but here are ways to use awk to do what I think you're requesting...
Code:
echo "awk with rounded values"
awk ' FNR == NR {v[sprintf("%.2f", $2)]}
sprintf("%.2f", $4) in v {print $0, FILENAME}' 1.txt [a-h].txt

echo "awk with truncated values"
awk '
function trunc(val) {
        split(val, a, /[.]/)
        return a[1] "." substr(a[2] "00", 1, 2)
}
FNR == NR {v[trunc($2)]}
trunc($4) in v {print $0, FILENAME}' 1.txt [a-h].txt

This User Gave Thanks to Don Cragun For This Post:
# 5  
Old 12-28-2012
Thanks for the reply.
Can you please explain it somewhat.

Thanks again.
# 6  
Old 12-28-2012
Quote:
Originally Posted by bioinfo
Thanks for the reply.
Can you please explain it somewhat.

Thanks again.
Code:
1  echo "awk with rounded values"
2  awk ' FNR == NR {v[sprintf("%.2f", $2)]; next}
3  sprintf("%.2f", $4) in v {print $0, FILENAME}' 1.txt [a-h].txt
4
5  echo "awk with truncated values"
6  awk '
7  function trunc(val) {
8          split(val, a, /[.]/)
9          return a[1] "." substr(a[2] "00", 1, 2)
10 }
11 FNR == NR {v[trunc($2)]; next}
12 trunc($4) in v {print $0, FILENAME}' 1.txt [a-h].txt

I have added line numbers to aid in this discussion, but note that the line numbers cannot appear in the script when you run it.

Also note that I have added an awk next command to lines 2 and 11. With the given sample data it won't make any difference, but with other data or with different fields being checked, it could be important.

In the suggestion on lines 1-3, the sprint("%.2f", arg) converts the string specified by arg to a floating point value and produces a string that represents that floating point value rounded to two digits after the decimal point. Line two uses that to create an array with indices that are the rounded floating point values of the second field ($2) in the first input file (lines where the record number within the file [FNR] is equal to the line number of all records read by awk [NR]).

(The next command I added here causes awk to skip to the next record instead of checking whether or not any remaining commands in the script should be executed. Without the next, the next line will process lines from all input files. It doesn't affect processing here because there is no field 4 in file one. The empty field 4 will be converted to 0.00 and none of the strings in the second field in the 1.txt will be converted to 0.00.)

Line 3 tests whether the same conversion used in line 2 produces a string that is an index in the array v (index in array evaluates to TRUE if index if is an index in the array named array. So, if $4 (rounded to two decimal places) in any of the files after the 1st file match $2 (rounded to two decimal places) in the first file, the print command will be run printing the current input line ($0) and the name of the file containing the line (FILENAME).

The 1.txt [a-h].txt on lines 3 and 12 specifies the eleven input files to be processed by these awk scripts.

The suggestion on lines 5-12 uses the same logic as the 1st suggestion but truncates the strings to two decimal places instead of rounding to two decimal places. Since the truncation logic is more complex than the single function call to sprint() used to perform the rounding, I wrote a function (lines 7-10) to convert the string to a string representing a floating point value with two decimal places.

The split() on line 8 creates an array of one or two elements with the first element containing all of the characters before the "." and the second element containing all of the characters after the ".". If there is no "." in the input value, the first element of the array will contain the entire input string and the second element of the array will not be set (and when referenced will act as an empty string). The return command on line 9 returns a string that is the concatenation of the first element in the array, a decimal point, and the 1st two characters of the concatenation of the second element of the array followed by "00". (The concatenation with "00" takes care of cases where field 2 in the first file or field 4 in the remaining files have an integer value with no decimal point and the case where the input field has a period but there are less than two digits after the decimal point.)

The logic on lines 11 and 12 is the same as the logic on lines 2 and 3.
This User Gave Thanks to Don Cragun For This Post:
# 7  
Old 12-30-2012
Hi,
Thanks a lot , I have done it. Smilie
I have got the following output for all files (just showing for one file and naming it as o.txt) :
Code:
100.000        0.51332	   0.0000001923	 -191.04738     a.txt
2000.000	   0.49573	   0.0000015512	 -191.40071     a.txt
1000.000	   0.51047	   0.0000028339	 -190.92254     a.txt

Further, I need your help. I have 10 more files, all of same format (11.txt) as follows, showing 2 repeats from this file:
Code:
ATOM      1  N    SER A   1      35.092  83.194 140.076  1.00  0.00           N  
ATOM      2  CA  SER A   1      35.216  83.725 138.725  1.00  0.00           C  
ATOM      3  C    SER A   1      36.530  84.485 138.538  1.00  0.00           C  
TER
ENDMDL
ATOM      1  N   SER A   1      35.683  81.326 139.778  1.00  0.00           N  
ATOM      2  CA  SER A   1      35.422  82.736 139.929  1.00  0.00           C  
ATOM      3  C   SER A   1      36.497  83.588 139.247  1.00  0.00           C  
TER
ENDMDL

ENDMDL is coming around 10000 times in each file. If I give input of 100 at $1 from o.txt, then it should output the first repeat from 11. txt ending with ENDMDL.
Code:
ATOM      1  N    SER A   1      35.092  83.194 140.076  1.00  0.00           N  
ATOM      2  CA  SER A   1      35.216  83.725 138.725  1.00  0.00           C  
ATOM      3  C    SER A   1      36.530  84.485 138.538  1.00  0.00           C  
TER
ENDMDL

So, corresponding to first column of o.txt, I want to retreive the repeat at the number $1/100 from 11.txt i.e. if $1=2000, then I want to retreive the pattern where ENDMDL is at 20 place.


Please guide me.

Thanks again

---------- Post updated at 10:40 PM ---------- Previous update was at 09:52 PM ----------

Please guide me. Its urgent. Smilie

Thanks

Last edited by Scrutinizer; 12-31-2012 at 03:53 AM.. Reason: code tags
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

New problem with awk using bash

Hi! I have a new problem with awk, this time I think is because I'm using it in bash and I don't know how to put the valor of the variable in awk. Here is the code: #!/bin/bash for i in 1 2 3 4 5 do a=$i b=$ awk '$1>=a&&$1<=b {print $1,$2,$3}'>asdf test... (3 Replies)
Discussion started by: florpi
3 Replies

2. Shell Programming and Scripting

Returning a value from awk to bash

Hi I am a newbie starting bash and I have a simple need to return the result of an operation from awk to bash. basically I want to use awk to tell me if "#" exists in a string, and then back in bash, i want to do an IF statement on this return in order to do other things. In my bash shell I... (2 Replies)
Discussion started by: oahmad
2 Replies

3. Shell Programming and Scripting

Help in awk/bash

Hi, I have two files: atom.txt and g.txt atom.txt has multiple patterns but I am showing only two patterns each ending with ENDMDL: ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C TER ENDMDL ATOM 1 N SER A 1 35.683 81.326 139.778 1.00... (11 Replies)
Discussion started by: bioinfo
11 Replies

4. UNIX for Dummies Questions & Answers

Help in awk/bash

Hi, I am also a newbie in awk and trying to find solution of my problem. I have one reference file 1.txt with 2 columns and I want to search other 10 files (a.txt, b.txt......h.txt each with 5 columns) corresponding to the values of 2nd column from 1.txt. If the value from 2nd column from 1.txt... (0 Replies)
Discussion started by: bioinfo
0 Replies

5. Shell Programming and Scripting

AWK/Bash script

I would like to write a script to extend this command to a general case: BEGIN {s_0=0;n_0=0}{n_0++;s_0+=($51-$1)^2}END {print sqrt(s_0/n_0)} i.e. so that BEGIN {s_0=0;n_0=0}{n_0++;s_0+=($51-$1)^2}END {print sqrt(s_0/n_0)} BEGIN {s_1=0;n_1=0}{n_1++;s_1+=($51-$2)^2}END {print... (3 Replies)
Discussion started by: chrisjorg
3 Replies

6. UNIX for Dummies Questions & Answers

Help with BASH/AWK queries ....

Hi Everyone, I have an input file in the following format: score.file1.txt contig00045 length=566 numreads=19 1047 0.0 contig00055 length=524 numreads=7 793 0.0 contig00052 length=535 numreads=10 607 e-176 contig00072 length=472 numreads=46 571 e-165... (8 Replies)
Discussion started by: Fahmida
8 Replies

7. Shell Programming and Scripting

scripting help with bash and awk

I'm trying to reformat some tide information into a useable format and failing. Input file is.... 4452 CHENNAI (MADRAS) 13°06'N, 80°18'E India East Coast 01 June 2009 UT(GMT) Data Area 3. Indian Ocean (northern part) and Red Sea to Singapore 01/06/2009 00:00 0.7 m 00:20 0.7 m 00:40... (3 Replies)
Discussion started by: garethsays
3 Replies

8. Shell Programming and Scripting

awk bash help

Hi, I'm trying to read a file containing lines with spaces in them. The inputfile looks like this ------------------------------ Command1 arg1 arg2 Command2 arg5 arg6 arg7 ------------------------------- The shell code looks like this... lines=`awk '{ print }' inputfile` ... (2 Replies)
Discussion started by: a-gopal
2 Replies

9. Shell Programming and Scripting

Is there any better way for sorting in bash/awk

Hi, I have a file which is:- 1 6 4 8 2 3 2 1 9 3 2 1 3 3 5 6 3 1 4 9 7 8 2 3 I would like to sort from field $2 to field $6 for each of the line to:- 1 2 3 4 6 8 2 1 1 2 3 9 3 1 3 3 5 6 4 2 3 7 8 9 I came across this Arrays on example 26-6. But it is much complicated. I am... (7 Replies)
Discussion started by: ahjiefreak
7 Replies

10. Shell Programming and Scripting

BASH with AWK

Hello, I have a file.txt with 20000 lines and 2 columns each which consists of current_filename and new_filename . I want to create a script to find files in a directory with current_filename and move it to new folder with new_filename. Could you please help me how to do that?? ... (2 Replies)
Discussion started by: narasimhulu
2 Replies
Login or Register to Ask a Question