Help in awk/bash

12-28-2012

Registered User

50, 0

Join Date: Dec 2012

Last Activity: 12 August 2013, 3:07 AM EDT

Posts: 50

Thanks Given: 52

Thanked 0 Times in 0 Posts

Help in awk/bash

Code:

1.35732	-191.632
1.36229	-190.8716
1.35503	-191.3254
1.35597	-191.2652

a.txt:

Code:

271640.000	 0.49000	 -0.0000036574 -191.6318 -183.82380	
271650.000	 0.49155	 0.0000033909	 -198.30111	 -198.73140	
271660.000	 0.48775	 0.0000014657	 -191.3254 -199.84910	
271670.000	 0.48212	 -0.0000004152 -195.48446	 -193.15580

Please guide.
Thanks

Last edited by joeyg; 12-28-2012 at 09:44 PM.. Reason: Please wrap scripts and data in CodeTags

bioinfo

View Public Profile for bioinfo

Find all posts by bioinfo

12-28-2012

Registered User

4,673, 588

Join Date: Oct 2010

Last Activity: 1 February 2016, 3:35 PM EST

Location: Southern NJ, USA (Nord)

Posts: 4,673

Thanks Given: 8

Thanked 588 Times in 561 Posts

You can 'join' file 1.txt to each of the [a-h].txt in a 'for' loop, and process the 'for' output piped to shell 'while read'. The file name will be in the 'for' variable and the file columns will be all present in the 'read' variables. You have to 'sort' every file on the key column using a 'binary' sort (export LC_ALL=C, not a numeric sort). Hopefully the original line order is not critical, else number the lines in a new field. While you can join using a pile of awk or shell commands, this is cleaner.

Man Page for join (opensolaris Section 1) - The UNIX and Linux Forums

Man Page for sort (all Section 1) - The UNIX and Linux Forums

This User Gave Thanks to DGPickett For This Post:

DGPickett

View Public Profile for DGPickett

Find all posts by DGPickett

12-28-2012

Registered User

50, 0

Join Date: Dec 2012

Last Activity: 12 August 2013, 3:07 AM EDT

Posts: 50

Thanks Given: 52

Thanked 0 Times in 0 Posts

Thanks for the reply.
Can you please help in writing the code as I am not expert in awk.

Thanks again

bioinfo

View Public Profile for bioinfo

Find all posts by bioinfo

12-28-2012

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Quote:

Originally Posted by bioinfo

Hi, I am also a newbie in awk and trying to find solution of my problem.

I have one reference file 1.txt with 2 columns and I want to search other 10 files (a.txt, b.txt......h.txt each with 5 columns) corresponding to the values of 2nd column from 1.txt. If the value from 2nd column from 1.txt matches with the value of 4th column of 10 files, then print the row as well as file name.
Also, in 1.txt for eg. 1st value is -191.632 but originally in a.txt it is -191.6318, so I also want to print values same upto two decimal places and rest places can be any number.

1.txt:

1.35732 -191.632
1.36229 -190.8716
1.35503 -191.3254
1.35597 -191.2652

a.txt:

271640.000 0.49000 -0.0000036574 -191.6318 -183.82380
271650.000 0.49155 0.0000033909 -198.30111 -198.73140
271660.000 0.48775 0.0000014657 -191.3254 -199.84910
271670.000 0.48212 -0.0000004152 -195.48446 -193.15580

Please guide.
Thanks

I'm not sure if you want the values in 1.txt column 2 and a-h.txt column 4 truncated to two decimal places or rounded to two decimal places (with your sample input, the results are the same) and I'm not sure why DGPickett thinks join and sort would be easier than awk, but here are ways to use awk to do what I think you're requesting...

Code:

echo "awk with rounded values"
awk ' FNR == NR {v[sprintf("%.2f", $2)]}
sprintf("%.2f", $4) in v {print $0, FILENAME}' 1.txt [a-h].txt

echo "awk with truncated values"
awk '
function trunc(val) {
        split(val, a, /[.]/)
        return a[1] "." substr(a[2] "00", 1, 2)
}
FNR == NR {v[trunc($2)]}
trunc($4) in v {print $0, FILENAME}' 1.txt [a-h].txt

This User Gave Thanks to Don Cragun For This Post:

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

12-28-2012

Registered User

50, 0

Join Date: Dec 2012

Last Activity: 12 August 2013, 3:07 AM EDT

Posts: 50

Thanks Given: 52

Thanked 0 Times in 0 Posts

Thanks for the reply.
Can you please explain it somewhat.

Thanks again.

bioinfo

View Public Profile for bioinfo

Find all posts by bioinfo

12-28-2012

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Quote:

Originally Posted by bioinfo

Thanks for the reply.
Can you please explain it somewhat.

Thanks again.

Code:

1  echo "awk with rounded values"
2  awk ' FNR == NR {v[sprintf("%.2f", $2)]; next}
3  sprintf("%.2f", $4) in v {print $0, FILENAME}' 1.txt [a-h].txt
4
5  echo "awk with truncated values"
6  awk '
7  function trunc(val) {
8          split(val, a, /[.]/)
9          return a[1] "." substr(a[2] "00", 1, 2)
10 }
11 FNR == NR {v[trunc($2)]; next}
12 trunc($4) in v {print $0, FILENAME}' 1.txt [a-h].txt

I have added line numbers to aid in this discussion, but note that the line numbers cannot appear in the script when you run it.

Also note that I have added an awk next command to lines 2 and 11. With the given sample data it won't make any difference, but with other data or with different fields being checked, it could be important.

In the suggestion on lines 1-3, the sprint("%.2f", arg) converts the string specified by arg to a floating point value and produces a string that represents that floating point value rounded to two digits after the decimal point. Line two uses that to create an array with indices that are the rounded floating point values of the second field ($2) in the first input file (lines where the record number within the file [FNR] is equal to the line number of all records read by awk [NR]).

(The next command I added here causes awk to skip to the next record instead of checking whether or not any remaining commands in the script should be executed. Without the next, the next line will process lines from all input files. It doesn't affect processing here because there is no field 4 in file one. The empty field 4 will be converted to 0.00 and none of the strings in the second field in the 1.txt will be converted to 0.00.)

Line 3 tests whether the same conversion used in line 2 produces a string that is an index in the array v (index in array evaluates to TRUE if index if is an index in the array named array. So, if $4 (rounded to two decimal places) in any of the files after the 1st file match $2 (rounded to two decimal places) in the first file, the print command will be run printing the current input line ($0) and the name of the file containing the line (FILENAME).

The 1.txt [a-h].txt on lines 3 and 12 specifies the eleven input files to be processed by these awk scripts.

The suggestion on lines 5-12 uses the same logic as the 1st suggestion but truncates the strings to two decimal places instead of rounding to two decimal places. Since the truncation logic is more complex than the single function call to sprint() used to perform the rounding, I wrote a function (lines 7-10) to convert the string to a string representing a floating point value with two decimal places.

The split() on line 8 creates an array of one or two elements with the first element containing all of the characters before the "." and the second element containing all of the characters after the ".". If there is no "." in the input value, the first element of the array will contain the entire input string and the second element of the array will not be set (and when referenced will act as an empty string). The return command on line 9 returns a string that is the concatenation of the first element in the array, a decimal point, and the 1st two characters of the concatenation of the second element of the array followed by "00". (The concatenation with "00" takes care of cases where field 2 in the first file or field 4 in the remaining files have an integer value with no decimal point and the case where the input field has a period but there are less than two digits after the decimal point.)

The logic on lines 11 and 12 is the same as the logic on lines 2 and 3.

This User Gave Thanks to Don Cragun For This Post:

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

12-30-2012

Registered User

50, 0

Join Date: Dec 2012

Last Activity: 12 August 2013, 3:07 AM EDT

Posts: 50

Thanks Given: 52

Thanked 0 Times in 0 Posts

Hi,
Thanks a lot , I have done it.

I have got the following output for all files (just showing for one file and naming it as o.txt) :

Code:

100.000        0.51332	   0.0000001923	 -191.04738     a.txt
2000.000	   0.49573	   0.0000015512	 -191.40071     a.txt
1000.000	   0.51047	   0.0000028339	 -190.92254     a.txt

Further, I need your help. I have 10 more files, all of same format (11.txt) as follows, showing 2 repeats from this file:

Code:

ATOM      1  N    SER A   1      35.092  83.194 140.076  1.00  0.00           N  
ATOM      2  CA  SER A   1      35.216  83.725 138.725  1.00  0.00           C  
ATOM      3  C    SER A   1      36.530  84.485 138.538  1.00  0.00           C  
TER
ENDMDL
ATOM      1  N   SER A   1      35.683  81.326 139.778  1.00  0.00           N  
ATOM      2  CA  SER A   1      35.422  82.736 139.929  1.00  0.00           C  
ATOM      3  C   SER A   1      36.497  83.588 139.247  1.00  0.00           C  
TER
ENDMDL

ENDMDL is coming around 10000 times in each file. If I give input of 100 at $1 from o.txt, then it should output the first repeat from 11. txt ending with ENDMDL.

Code:

ATOM      1  N    SER A   1      35.092  83.194 140.076  1.00  0.00           N  
ATOM      2  CA  SER A   1      35.216  83.725 138.725  1.00  0.00           C  
ATOM      3  C    SER A   1      36.530  84.485 138.538  1.00  0.00           C  
TER
ENDMDL

So, corresponding to first column of o.txt, I want to retreive the repeat at the number $1/100 from 11.txt i.e. if $1=2000, then I want to retreive the pattern where ENDMDL is at 20 place.

Please guide me.

Thanks again

---------- Post updated at 10:40 PM ---------- Previous update was at 09:52 PM ----------

Please guide me. Its urgent.

Thanks

Last edited by Scrutinizer; 12-31-2012 at 03:53 AM.. Reason: code tags

bioinfo

View Public Profile for bioinfo

Find all posts by bioinfo

Shell Programming and Scripting

Help in awk/bash

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

New problem with awk using bash

Discussion started by: florpi

2. Shell Programming and Scripting

Returning a value from awk to bash

Discussion started by: oahmad

3. Shell Programming and Scripting

Help in awk/bash

Discussion started by: bioinfo

4. UNIX for Dummies Questions & Answers

Help in awk/bash

Discussion started by: bioinfo

5. Shell Programming and Scripting

AWK/Bash script

Discussion started by: chrisjorg

6. UNIX for Dummies Questions & Answers

Help with BASH/AWK queries ....

Discussion started by: Fahmida

7. Shell Programming and Scripting

scripting help with bash and awk

Discussion started by: garethsays

8. Shell Programming and Scripting

awk bash help

Discussion started by: a-gopal

9. Shell Programming and Scripting

Is there any better way for sorting in bash/awk

Discussion started by: ahjiefreak

10. Shell Programming and Scripting

BASH with AWK

Discussion started by: narasimhulu