Help in awk/bash

# 1  
Old 12-28-2012
Linux Help in awk/bash

Hi, I am new to awk and am trying to solve the following problem.

I have one reference file, 1.txt, with 2 columns, and 10 other files (a.txt, b.txt......h.txt), each with 5 columns. I want to search those files for the values in the 2nd column of 1.txt. If a value from the 2nd column of 1.txt matches the value in the 4th column of any of the 10 files, then print the matching row as well as the file name.
Also, the values need not match exactly: for example, the 1st value in 1.txt is -191.632, but in a.txt it is originally -191.6318. So I also want to print rows whose values agree up to two decimal places; the remaining digits can be anything.

1.txt:

Code:
1.35732	-191.632
1.36229	-190.8716
1.35503	-191.3254
1.35597	-191.2652

a.txt:

Code:
271640.000	 0.49000	 -0.0000036574 -191.6318 -183.82380	
271650.000	 0.49155	 0.0000033909	 -198.30111	 -198.73140	
271660.000	 0.48775	 0.0000014657	 -191.3254 -199.84910	
271670.000	 0.48212	 -0.0000004152 -195.48446	 -193.15580

Please guide.
Thanks

# 2  
Old 12-28-2012
You can 'join' 1.txt to each of the [a-h].txt files in a 'for' loop and pipe the loop's output to a shell 'while read'. The file name is available in the 'for' variable, and the file's columns are all present in the 'read' variables. You have to 'sort' every file on the key column using a 'binary' sort (export LC_ALL=C), not a numeric sort. Hopefully the original line order is not critical; if it is, number the lines in a new field first. While you can do the join with a pile of awk or shell commands, this is cleaner.

Man Page for join (opensolaris Section 1) - The UNIX and Linux Forums

Man Page for sort (all Section 1) - The UNIX and Linux Forums
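
A rough sketch of the join approach described above, under some assumptions that are not in the post: join only matches keys exactly, so each file first gets a key column holding the compared value rounded to two decimal places (done here with a small awk printf), and the sample files are written to a scratch directory with single spaces between columns. The name keys.sorted is made up for the illustration. Note that join normalizes whitespace and that the output comes out in key order, not in the original line order.

```shell
export LC_ALL=C   # byte-wise collation, so sort and join agree on the ordering

dir=$(mktemp -d)
printf '%s\n' '1.35732 -191.632' '1.36229 -190.8716' \
              '1.35503 -191.3254' '1.35597 -191.2652' > "$dir/1.txt"
printf '%s\n' \
  '271640.000 0.49000 -0.0000036574 -191.6318 -183.82380' \
  '271650.000 0.49155 0.0000033909 -198.30111 -198.73140' \
  '271660.000 0.48775 0.0000014657 -191.3254 -199.84910' \
  '271670.000 0.48212 -0.0000004152 -195.48446 -193.15580' > "$dir/a.txt"

joined=$(
  cd "$dir" || exit 1
  # Key file: column 2 of 1.txt rounded to two decimals, sorted, duplicates removed.
  awk '{ printf "%.2f\n", $2 }' 1.txt | sort -u > keys.sorted
  for f in [a-h].txt; do
    # Prepend the rounded column 4 as the join key, sort on it, then join.
    awk '{ printf "%.2f %s\n", $4, $0 }' "$f" | sort -k1,1 |
      join keys.sorted - |
      while read -r key rest; do
        # Drop the helper key and append the file name from the for variable.
        printf '%s %s\n' "$rest" "$f"
      done
  done
)
printf '%s\n' "$joined"
rm -rf "$dir"
```

With the sample data this should print the 271660.000 and 271640.000 rows of a.txt, each followed by the file name.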
# 3  
Old 12-28-2012
Thanks for the reply.
Could you please help me write the code? I am not an expert in awk.

Thanks again
# 4  
Old 12-28-2012
I'm not sure whether you want the values in column 2 of 1.txt and column 4 of [a-h].txt truncated to two decimal places or rounded to two decimal places (with your sample input, the results are the same), and I'm not sure why DGPickett thinks join and sort would be easier than awk, but here are ways to use awk to do what I think you're requesting...
Code:
echo "awk with rounded values"
awk ' FNR == NR {v[sprintf("%.2f", $2)]}
sprintf("%.2f", $4) in v {print $0, FILENAME}' 1.txt [a-h].txt

echo "awk with truncated values"
awk '
function trunc(val) {
        split(val, a, /[.]/)
        return a[1] "." substr(a[2] "00", 1, 2)
}
FNR == NR {v[trunc($2)]}
trunc($4) in v {print $0, FILENAME}' 1.txt [a-h].txt

# 5  
Old 12-28-2012
Thanks for the reply.
Could you please explain it a little?

Thanks again.
# 6  
Old 12-28-2012
Code:
1  echo "awk with rounded values"
2  awk ' FNR == NR {v[sprintf("%.2f", $2)]; next}
3  sprintf("%.2f", $4) in v {print $0, FILENAME}' 1.txt [a-h].txt
4
5  echo "awk with truncated values"
6  awk '
7  function trunc(val) {
8          split(val, a, /[.]/)
9          return a[1] "." substr(a[2] "00", 1, 2)
10 }
11 FNR == NR {v[trunc($2)]; next}
12 trunc($4) in v {print $0, FILENAME}' 1.txt [a-h].txt

I have added line numbers to aid in this discussion, but note that the line numbers cannot appear in the script when you run it.

Also note that I have added an awk next command to lines 2 and 11. With the given sample data it won't make any difference, but with other data or with different fields being checked, it could be important.

In the suggestion on lines 1-3, sprintf("%.2f", arg) converts the string specified by arg to a floating point value and produces a string that represents that value rounded to two digits after the decimal point. Line 2 uses that to create an array, v, whose indices are the rounded values of the second field ($2) of every line in the first input file. The condition FNR == NR selects the first file: the record number within the current file (FNR) equals the total number of records read by awk so far (NR) only while the first file is being read.

(The next command I added here causes awk to skip to the next record instead of checking whether any remaining commands in the script should be executed. Without the next, line 3 would be applied to lines from all input files, including 1.txt. It doesn't affect processing here because there is no field 4 in 1.txt: the empty field 4 is converted to 0.00, and none of the values in the second field of 1.txt convert to 0.00.)
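
To illustrate when the next does matter, here is a small sketch in which the same field ($2) is checked in both files. The files ref.txt and data.txt and their contents are made up for this example and are not part of the thread.

```shell
dir=$(mktemp -d)
printf '%s\n' 'x 1.00' 'y 2.00' > "$dir/ref.txt"
printf '%s\n' 'p 2.00' 'q 3.00' > "$dir/data.txt"

# Without next: lines of ref.txt are also tested by the second rule
# and match the index they just created themselves.
without_next=$(cd "$dir" &&
  awk 'FNR == NR { v[$2] } $2 in v { print FILENAME ": " $0 }' ref.txt data.txt)

# With next: the second rule only ever sees lines from data.txt.
with_next=$(cd "$dir" &&
  awk 'FNR == NR { v[$2]; next } $2 in v { print FILENAME ": " $0 }' ref.txt data.txt)

printf 'without next:\n%s\n' "$without_next"
printf 'with next:\n%s\n' "$with_next"
rm -rf "$dir"
```

Without the next, both lines of ref.txt are reported in addition to the one real match in data.txt; with it, only the data.txt line is printed.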

Line 3 tests whether the same conversion used in line 2 produces a string that is an index in the array v (the expression index in array evaluates to TRUE if index is an index in the array named array). So, if $4 (rounded to two decimal places) in any file after the first matches $2 (rounded to two decimal places) in the first file, the print command runs, printing the current input line ($0) and the name of the file containing the line (FILENAME).
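
As a quick check, the script from lines 1-3 (without the line numbers) can be run against the sample data from the first post. This is only a sketch: the files are recreated in a scratch directory, and the columns of a.txt are separated by single spaces here, which is an assumption made to keep the output easy to compare.

```shell
dir=$(mktemp -d)
printf '%s\n' '1.35732 -191.632' '1.36229 -190.8716' \
              '1.35503 -191.3254' '1.35597 -191.2652' > "$dir/1.txt"
printf '%s\n' \
  '271640.000 0.49000 -0.0000036574 -191.6318 -183.82380' \
  '271650.000 0.49155 0.0000033909 -198.30111 -198.73140' \
  '271660.000 0.48775 0.0000014657 -191.3254 -199.84910' \
  '271670.000 0.48212 -0.0000004152 -195.48446 -193.15580' > "$dir/a.txt"

matches=$(cd "$dir" && awk '
FNR == NR { v[sprintf("%.2f", $2)]; next }       # 1.txt: remember rounded $2 values
sprintf("%.2f", $4) in v { print $0, FILENAME }  # other files: print rows whose rounded $4 is known
' 1.txt [a-h].txt)
printf '%s\n' "$matches"
rm -rf "$dir"
```

With this data the 271640.000 and 271660.000 rows of a.txt are printed, each followed by the file name.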

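The 1.txt [a-h].txt on lines 3 and 12 specifies the input files to be processed by these awk scripts: 1.txt first, followed by every file whose name matches [a-h].txt. Note that [a-h].txt matches at most eight files (a.txt through h.txt); if there are really ten data files, widen the pattern to cover their names.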

The suggestion on lines 5-12 uses the same logic as the 1st suggestion but truncates the strings to two decimal places instead of rounding to two decimal places. Since the truncation logic is more complex than the single function call to sprintf() used to perform the rounding, I wrote a function (lines 7-10) to convert the string to a string representing a floating point value with two decimal places.

The split() on line 8 creates an array of one or two elements, with the first element containing all of the characters before the "." and the second element containing all of the characters after the ".". If there is no "." in the input value, the first element of the array will contain the entire input string and the second element of the array will not be set (and when referenced will act as an empty string). The return command on line 9 returns a string that is the concatenation of the first element in the array, a decimal point, and the first two characters of the concatenation of the second element of the array followed by "00". (The concatenation with "00" takes care of cases where field 2 in the first file or field 4 in the remaining files has an integer value with no decimal point, and cases where the input field has a decimal point but fewer than two digits after it.)
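
The cases described above can be tried out with a short sketch that prints each input value, its truncated form from trunc(), and its rounded form from sprintf(). The four test values are chosen for illustration; only the first comes from the thread's sample data.

```shell
trunc_demo=$(printf '%s\n' -191.6318 5 3.1 1.239 | awk '
function trunc(val) {
        split(val, a, /[.]/)                      # a[1] = before ".", a[2] = after "."
        return a[1] "." substr(a[2] "00", 1, 2)   # pad with "00", keep two digits
}
{ print $1, trunc($1), sprintf("%.2f", $1) }')
printf '%s\n' "$trunc_demo"
```

The integer 5 becomes 5.00 and 3.1 becomes 3.10 under both methods. The value 1.239 is the only one of the four where the methods differ: truncation gives 1.23, rounding gives 1.24.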

The logic on lines 11 and 12 is the same as the logic on lines 2 and 3.
# 7  
Old 12-30-2012
Hi,
Thanks a lot, I have done it.
I got the following output for all files (showing the output for just one file, which I have named o.txt):
Code:
100.000        0.51332	   0.0000001923	 -191.04738     a.txt
2000.000	   0.49573	   0.0000015512	 -191.40071     a.txt
1000.000	   0.51047	   0.0000028339	 -190.92254     a.txt

I need your help with a further step. I have 10 more files, all in the same format as 11.txt, which is shown below with 2 repeats from the file:
Code:
ATOM      1  N    SER A   1      35.092  83.194 140.076  1.00  0.00           N  
ATOM      2  CA  SER A   1      35.216  83.725 138.725  1.00  0.00           C  
ATOM      3  C    SER A   1      36.530  84.485 138.538  1.00  0.00           C  
TER
ENDMDL
ATOM      1  N   SER A   1      35.683  81.326 139.778  1.00  0.00           N  
ATOM      2  CA  SER A   1      35.422  82.736 139.929  1.00  0.00           C  
ATOM      3  C   SER A   1      36.497  83.588 139.247  1.00  0.00           C  
TER
ENDMDL

ENDMDL occurs around 10000 times in each file. If I give an input of 100 (the value of $1 in o.txt), then the output should be the first repeat from 11.txt, ending with ENDMDL:
Code:
ATOM      1  N    SER A   1      35.092  83.194 140.076  1.00  0.00           N  
ATOM      2  CA  SER A   1      35.216  83.725 138.725  1.00  0.00           C  
ATOM      3  C    SER A   1      36.530  84.485 138.538  1.00  0.00           C  
TER
ENDMDL

So, for each value in the first column of o.txt, I want to retrieve repeat number $1/100 from 11.txt, i.e. if $1=2000, then I want to retrieve the repeat that ends with the 20th ENDMDL.


Please guide me.

Thanks again

---------- Post updated at 10:40 PM ---------- Previous update was at 09:52 PM ----------

Please guide me. It's urgent.

Thanks
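
One possible way to approach the request above, offered as an untested-on-real-data sketch in the same awk style used earlier in the thread. It assumes that o.txt holds the result rows for one data file, that 11.txt is the structure file belonging to it, and that every repeat ends with a line starting with ENDMDL. The sample files below are made up and shortened: the o.txt row uses 200 in column 1 so that repeat 200/100 = 2 is selected, and each repeat keeps only its first ATOM line.

```shell
dir=$(mktemp -d)
printf '%s\n' '200.000 0.51332 0.0000001923 -191.04738 a.txt' > "$dir/o.txt"
printf '%s\n' \
  'ATOM      1  N   SER A   1      35.092  83.194 140.076  1.00  0.00           N' \
  'TER' 'ENDMDL' \
  'ATOM      1  N   SER A   1      35.683  81.326 139.778  1.00  0.00           N' \
  'TER' 'ENDMDL' > "$dir/11.txt"

models=$(cd "$dir" && awk '
FNR == NR { want[int($1 / 100 + 0.5)]; next }  # o.txt: remember the wanted repeat numbers
FNR == 1  { model = 1 }                        # first line of 11.txt: start counting repeats
model in want { print }                        # inside a wanted repeat: print the line
/^ENDMDL/ { model++ }                          # ENDMDL closes the current repeat
' o.txt 11.txt)
printf '%s\n' "$models"
rm -rf "$dir"
```

Because all wanted repeat numbers are collected from o.txt first, 11.txt is read only once, however many rows o.txt contains. As with the earlier scripts, the FNR == NR test assumes the first file is not empty.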
