Help in awk/bash


 
# 8  
Old 12-31-2012
Quote:
Originally Posted by bioinfo
Hi,
Thanks a lot, I have done it.
I got the following output for all files (showing it for just one file, which I have named o.txt):
100.000 0.51332 0.0000001923 -191.04738 a.txt
2000.000 0.49573 0.0000015512 -191.40071 a.txt
1000.000 0.51047 0.0000028339 -190.92254 a.txt

Further, I need your help. I have 10 more files, all in the same format (11.txt), as follows, showing 2 repeats from this file:
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C
ATOM 3 C SER A 1 36.530 84.485 138.538 1.00 0.00 C
TER
ENDMDL
ATOM 1 N SER A 1 35.683 81.326 139.778 1.00 0.00 N
ATOM 2 CA SER A 1 35.422 82.736 139.929 1.00 0.00 C
ATOM 3 C SER A 1 36.497 83.588 139.247 1.00 0.00 C
TER
ENDMDL

ENDMDL occurs around 10000 times in each file. If I give an input of 100 at $1 from o.txt, then it should output the first repeat from 11.txt, ending with ENDMDL.
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C
ATOM 3 C SER A 1 36.530 84.485 138.538 1.00 0.00 C
TER
ENDMDL

So, corresponding to the first column of o.txt, I want to retrieve the repeat at number $1/100 from 11.txt, i.e. if $1=2000, then I want to retrieve the repeat where ENDMDL is in the 20th place.


Please guide me.

Thanks again

---------- Post updated at 10:40 PM ---------- Previous update was at 09:52 PM ----------

Please guide me. It's urgent.

Thanks
First, let me be very clear: I am a volunteer in this forum. Nothing that you ask me to do is urgent. If you need me to treat something that you'd like me to do for you as urgent, you need to put me on your payroll!

I'm not sure I understand what you want. Am I correct in making the following assumptions:
  1. The input for this assignment is a file named o.txt.
  2. The first field of each line in o.txt is of the form x00.000 with 1 <= x <= 10000.
  3. For each line read from o.txt, the xth entry from file 11.txt is to be written to standard output where each entry in 11.txt is terminated by a line containing only ENDMDL.
  4. In addition to 11.txt, there are 9 more files like it in the same format as 11.txt that are to be ignored.
  5. You have already verified that the value in the first field of o.txt will correspond to an existing entry in 11.txt (i.e., I don't need to worry about negative values in the 1st field of o.txt, values in that field that don't end with "00.000", nor values before the "00.000" that identify a number greater than the number of times "ENDMDL" appears in 11.txt).
Are these assumptions correct?

If the above assumptions are all correct, the following script should do what you want:
Code:
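#!/bin/ksh
awk 'BEGIN {rc = 1}     # rc: index of the entry currently being accumulated
# First file (11.txt): append each line to r[rc]; ENDMDL closes the entry.
FNR == NR {r[rc] = r[rc] $0 "\n"
        if($0 == "ENDMDL") rc++
        next}
# Second file (o.txt): FS strips "00.000", so $1 is the entry number.
{       printf("%s", r[$1])}' 11.txt FS='00[.]000' o.txt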

As always, if you're using a Solaris system, use /usr/xpg4/bin/awk or nawk instead of awk.

On some awk implementations, setting the array r could be simplified by setting RS to "ENDMDL" before processing 11.txt, but the standards only define the behavior when RS is set to a single character or to the empty string. The awk on OS X (which I use for testing when I'm working on solutions for issues raised in this forum) is one of the implementations that only uses the first character of RS values as the record separator.
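As a quick check of how the FS assignment in the script above turns the first field into an entry number, here is a minimal sketch. The helper name entry_number is an assumption made for illustration and is not part of the script; it only prints what awk sees as $1 when FS is the regular expression 00[.]000.

```shell
# entry_number is a hypothetical helper, not part of the original script.
# It prints the first field of one o.txt line when FS is 00[.]000.
entry_number() {
    printf '%s\n' "$1" | awk -F'00[.]000' '{ print $1 }'
}

entry_number "100.000 0.51332 0.0000001923 -191.04738 a.txt"    # prints 1
entry_number "2000.000 0.49573 0.0000015512 -191.40071 a.txt"   # prints 20
entry_number "1000.000 0.51047 0.0000028339 -190.92254 a.txt"   # prints 10
```

A line whose first field does not contain 00.000 is never split, so $1 is the whole line, r[$1] is empty, and the script prints nothing for it.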

Last edited by Don Cragun; 12-31-2012 at 05:09 AM.. Reason: Add possible solution.
This User Gave Thanks to Don Cragun For This Post:
# 9  
Old 12-31-2012
Hi,
Thanks again for guidance.
Sorry, I did not mean to hurt anyone.

Most of your assumptions are correct, and I wish to make some of them clearer:
2. The values of x are not in sequence, but they are certainly positive, e.g. 2000, 7000, 3000, 1982480 (for bigger files), etc.
3. For each value of the first field (x) from o.txt, I wish to divide it by 100 and retrieve the corresponding entry from 11.txt, ending with ENDMDL. That means, if the value of x is 1000.000, then I wish to divide it by 100 and then retrieve the 10th entry from 11.txt.

Please explain the concept of rc.

Thanks again.
# 10  
Old 12-31-2012
rc means record counter.
This User Gave Thanks to DGPickett For This Post:
# 11  
Old 12-31-2012
The script I provided in message #8 in this thread assumes that the first field in o.txt has values such as 7000.000, 3000.000, and 1982400.000 (not 1982480.000 or 1982480) to get the 70th, 30th, and 19,824th entries from 11.txt. If the 1st field in o.txt does not end with 00.000, the current script won't print anything for that line in o.txt. If you have values like 1982480, which is not evenly divisible by 100, you need to explain whether the value is to be skipped, truncated, or rounded to determine which entry from 11.txt to print. (In other words, since there is no entry numbered 19,824.80, do you want nothing to be printed, do you want the result of the division truncated to return the 19,824th entry, or do you want it rounded to return the 19,825th entry?) Why did all entries in your sample o.txt file end with 00.000 if you are saying that the values in that field are sometimes integers and that the values aren't always evenly divisible by 100?

The script I provided does not assume that the values in the first field from o.txt are in sequence; with the data you gave as a sample it will print the 1st, 20th, and 10th entries from 11.txt in that order.

In the script I provided, rc is the number of entries that have been read from 11.txt plus one. So when the script starts reading lines from 11.txt, the lines will be accumulated into r[1] until after the line containing ENDMDL is added to the entry. Then rc will be incremented so that subsequent lines will be added to the next entry...
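To watch rc move, the same accumulation can be run with a little extra bookkeeping. This is only a sketch: the function name trace_rc and the lines[] array are assumptions added for the demonstration, and the sample input is a shortened stand-in for 11.txt.

```shell
# trace_rc is a hypothetical name. It reads entries from the named file
# (or from standard input) and reports how many lines each r[i] collected.
trace_rc() {
    awk 'BEGIN {rc = 1}
    {   r[rc] = r[rc] $0 "\n"       # append the line to the current entry
        lines[rc]++                 # bookkeeping for this demo only
        if($0 == "ENDMDL") rc++     # entry complete: move to the next index
    }
    END {
        for(i = 1; i < rc; i++)
            printf("r[%d] holds %d lines\n", i, lines[i])
        printf("rc ends at %d after %d complete entries\n", rc, rc - 1)
    }' "$@"
}

# Two shortened entries: one with 3 lines, one with 4 lines.
printf '%s\n' 'ATOM 1 N SER' TER ENDMDL 'ATOM 1 N SER' 'ATOM 2 CA SER' TER ENDMDL | trace_rc
# r[1] holds 3 lines
# r[2] holds 4 lines
# rc ends at 3 after 2 complete entries
```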
This User Gave Thanks to Don Cragun For This Post:
# 12  
Old 12-31-2012
1st field in o.txt does end with 00.000
Sorry, I did not know that it's so critical to include 00.000.

Now I have checked all my files manually: except for one value, which is 5390.001, all the others end with .000.

You raised a good point (one I had not even thought of) regarding the value 19,824.80 and the three options:
(1) nothing to be printed,
(2) result of the division truncated to return the 19,824th entry, or
(3) rounded to return the 19,825th entry

So, I wish to retrieve both entries, the 19,824th and the 19,825th, in one file, as well as 3 other files with the above options.

That means, for 1st field values that are not divisible by 100, I wish to retrieve one file containing nothing for them, one file with truncated entries, one file with rounded entries, and a fourth file with both truncated and rounded entries (but each of these 4 files must contain the entries for divisible values too).


Thanks
# 13  
Old 12-31-2012
Quote:
Originally Posted by bioinfo
1st field in o.txt does end with 00.000
Sorry, I did not know that it's so critical to include 00.000.
It isn't critical, I just took advantage of it since your sample input was always in this form.
Quote:
Now I have checked all my files manually: except for one value, which is 5390.001, all the others end with .000.
OK. So I have to do some arithmetic instead of letting awk treat 00.000 as a field separator.
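The arithmetic could look something like the sketch below. The helper name pick_entries and its output format are assumptions made for illustration. Rounding is done as int(q + 0.5), which relies on the values being positive, as stated earlier in the thread.

```shell
# pick_entries is a hypothetical helper. Given one value from the first
# field of o.txt, it prints the truncated and rounded entry numbers and
# whether the value is evenly divisible by 100.
pick_entries() {
    awk -v x="$1" 'BEGIN {
        q = x / 100
        t = int(q)                      # truncated entry number
        n = int(q + 0.5)                # rounded entry number (positive values)
        exact = (q == t) ? "yes" : "no"
        printf("%s trunc=%d round=%d exact=%s\n", x, t, n, exact)
    }'
}

pick_entries 2000.000   # 2000.000 trunc=20 round=20 exact=yes
pick_entries 5390.001   # 5390.001 trunc=53 round=54 exact=no
pick_entries 5310.000   # 5310.000 trunc=53 round=53 exact=no
pick_entries 1982480    # 1982480 trunc=19824 round=19825 exact=no
```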
Quote:
You raised a good point (one I had not even thought of) regarding the value 19,824.80 and the three options:
(1) nothing to be printed,
(2) result of the division truncated to return the 19,824th entry, or
(3) rounded to return the 19,825th entry

So, I wish to retrieve both entries, the 19,824th and the 19,825th, in one file, as well as 3 other files with the above options.
In the file with both entries, if the truncated and rounded entries are the same, do you want that entry printed twice, or just once? (For example, 5310.000 truncates to the 53rd entry and rounds to the 53rd entry.)
Quote:
That means, for 1st field values that are not divisible by 100, I wish to retrieve one file containing nothing for them, one file with truncated entries, one file with rounded entries, and a fourth file with both truncated and rounded entries (but each of these 4 files must contain the entries for divisible values too).


Thanks
  1. What names do you want for these four files?
  2. In the file that has both rounded and truncated entries, do you want any kind of marker added to the output entries indicating that there are two output records for a single input line? If so, what should the marker be?
  3. In the file with truncated and rounded entries, do you want any kind of marker added to the output entries indicating that the record from 11.txt was selected based on truncating a value or rounding a value, respectively? If so, what should the markers be?
  4. In the file with nothing for values that are not evenly divisible by 100, do you want any kind of marker in the output to show that an entry was skipped? If so, what should the markers be?
  5. Do you want one of these four files to be written to standard output, or do you want all output to be written directly to the four files?
If you want markers, it would be relatively easy to include markers of the form:
Code:
Following entry (%d) comes from %s truncated:
Following entry (%d) comes from %s rounded:
Entry skipped because %s is not evenly divisible by 100.

where the %d is replaced by the entry number of the following lines and %s is replaced by the 1st field in o.txt, if that is what you want.
This User Gave Thanks to Don Cragun For This Post:
# 14  
Old 12-31-2012
Quote:
In the file with both entries, if the truncated and rounded entries are the same, do you want that entry printed twice, or just once? (For example, 5310.000 truncates to the 53rd entry and rounds to the 53rd entry.)
They can be printed once.

Quote:
What names do you want for these four files?
File names can be no.txt, trun.txt, round.txt, tro.txt

Quote:
In the file that has both rounded and truncated entries, do you want any kind of marker added to the output entries indicating that there are two output record for a single input line? If so, what should the marker be?

In the file with truncated and rounded entries, do you want any kind of marker added to the output entries indicating that the record from 11.txt was selected based on truncating a value or rounding a value, respectively? If so, what should the markers be?

In the file with nothing for values that are not evenly divisible by 100, do you want any kind of marker in the output to show that an entry was skipped? If so, what should the markers be?

Do you want one of these four files to be written to standard output, or do you want all output to be written directly to the four files?
If you want markers, it would be relatively easy to include markers of the form:
Yes, I wish to have markers. All output should be directly written to four files.

Thanks
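Putting the answers above together, one possible shape for the four-file version is sketched below. It is not a script posted in this thread. The wrapper name extract_entries is hypothetical, and several details are assumptions: markers are written only for values that are not evenly divisible by 100, entries for divisible values go unmarked into all four files, and in tro.txt an entry whose truncated and rounded numbers agree is printed once under the truncated marker.

```shell
#!/bin/sh
# extract_entries is a hypothetical wrapper name. $1 is the entry file
# (such as 11.txt) and $2 is the list file (such as o.txt). The output
# goes to no.txt, trun.txt, round.txt and tro.txt in the current directory.
extract_entries() {
    awk 'BEGIN {rc = 1}
    FNR == NR {                     # first file: collect the entries
        r[rc] = r[rc] $0 "\n"
        if($0 == "ENDMDL") rc++
        next
    }
    {                               # second file: one value per line
        q = $1 / 100
        t = int(q)                  # truncated entry number
        n = int(q + 0.5)            # rounded entry number (positive values)
        if(q == t) {                # evenly divisible: same entry in all files
            printf("%s", r[t]) > "no.txt"
            printf("%s", r[t]) > "trun.txt"
            printf("%s", r[t]) > "round.txt"
            printf("%s", r[t]) > "tro.txt"
            next
        }
        printf("Entry skipped because %s is not evenly divisible by 100.\n", $1) > "no.txt"
        printf("Following entry (%d) comes from %s truncated:\n%s", t, $1, r[t]) > "trun.txt"
        printf("Following entry (%d) comes from %s rounded:\n%s", n, $1, r[n]) > "round.txt"
        printf("Following entry (%d) comes from %s truncated:\n%s", t, $1, r[t]) > "tro.txt"
        if(n != t)                  # assumption: print once when both agree
            printf("Following entry (%d) comes from %s rounded:\n%s", n, $1, r[n]) > "tro.txt"
    }' "$1" "$2"
}
```

It would be called as extract_entries 11.txt o.txt. If a value points past the last entry in the entry file, only the marker line is written, because the corresponding r[] element is empty. On a Solaris system the same advice as before applies: use /usr/xpg4/bin/awk or nawk instead of awk.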