Help in awk/bash


 
# 22  
Old 01-01-2013
Thanks.

How should I start learning shell scripting and awk programming better? Can you recommend any book?

Thanks again.
# 23  
Old 01-02-2013
Quote:
Originally Posted by bioinfo
Thanks a lot, Don Cragun and Corona688. I edited the script in vi and it's working. Yippie!

I have one more query. I am using the following tro.txt as my input file for further program:



I wish to delete all following lines in this file:

Following entry (2659) comes from 265920.000 truncated:
Following entry (2703) comes from 270330.000 rounded:
Following entry (2703) comes from 270360.000 rounded:
..........................................................................
..........................................................................

Required output:



Please guide.
Thanks.
In addition to the grep Corona688 provided, you could also add another output file to the awk script I provided, or add an option to the script to control whether or not marker lines should be included in the tro.txt output file, or just always leave out the markers in the tro.txt output file.
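As a rough sketch of the simplest of these options (filtering the file after it has been written), the marker lines can be dropped with grep or with a one-line awk filter. The sample data lines and the output names tro_clean.txt and tro_clean_awk.txt are made up for illustration, and the pattern assumes every marker line starts with the words "Following entry"; this is not necessarily the grep Corona688 posted.

```shell
# Build a tiny stand-in for tro.txt (the data lines are hypothetical).
cat > tro.txt <<'EOF'
Following entry (2659) comes from 265920.000 truncated:
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N
ENDMDL
Following entry (2703) comes from 270330.000 rounded:
ATOM 1 N SER A 1 35.683 81.326 139.778 1.00 0.00 N
ENDMDL
EOF

# Option 1: grep -v prints every line that does NOT match the pattern.
grep -v '^Following entry' tro.txt > tro_clean.txt

# Option 2: the same filter written in awk.
awk '!/^Following entry/' tro.txt > tro_clean_awk.txt
```

Both commands leave tro.txt untouched and write the filtered copy to a new file, so the markers are still available if they are needed later.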
# 24  
Old 01-03-2013
Hi,
I have two files:

Code:
11.txt showing two patterns:

ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
ATOM 1 N SER A 1 35.683 81.326 139.778 1.00 0.00 N 
ATOM 2 CA SER A 1 35.422 82.736 139.929 1.00 0.00 C 
TER
ENDMDL

Code:
c.txt

Number of groups: 40  3.95
Group: 0 Branches: 1
0    001
Centre: 001 Nodes: 1
Group: 1 Branches: 1
0    002
Centre: 002 Nodes: 1
Group: 2 Branches: 6
0    009
1    004
2    008
3    007
4    005
5    006
Centre: 006 Nodes: 6

ENDMDL occurs many times in 11.txt. I wish to retrieve the pattern that corresponds to the value of Id. That is, if I give an Id of 004 from group 2 as input, it should output the fourth repeat from 11.txt, ending with ENDMDL.

Code:
Id004.txt

Group2: Id 004
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL

So, corresponding to the value of Id from c.txt, I want to retrieve the repeat with that number from 11.txt.

Please guide me on how to do this.
Also, I wish to write these patterns to individual files based on their Id, group, and centre. For example:
group0.txt contains all patterns with Id
group1.txt contains all patterns with Id
group2.txt contains all patterns with Id
One file containing the patterns corresponding to the centre Id:
Code:
Id001.txt
Id002.txt
Id009.txt
............
............

Thanks

Last edited by Scrutinizer; 01-04-2013 at 12:40 AM.. Reason: quote tags -> code tags
# 25  
Old 01-04-2013
This is the third or fourth problem you have posted to this thread. Reading through the thread it is getting hard to determine which problem is being addressed by some of the comments.

I have shown you how to read 11.txt, accumulate the entries in it for each set of lines ending with an ENDMDL line, and print selected entries from the accumulated list. You know what files you want to create and what you want in them, so why don't you try putting together an awk script to do that and let us know what isn't working.

From your description of groups, centres, and IDs, I have no idea how many files you want created nor what is supposed to be in each of them. I also don't see any use for the lines starting with Centre: in your c.txt file; they just have the characters Centre: followed by the Id of the last Branch in the Group that they follow, followed by the characters Nodes: , followed by the number of branches listed on the preceding Group: line. What is the difference between a Node and a Branch? What is the difference between a Group and a Centre?

If you can't do this awk script yourself, you're going to have to give us a lot more detail specifying the exact list of the files you want produced in response to the snippet from c.txt you provided, along with the data that you want written into those files.
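For readers following along, here is a minimal sketch of the accumulate-and-select step described above. It assumes (as the question suggests but never confirms) that Id 004 means the fourth ENDMDL-terminated entry in 11.txt. The sample lines are shortened placeholders, and the variable name id and the output name Id004.txt are choices made for this example only. It does not attempt the group or centre files, whose contents were never specified.

```shell
# Stand-in for 11.txt: four tiny entries, each ending with ENDMDL.
cat > 11.txt <<'EOF'
ATOM 1 first
ENDMDL
ATOM 1 second
ENDMDL
ATOM 1 third
ENDMDL
ATOM 1 fourth
TER
ENDMDL
EOF

awk -v id="004" '
BEGIN { rc = 1 }
{
    r[rc] = r[rc] $0 "\n"          # accumulate the lines of the current entry
    if ($0 == "ENDMDL") rc++       # an ENDMDL line closes the entry
}
END {
    out = "Id" id ".txt"           # keeps the leading zeros: Id004.txt
    printf("%s", r[id + 0]) > out  # id + 0 turns "004" into the number 4
}' 11.txt
```

If the requested Id is larger than the number of entries, r[id + 0] is an empty string and the output file is created empty.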
# 26  
Old 01-04-2013
Thanks
I will post it in a new thread with more detail.
# 27  
Old 01-07-2013
Hi,
The script at #15 is working great.
I have two questions related to it.

(1) I only want the patterns from 11.txt for rows whose field 1 is divisible by 100 (that is, no entry should be written if $1%100 != 0), and I only want the file no.txt.
(2) Also, is it possible to number the rows (those whose 1st field is divisible by 100 and which are used for retrieving patterns from 11.txt), and also to number the patterns retrieved from 11.txt?

Shall I use the following code for (1)?
Code:
no=${1:-no.txt}         # name of file for no entry if $1%100 != 0
awk -v no="$no" 'BEGIN {rc = 1}
FNR == NR {r[rc] = r[rc] $0 "\n"
    if($0 == "ENDMDL") rc++
    next}
{   # If we got to here, we are reading lines from the 2nd file.
    # Determine exact, truncated, and rounded entry numbers.
    if (substr($1, length($1) - 5) == "00.000") {
        # $1 ends in 00.000; no truncation or rounding needed.
        entry = substr($1, 1, length($1) - 6)
        round = trunc = 0
    } else {
	# $1 is not evenly divisible by 100; calculate rounded and truncated
        # values.
        entry = 0
        round = sprintf("%.0f", $1 / 100)
        trunc = substr($1, 1, length($1) - 6)
    }
          # Write the appropriate entry
        # to each output file.
        printf("%s", r[entry]) > no
       } 
    }'
11.txt o.txt

Thanks.
# 28  
Old 01-07-2013
No. I assume that you tried running this awk script and got an error saying that your opening "{"s didn't match your "}"s. Since you moved the names of the files to be processed to a line of their own, if the awk script had run, it would have tried to read its input from standard input (not from 11.txt and o.txt). And, instead of skipping over lines in which $1 did not end in 00.000, it would have written the entry for the 0th element of 11.txt. In this case you would still get what you want, since r[0] is an empty string and writing it to the file no wouldn't have done anything.
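To illustrate the point about file names, here is a small demonstration. The file demo.txt and the variable names are invented for this example, and standard input is redirected from /dev/null so that the second awk does not wait for terminal input; it is only a sketch of how the shell splits the command.

```shell
printf 'from file\n' > demo.txt

# File name on the same line as the closing quote: awk reads demo.txt.
same=$(awk '{ print "read: " $0 }' demo.txt)

# No file operand on the awk line: awk reads standard input instead.
# A file name placed on the following line would be run by the shell
# as a separate command, not passed to awk as an argument.
none=$(awk '{ print "read: " $0 }' < /dev/null)

# A trailing backslash joins the two lines, so awk reads demo.txt again.
cont=$(awk '{ print "read: " $0 }' \
    demo.txt)

printf '%s|%s|%s\n' "$same" "$none" "$cont"
```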

A corrected and simplified version of this script would be something like:
Code:
awk -v no="no.txt" 'BEGIN {rc = 1}
FNR == NR {r[rc] = r[rc] $0 "\n"
    if($0 == "ENDMDL") rc++
    next}
{   # If we got to here, we are reading lines from the 2nd file.
    # Only lines whose $1 ends in 00.000 select an entry.
    if (substr($1, length($1) - 5) == "00.000") {
        # $1 ends in 00.000; write an entry corresponding to this line.
        entry = substr($1, 1, length($1) - 6)

        # Write the selected entry to the output file.
        printf("%s", r[entry]) > no
    }
}' 11.txt o.txt

Yes, it is possible to number entries from 11.txt and to number rows from o.txt, but you'll have to specify what you mean by showing the exact output that you want to appear in no.txt when using your 11.txt and the following instead of your version of o.txt:
Code:
100.000
2010.000
1000.000

If you're talking about adding a tag line to the output specifying the entry # from 11.txt and the line number from o.txt, you have seen examples of how to produce tag lines in earlier scripts I have provided (including the script you stripped down to produce the script above). The entry number from 11.txt being printed is given by the variable entry, and the line number from o.txt producing an output line is given by the variable FNR.

One way to add a tag doing this would be to change the last printf in the above script from:
Code:
        printf("%s", r[entry]) > no

to:
Code:
        printf("The following entry from line %d is for Branch %d:\n%s",
            FNR, entry, r[entry]) > no

If you want each line of output in no.txt to include the Branch #, that is also easy to do, but it changes the code where entries are accumulated from 11.txt instead of changing the printf at the end of the script. If you want each line of output in no.txt to include the Branch # and the line # from o.txt, that can also be done, but it will involve changing the way the script accumulates and prints entries from 11.txt.
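As one possible reading of the first of these variants, the sketch below prefixes every stored line with its entry (Branch) number while 11.txt is being accumulated. The "N: " prefix format and the sample contents of 11.txt and o.txt are assumptions made for this example, since the desired output was never specified.

```shell
# Stand-ins for the two input files (hypothetical contents).
cat > 11.txt <<'EOF'
ATOM 1 first
ENDMDL
ATOM 1 second
ENDMDL
EOF
printf '%s\n' 200.000 150.000 > o.txt

awk -v no="no.txt" 'BEGIN {rc = 1}
FNR == NR {
    # Prefix each accumulated line with the entry number it belongs to.
    r[rc] = r[rc] rc ": " $0 "\n"
    if ($0 == "ENDMDL") rc++
    next
}
substr($1, length($1) - 5) == "00.000" {
    # $1 ends in 00.000; print the already-numbered entry.
    entry = substr($1, 1, length($1) - 6)
    printf("%s", r[entry]) > no
}' 11.txt o.txt
```

Here only 200.000 ends in 00.000, so no.txt receives the two lines of entry 2, each starting with "2: ". Adding the line number from o.txt to every line would instead require storing the entry lines separately and building the prefix at print time, when FNR is known.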

Last edited by Don Cragun; 01-07-2013 at 11:05 PM.. Reason: add missing [ICODE] tag