Help in awk/bash

01-01-2013

Registered User

50, 0

Join Date: Dec 2012

Last Activity: 12 August 2013, 3:07 AM EDT

Posts: 50

Thanks Given: 52

Thanked 0 Times in 0 Posts

Thanks.

How should I start learning shell scripting and awk programming better? Is there a book you would recommend?

Thanks again.

bioinfo


01-02-2013

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Quote:

Originally Posted by bioinfo

Thanks a lot Don Cragun and Corona688. I edited the script in vi and it's working. Yippie!

I have one more query. I am using the following tro.txt as my input file for further program:

I wish to delete all following lines in this file:

Following entry (2659) comes from 265920.000 truncated:
Following entry (2703) comes from 270330.000 rounded:
Following entry (2703) comes from 270360.000 rounded:
..........................................................................
..........................................................................

Required output:

Please guide.
Thanks.

In addition to the grep Corona688 provided, you could also add another output file to the awk script I provided, or add an option to the script to control whether or not marker lines should be included in the tro.txt output file, or just always leave out the markers in the tro.txt output file.
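As a minimal sketch of the "always leave out the markers" option, assuming every marker line starts with the literal text "Following entry (" as in the sample above, either of the filters below would drop them. The output file names and the ATOM lines in the sample are placeholders, not taken from the real tro.txt:

```shell
# Build a small sample tro.txt (the ATOM lines are made-up placeholders).
cat > tro.txt <<'EOF'
Following entry (2659) comes from 265920.000 truncated:
ATOM 1 N SER A 1
ENDMDL
Following entry (2703) comes from 270330.000 rounded:
ATOM 2 CA SER A 1
ENDMDL
EOF

# Drop every line that starts with "Following entry (".
grep -v '^Following entry (' tro.txt > tro_clean.txt

# The same filter in awk: print only the lines that do not match.
awk '!/^Following entry \(/' tro.txt > tro_clean_awk.txt
```

Both commands leave tro.txt itself untouched and write the filtered copy to a new file.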

Don Cragun


01-03-2013

Hi,
I have two files:

Code:

11.txt showing two patterns:

ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
ATOM 1 N SER A 1 35.683 81.326 139.778 1.00 0.00 N 
ATOM 2 CA SER A 1 35.422 82.736 139.929 1.00 0.00 C 
TER
ENDMDL

Code:

c.txt

Number of groups: 40  3.95
Group: 0 Branches: 1
0    001
Centre: 001 Nodes: 1
Group: 1 Branches: 1
0    002
Centre: 002 Nodes: 1
Group: 2 Branches: 6
0    009
1    004
2    008
3    007
4    005
5    006
Centre: 006 Nodes: 6

ENDMDL occurs many times in 11.txt. I wish to retrieve the pattern that corresponds to the value of Id. That is, if I give 004 (an Id from group 2) as input, the output should be the fourth repeat from 11.txt, ending with ENDMDL.

Code:

Id004.txt

Group2: Id 004
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL

So, corresponding to the value of Id from c.txt, I want to retrieve the repeat with that number from 11.txt.

Please guide me on how to do this.
Also, I wish to retrieve these patterns into individual files based on their Id, group, and centre. For example:
group0.txt contains all patterns with Id
group1.txt contains all patterns with Id
group2.txt contains all patterns with Id
One file containing the patterns corresponding to the centre Id

Code:

Id001.txt
Id002.txt
Id009.txt
............
............

Thanks

Last edited by Scrutinizer; 01-04-2013 at 12:40 AM.. Reason: quote tags -> code tags

bioinfo


01-04-2013

Quote:

Originally Posted by bioinfo

Hi,
I have two files:

ENDMDL occurs many times in 11.txt. I wish to retrieve the pattern that corresponds to the value of Id. That is, if I give 004 (an Id from group 2) as input, the output should be the fourth repeat from 11.txt, ending with ENDMDL.

So, corresponding to the value of Id from c.txt, I want to retrieve the repeat with that number from 11.txt.

Please guide me on how to do this.
Also, I wish to retrieve these patterns into individual files based on their Id, group, and centre. For example:
group0.txt contains all patterns with Id
group1.txt contains all patterns with Id
group2.txt contains all patterns with Id
One file containing the patterns corresponding to the centre Id
Id001.txt
Id002.txt
Id009.txt
............
............

Thanks

This is the third or fourth problem you have posted to this thread. Reading through the thread it is getting hard to determine which problem is being addressed by some of the comments.

I have shown you how to read 11.txt, accumulate the entries in it for each set of lines ending with an ENDMDL line, and print selected entries from the accumulated list. You know what files you want to create and what you want in them, so why don't you try putting together an awk script to do that and let us know what isn't working.
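For reference, a minimal sketch of that accumulate-and-select approach, assuming that Id 004 simply means the fourth ENDMDL-terminated entry in 11.txt. The sample data, the id variable, and the output file name are placeholders, not taken from your files:

```shell
# Sample 11.txt with three ENDMDL-terminated entries (shortened placeholder data).
cat > 11.txt <<'EOF'
ATOM 1 first
TER
ENDMDL
ATOM 1 second
TER
ENDMDL
ATOM 1 third
TER
ENDMDL
EOF

id=002   # hypothetical Id taken from c.txt

# Accumulate each entry in r[1], r[2], ...; an entry ends at an ENDMDL line.
awk -v id="$id" 'BEGIN { rc = 1 }
{ r[rc] = r[rc] $0 "\n"
  if ($0 == "ENDMDL") rc++ }
END { # id + 0 turns the string 002 into the number 2.
      printf("%s", r[id + 0]) }' 11.txt > "Id$id.txt"
```

With this sample, Id002.txt receives the second entry. Writing group or centre files would need the mapping from c.txt, which this sketch does not attempt.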

From your description of groups, centres, and IDs, I have no idea how many files you want created nor what is supposed to be in each of them. I also don't see any use for the lines starting with Centre: in your c.txt file; they just have the characters Centre: followed by the Id of the last Branch in the Group that they follow, followed by the characters Nodes: , followed by the number of branches listed on the preceding Group: line. What is the difference between a Node and a Branch? What is the difference between a Group and a Centre?

If you can't do this awk script yourself, you're going to have to give us a lot more detail specifying the exact list of the files you want produced in response to the snippet from c.txt you provided, along with the data that you want written into those files.

Don Cragun


01-04-2013

Thanks
I will post it in a new thread with more detail.

bioinfo


01-07-2013

Hi,
Script at # 15 is working great

I have two questions related to it.

(1) I only want the patterns from 11.txt for rows whose field 1 is divisible by 100 (that is, no entry should be written if $1%100 != 0), and I only want the file no.txt as output.
(2) Also, is it possible to number the rows (those whose 1st field is divisible by 100 and which are used for retrieving patterns from 11.txt) and also to number the patterns retrieved from 11.txt?

Should I use the following code for (1):

Code:

no=${1:-no.txt}         # name of file for no entry if $1%100 != 0
awk -v no="$no" 'BEGIN {rc = 1}
FNR == NR {r[rc] = r[rc] $0 "\n"
    if($0 == "ENDMDL") rc++
    next}
{   # If we got to here, we are reading lines from the 2nd file.
    # Determine exact, truncated, and rounded entry numbers.
    if (substr($1, length($1) - 5) == "00.000") {
        # $1 ends in 00.000; no truncation or rounding needed.
        entry = substr($1, 1, length($1) - 6)
        round = trunc = 0
    } else {
	# $1 is not evenly divisible by 100; calculate rounded and truncated
        # values.
        entry = 0
        round = sprintf("%.0f", $1 / 100)
        trunc = substr($1, 1, length($1) - 6)
    }
          # Write the appropriate entry
        # to each output file.
        printf("%s", r[entry]) > no
       } 
    }'
11.txt o.txt

Thanks.

bioinfo


01-07-2013

Quote:

Originally Posted by bioinfo

Hi,
Script at # 15 is working great

Code:

no=${1:-no.txt}         # name of file for no entry if $1%100 != 0
awk -v no="$no" 'BEGIN {rc = 1}
FNR == NR {r[rc] = r[rc] $0 "\n"
    if($0 == "ENDMDL") rc++
    next}
{   # If we got to here, we are reading lines from the 2nd file.
    # Determine exact, truncated, and rounded entry numbers.
    if (substr($1, length($1) - 5) == "00.000") {
        # $1 ends in 00.000; no truncation or rounding needed.
        entry = substr($1, 1, length($1) - 6)
        round = trunc = 0
    } else {
	# $1 is not evenly divisible by 100; calculate rounded and truncated
        # values.
        entry = 0
        round = sprintf("%.0f", $1 / 100)
        trunc = substr($1, 1, length($1) - 6)
    }
          # Write the appropriate entry
        # to each output file.
        printf("%s", r[entry]) > no
       } 
    }'
11.txt o.txt

Thanks.

No. I assume that you tried running this awk script and got an error saying that your opening "{"s didn't match your "}"s. Since you moved the names of the files to be processed to a line of their own, even if the awk script had run it would have tried to read its input from standard input (not from 11.txt and o.txt). And, instead of skipping over lines whose $1 does not end in 00.000, it would have set entry to 0 and printed r[0]. In this case you would still get what you want, since r[0] is an empty string and printing it adds nothing to the file no.
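The two behaviours described above can be checked with a small sketch. The file name in.txt and the shell variable names are placeholders chosen for this illustration:

```shell
printf 'a\nb\n' > in.txt

# File operand on the same line as the awk command: awk reads in.txt.
from_file=$(awk '{ n++ } END { print n + 0 }' in.txt)

# No file operand: awk reads standard input instead.
from_stdin=$(printf 'x\n' | awk '{ n++ } END { print n + 0 }')

# An array element that was never assigned is an empty string.
unset_elem=$(awk 'BEGIN { printf("[%s]", r[0]) }')

echo "$from_file $from_stdin $unset_elem"
```

This is why the file names have to follow the closing quote of the awk program on the same line.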

A corrected and simplified version of this script would be something like:

Code:

awk -v no="no.txt" 'BEGIN {rc = 1}
FNR == NR {r[rc] = r[rc] $0 "\n"
    if($0 == "ENDMDL") rc++
    next}
{   # If we got to here, we are reading lines from the 2nd file.
    # Only lines whose $1 ends in 00.000 select an entry; others are skipped.
    if (substr($1, length($1) - 5) == "00.000") {
        # $1 ends in 00.000; the digits before that are the entry number.
        entry = substr($1, 1, length($1) - 6)

        # Write the selected entry to the output file.
        printf("%s", r[entry]) > no
    }
}' 11.txt o.txt
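To try the script without the real data, something like the following should work. The contents of 11.txt and o.txt here are short placeholders invented for the test, not your actual files:

```shell
# Placeholder 11.txt: three short entries, each ending with ENDMDL.
printf 'ATOM A\nENDMDL\nATOM B\nENDMDL\nATOM C\nENDMDL\n' > 11.txt
# Placeholder o.txt: 150.000 does not end in 00.000, so it is skipped.
printf '200.000\n150.000\n300.000\n' > o.txt

awk -v no="no.txt" 'BEGIN {rc = 1}
FNR == NR {r[rc] = r[rc] $0 "\n"
    if($0 == "ENDMDL") rc++
    next}
{   if (substr($1, length($1) - 5) == "00.000") {
        entry = substr($1, 1, length($1) - 6)
        printf("%s", r[entry]) > no
    }
}' 11.txt o.txt

cat no.txt
```

With these inputs, no.txt ends up holding entries 2 and 3 from the sample 11.txt, in that order.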

Yes it is possible to number entries from 11.txt and to number rows from o.txt, but you'll have to specify what you mean by that by showing the exact output that you want to appear in no.txt when using your 11.txt and the following instead of your version of o.txt:

Code:

100.000
2010.000
1000.000

If you're talking about adding a tag line to the output specifying the entry # from 11.txt and the line number from o.txt, you have seen examples of how to produce tag lines in earlier scripts I have provided (including the script you stripped down to produce the script above). The entry number from 11.txt being printed is specified by the variable entry, and the line number from o.txt producing an output line is specified by the variable FNR.

One way to add a tag doing this would be to change the last printf in the above script from:

Code:

        printf("%s", r[entry]) > no

to:

Code:

        printf("The following entry from line %d is for Branch %d:\n%s",
            FNR, entry, r[entry]) > no

If you want each line of output in no.txt to include the Branch #, that is also easy to do, but it changes the code where entries are accumulated from 11.txt instead of changing the printf at the end of the script. If you want each line of output in no.txt to include both the Branch # and the line # from o.txt, that can also be done, but it will involve changing the way the script accumulates and prints entries from 11.txt.
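One possible sketch of that last variant, assuming a prefix of "line# Branch#" in front of every output line is acceptable (the prefix format, the array names line and n, and the sample files are all assumptions made for this illustration):

```shell
# Placeholder inputs: three short entries and two rows, one of which is skipped.
printf 'ATOM A\nENDMDL\nATOM B\nENDMDL\nATOM C\nENDMDL\n' > 11.txt
printf '150.000\n300.000\n' > o.txt

awk -v no="no.txt" 'BEGIN {rc = 1}
FNR == NR {
    # Keep every line separately instead of one joined string per entry.
    n[rc]++
    line[rc, n[rc]] = $0
    if ($0 == "ENDMDL") rc++
    next}
{   if (substr($1, length($1) - 5) == "00.000") {
        entry = substr($1, 1, length($1) - 6)
        # Prefix each line with the o.txt line number and the Branch #.
        for (i = 1; i <= n[entry]; i++)
            printf("%d %d %s\n", FNR, entry, line[entry, i]) > no
    }
}' 11.txt o.txt
```

Storing the lines individually is what makes a per-line prefix possible; with the joined string in r[entry] only a single tag line per entry can be added.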

Last edited by Don Cragun; 01-07-2013 at 11:05 PM.. Reason: add missing [ICODE] tag

Don Cragun
