Help in awk/bash


 
# 8  
Old 01-05-2013
I have uploaded a part of first file and full second file. I have posted real values for second file, but first file is very big.

Last edited by bioinfo; 01-05-2013 at 04:54 PM..
# 9  
Old 01-05-2013
Quote:
Originally Posted by RudiC
As much as I want to help, I am sorry I have to say I can't. Thank you for the effort explaining your input in detail, but post #4 does not relate to post #1 by no means. E.g. group No. 10 being centered at 052 here and 051 there, having 31 branches here and 30 there, groups showing up here not showing up there and vice versa, and, groups in file2 not being represented in file 1.
On top, I still can't see what pattern to fill in (see my post #2), where to get it, based on what rule, even if I take file g.txt to be a distilled version of file1 and file2.
It would be helpful if you post a minimum number of input files (e.g. atoms.txt and g.txt) with interrelating data, an output file and a set of understandable rules on how to get one into the other.
You said you had two files: atom.txt and g.txt. I am assuming that atom.txt is in the same format as 11.txt in your last thread with the same title as this thread. You have not given us anything that includes even a single complete line (after the header line) from the file g.txt. And you have not shown us what you want to appear in G10.txt, or in any other G*.txt file, that we can match against what you have shown us from atom.txt.

With the data you gave us in message #4 in this thread, the First file gives us an indication of what might appear in g.txt for groups 0 through 5, but none of them are listed in g.txt in message #1 nor in the Second file in message #4 in this thread.

If you don't give us coherent sample data together with sample output that matches it, it is EXTREMELY hard to figure out what you want. I think I'm close to figuring out what you want done and expect to post something later this afternoon. But I have no confidence that it will be at all close to what you want, because the specification of what you want is so vague. And you haven't given us sample input and output that we can use to determine whether a possible solution we might develop does what you want done.
# 10  
Old 01-05-2013
Quote:
You said you had two files: atom.txt and g.txt. I am assuming that atom.txt is in the same format as 11.txt in your last thread with the same title as this thread.
Yes, atom.txt is the same as 11.txt. While posting in the new thread I just used a new name. I am explaining my problem again with more details and concise data. I have two files: atom.txt (or 11.txt from the other thread) and g.txt (which I made using data from the raw files file 1 and file 2). If you feel that it will be easier to retrieve data directly from file 1 and file 2 rather than using g.txt for retrieving patterns from atom.txt, I will be happy to go for it too.

g.txt (made it more concise and short; in reality I have 10 groups for this file out of more than 600 groups from file 1. Based on decreasing number of branches they are grouped into 10 groups in g.txt but I am showing only 2 here)
Code:
Group   Centre      Branches       Id_of_Branches
 3       006          6         009,004,008,007,005,006
 5       012          2         012,013

file 1:
Code:
Group: 0 Number of Branches: 1
0    001
Centre: 001 Branches: 1
Group: 1 Number of Branches: 1
0    002
Centre: 002 Branches: 1
Group: 2 Number of Branches: 1
0    003
Centre: 003 Branches: 1
Group: 3 Number of Branches: 6
0    009
1    004
2    008
3    007
4    005
5    006
Centre: 006 Branches: 6
Group: 4 Number of Branches: 2
0    010
1    011
Centre: 010 Branches: 2
Group: 5 Number of Branches: 2
0    012
1    013
Centre: 012 Branches: 2
Up to more than 600 groups

file2:
Code:
Group No:
 3        Centre: 006 Branches: 6                   
 5        Centre: 012 Branches: 2
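
As a rough sketch of the direct route mentioned above (working from file 1 and file 2 instead of a hand-made g.txt), the rows of g.txt could be derived along these lines. The file names file1.txt, file2.txt and g_derived.txt are assumptions made for illustration, as is the reading that file 2 simply lists the groups to keep; the sample data is a shortened copy of what is shown above. Run it in an empty scratch directory.

```shell
# Shortened copies of the sample data from this post (assumed file names).
cat > file1.txt <<'EOF'
Group: 2 Number of Branches: 1
0    003
Centre: 003 Branches: 1
Group: 3 Number of Branches: 6
0    009
1    004
2    008
3    007
4    005
5    006
Centre: 006 Branches: 6
Group: 4 Number of Branches: 2
0    010
1    011
Centre: 010 Branches: 2
Group: 5 Number of Branches: 2
0    012
1    013
Centre: 012 Branches: 2
EOF
cat > file2.txt <<'EOF'
Group No:
 3        Centre: 006 Branches: 6
 5        Centre: 012 Branches: 2
EOF

awk '
BEGIN { print "Group Centre Branches Id_of_Branches" }
FNR == NR {                          # file2.txt: remember the wanted Group-IDs
    if (FNR > 1 && NF) want[$1] = 1  # skip the "Group No:" header and blank lines
    next
}
$1 == "Group:"  { gid = $2; ids = ""; next }   # a new group starts
$1 == "Centre:" {                              # the group is complete
    if (gid in want) print gid, $2, $4, ids
    next
}
NF == 2 && $1 ~ /^[0-9]+$/ {                   # "index  Branch-ID" line
    ids = ids (ids == "" ? "" : ",") $2
}' file2.txt file1.txt > g_derived.txt

cat g_derived.txt
```

With this sample, groups 3 and 5 are kept and group 4 is dropped because it is not listed in file 2. The columns are separated by single spaces rather than aligned, which should not matter to a reader that splits fields on whitespace.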

Required output:
For each value of Id_of_Branches in g.txt, I wish to retrieve the corresponding pattern from atom.txt.
Therefore, for this sample data, I require 3 output files: 2 files holding the patterns for all IDs of the 2 groups, and a 3rd file holding the patterns for the Centre ID of every group:

Code:
(1) g3.txt
#009
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
#004
ATOM 1 N SER A 1 34.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
#008
ATOM 1 N SER A 1 45.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
#007
ATOM 1 N SER A 1 50.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 65.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
#005
ATOM 1 N SER A 1 90.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 89.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
#006
ATOM 1 N SER A 1 67.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 23.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL

(2)g5.txt
#012
ATOM 1 N SER A 1 37.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 37.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
#013
ATOM 1 N SER A 1 40.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 31.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL

(3) Centre.txt (For Id from centre of all groups)
#006
ATOM 1 N SER A 1 67.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 23.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
#012
ATOM 1 N SER A 1 37.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 37.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL

I hope this makes my problem clearer.
# 11  
Old 01-05-2013
Hi bioinfo. The awk script I had been testing based on your earlier messages didn't work with the new details you provided in message #10 in this thread. (The output filenames changed from Gx to gx, where x is a one- to three-digit string; the list of branches changed from comma and space separators to just comma separators; and I was guessing completely wrong about what you wanted in one of the output files.) I think the script below does what you want. It is LONG, but the vast majority of it is just comments. Hopefully they will help you figure out how it works:
Code:
awk '
# All data is assumed to meet the requirements stated below, so this script
# does not perform any data verification.  If any data fails to meet these
# assumptions, results are unspecified.
BEGIN {
    # Initialize variables that do not have default values set by awk.
    cf = "Centre.txt"
    rc = "001"
}
FNR == NR {
    # Process lines from atom.txt.  Assumed format is that each entry in this
    # file is a multiple line value with the final line of each entry matching
    # the ERE "^ENDMDL$".  Entries from this file are stored in array r with
    # the index being the entry number (starting with 001).  The variable rc is
    # the index for the value being accumulated.  I use a 3 digit string with
    # leading zero fill to match the format of the Centre-ID and Branch-ID
    # values that will be found in g.txt.
    r[rc] = r[rc] $0 "\n"
    if($0 == "ENDMDL")
        # End of entry found.  Set rc for the next entry to be processed.
        rc = sprintf("%03d", rc + 1)
    next
}
FNR == 1 {
    # Skip the header line on subsequent file(s).  The file g.txt is assumed to
    # be the first such file.  Any number of other files in the same format can
    # be used in addition to or instead of g.txt.
    next
}
{   # Process lines from subsequent files.  Assumed format is:
    #   Group   Centre      Branches              Id_of_Branches
    #   gid     cid         bcnt         bid[1],bid[2],...,bid[bcnt]
    # where gid is a 1-3 digit Group-ID, cid is a 3 digit (zero filled)
    # Centre-ID, bcnt is a count of the number of Branch-IDs to follow, and
    # each bid field is a 3 digit (zero filled) Branch-ID.  The header line
    # has already been discarded.  Commas will be converted to spaces so bid
    # values can be used directly.  It is assumed that each line contains
    # $3 + 3 fields.
    #
    # Create a file named gx.txt (where x is the Group-ID from this line):
    #   Note that it would seem logical to expand x to a 3 digit zero filled
    #   value so the created g* files would sort into Group-ID order, but that
    #   is not what was requested.
    #   One entry from atom.txt (with the entry number determined by the
    #   Branch-ID) will be written to this file for each Branch-ID on this
    #   line.
    #
    # Also create a file named Centre.txt that will contain one entry from
    #   atom.txt (with the entry number determined by the Centre-ID) for each
    #   line processed.
    #   Note: I assume that a Centre-ID is also a Branch-ID and that the value
    #   given as the cid should also appear as one of the Branch-IDs appearing
    #   on each line.
    #
    # Replace commas on input lines with spaces so the Branch-IDs can be used
    # directly without splitting $4 into another array and processing it in a
    # different loop (besides that some descriptions of this input file say
    # elements are comma separated and other say comma-space separated or
    # terminated; this works either way):
    gsub(/,/, " ")
    # Create the g*.txt file for this line.  Uncomment one of the following two
    # lines.  The 1st line provides requested names, the 2nd line creates names
    # that will sort correctly by Group-ID when looking at output by ls and
    # when having the shell match the patterns g*.txt and g???.txt and groups
    # in the list do not all contain the same number of digits.
    gf = "g" $1 ".txt"
    #gf = sprintf("g%03d.txt", $1)
    for(i = 4; i <= NF; i++) printf("#Id %s\n%s", $i, r[$i]) > gf
    close(gf)
    # Add entry to Centre.txt:
    printf("#Id %s\n%s", $2, r[$2]) > cf
}' atom.txt g.txt

As always, if you're running on a Solaris system, use /usr/xpg4/bin/awk or nawk instead of awk.
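
To see the behaviour on something small, here is a minimal sketch that runs a comment-free copy of the script above against made-up data. The two-line atom.txt entries, the coordinates in them and group number 7 are invented for illustration only; real entries are longer. It also picks /usr/xpg4/bin/awk when that file exists, following the Solaris remark above. Run it in an empty scratch directory, since it overwrites atom.txt, g.txt, g7.txt and Centre.txt.

```shell
# Hypothetical sample data: three entries, each terminated by ENDMDL.
cat > atom.txt <<'EOF'
ATOM 1 N SER A 1 11.000 83.194 140.076 1.00 0.00 N
ENDMDL
ATOM 1 N SER A 1 22.000 83.194 140.076 1.00 0.00 N
ENDMDL
ATOM 1 N SER A 1 33.000 83.194 140.076 1.00 0.00 N
ENDMDL
EOF
cat > g.txt <<'EOF'
Group   Centre      Branches       Id_of_Branches
 7       002          2         002,003
EOF

# Prefer the XPG4 awk where it exists (Solaris); otherwise use plain awk.
AWK=awk
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
fi

"$AWK" '
BEGIN { cf = "Centre.txt"; rc = "001" }
FNR == NR {                  # atom.txt: store each entry under a 3-digit key
    r[rc] = r[rc] $0 "\n"
    if ($0 == "ENDMDL") rc = sprintf("%03d", rc + 1)
    next
}
FNR == 1 { next }            # skip the g.txt header line
{
    gsub(/,/, " ")           # Branch-IDs become fields 4 through NF
    gf = "g" $1 ".txt"
    for (i = 4; i <= NF; i++) printf("#Id %s\n%s", $i, r[$i]) > gf
    close(gf)
    printf("#Id %s\n%s", $2, r[$2]) > cf
}' atom.txt g.txt

cat g7.txt Centre.txt
```

Here g7.txt should hold entries 002 and 003, each preceded by its "#Id" line, and Centre.txt should hold entry 002 only. Note that the marker line is written as "#Id 002" rather than the bare "#002" shown in message #10.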
This User Gave Thanks to Don Cragun For This Post:
# 12  
Old 01-06-2013
Thanks. I will try it and let you know.

---------- Post updated at 11:33 PM ---------- Previous update was at 08:18 PM ----------

Yippee. It's working.
Thanks a lot. You are a GENIUS.