Help in awk/bash

01-04-2013

Hi, I have two files: atom.txt and g.txt.
atom.txt contains many patterns, but I am showing only two of them, each ending with ENDMDL:

Code:

ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
ATOM 1 N SER A 1 35.683 81.326 139.778 1.00 0.00 N 
ATOM 2 CA SER A 1 35.422 82.736 139.929 1.00 0.00 C 
TER
ENDMDL

g.txt

Code:

Group   Centre      Branches              Id_of_Branches
 10       051          30            003, 007, 051, 034, .................. (30 values)   
 72       183          26            100,................................    
394       600          23             ...................................    
391       641          20             .....................................

For each value of Id_of_Branches in g.txt, I wish to retrieve the corresponding pattern from atom.txt.
So I need 4 output files, one for each of the 4 groups, and a 5th file for the patterns corresponding to the Ids in Centre:

Code:

(1) G10.txt
#Id 003
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
#Id 007
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
.....................
.....................

(2)G72.txt
#Id 100
ATOM 1 N SER A 1 37.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 37.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
.....................
.....................

(3)G394.txt
..................
(4)G391.txt
...................
(5) Centre.txt
#Id 051
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
.....................
.....................
.....................

Thanks

Last edited by bioinfo; 01-04-2013 at 02:15 AM..

bioinfo

01-04-2013

Not clear. What do you want to do to which pattern, based on which rule/selection/...? I can only infer from your sample data that you want a .txt file for each group containing the id# and pattern #1, plus one for the centre containing ?# and pattern #1.
Please specify.

RudiC

01-04-2013

Note that this is the third thread you have started titled "Help in awk/bash". More descriptive titles would help readers find the right thread.

In the last few posts of your last thread (Help in awk/bash), you said that one of the input files for this project was the file 11.txt, which you said contained 10,000 entries. That seems to be the file atom.txt in this thread. You show that all IDs (which seem to be indices into that file) have three-digit values (with leading zero fill). Is atom.txt limited to fewer than 1,000 entries, or do some IDs have more than three digits?
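To make the "IDs are indices into that file" reading concrete, here is a minimal sketch. It assumes (the thread never confirms this) that Id 003 simply means the third ENDMDL-terminated block of atom.txt. The output name model_002.txt and the variable want are made up for the illustration; the sample data is the two-block excerpt from the first post.

```shell
#!/bin/sh
# Sketch only: select one ENDMDL-terminated block of atom.txt by position.
# Assumption: an Id such as 002 is the ordinal number of the block.
work=$(mktemp -d) && cd "$work" || exit 1

# Stand-in for atom.txt: the two blocks shown in the first post.
cat > atom.txt <<'EOF'
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C
TER
ENDMDL
ATOM 1 N SER A 1 35.683 81.326 139.778 1.00 0.00 N
ATOM 2 CA SER A 1 35.422 82.736 139.929 1.00 0.00 C
TER
ENDMDL
EOF

awk -v want=002 '
BEGIN          { n = 1; want += 0 }   # n = number of the block being read; "+= 0" drops leading zeros
n == want      { print }              # copy every line of the wanted block, ENDMDL included
$1 == "ENDMDL" { n++ }                # the terminator closes the current block
' atom.txt > model_002.txt

cat model_002.txt
```

Reading line by line and counting terminators avoids a multi-character RS, which not every awk supports.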

You have way too many occurrences of ......... in your posting to determine what you want. In your description, you say:

Code:

Group   Centre      Branches              Id_of_Branches
 10       051          30            003, 007, 051, 034, .................. (30 values)   
 72       183          26            100,................................    
394       600          23             ...................................    
391       641          20             .....................................

What does "(30 values)" mean? Are there 30 fields (some with multiple spaces as separators, some with comma-space as separators [or terminators]) in every line? Are there 33 fields on every line (one for Group ID, one for Centre ID, one for Branches ID, and one for each of 30 branches)? Are there 3 + value_of_3rd_field fields? Give us real values at least for these 4 lines instead of making us guess what .......... means! Did you add commas between or after some fields just to make it harder to process the input?
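One way to check the "3 + value_of_3rd_field fields" reading is to let awk count the fields itself. The sketch below is only that: the g.txt it builds is hypothetical (same column layout as the post, but with short, made-up ID lists, since the real values are not given), and fields.txt is an invented output name.

```shell
#!/bin/sh
# Sketch only: split a g.txt-style line on blanks and commas, then compare
# the number of IDs found with the value in the Branches column.
work=$(mktemp -d) && cd "$work" || exit 1

# Hypothetical data: the ID lists and branch counts here are invented.
cat > g.txt <<'EOF'
Group   Centre      Branches              Id_of_Branches
 10       051          4             003, 007, 051, 034
 72       183          2             100, 183
EOF

awk '
NR == 1 { next }                 # skip the heading line
{
    gsub(/,/, " ")               # turn commas into blanks; awk re-splits $0 afterwards
    status = (NF == 3 + $3) ? "ok" : "MISMATCH"
    printf "group=%s centre=%s branches=%d ids=%d %s\n", $1, $2, $3, NF - 3, status
}' g.txt > fields.txt

cat fields.txt
```

Replacing the commas before looking at the fields means the default whitespace splitting handles both separators, so no special FS is needed.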

You say you want 5 output files. Does that mean that g.txt will always contain 4 data lines (plus the line of headings)?

You say: "...5th file for patterns (plural) corresponding to Id (singular) from Centre...". Does this mean that Centre.txt is supposed to contain all 120 (or 99, or ???) entries that will be stored in the four Gxxx files? Does it just contain Id051 as shown? Or, does it contain one entry for each data line in g.txt?

I supplied several awk scripts with detailed explanations of how those scripts worked in your last thread on this subject (see link above). Can you show us the awk script you're writing to solve this problem? Or are you expecting us to figure out what you want done and do it for you? The purpose of The UNIX and Linux Forums is to help you learn how to write your own scripts, not to act as a place where you can get people to do your design and implementation work for you for free.

Don Cragun

01-04-2013

I am very thankful to you, Don Cragun, for helping me write scripts and for explaining them as well.
I am very new to shell scripting, and I cannot fall back on another programming language because I am not an expert in any language. I have started reading shell scripting books, but it's difficult for me to figure out what to write in a script. Sometimes I don't know which functions or commands are available in shell scripting. But when you write a script, I learn a lot of things from it and I try to read it.
I know I should write my own script and post it here for help, but sometimes I cannot even guess how to start. I have found this forum to be my best guide on the internet.

I am adding more information and real values for the last post.

atom.txt has fewer than 1000 entries, so Ids don't have more than 3 digits.
There are 30 values in Group 10 with comma separator, Group 72 has 26 values, and so on. Id_of_Branches means number of values in each group.
There are not 33 fields on every line (one for Group ID, one for Centre ID, one for Branches ID, and one for each of 30 branches). There are not 3 + value_of_3rd_field fields.

I have two files, which I combined into one g.txt (using a comma separator for Id_of_Branches). The two files are:

Code:

First file:

Group: 0 Number of Branches: 1
0    001
Centre: 001 Branches: 1
Group: 1 Number of Branches: 1
0    002
Centre: 002 Branches: 1
Group: 2 Number of Branches: 1
0    003
Centre: 003 Branches: 1
Group: 3 Number of Branches: 6
0    009
1    004
2    008
3    007
4    005
5    006
Centre: 006 Branches: 6
Group: 4 Number of Branches: 2
0    010
1    011
Centre: 010 Branches: 2
Group: 5 Number of Branches: 2
0    012
1    013
Centre: 012 Branches: 2
(and so on, up to more than 600 groups)

Second file:

Group No:
 10        Centre: 052 Branches: 31                   
 73        Centre: 184 Branches: 25                   
397        Centre: 607 Branches: 23                   
398        Centre: 640 Branches: 22                   
 86        Centre: 245 Branches: 19                   
 71        Centre: 167 Branches: 12                   
 78        Centre: 220 Branches: 11                  
 18        Centre: 084 Branches: 10                   
 09        Centre: 022 Branches: 10                   
400        Centre: 650 Branches: 9

I wish to have 10 files for the 10 groups (as per the second file), each with the patterns corresponding to the Id_of_Branches (from the first file) in that group.
Centre.txt (only one file) is supposed to contain the patterns corresponding to the Centre Id of each group.
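A rough sketch of how that could look in awk, under several assumptions that the thread does not confirm: an Id is the ordinal position of an ENDMDL-terminated block in atom.txt; the first file has exactly the "Group: / index Id / Centre:" layout shown above; and the second file lists the wanted group numbers in its first column. The file names groups.txt (first file) and top.txt (second file), the variable pass, and all sample data are invented for the illustration.

```shell
#!/bin/sh
# Sketch only: write G<group>.txt for each wanted group, plus one Centre.txt.
work=$(mktemp -d) && cd "$work" || exit 1

# Made-up atom.txt with three one-line models.
cat > atom.txt <<'EOF'
ATOM 1 N SER A 1 1.000 1.000 1.000 1.00 0.00 N
ENDMDL
ATOM 1 N SER A 1 2.000 2.000 2.000 1.00 0.00 N
ENDMDL
ATOM 1 N SER A 1 3.000 3.000 3.000 1.00 0.00 N
ENDMDL
EOF

# Made-up "first file": branch Ids and centre per group.
cat > groups.txt <<'EOF'
Group: 0 Number of Branches: 1
0    001
Centre: 001 Branches: 1
Group: 1 Number of Branches: 2
0    002
1    003
Centre: 003 Branches: 2
EOF

# Made-up "second file": the groups that should get an output file.
cat > top.txt <<'EOF'
Group No:
 1        Centre: 003 Branches: 2
EOF

awk '
pass == 1 {                                  # atom.txt: store each block under its position
    model[n + 1] = model[n + 1] $0 "\n"
    if ($1 == "ENDMDL") n++
    next
}
pass == 2 {                                  # top.txt: remember the wanted group numbers
    if ($1 ~ /^[0-9]+$/) want[$1 + 0] = 1
    next
}
$1 == "Group:"  { g = $2 + 0; next }         # groups.txt: a new group starts
!(g in want)    { next }                     # ignore groups not listed in top.txt
$1 == "Centre:" { printf "#Id %s\n%s", $2, model[$2 + 0] > "Centre.txt"; next }
NF == 2         { out = "G" g ".txt"; printf "#Id %s\n%s", $2, model[$2 + 0] > out }
' pass=1 atom.txt pass=2 top.txt pass=3 groups.txt

cat G1.txt Centre.txt
```

The `pass=N` operands between the file names are ordinary awk variable assignments that take effect just before the following file is read, which keeps the three inputs apart without comparing FILENAME. Group numbers are compared numerically, so 09 in one file would match 9 in the other.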

Thanks.

Last edited by bioinfo; 01-04-2013 at 11:27 AM..

bioinfo

01-04-2013

Rather than reading shell books, consider some awk tutorials. A lot of bioinformatics folks come here for help, and 95% of their problems are resolved by awk. awk is a language in its own right.

This is a great resource. Gawk is GNU awk, which is very probably what you get when you type awk at the prompt.
The manual has examples and explains the bizarre syntax and the program structure:

http://www.gnu.org/software/gawk/manual/gawk.pdf

jim mcnamara

01-04-2013

Thanks, Jim McNamara.
It's great.

bioinfo

01-05-2013

As much as I want to help, I am sorry to say that I can't. Thank you for the effort of explaining your input in detail, but post #4 does not relate to post #1 by any means. For example, group No. 10 is centred at 052 here and 051 there and has 31 branches here and 30 there, groups show up here that don't show up there and vice versa, and groups in file 2 are not represented in file 1.
On top of that, I still can't see what pattern to fill in (see my post #2), where to get it, or based on what rule, even if I take file g.txt to be a distilled version of file 1 and file 2.
It would be helpful if you posted a minimal set of input files (e.g. atom.txt and g.txt) with interrelated data, an output file, and a set of understandable rules on how to get from one to the other.

Last edited by RudiC; 01-05-2013 at 10:21 AM..

RudiC
