Help in awk/bash


 
# 1  
01-04-2013

Hi, I have two files: atom.txt and g.txt.
atom.txt contains many patterns, each ending with ENDMDL; I am showing only two of them:
Code:
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
ATOM 1 N SER A 1 35.683 81.326 139.778 1.00 0.00 N 
ATOM 2 CA SER A 1 35.422 82.736 139.929 1.00 0.00 C 
TER
ENDMDL

g.txt
Code:
Group   Centre      Branches              Id_of_Branches
 10       051          30            003, 007, 051, 034, .................. (30 values)   
 72       183          26            100,................................    
394       600          23             ...................................    
391       641          20             .....................................

For each Id in the Id_of_Branches column of g.txt, I wish to retrieve the corresponding pattern from atom.txt.
Therefore, I need 4 output files, one for each of the 4 groups, and a 5th file for the patterns corresponding to the Centre Ids:
Code:
(1) G10.txt
#Id 003
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
#Id 007
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
.....................
.....................

(2) G72.txt
#Id 100
ATOM 1 N SER A 1 37.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 37.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
.....................
.....................

(3) G394.txt
..................
(4) G391.txt
...................
(5) Centre.txt
#Id 051
ATOM 1 N SER A 1 35.092 83.194 140.076 1.00 0.00 N 
ATOM 2 CA SER A 1 35.216 83.725 138.725 1.00 0.00 C 
TER
ENDMDL
.....................
.....................
.....................

Thanks

Last edited by bioinfo; 01-04-2013 at 02:15 AM.
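A minimal sketch of the extraction described above, written as a shell function around a POSIX awk script. It rests on assumptions the thread has not confirmed: an Id is the 1-based position of a pattern in atom.txt (so 003 is the third ENDMDL-terminated block, as Don Cragun infers in post #3), g.txt has one heading line followed by Group, Centre, Branches and the comma-separated Ids, and every Id is written out in full (the ...... placeholders cannot be processed). The function name extract_groups is hypothetical.

```shell
# extract_groups ATOM_FILE GROUP_FILE [OUTDIR]
# Hypothetical helper. Assumes: (1) each pattern in ATOM_FILE ends with a line
# whose first field is ENDMDL; (2) an Id is the 1-based position of a pattern
# in ATOM_FILE; (3) GROUP_FILE has one heading line, then lines of the form
# "Group Centre Branches id, id, ...".
extract_groups() {
    awk -v outdir="${3:-.}" '
    # Pass 1 (ATOM_FILE): append lines to the current pattern; ENDMDL closes it.
    FNR == NR {
        buf = buf $0 "\n"
        if ($1 == "ENDMDL") { pat[++n] = buf; buf = "" }
        next
    }
    # Pass 2 (GROUP_FILE): skip the heading line.
    FNR == 1 { next }
    {
        out = outdir "/G" $1 ".txt"
        # Fields 4..NF hold the Ids; joining them handles "003, 007" and "003,007".
        ids = ""
        for (f = 4; f <= NF; f++) ids = ids $f
        cnt = split(ids, list, ",")
        for (i = 1; i <= cnt; i++) {
            k = list[i] + 0                 # "003" becomes 3, so zero fill does not matter
            if (list[i] == "" || !(k in pat)) continue   # skip empty or unknown Ids
            printf "#Id %s\n%s", list[i], pat[k] > out
        }
        close(out)
        # One entry per group for its Centre Id, all collected in Centre.txt.
        k = $2 + 0
        if (k in pat) printf "#Id %s\n%s", $2, pat[k] > (outdir "/Centre.txt")
    }' "$1" "$2"
}
```

Usage would be `extract_groups atom.txt g.txt .`, which writes G10.txt, G72.txt, ... and Centre.txt into the current directory. Ids without a matching pattern are skipped silently, and every output file is truncated at the start of each run because awk opens it with `>`.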
# 2  
01-04-2013
Not clear. What do you want to do with which pattern, based on which rule or selection? From your sample data I can only infer that you want a .txt file for each group containing the id# and pattern #1, plus one for the centre containing ?# and pattern #1.
Please specify.
# 3  
01-04-2013
Note that this is the third thread you have started titled "Help in awk/bash". More descriptive titles would help readers find the right thread.

In the last few posts of your previous thread, Help in awk/bash, you said that one of the input files for this project was 11.txt, which you said contained 10,000 entries. That seems to be the file atom.txt in this thread. All the IDs you show (which seem to be indices into that file) have three-digit values (with leading zero fill). Is atom.txt limited to fewer than 1,000 entries, or do some IDs have more than three digits?

You have way too many occurrences of ......... in your posting to determine what you want. In your description, you say:
Code:
Group   Centre      Branches              Id_of_Branches
 10       051          30            003, 007, 051, 034, .................. (30 values)   
 72       183          26            100,................................    
394       600          23             ...................................    
391       641          20             .....................................

What does "(30 values)" mean? Are there 30 fields (some with multiple spaces as separators, some with comma-space as separators [or terminators]) in every line? Are there 33 fields on every line (one for Group ID, one for Centre ID, one for Branches ID, and one for each of 30 branches)? Are there 3 + value_of_3rd_field fields? Give us real values, at least for these 4 lines, instead of making us guess what .......... means! Did you add commas between or after some fields just to make it harder to process the input?

You say you want 5 output files. Does that mean that g.txt will always contain 4 data lines (plus the line of headings)?

You say: "...5th file for patterns (plural) corresponding to Id (singular) from Centre...". Does this mean that Centre.txt is supposed to contain all 120 (or 99, or ???) entries that will be stored in the four Gxxx files? Does it just contain Id051 as shown? Or, does it contain one entry for each data line in g.txt?

I supplied several awk scripts with detailed explanations of how those script worked in your last thread on this subject (see link above). Can you show us the awk script you're writing to solve this problem? Or are you expecting us to figure out what you want done and do it for you? The purpose of The UNIX and Linux Forums is to help you learn how to write your own scripts; not to act as a place where you can get people to do your design and implementation work for you for free.
# 4  
01-04-2013
I am very thankful to you, Don Cragun, for helping me write scripts and for explaining them as well.
I am very new to shell scripting, and I cannot fall back on another programming language because I am not an expert in any. I have started reading shell scripting books, but it's difficult for me to figure out what to write in a script. Sometimes I don't know which functions or commands are available in shell scripting. But when you write a script, I learn about a lot of things and I try to study it.
I know I should write my own script and post it here for help, but sometimes I can't even work out how to start. I have found this forum to be my best guide on the internet.

Here is more information, with real values, in answer to your last post.

atom.txt has fewer than 1000 entries, so Ids don't have more than 3 digits.
There are 30 comma-separated values in Group 10, 26 in Group 72, and so on; the Branches column gives the number of Id_of_Branches values in each group.
There are not 33 fields on every line (one for Group ID, one for Centre ID, one for Branches ID, and one for each of 30 branches), nor 3 + value_of_3rd_field fields.
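In g.txt the Branches value (30 for Group 10, 26 for Group 72) appears to match the number of Ids on each line, so a quick consistency check before extracting anything may catch typing or merging mistakes. This is a sketch, assuming g.txt has one heading line and that the comma-separated Ids start in the 4th whitespace-separated field; check_branches is a hypothetical name.

```shell
# check_branches GROUP_FILE
# Hypothetical helper. Assumes one heading line, then lines of the form
# "Group Centre Branches id, id, ..." with the Ids separated by commas.
check_branches() {
    awk '
    FNR == 1 { next }                        # skip the heading line
    {
        n = 0
        # Count non-empty comma-separated pieces in fields 4..NF,
        # so both "003, 007" and "003,007" are handled.
        for (f = 4; f <= NF; f++) {
            m = split($f, part, ",")
            for (i = 1; i <= m; i++) if (part[i] != "") n++
        }
        if (n != $3 + 0) {
            printf "Group %s: Branches says %s, found %d Ids\n", $1, $3, n
            bad = 1
        }
    }
    END { exit (bad ? 1 : 0) }               # non-zero status if any line disagrees
    ' "$1"
}
```

Something like `check_branches g.txt || echo "g.txt needs fixing"` would report every group whose Id count differs from its Branches value.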

I have two files, which I combined into one g.txt (using a comma separator for Id_of_Branches). The two files are:
Code:
First file:

Group: 0 Number of Branches: 1
0    001
Centre: 001 Branches: 1
Group: 1 Number of Branches: 1
0    002
Centre: 002 Branches: 1
Group: 2 Number of Branches: 1
0    003
Centre: 003 Branches: 1
Group: 3 Number of Branches: 6
0    009
1    004
2    008
3    007
4    005
5    006
Centre: 006 Branches: 6
Group: 4 Number of Branches: 2
0    010
1    011
Centre: 010 Branches: 2
Group: 5 Number of Branches: 2
0    012
1    013
Centre: 012 Branches: 2
... (continues for more than 600 groups)


Second file:

Group No:
 10        Centre: 052 Branches: 31                   
 73        Centre: 184 Branches: 25                   
397        Centre: 607 Branches: 23                   
398        Centre: 640 Branches: 22                   
 86        Centre: 245 Branches: 19                   
 71        Centre: 167 Branches: 12                   
 78        Centre: 220 Branches: 11                  
 18        Centre: 084 Branches: 10                   
 09        Centre: 022 Branches: 10                   
400        Centre: 650 Branches: 9

I wish to have 10 files for the 10 groups (as listed in the second file), each containing the patterns corresponding to the Id_of_Branches (from the first file) of that group.
Centre.txt (only one file) is supposed to contain the patterns corresponding to the Centre Id of each group.

Thanks.

Last edited by bioinfo; 01-04-2013 at 11:27 AM.
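If g.txt is to be generated from the two files shown in post #4 rather than assembled by hand, a sketch along these lines could build it. Assumptions: the group numbers are the first field of each data line in the second file (so "09" and "9" are treated alike), and the Centre Id and member Ids are taken from the first file, not from the Centre: column of the second file, which post #7 notes does not always agree. The name build_g is hypothetical, and the Ids are written comma-separated without spaces.

```shell
# build_g FIRST_FILE SECOND_FILE
# Hypothetical helper. Prints a g.txt-style table for the groups whose numbers
# appear as the first field of SECOND_FILE, using the member Ids and the
# Centre line of each group in FIRST_FILE.
build_g() {
    awk '
    BEGIN { print "Group Centre Branches Id_of_Branches" }
    # Pass 1 (SECOND_FILE): remember wanted group numbers; "09" and "9" match.
    FNR == NR { if ($1 ~ /^[0-9]+$/) want[$1 + 0] = 1; next }
    # Pass 2 (FIRST_FILE): "Group: N Number of Branches: K" starts a group.
    $1 == "Group:" { grp = $2; ids = ""; next }
    # "Centre: C Branches: K" ends it; print only the wanted groups.
    $1 == "Centre:" {
        g = grp + 0
        if (g in want) print grp, $2, $4, ids
        next
    }
    # Member lines look like "index Id"; collect the Id column.
    NF == 2 { ids = (ids == "" ? $2 : ids "," $2) }
    ' "$2" "$1"
}
```

For example, `build_g first.txt second.txt > g.txt` would produce one line per selected group, such as `3 006 6 009,004,008,007,005,006` for Group 3 of the first file.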
# 5  
01-04-2013
Rather than reading shell books, consider some awk tutorials. A lot of bioinformatics folks come here for help, and 95% of their problems are resolved by awk. awk is a language in its own right.

This is a great resource. Gawk is GNU awk, which is very probably what you get when you type awk at the command line.
It has examples and explains the bizarre syntax and the program structure:

http://www.gnu.org/software/gawk/manual/gawk.pdf
# 6  
01-04-2013
Thanks, Jim Mcnamara.
It's great.
# 7  
01-05-2013
As much as I want to help, I am sorry to say I can't. Thank you for the effort of explaining your input in detail, but post #4 does not relate to post #1 by any means. For example, group No. 10 is centred at 052 here and at 051 there and has 31 branches here and 30 there; some groups show up here but not there and vice versa; and groups in file2 are not represented in file1.
On top of that, I still can't see what pattern to fill in (see my post #2), where to get it, or based on what rule, even if I take g.txt to be a distilled version of file1 and file2.
It would be helpful if you posted a minimal set of input files (e.g. atoms.txt and g.txt) with interrelated data, an output file, and a set of understandable rules for getting from one to the other.

Last edited by RudiC; 01-05-2013 at 10:21 AM.