Data manipulating script. Please HELP!


 
# 1  
Old 12-07-2014

Dear friends,

I'm struggling to prepare a batch of GROMACS input files by hand. It is very time-consuming, and I suspect a short script could do it automatically, but I lack basic scripting knowledge. Please help!

My original input looks like this (only the first 10 lines are shown; the real input file has a few hundred thousand lines):
Code:
ATOM      1  C1  C20 P   1       1.000   1.540   0.000  1.00  0.00           C    
ATOM      2  C2  C20 P   1       2.456   2.041   0.000  1.00  0.00           C    
ATOM      7  C1  C20 P   1       2.456   3.581   0.000  1.00  0.00           C    
ATOM      8  C2  C20 P   1       3.912   4.083  -0.000  1.00  0.00           C    
ATOM     13  C1  C20 P   1       3.912   5.623  -0.000  1.00  0.00           C    
ATOM     14  C2  C20 P   1       5.368   6.124  -0.000  1.00  0.00           C    
ATOM     19  C1  C20 P   1       5.368   7.664  -0.000  1.00  0.00           C    
ATOM     20  C2  C20 P   1       6.824   8.165  -0.000  1.00  0.00           C    
ATOM     25  C1  C20 P   1       6.824   9.705  -0.000  1.00  0.00           C    
ATOM     26  C2  C20 P   1       8.280  10.207  -0.000  1.00  0.00           C

The output file needs to be in the following format:
Code:
ATOM      1  CM  ETH P   1       1.000   1.540   0.000  1.00  0.00           C  
ATOM      2  CD  ETH P   1       2.456   2.041   0.000  1.00  0.00           C  
ATOM      3  CM  ETH P   2       2.456   3.581   0.000  1.00  0.00           C  
ATOM      4  CD  ETH P   2       3.912   4.083  -0.000  1.00  0.00           C  
ATOM      5  CM  ETH P   3       3.912   5.623  -0.000  1.00  0.00           C  
ATOM      6  CD  ETH P   3       5.368   6.124  -0.000  1.00  0.00           C  
ATOM      7  CM  ETH P   4       5.368   7.664  -0.000  1.00  0.00           C  
ATOM      8  CD  ETH P   4       6.824   8.165  -0.000  1.00  0.00           C  
ATOM      9  CM  ETH P   5       6.824   9.705  -0.000  1.00  0.00           C  
ATOM     10  CD  ETH P   5       8.280  10.207  -0.000  1.00  0.00           C 
TER 11 
CONECT    1    
CONECT    2    1    3
CONECT    3    2    4
CONECT    4    3    5
CONECT    5    4    6
CONECT    6    5    7
CONECT    7    6    8
CONECT    8    7    9
CONECT    9    8   10
CONECT   10    9

To be clear, the script should do the following:
1. Each line has 12 columns; keep $1, $5, and $7-$12 unchanged.
2. Ignore the original data in $2 and fill it with the running line number.
3. Ignore the original data in $3 and fill it with CM and CD alternately (or simply replace C1 with CM and C2 with CD).
4. Ignore the original data in $4 and fill it with "ETH" (or simply replace C20 with "ETH").
5. Ignore the original data in $6 and fill it with the series 1 1 2 2 3 3 ..., up to (number of lines)/2.
6. At the end of the output, add a new line starting with "TER LineNum+1".
7. The following lines should start with CONECT, followed by up to three columns of numbers running 1-10, 1-9, and 3-10, as shown in the sample output above.


I hope I have explained clearly what the script should do. Any help will be greatly appreciated. Thank you in advance!

ZHEN
from Shanghai, China.

Last edited by liuzhencc; 12-07-2014 at 10:52 AM..
# 2  
Old 12-07-2014
OK Zhen. That seems straightforward. Do you need the formatting preserved or would this do?
Code:
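awk '
  END {
    # After the last atom: a TER record, then one CONECT line per atom
    # listing its previous and next neighbour (x is unset, so it prints empty).
    print "TER",NR+1
    for(i=1; i<=NR; i++) print "CONECT", i, (i==1?x:i-1),(i==NR?x:i+1)
  }

  {
    $2=NR                 # running atom number
    sub(1,"M",$3)         # C1 -> CM
    sub(2,"D",$3)         # C2 -> CD
    $4="ETH"              # residue name
    $6=int((NR+1)/2)      # residue number: 1 1 2 2 3 3 ...
  }
  1
' OFS='\t' file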
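
If the fixed-width layout does need to be preserved, one option is to edit by column position instead of by field, since assigning to $2, $3 and so on makes awk rebuild the line with OFS. A minimal sketch, assuming every line uses exactly the column positions of the sample in the first post (serial number in columns 7-11, atom name in 14-15, residue name in 18-20, residue number in 23-26). The function name `renumber_fixed` is made up for illustration, and the CONECT lines follow the previous/next neighbour logic of the script above, with width-5 columns as in the requested sample:

```shell
# Hypothetical helper: renumber a fixed-width PDB-style file (stdin or file
# arguments) while copying the untouched columns through unchanged.
renumber_fixed() {
  awk '
    {
      name = substr($0, 14, 2)        # atom name, e.g. C1 or C2
      sub(/1/, "M", name)             # C1 -> CM
      sub(/2/, "D", name)             # C2 -> CD
      # Columns 1-6 kept, 7-11 new serial, 12-13 kept, 14-15 new name,
      # 16-17 kept, 18-20 ETH, 21-22 kept, 23-26 residue number, 27-end kept.
      printf "%s%5d%s%s%sETH%s%4d%s\n", substr($0, 1, 6), NR, substr($0, 12, 2), name, substr($0, 16, 2), substr($0, 21, 2), int((NR + 1) / 2), substr($0, 27)
    }
    END {
      print "TER", NR + 1
      for (i = 1; i <= NR; i++) {
        printf "CONECT%5d", i
        if (i > 1)  printf "%5d", i - 1   # previous atom, if there is one
        if (i < NR) printf "%5d", i + 1   # next atom, if there is one
        printf "\n"
      }
    }
  ' "$@"
}
```

It can be run as `renumber_fixed file`. Note that `%5d` only holds serial numbers up to 99999; in a file with a few hundred thousand atoms the larger numbers would spill over and shift the rest of the line, so the widths would need adjusting there.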

These 2 Users Gave Thanks to Scrutinizer For This Post:
# 3  
Old 12-08-2014
Thank you very much, Scrutinizer! It works like a charm! As you mentioned, the output would look much tidier if each column were printed with a fixed format. Could you please extend the script to produce output in the following format?

Code:
ATOM      10       CD      ETH         P        5        8.280      10.207    -0.000     1.00      0.00       C 
(%10s)  (%10d) (%10s) (%10s) (%10s) (%10d) (%10.3f) (%10.3f) (%10.3f) (%5.2f) (%5.2f)  (%10s)

and the newly added lines in the following format:
Code:
CONECT	999	998	1000
CONECT	1000	999	1001
CONECT	1001	1000	1002
(%10s)           (%10d)  (%10d)  (%10d)

# 4  
Old 12-08-2014
Hi,
just replace the last
Code:
1

with
Code:
{printf "%10s%10d%10s%10s%10s%10d%10.3f%10.3f%10.3f%5.2f%5.2f%10s\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}

and remove
Code:
OFS='\t'
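
The printf above only covers the ATOM lines; once OFS='\t' is removed, the lines printed in the END block fall back to single spaces. A minimal sketch of the whole script with those lines also printed in width-10 columns, as requested in post #3. The wrapper name `format_atoms` is made up for illustration, the record is spelled CONECT as in the first post, and the residue number is assumed to belong in $6 (which is what the `%10d` for $6 in the printf expects):

```shell
# Hypothetical wrapper around the script from this thread: reads stdin or
# file arguments and prints every record in fixed-width columns.
format_atoms() {
  awk '
    {
      $2 = NR                        # running atom number
      sub(/1/, "M", $3)              # C1 -> CM
      sub(/2/, "D", $3)              # C2 -> CD
      $4 = "ETH"                     # residue name
      $6 = int((NR + 1) / 2)         # residue number: 1 1 2 2 3 3 ...
      printf "%10s%10d%10s%10s%10s%10d%10.3f%10.3f%10.3f%5.2f%5.2f%10s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12
    }
    END {
      print "TER", NR + 1
      for (i = 1; i <= NR; i++) {
        printf "%10s%10d", "CONECT", i
        # Print a neighbour only when it exists: an empty value passed
        # to %10d would be printed as 0.
        if (i > 1)  printf "%10d", i - 1
        if (i < NR) printf "%10d", i + 1
        printf "\n"
      }
    }
  ' "$@"
}
```

Called as `format_atoms file`, this gives one 110-character line per atom, followed by the TER line and the CONECT lines.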


Last edited by Scrutinizer; 12-08-2014 at 02:55 AM..
This User Gave Thanks to Scrutinizer For This Post: