Remove a block of Text at regular intervals


 
# 1  
Old 11-02-2010

Hello all,

I have text files that consist of blocks of text. Each block represents a set of Cartesian coordinates for a molecule. Each block starts with a line that contains only a number, which is the total number of atoms in the molecule. The next line reads "Frame #", where # is the number of the frame, i.e. 1, 2, 3... After this line come the Cartesian coordinates, one set (x, y, z) per line. There are as many coordinate lines as there are atoms in the molecule.

Here is a sample:

Code:
14
Frame 1
Ir 0.4482 -1.2980 -0.2902
P 1.8759 -2.1654 1.4038
P -1.2305 -0.8418 -1.9134
H -2.5605 -0.7775 -1.4067
H -1.3820 -1.8515 -2.9058
H -1.1987 0.3321 -2.7223
H 2.5359 -3.3920 1.1065
H 1.2161 -2.5072 2.6182
H 2.9669 -1.3899 1.8960
C 1.3685 0.2571 -0.5341
O 1.9671 1.2795 -0.7004
Cl -0.8142 -3.4101 0.0318
H -0.8380 -0.5636 2.1141
H -1.0869 -0.4380 2.8141
14
Frame 2
Ir 0.4490 -1.2978 -0.2903
P 1.8738 -2.1613 1.4076
P -1.2367 -0.8359 -1.9047
H -2.5634 -0.7722 -1.3897
H -1.3955 -1.8409 -2.9008
H -1.2083 0.3415 -2.7087
H 2.5186 -3.3989 1.1226
H 1.2155 -2.4815 2.6289
H 2.9753 -1.3923 1.8863
C 1.3731 0.2542 -0.5394
O 1.9771 1.2723 -0.7122
Cl -0.8091 -3.4129 0.0288
H -0.9491 -0.6267 2.0929
H -1.2309 -0.4996 2.7795
.
.
.
14
Frame 72
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
H -2.2321 -1.8577 -1.8841
H -1.2106 -0.1808 -2.8791
H 2.8807 -2.5381 1.3446
H 0.9971 -3.2308 2.2364
H 1.8734 -1.3013 2.8541
C 1.7346 -0.5673 -1.1992
O 2.6174 -0.1436 -1.8621
Cl 0.2051 -3.5805 -0.9057
H 0.1695 0.2982 0.6158
H -0.9713 -1.5095 0.9427

I would like to be able to remove every other frame, where a frame is a block of text that starts with the number of atoms in the molecule, followed by the "Frame #" line, followed by the coordinates.

The trick is that not every file has the same number of atoms, so the script has to have a variable such that the user can tell it how many atoms are in the coordinates section and then it should remove every other block of data.

So I am trying to remove Frames 2, 4, 6, 8 and so on.

Thus the above sample would become:
Code:
14
Frame 1
Ir 0.4482 -1.2980 -0.2902
P 1.8759 -2.1654 1.4038
P -1.2305 -0.8418 -1.9134
H -2.5605 -0.7775 -1.4067
H -1.3820 -1.8515 -2.9058
H -1.1987 0.3321 -2.7223
H 2.5359 -3.3920 1.1065
H 1.2161 -2.5072 2.6182
H 2.9669 -1.3899 1.8960
C 1.3685 0.2571 -0.5341
O 1.9671 1.2795 -0.7004
Cl -0.8142 -3.4101 0.0318
H -0.8380 -0.5636 2.1141
H -1.0869 -0.4380 2.8141
.
.
.
Frame 71
Ir 0.2798 -1.1422 -0.0743
P 1.5833 -2.0636 1.6848
P -1.4955 -0.6824 -1.5761
H -2.5237 0.2149 -1.1727
H -2.2322 -1.8577 -1.8839
H -1.2099 -0.1819 -2.8797
H 2.8810 -2.5379 1.3438
H 0.9977 -3.2312 2.2359
H 1.8739 -1.3017 2.8539
C 1.7345 -0.5673 -1.1991
O 2.6175 -0.1439 -1.8621
Cl 0.2052 -3.5805 -0.9056
H 0.1693 0.2983 0.6159
H -0.9714 -1.5096 0.9427

Also, if possible, I would like to be able to remove every 3rd frame as well.

i.e. remove frame 3, 6, 9, 12, 15... and so on.
# 2  
Old 11-02-2010
To print the blocks 2, 4, 6 and so on:

Code:
awk 'END { 
  if (!(rc % v) && rc != prc)
    print r 
  }
NF == 1 { 
  if (!(rc % v)) {
    print r; prc = rc
    }
  r = x; rc ++
  }
{ r = r ? r RS $0 : $0 }
  ' v=2 infile

To print 3, 6, 9 and so on, set v to 3: ... v=3 infile
This User Gave Thanks to radoulov For This Post:
# 3  
Old 11-02-2010
Try:
Code:
awk '/^[0-9]+$/{a=$1;p=0} $1=="Frame"&&$2%m{print a;p=1}p' m=3 infile

m=2,3,4 etc...
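
The one-liner above keys on the number in the "Frame" line. An alternative is to rely only on the block size, as the original question suggests (the user supplies the atom count). The following is a minimal sketch under that assumption; the function name drop_every and its argument order are made up for illustration, and it assumes every block is exactly natoms + 2 lines long with no blank lines in between.

```shell
# Hypothetical helper: drop every m-th frame, given the atom count.
# usage: drop_every NATOMS M FILE
# Each block is assumed to be NATOMS + 2 lines:
# the count line, the "Frame N" line, then NATOMS coordinate lines.
drop_every() {
  awk -v natoms="$1" -v m="$2" '
    BEGIN { size = natoms + 2 }
    {
      block = int((NR - 1) / size) + 1   # 1-based index of the current block
      if (block % m != 0) print          # keep blocks whose index is not a multiple of m
    }' "$3"
}

# Example (file names are placeholders):
#   drop_every 14 2 infile > thinned
```

Because it counts lines rather than matching text, this version does not care what the "Frame" line says, but it will go out of step if a block has a different length than expected.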
This User Gave Thanks to Scrutinizer For This Post:
# 4  
Old 11-02-2010
By setting "Frame" as the record separator, I have to get rid of the leading "14" line and adjust the record offset: the first record is just that leading line, so frame N is record N+1. This is why the condition uses NR-1 instead of NR.

Code:
awk -F"\n" 'BEGIN {RS="Frame" ;ORS="Frame";printf "%s","Frame"} ; ((NR-1)%2!=0)&&((NR-1)%3!=0)' infile

or
Code:
awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",RS};((NR-1)%2!=0)&&((NR-1)%3!=0)' infile

Code:
# cat infile
14
Frame 1
Ir 0.4482 -1.2980 -0.2902
H -0.8380 -0.5636 2.1141
...
H -1.0869 -0.4380 2.8141
14
Frame 2
Ir 0.4490 -1.2978 -0.2903
P 1.8738 -2.1613 1.4076
...
H -1.2309 -0.4996 2.7795
14
Frame 3
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 4
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 5
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 6
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 7
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 8
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 9
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 10
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 11
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
.
.
.
14
Frame 72
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
H -2.2321 -1.8577 -1.8841
H -1.2106 -0.1808 -2.8791
H 2.8807 -2.5381 1.3446
H 0.9971 -3.2308 2.2364
H 1.8734 -1.3013 2.8541
C 1.7346 -0.5673 -1.1992
O 2.6174 -0.1436 -1.8621
Cl 0.2051 -3.5805 -0.9057
H 0.1695 0.2982 0.6158
H -0.9713 -1.5095 0.9427
# awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",RS};((NR-1)%2!=0)&&((NR-1)%3!=0)' infile
Frame 1
Ir 0.4482 -1.2980 -0.2902
H -0.8380 -0.5636 2.1141
...
H -1.0869 -0.4380 2.8141
14
Frame 5
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
P -1.4956 -0.6824 -1.5757
H -2.5240 0.2145 -1.1716
14
Frame 7
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
14
Frame 11
Ir 0.2799 -1.1423 -0.0744
P 1.5830 -2.0634 1.6851
.
.
.
14
#


Last edited by ctsgnb; 11-02-2010 at 02:43 PM..
This User Gave Thanks to ctsgnb For This Post:
# 5  
Old 11-02-2010
Thanks!

This works exactly as you said, but is there a way to make it print blocks 1, 3, 5, 7 ... and 1, 4, 7, 9 ... instead?

Quote:
Originally Posted by radoulov
To print the blocks 2, 4, 6 and so on:

Code:
awk 'END { 
  if (!(rc % v) && rc != prc)
    print r 
  }
NF == 1 { 
  if (!(rc % v)) {
    print r; prc = rc
    }
  r = x; rc ++
  }
{ r = r ? r RS $0 : $0 }
  ' v=2 infile

To print 3, 6, 9 and so on, set v to 3: ... v=3 infile
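
For the follow-up question above, a minimal sketch that keeps every v-th block counting from the first one (blocks 1, 1+v, 1+2v, ...) might look like this. The function name keep_from_first is hypothetical, and, like the quoted script, it assumes that any line with exactly one field is an atom-count line that opens a new block.

```shell
# Hypothetical helper: keep blocks 1, 1+v, 1+2v, ... of the input.
# usage: keep_from_first V FILE
# Assumption: a line with exactly one field (the atom count) starts a block.
keep_from_first() {
  awk -v v="$1" '
    NF == 1 { n++ }              # count line: a new block begins
    n && (n - 1) % v == 0        # print lines belonging to blocks 1, 1+v, 1+2v, ...
  ' "$2"
}

# Example (file name is a placeholder):
#   keep_from_first 2 infile     # blocks 1, 3, 5, 7, ...
#   keep_from_first 3 infile     # blocks 1, 4, 7, 10, ...
```

If the aim is instead to drop every v-th block and keep all the others (1, 2, 4, 5, 7, ... for v=3), the second pattern can be changed to `n && n % v != 0`.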
---------- Post updated at 01:36 PM ---------- Previous update was at 01:29 PM ----------

Thanks for your reply, but this doesn't work.

It seems that this command adds blank lines in certain spots at the beginning of the file, but it does remove the correct blocks towards the end. Here are the first few blocks:

Code:
Frame 1
Rh -0.3793 -0.3188 -0.3617
C -1.4181 0.5185 -1.7491
P 1.8313 -0.3365 -1.4708
H 2.0493 0.2251 -2.7751
H 2.9159 0.3283 -0.8096
H 2.5143 -1.5807 -1.7041
P -1.0881 -2.6725 -0.6945
H -0.1078 -3.7030 -0.9106
H -1.7533 -3.3073 0.4056
H -2.0033 -3.0952 -1.7185
H 0.4760 -0.9847 0.8641
O -2.0655 1.0426 -2.5964
C -1.5647 0.4833 1.2803
C -0.4855 1.3783 1.0023
H -2.5722 0.7321 0.9603
H -1.5075 -0.1757 2.1410
H -0.6724 2.3044 0.4670
H 0.3854 1.3885 1.6488
18

Frame 2
Rh -0.3799 -0.3194 -0.3618
C -1.4163 0.5199 -1.7496
P 1.8296 -0.3352 -1.4732
H 2.0452 0.2282 -2.7771
H 2.9144 0.3296 -0.8125
H 2.5134 -1.5785 -1.7089
P -1.0872 -2.6741 -0.6913
H -0.1063 -3.7039 -0.9069
H -1.7499 -3.3073 0.4112
H -2.0037 -3.0998 -1.7128
H 0.4765 -0.9847 0.8635
O -2.0594 1.0478 -2.5979
C -1.5655 0.4814 1.2805
C -0.4871 1.3774 1.0028
H -2.5732 0.7295 0.9607
H -1.5077 -0.1779 2.1409
H -0.6748 2.3035 0.4679
H 0.3839 1.3880 1.6492

and here are the blocks at the end:

Code:
Frame 287
Rh -0.4360 -0.2696 -0.6220
C -1.1646 1.0045 -1.7330
P 1.7166 -0.5355 -1.8645
H 1.8842 0.0740 -3.1515
H 2.8434 0.0882 -1.2285
H 2.3789 -1.7764 -2.1787
P -0.6673 -2.6176 0.1277
H 0.3305 -3.6308 0.3709
H -1.4676 -2.8685 1.2952
H -1.4465 -3.3795 -0.8014
H -0.3658 -0.4659 2.3069
O -1.7130 1.6942 -2.5406
C -1.8697 0.1962 0.8475
C -0.9955 0.4085 2.0867
H -2.4370 1.0967 0.5998
H -2.5883 -0.6253 0.9828
H -0.3321 1.2698 1.9625
H -1.6058 0.5920 2.9832
18

Frame 289
Rh -0.4362 -0.2701 -0.6236
C -1.1618 1.0091 -1.7307
P 1.7142 -0.5392 -1.8686
H 1.8803 0.0675 -3.1571
H 2.8442 0.0827 -1.2365
H 2.3725 -1.7824 -2.1820
P -0.6607 -2.6166 0.1343
H 0.3410 -3.6262 0.3757
H -1.4536 -2.8644 1.3074
H -1.4437 -3.3837 -0.7873
H -0.3808 -0.4699 2.3130
O -1.7083 1.7048 -2.5345
C -1.8742 0.1934 0.8435
C -1.0063 0.4059 2.0871
H -2.4418 1.0934 0.5946
H -2.5926 -0.6289 0.9752
H -0.3391 1.2644 1.9635
H -1.6207 0.5944 2.9797
18
Frame 291
Rh -0.4365 -0.2705 -0.6252
C -1.1590 1.0137 -1.7283
P 1.7117 -0.5429 -1.8727
H 1.8763 0.0608 -3.1628
H 2.8450 0.0772 -1.2446
H 2.3659 -1.7885 -2.1852
P -0.6541 -2.6156 0.1410
H 0.3512 -3.6217 0.3813
H -1.4405 -2.8599 1.3191
H -1.4403 -3.3880 -0.7734
H -0.3944 -0.4731 2.3185
O -1.7034 1.7154 -2.5282
C -1.8788 0.1901 0.8395
C -1.0170 0.4034 2.0873
H -2.4472 1.0893 0.5894
H -2.5964 -0.6335 0.9678
H -0.3474 1.2601 1.9648
H -1.6358 0.5956 2.9761

As you can see, it does remove every second block at the end but it also adds in spaces for some reason...
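
The thread does not establish why the blank lines appear. One thing that may be worth ruling out (purely a guess, not something confirmed here) is hidden characters in the input, such as carriage returns from a DOS-formatted file, blank lines, or padding around the atom count, since the one-liner only treats a line as a count line if it matches /^[0-9]+$/ exactly. A small diagnostic sketch, with the hypothetical name check_headers:

```shell
# Hypothetical diagnostic: count lines that could confuse a strict
# /^[0-9]+$/ test on the atom-count line.
# usage: check_headers FILE
check_headers() {
  awk '
    /\r$/                                        { cr++ }      # ends in a carriage return
    /^[ \t]*$/                                   { blank++ }   # empty or whitespace-only
    /^[ \t]+[0-9]+[ \t]*$/ || /^[0-9]+[ \t]+$/   { padded++ }  # count line with padding
    END { printf "cr=%d blank=%d padded=%d\n", cr, blank, padded }
  ' "$1"
}

# Example (file name is a placeholder):
#   check_headers infile
```

All three counters being zero would suggest the cause lies elsewhere, for example in the awk implementation in use.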


Quote:
Originally Posted by Scrutinizer
Try:
Code:
awk '/^[0-9]+$/{a=$1;p=0} $1=="Frame"&&$2%m{print a;p=1}p' m=3 infile

m=2,3,4 etc...
---------- Post updated at 01:39 PM ---------- Previous update was at 01:36 PM ----------

Quote:
Originally Posted by ctsgnb
Code:
awk -F"\n" 'BEGIN {RS="Frame" ;ORS="Frame";printf "%s","Frame"} ; ((NR-1)%2!=0)&&((NR-1)%3!=0)' infile

Thanks for your suggestion. This does remove blocks of text but it seems it is removing 2 or 3 or 4 blocks rather than removing every second block.

Here is a sample:
Code:
Frame 1
Rh -0.3793 -0.3188 -0.3617
C -1.4181 0.5185 -1.7491
P 1.8313 -0.3365 -1.4708
H 2.0493 0.2251 -2.7751
H 2.9159 0.3283 -0.8096
H 2.5143 -1.5807 -1.7041
P -1.0881 -2.6725 -0.6945
H -0.1078 -3.7030 -0.9106
H -1.7533 -3.3073 0.4056
H -2.0033 -3.0952 -1.7185
H 0.4760 -0.9847 0.8641
O -2.0655 1.0426 -2.5964
C -1.5647 0.4833 1.2803
C -0.4855 1.3783 1.0023
H -2.5722 0.7321 0.9603
H -1.5075 -0.1757 2.1410
H -0.6724 2.3044 0.4670
H 0.3854 1.3885 1.6488
18
Frame 5
Rh -0.3814 -0.3213 -0.3621
C -1.4104 0.5238 -1.7517
P 1.8239 -0.3291 -1.4809
H 2.0315 0.2434 -2.7820
H 2.9094 0.3353 -0.8209
H 2.5109 -1.5685 -1.7278
P -1.0864 -2.6784 -0.6804
H -0.1049 -3.7078 -0.8957
H -1.7414 -3.3064 0.4296
H -2.0081 -3.1123 -1.6940
H 0.4791 -0.9850 0.8613
O -2.0384 1.0622 -2.6046
C -1.5674 0.4749 1.2816
C -0.4913 1.3740 1.0054
H -2.5758 0.7210 0.9626
H -1.5076 -0.1858 2.1409
H -0.6814 2.3004 0.4719
H 0.3800 1.3854 1.6513
18
Frame 7
Rh -0.3823 -0.3227 -0.3624
C -1.4064 0.5263 -1.7531
P 1.8201 -0.3247 -1.4859
H 2.0225 0.2538 -2.7852
H 2.9058 0.3396 -0.8263
H 2.5096 -1.5612 -1.7402
P -1.0862 -2.6812 -0.6731
H -0.1046 -3.7105 -0.8884
H -1.7358 -3.3054 0.4422
H -2.0117 -3.1204 -1.6809
H 0.4809 -0.9852 0.8597
O -2.0243 1.0716 -2.6091
C -1.5686 0.4703 1.2824
C -0.4940 1.3715 1.0071
H -2.5775 0.7151 0.9639
H -1.5074 -0.1914 2.1408
H -0.6858 2.2982 0.4746
H 0.3775 1.3835 1.6527
18
Frame 11
Rh -0.3843 -0.3255 -0.3630
C -1.3984 0.5314 -1.7560
P 1.8126 -0.3158 -1.4960
H 2.0050 0.2739 -2.7918
H 2.8983 0.3492 -0.8369
H 2.5076 -1.5464 -1.7639
P -1.0859 -2.6867 -0.6582
H -0.1046 -3.7161 -0.8740
H -1.7243 -3.3027 0.4680
H -2.0199 -3.1361 -1.6538
H 0.4848 -0.9857 0.8561
O -1.9959 1.0901 -2.6179
C -1.5709 0.4610 1.2839
C -0.4995 1.3664 1.0108
H -2.5808 0.7031 0.9666
H -1.5070 -0.2029 2.1405
H -0.6944 2.2936 0.4804
H 0.3725 1.3793 1.6557
.
.
.

# 6  
Old 11-02-2010
For removing every second block only (2,4,6,8, ...) :
Code:
awk -F"\n" -vRS="Frame" -vORS="Frame" 'BEGIN{printf "%s",RS};((NR-1)%2!=0)' infile

But you initially stated :
Quote:
Also, if possible, I would like also to be able to remove every 3rd frame as well.

i.e. remove frame 3, 6, 9, 12, 15... and so on
That is why I also included the code to remove blocks 3, 6, 9, 12, 15 ...
# 7  
Old 11-02-2010
Hi, on my system my suggestion prints:
m=2: 1, 3, 5, 7 ... so 2, 4, 6, ... are deleted
m=3: 1, 2, 4, 5, 7 ... so 3, 6, 9, ... are deleted
etc.
Does it not do that on your system? What platform are you on?
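
Since much of the discussion above is about which frames actually survive, a quick way to check any of the commands is to list the frame numbers left in its output. This is only a sketch; list_frames is a made-up helper name, and it assumes the frame number is the second field of each "Frame" line.

```shell
# Hypothetical helper: print the frame numbers found in the input,
# comma-separated on a single line. Reads the named files, or standard
# input when no file is given.
list_frames() {
  awk '$1 == "Frame" { printf "%s%s", sep, $2; sep = "," }
       END { print "" }' "$@"
}

# Example (infile is a placeholder), checking the m=2 one-liner from post 3:
#   awk '/^[0-9]+$/{a=$1;p=0} $1=="Frame"&&$2%m{print a;p=1}p' m=2 infile | list_frames
```

Comparing this list before and after thinning makes it easy to see whether every second or every third frame was really removed.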