selecting and deleting specific lines with condition


 
# 1  
Old 09-08-2011

I have a set of data as below:

Quote:
HBOND SUMMARY
output to file HB_lowLyo_D_lipid_A_water_001_064.tbl,
data was sorted, intra-residue interactions are NOT included,
Distance cutoff is 4.00 angstroms, angle cutoff is 120.00 degrees
Hydrogen bond information dumped for occupancies > 0.00

DONOR ACCEPTORH ACCEPTOR
atom# res@atom atom# res@atom atom# res@atom %occupied distance angle
| 4645 58@O12 | 23489 1174@H1 23488 1174@O | 22.79 2.945 ( 0.28) 26.79 (14.41)
| 4645 58@O12 | 23490 1174@H2 23488 1174@O | 22.49 2.965 ( 0.31) 28.01 (14.47)
| 2701 34@O12 | 23333 1122@H1 23332 1122@O | 20.60 2.965 ( 0.23) 30.07 (14.18)
| 2701 34@O12 | 23334 1122@H2 23332 1122@O | 19.74 2.963 ( 0.23) 31.43 (13.88)
| 271 4@O12 | 23334 1122@H2 23332 1122@O | 19.70 2.825 ( 0.19) 21.92 (12.15)
| 271 4@O12 | 23333 1122@H1 23332 1122@O | 19.55 2.826 ( 0.19) 22.22 (12.71)
| 4655 58@O16 | 21156 396@H2 21154 396@O | 19.43 2.933 ( 0.22) 31.95 (15.18)
| 4658 58@O15 | 21156 396@H2 21154 396@O | 18.96 3.163 ( 0.27) 37.03 (14.63)
| 4310 54@O26 | 23202 1078@H2 23200 1078@O | 18.73 2.821 ( 0.24) 25.87 (13.92)
| 4655 58@O16 | 21155 396@H1 21154 396@O | 18.63 2.917 ( 0.22) 31.91 (15.00)
| 1820 23@O16 | 21167 400@H1 21166 400@O | 18.14 2.910 ( 0.22) 27.20 (13.87)
| 1820 23@O16 | 21168 400@H2 21166 400@O | 17.96 2.907 ( 0.21) 26.69 (13.86)
| 3845 48@O16 | 23454 1162@H2 23452 1162@O | 17.68 2.991 ( 0.31) 28.45 (14.88)
| 4658 58@O15 | 21155 396@H1 21154 396@O | 17.31 3.177 ( 0.27) 38.82 (14.69)
| 3845 48@O16 | 23453 1162@H1 23452 1162@O | 17.29 3.016 ( 0.32) 28.84 (14.57)
| 1489 19@O13 | 23201 1078@H1 23200 1078@O | 16.66 2.884 ( 0.23) 31.39 (15.56)
| 3824 48@O26 | 21099 377@H2 21097 377@O | 15.44 2.992 ( 0.30) 30.78 (15.01)
| 4253 53@O15 | 23454 1162@H2 23452 1162@O | 14.98 2.961 ( 0.27) 33.71 (15.09)
| 1459 19@O22 | 23201 1078@H1 23200 1078@O | 14.84 3.012 ( 0.33) 35.08 (16.12)
| 1081 14@O12 | 21173 402@H1 21172 402@O | 14.76 2.937 ( 0.24) 27.54 (14.26)
| 4253 53@O15 | 23453 1162@H1 23452 1162@O | 14.63 2.955 ( 0.25) 33.68 (15.11)
| 1081 14@O12 | 21174 402@H2 21172 402@O | 14.41 2.944 ( 0.25) 28.34 (14.35)
| 3824 48@O26 | 21098 377@H1 21097 377@O | 13.70 3.002 ( 0.30) 31.00 (15.21)
| 3845 48@O16 | 21156 396@H2 21154 396@O | 13.06 2.934 ( 0.26) 27.71 (14.05)
.
.
.
few thousand lines
The first field ($1) is "|".
The 3rd field ($3) and the 6th field ($6) in my data file have the form "number@atom". The number before the "@" is the molecule number, and the molecules are arranged as below:

HTML Code:
   1    2   3   4   5   6   7       8

   9    10  11  12  13  14  15      16
  17    18  19  20  21  22  23      24
  25    26  27  28  29  30  31      32
  33    34  35  36  37  38  39      40
  41    42  43  44  45  46  47      48
  49    50  51  52  53  54  55      56

  57    58  59  60  61  62  63      64 
Any pair made from the above numbers can appear as the pair of numbers in the 3rd and 6th fields of a line in the data file.

What I want is to select the pairs in the data file that are made only of numbers from the outermost rows and columns of the above number-molecule arrangement.

In short, ANY PAIR made only of the numbers
HTML Code:
 (1 2 3 4 5 6 7 8   57 58 59 60 61 62 63 64   9 17 25 33 41 49 57   8 16 24 32 40 48 56 64)

in other words

1 , 2
1 , 3
1 , 4
.
.
1 , 57
1 , 58
1 , 59
.
.
.
2, 1
2, 3
2, 4
2, 5
.
.
.
2, 57
2, 58
2, 59
.
.
.
needs to be deleted from the data file.

To achieve this I tried to write the awk script below. As a test, it should print the lines that I want to delete, but it fails to select those lines.

Code:
 #!/usr/bin/awk -f

 BEGIN  {
   i=0
   for (n=1; n<=8; n++) set[i++] = n;
   for (n=57; n<=64; n++) set[i++] = n;
   for (n=9; n<=49; n+=8) {set[i++] = n; set[i++] = n+7};
    }


 ($1== "|") {
     split($3, res1, "@"); split($6, res2, "@"); #print res1[1], res2[1]

     if ( (res1[1] in set) == (res2[1] in set) ); 

     {
       print;
      }

 }

Can I get any help with this?

Thanks in advance
# 2  
Old 09-08-2011
Generally, "delete" means making a new file without the unwanted lines. I am a bit confused about the objective. It may be a multi-pass job: collect information, rearrange it to decide what to do, and then apply those results to the original. Where did it get hard?
# 3  
Old 09-08-2011
I'm afraid it's hard to understand your problem. Could you show which rows you expect to be kept or deleted (and, once more, why?) for this test set:
Code:
awk '/^\|/ {print $3, $6}' INPUTFILE
58@O12 1174@H1
58@O12 1174@H2
34@O12 1122@H1
34@O12 1122@H2
4@O12 1122@H2
4@O12 1122@H1
58@O16 396@H2
58@O15 396@H2
54@O26 1078@H2
58@O16 396@H1
23@O16 400@H1
23@O16 400@H2
48@O16 1162@H2
58@O15 396@H1
48@O16 1162@H1
19@O13 1078@H1
48@O26 377@H2
53@O15 1162@H2
19@O22 1078@H1
14@O12 402@H1
53@O15 1162@H1
14@O12 402@H2
48@O26 377@H1
48@O16 396@H2

# 4  
Old 09-08-2011
Here is another set of data, for clarity.
Quote:

DONOR ACCEPTORH ACCEPTOR
atom# res@atom atom# res@atom atom# res@atom %occupied distance angle
| 4726 59@O12 | 1487 19@H12 1486 19@O12 | 85.66 2.819 ( 0.18) 21.85 (12.11)
| 1499 19@O15 | 1730 24@H12 1729 24@O12 | 83.15 3.190 ( 0.31) 22.36 (12.73)
| 1216 16@O22 | 1460 17@H22 1459 17@O22 | 75.74 2.757 ( 0.14) 24.55 (13.66)
| 4232 53@O25 | 4143 52@H24 4142 52@O24 | 74.35 2.916 ( 0.25) 28.27 (13.26)
| 3683 46@O16 | 4163 52@H13 4162 52@O13 | 73.78 2.963 ( 0.29) 23.65 (14.14)
| 4162 52@O13 | 4079 51@H12 4078 51@O12 | 73.68 2.841 ( 0.19) 21.25 (11.87)
| 3764 47@O16 | 3825 48@H26 3824 48@O26 | 70.52 2.973 ( 0.28) 26.88 (13.14)
| 193 3@O13 | 353 5@H12 352 5@O12 | 67.49 2.780 ( 0.17) 17.85 (10.90)
| 3035 38@O16 | 3350 42@H12 3349 42@O12 | 67.19 2.790 ( 0.16) 18.72 (10.47)
| 686 9@O16 | 893 12@H22 892 12@O22 | 66.87 2.905 ( 0.22) 26.53 (10.90)
| 1478 19@O25 | 1703 22@H22 1702 22@O22 | 64.37 2.864 ( 0.21) 31.87 (14.12)
| 3521 44@O16 | 747 10@H26 746 10@O26 | 63.71 2.941 ( 0.27) 26.82 (13.51)
| 1313 17@O26 | 1217 16@H22 1216 16@O22 | 63.09 2.807 ( 0.16) 22.23 (11.92)
| 4159 52@O12 | 3684 46@H16 3683 46@O16 | 62.43 2.900 ( 0.22) 35.69 (12.23)
| 4331 54@O16 | 1490 19@H13 1489 19@O13 | 61.80 2.989 ( 0.29) 26.58 (14.32)
| 3440 43@O16 | 3906 49@H26 3905 49@O26 | 60.17 2.964 ( 0.28) 28.61 (13.24)
| 1334 17@O16 | 1247 16@H13 1246 16@O13 | 59.31 2.828 ( 0.18) 25.35 (12.61)
| 1729 22@O12 | 1557 20@H26 1556 20@O26 | 58.11 3.036 ( 0.27) 32.81 (11.84)
| 4151 52@O25 | 4484 56@H12 4483 56@O12 | 57.67 2.917 ( 0.32) 27.71 (15.02)
| 1502 19@O11 | 1730 22@H12 1729 22@O12 | 57.53 3.184 ( 0.26) 41.62 (13.24)
| 3014 38@O26 | 3353 42@H13 3352 42@O13 | 57.42 2.884 ( 0.24) 22.59 (12.87)
| 3524 44@O15 | 3917 49@H12 3916 49@O12 | 57.35 3.227 ( 0.35) 25.52 (13.61)
| 2390 30@O15 | 2756 35@H22 2755 35@O22 | 57.28 3.074 ( 0.33) 31.27 (14.44)
| 1739 22@O16 | 5115 64@H24 5114 64@O24 | 56.78 2.876 ( 0.28) 20.94 (13.42)
| 4574 57@O16 | 5061 63@H16 5060 63@O16 | 56.57 2.956 ( 0.25) 30.52 (14.00)
| 2846 36@O24 | 3566 45@H22 3565 45@O22 | 55.92 2.880 ( 0.24) 22.85 (12.39)
| 605 8@O16 | 839 11@H12 838 11@O12 | 55.67 2.894 ( 0.24) 25.45 (13.25)
In the first line, the residue number in field 3 ($3) is 59 and the residue number in field 6 ($6) is 19. According to the number-molecule arrangement, 59 is on the outermost lines and 19 is not. So this line should NOT be deleted.

In the second line, the number in field 3 ($3) is 19 and the number in field 6 ($6) is 24. Number 19 is not on the outermost lines but 24 is. This line should also not be deleted, since the two numbers are not BOTH on the outermost lines.

In the third line, the number in field 3 ($3) is 16 and the number in field 6 ($6) is 17. Since both numbers in this pair belong to the outermost numbers, this line should be deleted.

So a line should be deleted only when both of its numbers are on the outermost lines. This is what I need to achieve, and my code simply does not do what I want.

Thanks in advance.
# 5  
Old 09-09-2011
So, this is a negative join. Seems like his approach should be good: you need to save the outer numbers in an array, and then as you go through the lines, look them up and decide if you want to copy. You could use while read in ksh/bash and put @ in IFS to split that field into two. You could decide mathematically whether each number is in the left or right column: (( (N%8) < 2 )). The top row (1 to 8) and bottom row (57 to 64) still need a range test.

What about when field 6 and 8 do not match? No different?
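
The while-read idea above could look roughly like the sketch below. The function names is_border and keep_inner_pairs are made up for illustration, and the sketch assumes that every line starting with "|" has plain integers in front of the "@" signs. It also assumes that only numbers 1 to 64 belong to the arrangement, so a residue such as 1174 never counts as a border number.

```shell
# Hypothetical helper: succeeds if $1 is on the border of the 8x8 arrangement.
is_border() {
  n=$1
  # Only 1..64 are part of the arrangement (assumption); anything else is "inner".
  [ "$n" -ge 1 ] && [ "$n" -le 64 ] || return 1
  # Top row, bottom row, or left/right column (remainder 1 or 0).
  [ "$n" -le 8 ] || [ "$n" -ge 57 ] || [ $(( n % 8 )) -lt 2 ]
}

# Hypothetical filter: copies stdin to stdout, skipping "|" lines whose
# two residue numbers are both on the border.
keep_inner_pairs() {
  while IFS= read -r line; do
    # Split a copy of the line on spaces and "@": $3 and $7 become the numbers.
    old_ifs=$IFS; IFS=' @'; set -f
    set -- $line
    set +f; IFS=$old_ifs
    if [ "$1" = "|" ] && is_border "$3" && is_border "$7"; then
      continue    # both numbers on the border: drop the line
    fi
    printf '%s\n' "$line"
  done
}

# Demo on the first three data lines of the second example.
printf '%s\n' \
  '| 4726 59@O12 | 1487 19@H12 1486 19@O12 | 85.66 2.819 ( 0.18) 21.85 (12.11)' \
  '| 1499 19@O15 | 1730 24@H12 1729 24@O12 | 83.15 3.190 ( 0.31) 22.36 (12.73)' \
  '| 1216 16@O22 | 1460 17@H22 1459 17@O22 | 75.74 2.757 ( 0.14) 24.55 (13.66)' |
  keep_inner_pairs
```

Only the 16/17 line is dropped in the demo. For a file of a few thousand lines a shell loop like this is likely slower than awk, so treat it as an illustration of the idea rather than a recommendation.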

Last edited by DGPickett; 09-09-2011 at 06:02 PM..
# 6  
Old 09-09-2011
Seems to me that you could use modulus to simplify the tests.

Your number x (assumed to be less than or equal to 64?)

if x % 8 = 0 it's in the right hand column
if x % 8 = 1 it's in the left hand column

then you just have the ranges

2<=x<=7

and

58<=x<=63
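
As a rough sketch of the modulus test described above (the helper name is_outer is made up, and treating only 1 to 64 as valid is an assumption based on the arrangement shown in the first post):

```shell
# Hypothetical helper: prints 1 if x is on the border of the 8x8
# arrangement, 0 otherwise.
is_outer() {
  awk -v x="$1" 'BEGIN {
    r = x % 8
    inrange = (x >= 1 && x <= 64)          # assumption: only 1..64 are arranged
    # top row, bottom row, right column (r == 0), left column (r == 1)
    edge = (x <= 8 || x >= 57 || r == 0 || r == 1)
    print ((inrange && edge) ? 1 : 0)
  }'
}

for n in 3 16 17 19 24 59; do
  echo "$n -> $(is_outer "$n")"
done
```

Of the numbers tried here, only 19 prints 0, which agrees with the explanation in post # 4. Note that testing x <= 8 and x >= 57 covers the corner numbers too, so the narrower ranges 2..7 and 58..63 are not strictly needed.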

---------- Post updated at 05:31 PM ---------- Previous update was at 05:03 PM ----------

Ahhh...just realized, this test is wrong!

Code:
if ( (res1[1] in set) == (res2[1] in set) );

You can't test for the value of the array element this way, only that the subscript exists!
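
A small demonstration of the difference, as a sketch (the array names byindex and bykey are made up for this example):

```shell
out=$(awk 'BEGIN {
  # Original approach: the numbers are stored as values, the subscripts are a counter.
  i = 0
  for (n = 57; n <= 64; n++) byindex[i++] = n   # subscripts 0..7
  # Alternative: the number itself is the subscript.
  for (n = 57; n <= 64; n++) bykey[n] = n       # subscripts 57..64

  print "59 in byindex:", ((59 in byindex) ? "yes" : "no")
  print "59 in bykey:",   ((59 in bykey)   ? "yes" : "no")
  print "3 in byindex:",  ((3 in byindex)  ? "yes" : "no")
}')
printf '%s\n' "$out"
```

With the counter subscripts, 59 is reported as missing even though it was stored, and 3 is reported as present only because 3 happens to be one of the counter values.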

You could set up your array differently: instead of using set[i++], why not use the number itself as the subscript, set[n]? Then your test should work, as you would have elements as follows:

Code:
set[1] through set[8]
set[9], set[16]
set[17], set[24]
set[25], set[32]
set[33], set[40]
set[41], set[48]
set[49], set[56]
set[57] through set[64]

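Two other things I noticed: remove the semicolon after the test, and change "==" to "&&".

Code:
if ( (res1[1] in set) && (res2[1] in set) )

With the semicolon, the if controls an empty statement, so the print block that follows runs for every line. With "==", the test is also true when neither number is in the set. These were the biggest problems your script had, other than trying to use "in" to test the set values instead of the subscripts.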

This code worked for me.

Code:
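#!/usr/bin/awk -f

# Build the set of border numbers; the number itself is the subscript.
BEGIN {
    for (n = 1; n <= 8; n++) set[n] = n          # top row
    for (n = 9; n <= 49; n += 8) {
        set[n] = n                               # left column
        set[n+7] = n+7                           # right column
    }
    for (n = 57; n <= 64; n++) set[n] = n        # bottom row
}

# Data lines start with "|"; $3 and $6 look like "59@O12".
($1 == "|") {
    split($3, res1, "@"); split($6, res2, "@")
    if ( (res1[1] in set) && (res2[1] in set) )  # <--- no ';' here!
    {
        print
    }
}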

Quote:
# ./udc1.awk < udc1.txt
| 1216 16@O22 | 1460 17@H22 1459 17@O22 | 75.74 2.757 ( 0.14) 24.55 (13.66)
| 193 3@O13 | 353 5@H12 352 5@O12 | 67.49 2.780 ( 0.17) 17.85 (10.90)
| 1313 17@O26 | 1217 16@H22 1216 16@O22 | 63.09 2.807 ( 0.16) 22.23 (11.92)
| 1334 17@O16 | 1247 16@H13 1246 16@O13 | 59.31 2.828 ( 0.18) 25.35 (12.61)
| 4574 57@O16 | 5061 63@H16 5060 63@O16 | 56.57 2.956 ( 0.25) 30.52 (14.00)
udc1.txt was both your first and second examples put together in that order.

Last edited by rwuerth; 09-09-2011 at 07:27 PM..
# 7  
Old 09-11-2011
Dear sir,

Thanks so much for your kind reply. The code now works exactly as I need. I have one related question, though. I put "print" at the end of the code so that I could check that it selects exactly the lines I don't want. Now, if I want to delete those selected lines, what command should I use?
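
One possible way to do this, offered only as a sketch: keep the same test, but skip the matching lines with next and print everything else, so the output is a copy of the file without the unwanted pairs. The function name drop_outer_pairs and the file names in the usage note are made up for illustration.

```shell
# Hypothetical wrapper: prints its input (files or stdin) without the
# "|" lines whose two residue numbers are both on the border.
drop_outer_pairs() {
  awk '
    BEGIN {
      for (n = 1; n <= 8; n++) set[n] = 1                      # top row
      for (n = 9; n <= 49; n += 8) { set[n] = 1; set[n+7] = 1 } # left/right columns
      for (n = 57; n <= 64; n++) set[n] = 1                    # bottom row
    }
    $1 == "|" {
      split($3, res1, "@"); split($6, res2, "@")
      # Both numbers on the border: skip the line, which "deletes" it.
      if ((res1[1] in set) && (res2[1] in set)) next
    }
    { print }   # header lines and all other data lines pass through
  ' "$@"
}

# Demo on the first three data lines of the second example.
printf '%s\n' \
  '| 4726 59@O12 | 1487 19@H12 1486 19@O12 | 85.66 2.819 ( 0.18) 21.85 (12.11)' \
  '| 1499 19@O15 | 1730 24@H12 1729 24@O12 | 83.15 3.190 ( 0.31) 22.36 (12.73)' \
  '| 1216 16@O22 | 1460 17@H22 1459 17@O22 | 75.74 2.757 ( 0.14) 24.55 (13.66)' |
  drop_outer_pairs
```

The demo prints the first two lines and drops the 16/17 line. On a real file the call would be something like drop_outer_pairs data.tbl > data_filtered.tbl (assumed names). awk does not edit the file in place, so write to a new file and replace the original only after checking the result.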