Removing rows from a file


 
# 1  
Old 12-04-2009
Removing rows from a file

I have a file like the one below and want to use awk to solve this problem. The record separator is ">". I want to look at each record section enclosed between ">" lines and find the row whose 2nd and 3rd columns are both 0, such as

Code:
10 0  0

I need to take the first number of that row, which in this case is 10. Then I take the first number of each row in the same section and check whether its difference from 10 is greater than 40. If it is greater, the row is removed.

For example, we compute something like this:

10-10
13-10
16-10
19-10
22-10
25-10
28-10
31-10
34-10
37-10

If the value is greater than 40, we remove the row.

Code:
>
10 0 0
13 5.92346 5.92346
16 10.3106 10.3106
19 13.9672 13.9672
22 16.9838 16.9838
25 19.4407 19.4407
28 21.4705 21.4705
31 23.1547 23.1547
34 24.6813 24.6813
37 26.0695 26.0695
>
40 27.3611 27.3611
43 28.631 28.631
46 29.8366 29.8366
49 30.9858 30.9858
52 32.0934 32.0934
55 33.1458 33.1458
58 34.1637 34.1637
61 35.1297 35.1297
64 36.0253 36.0253
67 36.9248 36.9248
70 37.8001 37.8001
>


Last edited by kristinu; 12-04-2009 at 11:22 AM..
# 2  
Old 12-04-2009
Some better input test data would be nice.

Personally, I see no relationship between your initial paragraph (10 0 0), your test data input, and your output.

Otherwise, this sounds like a fairly straightforward awk script.
# 3  
Old 12-04-2009
I didn't get your question. You said that your record separator is ">", but in your sample data it appears as a row of its own. Can you explain it in a better way?
# 4  
Old 12-04-2009
Better explanation

I have this file:

Code:
>
10 0 0
13 5.92346 5.92346
16 10.3106 10.3106
19 13.9672 13.9672
22 16.9838 16.9838
25 19.4407 19.4407
28 21.4705 21.4705
31 23.1547 23.1547
34 24.6813 24.6813
37 26.0695 26.0695
>
40 27.3611 27.3611
43 28.631 28.631
46 29.8366 29.8366
49 0 0
52 32.0934 32.0934
55 33.1458 33.1458
58 34.1637 34.1637
61 35.1297 35.1297
64 36.0253 36.0253
67 36.9248 36.9248
70 37.8001 37.8001
100 37.8001 37.8001
>

Within each section between the ">" signs, I look at each row and find the one that has 0 as both its second and third numbers. I take its first number as the reference value. For example, in the first section it is 10, because we find the row 10 0 0.

Then, for each row in the section, we subtract this reference value from the row's first number and check whether the result is greater than 40. If it is, we remove the row.

Hope this describes things better.

The output would be:

Code:
>
10 0 0
13 5.92346 5.92346
16 10.3106 10.3106
19 13.9672 13.9672
22 16.9838 16.9838
25 19.4407 19.4407
28 21.4705 21.4705
31 23.1547 23.1547
34 24.6813 24.6813
37 26.0695 26.0695
>
40 27.3611 27.3611
43 28.631 28.631
46 29.8366 29.8366
49 0 0
52 32.0934 32.0934
55 33.1458 33.1458
58 34.1637 34.1637
61 35.1297 35.1297
64 36.0253 36.0253
67 36.9248 36.9248
70 37.8001 37.8001
>

The row

Code:
100 37.8001 37.8001

has been removed because in the second section 100 - 49 > 40.
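To spell out the check for the second section, where the reference value is 49:

Code:
 40 - 49 = -9   -> kept
 70 - 49 = 21   -> kept
100 - 49 = 51   -> removed (greater than 40)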

Last edited by kristinu; 12-04-2009 at 11:36 AM..
# 5  
Old 12-04-2009
Code:
awk 'BEGIN { ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }   # queue the input file a second time
FNR == NR {                                        # first pass
  if (/>/) fnr[FNR] = ++sec                        # remember where each section starts
  if ($2 + $3 == 0) idx[sec] = $1                  # reference value: the row with 0 0 in fields 2 and 3
  next
  }
FNR in fnr { v = idx[fnr[FNR]] }                   # second pass: pick up the reference value for this section
$1 - v < max                                       # print only rows within the threshold
' max=40 infile
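How it works: the BEGIN block appends the input file name to ARGV a second time, so the file is read twice. On the first pass the script notes the line number of every ">" marker and, for each section, stores the first field of the row whose 2nd and 3rd fields are both 0 as that section's reference value. On the second pass, each time a marker line is reached, v becomes that section's reference value, and a data row is printed only if $1 - v is below max (40 here).

For comparison, the same filtering can also be done in a single pass by buffering each section until the next ">" line, something along these lines (untested sketch; it assumes every section contains a row with 0 in the 2nd and 3rd fields):

Code:
awk -v max=40 '
function flush(   i) {                 # print the buffered section, filtered
    for (i = 1; i <= n; i++)
        if (first[i] - ref < max)      # keep rows whose difference stays below max
            print row[i]
    n = 0; ref = 0
}
/^>/ { flush(); print; next }          # section marker: flush the previous section, then print ">"
{
    row[++n] = $0                      # buffer the data row
    first[n] = $1                      # and remember its first field
    if ($2 == 0 && $3 == 0) ref = $1   # reference value for this section
}
END { flush() }' infile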



---------- Post updated at 05:10 PM ---------- Previous update was at 04:53 PM ----------

I corrected some really stupid typos in the code, sorry for the previous one.

Last edited by radoulov; 12-04-2009 at 12:10 PM.. Reason: Sorry for the stupid typos :)
# 6  
Old 12-04-2009
Code:
>
10 0 0
13 5.92346 5.92346
16 10.3106 10.3106
19 13.9672 13.9672
22 16.9838 16.9838
25 19.4407 19.4407
28 21.4705 21.4705
31 23.1547 23.1547
34 24.6813 24.6813
37 26.0695 26.0695
40 27.3611 27.3611
43 28.631 28.631
46 29.8366 29.8366
49 30.9858 30.9858
52 32.0934 32.0934
55 33.1458 33.1458
58 34.1637 34.1637
61 35.1297 35.1297
64 36.0253 36.0253
67 36.9248 36.9248
70 37.8001 37.8001
73 38.6296 38.6296
76 39.4503 39.4503
79 40.2424 40.2424
82 40.997 40.997
85 41.7681 41.7681
88 42.5001 42.5001
91 43.2316 43.2316
94 43.9289 43.9289
97 44.6221 44.6221
100 45.3015 45.3015
103 45.9617 45.9617
106 46.6138 46.6138
109 47.2457 47.2457
112 47.8904 47.8904
115 48.5016 48.5016
118 49.1305 49.1305
121 49.7498 49.7498
124 50.3272 50.3272
127 50.8841 50.8841
130 51.472 51.472
133 52.0619 52.0619
136 52.6079 52.6079
139 53.1586 53.1586
142 53.7149 53.7149
145 54.2602 54.2602
148 54.7771 54.7771
151 55.3154 55.3154
154 55.8316 55.8316
157 56.366 56.366
160 56.8704 56.8704
163 57.358 57.358
166 57.8577 57.8577
169 58.338 58.338
172 58.8308 58.8308
175 59.308 59.308
178 59.7918 59.7918
181 60.2547 60.2547
184 60.7199 60.7199
187 61.1781 61.1781
190 61.643 61.643
193 62.1091 62.1091
196 62.5579 62.5579
199 62.9957 62.9957
>


I tried it on the file above, but it has not solved the problem: rows such as 196 and 199, whose differences 196 - 10 and 199 - 10 are greater than 40, still show up in the output file.
# 7  
Old 12-04-2009
Quote:
Originally Posted by kristinu
[...]
I tried it on the file above, but it has not solved the problem: rows such as 196 and 199, whose differences 196 - 10 and 199 - 10 are greater than 40, still show up in the output file.
First of all, a few minutes ago I fixed some errors and updated the post.
Second: am I missing something, or should you remove all the records after this one:

Code:
49 30.9858 30.9858

52 - 10 > 40 ...