Removing rows from a file


 
# 1  
Old 12-04-2009
Removing rows from a file

I have a file like the one below and want to use awk to solve this problem. The record separator is ">". I want to look at each record section enclosed between ">" lines and find the row whose 2nd and 3rd columns are both 0, such as

Code:
10 0  0

I need to take the first number of that row, which in this case is 10. Then I take the first number of each row in the same section and check whether its difference from 10 is greater than 40. If it is greater, the row is removed.

For example, we compute something like this:

10-10
13-10
16-10
19-10
22-10
25-10
28-10
31-10
34-10
37-10

If the value is greater than 40, we remove the row.

Code:
>
10 0 0
13 5.92346 5.92346
16 10.3106 10.3106
19 13.9672 13.9672
22 16.9838 16.9838
25 19.4407 19.4407
28 21.4705 21.4705
31 23.1547 23.1547
34 24.6813 24.6813
37 26.0695 26.0695
>
40 27.3611 27.3611
43 28.631 28.631
46 29.8366 29.8366
49 30.9858 30.9858
52 32.0934 32.0934
55 33.1458 33.1458
58 34.1637 34.1637
61 35.1297 35.1297
64 36.0253 36.0253
67 36.9248 36.9248
70 37.8001 37.8001
>


Last edited by kristinu; 12-04-2009 at 11:22 AM..
# 2  
Old 12-04-2009
Some better input test data would be nice.

Personally, I see no relationship between your initial paragraph (10 0 0), your test data input, and your output.

Otherwise, this sounds like a fairly straightforward awk script.
# 3  
Old 12-04-2009
I didn't get your question. You said that your record separator is ">", but in your sample data it appears as a row of its own. Can you explain it in a better way?
# 4  
Old 12-04-2009
Better explanation

I have this file:

Code:
>
10 0 0
13 5.92346 5.92346
16 10.3106 10.3106
19 13.9672 13.9672
22 16.9838 16.9838
25 19.4407 19.4407
28 21.4705 21.4705
31 23.1547 23.1547
34 24.6813 24.6813
37 26.0695 26.0695
>
40 27.3611 27.3611
43 28.631 28.631
46 29.8366 29.8366
49 0 0
52 32.0934 32.0934
55 33.1458 33.1458
58 34.1637 34.1637
61 35.1297 35.1297
64 36.0253 36.0253
67 36.9248 36.9248
70 37.8001 37.8001
100 37.8001 37.8001
>

Within each section between the ">" signs, I look at each row and find the one that has 0 as both its second and third numbers. I take its first number as the reference value. For example, in the first section it is 10, because we find the row 10 0 0.

Then, for each row in the section, we subtract this reference value from the row's first number and check whether the result is greater than 40. If it is, we remove the row.

Hope this describes things better.

The output would be:

Code:
>
10 0 0
13 5.92346 5.92346
16 10.3106 10.3106
19 13.9672 13.9672
22 16.9838 16.9838
25 19.4407 19.4407
28 21.4705 21.4705
31 23.1547 23.1547
34 24.6813 24.6813
37 26.0695 26.0695
>
40 27.3611 27.3611
43 28.631 28.631
46 29.8366 29.8366
49 0 0
52 32.0934 32.0934
55 33.1458 33.1458
58 34.1637 34.1637
61 35.1297 35.1297
64 36.0253 36.0253
67 36.9248 36.9248
70 37.8001 37.8001
>

The row

Code:
100 37.8001 37.8001

has been removed because in the second section 100 - 49 > 40.
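To spell out the check for the second section, where the reference value is 49:

Code:
 40 - 49 = -9   -> kept
 70 - 49 = 21   -> kept
100 - 49 = 51   -> removed (greater than 40)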

Last edited by kristinu; 12-04-2009 at 11:36 AM..
# 5  
Old 12-04-2009
Code:
awk 'BEGIN { ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }   # queue the input file a second time
FNR == NR {                                        # first pass
  if (/>/) fnr[FNR] = ++sec                        # remember where each section starts
  if ($2 + $3 == 0) idx[sec] = $1                  # reference value: the row with 0 0 in fields 2 and 3
  next
  }
FNR in fnr { v = idx[fnr[FNR]] }                   # second pass: pick up the reference value for this section
$1 - v < max                                       # print only rows within the threshold
' max=40 infile
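How it works: the BEGIN block appends the input file name to ARGV a second time, so the file is read twice. On the first pass the script notes the line number of every ">" marker and, for each section, stores the first field of the row whose 2nd and 3rd fields are both 0 as that section's reference value. On the second pass, each time a marker line is reached, v becomes that section's reference value, and a data row is printed only if $1 - v is below max (40 here).

For comparison, the same filtering can also be done in a single pass by buffering each section until the next ">" line, something along these lines (untested sketch; it assumes every section contains a row with 0 in the 2nd and 3rd fields):

Code:
awk -v max=40 '
function flush(   i) {                 # print the buffered section, filtered
    for (i = 1; i <= n; i++)
        if (first[i] - ref < max)      # keep rows whose difference stays below max
            print row[i]
    n = 0; ref = 0
}
/^>/ { flush(); print; next }          # section marker: flush the previous section, then print ">"
{
    row[++n] = $0                      # buffer the data row
    first[n] = $1                      # and remember its first field
    if ($2 == 0 && $3 == 0) ref = $1   # reference value for this section
}
END { flush() }' infile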



---------- Post updated at 05:10 PM ---------- Previous update was at 04:53 PM ----------

I corrected some really stupid typos in the code, sorry for the previous one.

Last edited by radoulov; 12-04-2009 at 12:10 PM.. Reason: Sorry for the stupid typos :)
# 6  
Old 12-04-2009
Code:
>
10 0 0
13 5.92346 5.92346
16 10.3106 10.3106
19 13.9672 13.9672
22 16.9838 16.9838
25 19.4407 19.4407
28 21.4705 21.4705
31 23.1547 23.1547
34 24.6813 24.6813
37 26.0695 26.0695
40 27.3611 27.3611
43 28.631 28.631
46 29.8366 29.8366
49 30.9858 30.9858
52 32.0934 32.0934
55 33.1458 33.1458
58 34.1637 34.1637
61 35.1297 35.1297
64 36.0253 36.0253
67 36.9248 36.9248
70 37.8001 37.8001
73 38.6296 38.6296
76 39.4503 39.4503
79 40.2424 40.2424
82 40.997 40.997
85 41.7681 41.7681
88 42.5001 42.5001
91 43.2316 43.2316
94 43.9289 43.9289
97 44.6221 44.6221
100 45.3015 45.3015
103 45.9617 45.9617
106 46.6138 46.6138
109 47.2457 47.2457
112 47.8904 47.8904
115 48.5016 48.5016
118 49.1305 49.1305
121 49.7498 49.7498
124 50.3272 50.3272
127 50.8841 50.8841
130 51.472 51.472
133 52.0619 52.0619
136 52.6079 52.6079
139 53.1586 53.1586
142 53.7149 53.7149
145 54.2602 54.2602
148 54.7771 54.7771
151 55.3154 55.3154
154 55.8316 55.8316
157 56.366 56.366
160 56.8704 56.8704
163 57.358 57.358
166 57.8577 57.8577
169 58.338 58.338
172 58.8308 58.8308
175 59.308 59.308
178 59.7918 59.7918
181 60.2547 60.2547
184 60.7199 60.7199
187 61.1781 61.1781
190 61.643 61.643
193 62.1091 62.1091
196 62.5579 62.5579
199 62.9957 62.9957
>


I tried it on the file above, but it has not solved the problem: rows such as 196 and 199, whose differences 196 - 10 and 199 - 10 are greater than 40, still show up in the output file.
# 7  
Old 12-04-2009
Quote:
Originally Posted by kristinu
[...]
I tried it on the file above, but it has not solved the problem: rows such as 196 and 199, whose differences 196 - 10 and 199 - 10 are greater than 40, still show up in the output file.
First of all, a few minutes ago I fixed some errors and updated the post.
Second: am I missing something, or should you remove all the records after this one:

Code:
49 30.9858 30.9858

52 - 10 > 40 ...