ITERATION: remove row based on string value

09-02-2010

Registered User

12, 0

Join Date: Sep 2010

Last Activity: 23 September 2010, 8:30 AM EDT

Posts: 12

Thanks Given: 5

Thanked 0 Times in 0 Posts


This is my first post, and I am hoping to get help from the forum.

In a directory, I have 5000 files. Each file contains around 4000 rows and 10 columns, and some rows have the string 'AT' in the 4th column.

Code:

OM   3328   O     BT   268       5.800      7.500      4.700      0.000     1.400
OM   3329   O     BT   723       8.500      8.900      3.600      8.500     1.400
OM   3330   O     AT   231       6.700      5.500      7.600      0.000     1.400
OM   3331   O     AT   234       1.200      7.700      5.500      8.500     1.400
OM   3332   O     AT   256       3.800      5.800      5.200      0.000     1.400

(Step-1) The rows at the bottom of the file that contain the string AT need to be removed, but ONLY if the 9th column is greater than 0.10. The kept rows shall then be saved into a new file. A loop is required to do this for the whole series of 5000 files.

(Step-2) Next, a program 'calc' will be run on these new files one by one. Again, if the 9th column is greater than 0.10 (only for rows with the string AT), the corresponding row shall be removed from the file. The kept rows shall be saved into a new file.

I have written the short script below to run the program 'calc' on a series of files in a directory. This small piece of code for Linux took me an entire day to figure out, because I have no experience in writing code.

-------

Code:

#!/bin/sh
# Run calc on every .txt file in the current directory
for d in *.txt
do
     ./calc "$d"
done

-------

(Step-3) Finally, all files that contain the same number of lines (e.g. 3098, 3095, 3097) shall be combined into a single file per line count. In this case, the output from the original 5000 files would be divided, for example, into:

Code:

3098 filename = containing all files with 3098 lines
3095 filename = containing all files with 3095 lines
3097 filename = containing all files with 3097 lines

Thank you so much for your time and attention.

-A

---------- Post updated at 02:08 PM ---------- Previous update was at 11:50 AM ----------

To tackle the problem step by step, I first need to remove the lines that match on both the string and the value.

In GNU/Linux x86_64:

Code:

awk  '($4 ~ /^AT$/){print}' file > newfile

The code above prints every row whose 4th column is exactly AT into newfile. BUT I need the script to print such a row ONLY if its 9th column has a value between 0.00 and 0.10. How can I do that in a bash shell?

Code:

> cat file
OM   3328   O     BT   268       5.800      7.500      4.700      0.000      1.400
OM   3329   O     BT   723       8.500      8.900      3.600      8.500      1.400
OM   3330   O     AT   231       6.700      5.500      7.600      0.000      1.400
OM   3331   O     AT   234       1.200      7.700      5.500      8.500      1.400
OM   3332   O     AT   256       3.800      5.800      5.200      0.100      1.400

Code:

> cat newfile
 OM   3330   O     AT   231       6.700      5.500      7.600      0.000      1.400
 OM   3332   O     AT   256       3.800      5.800      5.200      0.100      1.400

Please help.

-A

Last edited by Franklin52; 09-02-2010 at 03:17 AM.. Reason: Please use code tags, thank you!

asanjuan


09-02-2010

Registered User

7,747, 559

Join Date: Feb 2007

Last Activity: 20 April 2020, 11:28 AM EDT

Location: The Netherlands

Posts: 7,747

Thanks Given: 139

Thanked 559 Times in 520 Posts


Try this:

Code:

awk '$4=="AT" && $9 <= 0.1 && $9 >= 0.0' file


Franklin52


09-02-2010


Let me make myself clear:

Code:

> cat file
OM   3328   O     BT   268       5.800      7.500      4.700      0.000      1.400
OM   3329   O     BT   723       8.500      8.900      3.600      8.500      1.400
OM   3330   O     AT   231       6.700      5.500      7.600      0.000      1.400
OM   3331   O     AT   234       1.200      7.700      5.500      8.500      1.400
OM   3332   O     AT   256       3.800      5.800      5.200      0.100      1.400

Code:

> cat newfile
OM   3328   O     BT   268       5.800      7.500      4.700      0.000      1.400
OM   3329   O     BT   723       8.500      8.900      3.600      8.500      1.400
OM   3330   O     AT   231       6.700      5.500      7.600      0.000      1.400
OM   3332   O     AT   256       3.800      5.800      5.200      0.100      1.400

Using awk, the output newfile should omit ONLY the lines that have the string AT in the 4th column and a value greater than 0.10 in the 9th column.

Moderator's Comments:

Please use code tags, thank you

---------- Post updated at 03:09 PM ---------- Previous update was at 02:48 PM ----------

Thank you, moderator, for the kind reply.

Code:

awk '$4=="AT" && $9 <= 0.1 && $9 >= 0.0' file

The code you gave above prints the lines with the string AT (4th column) and a value of 0.0-0.10 (9th column).

Since one file contains 4000 lines with about 30 different strings in the 4th column, it would be better to instead write code (with help from the forum) that just deletes the rows with the AT string and a value >0.10 (9th column). This way, the rest of the lines, with non-"AT" strings, will be kept in the file.

asanjuan


09-02-2010



Something like this?

Code:

awk '!($4=="AT" && $9 > 0.10)' file
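A quick way to check this filter is to run it on the five sample rows from the question. The following is only a sketch: the file names sample.txt and sample_kept.txt are made up for the demonstration, and filter_at is a hypothetical wrapper around the same awk program.

```shell
#!/bin/sh
# Sample rows copied from the question; sample.txt is a made-up file name.
cat > sample.txt <<'EOF'
OM   3328   O     BT   268       5.800      7.500      4.700      0.000      1.400
OM   3329   O     BT   723       8.500      8.900      3.600      8.500      1.400
OM   3330   O     AT   231       6.700      5.500      7.600      0.000      1.400
OM   3331   O     AT   234       1.200      7.700      5.500      8.500      1.400
OM   3332   O     AT   256       3.800      5.800      5.200      0.100      1.400
EOF

# Hypothetical wrapper: keep every row except those that have
# AT in column 4 AND a value greater than 0.10 in column 9.
filter_at() {
    awk '!($4=="AT" && $9 > 0.10)' "$1"
}

filter_at sample.txt > sample_kept.txt
cat sample_kept.txt
```

Only row 3331 is dropped. Row 3329 also has 8.500 in the 9th column, but it stays because its 4th column is BT, and row 3332 stays because 0.100 is not greater than 0.10.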

Franklin52


09-02-2010


It works!!!! Step-1 is almost done.

I tried to create an iteration loop:
---------------

Code:

#!/bin/bash
for d in $(\ls -d *.asa)
do
  awk '!($4=="AT" && $9 > 0.10)' > file
done

-----------------

The question is how to write a separate output for each file in d. The 'file' written above needs to be changed so that each output is written into its own file.

Thank you for your kind help.

Last edited by Franklin52; 09-02-2010 at 05:48 AM.. Reason: Please use code tags and indent your code

asanjuan


09-02-2010

Registered User

73, 6

Join Date: Aug 2010

Last Activity: 21 July 2011, 4:22 AM EDT

Posts: 73

Thanks Given: 16

Thanked 6 Times in 6 Posts

Just run the same awk command on each of your files. The two scripts below perform the 1st and the last step.
1.

Code:

 
#!/bin/ksh
# Filter each .txt file into its own newfile_<name>.txt
for i in *.txt
do
  nawk '!($4=="AT" && $9>0.10 )' "$i" > "newfile_$i.txt"
done
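As a possible alternative to the shell loop, a single awk run can write one output file per input file by switching the output name whenever a new input file starts. The following is a minimal sketch, not the method used in this thread: filter_all is a hypothetical helper name, the newfile_<input name> naming is chosen here for illustration, and the two demo files are made up. It assumes the input files sit in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: filter every file given as an argument and write
# the kept rows to newfile_<input name> in the current directory.
filter_all() {
    awk '
        FNR == 1 {                      # first line of each input file
            if (out != "") close(out)   # avoid running out of open files
            out = "newfile_" FILENAME
        }
        !($4 == "AT" && $9 > 0.10) { print > out }
    ' "$@"
}

# Demo with two tiny made-up files.
printf 'OM 1 O AT 1 1.0 1.0 1.0 8.500 1.4\nOM 2 O BT 1 1.0 1.0 1.0 8.500 1.4\n' > demo1.asa
printf 'OM 3 O AT 1 1.0 1.0 1.0 0.100 1.4\n' > demo2.asa
filter_all demo1.asa demo2.asa
```

For the real data the call would be something like `filter_all *.asa`. Note that an input file whose rows are all removed produces no output file in this sketch.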

Code:

 
#!/bin/ksh
wc -l newfile*|grep -v total >records.txt
nawk '{print $1}' records.txt|sort -u >uniq.txt
for i in `cat uniq.txt`
do 
 for j in `cat records.txt|grep "$i "| nawk '{print$2}'`
 do
 cat $j >> $i_output
 done
done

Try the scripts above.

Last edited by malikshahid85; 09-02-2010 at 06:18 AM..


malikshahid85


09-02-2010


malikshahid85, thanks so much for the great help. Code-1 works well, but the second script (for Step-3) fails with a shell error:

line 9: $i_output: ambiguous redirect

I have tried searching the forum, but without success. Would you please help me further?

Thanks again. There are a lot of wonderful people helping in this forum.
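The "ambiguous redirect" message is consistent with how the shell parses $i_output: underscores are valid in variable names, so the shell looks up a variable called i_output (which is unset) instead of $i followed by _output, and the redirect target expands to nothing. Writing ${i}_output marks where the variable name ends. The sketch below shows the difference and one possible rewrite of the grouping loop. The newfile_ prefix follows the earlier script, while group_by_linecount is a hypothetical name and the <count>_output naming is assumed from the original code.

```shell
#!/bin/sh
i=3098
echo "[$i_output]"      # i_output is unset, so this prints []
echo "[${i}_output]"    # braces end the name after i: prints [3098_output]

# Hypothetical helper: append each given file to <line count>_output.
group_by_linecount() {
    for f in "$@"; do
        [ -f "$f" ] || continue              # skip an unmatched glob pattern
        n=$(wc -l < "$f" | tr -d ' ')        # line count without padding
        cat "$f" >> "${n}_output"
    done
}

# Example call, following the newfile_ prefix used earlier:
# group_by_linecount newfile_*
```

Because the helper appends with >>, the <count>_output files from an earlier run should be removed before running it again.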

asanjuan

