awk to ignore multiple rows based on a condition


 
# 1  
Old 02-24-2016

All,
I have a text file (Inputfile.csv) with millions of rows and 100 columns. A two-column sample is shown below.
Code:
Key,Check
A,1
A,2
A,
A,4
B,0
B,1
B,2
B,3
B,4
....
million rows.

My requirement is to delete all rows for every key that has at least one blank cell in the Check column.

Outputfile.csv
Code:
Key,Check
B,0
B,1
B,2
B,3
B,4

Currently I am using the following code
Code:
awk -F, '$2==""' inputfile.csv | awk -F, '{print $1}' | uniq >list_of_keys_to_ignore.txt
for each in `cat list_of_keys_to_ignore.txt`; do grep -v $each inputfile.csv; done >Outputfile.csv

But this script takes a lot of time (especially the grep -v loop), as I have millions of rows and hundreds of columns.

Please suggest a faster alternative to my above code.

Thanks and Regards
Sidda
# 2  
Old 02-24-2016
How about
Code:
sort -t, -k2 file | awk -F, '$2 == "" {T[$1]} !($1 in T)'

# 3  
Old 02-24-2016
Hi ks_reddy,
Assuming that the Check values in your input are all numeric values, I note that RudiC's code will sort the header from your input file to the end of your output file. And by using the 2nd field as the primary sort key, the output will be grouped by (alphanumeric; not numeric) Check values while your input seems to be grouped by Key values.
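If the header does need to stay on the first line, one possible workaround for the sort-based approach is to split the header off before sorting. The following is only a sketch on the sample data from post #1; the file names are made up for illustration, and `LC_ALL=C` and `-k2,2` (sort on the Check field only) are my additions, not part of RudiC's command. The data rows still come out ordered by Check value, not in input order.

```shell
# Sample data from the thread; the file names are illustrative.
printf '%s\n' 'Key,Check' 'A,1' 'A,2' 'A,' 'A,4' 'B,0' 'B,1' 'B,2' 'B,3' 'B,4' > sorted_input.csv

{
    # Emit the header untouched, before any sorted data.
    head -n 1 sorted_input.csv
    # Sort only the data lines so blank Check cells come first, then filter.
    tail -n +2 sorted_input.csv |
        LC_ALL=C sort -t, -k2,2 |
        awk -F, '$2 == "" {T[$1]} !($1 in T)'
} > sorted_output.csv

cat sorted_output.csv
```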

Does your real input have all lines for each distinct Key value grouped together?

Do you want the header line in the output file? If so, does the header need to be kept as the first line in the output?

Does the order of other lines in the output matter? If so, does the input order need to be maintained in the output? Or is a different sort order required (and, if so, what order)?

Approximately how many distinct Key values are there in your real input? Approximately how many of those Key values will need to be removed?
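The answers matter because, if all lines for a key really are grouped together, a single pass with no sorting might be enough. Here is a minimal sketch under that assumption, also assuming that line 1 is the header and that it should stay first. The names `emit`, `buf`, `bad` and `key` and the file names are purely illustrative. Only one key's lines are held in memory at a time.

```shell
# Sample data from the thread; the file names are illustrative.
printf '%s\n' 'Key,Check' 'A,1' 'A,2' 'A,' 'A,4' 'B,0' 'B,1' 'B,2' 'B,3' 'B,4' > grouped_input.csv

awk -F, '
    # Print the buffered group unless it was flagged, then reset the state.
    function emit() { if (!bad) printf "%s", buf; buf = ""; bad = 0 }
    NR == 1   { print; next }          # pass the header through unchanged
    $1 != key { emit(); key = $1 }     # key changed: previous group is complete
    $2 == ""  { bad = 1 }              # a blank Check cell flags the whole group
              { buf = buf $0 ORS }     # buffer the current line
    END       { emit() }               # handle the last group
' grouped_input.csv > grouped_output.csv

cat grouped_output.csv
```

If the same key can appear in several separate blocks, this sketch would treat each block independently, so it only fits strictly grouped input.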
# 4  
Old 02-24-2016
Try also:
Code:
awk -F, 'NR==FNR {if ($2 !~ /./) a[$1]=1; next;} ! a[$1] ' inputfile.csv inputfile.csv > output.csv
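For readers unfamiliar with the two-file idiom, here is the same idea spelled out with comments on the sample data from post #1. It is a sketch, not a replacement for the one-liner: the file names and the array name `bad` are illustrative, and `$2 == ""` is used in place of `$2 !~ /./`, which should behave the same for an empty field.

```shell
# Sample data from the thread; the file names are illustrative.
printf '%s\n' 'Key,Check' 'A,1' 'A,2' 'A,' 'A,4' 'B,0' 'B,1' 'B,2' 'B,3' 'B,4' > sample_input.csv

# The same file is named twice, so awk reads it twice.
awk -F, '
    # Pass 1 (NR == FNR is true only while reading the first file argument):
    # remember every key that has an empty 2nd field, and print nothing.
    NR == FNR { if ($2 == "") bad[$1] = 1; next }
    # Pass 2: print only the lines whose key was never flagged.
    !($1 in bad)
' sample_input.csv sample_input.csv > sample_output.csv

cat sample_output.csv
```

Because nothing is sorted, the header and the original line order are preserved. The cost is reading the input twice and holding one array entry per flagged key.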

# 5  
Old 02-24-2016
Hi rdrtx1,
Your code works very well. Thank you so much.

---------- Post updated at 05:27 AM ---------- Previous update was at 05:26 AM ----------

Hello Don,
As already mentioned, the code suggested by rdrtx1 works well, since my output requires the header to be kept as it is and the original order to be preserved.

---------- Post updated at 05:31 AM ---------- Previous update was at 05:27 AM ----------

Quote:
Originally Posted by RudiC
How about
Code:
sort -t, -k2 file | awk -F, '$2 == "" {T[$1]} !($1 in T)'

Hi Rudi,
I should not sort my whole input data, so I went with the code suggested by rdrtx1. It works perfectly.
Speed comparison: your code took 90 secs on my sample data, and rdrtx1's code took 21 secs.