Nawk script to compare records of a file based on a particular column.


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Nawk script to compare records of a file based on a particular column.
# 1  
Old 08-25-2011
Power Nawk script to compare records of a file based on a particular column.

Hi Gurus,

I am struggling with nawk command where i am processing a file based on columns.

Here is the sample data file.

Code:
 
UM113570248|24-AUG-11|4|man1|RR211 Alert: Master Process failure |24-AUG-11
UM113570624|24-AUG-11|4|man1| Alert: Pattern 'E_DCLeDAOException' found |24-AUG-11
UM113570248|24-AUG-11|4|man1|RR231:  Alert: Master Process failure |
UM113569599|24-AUG-11|4|man1|RRWEB413 ChartsDCC RoadRunner App|OLTX_Iote6|24-AUG-11


In this file, the first column is the primary key.But there are some duplicate entries in first column.

in this case :

Code:
 
UM113570248|24-AUG-11|4|man1|RR211 Alert: Master Process failure |24-AUG-11
UM113570248|24-AUG-11|4|man1|RR231:  Alert: Master Process failure |

Now my requirement is..
I want to search the file according to the first column and if there are any duplicate entries in first column then i will search the 6th column and print that row in which the 6th column is not empty.

My desired output is as below:

Code:
 
UM113570248|24-AUG-11|4|man1|RR211 Alert: Master Process failure |24-AUG-11
UM113570624|24-AUG-11|4|man1| Alert: Pattern 'E_DCLeDAOException' found |24-AUG-11
UM113569599|24-AUG-11|4|man1|RRWEB413 ChartsDCC RoadRunner App|OLTX_Iote6|24-AUG-11

I have till now tried this script but somehow its not giving me the desired output.

Code:
 
cat testingdata.txt | nawk -F"|" '{ if (!a[$1]) {a[$1]; if ( $6!= "" ) print $0 > "out1" ; else exit }}'

Can anyone help me understand what i am doing wrong in my awk script.

Thanks in Advance
Usha
# 2  
Old 08-25-2011
you have missed the flower brackets.. corrected one..
Code:
$ nawk -F"|" '{ if (!a[$1]) {a[$1]; if ( $6!= "" ) { print $0 } } } ' inputfile
UM113570248|24-AUG-11|4|man1|RR211 Alert: Master Process failure |24-AUG-11
UM113570624|24-AUG-11|4|man1| Alert: Pattern 'E_DCLeDAOException' found |24-AUG-11
UM113569599|24-AUG-11|4|man1|RRWEB413 ChartsDCC RoadRunner App|OLTX_Iote6|24-AUG-11

# 3  
Old 08-25-2011
Hi jayan,

This works only if i have one repeating entry in column 1.
Although it prints the rows where first column is repeating which is having sixth column as NON empty.
But its not printing the rows where the first column is not repeating and sixth column is empty.


Suppose my input file is like this :

Code:
 
UM113568840|24-AUG-11|4|man1|Homepage and News: Urge|
UM113570248|24-AUG-11|4|man1|s detected. UMmediate recycle and validation require|
UM113569599|24-AUG-11|4|man1|Rractivequote6|
UM113570248|24-AUG-11|4|man1|s detected. UMmediate recycle and validation require|24-AUG-11
UM113570624|24-AUG-11|4|man1|Alert: Pattern 'E_DCLeDAOException' |24-AUG-11
UM113568737|24-AUG-11|4|man1| Pattern 'E_DCLeDAOException' |
UM113569231|24-AUG-11|4|man1|is detected. UMmediate recycle and validation require|
UM113569231|24-AUG-11|4|man1|is detected. UMmediate recycle and validation require|24-AUG-11
UM113569346|24-AUG-11|4|man1|FEBMARKETAPP321: Receving multiple alerts for e.log|
UM113571032|24-AUG-11|4|man1|FEBMARKETAPP321: LogFile Alert: Pattern '|
UM113568626|24-AUG-11|3|man1-test|Brokerage, Annuity and Mutual fund Account Details are not been displayed on QA4|
UM113570137|24-AUG-11|4|man1|C - PROD - Equity - Pricing - window does not display a lin|
UM113570659|24-AUG-11|4|man1|-LOGIN-CORADIANT: 4 On 2011-0|
UM113569599|24-AUG-11|4|man1|Rractivequote6|24-AUG-11
UM113570624|24-AUG-11|4|man1|Alert: Pattern 'E_DCLeDAOException'|
UM113570841|24-AUG-11|4|man1|t: SWAP Utilization (91%) i|
UM113568909|24-AUG-11|4|man1|OMEPAGE-AND-NEWS-CORADIANT: Coradiant Alert:  Urge|
UM113571191|25-AUG-11|4|man1|ert: Pattern 'E_DCLeDAOException' found in |


This particular statement --->

Code:
 
nawk -F"|" '{ if (!a[$1]) {a[$1]; if ( $6!= "" ) { print $0 } } } ' infile

is giving me the output like this --->

Code:
 
UM113570248|24-AUG-11|4|man1|s detected. UMmediate recycle and validation require|24-AUG-11
UM113570624|24-AUG-11|4|man1|Alert: Pattern 'E_DCLeDAOException' |24-AUG-11
UM113569231|24-AUG-11|4|man1|is detected. UMmediate recycle and validation require|24-AUG-11
UM113569599|24-AUG-11|4|man1|Rractivequote6|24-AUG-11


whereas the desired output is this ---->



Code:
 
UM113568840|24-AUG-11|4|man1|Homepage and News: Urge|
UM113570248|24-AUG-11|4|man1|s detected. UMmediate recycle and validation require|24-AUG-11
UM113570624|24-AUG-11|4|man1|Alert: Pattern 'E_DCLeDAOException' |24-AUG-11
UM113568737|24-AUG-11|4|man1| Pattern 'E_DCLeDAOException' |
UM113569231|24-AUG-11|4|man1|is detected. UMmediate recycle and validation require|24-AUG-11
UM113569346|24-AUG-11|4|man1|FEBMARKETAPP321: Receving multiple alerts for e.log|
UM113571032|24-AUG-11|4|man1|FEBMARKETAPP321: LogFile Alert: Pattern '|
UM113568626|24-AUG-11|3|man1-test|Brokerage, Annuity and Mutual fund Account Details are not been displayed on QA4|
UM113570137|24-AUG-11|4|man1|C - PROD - Equity - Pricing - window does not display a lin|
UM113570659|24-AUG-11|4|man1|-LOGIN-CORADIANT: 4 On 2011-0|
UM113569599|24-AUG-11|4|man1|Rractivequote6|24-AUG-11
UM113570841|24-AUG-11|4|man1|t: SWAP Utilization (91%) i|
UM113568909|24-AUG-11|4|man1|OMEPAGE-AND-NEWS-CORADIANT: Coradiant Alert:  Urge|
UM113571191|25-AUG-11|4|man1|ert: Pattern 'E_DCLeDAOException' found in |


for a duplicate entry in column 1 there are two rows ---> one having entry in column 6 and other empty column 6.
I want to print only the row having NON empty column 6 and the the having duplicate entry in column 1 and empty column 6 should be discarded.

all other rows having distinct entries in column 1 and empty entries in column 6 should be retained.


Kindly let me know if i am unable to describe the problem.

Thanks
Usha
# 4  
Old 08-26-2011
alternate one ..
Code:
#!/bin/bash
nawk -F\| '{if(a[$1]==a[$1]);if($6!= "") {print $0 >"with_val"} else {if(a[$1]==a[$1]);if($6== "" ) {print $0 >"without_val"}}}' inputfile
nawk -F"|" '{ if ( $6!= "" ) { print $1 | "xargs" } } ' inputfile | sed 's, ,|,g;s,^,egrep -v \",g;s,$,\" without_val,g' | sh > file1
cat with_val file1

This User Gave Thanks to jayan_jay For This Post:
# 5  
Old 08-26-2011
Thanks Jayan for your reply.

I am still a learner in nawk. Could you please explain these two nawk commands.

Thanks again,
Usha
# 6  
Old 08-26-2011
Hope this clears Smilie
Quote:
first line will produce two files
"with_val" --> contains lines with non-empty 6th column
"without_val" --> contains lines with empty 6th column
second line will applies "egrep -v" pattern (using sed) over the without_val file in order to remove the duplicate entries having 6th column as empty.
This User Gave Thanks to jayan_jay For This Post:
# 7  
Old 08-26-2011
MySQL

Thanks a lot Jayan... Smilie
I tried with different input files and this is perfect!!
I have a long way to match the expertise.. Smilie

Cheers,
Usha
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Filtering records of a csv file based on a value of a column

Hi, I tried filtering the records in a csv file using "awk" command listed below. awk -F"~" '$4 ~ /Active/{print }' inputfile > outputfile The output always has all the entries. The same command worked for different users from one of the forum links. content of file I was... (3 Replies)
Discussion started by: sunilmudikonda
3 Replies

2. Shell Programming and Scripting

Compare two csv's with column based

Hi, I am having below two CSV's col_1,col_2,col_3 1,2,4 1,3,6 col_1,col_3,col2,col_5,col_6 1,2,3,4,5 1,6,3,,, I need to compare based on the columns where the mismatch is expected output col_1,col_2,col_3 1,2,4 (3 Replies)
Discussion started by: rohit_shinez
3 Replies

3. Shell Programming and Scripting

Compare two files based on column

Hi, I have two files roughly 1200 fields in length for each row, sorted on the 2nd field. I need to compare based on that 2nd column between file1 and file2 and print lines that exist in both files into separate files (I can't guarantee that every line in file1 is in file2). Example: File1: ... (1 Reply)
Discussion started by: origon
1 Replies

4. Shell Programming and Scripting

Compare based on column value

Hi Experts, I want to compare 2 text files based on their column values text1 is like prd-1234 yes no yes yes prd-2345 no no no yes prd-6475 yes yes yes no and test 2 is prd-1234 no no no yes prd-2345 yes no no no desired out put as follows prd-1234 1 3 prd-235 1 4 basically it shows... (5 Replies)
Discussion started by: tijomonmathew
5 Replies

5. Shell Programming and Scripting

Removing duplicate records in a file based on single column explanation

I was reading this thread. It looks like a simpler way to say this is to only keep uniq lines based on field or column 1. https://www.unix.com/shell-programming-scripting/165717-removing-duplicate-records-file-based-single-column.html Can someone explain this command please? How are there no... (5 Replies)
Discussion started by: cokedude
5 Replies

6. Emergency UNIX and Linux Support

[Solved] Extract records based on a repeated column value

Hi guys, I need help in making a command to find some data. I have multiple files in which multiple records are present.. Each record is separated with a carriage return and in each record there are multiple fields with each field separated by "|" what i want is that I want to extract... (1 Reply)
Discussion started by: m_usmanayub
1 Replies

7. Shell Programming and Scripting

Removing duplicate records in a file based on single column

Hi, I want to remove duplicate records including the first line based on column1. For example inputfile(filer.txt): ------------- 1,3000,5000 1,4000,6000 2,4000,600 2,5000,700 3,60000,4000 4,7000,7777 5,999,8888 expected output: ---------------- 3,60000,4000 4,7000,7777... (5 Replies)
Discussion started by: G.K.K
5 Replies

8. Shell Programming and Scripting

Compare files column to column based on keys

Here is my situation. I need to compare two tab separated files (diff is not useful since there could be known difference between files). I have found similar posts , but not fully matching.I was thinking of writing a shell script using cut and grep and while loop but after going thru posts it... (2 Replies)
Discussion started by: blackjack101
2 Replies

9. UNIX for Dummies Questions & Answers

Filtering records of a file based on a value of a column

Hi all, I would like to extract records of a file based on a condition. The file contains 47 fields, and I would like to extract only those records that match a certain value in one of the columns, e.g. COL1 COL2 COL3 ............... COL47 1 XX 45 ... (4 Replies)
Discussion started by: risk_sly
4 Replies

10. UNIX for Dummies Questions & Answers

Select records based on search criteria on first column

Hi All, I need to select only those records having a non zero record in the first column of a comma delimited file. Suppose my input file is having data like: "0","01/08/2005 07:11:15",1,1,"Created",,"01/08/2005" "0","01/08/2005 07:12:40",1,1,"Created",,"01/08/2005"... (2 Replies)
Discussion started by: shashi_kiran_v
2 Replies
Login or Register to Ask a Question