awk/sed to get unique row


 
# 1  
Old 06-30-2011

Hello all,

I have a very large file, almost 25 GB in size.
Each row consists of "|"-delimited columns.

Code:
eg:
1396745|1078529|KDS|2011-04-21 00:00:00.0|1100|30|2|2011-04-20 22:35:24.0|2011-04-20 22:35:24.0|0|2011-04-21 00:00:00.0|1100|2222434|2011-04-21 11:00:00.0|0|0|2011-06-29 00:05:10
1396745|1078529|KDS|2011-04-21 00:00:00.0|1100|30|2|2011-04-20 22:35:24.0|2011-04-20 22:35:24.0|0|2011-04-21 00:00:00.0|1100|2222434|2011-04-21 11:00:00.0|0|0|2011-06-29 00:20:10

The combination of col1 and col2 is the key.

I need one unique row per key, based on these two columns.

Code:
eg:

1396745|1078529|KDS|2011-04-21 00:00:00.0|1100|30|2|2011-04-20 22:35:24.0|2011-04-20 22:35:24.0|0|2011-04-21 00:00:00.0|1100|2222434|2011-04-21 11:00:00.0|0|0|2011-06-29 00:20:10

I also need the row with the higher timestamp.

I don't want to load a 25 GB file with duplicates into the DB,
so please suggest an awk/sed command to remove the duplicates.

Thanks
# 2  
Old 06-30-2011
Is the data ordered by the key values (the first and the second field) and the timestamp in the last column?

If so, something like this should work:

Code:
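awk -F\| 'END {
  # print the last row of the final key group
  if (NR)
    print prev
  }
# a new key starts: the previous line was the last row of its group
!key[$1, $2]++ && NR > 1 {
  print prev
  }
{ prev = $0 }' infile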
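
The approach above relies on the input already being ordered by key and timestamp. If it is not, one option is to sort first and then keep the last row of each key group. The following is only a sketch on a made-up four-column sample (the timestamp is field 4 here; it would be field 17 in the layout from post #1), and the variable name `result` is purely illustrative. Typical `sort` implementations spill to temporary files on disk, so this does not need the whole file in memory, though sorting 25 GB will take time and scratch space.

```shell
# Hypothetical unsorted sample: key = fields 1 and 2, timestamp = last field.
result=$(printf '%s\n' \
  '2|9|B|2011-06-29 00:10:10' \
  '1|7|A|2011-06-29 00:20:10' \
  '1|7|A|2011-06-29 00:05:10' |
  # Order by field 1, then field 2, then the timestamp field.
  # LC_ALL=C makes sort compare plain bytes.
  LC_ALL=C sort -t'|' -k1,1 -k2,2 -k4,4 |
  awk -F'|' '
    # The key changed, so the previous line was the newest row of its group.
    NR > 1 && ($1 != k1 || $2 != k2) { print prev }
    { k1 = $1; k2 = $2; prev = $0 }
    # Emit the newest row of the final group.
    END { if (NR) print prev }
  ')
printf '%s\n' "$result"
```

Because the awk stage only remembers the previous line, its memory use stays constant no matter how many distinct keys the file contains.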


Last edited by radoulov; 06-30-2011 at 11:13 AM..
# 3  
Old 06-30-2011
Code:
awk -F'|' '!(key[$1,$2]){print;key[$1,$2]=1}' yourfile
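
A note on how the array key is built: joining the two fields directly (`$1 $2`) can make different key pairs look identical, while `$1, $2` joins them with awk's `SUBSEP` character. The sketch below uses two made-up rows to show the difference; the names `collide` and `distinct` are only for illustration.

```shell
# Two hypothetical rows with different keys: (12, 3) and (1, 23).
# Concatenated without a separator, both keys become the string "123".
collide=$(printf '%s\n' '12|3|first' '1|23|second' |
  awk -F'|' '!seen[$1 $2]++ { n++ } END { print n }')

# With a comma the fields are joined using SUBSEP, so the keys stay distinct.
distinct=$(printf '%s\n' '12|3|first' '1|23|second' |
  awk -F'|' '!seen[$1, $2]++ { n++ } END { print n }')

echo "concatenated key keeps $collide row(s), comma key keeps $distinct row(s)"
```

With numeric IDs of varying length, as in the sample data, the comma form is the safer choice.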


Last edited by pludi; 06-30-2011 at 11:10 AM.. Reason: correction
# 4  
Old 06-30-2011
Quote:
Originally Posted by radoulov
Is the data ordered by the key values (the first and the second field) and the timestamp in the last column?
No, the first two columns are not sorted.
The timestamp (last) column will most likely be sorted, but if picking the higher timestamp makes the script too long, we can drop that requirement.
# 5  
Old 06-30-2011
Well,
ignoring the timestamp in the last column:

Code:
awk -F\| '!key[$1, $2]++' infile
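
For readers unfamiliar with this idiom: the program is a pattern with no action, so awk prints every line for which the pattern is true. A small sketch on made-up rows (the variable `first_seen` and the array name `seen` are only illustrative) shows that it keeps the first row seen for each key, whatever its timestamp.

```shell
# Hypothetical sample: the older row for key (1, 7) comes first.
first_seen=$(printf '%s\n' \
  '1|7|A|2011-06-29 00:05:10' \
  '1|7|A|2011-06-29 00:20:10' \
  '2|9|B|2011-06-29 00:10:10' |
  awk -F'|' '
    # seen[...]++ yields 0 the first time a key appears, so the negation is
    # true only for that first row; later rows with the same key are skipped.
    !seen[$1, $2]++
  ')
printf '%s\n' "$first_seen"
```

Here the 00:05:10 row is kept for key (1, 7), so this variant only returns the newest row if the newest row happens to come first in the file.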

# 6  
Old 06-30-2011
And this should handle the timestamp too:

Code:
awk -F\| 'END {
  for (R in rec)
    print rec[R]
  }
$NF > max[$1, $2] { 
    max[$1, $2] = $NF
    rec[$1, $2] = $0
    }' infile
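
To see this on a small scale, here is a sketch with made-up, unsorted rows (the variable `newest` is only illustrative). Two things are worth keeping in mind: `for (R in rec)` visits keys in an unspecified order, so the sketch pipes the output through `sort` to make it predictable, and the script holds one full row per distinct key in memory, so how well it copes with a 25 GB file depends on how many distinct keys there are.

```shell
# Hypothetical unsorted sample: key = fields 1 and 2, timestamp = last field.
newest=$(printf '%s\n' \
  '2|9|B|2011-06-29 00:10:10' \
  '1|7|A|2011-06-29 00:20:10' \
  '1|7|A|2011-06-29 00:05:10' |
  awk -F'|' '
    # Timestamps in YYYY-MM-DD HH:MM:SS form compare correctly as strings,
    # and any such string is greater than a not-yet-set max entry.
    $NF > max[$1, $2] {
      max[$1, $2] = $NF
      rec[$1, $2] = $0
    }
    # Output order of this loop is unspecified.
    END { for (k in rec) print rec[k] }
  ' | LC_ALL=C sort)
printf '%s\n' "$newest"
```

The 00:20:10 row wins for key (1, 7) even though the older row appears later in the input.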


Last edited by radoulov; 06-30-2011 at 11:23 AM.. Reason: Refactoring.
# 7  
Old 06-30-2011
Quote:
Originally Posted by pludi
Code:
awk -F'|' '!(key[$1,$2]){print;key[$1,$2]=1}' yourfile

Thanks, pludi.
One question: does this return the row with the higher timestamp?