Deduping file


 
# 1  
Old 04-13-2008

I could really use some help!

I need to dedup a file (let's call it Test_File.dat) by the first field and then concatenate the values of one field within it. I guess you could say I need to normalize the file.

Test_File.dat looks something like:
aaa|bbb|ccc|123|ddd
abc|def|ghi|123|jkm
aaa|bbb|ccc|456|ddd
abc|def|ghi|456|jkm
aaa|bbb|ccc|789|ddd
abc|def|ghi|789|jkm
I need it to be:
aaa|bbb|ccc|123;456;789|ddd
abc|def|ghi|123;456;789|jkm
I've been looking into commands like "awk" and "sort", but I'm above my pay grade.

Thanks,
-Clueless
# 2  
Old 04-13-2008
Do you need to preserve sort order, or could you start by sorting the file? That would make it a little bit less challenging, and reduce memory needs. The other way is also doable, but not quite as elegant.

Code:
sort file |
awk 'BEGIN { FS = OFS = "|" }
{ key=$1 OFS $2 OFS $3 OFS $5;
  if (key == prev) { four=four (four ? ";" : "") $4 }
  else { if (four) print prev, four; four = $4 }
  prev = key }
END { if (four) print prev, four }'

I cheated and rearranged the field order for simplicity. Maybe you can figure out how to move the fourth field back to the fourth position.

Tested on Ubuntu mawk; if you have another awk, minor changes may be required.
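
One possible way to move the joined values back to the fourth position, offered as a sketch rather than a drop-in fix: keep fields 1-3 and field 5 in separate variables and print the list between them. The function name dedup_sorted and the scratch file sample_input.dat are made up for this example. It also counts the collected values instead of testing the string, so a value of "0" is not mistaken for "nothing collected". As with the version above, records are only merged when they end up adjacent after the sort.

```shell
# Hypothetical helper: sort the file, then merge adjacent records that match
# on fields 1, 2, 3 and 5, joining their field-4 values with ";".
dedup_sorted() {
  sort "$1" |
  awk 'BEGIN { FS = OFS = "|" }
  {
    key = $1 OFS $2 OFS $3 OFS $5
    if (NR > 1 && key != prev) {      # new group: flush the finished one
      print head, vals, tail
      vals = ""; n = 0
    }
    vals = (n++ ? vals ";" : "") $4   # count values so "0" or "" is not lost
    head = $1 OFS $2 OFS $3           # fields 1-3, printed before the list
    tail = $5                         # field 5, printed after the list
    prev = key
  }
  END { if (NR > 0) print head, vals, tail }'
}

# Sample data from the first post, written to a scratch file.
printf '%s\n' \
  'aaa|bbb|ccc|123|ddd' 'abc|def|ghi|123|jkm' \
  'aaa|bbb|ccc|456|ddd' 'abc|def|ghi|456|jkm' \
  'aaa|bbb|ccc|789|ddd' 'abc|def|ghi|789|jkm' > sample_input.dat

dedup_sorted sample_input.dat
# aaa|bbb|ccc|123;456;789|ddd
# abc|def|ghi|123;456;789|jkm
```

The values in the list come out in sort order, not in the order they appeared in the file.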

Last edited by era; 04-13-2008 at 03:16 AM.. Reason: Oops, error in else clause, would drop first $4
# 3  
Old 04-13-2008
Quote:
but I'm above my pay grade.
I don't quite understand this ...

Here is something in Perl:
Code:
#! /opt/third-party/bin/perl
use strict;
use warnings;

# Key: fields 1, 2, 3 and 5 joined by "|". Value: list of field-4 values.
my %fileHash;

open(my $fh, "<", "sample.txt") or die "Unable to open file <$!>\n";

while (<$fh>) {
  chomp;
  # Limit -1 keeps trailing empty fields.
  my @arr = split(/\|/, $_, -1);
  next if @arr < 5;    # skip blank or short lines
  my $key = join("|", @arr[0, 1, 2, 4]);
  push @{ $fileHash{$key} }, $arr[3];
}

close($fh);

# Hash order is arbitrary, so the records may come out in any order.
foreach my $k (keys %fileHash) {
  my @arr = split(/\|/, $k, -1);
  my $val = join(";", @{ $fileHash{$k} });
  print join("|", @arr[0 .. 2], $val, $arr[3]), "\n";
}

exit(0);

# 4  
Old 04-13-2008
If the order doesn't matter:

Code:
awk 'END { for (k in x) 
print k, x[k], y[k] }
{ x[$1FS$2FS$3] = x[$1FS$2FS$3] ? x[$1FS$2FS$3] ";" $4 : $4
y[$1FS$2FS$3] = $5 } ' FS=\| OFS=\| Test_File.dat

Otherwise:

Code:
awk 'END { for (i=1; i<=NR; i++)
if (w[i]) 
  print  w[i], x[w[i]], y[w[i]] } 
!z[$1FS$2FS$3]++ { w[NR] = $1FS$2FS$3 }
{ x[$1FS$2FS$3] = x[$1FS$2FS$3] ? x[$1FS$2FS$3] ";" $4 : $4
y[$1FS$2FS$3] = $5 }' FS=\| OFS=\| Test_File.dat

Use nawk or /usr/xpg4/bin/awk on Solaris.
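
For anyone who finds the compact version hard to follow, here is a spelled-out sketch of the same order-preserving idea with longer names. It is meant to behave like the second command above, not to replace it. The function name dedup_keep_order, the array names and the scratch file sample_input.dat are invented for this example. It remembers each key the first time it is seen, appends every field-4 value to that key's list, and prints the keys in first-seen order at the end.

```shell
# Hypothetical helper: one pass, no sort. Output follows the order in which
# each combination of fields 1-3 first appears in the file.
dedup_keep_order() {
  awk -F'|' -v OFS='|' '
  {
    key = $1 FS $2 FS $3
    if (!(key in seen)) {      # first record for this key
      seen[key] = 1
      order[++n] = key         # remember where it appeared
      vals[key] = $4
    } else {                   # later record: extend the list
      vals[key] = vals[key] ";" $4
    }
    last[key] = $5             # the last field 5 seen for this key wins
  }
  END {
    for (i = 1; i <= n; i++)
      print order[i], vals[order[i]], last[order[i]]
  }' "$1"
}

# Sample data from the first post, written to a scratch file.
printf '%s\n' \
  'aaa|bbb|ccc|123|ddd' 'abc|def|ghi|123|jkm' \
  'aaa|bbb|ccc|456|ddd' 'abc|def|ghi|456|jkm' \
  'aaa|bbb|ccc|789|ddd' 'abc|def|ghi|789|jkm' > sample_input.dat

dedup_keep_order sample_input.dat
# aaa|bbb|ccc|123;456;789|ddd
# abc|def|ghi|123;456;789|jkm
```

Everything is held in memory until the END block runs, which is the trade-off against the sort-first approach.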
# 5  
Old 04-13-2008
Thanks guys, I'm going to try some of these solutions right now!

The file layout order matters, but not the record order; meaning the end product could be:

aaa|bbb|ccc|123;456;789|ddd
abc|def|ghi|123;456;789|jkm

or

abc|def|ghi|123;456;789|jkm
aaa|bbb|ccc|123;456;789|ddd

...and the "pay grade" thing is just a joke. It means that this type of coding is out of my league (i.e. I'm just a low level coder).
# 6  
Old 04-13-2008
Quote:
Originally Posted by clueless181
...and the "pay grade" thing is just a joke. It means that this type of coding is out of my league (i.e. I'm just a low level coder).
... or a student with homework :)
# 7  
Old 04-13-2008
era/radoulov

Thanks! They both work great.

Would it be too much trouble to get a layman's description of what you did?

Last edited by clueless181; 04-13-2008 at 12:09 PM.. Reason: "nawk" worked when "awk" didn't
 