List Duplicate


 
# 15  
Old 06-29-2007
Aigles,
As I said:
Quote:
...arrays should be used with caution, especially when several
thousands of occurrences are involved.
# 16  
Old 06-29-2007
Bug Duplicate

Dear Guru,
Thanks a lot for your efforts. I love seeing the result, and you have surely devoted a good deal of time to my question. Hats off to you for your in-depth
reply, which will increase not only my knowledge but also my respect for
you and this forum.
# 17  
Old 09-20-2007
Question for the experts (array and without array)

inputfile:

1 aaa 1/01/1975 delhi
2 bbb 2/03/1977 mumbai
3 ccc 2/03/1977 mumbai
4 ddd 1/01/1975 chennai
5 aaa 1/01/1975 kolkatta
6 bbb 2/03/1977 bangalore


program1:

sort -k2,3 inputfile | \
awk '
BEGIN { first_duplicate = 1 }
{
name = $2;
dob = $3;
if (name == prv_name && dob == prv_dob) {
if (first_duplicate)
print "\n" prv_rec;
print $0;
first_duplicate = 0;
} else {
prv_name = name;
prv_dob = dob;
prv_rec = $0;
first_duplicate = 1;
}
}
'

Result:
>

1 aaa 1/01/1975 delhi
5 aaa 1/01/1975 kolkatta

2 bbb 2/03/1977 mumbai
6 bbb 2/03/1977 bangalore
>

Questions:

How do I direct the result to an output file, given the code in program 1?

------------------------------------------------------------------------

Program 2:

# The sort is now: sort -k3,3 inputfile

sort -k3,3 inputfile | \
awk '
BEGIN { first_duplicate = 1 }
{
name = $2;
dob = $3;
# The test (name == prv_name && dob == prv_dob) becomes (name != prv_name && dob == prv_dob)
if (name != prv_name && dob == prv_dob) {
if (first_duplicate)
print "\n" prv_rec;
print $0;
first_duplicate = 0;
} else {
prv_name = name;
prv_dob = dob;
prv_rec = $0;
first_duplicate = 1;
}
}
'

Result:

>

1 aaa 1/01/1975 delhi
4 ddd 1/01/1975 chennai

2 bbb 2/03/1977 mumbai
3 ccc 2/03/1977 mumbai
>

Questions:

What code changes are needed for program 3 to give results similar to those of program 1 and program 2?

Experts, please help!



Program 3 code:


nawk '{
idx= $2 SUBSEP $3
arr[idx] = (idx in arr) ? arr[idx] ORS $0 : $0
arrCnt[idx]++
}
END {
for (i in arr)
if (arrCnt[i] > 1) print arr[i]
}' myInputfile
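
As far as I can tell, Program 3 as posted already yields the same groups as Program 1, only without the blank line between groups and in whatever order awk walks the array. For the Program 2 style result, one possible adaptation is sketched below. It assumes the goal is "records that share a date of birth but have different names, keeping only the first record per name". The array names (seen, names, recs) and the output file name are made up for this sketch, and plain awk is used instead of nawk.

```shell
# Recreate the sample input from the question.
cat > inputfile <<'EOF'
1 aaa 1/01/1975 delhi
2 bbb 2/03/1977 mumbai
3 ccc 2/03/1977 mumbai
4 ddd 1/01/1975 chennai
5 aaa 1/01/1975 kolkatta
6 bbb 2/03/1977 bangalore
EOF

awk '
{
    dob = $3
    key = $2 SUBSEP $3
    if (!(key in seen)) {        # first record for this name within this dob
        seen[key] = 1
        names[dob]++             # count distinct names per dob
        recs[dob] = (dob in recs) ? recs[dob] ORS $0 : $0
    }
}
END {
    for (d in recs)
        if (names[d] > 1)
            print recs[d] ORS    # trailing ORS leaves a blank line after each group
}' inputfile > samedob_array.out
```

The order of the groups is not guaranteed, because "for (d in recs)" visits keys in an unspecified order.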
# 18  
Old 09-20-2007
Program 1: after the last ' add:
Code:
 > newfile
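
Put together, that would look roughly like the sketch below. It recreates the sample input from post #17 so it can be run as is; output.file is just the name used in the question, any file name will do.

```shell
# Recreate the sample input from the question.
cat > inputfile <<'EOF'
1 aaa 1/01/1975 delhi
2 bbb 2/03/1977 mumbai
3 ccc 2/03/1977 mumbai
4 ddd 1/01/1975 chennai
5 aaa 1/01/1975 kolkatta
6 bbb 2/03/1977 bangalore
EOF

# Program 1 unchanged; the redirection goes after the closing quote.
sort -k2,3 inputfile |
awk '
BEGIN { first_duplicate = 1 }
{
    name = $2
    dob  = $3
    if (name == prv_name && dob == prv_dob) {
        if (first_duplicate)
            print "\n" prv_rec
        print $0
        first_duplicate = 0
    } else {
        prv_name = name
        prv_dob  = dob
        prv_rec  = $0
        first_duplicate = 1
    }
}
' > output.file
```

The redirection applies to awk, the last command in the pipeline, so everything awk prints ends up in output.file.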

# 19  
Old 09-24-2007
Hi

I suppose your requirement is this:
input (file a):
Code:
1 aaa 1/01/1975 delhi
2 bbb 2/03/1977 mumbai
3 aaa 1/01/1976 mumbai
4 bbb 2/03/1975 chennai
5 aaa 1/01/1975 kolkatta
6 bbb 2/03/1977 bangalore

output:

Code:
2 bbb 2/03/1977 mumbai
6 bbb 2/03/1977 bangalore

1 aaa 1/01/1975 delhi
5 aaa 1/01/1975 kolkatta

code:
Code:
awk '{
# count the records per name+dob key and collect them, one per line
num[$2$3]=num[$2$3]+1
str[$2$3]=str[$2$3]"\n"$0
}
END{
# print only the keys seen at least twice
for (i in num)
if (num[i]>=2)
print str[i]
}' a
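
A side note on the array key: $2$3 joins the two fields with nothing in between, so two different field pairs can in principle collapse into the same key. Program 3 in post #17 avoids this by putting SUBSEP between the fields. The sketch below uses two made-up records (not from this thread) only to show the difference; keys.out is a hypothetical file name.

```shell
# Two made-up records whose fields 2 and 3 differ but concatenate to "abc".
printf '%s\n' '1 ab c x' '2 a bc y' |
awk '{
    plain[$2 $3]++           # key built by plain concatenation
    safe[$2 SUBSEP $3]++     # key built with SUBSEP between the fields
}
END {
    for (k in plain) np++
    for (k in safe)  ns++
    print "plain keys:", np
    print "SUBSEP keys:", ns
}' > keys.out
```

With the names and dates in this thread a collision is unlikely, so this is a precaution rather than a bug fix.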

# 20  
Old 09-24-2007
Code:
#! /opt/third-party/bin/perl

open(FILE, "<", "input") or die "Cannot open input: $!";

while(<FILE>) {
  chomp;
  my @arr = split(/ /);
  if( defined($fileHash{$arr[1].$arr[2]}) ) {
    $fileHash{$arr[1].$arr[2]} .= ("##" . $_);
  }
  else {
    $fileHash{$arr[1].$arr[2]} = $_;
  }
}

close(FILE);

foreach my $k ( keys %fileHash ) {
  $v = $fileHash{$k};
  if( $v =~ /##/ ) {
    my @arr = split(/##/, $v);
    foreach my $a ( @arr ) {
      print "$a\n";
    }
  }
}

exit 0

This should be even faster :)
# 21  
Old 09-24-2007
Please help!

Thank you all for your reply!!!

I am interested in how to make this work. Please read carefully!

inputfile:

1 aaa 1/01/1975 delhi
2 bbb 2/03/1977 mumbai
3 ccc 2/03/1977 mumbai
4 ddd 1/01/1975 chennai
5 aaa 1/01/1975 kolkatta
6 bbb 2/03/1977 bangalore

program2 code:

sort -k2,3 inputfile | \
awk '
BEGIN { first_duplicate = 1 }
{
name = $2;
dob = $3;
if (name == prv_name && dob == prv_dob) {
if (first_duplicate)
print "\n" prv_rec;
print $0;
first_duplicate = 0;
} else {
prv_name = name;
prv_dob = dob;
prv_rec = $0;
first_duplicate = 1;
}
}
'

What changes are needed to program 2 to give me the following result:

>

1 aaa 1/01/1975 delhi
4 ddd 1/01/1975 chennai

2 bbb 2/03/1977 mumbai
3 ccc 2/03/1977 mumbai
>

Many thanks!
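
One possible set of changes, going by the notes in post #17: sort on the date of birth only (-k3,3) and flip the name test to !=. The sketch below is not a definitive answer. It assumes sort lists records with equal keys in the order 1, 4, 5 and 2, 3, 6 (GNU sort falls back to comparing whole lines), and it compares each record only with the first record of its date group. The output file name samedob.out is made up.

```shell
# Recreate the sample input from the question.
cat > inputfile <<'EOF'
1 aaa 1/01/1975 delhi
2 bbb 2/03/1977 mumbai
3 ccc 2/03/1977 mumbai
4 ddd 1/01/1975 chennai
5 aaa 1/01/1975 kolkatta
6 bbb 2/03/1977 bangalore
EOF

# Change 1: sort on field 3 (date of birth) only.
sort -k3,3 inputfile |
awk '
BEGIN { first_duplicate = 1 }
{
    name = $2
    dob  = $3
    # Change 2: same dob, but a name different from the first record of the group.
    if (name != prv_name && dob == prv_dob) {
        if (first_duplicate)
            print "\n" prv_rec
        print $0
        first_duplicate = 0
    } else {
        prv_name = name
        prv_dob  = dob
        prv_rec  = $0
        first_duplicate = 1
    }
}
' > samedob.out
```

Because prv_name is only updated in the else branch, the result depends on which record comes first within each date, so a different tie order from sort could give a different listing.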