Duplicate files


 
# 1  
Old 10-23-2012

Hi Gents,

I have one file, shown below.

Code:
44571009 100
42381900 101
23482389 102
44571009 103
28849007 104
28765648 105
25689908 106
28765648 107
42381900 108
44571009 109
17298799 110
44571009 111

I would like to get something like this:

Code:
44571009 100 103 109 111
42381900 101 108
28765648 105 107

Many thanks in advance for your support.

Last edited by Scott; 10-23-2012 at 04:45 PM.. Reason: Code tags
# 2  
Old 10-23-2012
Code:
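unset lk lv
( sort in_file
  echo EOF # sentinel line: forces the last group out of the while loop
 ) | while read -r k v
do
 if [ "$k" = "$lk" ]
 then
  lv="$lv $v" # same key as the previous line: append the value
  continue
 fi
 # key changed: print the previous group only if it collected more than one value
 case "$lv" in
  *" "*) echo "$lk $lv" ;;
 esac
 lk="$k" lv="$v"
done >out_file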
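
The same sort-then-scan idea can also be sketched with awk doing the scan, which avoids the sentinel line. This is only a sketch under assumptions: the function name group_sorted is made up here, and the input is taken to be two whitespace-separated columns as in the sample. Only keys that occur more than once are printed, in sorted key order.

```shell
# Sketch only: group_sorted is a hypothetical helper name, not from the thread.
# Reads the named files (or stdin) and assumes two columns: key value.
group_sorted() {
  # Sort by key, then numerically by value, so equal keys are adjacent.
  sort -k1,1 -k2,2n "$@" | awk '
    # Key changed: flush the previous group if it had more than one value.
    $1 != prev { if (n > 1) print line; prev = $1; line = $1; n = 0 }
    # Append the value of every line to the current group.
    { line = line " " $2; n++ }
    # Flush the final group.
    END { if (n > 1) print line }
  '
}
```

Usage would be something like group_sorted in_file >out_file. Only one group is held in memory at a time, so the sort does the heavy lifting.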

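Alternatively, without sorting, you could collect the values in a ksh93/bash (bash 4 or later) associative array, appending each value to the entry for its key. This might not scale as well as the sort, since every key and value is held in memory. The output comes out in hash order, not in input or sorted order.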
Code:
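typeset -A ht
while read -r k v
do
 ht[$k]="${ht[$k]} $v" # append the value to this key's list
done <in_file
for k in "${!ht[@]}"
do
 # each stored value has a leading space, so a second space means the key repeated
 case "${ht[$k]}" in
  " "*" "*) echo "$k${ht[$k]}" ;;
 esac
done >out_file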
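
If hash order is a problem, a variation along these lines keeps the keys in order of first appearance and prints only the repeated ones. It is a sketch only, assuming bash 4 or later and two columns per line; group_dups is a hypothetical name, not something from this thread.

```shell
#!/usr/bin/env bash
# Sketch only: group_dups is a hypothetical helper name.
# Requires bash 4+ (associative arrays); reads "key value" lines from stdin.
group_dups() {
  local -A vals=()   # key -> space-separated list of values
  local -a order=()  # keys in order of first appearance
  local k v
  while read -r k v; do
    [ -n "$k" ] || continue             # skip blank lines
    if [ -z "${vals[$k]+set}" ]; then   # first time this key is seen
      order+=("$k")
      vals[$k]=$v
    else
      vals[$k]+=" $v"                   # repeated key: append the value
    fi
  done
  for k in "${order[@]}"; do
    # a space in the value list means the key occurred more than once
    case "${vals[$k]}" in
      *" "*) printf '%s %s\n' "$k" "${vals[$k]}" ;;
    esac
  done
}
```

Called as group_dups <in_file >out_file, this should give the three lines asked for in the first post when fed the sample data, since the repeated keys first appear there in that order.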

Arrays (Learning the Korn Shell, 2nd Edition)

Last edited by DGPickett; 10-23-2012 at 06:00 PM..
This User Gave Thanks to DGPickett For This Post:
# 3  
Old 10-23-2012
also try:
Code:
awk '{a[$1] = a[$1] " " $2;}END{for(i in a)if(a[i] ~ / .* /)print i a[i];}' infile | sort -nr

This User Gave Thanks to rdrtx1 For This Post:
# 4  
Old 10-23-2012
Code:
perl -alne '$h{$F[0]} .= " $F[1]"; END { for $i (sort keys %h) { print "$i$h{$i}" if $h{$i} =~ / .* / } }' input_file

This User Gave Thanks to msabhi For This Post: