Shellscript to sort duplicate files listed in a text file


 
# 1  
Old 10-28-2009
Shellscript to sort duplicate files listed in a text file

I have many PDFs scattered across 4 machines, and one location where I maintain my other PDFs. The issue is that the 4 machines may hold duplicate PDFs among themselves, but I want just one copy of each so that they can be transferred to that one location.

What I have thought of so far:
1) I have written a script that scans each of the 4 machines and writes the list of PDF files to a text file named list.txt.
2) So now I have all the PDFs listed in list.txt.
3) I need a shell script that will check this list and group the duplicate files together, so that I know where they are located.
list.txt contains the full path along with the file name, so I guess we only have to compare the file name part at the end of each path, before ".pdf".
Please help me do this.


The list.txt file, which is already generated, looks like this:

Code:
/home/santosh/z_literature/MIF_Oxime_ph4_JBC_May2007.pdf
/home/santosh/z_literature/J_immun_biochemOFmif.pdf
/home/santosh/z_literature/sak/san/06_JCTC_06_bome.pdf
/home/santosh/z_literature/sak/san/03_IEJMD_05_nkr1.pdf
/home/santosh/z_literature/sak/san/07_JCAMD_06_CoRIA.pdf
/home/santosh/z_literature/sak/san/DDP-IV-JMM2007.pdf
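
As a first check on a list like this, it can help to see which file names occur more than once at all. The following is only a minimal sketch: the function name dup_names is made up for illustration, and it assumes that two entries count as duplicates when the last "/"-separated component of their paths is identical (since every entry ends in ".pdf", comparing the whole last component is equivalent to comparing the part before ".pdf").

```shell
# dup_names FILE: print each base filename that appears on more than one line.
# (dup_names is a hypothetical helper name, not something from the thread.)
dup_names() {
    # Keep only the last "/"-separated field, then let uniq -d report
    # the names that occur at least twice in the sorted list.
    awk -F/ '{ print $NF }' "$1" | sort | uniq -d
}
```

Calling it as `dup_names list.txt` prints one duplicated file name per line, without the locations.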

# 2  
Old 10-28-2009

Copy them all to the single location; duplicates will be overwritten.

Then rm all the other PDFs.
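
A rough sketch of that copy step, driven by the existing list, might look like the following. It assumes that every path in the list is readable from the machine running the loop (for example over a network mount), which the thread does not state, and copy_listed is just an illustrative name. Keep in mind that two files with the same name are not necessarily identical in content; whichever is copied last wins.

```shell
# copy_listed LIST DEST: copy every path named in LIST into directory DEST.
# A later file with the same base name overwrites an earlier one.
# (copy_listed is a hypothetical helper name.)
copy_listed() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue    # skip blank lines
        cp -- "$path" "$2"/
    done < "$1"
}
```

For example, `copy_listed list.txt /path/to/collection` would leave one file per distinct name in the target directory (the target path here is a placeholder).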
# 3  
Old 10-28-2009
If you want to do it using a script
Code:
cat abc.txt
/home/santosh/z_literature/MIF_Oxime_ph4_JBC_May2007.pdf
/home/santosh/z_literature/J_immun_biochemOFmif.pdf
/home/santosh/z_literature/sak/san/06_JCTC_06_bome.pdf
/home/santosh/z_literature/sak/san/03_IEJMD_05_nkr1.pdf
/home/santosh/z_literature/sak/san/07_JCAMD_06_CoRIA.pdf
/home/santosh/z_literature/sak/san/DDP-IV-JMM2007.pdf
/home/santosh/z_literature/sak/san/06_JCTC_06_bome.pdf
/home/santosh/y_literature/sak/san/06_JCTC_06_bome.pdf

Use inline Perl:
Code:
perl -e '
my %hash;
while (my $full_filename = <>) {
    chomp $full_filename;
    my @cols = split "/", $full_filename;
    # group every full path under its base file name (the last path component)
    push @{ $hash{ $cols[-1] } }, $full_filename;
}
print "-" x 80, "\n";
foreach my $fn (keys %hash) {
    print "$fn\n";
    print "$_\n" for @{ $hash{$fn} };
    print "-" x 80, "\n";
}' abc.txt

Added output formatting for readability.

Replace abc.txt with whatever file you have.

HTH,
PL
# 4  
Old 10-28-2009
Another approach:

Code:
awk -F"/" 'a[$NF]{print a[$NF];print $0;next}{a[$NF]=$0}' file

# 5  
Old 10-28-2009
Thanks Franklin52, your script did the trick!
Thanks a lot to the others who helped as well; I really appreciate the time you spent on the script!
# 6  
Old 10-28-2009
Franklin52, could you explain your command line?

I don't understand what the value of the array a is.

Thanks
# 7  
Old 10-28-2009
Quote:
Originally Posted by protocomm
Franklin52, could you explain your command line?

I don't understand what the value of the array a is.

Thanks
Code:
awk -F"/" 'a[$NF]{print a[$NF];print $0;next}{a[$NF]=$0}' file

Explanation:

Code:
{a[$NF]=$0}

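Each element of array a holds a whole input line ($0); its index is the file name, which is the last "/"-separated field of that line ($NF). This block only runs the first time a file name is seen, because of the "next" in the block before it.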


Code:
a[$NF]{print a[$NF];print $0;next}

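If the file name of the current line ($NF) is already stored in array a, print the saved line held in a[$NF] and then the current line, and skip to the next input line (next) so that the saved line is not overwritten.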

I hope this helps.
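
One thing to be aware of with the one-liner: a[$NF] is filled only the first time a file name is seen, so when a name occurs three or more times the saved first path is printed again for every further copy, and the duplicates of different files can be interleaved in the output. If that matters, a variant along the following lines may be worth trying. It is only a sketch, not part of the solution posted above: the function name group_dupes and the output layout (file name, then its paths indented by two spaces) are my own choices.

```shell
# group_dupes FILE: print each base filename that occurs more than once,
# followed by every path in FILE that ends in it.
# (group_dupes is a hypothetical helper name.)
group_dupes() {
    awk -F/ '
        {
            count[$NF]++                            # occurrences of this base filename
            paths[$NF] = paths[$NF] "  " $0 "\n"    # accumulate the full paths
        }
        END {
            for (name in count)                     # iteration order is unspecified in awk
                if (count[name] > 1)
                    printf "%s\n%s", name, paths[name]
        }
    ' "$1"
}
```

Run as `group_dupes list.txt`. Each path is listed exactly once under its file name; the order in which the groups appear is not guaranteed, so pipe the list through sort first if a stable order is needed.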
