getting rid of duplicate files


 
# 1  
Old 12-09-2005

I have a bad problem with multiple occurrences of the same file in
different directories. How this happened I am not sure, but I know
that I can use awk to scan multiple directory trees for occurrences
of the same file name. Some of these files differ somewhat, but that
does not matter: the names are the same and the content is basically
the same.
I have seen an awk script that can be run on the command line using
a syntax along the lines of var=file:r and dup=var++ and var < 1,
but I cannot remember exactly how it works. I am using the C shell.
I need to find the names that occur more than once and remove the
extras, leaving one occurrence of each.
Any examples or clues as to how to piece this together would be
appreciated, since I don't use awk that often.
moxxx68
# 2  
Old 12-09-2005
Start with something like this to find the names that are actually duplicated.
Then use the resulting file to look up the full paths for those names.
Code:
find /path -print -exec basename {} \; | awk 'arr[$0]++' > file
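
For the second step (going from the list of names back to full paths), here is a minimal sketch. The function name paths_for_names is hypothetical, and it assumes the names file holds one basename per line and that no file name contains a newline.

```shell
#!/bin/sh
# paths_for_names NAMES_FILE DIR   (hypothetical helper, not from the thread)
# Print the full path of every regular file under DIR whose basename
# is listed in NAMES_FILE. Assumes one name per line, no newlines in names.
paths_for_names() {
    # awk reads NAMES_FILE first and remembers each name, then reads the
    # find output from stdin ("-") and prints paths whose last "/" field
    # (the basename) was remembered.
    find "$2" -type f |
        awk -F/ 'NR == FNR { want[$0]; next } $NF in want' "$1" -
}
```

Usage would be something like: paths_for_names file /path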

# 3  
Old 12-09-2005
Thanks,
will try!
moxxx68
# 4  
Old 12-09-2005
Too many occurrences to rm manually!
Will this work?

find ./path -iname "basename" | awk 'BEGIN{arr[$0]++}{var=[$0];var > 0; var++}END{i=[var++]}' | xargs mv --target-directory=dup-dir

%rm dup-dir

It looks like it would work if I could use an array to count the
occurrences starting from 1 instead of 0. The extra copies could be
in any directory, but the way the tree is configured it doesn't
really matter which one survives, as long as I have one occurrence
left.

I could really use some help.
moxxx68
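
On the counting question: arr[x]++ evaluates to the old count, so it is 0 (false) the first time a name is seen and non-zero afterwards. That already gives the "skip the first, catch the rest" behaviour without any extra variables. A small sketch, with hypothetical function names and assuming file names contain no newlines:

```shell
#!/bin/sh
# list_extras DIR: print the paths of the 2nd, 3rd, ... file seen under
# each basename (the candidates for removal).
list_extras() {
    # -F/ makes $NF the basename; the post-increment is 0 on first sight,
    # so the first path for each name is not printed.
    find "$1" -type f | awk -F/ 'seen[$NF]++'
}

# list_keepers DIR: print only the first path seen for each basename.
list_keepers() {
    find "$1" -type f | awk -F/ '!seen[$NF]++'
}
```

Which copy counts as "first" depends on the order in which find walks the tree, so this only fits the case described here, where any one copy may survive.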
# 5  
Old 12-09-2005
find ./path -print -exec basename {} \; | awk -v var=arr 'arr[$0]++;i >= 2; i$var' | xargs mv --target-directory=test..


This worked to a certain extent, although I am getting some error
messages from the cp command, and the basename is contiguous across
all names of the same type (e.g. file.txt-{1,2,3,4}). I am not sure
whether this gives the exact result as far as leaving one file goes,
although when I diffed two test directories against each other it
seemed really close. Please confirm which parts of the syntax (if
any) are correct.
moxxx68
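
One likely source of trouble with moving everything into a single directory is that the extras all share a name, so they collide with each other on arrival. A hedged sketch that sidesteps this by adding a numeric suffix follows. The function name move_extras and the NAME.N naming scheme are assumptions, not something from the thread; it also assumes the destination lies outside the tree being scanned and that no file name contains a newline.

```shell
#!/bin/sh
# move_extras SRC DEST   (hypothetical helper)
# Keep the first file seen under each basename in SRC; move every later
# same-named file into DEST as NAME.1, NAME.2, ... so nothing is overwritten.
move_extras() {
    src=$1 dest=$2
    mkdir -p "$dest" || return 1
    list=$(mktemp) || return 1
    # Collect the extras first, so the tree is not modified while find walks it.
    find "$src" -type f | awk -F/ 'seen[$NF]++' > "$list"
    while IFS= read -r path; do
        name=${path##*/}
        n=1
        # Pick the first unused numeric suffix in DEST.
        while [ -e "$dest/$name.$n" ]; do n=$((n + 1)); done
        mv -- "$path" "$dest/$name.$n"
    done < "$list"
    rm -f "$list"
}
```

After checking the contents of the destination directory (for example with move_extras ./path ./dup-dir), it can be removed with rm -r.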