Removing duplicate files from list with different path


 
# 1  
Old 05-12-2005

I have a list of all the jar files shipped with the product I work on. Some jar files appear in this list more than once, each time in a different folder.
My input file looks like this:

Code:
/path/1/to a.jar
/path/2/to a.jar
/path/1/to b.jar
/path/1/to c.jar
/path/1/to d.jar
/path/2/to c.jar

Now I need to remove the duplicate entries, i.e. remove the extra a.jar, c.jar and any others likewise.

The final list would look like this:

Code:
/path/1/to a.jar
/path/1/to b.jar
/path/2/to c.jar
/path/1/to d.jar

This is the script I have so far:
Code:
#! /bin/sh
cp jar.txt jarnew.txt
for file in $(cat jar.txt)
do
        FILE1=`basename $file`
        for dup in $(cat jar.txt)
        do
        FILE2=`basename $dup`
        if [ "$file" != "$dup" -a "$FILE1" == "$FILE2" ] ; then
        sed -e '/($dup)/ d' <jarnew.txt >jarnew.txt.tmp
        echo "$FILE1 $FILE2"
        mv jarnew.txt.tmp jarnew.txt
        fi;
        done
done

But with this, the main functionality of the script is not working, i.e. the if condition is not working as required. It could be a problem with the sed command or with the logic.

The resulting jarnew.txt is the same as jar.txt.

Any pointers on how to proceed?

Vino
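
Two things in the script above look like likely culprits. The sed expression is in single quotes, so the shell never expands $dup and sed searches for the literal text ($dup), which matches no line; that alone would leave jarnew.txt identical to jar.txt. Separately, the nested loop visits every duplicate pair in both orders, so once the deletion works it would remove both copies rather than just the extra one. Also note that = is the portable string comparison inside [ ]; == is not accepted by every sh. A minimal sketch of the quoting difference, using made-up paths in a temporary file (the paths and variable names here are assumptions for illustration, not taken from the real list):

```shell
#!/bin/sh
# Hypothetical sample data, for illustration only.
list=$(mktemp)
printf '%s\n' /p/1/a.jar /p/2/a.jar > "$list"
dup=/p/2/a.jar

# Single quotes: $dup is NOT expanded, so sed looks for the literal
# text "($dup)" and deletes nothing.
single=$(sed -e '/($dup)/ d' "$list" | wc -l | tr -d ' ')

# Exact-line removal with the variable expanded by the shell.
# grep -F treats the path as a fixed string, so slashes and dots
# need no escaping; -x matches whole lines only; -v inverts the match.
fixed=$(grep -vxF -- "$dup" "$list" | wc -l | tr -d ' ')

echo "lines left with single-quoted sed: $single"
echo "lines left with grep -vxF:         $fixed"
rm -f "$list"
```

The first count stays at the full size of the list, while the second drops by one.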
# 2  
Old 05-12-2005
Try this script:

Code:
#!/bin/sh
> final.jar
while read line; do
    FILE=`basename $line`
    DIR=`echo $line | awk '{ print $(NF-1) }'`
    if [[ $FILE == "c.jar" && $DIR == "2" ]]
    then
        echo $line >> final.jar
    elif [[ $DIR == "1" && $FILE != "c.jar" ]]
    then
        echo $line >> final.jar
    fi
done < jar.txt

This should give you the required result. Check it.
# 3  
Old 05-12-2005
on the assumption that you only have filenames in the list ...

Code:
sort -t"/" -u -k4 jar.txt > jarnew.txt

# 4  
Old 05-12-2005
Muthu,

I don't intend to hard-code any particular jar file, the way you have in
if [[ $FILE == "c.jar" && $DIR == "2" ]]

Rather, I would like it generalized.

How would you go about that?

Vino
# 5  
Old 05-12-2005
Scripts are generally written around the pattern in the data. In your sample, only c.jar is taken from path/2/, so I reproduced exactly that mapping from your input to the required output.

Your input and output don't show a general rule, which is why the script uses specific file names.
# 6  
Old 05-12-2005
There are a great many jar files. If I had to pick them out manually, I might as well do away with the script.

I need to pick the jar files out of the list dynamically.

Vino
# 7  
Old 05-12-2005
JustIce,

I don't think sort is a workable solution. The path length varies, i.e. the directory structure differs from file to file: some paths have a depth of 3, others a depth of more than 3.


Vino
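
Since the path depth varies, one option is to key on the last /-separated component of each line rather than on a fixed field position. A minimal sketch, assuming the list is in jar.txt as in the thread and that keeping the first occurrence of each name is acceptable (the sample output earlier in the thread keeps the /path/2 copy of c.jar, which this would not reproduce); dedupe_by_name is a hypothetical helper name:

```shell
#!/bin/sh
# dedupe_by_name (hypothetical helper): print each line of the list file
# whose last "/"-separated component has not been seen on an earlier line.
# -F/ splits on "/", so $NF is the final component whatever the depth.
# seen[$NF]++ is 0 (false) the first time a name appears, so !seen[...]++
# is true only for the first occurrence, and awk prints that line.
dedupe_by_name() {
    awk -F/ '!seen[$NF]++' "$1"
}

# Usage with the file names from the thread:
#   dedupe_by_name jar.txt > jarnew.txt
```

This makes a single pass over the list and preserves the original order, unlike sort -u, which reorders the lines.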