Help finding non duplicates


 
# 1  
Old 06-01-2011

I am currently creating a script to find filenames that are listed only once in an input file (i.e., the non-duplicates) and then report those single files in another file. Here is the function I have so far:

Code:
function dups_filenames
{
    file2=""
    file1=""
    file=""
    dn=""
    ch=""
    pn=""

    # file1 holds the previous line's filename and file2 the one before
    # that; a filename is treated as single when it differs from both.
    while read file dn ch pn
    do
        if [[ $file != $file1 && $file1 != $file2 ]]; then
            echo "FILE \t\t\t\t\t CHECKSUM" >> "$dirs"_"$host"_singlefilelog
            echo "---- \t\t\t\t\t --------" >> "$dirs"_"$host"_singlefilelog
            printf "%-40s%-50s\n" $file1 $ch1 >> "$dirs"_"$host"_singlefilelog
            printf "%-20s%-20s\n" "PATH-> "$dn1 >> "$dirs"_"$host"_singlefilelog
            echo >> "$dirs"_"$host"_singlefilelog
        fi

        # shift the two-line window before reading the next line
        file2=$file1
        file1=$file
        dn1=$dn
        ch1=$ch
        pn1=$pn
    done < "$dirs"_"$host"_filelists
}

"$dirs"_"$host"_singlefilelog = the output file

The above code does find single occurrences, but it does not report single files at the bottom of the text file ("$dirs"_"$host"_filelists). If I printf "$file" instead of "$file1", the output does not report the top file in the text file.

Any suggestions?
Thank you
# 2  
Old 06-02-2011
Quote:
Originally Posted by chipblah84
The above code does find single occurrences, but it does not report single files at the bottom of the text file ("$dirs"_"$host"_filelists).
Not sure what you're conveying here. What does "but it does not report single files..." mean?
Quote:
Originally Posted by chipblah84
If I printf "$file" instead of "$file1", the output does not report the top file in the text file.
Yes. It's because you assigned a null value to file1 (file1=""), so nothing is printed to the output file when the first line of "$dirs"_"$host"_filelists is read. But when the 2nd and subsequent lines are read, the values are assigned as below in the script, so the subsequent filenames get reported/printed:
Code:
...
file2=$file1 
file1=$file 
dn1=$dn 
ch1=$ch 
pn1=$pn 
done < "$dirs"_"$host"_filelists

# 3  
Old 06-02-2011
Please provide an example of your initial input file.

... depending on what it looks like, maybe you can just use the command:
Code:
uniq -u
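One caveat: uniq -u only suppresses adjacent duplicates, so the filename column should be sorted first. A minimal sketch, assuming the filename is the first column as in your function:

Code:
# print filenames that occur exactly once (sort makes duplicates adjacent)
awk '{print $1}' infile | sort | uniq -u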

# 4  
Old 06-02-2011
"What does but it does not report single files... mean..?" If the last file in the input file is a single file, the script will not report it.


Here is an example of the input file:

filename size checksum path

file1 23 72625276372 /dir/dir1/dir2
file2 38 93939209302 /dir/dir1/dir2/dir3
file2 38 93939209302 /dir/dir1
file3 10 82828282282 /dir/dir1/dir5

The code I pasted earlier will report file1 but not file3 (since it is the last entry in the input file).
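One way to catch that last entry is to repeat the test once after the loop: when the loop ends, $file1 holds the final filename, and it was never compared as the "previous" line. A minimal sketch of the reworked loop (same variables as the function above; output redirections elided for brevity; input assumed sorted by filename):

Code:
while read file dn ch pn
do
    if [[ $file != $file1 && $file1 != $file2 ]]; then
        printf "%-40s%-50s\n" "$file1" "$ch1"
    fi
    file2=$file1; file1=$file; dn1=$dn; ch1=$ch; pn1=$pn
done < "$dirs"_"$host"_filelists

# the last line is still sitting in file1/ch1; test it one final time
if [[ -n $file1 && $file1 != $file2 ]]; then
    printf "%-40s%-50s\n" "$file1" "$ch1"
fi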
# 5  
Old 06-02-2011
Code:
awk '{print$1}' infile | uniq -u | sed 's/^/^/' | egrep -f - infile

Code:
awk '{a[$1]++;b[$1]=$0}END{for(i in a) if(a[i]<2) print b[i]}' infile

---------- Post updated at 12:34 PM ---------- Previous update was at 12:32 PM ----------

For a better pattern matching :

Code:
awk '{print$1}' infile | uniq -u | sed 's/.*/^& /' | egrep -f - infile
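For anyone new to awk, the second one-liner expands to roughly this (a commented sketch of the same logic; unlike uniq -u it does not need sorted input, because the counting happens in an associative array):

Code:
awk '{
    a[$1]++        # count occurrences of each filename (column 1)
    b[$1] = $0     # remember the last full line seen for that filename
}
END {
    for (i in a)          # after the whole file is read,
        if (a[i] < 2)     # keep filenames that occurred exactly once
            print b[i]
}' infile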

# 6  
Old 06-02-2011
Thank you ctsgnb.

The second code

Code:
awk '{a[$1]++;b[$1]=$0}END{for(i in a) if(a[i]<2) print b[i]}' infile

works. When I run the first and third one-liners I receive "egrep: can't open -" errors.

I have another input file that is sorted by checksum instead of filename, but the above awk code still finds the non-duplicates by filename, not checksum. I am just starting to dive into awk, so I have no clue what to change in the code to make it look at the checksums instead of the filenames. Any suggestions?

Thank you
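For reference, keying the same one-liner on the checksum is just a field change; this assumes the checksum is the third column, as in the sample input earlier in the thread:

Code:
awk '{a[$3]++;b[$3]=$0}END{for(i in a) if(a[i]<2) print b[i]}' infile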

---------- Post updated at 10:23 PM ---------- Previous update was at 09:51 PM ----------

Never mind... my input file was not properly named. I just had to change $2 to $3 in the awk print below.

Code:
awk '{print $1,"\n""PATH-> "$3}'

Thanks again!
# 7  
Old 06-03-2011
I tested it on a FreeBSD machine; maybe your egrep implementation behaves differently and cannot deal with the hyphen.
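If an egrep cannot read its pattern file from "-", two common workarounds are /dev/stdin (available on most modern systems) or process substitution in bash/ksh93. A sketch using the same pipeline:

Code:
# read the pattern list from /dev/stdin instead of "-"
awk '{print $1}' infile | uniq -u | sed 's/.*/^& /' | egrep -f /dev/stdin infile

# or hand egrep the pattern list through process substitution (bash/ksh93)
egrep -f <(awk '{print $1}' infile | uniq -u | sed 's/.*/^& /') infile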