Delete duplicates via script? - Post 302463517 by Y-T on Sunday 17th of October 2010, 10:29:48 PM
OK, thank you, I will try the Diff approach. The array option would still mean I have to compare manually, if I read it right, which is futile because there are >10K files.
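(If the underlying goal is to find files with identical content among those >10K files, one way to avoid comparing them pairwise is to checksum every file once and group by checksum. A rough sketch, assuming GNU coreutils and file names without embedded spaces or newlines; the path and output file name are placeholders:)

    # hash every file once, sort by hash, then print every file after the first in each hash group
    find /path/to/files -type f -print0 |
        xargs -0 md5sum |
        sort |
        awk 'seen[$1]++ { print $2 }' > duplicate_files.txt

    # review duplicate_files.txt first, then delete if it looks right:
    # xargs -d '\n' rm -- < duplicate_files.txt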
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

An interactive way to delete duplicates

1) I am trying to write a script that works interactively: it lists duplicated records on a certain field/column and asks the user to delete one or more. Finally it deletes all the records the user has asked for. I have an idea to store those line numbers in an array, but I am not sure how to do this in... (3 Replies)
Discussion started by: chvs2000
3 Replies
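A minimal sketch of that interactive idea, assuming the duplicated key is the second whitespace-separated field of a file called data.txt (both are assumptions) and GNU awk:

    # list each duplicated key together with the line numbers it occurs on
    awk '{ count[$2]++; where[$2] = where[$2] FNR " " }
         END { for (k in count) if (count[k] > 1) print k ": lines " where[k] }' data.txt

    # ask which lines to drop, then write a copy of the file without them
    printf 'Line numbers to delete (space-separated): '
    read -r nums
    awk -v del="$nums" 'BEGIN { n = split(del, d, " "); for (i = 1; i <= n; i++) skip[d[i]] = 1 }
                        !(FNR in skip)' data.txt > data.cleaned.txt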

2. Shell Programming and Scripting

How can I delete the duplicates based on one column of a line

My data looks something like this:
(08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb
(08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa
(08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts
(08/03/2009 22:57:42.425)(:) Ravi... (11 Replies)
Discussion started by: rdhanek
11 Replies
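The usual awk idiom for this keeps only the first line seen for a given key. A sketch, assuming the key is the third whitespace-separated field (the name after the timestamp) and that the input file name is a placeholder:

    # print a line only the first time its third field is seen
    awk '!seen[$3]++' input.txt > deduped.txt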

3. Shell Programming and Scripting

How can I delete duplicates in the log?

I have a log file and I am trying to run a script against it to search for key issues such as invalid users, errors, etc. In one part, I grep for 'session closed' and get a lot of the same thing, i.e. root username etc. I want to remove the multiple root lines and just have it do a count, like wc -l ... (5 Replies)
Discussion started by: taekwondo
5 Replies
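One way to collapse those repeats into a single count per distinct line, assuming the log file name is a placeholder:

    # count identical "session closed" lines instead of printing each one
    grep 'session closed' mylogfile | sort | uniq -c | sort -rn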

4. Shell Programming and Scripting

Delete Duplicates on the basis of two column values.

Hi All, I need to delete two duplicate processes which are running on the same device type (column 1) and port ID (column 2). Here is the sample data:
p1sc1m1 15517 11325 0 01:00:24 ? 0:00 scagntclsx25octtcp 2967 in3v mvmp01 0 8000 N S 969 750@751@752@
p1sc1m1 15519 11325 0 01:00:24 ? ... (5 Replies)
Discussion started by: neeraj617
5 Replies
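A sketch of spotting duplicates on two columns at once, assuming the device type and port ID really are the first two whitespace-separated fields and that the ps output has been saved to a placeholder file:

    # print the second and later lines that share the same column-1/column-2 key;
    # these are the candidate duplicate processes (use !seen[...]++ to keep only the first instead)
    awk 'seen[$1 FS $2]++' ps_output.txt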

5. Shell Programming and Scripting

Fastest way to delete duplicates from a large filelist.....

OK, I have two filelists. The first is formatted like this:
/path/to/the/actual/file/location/filename.jpg
and has up to a million records. The second list shows filename.jpg where there is more than one instance, and has maybe up to 65,000 records. I want to copy files... (4 Replies)
Discussion started by: Bashingaway
4 Replies
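A sketch for pulling out the full paths whose basenames appear in the second list, assuming one entry per line, no embedded spaces, and placeholder file names:

    # pass 1 (duplicates.txt): remember each bare filename
    # pass 2 (filelist.txt):   print a path when its last "/" component is in that set
    awk 'NR == FNR { dup[$0] = 1; next }
         { n = split($0, parts, "/"); if (parts[n] in dup) print }' duplicates.txt filelist.txt > matching_paths.txt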

6. Shell Programming and Scripting

delete from line and remove duplicates

My Input.....file1
ABCDE4435 Connected to 107.71.136.122 (SubNetwork=ONRM_RootMo_R SubNetwork=XYVLTN29CRBR99 MeContext=ABCDE4435 ManagedElement=1)
ABCDE4478 Connected to 166.208.30.57 (SubNetwork=ONRM_RootMo_R SubNetwork=KLFMTN29CR0R04 MeContext=ABCDE4478 ManagedElement=1)
ABCDE4478... (5 Replies)
Discussion started by: pareshkp
5 Replies

7. Shell Programming and Scripting

Delete duplicates in CA bundle

I have a big CA bundle certificate file, and each time I get a request to add a new certificate to the existing bundle I need to make sure it is not present already. How can I validate the duplicates? The alignment of the certificates within the bundle seems to be different. Example: Cert 1... (7 Replies)
Discussion started by: diva_thilak
7 Replies
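Since the PEM blocks may be wrapped differently, comparing fingerprints rather than raw text is one way to test whether a certificate is already in the bundle. A rough sketch, assuming OpenSSL and GNU csplit are available; the file names are placeholders:

    new_fp=$(openssl x509 -noout -fingerprint -sha256 -in new-cert.pem)

    # split the bundle into one temporary file per PEM block and fingerprint each;
    # the first chunk may be pre-certificate text, which is skipped via the error branch
    csplit -s -z -f bundle-part- bundle.pem '/-----BEGIN CERTIFICATE-----/' '{*}'
    for part in bundle-part-*; do
        fp=$(openssl x509 -noout -fingerprint -sha256 -in "$part" 2>/dev/null) || continue
        [ "$fp" = "$new_fp" ] && echo "already present in bundle: $part"
    done
    rm -f bundle-part-*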

8. Shell Programming and Scripting

Delete only if duplicates found in each record

Hi, I have another problem. I have been trying to solve it by myself but failed.
inputfile
;;
ID T08578
NAME T08578
SBASE 30696
EBASE 32083
TYPE P
func just test
func chronology
func cholesterol
func null
INT 30765-37333
INT 37154-37318
Link 5546
Link 8142 (4 Replies)
Discussion started by: redse171
4 Replies

9. Shell Programming and Scripting

Script to compare partial filenames in two folders and delete duplicates

Background: I use a TV tuner card to capture OTA video files (.mpeg) and then my Plex Media Server automatically optimizes the files (transcodes for better playback) and places them in a new directory. I have another Plex Library pointing to the new location for the optimized .mp4 files. This... (2 Replies)
Discussion started by: shaky
2 Replies
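A sketch of that comparison, assuming the optimized file keeps the same base name with an .mp4 extension and that the directory names below are placeholders; it only echoes until the matches look right:

    for orig in /media/recordings/*.mpeg; do
        base=$(basename "$orig" .mpeg)
        if [ -e "/media/optimized/$base.mp4" ]; then
            echo "optimized copy exists, would delete: $orig"
            # rm -- "$orig"    # uncomment once the matches have been verified
        fi
    done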

10. Shell Programming and Scripting

To Delete the duplicates using Part of File Name

I am using the below script to delete duplicate files, but it is not working for directories with more than 10k files: "Argument list too long" is thrown for ls -t. I tried to replace ls -t with find . -type f \( -iname "*.xml" \) -printf '%T@ %p\n' | sort -rg | sed -r 's/* //' | awk... (8 Replies)
Discussion started by: gold2k8
8 Replies
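The usual fix is to let find generate the timestamp-ordered list instead of ls -t, since find never hits the ARG_MAX limit. A sketch along the lines of the command quoted above, assuming GNU find and sort:

    # newest first: print the modification time as a sortable number, sort on it, then strip it off
    find . -type f -iname '*.xml' -printf '%T@ %p\n' |
        sort -rg |
        cut -d' ' -f2-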
Algorithm::DiffOld(3pm) 				User Contributed Perl Documentation				   Algorithm::DiffOld(3pm)

NAME
    Algorithm::DiffOld - Compute `intelligent' differences between two files / lists, but use the old (<=0.59) interface.

NOTE
    This has been provided as part of the Algorithm::Diff package by Ned Konz. This particular module is ONLY for people who HAVE to have the old interface, which uses a comparison function rather than a key generating function.

    Because each of the lines in one array has to be compared with each of the lines in the other array, this does M*N comparisons. This can be very slow. I clocked it at taking 18 times as long as the stock version of Algorithm::Diff for a 4000-line file. It will get worse quadratically as array sizes increase.

SYNOPSIS
    use Algorithm::DiffOld qw(diff LCS traverse_sequences);

    @lcs    = LCS( \@seq1, \@seq2, $comparison_function );
    $lcsref = LCS( \@seq1, \@seq2, $comparison_function );

    @diffs = diff( \@seq1, \@seq2, $comparison_function );

    traverse_sequences( \@seq1, \@seq2,
                        { MATCH     => $callback,
                          DISCARD_A => $callback,
                          DISCARD_B => $callback,
                        },
                        $comparison_function );

COMPARISON FUNCTIONS
    Each of the main routines should be passed a comparison function. If you aren't passing one in, use Algorithm::Diff instead.

    These functions should return a true value when two items should compare as equal. For instance,

        @lcs = LCS( \@seq1, \@seq2, sub { my ($a, $b) = @_; $a eq $b } );

    but if that is all you're doing with your comparison function, just use Algorithm::Diff and let it do this (this is its default). Or:

        sub someFunkyComparisonFunction
        {
            my ($a, $b) = @_;
            $a =~ m{$b};
        }

        @diffs = diff( \@lines, \@patterns, \&someFunkyComparisonFunction );

    which would allow you to diff an array @lines which consists of text lines with an array @patterns which consists of regular expressions. This is actually the reason I wrote this version -- there is no way to do this with a key generation function as in the stock Algorithm::Diff.

perl v5.10.1                              2006-07-31                              Algorithm::DiffOld(3pm)