Search Duplicates, Print Line #
Post 302407169 by genehunter on Wednesday 24th of March 2010 05:59:38 PM

Masters,

I have a text file in the following format.
Code:
vrsonlviee	RVEBAALSKE
lyolzteglx	UUOSIWMDLR
pcybtapfee	DKGFJBHBJO
ozhrucfeau	YQXATYMGJD
cjwvjolrcv	YDHALRYQTG
mdukphspbc	CQZRIOWEUB
nbiqomzsgw	DYSUBQSSPZ
xovgvkneav	HJFQQYBLAF
boyyzdmzka	BVTVUDHSCR
vrsonlviee	TGTKUCUYMA
pcybtapfee	CQZRIOWEUB

I want to find the duplicates in Col 2 and get their line numbers.
I also want a way to remove lines using those line numbers.
I want line numbers so that I can choose which of the duplicate lines to remove, taking into account the value in Col 1.
Awk, sed, or egrep preferred.

Thanks
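
One possible approach, as a minimal sketch rather than a definitive answer. It assumes the two columns are tab-separated, and the file names input.txt and clean.txt are hypothetical. The first awk call reads the file twice: once to count each Col 2 value, and once to print the line number and content of every line whose Col 2 value occurs more than once. The second awk call drops whichever line numbers you pick after checking Col 1 (11 is only an example choice).

```shell
#!/bin/sh
# Sketch only: input.txt and clean.txt are hypothetical file names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample from the post as two tab-separated columns.
printf '%s\t%s\n' \
  vrsonlviee RVEBAALSKE \
  lyolzteglx UUOSIWMDLR \
  pcybtapfee DKGFJBHBJO \
  ozhrucfeau YQXATYMGJD \
  cjwvjolrcv YDHALRYQTG \
  mdukphspbc CQZRIOWEUB \
  nbiqomzsgw DYSUBQSSPZ \
  xovgvkneav HJFQQYBLAF \
  boyyzdmzka BVTVUDHSCR \
  vrsonlviee TGTKUCUYMA \
  pcybtapfee CQZRIOWEUB > input.txt

# Step 1: read the file twice. The first pass counts each Col 2 value;
# the second pass prints "line number: line" for values seen more than once.
dups=$(awk -F'\t' '
  NR == FNR     { count[$2]++; next }
  count[$2] > 1 { print FNR ": " $0 }
' input.txt input.txt)
printf '%s\n' "$dups"

# Step 2: after inspecting Col 1, choose the line numbers to drop
# (comma-separated; 11 is just an example) and filter them out.
drop=11
awk -v drop="$drop" '
  BEGIN { n = split(drop, nums, ","); for (i = 1; i <= n; i++) skip[nums[i]] = 1 }
  !(FNR in skip)
' input.txt > clean.txt
```

On the sample above, step 1 reports lines 6 and 11, which share CQZRIOWEUB in Col 2. Setting drop=6,11 would remove both lines instead of only the second.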
 
