Need to print duplicate row along with highest version of original

There are duplicate values in the Description column. For every description that occurs more than once, I want to print each duplicate row together with the highest-numbered row carrying the same description, as in the expected output below.

Code:
file1.txt
number   Description
===     ============
34567  nl21a00is-centerdb001:ncdbareq:Error in loading init
34577  nl21a00is-centerdb001:ncdbareq:Error in loading init
45678  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing
43567  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info
24578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn
45890  nl21a00is-centerdb001:testingQA:FSFO has configuration errors
45698  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing
43599  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info
25578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn
51890  nl21a00is-centerdb001:ncdbareq:Error in loading init

Code:
out.txt 
34567  nl21a00is-centerdb001:ncdbareq:Error in loading init  IS DUPLICATE OF "51890  nl21a00is-centerdb001:ncdbareq:Error in loading init"
34577  nl21a00is-centerdb001:ncdbareq:Error in loading init  IS DUPLICATE OF "51890  nl21a00is-centerdb001:ncdbareq:Error in loading init"
45678  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing IS DUPLICATE OF "45698  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing"
43567  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info IS DUPLICATE OF "43599  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info"
24578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn IS DUPLICATE OF "25578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn"
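
A minimal two-pass awk sketch along these lines should produce output like the above (assuming the real file1.txt starts with the two header lines shown; drop the FNR > 2 tests otherwise, and note the spacing around "IS DUPLICATE OF" may differ slightly from the sample):

Code:
awk '
NR == FNR {                                # first pass over file1.txt
    if (FNR > 2) {                         # skip the "number"/"===" header lines
        desc = substr($0, index($0, $2))   # description = everything after the number
        if ($1 + 0 > max[desc] + 0) {      # remember the row with the highest number
            max[desc]  = $1
            keep[desc] = $0
        }
    }
    next
}
FNR > 2 {                                  # second pass: report the remaining rows
    desc = substr($0, index($0, $2))
    if ($0 != keep[desc])
        print $0 "  IS DUPLICATE OF \"" keep[desc] "\""
}' file1.txt file1.txt > out.txt

The first pass records, per description, the row with the largest number; the second pass prints every other row for that description, quoting the kept row after "IS DUPLICATE OF". Descriptions that occur only once (such as the FSFO line) produce no output.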

 
