Need to print duplicate row along with highest version of original


 
# 1  
Old 07-05-2013

Some entries in the Description column are duplicated. I want to print each duplicate row together with the row that has the highest number for the same description.

Code:
file1.txt
number   Description
===     ============
34567  nl21a00is-centerdb001:ncdbareq:Error in loading init
34577  nl21a00is-centerdb001:ncdbareq:Error in loading init
45678  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing
43567  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info
24578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn
45890  nl21a00is-centerdb001:testingQA:FSFO has configuration errors
45698  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing
43599  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info
25578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn
51890  nl21a00is-centerdb001:ncdbareq:Error in loading init

Code:
out.txt 
34567  nl21a00is-centerdb001:ncdbareq:Error in loading init  IS DUPLICATE OF "51890  nl21a00is-centerdb001:ncdbareq:Error in loading init"
34577  nl21a00is-centerdb001:ncdbareq:Error in loading init  IS DUPLICATE OF "51890  nl21a00is-centerdb001:ncdbareq:Error in loading init"
45678  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing IS DUPLICATE OF "45698  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing"
43567  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info IS DUPLICATE OF "43599  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info"
24578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn IS DUPLICATE OF "25578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn"

# 2  
Old 07-05-2013
An awk approach:
Code:
awk '
        NR > 2 {
                V = $0
                sub ( $1, X, V )
                gsub ( /^[ ]*|[ ]*$/, X, V )
                R[++c] = $1 "," V

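                if ( V in A )
                {
                        # Compare against the highest number seen so far
                        if ( M[V] < $1 )
                        {
                                M[V] = $1
                        }
                }
                else
                {
                        A[V] = $1
                        # The first occurrence is the initial maximum
                        M[V] = $1
                }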
        }
        END {
                for ( i = 1; i <= c; i++ )
                {
                        n = split ( R[i], T, "," )
                        if ( M[T[n]] != T[1] && M[T[n]] )
                                print A[T[n]], T[n], "IS DUPLICATE OF \"" M[T[n]], T[n] "\""
                }
        }
' file

# 3  
Old 07-05-2013
The second duplicate entry, 34577, is not reported. This is the output.txt produced by the given script:
Code:
34567 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"
34567 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"
45678 nl21a00is-centerdb001:ncdbareq:Error in loading Sizing IS DUPLICATE OF "45698 nl21a00is-centerdb001:ncdbareq:Error in loading Sizing"
43567 nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info IS DUPLICATE OF "43599 nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info"
24578 nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn IS DUPLICATE OF "25578 nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn"

The second line shows 34567 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"

instead of

34577 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"
# 4  
Old 07-05-2013
Change
Code:
print A[T[n]], T[n], "IS DUPLICATE OF \"" M[T[n]], T[n] "\""

To
Code:
print T[1], T[n], "IS DUPLICATE OF \"" M[T[n]], T[n] "\""

# 5  
Old 07-05-2013
This is perfect. Can you please explain this code? Thanks.
# 6  
Old 07-05-2013
Here is a brief explanation:
Code:
awk '
        # Skip first two records of input file
        NR > 2 {
                # Set variable V = $0 (current record)
                V = $0
                # Remove first field to get the description in variable: V value
                sub ( $1, X, V )
                # Remove leading and trailing space from description in variable: V value
                gsub ( /^[ ]*|[ ]*$/, X, V )
                # Create indexed array: R with 1st and 2nd field separated by comma
                R[++c] = $1 "," V
                # Check if associative array: A contains a record indexed by variable: V value
                if ( V in A )
                {
                        # If yes, check if the maximum stored so far is less than the 1st field value
                        if ( M[V] < $1 )
                        {
                                # Set associative array: M indexed by V = $1 (new maximum value)
                                M[V] = $1
                        }
                }
                # If associative array: A does not contain record indexed by V value
                else
                {
                        # Set associative array: A indexed by V = $1
                        A[V] = $1
                        # Set associative array: M indexed by V = $1
                        M[V] = $1
                }
                }
        }
        # END Block
        END {
                # For each element in indexed array: R
                for ( i = 1; i <= c; i++ )
                {
                        # Split record separated by comma into array: T
                        n = split ( R[i], T, "," )
                        # Print records that are having duplicates and not having maximum value
                        if ( M[T[n]] != T[1] && M[T[n]] )
                                print T[1], T[n], "IS DUPLICATE OF \"" M[T[n]], T[n] "\""
                }
        }
' file
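
For comparison, here is a rough sketch of a different technique: reading the input twice instead of buffering the rows in an array. It is not the method posted above, and the function name report_duplicates is only an illustrative assumption. The first pass records the highest number per description, and the second pass prints every row whose number is lower than that. It assumes the same two header lines as file1.txt.

```shell
# Hypothetical helper (name is an assumption): pass the input file as $1.
report_duplicates() {
    awk '
        # Both passes: skip the two header lines
        FNR <= 2 { next }
        {
            num = $1
            desc = $0
            sub(/^[ \t]*[^ \t]+[ \t]*/, "", desc)   # drop the number column
            sub(/[ \t]+$/, "", desc)                # drop trailing blanks
        }
        # Pass 1: remember the highest number seen for each description
        NR == FNR {
            if (!(desc in max) || num + 0 > max[desc] + 0)
                max[desc] = num
            next
        }
        # Pass 2: report every row whose number is not the highest
        num + 0 != max[desc] + 0 {
            print num, desc, "IS DUPLICATE OF \"" max[desc], desc "\""
        }
    ' "$1" "$1"
}
```

It could be called as report_duplicates file1.txt > out.txt. Because the description is never packed into a comma-separated string, a description that itself contains commas should not confuse this variant.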

# 7  
Old 07-16-2013
Hi,

I want to print the output file below in tabular format.

Output:

Code:
34567 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"
34567 nl21a00is-centerdb001:ncdbareq:Error in loading init IS DUPLICATE OF "51890 nl21a00is-centerdb001:ncdbareq:Error in loading init"
45678 nl21a00is-centerdb001:ncdbareq:Error in loading Sizing IS DUPLICATE OF "45698 nl21a00is-centerdb001:ncdbareq:Error in loading Sizing"
43567 nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info IS DUPLICATE OF "43599 nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info"
24578 nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn IS DUPLICATE OF "25578 nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn"

That is, I need to put the whole output file into a table using HTML.
Can you please help me?

Desired outfile

DUPLICATE ENTRY	NEWLY GENERATED TICKET
Code:
34567  nl21a00is-centerdb001:ncdbareq:Error in loading init  	51890  nl21a00is-centerdb001:ncdbareq:Error in loading init
34577  nl21a00is-centerdb001:ncdbareq:Error in loading init 	51890  nl21a00is-centerdb001:ncdbareq:Error in loading init
45678  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing	45698  nl21a00is-centerdb001:ncdbareq:Error in loading Sizing
43567  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info	43599  nl21a00is-centerdb001:ncdbareq:Error in loading DBMS info
24578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn	25578  nl21a00is-centerdb001:ncdbareq:Error in loading Trig/Proc/Syn


Last edited by Scrutinizer; 07-16-2013 at 03:14 AM.. Reason: need to correct outfile; please continue to use code tags
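
One possible way to get such a table, offered only as a minimal sketch: split each report line at the text IS DUPLICATE OF and wrap the two halves in table cells. The function name dups_to_html and the border attribute are illustrative assumptions, and the column headings are taken from the desired outfile above. Lines that do not contain the separator are skipped.

```shell
# Hypothetical helper (name is an assumption): report lines on stdin,
# HTML table on stdout.
dups_to_html() {
    awk '
        # Escape the characters that are special in HTML text
        function esc(s) {
            gsub(/&/, "\\&amp;", s)
            gsub(/</, "\\&lt;", s)
            gsub(/>/, "\\&gt;", s)
            return s
        }
        BEGIN {
            sep = " IS DUPLICATE OF \""
            print "<table border=\"1\">"
            print "<tr><th>DUPLICATE ENTRY</th><th>NEWLY GENERATED TICKET</th></tr>"
        }
        {
            p = index($0, sep)
            if (p == 0) next                      # not a report line
            left = substr($0, 1, p - 1)
            right = substr($0, p + length(sep))
            sub(/"[ \t]*$/, "", right)            # drop the closing quote
            print "<tr><td>" esc(left) "</td><td>" esc(right) "</td></tr>"
        }
        END { print "</table>" }
    '
}
```

For example: dups_to_html < output.txt > out.html, where output.txt holds the IS DUPLICATE OF lines.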