How to get Duplicate rows in a file


 
# 1  
Old 04-02-2009

Hi all,

I have written a shell script whose output file contains SQL output.

From that file, I want to extract the rows that appear more than once (duplicate rows).
For example, the output file looks like this:

===============================================================
<SH12_MC30_CE_VS_NY_HIST_T>
===============================================================
397 44847
400 33653
401 46455
===============================================================
<SH12_MC30_CE_VS_NY_HIST_T_BKP>
===============================================================
397 44847
398 40107
399 39338
400 33653


From this output, I want the duplicated numeric rows only. The file also contains separator lines (the === lines), which repeat and would otherwise be counted as duplicates themselves. So I want only the rows that occur more than once and that consist of numbers.

Can anyone please tell me the command?
Thanks in advance.

Regards,
Raghu.
# 2  
Old 04-02-2009
Code:
cat file1 file2 | \
   grep -v -e '^=' -e '^<' -e '^$' | \
   awk '{ arr[$0]++ } END { for (i in arr) { if (arr[i] > 1) { print i } } }' > newfile

cat-ing the files into grep keeps filenames out of the grep output; the grep drops the separator, header, and blank lines, and the awk counts each remaining line and prints the ones seen more than once.
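
If the awk part is unclear, here is that same program spread over lines with comments (same logic, nothing new):

Code:
awk '{ arr[$0]++ }           # count occurrences of each whole input line
END {                        # after all input has been read:
    for (i in arr)           #   walk every distinct line seen
        if (arr[i] > 1)      #   print the ones that appeared more than once
            print i
}'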
# 3  
Old 04-02-2009
Quote:
Originally Posted by raghu.iv85
From that file, I want to extract the rows that appear more than once (duplicate rows). [...] Can anyone please tell me the command?
Try this

Code:
#!/bin/ksh
# keep only the numeric rows, sort so duplicate lines become adjacent,
# then print each repeated line exactly once
grep '^[0-9]' "$1" | sort > sortedfile
nawk '$0 == prev && $0 != last { print; last = $0 } { prev = $0 }' sortedfile
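
Usage would look something like this (the script name finddups.sh is just for illustration):

Code:
$ ./finddups.sh myOutputFile
397 44847
400 33653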

# 4  
Old 04-02-2009
Hi Jim,


I could follow your command up to the second line, but I couldn't understand the awk part because I don't know awk.
It works, though. Thank you very much for that; awk is so nice.
Can you give me another way to get this, without using awk?

Thanks & Regards,
Raghunadh.
# 5  
Old 04-02-2009
Code:
nawk '/^[0-9]/ {a[$0]++} END {for (i in a) if (a[i]>1) print i}' myOutputFile
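
The /^[0-9]/ pattern restricts the counting to lines that begin with a digit, so the header and separator lines never enter the array; on the sample data this prints the two repeated rows, 397 44847 and 400 33653 (in no guaranteed order, since awk's for-in loop is unordered).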

# 6  
Old 04-02-2009
Hi vgersh99,

Thank you very much for your reply.
The nawk command is nice, but I don't know awk's functionality, so if I put this command in my script I can't explain it to anyone. Could you please provide a command that avoids awk and nawk?


Thanks in advance,

Regards,
Raghu.
# 7  
Old 04-02-2009
Another way to approach it

I used awk at the end only to handle the output format. This could be done with a cut command as well, although extra care is needed with the field positions.


Code:
> cat file9
===============================================================
<SH12_MC30_CE_VS_NY_HIST_T>
===============================================================
397 44847
400 33653
401 46455
===============================================================
<SH12_MC30_CE_VS_NY_HIST_T_BKP>
===============================================================
397 44847
398 40107
399 39338
400 33653

> grep "^[0-9]" file9 | sort | uniq -cd
      2 397 44847
      2 400 33653

> grep "^[0-9]" file9 | sort | uniq -cd | awk '{print $2" "$3}'
397 44847
400 33653

And if you really don't want awk:
Code:
> grep "^[0-9]" file9 | sort | uniq -cd | tr -s " " | cut -d" " -f3-4
397 44847
400 33653

Added a quicker way:
Code:
> grep "^[0-9]" file9 | sort | uniq -d 
397 44847
400 33653
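
One caveat: uniq only compares adjacent lines, which is why the sort in front of it is needed at all. If you would rather have the rows come out in numeric order, sort -n works just as well here:

Code:
> grep "^[0-9]" file9 | sort -n | uniq -d
397 44847
400 33653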

