How to get Duplicate rows in a file


 
# 1  
Old 04-02-2009

Hi all,

I have written a shell script whose output file contains SQL output.

From that file, I want to extract the rows that appear more than once (duplicate rows).
For example, the output file looks like this:

===============================================================
<SH12_MC30_CE_VS_NY_HIST_T>
===============================================================
397 44847
400 33653
401 46455
===============================================================
<SH12_MC30_CE_VS_NY_HIST_T_BKP>
===============================================================
397 44847
398 40107
399 39338
400 33653


From this output, I want the duplicated numeric rows only. The file also contains separator lines (the === lines), which repeat and would otherwise be counted as duplicates themselves. So I want only the rows that occur more than once and that consist of numbers.

Can anyone please tell me the command?
Thanks in advance.

Regards,
Raghu.
# 2  
Old 04-02-2009
Code:
cat file1 file2 | \
   grep -v -e '^=' -e '^<' -e '^$' | \
   awk '{ arr[$0]++ } END { for (i in arr) { if (arr[i] > 1) { print i } } }' > newfile

cat-ing the files into grep keeps filenames out of the grep output; the grep drops the separator, header, and blank lines, and the awk counts each remaining line and prints the ones seen more than once.
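
If the awk part is unclear, here is that same program spread over lines with comments (same logic, nothing new):

Code:
awk '{ arr[$0]++ }           # count occurrences of each whole input line
END {                        # after all input has been read:
    for (i in arr)           #   walk every distinct line seen
        if (arr[i] > 1)      #   print the ones that appeared more than once
            print i
}'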
# 3  
Old 04-02-2009
Quote:
Originally Posted by raghu.iv85
From that file, I want to extract the rows that appear more than once (duplicate rows). [...] Can anyone please tell me the command?
Try this

Code:
#!/bin/ksh
# keep only the numeric rows, sort so duplicate lines become adjacent,
# then print each repeated line exactly once
grep '^[0-9]' "$1" | sort > sortedfile
nawk '$0 == prev && $0 != last { print; last = $0 } { prev = $0 }' sortedfile
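
Usage would look something like this (the script name finddups.sh is just for illustration):

Code:
$ ./finddups.sh myOutputFile
397 44847
400 33653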

# 4  
Old 04-02-2009
Hi Jim,


I could follow your command up to the second line, but I couldn't understand the awk part because I don't know awk.
It works, though. Thank you very much for that; awk is so nice.
Can you give me another way to get this, without using awk?

Thanks & Regards,
Raghunadh.
# 5  
Old 04-02-2009
Code:
nawk '/^[0-9]/ {a[$0]++} END {for (i in a) if (a[i]>1) print i}' myOutputFile
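
The /^[0-9]/ pattern restricts the counting to lines that begin with a digit, so the header and separator lines never enter the array; on the sample data this prints the two repeated rows, 397 44847 and 400 33653 (in no guaranteed order, since awk's for-in loop is unordered).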

# 6  
Old 04-02-2009
Hi vgersh99,

Thank you very much for your reply.
The nawk command is nice, but I don't know awk's functionality, so if I put this command in my script I can't explain it to anyone. Could you please provide a command that avoids awk and nawk?


Thanks in advance,

Regards,
Raghu.
# 7  
Old 04-02-2009
Another way to approach it

I used awk at the end only to handle the output format. This could be done with a cut command as well, although extra care is needed with the field positions.


Code:
> cat file9
===============================================================
<SH12_MC30_CE_VS_NY_HIST_T>
===============================================================
397 44847
400 33653
401 46455
===============================================================
<SH12_MC30_CE_VS_NY_HIST_T_BKP>
===============================================================
397 44847
398 40107
399 39338
400 33653

> grep "^[0-9]" file9 | sort | uniq -cd
      2 397 44847
      2 400 33653

> grep "^[0-9]" file9 | sort | uniq -cd | awk '{print $2" "$3}'
397 44847
400 33653

And if you really don't want awk:
Code:
> grep "^[0-9]" file9 | sort | uniq -cd | tr -s " " | cut -d" " -f3-4
397 44847
400 33653

Added a quicker way:
Code:
> grep "^[0-9]" file9 | sort | uniq -d 
397 44847
400 33653
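
One caveat: uniq only compares adjacent lines, which is why the sort in front of it is needed at all. If you would rather have the rows come out in numeric order, sort -n works just as well here:

Code:
> grep "^[0-9]" file9 | sort -n | uniq -d
397 44847
400 33653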

