Extract duplicate fields in rows

11-28-2007

I have an input file with this format:

6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

Each field is separated by a semicolon. Sometimes the second field is duplicated, so I'd like to extract all the lines whose second field is duplicated into a new file. Example output file:

2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

How can I do this with awk or a shell script? I don't know any other programming language.

anhtt

11-28-2007

Try:

Code:

sort -t ';' -k 2,2 file |
awk -F ';' 'NR > 1 && $2 == prev { if (!shown) print last; print; shown = 1; next } { prev = $2; last = $0; shown = 0 }'

Regards

Franklin52

11-28-2007

I can't find a better solution right now:

Code:

awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file

Use nawk or /usr/xpg4/bin/awk on Solaris.
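The same two-pass idea, written out with comments as a minimal sketch. It assumes a POSIX awk and that `;` is the only separator, so the 2nd field keeps its trailing space (for example `36246985 `), exactly as in the one-liner. The function name `dup_field2` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the input file; it is read twice.
dup_field2() {
    awk -F ';' '
        # pass 1 (NR == FNR only while reading the first copy):
        # count each 2nd field and remember the values seen before
        NR == FNR { if (count[$2]++) dup[$2]; next }
        # pass 2: print every line whose 2nd field was seen more than once
        $2 in dup
    ' "$1" "$1"
}
```

Usage: `dup_field2 file > duplicate`. Because the second pass walks the file in its original order, the output keeps the input order and also catches duplicates that are not on adjacent lines.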

radoulov

11-29-2007

awk

Hi,

Code:

awk 'BEGIN { FS = " ;" }
{
    if (temp == "") {
        temp = $2
        t_line = $0
    } else if (temp == $2) {
        print t_line
        print $0
        temp = ""
        t_line = ""
    } else {
        temp = $2
        t_line = $0
    }
}' filename

summer_cherry

11-29-2007

summer_cherry,
I meant this (nonconsecutive duplicates):

Code:

$ cat file
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
$ awk 'BEGIN{FS=" ;"}                                       
{
if (temp=="")
{
temp=$2
t_line=$0
}
else if (temp==$2)
{
print t_line
print $0
temp=""
t_line=""
}
else
{
temp=$2
t_line=$0
}
}' file
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
$ awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file
6000000901 ;36200103 ;h3a01f496 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
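Passing `file file` only works when the data is in a real file; it cannot be read twice from a pipe. A minimal single-pass sketch (the function name `dup_field2_stdin` is hypothetical) keeps every line in memory and prints the matches at the end, so memory use grows with the size of the input. On the five-line file above it should print the same four lines as the two-pass command.

```shell
#!/bin/sh
# Hypothetical helper: reads lines from stdin and prints those whose
# ';'-separated 2nd field occurs more than once, in their original order.
dup_field2_stdin() {
    awk -F ';' '
        # remember each line, its key, and how often the key occurs
        { line[NR] = $0; key[NR] = $2; count[$2]++ }
        # NR still holds the total number of lines in END
        END {
            for (i = 1; i <= NR; i++)
                if (count[key[i]] > 1) print line[i]
        }
    '
}
```

Usage: `cat file1 file2 | dup_field2_stdin > duplicate` (file names are placeholders). The trade-off against the two-pass version is memory instead of a second read.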

radoulov

12-01-2007

This is my script code. It's a very simple shell script:

# $1 is the input file; every line whose 2nd field is duplicated goes to ./duplicate
cut -f2 -d';' "$1" | sort | uniq -d | while read -r key
do
    # match the key only in the 2nd field ("field1 ;key ;..."), one output line per input line
    grep "^[^;]*;$key *;" "$1"
done > duplicate

$1 is the input file and duplicate is the output file.
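A sketch of the same `cut | sort | uniq -d` idea that avoids rescanning the file once per key and avoids matching the key inside another field: the list of duplicated keys is piped into a single awk, which reads it from standard input (`-`) and then scans the file once. The function name `dup_keys_lookup` is hypothetical, and it assumes every line contains a `;`.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the input file.
# cut keeps the space before ';' (e.g. "36246985 "), and awk -F ';' sees
# the same text as $2, so the keys compare equal without trimming.
# The first awk input ("-") is the key list, the second is the data file.
dup_keys_lookup() {
    cut -d ';' -f 2 "$1" | sort | uniq -d |
        awk -F ';' 'NR == FNR { dup[$0]; next } $2 in dup' - "$1"
}
```

Usage: `dup_keys_lookup file > duplicate`. Unlike the grep loop, the lines come out in their original file order rather than grouped by key.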

anhtt

12-02-2007

sort

Hi,

I am not quite sure whether your output needs to keep the same order as the original file.

If not, why not sort it first and then use my awk code?

6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;

sort +1 file (on current systems: sort -k 2 file):

6000000901 ;36200103 ;h3a01f496 ;
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

Then it is ok to use my awk code.
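If the original order does matter, here is a sketch of one way to keep the sort-based approach: number the lines, sort on the key, print every line of each run of equal keys (not only pairs), then sort back on the line number and strip it. The function name `dup_field2_sorted` is hypothetical, and `LC_ALL=C` is an added assumption so that only byte-identical keys sort as equal.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the input file.
dup_field2_sorted() {
    # 1) prefix "NR;" so the original 2nd field becomes field 3
    # 2) sort on that key so equal keys are adjacent
    # 3) print every line of each run of equal keys
    # 4) restore the original order and drop the line number
    awk '{ print NR ";" $0 }' "$1" |
        LC_ALL=C sort -t ';' -k 3,3 |
        awk -F ';' '
            NR > 1 && $3 == prev { if (!shown) print last; print; shown = 1; next }
            { prev = $3; last = $0; shown = 0 }
        ' |
        sort -t ';' -k 1,1n |
        cut -d ';' -f 2-
}
```

On the four-line sample from the first post this should print the two `36246985` lines in their original order, which is the output anhtt asked for.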

summer_cherry
