The UNIX and Linux Forums  

  #1  
Old 11-28-2007
Registered User
 

Join Date: Nov 2007
Posts: 22
Extract duplicate fields in rows

I have an input file with the following format:

6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

Each field is separated by a semicolon. Sometimes the second field is duplicated, so I'd like to extract all the lines that have a duplicated second field to a new file. Example output file:

2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

How can I do this with awk or a shell script?
I don't know any other programming languages.
  #2  
Old 11-28-2007
Moderator
 

Join Date: Feb 2007
Posts: 2,329
Try:

Code:
sort -t ';' -k 2,2 file | awk -F ';' 'dat==$2{print $0}{dat=$2}'
Regards
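
The one-liner above prints a line only when its second field equals that of the line before it, so the first line of each duplicate group is left out. A minimal sketch of one way to print the whole group follows. The names `sample`, `key`, `held`, `pending` and `dups_sorted` are made up for this illustration, and `LC_ALL=C` is only there to make the sort order predictable.

```shell
# Sample data from the first post, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
EOF

# Sort on field 2 so equal keys sit next to each other, then print every
# line of a run whose key repeats, including the first line of the run.
dups_sorted=$(LC_ALL=C sort -t ';' -k 2,2 "$sample" | awk -F ';' '
    NR > 1 && $2 == key {      # same key as the previous line
        if (pending) {         # first repeat: emit the held first line too
            print held
            pending = 0
        }
        print
        next
    }
    { key = $2; held = $0; pending = 1 }   # new key: hold the line for now
')
printf '%s\n' "$dups_sorted"
rm -f "$sample"
```

The result comes out in sorted order, not in the order of the original file.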
  #3  
Old 11-28-2007
radoulov
addict
 

Join Date: Jan 2007
Location: Milano, Italia/Варна, България
Posts: 1,933
Cannot find a better solution right now

Code:
awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file
Use nawk or /usr/xpg4/bin/awk on Solaris.
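
For readers who find the one-liner dense, here is a spelled-out sketch of the same two-pass idea: the file is read once to count each second-field value and a second time to print the lines whose value was seen more than once. The function name `dups_in_order`, the array name `count` and the sample file are assumptions for this illustration. The fifth sample line is an added repeat of the first one, to show a duplicate that is not adjacent.

```shell
# Two-pass sketch: the same file is named twice on the awk command line,
# so the input must be a regular file and not a pipe.
dups_in_order() {
    awk -F ';' '
        NR == FNR { count[$2]++; next }   # pass 1: count each second-field value
        count[$2] > 1                     # pass 2: print lines whose value repeats
    ' "$1" "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
EOF

dups_ordered=$(dups_in_order "$sample")
printf '%s\n' "$dups_ordered"
rm -f "$sample"
```

Because nothing is sorted, the matching lines keep their original order.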
  #4  
Old 11-28-2007
Registered User
 

Join Date: Jun 2007
Location: Beijing China
Posts: 495
awk

Hi,

Code:
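awk 'BEGIN { FS = " ;" }
{
	if (temp == "") {
		# Nothing held yet: remember this line and its second field.
		temp = $2
		t_line = $0
	} else if (temp == $2) {
		# Same second field as the held line: print the pair and reset.
		print t_line
		print $0
		temp = ""
		t_line = ""
	} else {
		# Different second field: replace the held line.
		temp = $2
		t_line = $0
	}
}' filename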
  #5  
Old 11-29-2007
radoulov
addict
 

Join Date: Jan 2007
Location: Milano, Italia/Варна, България
Posts: 1,933
summer_cherry,
I meant this (nonconsecutive duplicates):

Code:
$ cat file
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
$ awk 'BEGIN{FS=" ;"}
{
if (temp=="")
{
temp=$2
t_line=$0
}
else if (temp==$2)
{
print t_line
print $0
temp=""
t_line=""
}
else
{
temp=$2
t_line=$0
}
}' file
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
$ awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file
6000000901 ;36200103 ;h3a01f496 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
  #6  
Old 12-01-2007
Registered User
 

Join Date: Nov 2007
Posts: 22
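This is my script. It's a very simple shell script:

# Collect the second field of every line.
cut -f2 -d";" "$1" > /tmp/mdn1
# Keep only the values that occur more than once.
sort /tmp/mdn1 | uniq -d > /tmp/mdn2
while read -r line
do
    # Take the text before the first space, i.e. the bare field value.
    x=`echo "$line" | cut -f1 -d" "`
    echo "$x"
    # Append every matching line to the output, one per line.
    grep "$x" "$1" >> duplicate
done < /tmp/mdn2
rm -f /tmp/mdn1 /tmp/mdn2

$1 is the input file and duplicate is the output file.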
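
A possible variation on the script above, offered as a sketch and not as the poster's method: it keeps the cut, sort and uniq -d steps but swaps the per-value grep loop for a single awk lookup, so a value only matches when it is exactly the second field. It also uses mktemp in place of fixed names under /tmp. The names `extract_dups`, `keys`, `workdir` and `dup_result` are hypothetical.

```shell
extract_dups() {
    # $1 = input file, $2 = output file
    keys=$(mktemp) || return 1
    # Second-field values that occur more than once, one per line.
    cut -d ';' -f 2 "$1" | sort | uniq -d > "$keys"
    # Load those values first, then print input lines whose field 2 is one of them.
    awk -F ';' 'NR == FNR { dup[$0]; next } $2 in dup' "$keys" "$1" > "$2"
    rm -f "$keys"
}

# Try it on the sample data from the first post.
workdir=$(mktemp -d)
cat > "$workdir/input" <<'EOF'
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
EOF
extract_dups "$workdir/input" "$workdir/duplicate"
dup_result=$(cat "$workdir/duplicate")
printf '%s\n' "$dup_result"
rm -rf "$workdir"
```

The output file is overwritten on each run, not appended to, which avoids stale lines from an earlier run.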
  #7  
Old 12-02-2007
Registered User
 

Join Date: Jun 2007
Location: Beijing China
Posts: 495
sort

Hi,

I am not quite sure whether your output needs to be in the same order as the original file.

If not, why not sort it first and then use my awk code?

6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;

sort -k 2 file:

6000000901 ;36200103 ;h3a01f496 ;
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

Then it is ok to use my awk code.
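
If the original order does matter, or if a value can appear three or more times (the pair-based awk code prints the first two lines of a run and then starts over with the third), a one-pass alternative is to buffer the lines and replay them at the end. This is only a sketch: `dups_one_pass`, `line`, `key` and `count` are names chosen for the illustration, and the whole input is held in memory, which may not suit very large files.

```shell
dups_one_pass() {
    awk -F ';' '
        { line[NR] = $0; key[NR] = $2; count[$2]++ }   # remember every line
        END {
            for (i = 1; i <= NR; i++)                  # replay in input order
                if (count[key[i]] > 1)
                    print line[i]
        }
    '
}

# Reads standard input, so it also works at the end of a pipe.
one_pass_result=$(printf '%s\n' \
    '6000000901 ;36200103 ;h3a01f496 ;' \
    '2000123605 ;36218982 ;heefa1328 ;' \
    '2000273132 ;36246985 ;h08c5cb71 ;' \
    '2000041207 ;36246985 ;heef75497 ;' \
    '6000000901 ;36200103 ;h3a01f496 ;' | dups_one_pass)
printf '%s\n' "$one_pass_result"
```

No sorting is needed, and the function can be fed from a file with `dups_one_pass < file`.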
The UNIX and Linux Forums Content Copyright ©1993-2008. All Rights Reserved.