The UNIX and Linux Forums: Shell Programming and Scripting
#1
Extract duplicate fields in rows
I have an input file with this format:
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
The fields are separated by semicolons. Sometimes the second field is duplicated, so I'd like to extract all the lines that have a duplicated second field to a new file. Example output file:
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
How can I do this with awk or a shell script? I don't know any other programming languages.
#2
Try:
Code:
sort -t ';' -k 2,2 file | awk -F ';' 'NR>1&&$2==dat{if(!seen)print prev;print;seen=1;next}{dat=$2;prev=$0;seen=0}'
#3
I cannot find a better solution right now:
Code:
awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file
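The two-pass command above names `file` twice, so it cannot take its input from a pipe. As a rough sketch of an alternative, the same result can be produced in a single pass by holding back the first line seen for each key until a repeat turns up. The names `sample_rows` and `dup_field2` are made up for this illustration and do not come from the thread. Note that the output order differs from the two-pass version: a first occurrence is printed only when its repeat is reached.

```shell
# Sample rows taken from the thread (includes a nonconsecutive duplicate).
sample_rows() {
    printf '%s\n' \
        '6000000901 ;36200103 ;h3a01f496 ;' \
        '2000123605 ;36218982 ;heefa1328 ;' \
        '2000273132 ;36246985 ;h08c5cb71 ;' \
        '2000041207 ;36246985 ;heef75497 ;' \
        '6000000901 ;36200103 ;h3a01f496 ;'
}

# dup_field2 is a hypothetical helper name. It reads files given as
# arguments, or standard input when there are none.
dup_field2() {
    awk -F ';' '
    {
        n = ++count[$2]
        if (n == 1) {
            first[$2] = $0        # hold the first line until a repeat shows up
        } else {
            if (n == 2) {
                print first[$2]   # release the held line exactly once
            }
            print                 # every repeat is printed as it arrives
        }
    }' "$@"
}

sample_rows | dup_field2
```

Because the first occurrence of each key is kept in memory, this trades a second read of the file for extra memory on inputs with many distinct keys.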
#4
awk
Hi,
Code:
awk 'BEGIN{FS=" ;"}
{
    if (temp=="")
    {
        temp=$2
        t_line=$0
    }
    else if (temp==$2)
    {
        print t_line
        print $0
        temp=""
        t_line=""
    }
    else
    {
        temp=$2
        t_line=$0
    }
}' filename
#5
summer_cherry,
I meant this (nonconsecutive duplicates):
Code:
$ cat file
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
$ awk 'BEGIN{FS=" ;"}
{
    if (temp=="")
    {
        temp=$2
        t_line=$0
    }
    else if (temp==$2)
    {
        print t_line
        print $0
        temp=""
        t_line=""
    }
    else
    {
        temp=$2
        t_line=$0
    }
}' file
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
$ awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file
6000000901 ;36200103 ;h3a01f496 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
#6
This is my script code
It's a very simple shell script:
Code:
cut -f2 -d";" "$1" > /tmp/mdn1
sort /tmp/mdn1 | uniq -d > /tmp/mdn2
cat /tmp/mdn2 | while read line; do
    echo "$line" > /tmp/mdn3
    x=`cut -f1 -d" " /tmp/mdn3`
    echo "$x"
    # append the matches directly; an unquoted echo would join them onto one line
    grep "$x" "$1" >> duplicate
done
rm -f /tmp/mdn*
$1 is the input file and duplicate is the output file.
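The script above looks up each duplicated value with `grep`, which matches the value anywhere on the line, and it uses fixed file names under `/tmp`. A possible variant, sketched here under the assumption that the input is a regular file that can be read twice, keeps the `cut | sort | uniq -d` idea but passes the duplicated keys to awk on standard input and compares them against the second field only. The helper name `extract_dups` and the temporary sample file are made up for this illustration.

```shell
# extract_dups is a hypothetical helper name. $1 is the input file; it is
# read twice, so it has to be a regular file rather than a pipe.
extract_dups() {
    cut -d ';' -f 2 "$1" | sort | uniq -d |
        awk -F ';' '
            NR == FNR { dup[$0]; next }   # first input (stdin): duplicated keys
            $2 in dup                     # second input: print rows whose key repeats
        ' - "$1"
}

# Demonstration with the sample rows from the thread.
sample=$(mktemp)
printf '%s\n' \
    '6000000901 ;36200103 ;h3a01f496 ;' \
    '2000123605 ;36218982 ;heefa1328 ;' \
    '2000273132 ;36246985 ;h08c5cb71 ;' \
    '2000041207 ;36246985 ;heef75497 ;' \
    '6000000901 ;36200103 ;h3a01f496 ;' > "$sample"
extract_dups "$sample"
rm -f "$sample"
```

The rows come out in their original order, and redirecting the call (for example `extract_dups input > duplicate`) gives the same output file as the script above.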
#7
sort
Hi,
I am not quite sure whether your output has to be in the same sequence as the original file. If not, why not sort it first and then use my awk code?
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
sort +1 file (sort -k 2 file with current versions of sort):
6000000901 ;36200103 ;h3a01f496 ;
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
Then it is OK to use my awk code.
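If the output does have to stay in the original sequence, sorting can still be used by tagging each line with its line number first and sorting back on that number afterwards. The following is only a sketch of that idea, not code from the thread: `sample_rows` and `dups_in_order` are made-up names, and the awk filter is a run-based one that also copes with keys occurring more than twice.

```shell
# Sample rows taken from the thread.
sample_rows() {
    printf '%s\n' \
        '6000000901 ;36200103 ;h3a01f496 ;' \
        '2000123605 ;36218982 ;heefa1328 ;' \
        '2000273132 ;36246985 ;h08c5cb71 ;' \
        '2000041207 ;36246985 ;heef75497 ;' \
        '6000000901 ;36200103 ;h3a01f496 ;'
}

# dups_in_order is a hypothetical helper name. Pipeline stages:
#   1. prefix every line with "NR;" so the original position is kept
#   2. sort on what is now field 3 (the original second field)
#   3. print every line that belongs to a run of equal keys
#   4. sort numerically on the line-number prefix to restore input order
#   5. strip the prefix again
dups_in_order() {
    awk '{ print NR ";" $0 }' "$@" |
        LC_ALL=C sort -t ';' -k 3,3 |
        awk -F ';' '
            NR > 1 && ($3 "") == (key "") {   # "" forces a string comparison
                if (!shown) print prev
                print
                shown = 1
                next
            }
            { key = $3; prev = $0; shown = 0 }' |
        sort -t ';' -k 1,1n |
        cut -d ';' -f 2-
}

sample_rows | dups_in_order
```

This costs two sorts, so for large files the two-pass awk approach posted earlier in the thread is probably the simpler choice when the file can be read twice.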