Extract duplicate fields in rows


 
# 1  
Old 11-28-2007
Extract duplicate fields in rows

I have an input file with the following format:

6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

Each field is separated by a semicolon. Sometimes the second field is duplicated across lines, so I'd like to extract all the lines whose second field appears more than once into a new file. Example output file:

2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

How can I do this with awk or a shell script?
I don't know any other programming languages.
# 2  
Old 11-28-2007
Try:

Code:
sort -t ';' -k 2,2 file | awk 'dat==$2{print $0}{dat=$2}'

Regards
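For reference, here is a fuller sketch of the same sort-then-compare idea; the one-liner above prints only the second and later lines of each duplicate group, so this version also remembers and emits the first line. The input data is the sample from the thread; the temp-file name is just for illustration.

```shell
#!/bin/sh
# Sample input taken from the thread:
cat > /tmp/infile <<'EOF'
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
EOF

# Sort on the second ;-separated field, then compare adjacent lines.
sort -t ';' -k 2,2 /tmp/infile | awk -F';' '
$2 == prev2 {                     # same second field as the previous line
    if (!printed) print prevline  # emit the first line of the group once
    print                         # emit the current duplicate
    printed = 1
    next
}
{ prev2 = $2; prevline = $0; printed = 0 }'
```

Note that sorting changes the output order relative to the original file; if the original order matters, the two-pass awk approach in the next post preserves it.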
# 3  
Old 11-28-2007
Cannot find a better solution right now.

Code:
awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file

Use nawk or /usr/xpg4/bin/awk on Solaris.
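For readers new to awk, here is a commented expansion of that one-liner. It reads the file twice (hence `file file`): the first pass records which second fields occur more than once, and the second pass prints the lines whose second field was recorded. The sample data is from the thread.

```shell
#!/bin/sh
# Sample input taken from the thread:
cat > /tmp/infile <<'EOF'
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
EOF

awk -F';' '
NR == FNR {             # NR==FNR only while reading the first copy
    if (x[$2]++) y[$2]  # on the 2nd+ sighting, mark $2 as duplicated
    next
}
$2 in y                 # second pass: print lines with a marked $2
' /tmp/infile /tmp/infile
```

Unlike the sort-based approach, this preserves the original line order and catches nonconsecutive duplicates, at the cost of reading the file twice.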
# 4  
Old 11-29-2007
awk

hi

Code:
awk 'BEGIN{FS=" ;"}
{
if (temp=="")
{
	temp=$2
	t_line=$0
}
else if (temp==$2)
{
	print t_line
	print $0
	temp=""
	t_line=""
}
else
{
	temp=$2
	t_line=$0
}
}' filename

# 5  
Old 11-29-2007
summer_cherry,
I meant this (nonconsecutive duplicates):

Code:
$ cat file
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
$ awk 'BEGIN{FS=" ;"}                                       
{
if (temp=="")
{
temp=$2
t_line=$0
}
else if (temp==$2)
{
print t_line
print $0
temp=""
t_line=""
}
else
{
temp=$2
t_line=$0
}
}' file
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
$ awk 'NR==FNR&&x[$2]++{y[$2];next}$2 in y' FS=";" file file
6000000901 ;36200103 ;h3a01f496 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;

# 6  
Old 12-01-2007
Here is my script. It's a very simple shell script:

Code:
cut -f2 -d";" "$1" > /tmp/mdn1
sort /tmp/mdn1 | uniq -d > /tmp/mdn2
while read -r line
do
	# first word of the uniq -d output is the duplicated second field
	x=`echo "$line" | cut -f1 -d" "`
	echo "$x"
	# append the matching lines directly; capturing with echo $y
	# would collapse them onto a single line
	grep "$x" "$1" >> duplicate
done < /tmp/mdn2
rm -f /tmp/mdn*

$1 is the input file and duplicate is the output file.
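One caveat with the grep step: `grep "$x"` matches the duplicated value anywhere on the line, so a value that also happened to appear in the first or third field would be pulled in too. A sketch that anchors the match to the second field instead (the awk equality test is my addition, not part of the original script; sample data from the thread):

```shell
#!/bin/sh
# Sample input taken from the thread:
cat > /tmp/infile <<'EOF'
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
EOF

# List second fields that occur more than once, then select only the
# lines whose second field is exactly equal to each duplicated value.
# IFS= keeps the field's trailing space so the comparison is exact.
cut -f2 -d';' /tmp/infile | sort | uniq -d |
while IFS= read -r key
do
    awk -F';' -v k="$key" '$2 == k' /tmp/infile
done
```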
# 7  
Old 12-02-2007
sort

Hi,

I am not quite sure whether your output needs to be in the same sequence as the original file.

If not, why not sort it first and then use my awk code?

6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;

sort -k2 file (historically written as sort +1):

6000000901 ;36200103 ;h3a01f496 ;
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;

Then it is ok to use my awk code.
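A side note on that sort invocation: `+1` ("skip the first field") is the historic pre-POSIX syntax and is rejected by current GNU sort; the modern equivalent is `-k2`. A quick sketch with the five-line sample above (`LC_ALL=C` is my addition, to pin the collation order):

```shell
#!/bin/sh
# Five-line sample from the thread (one nonconsecutive duplicate pair):
cat > /tmp/infile5 <<'EOF'
6000000901 ;36200103 ;h3a01f496 ;
2000123605 ;36218982 ;heefa1328 ;
2000273132 ;36246985 ;h08c5cb71 ;
2000041207 ;36246985 ;heef75497 ;
6000000901 ;36200103 ;h3a01f496 ;
EOF

# -k2 sorts on everything from the second blank-separated field onward;
# LC_ALL=C forces plain byte-order collation so the result is reproducible.
LC_ALL=C sort -k2 /tmp/infile5
```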