Duplicate records


 
# 8  
Old 08-19-2016
Quote:
Originally Posted by jiam912
Dear R. Singh
Thanks a lot
And for this case?
input
Code:
3099753489 3
3099753489 5
3099753489 7
3101954341 12
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5
3102153297 8

output
Code:
3099753489 3 5 7
3101954341 12 14
3102153285 3 5
3102153297 3 5 8

Hello jiam912,

Yes, it works for 2 fields as well, as shown below. I have also added a second solution in case you have more than 2 fields in your Input_file.
Code:
awk 'FNR==NR{Q="";for(i=2;i<=NF;i++){Q=Q?Q OFS $i:$i};A[$1]=A[$1]?A[$1] OFS Q:Q;next} ($1 in A){print $1 OFS A[$1];delete A[$1]}' Input_file Input_file

Output will be as follows.
Code:
3099753489 3 5 7
3101954341 12 14
3102153285 3 5
3102153297 3 5 8
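
For anyone who wants to reproduce this, here is a minimal two-field sketch of the same two-pass idea (the file name `sample.txt` is purely illustrative; awk reads the file twice, collecting the value lists on the first pass and printing each key once on the second):
Code:
```shell
# Recreate the sample input from this post (illustrative file name).
cat > sample.txt <<'EOF'
3099753489 3
3099753489 5
3099753489 7
3101954341 12
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5
3102153297 8
EOF

# Pass 1 (FNR==NR): append each 2nd field to a per-key list.
# Pass 2: on the first line seen for a key, print the list, then delete it
# so later lines with the same key print nothing.
out=$(awk 'FNR==NR{A[$1]=A[$1]?A[$1] OFS $2:$2;next}
           ($1 in A){print $1, A[$1]; delete A[$1]}' sample.txt sample.txt)
printf '%s\n' "$out"
```
The output order follows the order of first appearance in the file, since the second pass prints at the first line of each group.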

If you have any queries, please do let us know.

Thanks,
R. Singh
# 9  
Old 08-19-2016
Dear R. Singh

Appreciate your help.

It works perfectly.
# 10  
Old 08-19-2016
Quote:
Originally Posted by jiam912
Dear R. Singh

Thanks a lot

And for this case?

input

Code:
3099753489 3
3099753489 5
3099753489 7
3101954341 12
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5
3102153297 8

output

Code:
3099753489 3 5 7
3101954341 12 14
3102153285 3 5
3102153297 3 5 8

Hi jiam912,
Did you try the code RudiC suggested in post #6 in this thread? Or, if his suggestion gave you a syntax error:
Code:
awk '
NR == 1 || $1 != LAST {
	printf "%s%s", (NR==1?"":RS), LAST = $1
}
{	printf " %s", $2
}
END {	print _
}' file

which should work as long as there are only two fields per input line and all lines with the same first field value are adjacent in your input file. If your real input files (like your samples) meet the above requirements, this should be faster than RavinderSingh13's suggestion because it only needs to read your input file once.
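A quick way to try the code above (the printf pipeline merely stands in for a real input file; groups of identical keys must be adjacent, as noted):
Code:
```shell
# Feed adjacent-keyed sample lines straight into the single-pass awk.
out=$(printf '%s\n' \
      '3099753489 3' '3099753489 5' '3099753489 7' \
      '3101954341 12' '3101954341 14' |
awk '
NR == 1 || $1 != LAST {
    printf "%s%s", (NR==1 ? "" : RS), LAST = $1   # new key: start a new output line
}
{   printf " %s", $2 }                            # append the value from each line
END { print "" }')
printf '%s\n' "$out"
```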
# 11  
Old 08-19-2016
Quote:
Originally Posted by Don Cragun
Hi jiam912,
Did you try the code RudiC suggested in post #6 in this thread? Or, if his suggestion gave you a syntax error:
Code:
awk '
NR == 1 || $1 != LAST {
    printf "%s%s", (NR==1?"":RS), LAST = $1
}
{    printf " %s", $2
}
END {    print _
}' file

which should work as long as there are only two fields per input line and all lines with the same first field value are adjacent in your input file. If your real input files (like your samples) meet the above requirements, this should be faster than RavinderSingh13's suggestion because it only needs to read your input file once.
Hello Don/jiam912,

I am not sure how fast it is, but the following solution will:
I- Read the Input_file only once.
II- Preserve the output order as per the Input_file.
III- Handle the case where more than 2 fields are present for a value of the 1st field.
Code:
awk 'FNR==NR{Q="";for(i=2;i<=NF;i++){Q=Q?Q OFS $i:$i};A[$1]=A[$1]?A[$1] OFS Q:Q;if(!C[$1]++){D[++j]=$1}} END{for(k=1;k<=j;k++){if(A[D[k]]){print D[k] OFS A[D[k]]};delete A[D[k]]}}' Input_file

Output will be as follows.
Code:
3099753489 3 5
3101954341 12 21 31 34 56 78 14
3102153285 3 5
3102153297 3 5

Where Input_file is as follows.
Code:
cat Input_file
3099753489 3
3099753489 5
3101954341 12 21 31 34 56 78
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5

EDIT: Adding a non-one-liner form of the solution too.
Code:
awk 'FNR==NR{
        Q=""
        for(i=2;i<=NF;i++){
                Q=Q?Q OFS $i:$i
        }
        A[$1]=A[$1]?A[$1] OFS Q:Q
        if(!C[$1]++){
                D[++j]=$1
        }
}
END{
        for(k=1;k<=j;k++){
                if(A[D[k]]){
                        print D[k] OFS A[D[k]]
                }
                delete A[D[k]]
        }
}' Input_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 08-19-2016 at 09:03 AM.. Reason: Adding a non-one liner form of solution successfully too now.
# 12  
Old 08-19-2016
In case you have more than 2 fields, try

Code:
awk 'NR == 1 || $1 != LAST {printf "%s%s", (NR==1?"":RS), LAST = $1} {sub ("^" LAST, _); printf "%s", $0} END {print _} ' file
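
A small way to try this on multi-field input (adjacent keys assumed; note that sub() treats the key as a regular expression, which is harmless for the numeric keys here, and the uninitialized variable `_` is spelled out as "" for clarity):
Code:
```shell
# Multi-field sample with adjacent keys, fed straight into the one-liner.
out=$(printf '%s\n' '3101954341 12 21 31' '3101954341 14' '3102153285 3' |
awk 'NR == 1 || $1 != LAST {printf "%s%s", (NR==1?"":RS), LAST = $1}
     {sub("^" LAST, ""); printf "%s", $0}   # strip the key, keep the rest
     END {print ""}')
printf '%s\n' "$out"
```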

# 13  
Old 08-19-2016
Try

Quote:
Originally Posted by jiam912
Dear R. Singh

Thanks a lot

And for this case?

input

Code:
3099753489 3
3099753489 5
3099753489 7
3101954341 12
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5
3102153297 8

output

Code:
3099753489 3 5 7
3101954341 12 14
3102153285 3 5
3102153297 3 5 8


Input

Code:
[akshay@localhost tmp]$ cat f
3099753489 3
3099753489 5
3099753489 7
3101954341 12
3101954341 14
3102153285 3
3102153285 5
3102153297 3
3102153297 5
3102153297 8


Output

Code:
[akshay@localhost tmp]$ awk '$1 in A{A[$1]=A[$1] OFS $2; next}{ O[++o]=$1; A[$1]=$2}END{for(i=1; i in O; i++)print O[i],A[O[i]]}' f
3099753489 3 5 7
3101954341 12 14
3102153285 3 5
3102153297 3 5 8


Readable version

Code:
awk '
$1 in A{
       A[$1]=A[$1] OFS $2 
       next
}
{ 
       O[++o]=$1; 
       A[$1]=$2
}
END{
      for(i=1; i in O; i++)
            print O[i],A[O[i]]
}' f
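
One property worth noting: unlike the single-pass, adjacency-based variants earlier in the thread, this approach does not require lines with the same key to be adjacent, and it still preserves first-seen order. A minimal sketch, with the sample lines deliberately interleaved:
Code:
```shell
# Key 3099753489 appears non-adjacently; output still groups it
# and keeps first-seen key order.
out=$(printf '%s\n' '3099753489 3' '3101954341 12' '3099753489 5' |
awk '$1 in A{A[$1]=A[$1] OFS $2; next}
     { O[++o]=$1; A[$1]=$2 }
     END{ for(i=1; i in O; i++) print O[i], A[O[i]] }')
printf '%s\n' "$out"
```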


Last edited by Akshay Hegde; 08-19-2016 at 09:29 AM.. Reason: to simplify code
# 14  
Old 08-19-2016
With any shell using POSIX shell syntax, you could also do it without awk:
Code:
#!/bin/ksh
{	read -r last rest
	printf '%s %s' "$last" "$rest"
	while read -r key rest
	do	[ "$key" = "$last" ] && printf ' %s' "$rest" ||
		    printf '\n%s %s' "$key" "$rest"
		last="$key"
	done
	echo
} < input

which also works with two or more fields/line as long as all lines in the input file with a given key are adjacent.
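For example, with a small multi-field sample (the file name `input` follows the script above; adjacent keys assumed):
Code:
```shell
# Recreate a tiny input file and run the pure-shell loop from the post.
printf '%s\n' '3101954341 12 21' '3101954341 14' '3102153285 3' > input
out=$({  read -r last rest
         printf '%s %s' "$last" "$rest"
         while read -r key rest
         do  [ "$key" = "$last" ] && printf ' %s' "$rest" ||
                 printf '\n%s %s' "$key" "$rest"
             last="$key"
         done
         echo
      } < input)
printf '%s\n' "$out"
rm -f input
```
Because `read -r key rest` splits only on the first field, everything after the key (however many fields) is carried along in `$rest` unchanged.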