Visit Our UNIX and Linux User Community


combine duplicate records


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting combine duplicate records
# 1  
Old 10-26-2009
combine duplicate records

I have a .DAT file like below

Code:
23666483030000653-B94030001OLFXXX000000120081227
23797049900000654-E71060001OLFXXX000000220081227
23699281320000655 E71060002OLFXXX000000320081227
22885068900000652 B86860003OLFXXX592123320081227
22885068900000652 B86860003ODL-SP592123420081227
22885068900000652-B94030001ODL-CH592123520081227
 


I would like to combine duplicate records into a single record with the new single record containing additional fields appending at the end of line record (for example see below ) . In the example file above, the first field is the unique field. So I would like my output to be like below:

If any duplicate record exists in this case 288506890 has 3 records check only for the position ODL-SP & ODL-CH
if ODL-SP exists then get the amount position 34:40
if ODL-CH exists then get the amount position 34:40

then get/append the final record (288506890) for this no is like below, if no duplcate record exists just create the line as is
Code:
2885068900000652 B86860003OLFXXX592123320081227  5921234 5921235

Code:
ben_type=`echo $line|cut -c28-33`
(you get ODL-SP spouse, ODL-CH child)
amount=`echo $line|cut -c34-40`
(you get spouse=5921234, child=5921235)

Code:
23666483030000653-B94030001OLFXXX000000120081227
23797049900000654-E71060001OLFXXX000000220081227
23699281320000655 E71060002OLFXXX000000320081227
22885068900000652 B86860003OLFXXX592123320081227 5921234 5921235




Can someone please please help me with a solution using Unix ksh scripting Thank you.


Last edited by vgersh99; 10-26-2009 at 04:26 PM.. Reason: code tags, please!
# 2  
Old 10-26-2009
With a name like kshuser and asked for ksh only solution, I assume you use ksh93.
Code:
while read x; do
  k=${x:0:17}
  if [ "$k" = "$ok" ]; then
    p="$p ${x:33:7}"
  else
    [ -n "$p" ] && echo "$p"
    ok=$k
    p=$x
  fi
done
echo "$p"

If you can add a blank line at the end of file (e.g.
Code:
(cat file;echo)

), you can omit the last echo outside the loop.
# 3  
Old 10-26-2009
I am kind of new to KSH scripting.

FILE1.DAT has the following records.

Code:
23666483030000653-B94030001ODL-Ch000000120081227
23797049900000654-E71060001OLFXXX000000220081227
23699281320000655 E71060002OLFXXX000000320081227
22885068900000652 B86860003OLFXXX592123320081227
22885068900000652 B86860003ODL-Sp592123420081227
22885068900000652-B94030001ODL-Ch592123520081227

i am writing the script like below and getting the outfile mondaytest.txt

Code:
rec_cnt=1
while read line
do
no=`echo $line|cut -c2-10`
ben_type=`echo $line|cut -c28-33`
amount=`echo $line|cut -c34-40`
if [[ $rec_cnt -eq 1 ]]
then
echo $line >> mondaytest.txt
prior_no=$no
prev_line=$line
else
if [[ $no -eq $prior_no ]]
then
if [[ $ben_type = "ODL-SP" ]]
then
spouse_amt=$amount
prev_line="$prev_line $spouse_amt"
elif [[ $ben_type = "ODL-CH" ]]
then
child_amt=$amount
#prev_line="$prev_line $spouse_amt"
else 
echo 'invalid ben_type'
fi
#echo $prev_line $spouse_amt $child_amt>> mondaytest.txt
echo 'Insert_1' $prev_line $child_amt >> mondaytest.txt
else
echo 'Insert_2' $line >> mondaytest.txt
prev_line=$line
fi
spouse_amt=""
child_amt=""
fi 
(( rec_cnt=rec_cnt + 1 )) 
prior_no=$no
done <FILE.DAT

OUT FILE mondaytest.txt
Code:
23666483030000653-B94030001ODL-Ch000000120081227
23797049900000654-E71060001OLFXXX000000220081227
23699281320000655 E71060002OLFXXX000000320081227
22885068900000652 B86860003OLFXXX592123320081227
22885068900000652 B86860003OLFXXX592123320081227 5921234
22885068900000652 B86860003OLFXXX592123320081227 5921234 5921235

I want the outfile should have only 4 records like this.
Code:
23666483030000653-B94030001ODL-Ch000000120081227
23797049900000654-E71060001OLFXXX000000220081227
23699281320000655 E71060002OLFXXX000000320081227
22885068900000652 B86860003OLFXXX592123320081227 5921234 5921235

Can you please correct me in my code to get the above expected result.

Quote:
Originally Posted by binlib
With a name like kshuser and asked for ksh only solution, I assume you use ksh93.
Code:
while read x; do
  k=${x:0:17}
  if [ "$k" = "$ok" ]; then
    p="$p ${x:33:7}"
  else
    [ -n "$p" ] && echo "$p"
    ok=$k
    p=$x
  fi
done
echo "$p"

If you can add a blank line at the end of file (e.g.
Code:
(cat file;echo)

), you can omit the last echo outside the loop.

Last edited by vgersh99; 10-26-2009 at 04:23 PM.. Reason: code tags, please!
# 4  
Old 10-26-2009
Quote:
Originally Posted by binlib
If you can add a blank line at the end of file (e.g.
Code:
(cat file;echo)

), you can omit the last echo outside the loop.
E.g. like so?

Code:
#!/bin/ksh
 echo|cat infile -|while read line; do
  case ${line:27:6} in
    ODL-SP|ODL-CH)
        prev+=" ${line:33:7}" ;;
    *)  [[ -n $prev ]] && print $prev
        prev=$line ;;
  esac
done > outfile

# 5  
Old 10-26-2009
But when i ran your code it is generating the outfile file but no changes compared to INPUT file.

Code:
>echo|cat FILE2.DAT -|while read line
> do
> case {$line:27:6} in
> ODL-SP|ODL-CH)
> prev+=" ${line:33:7}" ;;
> *) [[ -n $prev ]] && print $prev
> prev=$line ;;
> esac
> done > OUT.txt

OUT.txt ...is the same as input file FILE2.DAT

Code:
23666483030000653-B94030001OLFXXX000000120081227
23797049900000654-E71060001OLFXXX000000220081227
23699281320000655 E71060002OLFXXX000000320081227
22885068900000652 B86860003OLFXXX592123320081227
22885068900000652 B86860003ODL-SP592123420081227
22885068900000652-B94030001ODL-CH592123520081227


Last edited by vgersh99; 10-26-2009 at 08:20 PM.. Reason: added code tags - charged 5K bits
# 6  
Old 10-26-2009
Code:
${line:27:6}

# 7  
Old 10-26-2009
What about:
Code:
# awk 'NF{a[substr($0,0,9)]=(a[substr($0,0,9)])?a[substr($0,0,9)] FS substr($0,34,7):$0}END{for(i in a)print a[i]}' file
22885068900000652 B86860003OLFXXX592123320081227 5921234 5921235
23797049900000654-E71060001OLFXXX000000220081227
23666483030000653-B94030001OLFXXX000000120081227
23699281320000655 E71060002OLFXXX000000320081227


Previous Thread | Next Thread
Test Your Knowledge in Computers #986
Difficulty: Medium
The X Window System is installed by default in FreeBSD.
True or False?

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Duplicate records

Gents, Please give a help file --BAD STATUS NOT RESHOOTED-- *** VP 41255/51341 in sw 2973 *** VP 41679/51521 in sw 2973 *** VP 41687/51653 in sw 2973 *** VP 41719/51629 in sw 2976 --BAD COG NOT RESHOOTED-- *** VP 41689/51497 in sw 2974 *** VP 41699/51677 in sw 2974 *** VP... (18 Replies)
Discussion started by: jiam912
18 Replies

2. Shell Programming and Scripting

Duplicate records

Gents, I have a file which contends duplicate records in column 1, but the values in column 2 are different. 3099753489 3 3099753489 5 3101954341 12 3101954341 14 3102153285 3 3102153285 5 3102153297 3 3102153297 5 I will like to get something like this: output desired... (16 Replies)
Discussion started by: jiam912
16 Replies

3. Shell Programming and Scripting

How to keep the last 2 records from duplicate entries?

Gents, Please how I can get only the last 2 records from repetead values, from column 2 input 1 1011 1 1011 1 1012 1 1012 1 5001 1 5001 1 5002 1 5002 1 5003 1 5003 1 7001 1 7001 1 7002 1 7002 (2 Replies)
Discussion started by: jiam912
2 Replies

4. Shell Programming and Scripting

Remove duplicate records

Hi, i am working on a script that would remove records or lines in a flat file. The only difference in the file is the "NOT NULL" word. Please see below example of the input file. INPUT FILE:> CREATE a ( TRIAL_CLIENT NOT NULL VARCHAR2(60), TRIAL_FUND NOT NULL... (3 Replies)
Discussion started by: reignangel2003
3 Replies

5. Shell Programming and Scripting

Deleting duplicate records from file 1 if records from file 2 match

I have 2 files "File 1" is delimited by ";" and "File 2" is delimited by "|". File 1 below (3 record shown): Doc1;03/01/2012;New York;6 Main Street;Mr. Smith 1;Mr. Jones Doc2;03/01/2012;Syracuse;876 Broadway;John Davis;Barbara Lull Doc3;03/01/2012;Buffalo;779 Old Windy Road;Charles... (2 Replies)
Discussion started by: vestport
2 Replies

6. UNIX for Dummies Questions & Answers

Need to keep duplicate records

Consider my input is 10 10 20 then, uniq -u will give 20 and uniq -dwill return 10. But i need the output as , 10 10 How we can achieve this? Thanks (4 Replies)
Discussion started by: pandeesh
4 Replies

7. Programming

[Solved] duplicate records

Hi I have a table which has 2 columns - id and amount. If there duplicate rows , as in id and amount are same , then i have to update the table in such away that only one row should contain amount and all rows should become zero for that id. eg id amount 1 100 1 100 2 200 1... (5 Replies)
Discussion started by: megha2525
5 Replies

8. UNIX for Dummies Questions & Answers

Getting non-duplicate records

Hi, I have a file with these records abc xyz xyz pqr uvw cde cde In my o/p file , I want all the non duplicate rows to be shown. o/p abc pqr uvw Any suggestions how to do this? Thanks for the help. rs (2 Replies)
Discussion started by: rs123
2 Replies

9. Shell Programming and Scripting

Remove duplicate records

I want to remove the records based on duplicate. I want to remove if two or more records exists with combination fields. Those records should not come once also file abc.txt ABC;123;XYB;HELLO; ABC;123;HKL;HELLO; CDE;123;LLKJ;HELLO; ABC;123;LSDK;HELLO; CDF;344;SLK;TEST key fields are... (7 Replies)
Discussion started by: svenkatareddy
7 Replies

10. Shell Programming and Scripting

Records Duplicate

Hi Everyone, I have a flat file of 1000 unique records like following : For eg Andy,Flower,201-987-0000,12/23/01 Andrew,Smith,101-387-3400,11/12/01 Ani,Ross,401-757-8640,10/4/01 Rich,Finny,245-308-0000,2/27/06 Craig,Ford,842-094-8740,1/3/04 . . . . . . Now I want to duplicate... (9 Replies)
Discussion started by: ganesh123
9 Replies

Featured Tech Videos