forming duplicate rows based on value of a key


 
# 1 (03-07-2010)

Each row of a key (A, B, or others) carries a count in its 3rd column: if the first A row has 4 there, it has to form 4 duplicate rows, filled with all the 4th-column values of key A (2.9, 3.8, 4.2) and padded with "-" once those values run out.
Hope I explained the question clearly.

Cheers
Ruby

input
Code:
"A"        1           4           2.9
"A"        2           5           3.8
"A"        3           3           4.2
"B"        1           3           3.6

output
Code:
"A"        1           2.9
"A"        1           3.8
"A"        1           4.2
"A"        1           -
"A"        2           2.9
"A"        2           3.8
"A"        2           4.2
"A"        2           -
"A"        2           -
"A"        3           2.9
"A"        3           3.8
"A"        3           4.2
"B"        1           3.6
"B"        1           -
"B"        1           -


# 2 (03-07-2010)
Could you please explain why:

Code:
"B"        1           3.6
"B"        1           3.6
"B"        1           3.6

And not:

Code:
"B"        1           3.6
"B"        1           -
"B"        1           -

# 3 (03-07-2010)
Mistake

Sorry, my bad. You are right, it should be:

Code:
"B"        1           3.6
"B"        1           -
"B"        1           -

# 4 (03-07-2010)
Use gawk, nawk or, on Solaris, /usr/xpg4/bin/awk:

Code:
awk 'END {
  for (i = 1; i <= NR; i++) {
    split(n[i], t, SUBSEP); K = t[1]
    # re-split the value list only when the key changes
    if (K != pk) N = split(v[K], tt)
    for (j = 1; j <= k[n[i]]; j++)
      print t[1], t[2], (j <= N ? tt[j] : "-")
    pk = K
  }
}
{
  v[$1] = $1 in v ? v[$1] FS $NF : $NF  # per-key list of 4th-column values
  k[$1, $2] = $3; n[NR] = $1 SUBSEP $2  # count and order of each row
}' OFS='\t' infile
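
For readers who find the END-block bookkeeping dense, the same expansion can be sketched as a two-pass script that reads the file twice: pass one collects each key's 4th-column values, pass two prints every row's key and id $3 times, padding with "-" once the values run out. The file names infile and expanded.txt are placeholders:

```shell
# Sample input from post #1 (fields separated by whitespace)
cat > infile <<'EOF'
"A"        1           4           2.9
"A"        2           5           3.8
"A"        3           3           4.2
"B"        1           3           3.6
EOF

# Pass 1 (NR == FNR): collect the 4th-column values of each key.
# Pass 2: print each row key and id $3 times, padding with "-".
awk 'NR == FNR { v[$1] = ($1 in v) ? v[$1] OFS $4 : $4; next }
{
  n = split(v[$1], vals)
  for (j = 1; j <= $3; j++)
    print $1, $2, (j <= n ? vals[j] : "-")
}' OFS='\t' infile infile | tee expanded.txt
```

Reading the file twice trades a second pass for much simpler state: no END block, no saved row order, no key-change tracking.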


# 5 (03-07-2010)
Thanks

Thank you so much, it is working great.

# 6 (03-24-2010)
Hi, there is a small change in the input: it now has 5th and 6th columns, and some rows contain null values. Could you please modify the code for the following input? Thanks in advance.
Ruby

Code:
A	1	4	2.9	X/X	ggfgg
A	2	5	3.8	Y/Y	ghfghf
A	3	3	4.2	Z/Z	gg667
A	NULL	null	2.9	null	null
A	null	null	10.4	null	null
B	1	3	3.6	N/N	hjjghjg

output
Code:
A	1	2.9	X/X	ggfgg
A	1	3.8	X/X	ggfgg
A	1	4.2	X/X	ggfgg
A	1	2.9	X/X	ggfgg
A	-	10.4	-	ggfgg
A	2	2.9	Y/Y	ghfghf
A	2	3.8	Y/Y	ghfghf
A	2	4.2	Y/Y	ghfghf
A	2	2.9	Y/Y	ghfghf
A	2	10.4	Y/Y	ghfghf
A	3	2.9	Z/Z	gg667
A	3	3.8	Z/Z	gg667
A	3	4.2	Z/Z	gg667
A	-	2.9	-	gg667
A	-	10.4	-	gg667
B	1	3.6	N/N	hjjghjg
B	1	-	-	hjjghjg
B	1	-	-	hjjghjg


# 7 (03-25-2010)
Quote:
Originally Posted by ruby_sgp
hi small change in input1. Could you please modify the code based on the following input.
[...]
Try this:
Code:
awk -F'\t' 'END {
  for (i = 1; i <= c; i++) {
    split(r[i], t)
    # re-split the value list only when the key changes
    if (pt1 != t[1]) vnf = split(v[t[1]], tt)
    max = (vnf > t[3]) ? vnf : t[3]
    for (j = 1; j <= max; j++)
      print t[1], (t[3] >= j ? t[2] : "-"), \
        (tt[j] ? tt[j] : "-"), (t[3] >= j ? t[5] : "-"), t[6]
    pt1 = t[1]
  }
}
{
  v[$1] = v[$1] ? v[$1] FS $4 : $4  # per-key list of 4th-column values
  if ($3 + 0 > 0) r[++c] = $0       # keep only rows with a numeric count
}' OFS='\t' infile

However, the fourth field in the last three lines comes out a bit different here,
because I do not understand the logic at that point.
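
For completeness, the same logic can also be sketched as a two-pass script in a more explicit if-based style; the file names infile2 and expanded2.txt are illustrative. Like the script above, it prints N/N rather than "-" in the fourth field of the last two B lines, since the logic of the requested sample output is unclear at that point:

```shell
# Sample input from post #6 (tab-separated)
printf 'A\t1\t4\t2.9\tX/X\tggfgg\nA\t2\t5\t3.8\tY/Y\tghfghf\nA\t3\t3\t4.2\tZ/Z\tgg667\nA\tNULL\tnull\t2.9\tnull\tnull\nA\tnull\tnull\t10.4\tnull\tnull\nB\t1\t3\t3.6\tN/N\thjjghjg\n' > infile2

# Pass 1: collect every 4th-column value per key (null rows included).
# Pass 2: skip rows without a numeric count, then print max(count, values)
# lines, padding the id, value and 5th field with "-" where they run out.
awk -F'\t' 'NR == FNR { v[$1] = v[$1] ? v[$1] OFS $4 : $4; next }
$3 + 0 > 0 {
  n = split(v[$1], vals)
  max = (n > $3) ? n : $3
  for (j = 1; j <= max; j++)
    print $1, (j <= $3 ? $2 : "-"), (j <= n ? vals[j] : "-"),
      (j <= $3 ? $5 : "-"), $6
}' OFS='\t' infile2 infile2 | tee expanded2.txt
```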
 