duplicates lines with one column different


 
Shell Programming and Scripting
# 1  
Old 05-05-2008
duplicates lines with one column different

Hi
I have the following lines in a file

SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
SANDI111144RANDOM WEIGHT BRAND 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285


I need the output like

SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704 0738 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285

Note: SANDI111144RANDOM WEIGHT BRAND has the same line repeated with only the last column different; I am grouping those last-column values.

Is there any way in sed or awk that fits this logic easily?

Regards
Dhana
# 2  
Old 05-05-2008
Code:
awk '{
       key = substr($0, 1, 11)
       if (key in arr)
           arr[key] = sprintf("%s %s", arr[key], $NF)
       else
           arr[key] = $0
    }
    END { for (i in arr) print arr[i] }' filename
# output
csadev:/home/jmcnama> t.awk
SANDI110198CHOICE DM 0911 0911
SANDI108085FRANKLIN WRAP 7285 7285a
SANDI113951NBL-NO COMPANY LISTED 7285 7285b
SANDI115203HOME BASICS 7285 7285b
SANDI111144RANDOM WEIGHT BRAND 0704 0738 0739 0704a 0738b 0739b
SANDI109514ZIPLOC STRETCH N SEAL 7285 7285a


#input file
csadev:/home/jmcnama> cat filename
SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
SANDI111144RANDOM WEIGHT BRAND 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285
SANDI108085FRANKLIN WRAP 7285a
SANDI109514ZIPLOC STRETCH N SEAL 7285a
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704a
SANDI111144RANDOM WEIGHT BRAND 0738b
SANDI111144RANDOM WEIGHT BRAND 0739b
SANDI113951NBL-NO COMPANY LISTED 7285b
SANDI115203HOME BASICS 7285b

# 3  
Old 05-05-2008
duplicates lines with one column different

Hi
Your logic works, but I have a small correction to my requirement.


The input file, as I said, will look like this:
SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738

The output should be in the following format.
I need to know whether we can use printf '%s %-51s' for formatting in awk.


SANDI FRANKLIN WRAP 108085 7285
SANDI ZIPLOC STRETCH N SEAL 109514 7285
SANDI CHOICE DM 110198 0911
SANDI RANDOM WEIGHT BRAND 111144 0704 0738
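Yes, printf/sprintf field widths do work in awk. A minimal sketch of the reordered format (assuming the fixed-width layout shown: a 5-character prefix, a 6-digit ID, then the name up to the last field, with the code as the last whitespace-separated field; the %-51s width is just the one asked about, not a requirement):

```shell
# Reorder to "PREFIX NAME ID code [code ...]" and group the codes per ID.
# Assumes a 5-char prefix and a 6-digit ID at the start of each line,
# with the code as the last whitespace-separated field.
# (Replace the here-document with the real filename.)
awk '{
    prefix = substr($0, 1, 5)
    id     = substr($0, 6, 6)
    name   = substr($0, 12, length($0) - 11 - length($NF) - 1)
    key    = prefix id
    if (key in arr)
        arr[key] = arr[key] " " $NF
    else
        arr[key] = sprintf("%s %-51s %s %s", prefix, name, id, $NF)
}
END { for (k in arr) print arr[k] }' <<'EOF'
SANDI108085FRANKLIN WRAP 7285
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
EOF
```

Note that `for (k in arr)` does not preserve input order; pipe the result through sort if the order matters.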


Regards
# 4  
Old 05-05-2008
duplicates lines with one column different

Hi
Also, I have one more question. If we use

arr[key]=sprintf("%s %s", arr[key], $NF)

we are creating a map, or relationship, between the key and the elements. I would like to process a file of nearly 3 GB. If that is the case, will there be any memory issues?

Regards
Dhana
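On the memory question: awk itself streams line by line, but the array holds one accumulated line per unique key until END, so memory grows with the number of unique keys, not the file size. If even that is a concern, one alternative sketch (not the approach above; it assumes the file is sorted so duplicate keys are adjacent) flushes each group as soon as the key changes and keeps only the current group in memory:

```shell
# Stream groups instead of holding every key in memory.
# Assumes sorted input, so duplicate keys are adjacent; only the
# current group is buffered at any time.
# (The printf inlines sample data; replace it with: sort filename)
printf '%s\n' \
    'SANDI111144RANDOM WEIGHT BRAND 0704' \
    'SANDI110198CHOICE DM 0911' \
    'SANDI111144RANDOM WEIGHT BRAND 0738' |
sort | awk '{
    key = substr($0, 1, 11)
    if (key != prev) {
        if (NR > 1) print line   # flush the previous group
        line = $0
        prev = key
    } else {
        line = line " " $NF      # append the code from this duplicate
    }
}
END { if (NR > 0) print line }'
# prints:
# SANDI110198CHOICE DM 0911
# SANDI111144RANDOM WEIGHT BRAND 0704 0738
```

The trade-off is the cost of the external sort, but sort(1) spills to temporary files, so it handles multi-gigabyte inputs without holding them in RAM.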
# 5  
Old 05-05-2008
Quote:
Originally Posted by dhanamurthy
Hi
Also, I have one more question. If we use

arr[key]=sprintf("%s %s", arr[key], $NF)

we are creating a map, or relationship, between the key and the elements. I would like to process a file of nearly 3 GB. If that is the case, will there be any memory issues?

Regards
Dhana

dhanumurthy,

Actually, in awk these are associative arrays: you are creating an association between a key and a value.
# 6  
Old 05-05-2008
Duplicate lines with last column different

Hi
Your logic works, but I have a small correction to my requirement.


The input file, as I said, will look like this:
SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738

The output should be in the following format:

SANDI FRANKLIN WRAP 108085 7285
SANDI ZIPLOC STRETCH N SEAL 109514 7285
SANDI CHOICE DM 110198 0911
SANDI RANDOM WEIGHT BRAND 111144 0704 0738

I need to know whether we can use printf '%s %-51s' for formatting in awk.


Regards
# 7  
Old 05-05-2008
You don't have to post the requirement again.