duplicates lines with one column different


 
Shell Programming and Scripting
# 1  
Old 05-05-2008
duplicates lines with one column different

Hi
I have the following lines in a file

SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
SANDI111144RANDOM WEIGHT BRAND 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285


I need the output like

SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704 0738 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285

Note: SANDI111144RANDOM WEIGHT BRAND has the same line repeated with only the last column different; I am grouping those last-column values.

Is there any way in sed or awk that fits this logic easily?

Regards
Dhana
# 2  
Old 05-05-2008
Code:
awk '{
       key = substr($0, 1, 11)
       if (key in arr)
           arr[key] = sprintf("%s %s", arr[key], $NF)
       else
           arr[key] = $0
    }
    END { for (i in arr) print arr[i] }' filename
# output
csadev:/home/jmcnama> t.awk
SANDI110198CHOICE DM 0911 0911
SANDI108085FRANKLIN WRAP 7285 7285a
SANDI113951NBL-NO COMPANY LISTED 7285 7285b
SANDI115203HOME BASICS 7285 7285b
SANDI111144RANDOM WEIGHT BRAND 0704 0738 0739 0704a 0738b 0739b
SANDI109514ZIPLOC STRETCH N SEAL 7285 7285a


#input file
csadev:/home/jmcnama> cat filename
SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
SANDI111144RANDOM WEIGHT BRAND 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285
SANDI108085FRANKLIN WRAP 7285a
SANDI109514ZIPLOC STRETCH N SEAL 7285a
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704a
SANDI111144RANDOM WEIGHT BRAND 0738b
SANDI111144RANDOM WEIGHT BRAND 0739b
SANDI113951NBL-NO COMPANY LISTED 7285b
SANDI115203HOME BASICS 7285b

# 3  
Old 05-05-2008
duplicates lines with one column different

Hi
Your logic works, but I have a small correction to my requirement.


The input file, as I said, will look like this:
SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738

The output should be in the following format.
I need to know whether we can use printf '%s %-51s' for formatting in awk.


SANDI FRANKLIN WRAP 108085 7285
SANDI ZIPLOC STRETCH N SEAL 109514 7285
SANDI CHOICE DM 110198 0911
SANDI RANDOM WEIGHT BRAND 111144 0704 0738
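Yes, printf/sprintf field widths do work in awk. A minimal sketch of the reordered format (assuming the fixed-width layout shown: a 5-character prefix, a 6-digit ID, then the name up to the last field, with the code as the last whitespace-separated field; the %-51s width is just the one asked about, not a requirement):

```shell
# Reorder to "PREFIX NAME ID code [code ...]" and group the codes per ID.
# Assumes a 5-char prefix and a 6-digit ID at the start of each line,
# with the code as the last whitespace-separated field.
# (Replace the here-document with the real filename.)
awk '{
    prefix = substr($0, 1, 5)
    id     = substr($0, 6, 6)
    name   = substr($0, 12, length($0) - 11 - length($NF) - 1)
    key    = prefix id
    if (key in arr)
        arr[key] = arr[key] " " $NF
    else
        arr[key] = sprintf("%s %-51s %s %s", prefix, name, id, $NF)
}
END { for (k in arr) print arr[k] }' <<'EOF'
SANDI108085FRANKLIN WRAP 7285
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
EOF
```

Note that `for (k in arr)` does not preserve input order; pipe the result through sort if the order matters.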


Regards
# 4  
Old 05-05-2008
duplicates lines with one column different

Hi
Also, I have one more question. If we use

arr[key]=sprintf("%s %s", arr[key], $NF)

we are creating a map, or relationship, between the key and the elements. I would like to process a file of nearly 3 GB. If that is the case, will there be any memory issues?

Regards
Dhana
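On the memory question: awk itself streams line by line, but the array holds one accumulated line per unique key until END, so memory grows with the number of unique keys, not the file size. If even that is a concern, one alternative sketch (not the approach above; it assumes the file is sorted so duplicate keys are adjacent) flushes each group as soon as the key changes and keeps only the current group in memory:

```shell
# Stream groups instead of holding every key in memory.
# Assumes sorted input, so duplicate keys are adjacent; only the
# current group is buffered at any time.
# (The printf inlines sample data; replace it with: sort filename)
printf '%s\n' \
    'SANDI111144RANDOM WEIGHT BRAND 0704' \
    'SANDI110198CHOICE DM 0911' \
    'SANDI111144RANDOM WEIGHT BRAND 0738' |
sort | awk '{
    key = substr($0, 1, 11)
    if (key != prev) {
        if (NR > 1) print line   # flush the previous group
        line = $0
        prev = key
    } else {
        line = line " " $NF      # append the code from this duplicate
    }
}
END { if (NR > 0) print line }'
# prints:
# SANDI110198CHOICE DM 0911
# SANDI111144RANDOM WEIGHT BRAND 0704 0738
```

The trade-off is the cost of the external sort, but sort(1) spills to temporary files, so it handles multi-gigabyte inputs without holding them in RAM.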
# 5  
Old 05-05-2008
Quote:
Originally Posted by dhanamurthy
Hi
Also, I have one more question. If we use

arr[key]=sprintf("%s %s", arr[key], $NF)

we are creating a map, or relationship, between the key and the elements. I would like to process a file of nearly 3 GB. If that is the case, will there be any memory issues?

Regards
Dhana

dhanumurthy,

Actually, in awk these are associative arrays: you are creating an association between a key and a value.
# 6  
Old 05-05-2008
Duplicate lines with last column different

Hi
Your logic works, but I have a small correction to my requirement.


The input file, as I said, will look like this:
SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738

The output should be in the following format:

SANDI FRANKLIN WRAP 108085 7285
SANDI ZIPLOC STRETCH N SEAL 109514 7285
SANDI CHOICE DM 110198 0911
SANDI RANDOM WEIGHT BRAND 111144 0704 0738

I need to know whether we can use printf '%s %-51s' for formatting in awk.


Regards
# 7  
Old 05-05-2008
You don't have to post the requirement again.