The UNIX and Linux Forums  

  #1  
Old 05-05-2008
Read Only
 

Join Date: Jun 2006
Posts: 105
Duplicate lines with one column different

Hi,
I have the following lines in a file:

SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
SANDI111144RANDOM WEIGHT BRAND 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285


I need the output to look like this:

SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704 0738 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285

Note: the SANDI111144RANDOM WEIGHT BRAND line is repeated with only the last column different, so I want to group those last-column values onto a single line.

Is there an easy way to do this in sed or awk?

Regards
Dhana
  #2  
Old 05-05-2008
...@...
 

Join Date: Feb 2004
Location: NM
Posts: 4,298
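Code:
# awk code
awk '{
       key=substr($0,1,11)                  # first 11 characters identify the record
       if(key in arr)
           {
                 # key already seen: append only the last field
                 arr[key]=sprintf("%s %s", arr[key], $NF)
           }
       else
            {
                # first occurrence: keep the whole line
                arr[key]=$0
            }
    }
    # for (i in arr) visits the keys in an unspecified order
    END {for (i in arr) {print arr[i]} } ' filename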
# output
csadev:/home/jmcnama> t.awk
SANDI110198CHOICE DM 0911 0911
SANDI108085FRANKLIN WRAP 7285 7285a
SANDI113951NBL-NO COMPANY LISTED 7285 7285b
SANDI115203HOME BASICS 7285 7285b
SANDI111144RANDOM WEIGHT BRAND 0704 0738 0739 0704a 0738b 0739b
SANDI109514ZIPLOC STRETCH N SEAL 7285 7285a


#input file
csadev:/home/jmcnama> cat filename
SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
SANDI111144RANDOM WEIGHT BRAND 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285
SANDI108085FRANKLIN WRAP 7285a
SANDI109514ZIPLOC STRETCH N SEAL 7285a
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704a
SANDI111144RANDOM WEIGHT BRAND 0738b
SANDI111144RANDOM WEIGHT BRAND 0739b
SANDI113951NBL-NO COMPANY LISTED 7285b
SANDI115203HOME BASICS 7285b
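
The output above comes back in a different order from the input because `for (i in arr)` does not guarantee any ordering. If the original order matters, as the requested output suggests, one possible variation is to record each key the first time it is seen. This is only a sketch: the function name `group_codes` and the `order` array are made up for illustration, and it keeps the same assumption that the first 11 characters form the key.

```shell
# group_codes [FILE...]: merge lines that share the same 11-character key,
# appending the last field of each later line, in first-seen order.
group_codes() {
    awk '{
        key = substr($0, 1, 11)
        if (key in arr) {
            arr[key] = arr[key] " " $NF   # repeated key: append the last field
        } else {
            order[++n] = key              # remember the order of first appearance
            arr[key] = $0                 # first occurrence: keep the whole line
        }
    }
    END { for (i = 1; i <= n; i++) print arr[order[i]] }' "$@"
}
```

It can be called as `group_codes filename`, or at the end of a pipe with no arguments.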
  #3  
Old 05-05-2008
Read Only
 

Join Date: Jun 2006
Posts: 105
Duplicate lines with one column different

Hi,
Your logic works, but I have a small correction to my requirement.


The input file, as I said, will look like this:
SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738

The output should be in the following format.
I need to know whether we can use printf '%s %-51s' for formatting in awk.


SANDI FRANKLIN WRAP 108085 7285
SANDI ZIPLOC STRETCH N SEAL 109514 7285
SANDI CHOICE DM 110198 0911
SANDI RANDOM WEIGHT BRAND 111144 0704 0738


Regards
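
For what it's worth, awk's printf does accept width and left-justify flags such as `%-51s`. A rough sketch of the reformatting follows, under some assumptions that the thread does not confirm: characters 1-5 are the prefix, characters 6-11 are the number, the fields are separated by single spaces, and the description is everything between the key and the last field. The function name `reformat_codes` and the array names are hypothetical.

```shell
# reformat_codes [FILE...]: print prefix, description (padded to 51 columns),
# number, then every last-column value collected for that key.
reformat_codes() {
    awk '{
        key = substr($0, 1, 11)
        if (!(key in codes)) {
            order[++n] = key
            name = substr($0, 12)          # text after the 11-character key
            sub(/ +[^ ]+ *$/, "", name)    # drop the trailing last field
            names[key] = name
            codes[key] = $NF
        } else {
            codes[key] = codes[key] " " $NF
        }
    }
    END {
        for (i = 1; i <= n; i++) {
            key = order[i]
            printf "%s %-51s %s %s\n", substr(key, 1, 5), names[key], substr(key, 6), codes[key]
        }
    }' "$@"
}
```

The `%-51s` pads the description with trailing spaces so that the number column lines up. If the real layout is different, only the `substr` offsets and the format string should need changing.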
  #4  
Old 05-05-2008
Read Only
 

Join Date: Jun 2006
Posts: 105
Duplicate lines with one column different

Hi,
I also have one more question.
If we use something like this:

arr[key]=sprintf("%s %s", arr[key], $NF)
we are creating a map, or relationship, between the key and the elements.
I would like to process a file of nearly 3 GB.
In that case, will there be any memory issues?

Regards
Dhana
  #5  
Old 05-05-2008
Technorati Master
 

Join Date: Mar 2005
Location: Large scale systems...
Posts: 2,610
Quote:
Originally Posted by dhanamurthy View Post
Hi,
I also have one more question.
If we use something like this:

arr[key]=sprintf("%s %s", arr[key], $NF)
we are creating a map, or relationship, between the key and the elements.
I would like to process a file of nearly 3 GB.
In that case, will there be any memory issues?

Regards
Dhana

dhanamurthy,

In awk these are called associative arrays: you are creating an association between a key and a value.
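
On the memory question: with the array approach, awk holds one merged line per distinct key until the END block, so memory use grows with the number of distinct keys rather than strictly with the file size. If the duplicate lines are always adjacent, as they are in the sample (an assumption about the real 3 GB file), a streaming variant only needs to hold the current group. A minimal sketch, where the function name `group_sorted` is made up:

```shell
# group_sorted [FILE...]: like the array version, but assumes lines with the
# same 11-character key are adjacent, so only one group is kept in memory.
group_sorted() {
    awk '{
        key = substr($0, 1, 11)
        if (NR > 1 && key == prev) {
            line = line " " $NF        # same group: append the last field
        } else {
            if (NR > 1) print line     # key changed: flush the finished group
            line = $0
            prev = key
        }
    }
    END { if (NR > 0) print line }' "$@"
}
```

If the file is not already grouped, sorting it first (for example `sort filename | group_sorted`) would make the keys adjacent, at the cost of changing the original line order.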
  #6  
Old 05-05-2008
Read Only
 

Join Date: Jun 2006
Posts: 105
Duplicate lines with last column different

Hi,
Your logic works, but I have a small correction to my requirement.


The input file, as I said, will look like this:
SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738

The output should be in the following format.

SANDI FRANKLIN WRAP 108085 7285
SANDI ZIPLOC STRETCH N SEAL 109514 7285
SANDI CHOICE DM 110198 0911
SANDI RANDOM WEIGHT BRAND 111144 0704 0738

I need to know whether we can use printf '%s %-51s' for formatting in awk.


Regards
  #7  
Old 05-05-2008
Technorati Master
 

Join Date: Mar 2005
Location: Large scale systems...
Posts: 2,610
You don't have to post the requirement again.
Tags
solaris

All times are GMT -7. The time now is 12:08 PM.