#1
duplicates lines with one column different
Hi
I have the following lines in a file:

SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
SANDI111144RANDOM WEIGHT BRAND 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285

I need the output like this:

SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704 0738 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285

Note: SANDI111144RANDOM WEIGHT BRAND has the same line repeated, but the last column is different; I am grouping those last columns onto one line.

Is there any way in sed or awk to implement this logic easily?

Regards
Dhana
#2
Code:
awk code --
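awk '{
  key = substr($0, 1, 11)          # first 11 characters identify the item
  if (key in arr)
    arr[key] = arr[key] " " $NF    # repeated key: append only the last field
  else
    arr[key] = $0                  # first occurrence: keep the whole line
}
END { for (i in arr) print arr[i] }  # note: for-in order is unspecified
' filename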
# output
csadev:/home/jmcnama> t.awk
SANDI110198CHOICE DM 0911 0911
SANDI108085FRANKLIN WRAP 7285 7285a
SANDI113951NBL-NO COMPANY LISTED 7285 7285b
SANDI115203HOME BASICS 7285 7285b
SANDI111144RANDOM WEIGHT BRAND 0704 0738 0739 0704a 0738b 0739b
SANDI109514ZIPLOC STRETCH N SEAL 7285 7285a
#input file
csadev:/home/jmcnama> cat filename
SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
SANDI111144RANDOM WEIGHT BRAND 0739
SANDI113951NBL-NO COMPANY LISTED 7285
SANDI115203HOME BASICS 7285
SANDI108085FRANKLIN WRAP 7285a
SANDI109514ZIPLOC STRETCH N SEAL 7285a
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704a
SANDI111144RANDOM WEIGHT BRAND 0738b
SANDI111144RANDOM WEIGHT BRAND 0739b
SANDI113951NBL-NO COMPANY LISTED 7285b
SANDI115203HOME BASICS 7285b
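The output above does not come out in input order, because awk does not define the traversal order of `for (i in arr)`. If the original order matters, one option is to record each key the first time it is seen. The following is only a sketch under that assumption; `group_in_order` and the `order` array are hypothetical names, not part of the posted solution.

```shell
#!/bin/sh
# Sketch: group lines by their first 11 characters and print the groups
# in first-seen order. group_in_order is a hypothetical helper name.
group_in_order() {
  awk '{
    key = substr($0, 1, 11)
    if (key in arr) {
      arr[key] = arr[key] " " $NF     # repeated key: append only the last field
    } else {
      order[++n] = key                # remember when this key first appeared
      arr[key] = $0                   # first occurrence: keep the whole line
    }
  }
  END { for (i = 1; i <= n; i++) print arr[order[i]] }' "$@"
}

# Demo on a few lines from the thread; with no file argument awk reads stdin.
group_in_order <<'EOF'
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
SANDI111144RANDOM WEIGHT BRAND 0739
SANDI113951NBL-NO COMPANY LISTED 7285
EOF
```

The extra `order` array costs one more entry per distinct key, which is usually small compared with the grouped lines themselves.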
#3
Hi
Your logic works, but I have a small correction to my requirement. The input file, as I said, will look like this:

SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738

The output should be in the format below. I need to know whether we can use printf '%s %-51s' for formatting in awk.

SANDI FRANKLIN WRAP 108085 7285
SANDI ZIPLOC STRETCH N SEAL 109514 7285
SANDI CHOICE DM 110198 0911
SANDI RANDOM WEIGHT BRAND 111144 0704 0738

Regards
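awk's `printf` does accept format strings such as `%s %-51s`; `%-51s` left-justifies a string and pads it to 51 characters. The sketch below is one possible way to produce the requested layout, not a confirmed answer from the thread. It assumes a 5-character prefix followed by a 6-character ID (inferred from the sample), space-separated fields, and a single code as the last field; `format_groups`, `names`, `codes`, and `order` are hypothetical names.

```shell
#!/bin/sh
# Sketch: regroup by the first 11 characters, then print
# prefix, padded name, ID, and the collected codes.
# Assumed layout (inferred from the sample): chars 1-5 prefix, 6-11 ID.
format_groups() {
  awk '{
    key  = substr($0, 1, 11)
    code = $NF
    if (key in codes) {
      codes[key] = codes[key] " " code     # repeated key: collect the code
    } else {
      name = substr($0, 12)                # everything after the ID
      sub(/ +[^ ]+ *$/, "", name)          # drop the trailing code field
      names[key] = name
      codes[key] = code
      order[++n] = key                     # keep first-seen order
    }
  }
  END {
    for (i = 1; i <= n; i++) {
      key = order[i]
      printf "%s %-51s %s %s\n", substr(key, 1, 5), names[key], substr(key, 6), codes[key]
    }
  }' "$@"
}

# Demo on the sample input from this post.
format_groups <<'EOF'
SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
EOF
```

The width 51 is taken from the format string in the question; change it if the name column needs a different width.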
#4
Hi
Also, I have one more question. If we use

arr[key]=sprintf("%s %s", arr[key], $NF)

we are creating a map, or relationship, between the key and the elements. I would like to process a file of nearly 3 GB. In that case, will there be any memory issues?

Regards
Dhana
#5
Quote:
dhanumurthy, actually in awk these are associative arrays; you are creating the association between key and value.
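On the memory question: the associative array holds one entry per distinct key until the END block runs, so memory use grows with the number of distinct keys and their collected values rather than with the raw file size alone. If lines with the same key are always adjacent, as they are in the sample, the array can be avoided entirely. The following is a sketch under that adjacency assumption; `group_adjacent` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: streaming grouping that keeps only the current group in memory.
# Assumes lines sharing the same first 11 characters are adjacent.
group_adjacent() {
  awk '{
    key = substr($0, 1, 11)
    if (NR > 1 && key == prev) {
      line = line " " $NF          # same key as previous line: append last field
    } else {
      if (NR > 1) print line       # key changed: flush the finished group
      line = $0
      prev = key
    }
  }
  END { if (NR > 0) print line }' "$@"   # flush the final group
}

# Demo on adjacent duplicates from the thread.
group_adjacent <<'EOF'
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738
SANDI113951NBL-NO COMPANY LISTED 7285
EOF
```

If the duplicates are not adjacent, the input would need to be sorted by key first, or the array-based version used instead.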
#6
Duplicate lines with last column different
Hi
Your logic works, but I have a small correction to my requirement. The input file, as I said, will look like this:

SANDI108085FRANKLIN WRAP 7285
SANDI109514ZIPLOC STRETCH N SEAL 7285
SANDI110198CHOICE DM 0911
SANDI111144RANDOM WEIGHT BRAND 0704
SANDI111144RANDOM WEIGHT BRAND 0738

The output should be in this format:

SANDI FRANKLIN WRAP 108085 7285
SANDI ZIPLOC STRETCH N SEAL 109514 7285
SANDI CHOICE DM 110198 0911
SANDI RANDOM WEIGHT BRAND 111144 0704 0738

I need to know whether we can use printf '%s %-51s' for formatting in awk.

Regards
#7
you don't have to post the requirement again