Concatenating more than two lines into one based on some columns

# 1  
Old 09-27-2011

Hi,

I need to concatenate lines in a file based on the first 4 columns. For example, consider this file:
Code:
I,01,000002,0666,00000.00,000,00,000,000, ,0
I,01,000002,0667,00000.00,000,00,000,000, ,0
I,01,000002,0666,00056.10
I,01,000002,0667,00056.10
I,01,000002,0666,00001
I,01,000002,0667,00001

I want the output to be:
Code:
I,01,000002,0666,00000.00,000,00,000,000, ,0,00056.10,00001
I,01,000002,0667,00000.00,000,00,000,000, ,0,00056.10,00001

Can you please give me a command for this? I would prefer awk, sed, or any other Unix command.

Thanks in advance
Sri

Last edited by Franklin52; 09-27-2011 at 03:17 AM.. Reason: Please use code tags, thank you
# 2  
Old 09-27-2011
Code:
awk -F , 'NR<=2{a[$1 FS $2 FS $3 FS $4]=$0;next}
    {t=$1 FS $2 FS $3 FS $4;sub(t,"");a[t]=a[t] $0}
    END{for (i in a) print a[i]}' infile

# 3  
Old 09-27-2011
Thanks a ton, rdcwayx.

But I have a problem: the awk command you gave works only for the sample I showed.
Take a bigger file, something like the one shown below:
Code:
I,01,000002,0666,00000.00,000,00,000,000, ,0
I,01,000002,0667,00000.00,000,00,000,000, ,0
I,01,000002,0668,00000.00,000,00,000,000, ,0
I,01,000002,0669,00000.00,000,00,000,000, ,0
I,01,000002,0670,,,,,,,
I,01,000002,0671,,,,,,,
I,01,000002,0672,,,,,,,
I,01,000002,0673,,,,,,,
I,01,000007,0666,,,,,,,
I,01,000007,0667,,,,,,,
I,01,000002,0666,00056.10
I,01,000002,0667,00056.10
I,01,000002,0668,00056.10
I,01,000002,0669,00056.10
I,01,000002,0670,00056.10
I,01,000002,0671,00056.10
I,01,000002,0672,00056.10
I,01,000002,0673,00056.10
I,01,000007,0666,00010.02
I,01,000007,0667,00010.02
I,01,000002,0666,00001
I,01,000002,0667,00001
I,01,000002,0668,00001
I,01,000002,0669,00001
I,01,000002,0670,00001
I,01,000002,0671,00001
I,01,000002,0672,00001
I,01,000002,0673,00001
I,01,000007,0666,00001
I,01,000007,0667,00001

It does not give the desired output; the output looks like this:
Code:
,,,,,,,,00056.10,00001
,,,,,,,,00056.10,00001
,,,,,,,,00010.02,00001
,,,,,,,,00056.10,00001
,,,,,,,,00010.02,00001
,,,,,,,,00056.10,00001
I,01,000002,0666,00000.00,000,00,000,000, ,0,00056.10,00001
I,01,000002,0667,00000.00,000,00,000,000, ,0,00056.10,00001
,00000.00,000,00,000,000, ,0,00056.10,00001
,00000.00,000,00,000,000, ,0,00056.10,00001


But I want it as shown below:
Code:
I,01,000002,0666,00000.00,000,00,000,000, ,0,00056.10,00001
I,01,000002,0667,00000.00,000,00,000,000, ,0,00056.10,00001
I,01,000002,0668,00000.00,000,00,000,000, ,0,00056.10,00001
I,01,000002,0669,00000.00,000,00,000,000, ,0,00056.10,00001
I,01,000002,0670,,,,,,,,00056.10,00001
I,01,000002,0671,,,,,,,,00056.10,00001
I,01,000002,0672,,,,,,,,00056.10,00001
I,01,000002,0673,,,,,,,,00056.10,00001
I,01,000007,0666,,,,,,,,00010.02,00001
I,01,000007,0667,,,,,,,,00010.02,00001


Hope you can help me with this as well.

Thanks in advance
Sri

Last edited by Franklin52; 09-27-2011 at 03:17 AM..
# 4  
Old 09-27-2011
Code:
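for i in $(cut -d, -f3 infile | sort -u)
do
        for j in $(cut -d, -f4 infile | sort -u)
        do
                lne="I,01,$i,$j,"
                # skip key combinations that do not occur in the file
                grep -q "^$lne" infile || continue
                # strip the key from each matching line and join the
                # remainders with commas (keeps the blank " " field intact)
                printf '%s%s\n' "$lne" "$(grep "^$lne" infile | sed "s|^$lne||" | paste -sd, -)"
        done
done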

# 5  
Old 09-27-2011
Code:
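perl -lne '
# $1 = first four fields incl. trailing comma (the key), $3 = rest of the line
/^(([^,]+,){4})(.*)/ and push @{$v{$1}}, $3;
END {
  # join the collected remainders with commas so fields stay separated
  print $_, join(",", @{$v{$_}}) for sort keys %v
}' INPUTFILE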

# 6  
Old 09-27-2011
Try this...
Code:
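awk -F, ' { k=$1 FS $2 FS $3 FS $4            # key: the first four columns
  if(!(k in a)){ a[k]=$0 }                  # first line for this key: keep it whole
  else { a[k]=a[k] substr($0,length(k)+1) } # later lines: append the part after the key
}
END{ for(i in a){print a[i]} }' input_file

You can pipe the output to sort if required: awk '...' input_file | sort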
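
Since awk's "for (i in a)" loop returns keys in no guaranteed order, sorting on the key columns is one way to get a stable result. A minimal sketch, assuming the merged lines arrive on standard input; the function name sort_merged is made up for illustration, and LC_ALL=C is set only to make the byte ordering predictable:

```shell
#!/bin/sh
# Hypothetical helper: order merged lines by column 3, then column 4.
sort_merged() {
    LC_ALL=C sort -t, -k3,3 -k4,4
}

# Illustrative input (not from the thread), deliberately out of order
printf '%s\n' \
    'I,01,000007,0666,x' \
    'I,01,000002,0667,y' \
    'I,01,000002,0666,z' | sort_merged
```

With real data this would be used as: awk '...' input_file | sort_merged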

--ahamed

Last edited by ahamed101; 09-27-2011 at 04:12 AM..
# 7  
Old 09-27-2011
Code:
awk -F , '!a[$1 FS $2 FS $3 FS $4] {a[$1 FS $2 FS $3 FS $4]=$0;next} 
    {t=$1 FS $2 FS $3 FS $4;sub(t,"");a[t]=a[t] $0}
    END{for (i in a) print a[i]}' infile
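
The same idea can be sketched so that the output keeps the order in which each key first appears, without a separate sort. This is only an illustration under stated assumptions: the function name merge_by_key and the array names merged and order are invented here, and every line is assumed to have at least four comma-separated fields.

```shell
#!/bin/sh
# Hypothetical helper: merge lines that share the first four
# comma-separated fields, preserving first-seen key order.
merge_by_key() {
    awk -F, '
    {
        key  = $1 FS $2 FS $3 FS $4
        rest = substr($0, length(key) + 1)   # remainder, with its leading comma
        if (!(key in merged)) {
            order[++n] = key                 # remember when the key first appeared
            merged[key] = key
        }
        merged[key] = merged[key] rest       # plain string append, no regex involved
    }
    END { for (i = 1; i <= n; i++) print merged[order[i]] }
    ' "$@"
}

# Sample data from the first post
printf '%s\n' \
    'I,01,000002,0666,00000.00,000,00,000,000, ,0' \
    'I,01,000002,0667,00000.00,000,00,000,000, ,0' \
    'I,01,000002,0666,00056.10' \
    'I,01,000002,0667,00056.10' \
    'I,01,000002,0666,00001' \
    'I,01,000002,0667,00001' | merge_by_key
```

Using substr() rather than sub() avoids treating the key as a regular expression, which matters if a key column ever contains characters such as "." or "*". On a file it would be called as: merge_by_key infile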
