Concatenating and appending string based on specific pattern match


 
# 1  
Old 12-15-2009
Input
Code:
#GEO-1-type-1-fwd-Initial  890 1519
OPKHIJEFVTEFVHIJEFVOPKHIJTOPKEFVHIJTEFVOPKOPKHIJHIJHIJTTOPKHIJHIJEFVEFVOPKHIJOPKHIJOPKEFVEFVOPKHIJHIJEFVHIJHIJEFVTHIJOPKOPKTEFVEFVEFVOPKHIJOPKOPKHIJTTEFVEFVTEFV

#GEO-1-type-2-fwd-Terminal  1572 2030
HIJOPKHIJEFVTOPKOPKTTOPKHIJOPKHIJEFVOPKTOPKTOPKHIJHIJTEFVOPKTOPKTOPKEFVOPKOPKEFVEFVTEFVOPKHIJEFVEFVOPKHIJOPKOPKHIJHIJEFVEFVHIJEFVEFVTOPKEFVOPKTHIJTTHIJOPK

#GEO-2-type-1-rev-Terminal  2734 2475
EFVTEFVTTOPKTOPKTEFVOPKHIJTEFVTTTOPKEFVTEFVOPKTTOPKTHIJTTTOPKEFVTOPKTEFVEFVEFVTHIJEFVHIJOPKEFVHIJOPKHIJEFVEFVHIJEFVEFVEFVTHIJEFVHIJOPKTHIJ

#GEO-2-type-2-rev-Internal  3041 2804
TEFVEFVOPKHIJTEFVHIJHIJHIJOPKOPKTTOPKHIJTOPKTOPKEFVEFVEFVEFVOPKHIJEFVTEFVTHIJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPK

#GEO-2-type-3-rev-Terminal  4050 3990
IJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPK

Output
Code:
#GEO-1-fwd 890 1519 1572 2030 
OPKHIJEFVTEFVHIJEFVOPKHIJTOPKEFVHIJTEFVOPKOPKHIJHIJHIJTTOPKHIJHIJEFVEFVOPKHIJOPKHIJOPKEFVEFVOPKHIJHIJEFVHIJHIJEFVTHIJOPKOPKTEFVEFVEFVOPKHIJOPKOPKHIJTTEFVEFVTEFVHIJOPKHIJEFVTOPKOPKTTOPKHIJOPKHIJEFVOPKTOPKTOPKHIJHIJTEFVOPKTOPKTOPKEFVOPKOPKEFVEFVTEFVOPKHIJEFVEFVOPKHIJOPKOPKHIJHIJEFVEFVHIJEFVEFVTOPKEFVOPKTHIJTTHIJOPK

#GEO-2-rev 4050 3990 3041 2804 2734 2475
IJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPKTEFVEFVOPKHIJTEFVHIJHIJHIJOPKOPKTTOPKHIJTOPKTOPKEFVEFVEFVEFVOPKHIJEFVTEFVTHIJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPKEFVTEFVTTOPKTOPKTEFVOPKHIJTEFVTTTOPKEFVTEFVOPKTTOPKTHIJTTTOPKEFVTOPKTEFVEFVEFVTHIJEFVHIJOPKEFVHIJOPKHIJEFVEFVHIJEFVEFVEFVTHIJEFVHIJOPKTHIJ

I would like to concatenate the string content of records that share the same header description. For headers containing "fwd", the content should be appended in ascending order (each new record goes at the end). For headers containing "rev", the content should be appended in descending order (each new record goes at the front). I am trying to achieve this with awk and perl at the moment. Thanks a lot for any advice and suggestions.
# 2  
Old 12-15-2009
Straightforward approach:

Code:
awk -F '[ -]' '{if (NF>1){r=$1"-"$2"-"$5; m=$5;
                   if (m=="fwd"){A[r]=A[r]" "$8" "$9}
                   else if (m=="rev"){A[r]=$8" "$9" "A[r]} }
                else if (!/^$/){
                  if (m=="fwd") {B[r]=B[r]$1}
                  else {if (m=="rev") B[r]=$1B[r]} } }
                END{for (i in A) {print i, A[i]; print B[i] }}' infile

# 3  
Old 12-15-2009
Thanks a lot, Scrutinizer.
Your code works perfectly.
But it prints the output in this order:
Code:
#GEO-2-rev 4050 3990 3041 2804 2734 2475
IJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPKTEFVEFVOPKHIJTEFVHIJHIJHIJOPKOPKTTOPKHIJTOPKTOPKEFVEFVEFVEFVOPKHIJEFVTEFVTHIJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPKEFVTEFVTTOPKTOPKTEFVOPKHIJTEFVTTTOPKEFVTEFVOPKTTOPKTHIJTTTOPKEFVTOPKTEFVEFVEFVTHIJEFVHIJOPKEFVHIJOPKHIJEFVEFVHIJEFVEFVEFVTHIJEFVHIJOPKTHIJ

#GEO-1-fwd 890 1519 1572 2030 
OPKHIJEFVTEFVHIJEFVOPKHIJTOPKEFVHIJTEFVOPKOPKHIJHIJHIJTTOPKHIJHIJEFVEFVOPKHIJOPKHIJOPKEFVEFVOPKHIJHIJEFVHIJHIJEFVTHIJOPKOPKTEFVEFVEFVOPKHIJOPKOPKHIJTTEFVEFVTEFVHIJOPKHIJEFVTOPKOPKTTOPKHIJOPKHIJEFVOPKTOPKTOPKHIJHIJTEFVOPKTOPKTOPKEFVOPKOPKEFVEFVTEFVOPKHIJEFVEFVOPKHIJOPKOPKHIJHIJEFVEFVHIJEFVEFVTOPKEFVOPKTHIJTTHIJOPK

By the way, can I ask what A[r] and B[r] mean, and what $9 represents in your awk code?
As I understand it, the header only has fields $1 to $8, right?
Thanks again, Scrutinizer.
# 4  
Old 12-15-2009
Code:
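use strict;
use warnings;

my (%hash, @order, $key, $dir);
while (<DATA>) {
	chomp;
	next unless /\S/;                        # skip blank lines
	if (/^#/) {                              # header line
		my ($name, $start, $end) = split ' ';
		my @tmp = split /-/, $name;          # '#GEO', 1, 'type', 1, 'fwd', 'Initial'
		$dir = $tmp[4];
		$key = join '-', @tmp[0, 1, 4];      # e.g. '#GEO-1-fwd'
		push @order, $key unless exists $hash{$key};
		$hash{$key} //= { COORDS => '', DATA => '' };
		# "rev" records are prepended, "fwd" records are appended
		if ($dir eq 'rev') { $hash{$key}{COORDS} = " $start $end" . $hash{$key}{COORDS} }
		else               { $hash{$key}{COORDS} .= " $start $end" }
	}
	elsif ($dir eq 'rev') { $hash{$key}{DATA} = $_ . $hash{$key}{DATA} }
	else                  { $hash{$key}{DATA} .= $_ }
}
foreach my $k (@order) {                     # print groups in first-seen order
	print "$k$hash{$k}{COORDS}\n$hash{$k}{DATA}\n\n";
}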
__DATA__
#GEO-1-type-1-fwd-Initial  890 1519
OPKHIJEFVTEFVHIJEFVOPKHIJTOPKEFVHIJTEFVOPKOPKHIJHIJHIJTTOPKHIJHIJEFVEFVOPKHIJOPKHIJOPKEFVEFVOPKHIJHIJEFVHIJHIJEFVTHIJOPKOPKTEFVEFVEFVOPKHIJOPKOPKHIJTTEFVEFVTEFV

#GEO-1-type-2-fwd-Terminal  1572 2030
HIJOPKHIJEFVTOPKOPKTTOPKHIJOPKHIJEFVOPKTOPKTOPKHIJHIJTEFVOPKTOPKTOPKEFVOPKOPKEFVEFVTEFVOPKHIJEFVEFVOPKHIJOPKOPKHIJHIJEFVEFVHIJEFVEFVTOPKEFVOPKTHIJTTHIJOPK

#GEO-2-type-1-rev-Terminal  2734 2475
EFVTEFVTTOPKTOPKTEFVOPKHIJTEFVTTTOPKEFVTEFVOPKTTOPKTHIJTTTOPKEFVTOPKTEFVEFVEFVTHIJEFVHIJOPKEFVHIJOPKHIJEFVEFVHIJEFVEFVEFVTHIJEFVHIJOPKTHIJ

#GEO-2-type-2-rev-Internal  3041 2804
TEFVEFVOPKHIJTEFVHIJHIJHIJOPKOPKTTOPKHIJTOPKTOPKEFVEFVEFVEFVOPKHIJEFVTEFVTHIJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPK

#GEO-2-type-3-rev-Terminal  4050 3990
IJTOPKHIJEFVOPKOPKTHIJEFVHIJHIJOPKOPKHIJHIJTTEFVEFVOPKTTEFVEFVOPKHIJOPKOPKOPK

# 5  
Old 12-15-2009
Hi Patrick,

That is because in awk the order in which for (i in A) visits the elements of an associative array is undetermined. On my computer it gets printed in the right order, but that is by chance. If the order is important, we would have to do something to ensure it. A[r] collects the numbers and B[r] the concatenated content for each group r. As for $9: single spaces and - are each used as separator characters, and the header has two spaces before the first number, so there is an extra empty field, hence the $9. We could improve the robustness by using * in the -F specification and then using the proper field numbers.
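To see where the field numbers come from, a quick check along these lines can help. It is only a sketch: the header line is copied from the sample input, the variable names are made up for the demonstration, and the use of + instead of * in the second separator is my own assumption, chosen so that the separator can never match an empty string.

```shell
#!/bin/sh
# One header line from the sample input (note the two spaces before 890).
header='#GEO-1-type-1-fwd-Initial  890 1519'

# Single-character separator: the second space creates an empty field $7,
# so the two numbers end up in $8 and $9.
single=$(printf '%s\n' "$header" |
    awk -F '[ -]' '{ print NF ": " $5 " " $8 " " $9 }')

# Run of separators: consecutive spaces or dashes count as one separator,
# so the two numbers are $7 and $8.
run=$(printf '%s\n' "$header" |
    awk -F '[ -]+' '{ print NF ": " $5 " " $7 " " $8 }')

echo "$single"
echo "$run"
```

The first line printed starts with 9 (nine fields, one of them empty) and the second with 8, while both report the same direction and numbers.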
# 6  
Old 12-15-2009
Thanks for your explanation, Scrutinizer.
I get what you mean now.
I will try to fix the problem by making sure the groups are printed in the right order.
I tried your script a few times just now.
Every run printed the "rev" result first and the "fwd" result after it.
Thanks again, Scrutinizer.
# 7  
Old 12-15-2009
Perhaps you could give this a try then:
Code:
awk -F '[ -]*' '{ if (NF>1){
                    r=$1"-"$2"-"$5; m=$5;
                    if (!A[r]) O[i++]=r
                    if (m=="fwd") A[r]=A[r]" "$7" "$8
                    else if (m=="rev") A[r]=$7" "$8" "A[r]
                  }
                  else if (NF>0)
                    if (m=="fwd") B[r]=B[r]$1
                    else if (m=="rev") B[r]=$1B[r]
                 }
                 END{for (j=0;j<i;j++) {k=O[j];print k, A[k]; print B[k] }}' infile



---------- Post updated 16-12-09 at 00:24 ---------- Previous update was 15-12-09 at 11:47 ----------

Slightly simplified
Code:
awk -F '[ -]*' 'NF>1  { r=$1"-"$2"-"$5; m=$5; if (!A[r]) O[i++]=r
                        if (m=="fwd") A[r]=A[r]" "$7" "$8
                        else if (m=="rev") A[r]=$7" "$8" "A[r] }
                NF==1 { if (m=="fwd") B[r]=B[r]$1
                        else if (m=="rev") B[r]=$1B[r] }
                END   { for (j=0;j<i;j++) {k=O[j];print k, A[k]; print B[k]} }' infile


Last edited by Scrutinizer; 12-15-2009 at 07:13 PM..
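For readers who find the compact version hard to follow, here is a spelled-out sketch of the same idea with longer variable names. Several things in it are assumptions of mine and not part of the posts above: the names key, dir, coords, seq, seen and order; the + quantifier in the separator; and the input, which keeps the real headers but replaces the long sequences with short placeholders (AAA, BBB, ...) so the result is easy to check by eye.

```shell
#!/bin/sh
# Abbreviated stand-in for the sample input: headers are copied from the
# thread, the sequences are shortened placeholders.
input='#GEO-1-type-1-fwd-Initial  890 1519
AAA

#GEO-1-type-2-fwd-Terminal  1572 2030
BBB

#GEO-2-type-1-rev-Terminal  2734 2475
CCC

#GEO-2-type-2-rev-Internal  3041 2804
DDD

#GEO-2-type-3-rev-Terminal  4050 3990
EEE'

result=$(printf '%s\n' "$input" | awk -F '[ -]+' '
/^#/ {                                  # header line
    key = $1 "-" $2 "-" $5              # group name, e.g. #GEO-1-fwd
    dir = $5                            # fwd or rev
    if (!(key in seen)) {               # remember first-seen order
        seen[key] = 1
        order[n++] = key
    }
    if (dir == "rev") coords[key] = " " $7 " " $8 coords[key]   # prepend
    else              coords[key] = coords[key] " " $7 " " $8   # append
    next
}
NF {                                    # non-empty sequence line
    if (dir == "rev") seq[key] = $0 seq[key]
    else              seq[key] = seq[key] $0
}
END {                                   # print groups in input order
    for (j = 0; j < n; j++) {
        k = order[j]
        print k coords[k]
        print seq[k]
    }
}')

echo "$result"
```

With the placeholder input this prints the #GEO-1-fwd group first (numbers in input order, content AAABBB) and the #GEO-2-rev group second (numbers and content reversed, EEEDDDCCC). The order array plays the same role as O in the posts above: it records each group the first time it is seen, so the END block does not depend on the traversal order of for (i in ...).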