Split 1 column into numerous columns based on patterns


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers Split 1 column into numerous columns based on patterns
# 8  
Old 10-13-2015
Telling us you use awk version 3.1.5 without telling us what operating system you're using doesn't help much...

If the data in your fields isn't always less than eight characters and you want the output columns to one up (instead of just being <tab> separated), you could try something like:
Code:
awk '
function dump(sep,	c, r) {
# printf("dump:sep=%d,colc=%d,rowc=%d\n", sep, colc, rowc)
	for(r = 1; r <= rowc; r++) {
		for(c = 1; c <= colc; c++) {
			printf("%-*s%s", w[c], d[c, r], c == colc ? "\n" : "  ")
			delete d[c, r]
		}
	}
	if(rowc) {
		for(c in w)
			delete w[c]
		if(sep)	print ""
		colc = rowc = 0
	}
}
NF == 0 {
	# Skip blank lines.
	next
}
/^"NEW (PRODUCT|ITEM), VERSION 1[.]1"$/ {
# print "NP:" $0
	if(r > rowc)
		rowc = r
	dump(1)
	r = 0
	print
	next
}
/^"PRODUCT FIELD"$/ {
# print "PF:" $0
	colc++
	if(r > rowc)
		rowc = r
	r = 0
	next
}
{	if(colc) {
		if(length($0) > w[colc])
			w[colc] = length($0)
		d[colc, ++r] = $0
# printf("d[%d,%d]=%s, w[%d]=%d, rowc=%d\n", colc, r, d[colc,r], colc, w[colc], rowc)
	} else	print
}
END {	if(r > rowc)
		rowc = r
	dump(0)
}' "Item_List.txt"

which, with your latest sample input, produces the output:
Code:
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X  FIELD_Y
11.11    22.22  
11.22    22.33  
11.33           
11.44           

"NEW PRODUCT, VERSION 1.1"
PRODUCT_02
FIELD_X  FIELD_Y  FIELD_Z
33.33    44.44    55.55

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

To see that it still lines up column columns in the output when data has varying widths, with the input:
Code:
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
11.11
"PRODUCT FIELD"
FIELD_Y
22.22
"NEW ITEM, VERSION 1.1"
PRODUCT_02
"PRODUCT FIELD"
FIELD_X
33.33
"PRODUCT FIELD"
FIELD_Y
44.44
"PRODUCT FIELD"
FIELD_Z
55.55
"NEW PRODUCT, VERSION 1.1"
Product_Item_#3
"PRODUCT FIELD"
Heading #1
f1 l#1
"PRODUCT FIELD"
H2
field 2 1st line
field 2 2nd line
field 2 3rd line
"PRODUCT FIELD"
Heading #3
f 3 line 1
field 3 line 2 
field 3 3rd line
f 3 4th line
f 3 5th line
"PRODUCT FIELD"
FIELD_X
33.33
"PRODUCT FIELD"
FIELD_Y
44.44
"PRODUCT FIELD"
FIELD_Z
55.55

it produces the output:
Code:
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X  FIELD_Y
11.11    22.22  

"NEW ITEM, VERSION 1.1"
PRODUCT_02
FIELD_X  FIELD_Y  FIELD_Z
33.33    44.44    55.55  

"NEW PRODUCT, VERSION 1.1"
Product_Item_#3
Heading #1  H2                Heading #3        FIELD_X  FIELD_Y  FIELD_Z
f1 l#1      field 2 1st line  f 3 line 1        33.33    44.44    55.55  
            field 2 2nd line  field 3 line 2                             
            field 2 3rd line  field 3 3rd line                           
                              f 3 4th line                               
                              f 3 5th line

This User Gave Thanks to Don Cragun For This Post:
# 9  
Old 10-13-2015
That works! thanks to those who responded.

---------- Post updated at 11:46 AM ---------- Previous update was at 11:25 AM ----------

One last thing. In the input file there is the possibility that there will be blank spaces for some of the data values. For instance;

Code:
 
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
3333333.0000000
  
4444444.0000000

The output looks like;

Code:
 
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
3333333.0000000
4444444.0000000

Is there a way to keep the blank spaces?
# 10  
Old 10-13-2015
I know you have got your answer...but since I got he code, I am posting here
Code:
awk '/\"NEW PRODUCT, VERSION 1[.]1\"/ {if(n) {l = l < n ? n : l; print "\"NEW PRODUCT, VERSION 1.1\""; print prod; print head; for(i = 1; i < l-1; i++){print A[i]}; l = 0; n = 0; print ""}; head = ""; val = ""; getline; prod = $0; split(x, A); next}
/\"PRODUCT FIELD\"/ {l = l < n ? n : l; n = 1; next}
n == 1 {head = head == "" ? $0 : (head "\t" $0); n++; next}
n >= 2 {A[(n-1)] = (A[(n-1)] == "") ? $0 : (A[(n-1)] "\t" $0); n++}
END {l = l < n ? n : l; print "\"NEW PRODUCT, VERSION 1.1\""; print prod; print head; for(i = 1; i < l-1; i++){print A[i]}}' Item_List.txt

---------- Post updated at 06:51 AM ---------- Previous update was at 06:47 AM ----------

Code:
awk '/\"NEW PRODUCT, VERSION 1[.]1\"/ {if(n) {l = l < n ? n : l; print "\"NEW PRODUCT, VERSION 1.1\""; print prod; print head; for(i = 1; i < l-1; i++){print A[i]}; l = 0; n = 0; print ""}; head = ""; val = ""; getline; prod = $0; split(x, A); next}
/\"PRODUCT FIELD\"/ {l = l < n ? n : l; n = 1; next}
n == 1 {head = head == "" ? $0 : (head "\t" $0); n++; next}
n >= 2 && NF {A[(n-1)] = (A[(n-1)] == "") ? $0 : (A[(n-1)] "\t" $0); n++}
END {l = l < n ? n : l; print "\"NEW PRODUCT, VERSION 1.1\""; print prod; print head; for(i = 1; i < l-1; i++){print A[i]}}' Item_List.txt

Quote:
Originally Posted by mmab
That works! thanks to those who responded.

---------- Post updated at 11:46 AM ---------- Previous update was at 11:25 AM ----------

One last thing. In the input file there is the possibility that there will be blank spaces for some of the data values. For instance;

Code:
 
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
3333333.0000000
  
4444444.0000000

The output looks like;

Code:
 
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
3333333.0000000
4444444.0000000

Is there a way to keep the blank spaces?
# 11  
Old 10-13-2015
Quote:
Originally Posted by mmab
That works! thanks to those who responded.

---------- Post updated at 11:46 AM ---------- Previous update was at 11:25 AM ----------

One last thing. In the input file there is the possibility that there will be blank spaces for some of the data values. For instance;

Code:
 
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
3333333.0000000
  
4444444.0000000

The output looks like;

Code:
 
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
3333333.0000000
4444444.0000000

Is there a way to keep the blank spaces?
Did you run post#7's code on that data?
# 12  
Old 10-13-2015
If you get rid of blank lines at the start of your file (before the first "NEW PRODUCT, VERSION 1.1" line) and remove the following lines in the code I provided in post #8:
Code:
NF == 0 {
	# Skip blank lines.
	next
}

it should also work. (But if you keep blank lines at the start of your input files, I'll leave it up to you to modify the code to adjust for that.)
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Using awk to split a column into two columns

Hi, I am trying to split the following output into two columns, where each column has Source: Destination: OUTPUT TO FILTER $ tshark -r Capture_without_mtr.pcap -V | awk '/ (Source|Destination): /' | more Source: x.x.x.x Destination: x.x.x.x Source:... (2 Replies)
Discussion started by: sand1234
2 Replies

2. UNIX for Beginners Questions & Answers

How to split a column based on |?

Hi all, Newbie here, so please bear over with my stupid question :) I have used far too long time today on figuring this out, so I hope that someone here can help me move on. I have some annotation data for a transcriptome where I want to split a column containing NCBI accession IDs into a... (7 Replies)
Discussion started by: BioBing
7 Replies

3. UNIX for Dummies Questions & Answers

File merging based on column patterns

Hello :) I am in this situation: Input: two tab-delimited files, `File1` and `File2`. `File2` (`$2`) has to be parsed by patterns found in `File1` (`$1`). Expected output: tab-delimited file, `File3`. `File3` has to contain the same rows as `File2`, plus the corresponding value in... (5 Replies)
Discussion started by: dovah
5 Replies

4. Shell Programming and Scripting

awk to sum a column based on duplicate strings in another column and show split totals

Hi, I have a similar input format- A_1 2 B_0 4 A_1 1 B_2 5 A_4 1 and looking to print in this output format with headers. can you suggest in awk?awk because i am doing some pattern matching from parent file to print column 1 of my input using awk already.Thanks! letter number_of_letters... (5 Replies)
Discussion started by: prashob123
5 Replies

5. UNIX for Dummies Questions & Answers

Split file based on column

i have file1.txt asdas|csada|130310|0423|A1|canberra sdasd|sfdsf|130426|2328|A1|sydney Expected output : on eaceh third and fourth colum, split into each two characters asdas|csada|13|03|10|04|23|A1|canberra sdasd|sfdsf|13|04|26|23|28|A1|sydney (10 Replies)
Discussion started by: radius
10 Replies

6. Shell Programming and Scripting

Split the file based on column

Hi, I have a file sample_1.txt (300k rows) which has data like below: * Also each record is around 64k bytes 11|1|abc|102553|125589|64k bytes of data 10|2|def|123452|123356|...... 13|2|geh|144351|121123|... 25|4|fgh|165250|118890|.. 14|1|abc|186149|116657|......... (6 Replies)
Discussion started by: sol_nov
6 Replies

7. Shell Programming and Scripting

Split into columns based on the parameter and use & as delimiter

Here is my source, i have million lines like this on a file. disp0201.php?poc=4060&roc=1&ps=R&ooc=13&mjv=6&mov=5&rel=5&bod=155&oxi=2&omj=5&ozn=1&dav=20&cd=&daz=& drc=&mo=&sid=&lang=EN&loc=JPN I want to split this into columns in order to load in database, anything starts with"&mjv=6" as first... (13 Replies)
Discussion started by: elamurugu
13 Replies

8. Shell Programming and Scripting

split one column into multiple columns

hey, i have the following data: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 (7 Replies)
Discussion started by: zaneded
7 Replies

9. UNIX for Dummies Questions & Answers

split one column into multiple columns

hey guys... Im looking to do the following: 1 2 3 4 5 6 7 8 9 Change to: 1 4 7 2 5 8 3 6 9 Did use | perl -lpe'$\=$.%3?$":"\n"' , but it doesnt give me the matrix i want. (3 Replies)
Discussion started by: zaneded
3 Replies

10. Web Development

split the fields in a column into 3 columns

Have a column "address" which is combination of city, region and postal code like. Format is : city<comma><space>region<space>postal code abc, xyz 123456 All these three city, region and postal code are not mandatory. There can be any one of the above. In that case a nell... (2 Replies)
Discussion started by: rakshit
2 Replies
Login or Register to Ask a Question