Split 1 column into numerous columns based on patterns


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers Split 1 column into numerous columns based on patterns
# 1  
Old 10-12-2015
Code Split 1 column into numerous columns based on patterns

Hi,

I have a text file 'Item_List.txt' containing only 1 column. This column lists different products, each separated by the same generic string header "NEW PRODUCT, VERSION 1.1". After this the name of the product is given, then a delimiter string "PRODUCT FIELD", and then the name of the field itself. After the field name comes the data which in this example only has 1 entry per field but will have more.

Code:
'Item_List.txt'

"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
11.11
"PRODUCT FIELD"
FIELD_Y
22.22
"NEW ITEM, VERSION 1.1"
PRODUCT_02
"PRODUCT FIELD"
FIELD_X
33.33
"PRODUCT FIELD"
FIELD_Y
44.44
"PRODUCT FIELD"
FIELD_Z
55.55

etc

My desired output file is as follows;

Code:
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X		FIELD_Y
11.11 		22.22

"NEW PRODUCT, VERSION 1.1"
PRODUCT_02		
FIELD_X		FIELD_Y		FIELD_Z	
33.33    		44.44      		55.55

Basically I require to list each product with its relevant field columns ordered from left to right. I should note there is no limit on the number of fields each product has.

I have tried using csplit, awk, sed, cut & paste but with no luck. I apologize for not inserting my code as what i have does not work and i didn't want to confuse things.

I'm not expecting the answer, but even just a pointer on to how best go about this task. Even a rough structure would be good and i could add my own code to that.

Thanks
# 2  
Old 10-12-2015
your item_list has "NEW ITEM, VERSION 1.1". If that is "NEW PRODUCT, VERSION 1.1", below code will give you the desired output
Code:
awk '/\"NEW PRODUCT, VERSION 1.1\"/ {if(n) {print "\"NEW PRODUCT, VERSION 1.1\""; print prod; print head; print val "\n"}; head = ""; val = ""; getline; prod = $0; next}
/\"PRODUCT FIELD\"/ {n = 1; next}
n == 1 {head = head == "" ? $0 : (head "\t" $0); n++; next}
n == 2 {val = val == "" ? $0 : (val "\t" $0)}
END {print "\"NEW PRODUCT, VERSION 1.1\""; print prod; print head; print val}' Item_List.txt

# 3  
Old 10-12-2015
Thanks for your response! I ran your code on the file and got the following output;
Code:
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X FIELD_Y
11.11     22.22

"NEW PRODUCT, VERSION 1.1"
PRODUCT_02
FIELD_X FIELD_Z
33.33     55.55

This is very close but for the 2nd product only 2 out of its 3 fields are printed. (the 1st product only had 2 fields). Each product may have numerous fields, sorry if i was vague explaining this. I've had a good look through the code and i'm not sure why only 2 fields are printed.

Thanks

Last edited by Franklin52; 10-12-2015 at 10:32 AM.. Reason: Please use code tags
# 4  
Old 10-12-2015
Try also
Code:
awk '
BEGIN           {SRCH="NEW (PRODUCT|ITEM), VERSION 1.1"
                }
$0 ~ SRCH       {if (RES) print ORS RES
                 RES = ""
                 print
                 getline
                 print
                }
/^FIELD/        {printf "%s\t", $1
                 getline
                 RES = RES $0 "\t"
                }
END             {print ORS RES
                }
' file
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X    FIELD_Y    
11.11      22.22    
"NEW ITEM, VERSION 1.1"
PRODUCT_02
FIELD_X    FIELD_Y    FIELD_Z    
33.33      44.44      55.55

# 5  
Old 10-12-2015
Quote:
Originally Posted by mmab
Thanks for your response! I ran your code on the file and got the following output;
Code:
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X FIELD_Y
11.11     22.22

"NEW PRODUCT, VERSION 1.1"
PRODUCT_02
FIELD_X FIELD_Z
33.33     55.55

This is very close but for the 2nd product only 2 out of its 3 fields are printed. (the 1st product only had 2 fields). Each product may have numerous fields, sorry if i was vague explaining this. I've had a good look through the code and i'm not sure why only 2 fields are printed.

Thanks
The only way that I can see that you would get that output from SriniShoo's script would be if the 2nd line in your input file that you showed us containing the text:
Code:
FIELD_Y

was misspelled or did not appear at start of a line.

What operating system and version of awk are you using?

Also, in your 1st post in this thread you said:
Quote:
After the field name comes the data which in this example only has 1 entry per field but will have more.
None of the suggestions presented so far have addressed this, and I'm not sure what you mean by it. Please show us sample input (in CODE tags) and corresponding desired sample output (in CODE tags) so we can see what you are trying to do.

In your 1st post, you also said:
Quote:
I apologize for not inserting my code as what i have does not work and i didn't want to confuse things.
Please don't feel that way. Showing us what you tried (and the output you got from what you tried) (both in CODE tags) helps us understand what you're thinking and gives us a better chance of understanding some minor point in shell scripting that is causing your scripts to fail.
# 6  
Old 10-13-2015
Here is a cleaned up version of the input file showing how each field can have an undefined amount of data;

Code:
 
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
11.11
11.22
11.33
11.44
"PRODUCT FIELD"
FIELD_Y
22.22
22.33
"NEW PRODUCT, VERSION 1.1"
PRODUCT_02
"PRODUCT FIELD"
FIELD_X
33.33
"PRODUCT FIELD"
FIELD_Y
44.44
"PRODUCT FIELD"
FIELD_Z
55.55

Desired output like so;

Code:
 
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X  FIELD_Y
11.11      22.22
11.22      22.33
11.33      
11.44
       
"NEW PRODUCT, VERSION 1.1"
PRODUCT_02
FIELD_X  FIELD_Y  FIELD_Z
33.33      44.44     55.55

The version of awk I'm using is 3.1.5.

Thanks

Last edited by mmab; 10-13-2015 at 06:05 AM..
# 7  
Old 10-13-2015
Try
Code:
awk '
BEGIN           {SRCH="NEW (PRODUCT|ITEM), VERSION 1.1"
                }

function        PRT()   {printf "\n"
                         for (i=1; i<=VMAX; i++)
                                {for (j=1; j<=FCNT; j++)
                                        printf "%s\t", PRTARR[FSQ[j],i]
                                 printf "\n"
                                }
                         delete PRTARR
                        }

$0 ~ SRCH       {if (NR > 1) PRT()
                 print
                 getline
                 print
                 VMAX = 0
                 FCNT = 0
                 next
                }

/^"PROD/        {next
                }

/^FIELD/        {IDX = $1
                 VCNT = 0
                 printf "%s\t", IDX
                 FSQ[++FCNT] = IDX
                 next
                }

IDX             {PRTARR[IDX,++VCNT] = $1
                 if (VCNT > VMAX) VMAX = VCNT
                }

END             {PRT()
                }
' file
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X FIELD_Y
11.11   22.22
11.22   22.33
11.33
11.44
"NEW PRODUCT, VERSION 1.1"
PRODUCT_02
FIELD_X FIELD_Y FIELD_Z
33.33   44.44   55.55

This User Gave Thanks to RudiC For This Post:
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Using awk to split a column into two columns

Hi, I am trying to split the following output into two columns, where each column has Source: Destination: OUTPUT TO FILTER $ tshark -r Capture_without_mtr.pcap -V | awk '/ (Source|Destination): /' | more Source: x.x.x.x Destination: x.x.x.x Source:... (2 Replies)
Discussion started by: sand1234
2 Replies

2. UNIX for Beginners Questions & Answers

How to split a column based on |?

Hi all, Newbie here, so please bear over with my stupid question :) I have used far too long time today on figuring this out, so I hope that someone here can help me move on. I have some annotation data for a transcriptome where I want to split a column containing NCBI accession IDs into a... (7 Replies)
Discussion started by: BioBing
7 Replies

3. UNIX for Dummies Questions & Answers

File merging based on column patterns

Hello :) I am in this situation: Input: two tab-delimited files, `File1` and `File2`. `File2` (`$2`) has to be parsed by patterns found in `File1` (`$1`). Expected output: tab-delimited file, `File3`. `File3` has to contain the same rows as `File2`, plus the corresponding value in... (5 Replies)
Discussion started by: dovah
5 Replies

4. Shell Programming and Scripting

awk to sum a column based on duplicate strings in another column and show split totals

Hi, I have a similar input format- A_1 2 B_0 4 A_1 1 B_2 5 A_4 1 and looking to print in this output format with headers. can you suggest in awk?awk because i am doing some pattern matching from parent file to print column 1 of my input using awk already.Thanks! letter number_of_letters... (5 Replies)
Discussion started by: prashob123
5 Replies

5. UNIX for Dummies Questions & Answers

Split file based on column

i have file1.txt asdas|csada|130310|0423|A1|canberra sdasd|sfdsf|130426|2328|A1|sydney Expected output : on eaceh third and fourth colum, split into each two characters asdas|csada|13|03|10|04|23|A1|canberra sdasd|sfdsf|13|04|26|23|28|A1|sydney (10 Replies)
Discussion started by: radius
10 Replies

6. Shell Programming and Scripting

Split the file based on column

Hi, I have a file sample_1.txt (300k rows) which has data like below: * Also each record is around 64k bytes 11|1|abc|102553|125589|64k bytes of data 10|2|def|123452|123356|...... 13|2|geh|144351|121123|... 25|4|fgh|165250|118890|.. 14|1|abc|186149|116657|......... (6 Replies)
Discussion started by: sol_nov
6 Replies

7. Shell Programming and Scripting

Split into columns based on the parameter and use & as delimiter

Here is my source, i have million lines like this on a file. disp0201.php?poc=4060&roc=1&ps=R&ooc=13&mjv=6&mov=5&rel=5&bod=155&oxi=2&omj=5&ozn=1&dav=20&cd=&daz=& drc=&mo=&sid=&lang=EN&loc=JPN I want to split this into columns in order to load in database, anything starts with"&mjv=6" as first... (13 Replies)
Discussion started by: elamurugu
13 Replies

8. Shell Programming and Scripting

split one column into multiple columns

hey, i have the following data: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 (7 Replies)
Discussion started by: zaneded
7 Replies

9. UNIX for Dummies Questions & Answers

split one column into multiple columns

hey guys... Im looking to do the following: 1 2 3 4 5 6 7 8 9 Change to: 1 4 7 2 5 8 3 6 9 Did use | perl -lpe'$\=$.%3?$":"\n"' , but it doesnt give me the matrix i want. (3 Replies)
Discussion started by: zaneded
3 Replies

10. Web Development

split the fields in a column into 3 columns

Have a column "address" which is combination of city, region and postal code like. Format is : city<comma><space>region<space>postal code abc, xyz 123456 All these three city, region and postal code are not mandatory. There can be any one of the above. In that case a nell... (2 Replies)
Discussion started by: rakshit
2 Replies
Login or Register to Ask a Question