Split 1 column into numerous columns based on patterns

10-12-2015

Hi,

I have a text file, 'Item_List.txt', that contains a single column. The column lists different products, each introduced by the same generic header string "NEW PRODUCT, VERSION 1.1". The header is followed by the product name, then a delimiter string "PRODUCT FIELD", and then the name of the field itself. After the field name comes the data which in this example only has 1 entry per field but will have more.

Code:

'Item_List.txt'

"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
11.11
"PRODUCT FIELD"
FIELD_Y
22.22
"NEW ITEM, VERSION 1.1"
PRODUCT_02
"PRODUCT FIELD"
FIELD_X
33.33
"PRODUCT FIELD"
FIELD_Y
44.44
"PRODUCT FIELD"
FIELD_Z
55.55

etc

My desired output file is as follows:

Code:

"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X		FIELD_Y
11.11 		22.22

"NEW PRODUCT, VERSION 1.1"
PRODUCT_02		
FIELD_X		FIELD_Y		FIELD_Z	
33.33    		44.44      		55.55

Basically, I need to list each product with its fields as columns, ordered from left to right. Note that there is no limit on the number of fields a product can have.

I have tried using csplit, awk, sed, cut and paste, but with no luck. I apologize for not inserting my code as what I have does not work and I didn't want to confuse things.

I'm not expecting a complete answer; even a pointer on how best to go about this task would help. A rough structure would be good, and I could add my own code to that.

Thanks

mmab

10-12-2015

Your Item_List.txt has "NEW ITEM, VERSION 1.1" as the header of the second product. If that should be "NEW PRODUCT, VERSION 1.1", the code below will give you the desired output:

Code:

awk '/\"NEW PRODUCT, VERSION 1.1\"/ {if(n) {print "\"NEW PRODUCT, VERSION 1.1\""; print prod; print head; print val "\n"}; head = ""; val = ""; getline; prod = $0; next}
/\"PRODUCT FIELD\"/ {n = 1; next}
n == 1 {head = head == "" ? $0 : (head "\t" $0); n++; next}
n == 2 {val = val == "" ? $0 : (val "\t" $0)}
END {print "\"NEW PRODUCT, VERSION 1.1\""; print prod; print head; print val}' Item_List.txt
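
A quick way to confirm a header mismatch like the one mentioned above is to count the distinct header lines in the file. This is only a sketch: it assumes every header line starts with "NEW followed by a space, and it recreates a shortened copy of the sample input under the made-up name sample_items.txt.

```shell
# Recreate a shortened copy of the sample input (hypothetical file name).
cat > sample_items.txt <<'EOF'
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
11.11
"NEW ITEM, VERSION 1.1"
PRODUCT_02
"PRODUCT FIELD"
FIELD_X
33.33
EOF

# Count each distinct header spelling. More than one output line means
# the headers are not all identical.
header_counts=$(awk '/^"NEW /{seen[$0]++} END{for (h in seen) print seen[h], h}' sample_items.txt | sort)
printf '%s\n' "$header_counts"
```

If more than one spelling is listed, either correct the input or make the header pattern accept every spelling that occurs.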

SriniShoo

10-12-2015

Thanks for your response! I ran your code on the file and got the following output:

Code:

"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X FIELD_Y
11.11     22.22

"NEW PRODUCT, VERSION 1.1"
PRODUCT_02
FIELD_X FIELD_Z
33.33     55.55

This is very close, but for the 2nd product only 2 of its 3 fields are printed (the 1st product has only 2 fields). Each product may have numerous fields; sorry if I was vague in explaining this. I've had a good look through the code and I'm not sure why only 2 fields are printed.

Thanks

mmab

10-12-2015

Try also

Code:

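awk '
BEGIN           {SRCH="NEW (PRODUCT|ITEM), VERSION 1.1"
                }
# Header line: flush the values collected for the previous product, then
# print the header and the product name that follows it.
$0 ~ SRCH       {if (RES) print ORS RES
                 RES = ""
                 print
                 getline
                 print
                }
# Field name: print it right away and collect the value from the next line.
/^FIELD/        {printf "%s\t", $1
                 getline
                 RES = RES $0 "\t"
                }
# Flush the values of the last product.
END             {print ORS RES
                }
' file

Output:

"NEW PRODUCT, VERSION 1.1"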
PRODUCT_01
FIELD_X    FIELD_Y    
11.11      22.22    
"NEW ITEM, VERSION 1.1"
PRODUCT_02
FIELD_X    FIELD_Y    FIELD_Z    
33.33      44.44      55.55

RudiC

10-12-2015

Quote:

Originally Posted by mmab

Thanks for your response! I ran your code on the file and got the following output:

Code:

"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X FIELD_Y
11.11     22.22

"NEW PRODUCT, VERSION 1.1"
PRODUCT_02
FIELD_X FIELD_Z
33.33     55.55

The only way I can see that you would get that output from SriniShoo's script is if the 2nd line in the input file you showed us containing the text:

Code:

FIELD_Y

was misspelled or did not appear at the start of a line.

What operating system and version of awk are you using?
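
Two things that can make a line look right and still fail to match are a trailing carriage return (for example in a file saved on Windows) and leading whitespace. The following is a sketch of how one might check for both. The file names crlf_sample.txt and clean_sample.txt are invented for the demonstration, and whether either problem applies to your file is an assumption to verify, not a diagnosis.

```shell
# Build a small demo file in which one line has a trailing carriage return
# and another has a leading space (both invented for this demonstration).
printf 'FIELD_X\n33.33\nFIELD_Y\r\n44.44\n FIELD_Z\n55.55\n' > crlf_sample.txt

# sed -n l prints each line unambiguously: a carriage return shows up as \r
# and the end of every line is marked with $.
sed -n l crlf_sample.txt

# Count the lines that end in a carriage return or start with a blank or tab.
suspect_count=$(awk '/\r$/ || /^[ \t]/ {c++} END {print c + 0}' crlf_sample.txt)
echo "suspect lines: $suspect_count"

# Strip both problems into a cleaned copy before running the awk script on it.
awk '{sub(/\r$/, ""); sub(/^[ \t]+/, ""); print}' crlf_sample.txt > clean_sample.txt
```

A non-zero count suggests cleaning the file first and then rerunning the script on the cleaned copy.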

Also, in your 1st post in this thread you said:

Quote:

After the field name comes the data which in this example only has 1 entry per field but will have more.

None of the suggestions presented so far have addressed this, and I'm not sure what you mean by it. Please show us sample input (in CODE tags) and corresponding desired sample output (in CODE tags) so we can see what you are trying to do.

In your 1st post, you also said:

Quote:

I apologize for not inserting my code as what I have does not work and I didn't want to confuse things.

Please don't feel that way. Showing us what you tried (and the output you got from what you tried) (both in CODE tags) helps us understand what you're thinking and gives us a better chance of understanding some minor point in shell scripting that is causing your scripts to fail.

Don Cragun

10-13-2015

Here is a cleaned-up version of the input file, showing that each field can have an arbitrary number of data entries:

Code:

 
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
"PRODUCT FIELD"
FIELD_X
11.11
11.22
11.33
11.44
"PRODUCT FIELD"
FIELD_Y
22.22
22.33
"NEW PRODUCT, VERSION 1.1"
PRODUCT_02
"PRODUCT FIELD"
FIELD_X
33.33
"PRODUCT FIELD"
FIELD_Y
44.44
"PRODUCT FIELD"
FIELD_Z
55.55

The desired output looks like this:

Code:

 
"NEW PRODUCT, VERSION 1.1"
PRODUCT_01
FIELD_X  FIELD_Y
11.11      22.22
11.22      22.33
11.33      
11.44
       
"NEW PRODUCT, VERSION 1.1"
PRODUCT_02
FIELD_X  FIELD_Y  FIELD_Z
33.33      44.44     55.55

The version of awk I'm using is 3.1.5.

Thanks

mmab

10-13-2015

Try

Code:

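awk '
BEGIN           {SRCH="NEW (PRODUCT|ITEM), VERSION 1.1"
                }

# Print the collected values row by row: one row per value index, one
# column per field in the order the fields were seen. The leading newline
# ends the line of field names printed earlier.
function        PRT()   {printf "\n"
                         for (i=1; i<=VMAX; i++)
                                {for (j=1; j<=FCNT; j++)
                                        printf "%s\t", PRTARR[FSQ[j],i]
                                 printf "\n"
                                }
                         delete PRTARR
                        }

# Product header: flush the previous product, print the header and the
# product name, and reset the counters.
$0 ~ SRCH       {if (NR > 1) PRT()
                 print
                 getline
                 print
                 VMAX = 0
                 FCNT = 0
                 next
                }

# Skip the "PRODUCT FIELD" delimiter lines.
/^"PROD/        {next
                }

# Field name: remember it as the current column and print it in the header row.
/^FIELD/        {IDX = $1
                 VCNT = 0
                 printf "%s\t", IDX
                 FSQ[++FCNT] = IDX
                 next
                }

# Any other line is a value of the current field.
IDX             {PRTARR[IDX,++VCNT] = $1
                 if (VCNT > VMAX) VMAX = VCNT
                }

# Flush the last product.
END             {PRT()
                }
' file

Output:

"NEW PRODUCT, VERSION 1.1"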
PRODUCT_01
FIELD_X FIELD_Y
11.11   22.22
11.22   22.33
11.33
11.44
"NEW PRODUCT, VERSION 1.1"
PRODUCT_02
FIELD_X FIELD_Y FIELD_Z
33.33   44.44   55.55
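
The core idea in the script above is to store each value under a (field, row) key and then print row by row. A stripped-down sketch of just that step follows. The input format (a field name followed by its values on one line) and the names cell, names, maxrow and transposed are simplifications invented for illustration; they are not part of the script above.

```shell
# Hypothetical simplified input: one line per field, name first, then its values.
transposed=$(printf 'FIELD_X 11.11 11.22 11.33\nFIELD_Y 22.22 22.33\n' |
awk '
{
    names[NR] = $1                        # remember the column order
    for (r = 2; r <= NF; r++)
        cell[NR, r - 1] = $r              # value r-1 of column NR
    if (NF - 1 > maxrow) maxrow = NF - 1  # longest column sets the row count
}
END {
    # Header row, then one output row per value index. Missing cells are
    # empty strings, so shorter columns simply leave blanks.
    for (c = 1; c <= NR; c++) printf "%s%s", names[c], (c < NR ? "\t" : "\n")
    for (r = 1; r <= maxrow; r++)
        for (c = 1; c <= NR; c++)
            printf "%s%s", cell[c, r], (c < NR ? "\t" : "\n")
}')
printf '%s\n' "$transposed"
```

Unlike the script above, this sketch puts a tab only between columns, so rows do not end in a trailing tab.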

RudiC
