Convert rows to column and add header


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Convert rows to column and add header
# 1  
Old 10-10-2014
Convert rows to column and add header

Hi,

I need help to convert rows in input file into a table.

inputfile

Code:
192.98.1   192.98.192.98.17    VVC family                            Zorro    10
192.98.1   192.98.192.98.17    VVC family                            Ace      1
192.98.1   192.98.192.98.17    VVC family                            Bora     1
192.98.1   192.98.192.98.17    VVC family                            Sakura   5
12.A.4     12.A.4.10.30        channel2 family                       Usopun   1
7.A3.14    7.A3.14.3.1         DuanXon channel family                T-Law    1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Robyn    1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Zorro    1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Ace      1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Bora     1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Sakura   1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Hashir   1
8.M.14     8.M.14.1            potential receptor, channel family    Robyn    1
8.M.14     8.M.14.1.2          potential receptor, channel family    Usopun   1
8.M.14     8.M.14.1.3          potential receptor, channel family    T-Law    1
8.M.14     8.M.14.1.3          potential receptor, channel family    Zorro    2
8.M.14     8.M.14.1.3          potential receptor, channel family    Ace      4
8.M.14     8.M.14.1.3          potential receptor, channel family    Bora     1
8.M.14     8.M.14.1.3          potential receptor, channel family    Sakura   2
1.P.5      1.P.5.18.1          major intrinsic family                Ace      8
1.P.5      1.P.5.18.3          major intrinsic family                Sakura   1
1.P.5      1.P.5.6.4           major intrinsic family                T-Law    1
1.P.5      1.P.5.6.4           major intrinsic family                Robyn    6
1.P.5      1.P.5.6.4           major intrinsic family                Sakura   1

The expected output file (sample) as below:-

Code:
Fam        No                  Name                               Ace     Bora    Hashir    Robyn    Sakura    T-Law    Usopun    Zorro
192.98.1   192.98.192.98.17   VVC family                          1       1       null      null       5       null      null       10
12.A.4     12.A.4.10.30       channel2 family                     null    null    null      null      null     null      1        null
7.A3.14    7.A3.14.3.1        DuanXon channel family              1       1         1       1         1         1        null        1
8.M.14     8.M.14.1           potential receptor, channel family  null    null    null      1         null     null      null     null
8.M.14     8.M.14.1.2         potential receptor, channel family  null    null    null      null      null     null      1        null
8.M.14     8.M.14.1.3         potential receptor, channel family  4       1       null      null      2          1       null       2
1.P.5      1.P.5.18.1         major intrinsic family              8       null    null      null      null     null      null     null
1.P.5      1.P.5.18.3         major intrinsic family              null    null    null      null       1       null      null     null
1.P.5      1.P.5.6.4          major intrinsic family              null    null    null      6          1         1       null     null

I need to add header and arrange the info in inputfile by distributing the value in $5 (inputfile) accordingly. The distribution is based on $2. If there is no value in any $4 to $11 (from $5 in inputfile), those columns should be printed with "null" as above.
I tried some codes that i got from this forum plus awk online documentation but failed to get what i wanted. one of the script that i tried as follows:-

Code:
awk -F"\t" '{a[$2]=a[$2]?a[$2]FS$4$5:$2FS$4$5} END{for(i in a) print a[i]}' inputfile

This problem is very complicated to me. Hope my description is clear and really appreciate if somebody can help me to solve this using awk. Thanks.
# 2  
Old 10-10-2014
Would an adaption of this help?
# 3  
Old 10-10-2014
How about this ?

Code:
 awk 'NR==FNR{a[$(NF-1)]=$(NF-1);next}
{if (FNR==1) { asort(a) ; printf "Fam\t\tNo\t\tName\t" ; for ( j in a ) { printf a[j] FS }} if ( !b[$1,$2] ) { if ( FNR>1) { for(j in a) {if ( p[a[j]] ) { printf OFS p[a[j]] } else {printf OFS "NULL"} } delete p;} printf "\n"; b[$1,$2]++; printf $1 OFS $2 OFS ;for (i=3;i<=NF-2;i++) { printf $i FS  } ; } if (FNR==1) { asort(a) } ; for ( j in a ) { if ( a[j] == $(NF-1) ) { p[a[j]]=$NF;} }} END {  for(j in a) {if ( p[a[j]] ) { printf OFS p[a[j]] } else {printf OFS "NULL" } } printf "\n";}' OFS="|" testFile testFile

This User Gave Thanks to pravin27 For This Post:
# 4  
Old 10-10-2014
Try

Input
Code:
[akshay@nio tmp]$ cat file
192.98.1   192.98.192.98.17    VVC family                            Zorro    10
192.98.1   192.98.192.98.17    VVC family                            Ace      1
192.98.1   192.98.192.98.17    VVC family                            Bora     1
192.98.1   192.98.192.98.17    VVC family                            Sakura   5
12.A.4     12.A.4.10.30        channel2 family                       Usopun   1
7.A3.14    7.A3.14.3.1         DuanXon channel family                T-Law    1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Robyn    1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Zorro    1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Ace      1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Bora     1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Sakura   1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Hashir   1
8.M.14     8.M.14.1            potential receptor, channel family    Robyn    1
8.M.14     8.M.14.1.2          potential receptor, channel family    Usopun   1
8.M.14     8.M.14.1.3          potential receptor, channel family    T-Law    1
8.M.14     8.M.14.1.3          potential receptor, channel family    Zorro    2
8.M.14     8.M.14.1.3          potential receptor, channel family    Ace      4
8.M.14     8.M.14.1.3          potential receptor, channel family    Bora     1
8.M.14     8.M.14.1.3          potential receptor, channel family    Sakura   2
1.P.5      1.P.5.18.1          major intrinsic family                Ace      8
1.P.5      1.P.5.18.3          major intrinsic family                Sakura   1
1.P.5      1.P.5.6.4           major intrinsic family                T-Law    1
1.P.5      1.P.5.6.4           major intrinsic family                Robyn    6
1.P.5      1.P.5.6.4           major intrinsic family                Sakura   1

Excel version

Code:
awk '
	{
		# Field 1,2,4 and 5
		f1 = $1  
		f2 = $2 
		f4 = $(NF -1)
		f5 = $NF
				
		# This is for f3
		# We can also do this like $1 = $2 = $(NF -1) = $NF ="" ; f3 = $0
		# But there will be OFS issue, you have set OFS in END block I guess for proper formating

		split($0,d,/[^[:space:]]*/)
		for(i=3; i<=NF-2; i++)
		{
			f3 = sprintf("%s%s%s",f3,d[i],$i)
		}
		
		# Remove Leading and trailing space in f3
		gsub(/^[ \t]+|[ \t]+$/,"",f3)	

		# Index
		key = f1 SUBSEP f2 SUBSEP f3		

		# Uniq index
		D[key]
		
		# Hash with value
		N[key,f4] = f5

		# Uniq User
		U[f4] 
		
		# reset variable
		key = f3 = "" 
	}
	END{
		# Sort User array
		n = asorti(U,copy)		
	
		# Loop through uniq index
		for(key in D)
		{
			# Split key
			split(key,S,SUBSEP)

			# Loop through uniq user
			for(i=1;i<=n;i++)
			{
			
				# User
				u = copy[i]

				# If header not printed then
				if(!header)
				{
					# Create header
					hdr = sprintf("%s%s%s",hdr,OFS,u)
				}			
				
				# index we are looking for
				ind = S[1] SUBSEP S[2] SUBSEP S[3] SUBSEP u

				# if key exists and use that value else string null
				val = (ind in N)? N[ind] : "null"
				
				# Save value in variable
				str = sprintf("%s%s%s",str,OFS,val)
			}
			
			# write header if header not set
			if(!header)
			{
				print "Fam","No","Name" hdr
				
				# set flag header
				header = 1
			}
			
			# Print values
			# Quote is added incase if you want to open them in excel as 
			# there are comma inside the field
			print S[1],S[2],"\""S[3]"\"" str
			str = ""
		}
	}
    ' 	OFS=","  file


Resulting

Code:
Fam,No,Name,Ace,Bora,Hashir,Robyn,Sakura,T-Law,Usopun,Zorro
12.A.4,12.A.4.10.30,"channel2 family",null,null,null,null,null,null,1,null
1.P.5,1.P.5.6.4,"major intrinsic family",null,null,null,6,1,1,null,null
1.P.5,1.P.5.18.3,"major intrinsic family",null,null,null,null,1,null,null,null
8.M.14,8.M.14.1.3,"potential receptor, channel family",4,1,null,null,2,1,null,2
8.M.14,8.M.14.1,"potential receptor, channel family",null,null,null,1,null,null,null,null
8.M.14,8.M.14.1.2,"potential receptor, channel family",null,null,null,null,null,null,1,null
7.A3.14,7.A3.14.3.1,"DuanXon channel family",1,1,1,1,1,1,null,1
1.P.5,1.P.5.18.1,"major intrinsic family",8,null,null,null,null,null,null,null
192.98.1,192.98.192.98.17,"VVC family",1,1,null,null,5,null,null,10

Fixed Width don't blame me if not opening properly on excel
Code:
awk '
	{
		split($0,d,/[^[:space:]]*/)
		for(i=1; i<=NF-2; i++)
		{
			key = sprintf("%s%s%s",key,d[i],$i)
		}
			D[key]
			N[key,$(NF-1)] = $NF; 
			U[$(NF-1)] 
		key = "" 
	}
	END{
		n = asorti(U,copy)
		for(key in D)
		{
			split(key,S,SUBSEP)
			for(i=1;i<=n;i++)
			{
			
				u = copy[i]

				if(!header)
				{
					hdr = sprintf("%3s%s%3s",hdr,"\t",u)
				}
				val = ((S[1],u) in N)? N[S[1],u] : "null"
				str = sprintf("%3s%s%3s",str,"\t",val)
			}
			if(!header)
			{
				printf("%-100s%s\n",sprintf("%-15s%-25s%s","Fam","No","Name"),hdr);
				header = 1
			}
			printf("%-100s%s\n", S[1],str)
			str = ""
		}
	}
    '  file

Resulting
Code:
Fam            No                       Name                                                           	Ace	Bora	Hashir	Robyn	Sakura	T-Law	Usopun	Zorro
1.P.5      1.P.5.18.3          major intrinsic family                                                  	null	null	null	null	  1	null	null	null
8.M.14     8.M.14.1.3          potential receptor, channel family                                      	  4	  1	null	null	  2	  1	null	  2
8.M.14     8.M.14.1            potential receptor, channel family                                      	null	null	null	  1	null	null	null	null
192.98.1   192.98.192.98.17    VVC family                                                              	  1	  1	null	null	  5	null	null	 10
1.P.5      1.P.5.18.1          major intrinsic family                                                  	  8	null	null	null	null	null	null	null
1.P.5      1.P.5.6.4           major intrinsic family                                                  	null	null	null	  6	  1	  1	null	null
12.A.4     12.A.4.10.30        channel2 family                                                         	null	null	null	null	null	null	  1	null
8.M.14     8.M.14.1.2          potential receptor, channel family                                      	null	null	null	null	null	null	  1	null
7.A3.14    7.A3.14.3.1         DuanXon channel family                                                  	  1	  1	  1	  1	  1	  1	null	  1


Last edited by Akshay Hegde; 10-10-2014 at 11:08 AM.. Reason: comment edit
This User Gave Thanks to Akshay Hegde For This Post:
# 5  
Old 10-10-2014
Quote:
Originally Posted by pravin27
How about this ?

Code:
 awk 'NR==FNR{a[$(NF-1)]=$(NF-1);next}
{if (FNR==1) { asort(a) ; printf "Fam\t\tNo\t\tName\t" ; for ( j in a ) { printf a[j] FS }} if ( !b[$1,$2] ) { if ( FNR>1) { for(j in a) {if ( p[a[j]] ) { printf OFS p[a[j]] } else {printf OFS "NULL"} } delete p;} printf "\n"; b[$1,$2]++; printf $1 OFS $2 OFS ;for (i=3;i<=NF-2;i++) { printf $i FS  } ; } if (FNR==1) { asort(a) } ; for ( j in a ) { if ( a[j] == $(NF-1) ) { p[a[j]]=$NF;} }} END {  for(j in a) {if ( p[a[j]] ) { printf OFS p[a[j]] } else {printf OFS "NULL" } } printf "\n";}' OFS="|" testFile testFile

Hi Pravin27,

Thanks a lot! It really works perfectly on my real data. New things in the codes that interest me like asort function. I am wondering about the input file being written twice there too. Anyways, i will study your code first and come back if i don't understand it. Smilie

---------- Post updated at 11:08 AM ---------- Previous update was at 11:02 AM ----------

Quote:
Originally Posted by Akshay Hegde
Try

Input
Code:
[akshay@nio tmp]$ cat file
192.98.1   192.98.192.98.17    VVC family                            Zorro    10
192.98.1   192.98.192.98.17    VVC family                            Ace      1
192.98.1   192.98.192.98.17    VVC family                            Bora     1
192.98.1   192.98.192.98.17    VVC family                            Sakura   5
12.A.4     12.A.4.10.30        channel2 family                       Usopun   1
7.A3.14    7.A3.14.3.1         DuanXon channel family                T-Law    1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Robyn    1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Zorro    1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Ace      1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Bora     1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Sakura   1
7.A3.14    7.A3.14.3.1         DuanXon channel family                Hashir   1
8.M.14     8.M.14.1            potential receptor, channel family    Robyn    1
8.M.14     8.M.14.1.2          potential receptor, channel family    Usopun   1
8.M.14     8.M.14.1.3          potential receptor, channel family    T-Law    1
8.M.14     8.M.14.1.3          potential receptor, channel family    Zorro    2
8.M.14     8.M.14.1.3          potential receptor, channel family    Ace      4
8.M.14     8.M.14.1.3          potential receptor, channel family    Bora     1
8.M.14     8.M.14.1.3          potential receptor, channel family    Sakura   2
1.P.5      1.P.5.18.1          major intrinsic family                Ace      8
1.P.5      1.P.5.18.3          major intrinsic family                Sakura   1
1.P.5      1.P.5.6.4           major intrinsic family                T-Law    1
1.P.5      1.P.5.6.4           major intrinsic family                Robyn    6
1.P.5      1.P.5.6.4           major intrinsic family                Sakura   1

Excel version

Code:
awk '
    {
        # Field 1,2,4 and 5
        f1 = $1  
        f2 = $2 
        f4 = $(NF -1)
        f5 = $NF
                
        # This is for f3
        # We can also do this like $1 = $2 = $(NF -1) = $NF ="" ; f3 = $0
        # But there will be OFS issue, you have set OFS in END block I guess for proper formating

        split($0,d,/[^[:space:]]*/)
        for(i=3; i<=NF-2; i++)
        {
            f3 = sprintf("%s%s%s",f3,d[i],$i)
        }
        
        # Remove Leading and trailing space in f3
        gsub(/^[ \t]+|[ \t]+$/,"",f3)    

        # Index
        key = f1 SUBSEP f2 SUBSEP f3        

        # Uniq index
        D[key]
        
        # Hash with value
        N[key,f4] = f5

        # Uniq User
        U[f4] 
        
        # reset variable
        key = f3 = "" 
    }
    END{
        # Sort User array
        n = asorti(U,copy)        
    
        # Loop through uniq index
        for(key in D)
        {
            # Split key
            split(key,S,SUBSEP)

            # Loop through uniq user
            for(i=1;i<=n;i++)
            {
            
                # User
                u = copy[i]

                # If header not printed then
                if(!header)
                {
                    # Create header
                    hdr = sprintf("%s%s%s",hdr,OFS,u)
                }            
                
                # index we are looking for
                ind = S[1] SUBSEP S[2] SUBSEP S[3] SUBSEP u

                # if key exists and use that value else string null
                val = (ind in N)? N[ind] : "null"
                
                # Save value in variable
                str = sprintf("%s%s%s",str,OFS,val)
            }
            
            # write header if header not set
            if(!header)
            {
                print "Fam","No","Name" hdr
                
                # set flag header
                header = 1
            }
            
            # Print values
            # Quote is added incase if you want to open them in excel as 
            # there are comma inside the field
            print S[1],S[2],"\""S[3]"\"" str
            str = ""
        }
    }
    '     OFS=","  file


Resulting

Code:
Fam,No,Name,Ace,Bora,Hashir,Robyn,Sakura,T-Law,Usopun,Zorro
12.A.4,12.A.4.10.30,"channel2 family",null,null,null,null,null,null,1,null
1.P.5,1.P.5.6.4,"major intrinsic family",null,null,null,6,1,1,null,null
1.P.5,1.P.5.18.3,"major intrinsic family",null,null,null,null,1,null,null,null
8.M.14,8.M.14.1.3,"potential receptor, channel family",4,1,null,null,2,1,null,2
8.M.14,8.M.14.1,"potential receptor, channel family",null,null,null,1,null,null,null,null
8.M.14,8.M.14.1.2,"potential receptor, channel family",null,null,null,null,null,null,1,null
7.A3.14,7.A3.14.3.1,"DuanXon channel family",1,1,1,1,1,1,null,1
1.P.5,1.P.5.18.1,"major intrinsic family",8,null,null,null,null,null,null,null
192.98.1,192.98.192.98.17,"VVC family",1,1,null,null,5,null,null,10

Fixed Width don't blame me if not opening properly on excel
Code:
awk '
    {
        split($0,d,/[^[:space:]]*/)
        for(i=1; i<=NF-2; i++)
        {
            key = sprintf("%s%s%s",key,d[i],$i)
        }
            D[key]
            N[key,$(NF-1)] = $NF; 
            U[$(NF-1)] 
        key = "" 
    }
    END{
        n = asorti(U,copy)
        for(key in D)
        {
            split(key,S,SUBSEP)
            for(i=1;i<=n;i++)
            {
            
                u = copy[i]

                if(!header)
                {
                    hdr = sprintf("%3s%s%3s",hdr,"\t",u)
                }
                val = ((S[1],u) in N)? N[S[1],u] : "null"
                str = sprintf("%3s%s%3s",str,"\t",val)
            }
            if(!header)
            {
                printf("%-100s%s\n",sprintf("%-15s%-25s%s","Fam","No","Name"),hdr);
                header = 1
            }
            printf("%-100s%s\n", S[1],str)
            str = ""
        }
    }
    '  file

Resulting
Code:
Fam            No                       Name                                                               Ace    Bora    Hashir    Robyn    Sakura    T-Law    Usopun    Zorro
1.P.5      1.P.5.18.3          major intrinsic family                                                      null    null    null    null      1    null    null    null
8.M.14     8.M.14.1.3          potential receptor, channel family                                            4      1    null    null      2      1    null      2
8.M.14     8.M.14.1            potential receptor, channel family                                          null    null    null      1    null    null    null    null
192.98.1   192.98.192.98.17    VVC family                                                                    1      1    null    null      5    null    null     10
1.P.5      1.P.5.18.1          major intrinsic family                                                        8    null    null    null    null    null    null    null
1.P.5      1.P.5.6.4           major intrinsic family                                                      null    null    null      6      1      1    null    null
12.A.4     12.A.4.10.30        channel2 family                                                             null    null    null    null    null    null      1    null
8.M.14     8.M.14.1.2          potential receptor, channel family                                          null    null    null    null    null    null      1    null
7.A3.14    7.A3.14.3.1         DuanXon channel family                                                        1      1      1      1      1      1    null      1

Hi Akshay Hegde,

Your codes worked awesome!! Many new things in the codes and i am really2 appreciate your clear explanation for each step. Thanks a million!
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Convert rows into column along with header

Hi, I have a requirement to format the data in a new order. Here is my source format : ppp ***Wed Dec 16 10:32:30 GMT 2015 header1 header2 header3 header4 header5 server1 0.00 0.02 0.07 0.98 server2 0.01 0.00 0.08 0.79 server3 0.05 0.82 0.77 0.86 ... (18 Replies)
Discussion started by: john_prince
18 Replies

2. Shell Programming and Scripting

Convert Column data values to rows

Hi all , I have a file with the below content Header Section employee|employee name||Job description|Job code|Unitcode|Account|geography|C1|C2|C3|C4|C5|C6|C7|C8|C9|Csource|Oct|Nov|Dec|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep Data section ... (1 Reply)
Discussion started by: Hypesslearner
1 Replies

3. Shell Programming and Scripting

Convert header rows into

I want to put the 3 first lines into a single line separated by ; I've tried to use Sed and Awk but without success. I'm new to Shell scripting. Thanks in advance! Input 112 DESAC_201309_OR_DJ10 DJ10 1234567890123;8 1234567890124;20 1234567890125;3 expected Output... (8 Replies)
Discussion started by: MoroccanRoll
8 Replies

4. Shell Programming and Scripting

Convert Rows into Column

Hi Experts, I have a requirement to convert rows into columns. For e.g. Input File: Output File should be like Appreciate if you could suggest code snippet(may be awk) for above requirement... Thanks in Advance for your help... (3 Replies)
Discussion started by: sai_2507
3 Replies

5. Shell Programming and Scripting

Add column header and row header

Hi, I have an input like this 1 2 3 4 2 3 4 5 4 5 6 7 I would like to count the no. of columns and print a header with a prefix "Col". I would also like to count the no. of rows and print as first column with each line number with a prefix "Row" So, my output would be ... (2 Replies)
Discussion started by: jacobs.smith
2 Replies

6. Shell Programming and Scripting

convert columns into rows with respect to first column

Hello All, Please help me with this file. My input file (Tab separated) is like: Abc-01 pc1 -0.69 Abc-01 E2cR 0.459666666666667 Abc-01 5ez.2 1.2265625 Xyz-01 pc1 -0.153 Xyz-01 E2cR 1.7358 Xyz-01 5ez.2 2.0254 Ced-02 pc1 -0.5714 Ced-02 ... (7 Replies)
Discussion started by: mira
7 Replies

7. Shell Programming and Scripting

Convert rows into column groups

Hi I have the text file like this "A" "AA Info" "AA Text" "AAA" "ABC" "ABC Info" "ABC Tech" "AGH" "SYN" "SYMBony" "SYN BEREN" Like about 2000 lines Output would be in Column with groups like following "A" "AA Info", "AA Text" "AAA" "ABC","ABC Info","ABC Tech" (0 Replies)
Discussion started by: selvanraj
0 Replies

8. Shell Programming and Scripting

Convert Column to rows

Hi, I have a file with below contents. Heading1 Heading2 Heading3 Heading4 Value1 Value2 Value3 Value4 The file has only 2 rows and is tab separated The desired output is : Heading1 Value1 Heading2 Value2 Heading3 Value3 Heading4 Value4 CAn you please help? (5 Replies)
Discussion started by: kaponeh
5 Replies

9. Shell Programming and Scripting

convert rows into column

if u have a data 2 4 6 8 5 4 4 5 6 then result shud be like 2 4 6 7 5 4 4 5 6 (3 Replies)
Discussion started by: cdfd123
3 Replies

10. Shell Programming and Scripting

Convert Header into Column in Text file

Hi Gurus, I have a requirement like this and have to create a UX shell scripts. Thanks in advance. File-in: ------ Header2007-12-012007-11-21 100|xyz|was 101|wsa|qws ...... ....... Output should be: ------------------- 2007-12-01|100|xyz|was 2007-12-01|101|wsa|qws ...... .......... (7 Replies)
Discussion started by: vsubbu1000
7 Replies
Login or Register to Ask a Question