Help tabulating file putting repeated strings as headers

Shell Programming and Scripting

Tags: awk, parse, tabulate
#8
04-15-2018 - Original Discussion by Ophiuchus
Chubler_XL (Moderator)
Update to rdrtx1's solution (the changes were highlighted in red in the original post) to cover the duplicate-field qualification.



Code:
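awk '
BEGIN {lines=0; column_count=0}
# Skip anything that is not a "NAME = VALUE" line (exactly 3 fields, "=" in field 2)
$2 !~ /=/ || NF != 3 {next}
# First time a name is seen, record it as the next column header
! column[$1]++ {columns[column_count++]=$1}
# Start a new row at each STAGE line, or when this name already has a value in the current row
$1 == "STAGE" || ($1, lines) in column_data {lines++}
# Store the value in cell (name, current row)
{column_data[$1, lines]=$3}
END {
   # Header: column names joined with "|" (printf "%s" so a % in the data is printed literally)
   for (i=0; i<column_count; i++) printf "%s%s", columns[i], ((i<column_count-1) ? "|" : "\n")
   # One line per row; cells that were never set print as empty
   for (i=1; i <= lines; i++) {
      for (j=0; j < column_count; j++) {
         printf "%s%s", column_data[columns[j], i], ((j<column_count-1) ? "|" : "\n")
      }
   }
}
' infile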
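To see the duplicate-field rule in action, here is a minimal sketch that wraps the script above (with the `STAGE` test written as an exact match and values printed through a `%s` format) in a shell function and feeds it a small input. The function name `tabulate` and the sample records are hypothetical; the real input file from the thread is not shown here. It assumes input lines look like `NAME = VALUE` and that each record starts at a `STAGE` line.

```shell
# tabulate: hypothetical wrapper around the tabulating awk program.
# Reads "NAME = VALUE" lines from stdin (or files given as arguments)
# and prints one "|"-separated row per record.
tabulate() {
  awk '
  BEGIN {lines=0; column_count=0}
  $2 !~ /=/ || NF != 3 {next}
  ! column[$1]++ {columns[column_count++]=$1}
  $1 == "STAGE" || ($1, lines) in column_data {lines++}
  {column_data[$1, lines]=$3}
  END {
     for (i=0; i<column_count; i++) printf "%s%s", columns[i], ((i<column_count-1) ? "|" : "\n")
     for (i=1; i <= lines; i++)
        for (j=0; j < column_count; j++)
           printf "%s%s", column_data[columns[j], i], ((j<column_count-1) ? "|" : "\n")
  }
  ' "$@"
}

# Hypothetical sample: the second STAGE record repeats ID, so the
# second ID value starts a new row; the last line is skipped (NF != 3).
printf '%s\n' \
  'STAGE = 1' 'ID = 0' 'NAME = AAA' \
  'STAGE = 2' 'ID = 2' 'ID = 3' 'TYPE = X' \
  'just a comment line' | tabulate

# Expected output:
# STAGE|ID|NAME|TYPE
# 1|0|AAA|
# 2|2||
# |3||X
```

Note that `TYPE = X` lands in the third row, not the second: once the repeated `ID` has opened a new row, every following field belongs to that row until the next `STAGE` or the next repeat.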

#9
04-16-2018 - Original Discussion by Ophiuchus
Ophiuchus (Registered User)
Quote:
Originally Posted by Chubler_XL
Update to rdrtx1's solution (in red) to cover the duplicate field(s) qualification.
Excellent, rdrtx1, for your solution, and Chubler_XL for the modification. Both solutions seem to work pretty nicely, even when the strings in the real file have leading spaces.
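Leading spaces do no harm because of awk's default field splitting: with the default `FS` (a single space), fields are separated by runs of blanks and leading or trailing blanks are ignored, so an indented `NAME = VALUE` line still has `NF == 3` and `$1` is the bare name. A quick sketch to confirm this (the helper name `show_fields` and the sample lines are hypothetical):

```shell
# show_fields: hypothetical helper that reports how awk splits each line
# using the default FS (runs of blanks; leading/trailing blanks ignored).
show_fields() {
  awk '{ printf "NF=%d $1=[%s] $3=[%s]\n", NF, $1, $3 }'
}

printf '%s\n' 'ID = 0' '    ID = 0' '  ID   =   0  ' | show_fields
# Each line prints: NF=3 $1=[ID] $3=[0]
```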

Chubler_XL,

I see this solution uses several arrays. Is there a tool to test an awk program and see what each part of it has stored, such as the contents of the arrays?

Something similar to IRB (Interactive Ruby), in which we can test small parts of a Ruby program. I'd like this to better understand the logic and how it works.

Thanks for your help.
#10
04-16-2018 - Original Discussion by Ophiuchus
Chubler_XL (Moderator)
I'm not aware of any tool like that. You could simply print out the contents of the arrays in the END block like this:



Code:
awk '
BEGIN {lines=0; column_count=0}
$2 !~ /=/ || NF != 3 {next}
! column[$1]++ {columns[column_count++]=$1}
$1 == "STAGE" || ($1, lines) in column_data {lines++}
{column_data[$1, lines]=$3}
END {
   # Dump each array; for (i in ...) visits keys in an unspecified order
   for (i in column) print "column[" i "]=" column[i]
   for (i in columns) print "columns[" i "]=" columns[i]
   for (i in column_data) {
     k=i
     gsub(SUBSEP,",",k)    # show the 2D key "name<SUBSEP>row" as "name,row"
     print "column_data[" k "]=" column_data[i]
   }
}
' infile


result:

Code:
column[ADDR]=5
column[ISGALW]=5
column[RRUL]=5
column[TYPE]=4
...
columns[0]=STAGE
columns[1]=ID
columns[2]=NAME
columns[3]=TYPE
columns[4]=DFRUL
columns[5]=ADDR
...
column_data[NAME,6]=PPROOA
column_data[SPRR,1]=TRUE
column_data[ID,1]=0
column_data[SPRR,3]=FALSE
column_data[ID,2]=2
column_data[SPRR,4]=FALSE
...
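Because `for (i in array)` visits keys in an unspecified order, the dump above comes out shuffled. Below is a minimal sketch of a reusable version: an awk function that prints any array (1D or 2D) with `SUBSEP` shown as a comma, with the output piped through `sort` so repeated runs are easy to compare. The names `dump_array` and `show_arrays`, the arrays `count`/`data`, and the sample lines are all hypothetical, not part of awk or of the script above.

```shell
# show_arrays: hypothetical shell wrapper around a hypothetical awk
# function dump_array (neither is built into awk).
show_arrays() {
  awk '
  # Print every element of arr as name[key]=value; for 2D keys the
  # SUBSEP separator is replaced by a comma.
  function dump_array(arr, name,    k, shown) {
     for (k in arr) {
        shown = k
        gsub(SUBSEP, ",", shown)
        print name "[" shown "]=" arr[k]
     }
  }
  $2 ~ /=/ && NF == 3 {
     count[$1]++          # occurrences of each name
     data[$1, NR] = $3    # value keyed by (name, input line number)
  }
  END {
     dump_array(count, "count")
     dump_array(data, "data")
  }
  ' "$@" | LC_ALL=C sort   # for-in order is unspecified, so sort the dump
}

printf '%s\n' 'STAGE = 1' 'ID = 0' 'ID = 2' | show_arrays
# count[ID]=2
# count[STAGE]=1
# data[ID,2]=0
# data[ID,3]=2
# data[STAGE,1]=1
```

Putting the dump in a function means it can be pasted into any script and called from `END` (or from a rule, to watch an array change line by line) without rewriting the loop for each array.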

You can see that:
column[] is indexed by the column name and contains a count of the number of rows the column appears in.
columns[] is indexed by the column order (starting at 0) and contains the column name.
column_data[] is a 2D array indexed by <column name>,<row> and contains the value of the data for that cell.

column[] is only used to detect the first occurrence of a column and add it to columns[].
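The two idioms behind these arrays can be tried in isolation. `! seen[$1]++` is true only the first time a key appears (the test sees the old value, then `++` increments it, so the stored count equals the number of occurrences), and `(key1, key2) in array` checks for a 2D key without creating it. A small sketch on made-up data (the function name `idioms_demo` and the arrays `seen`, `order`, and `cell` are hypothetical):

```shell
# idioms_demo: hypothetical shell function; the sample lines are made up.
idioms_demo() {
  printf '%s\n' 'B = 1' 'A = 2' 'B = 3' | awk '
  # True only the first time $1 appears: the test sees the old count (0),
  # then ++ bumps it, so later occurrences see 1, 2, ...
  ! seen[$1]++ { order[n++] = $1 }

  # 2D array keyed by (name, input line number)
  { cell[$1, NR] = $3 }

  END {
     for (i = 0; i < n; i++) printf "%s%s", order[i], (i < n - 1 ? " " : "\n")
     print "seen[B] = " seen["B"]    # one count per occurrence, so 2
     # (key1, key2) in array tests a 2D key without creating it
     print "(A,1) present? " ((("A", 1) in cell) ? "yes" : "no")
     print "(A,2) present? " ((("A", 2) in cell) ? "yes" : "no")
  }'
}

idioms_demo
# B A
# seen[B] = 2
# (A,1) present? no
# (A,2) present? yes
```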

Last edited by Chubler_XL; 04-16-2018 at 07:53 PM..
#11
04-17-2018 - Original Discussion by Ophiuchus
Ophiuchus (Registered User)
Quote:
Originally Posted by Chubler_XL
I'm not aware of any tool like that. You could simply print out the contents of the arrays in the END block like this:
Excellent, Chubler_XL. Thanks for the lesson, especially coming from someone who knows a lot. I'll play around a bit with this example of arrays.

Thanks to all.
All times are GMT -4.