Transpose data from columns to lines for each event

01-13-2009

Registered User

290, 37

Join Date: Jan 2009

Last Activity: 28 June 2018, 4:18 PM EDT

Location: Tegucigalpa, Honduras

Posts: 290

Thanks Given: 8

Thanked 37 Times in 36 Posts

Hi everyone,

Maybe somebody could help me with this.

I have a text file that lists, in two columns, records of the services used by customers at a commercial establishment.

The record for each use of a service begins with "EVENT" in column 1.
I would like to transpose the information in each block onto one line. That is, the different
labels in column 1 should appear only once, as a header row, and the data in column 2
should appear under its respective header, on a single line per block.

Source file (not all blocks have the same labels in column 1; some have more than others):

Code:

EVENT                                
INTERNET CONNECTION                  
Date                       11/01/2009
Initial hour               07:30     
Number of users            27        
Average of use             32 min    
Final hour                 19:00     
 
EVENT                                
LOCAL CALL                           
Date                       11/01/2009
Initial hour               07:42     
Number of users            15        
Average of use             7 min     
Final hour                 16:11     
 
EVENT                                
INTERNATIONAL CALL                   
Date                       11/01/2009
Initial hour               09:14     
Number of users            21        
Average of use             5 min     
Final hour                 16:17     
 
EVENT                                
PRINTER USE                          
Date                       12/01/2009
Initial hour               07:30     
Number of users            23        
Average of pages printed   17        
Final hour                 19:00

I would like to tabulate it as follows:

Code:

 
EVENT                   Date    Initial hour Number of users Average of use Average of pages printed  Final hour
INTERNET CONNECTION  11/01/2009    07:30             27          32 min                                   19:00   
LOCAL CALL           11/01/2009    07:42             15           7 min                                   16:11   
INTERNATIONAL CALL   11/01/2009    09:14             21           5 min                                   16:17   
PRINTER USE          12/01/2009    07:30             23                                  17               19:00

So far I know that if I use:

Code:

 
awk '/INTERNET CONNECTION/ { getline; print $2}' Input_1.txt
awk '/LOCAL CALL/ { getline; print $2}' Input_1.txt
awk '/INTERNATIONAL CALL/ { getline; print $2}' Input_1.txt
awk '/PRINTER USE/ { getline; print $2}' Input_1.txt

the result is the date (column 2) for each record:
11/01/2009
11/01/2009
11/01/2009
12/01/2009

But how can I go on from here to get what I want (the other columns under their respective headers)?

Thanks in advance for any help.

Best regards.

cgkmal

01-13-2009

Registered User

125, 0

Join Date: Jul 2008

Last Activity: 17 April 2009, 10:44 PM EDT

Location: Philippines

Posts: 125

Thanks Given: 0

Thanked 0 Times in 0 Posts

Hi cgkmal,

Do you expect the file to contain a dynamic list of headers, or will the headers be fixed, meaning the file will always have the same headers every time?

If they are fixed, we could just hard-code the header row, as well as any other parts of the table that are fixed (i.e. the values under the EVENT column).

It would be helpful if you could identify these fixed parts of the table (if there are any).

angheloko

01-13-2009

angheloko,

Thanks for your answer. Yes, the words in column 1 that will become the headers are always the same. The only thing is that some event blocks have fewer lines than others. I mean, not every block will have a value under every header.

Many thanks for any help you can give me.

cgkmal

01-14-2009

Hi,

Sorry for the delay. I was pretty busy today...doing documentation (oh the pain! every programmer's nightmare). Anyway, here's a very crude implementation:

Code:

# Use the C locale so that [A-Z] matches uppercase ASCII letters only
# and sort gives the same column order everywhere
LC_ALL=C; export LC_ALL
TAB=`printf '\t'`

# Extract data only and separate into files
sed 's/^EVENT//g;s/^[A-Z][A-Z][A-Z]*/EVENT  &/g;/^ *$/d;s/ *$//g;s/   */|/g' input.txt > input2.txt
grep -n EVENT input2.txt | cut -d: -f1 | while read X; do
        START=$X
        ((END=X+5))     # assumes every record is six lines long
        sed -n "${START},${END}p" input2.txt > input2.txt.$X
done

# Compose the headers
# (sort -u instead of "uniq -ud", which prints nothing with GNU uniq)
sed 's/   */|/g;/^ *$/d' input.txt | awk -F"|" ' { print $1 } ' | sort -u > headers.txt
# Quote the patterns, otherwise the shell may expand them to file names
grep -v '[A-Z][A-Z][A-Z]*' headers.txt > colheaders.txt
grep '[A-Z][A-Z][A-Z]*' headers.txt | sed '/^EVENT/d' > rowheaders.txt

# Create the unformatted (tab-delimited) output
LINE="EVENT${TAB}"`tr "\n" "\t" < colheaders.txt`
echo "$LINE" > output.txt
while read X; do
        echo "---"
        echo "Row: $X"
        FILE=`grep -l "$X" input2.txt.*`
        echo "File: $FILE"
        LINE="$X"
        # Redirect into the loop instead of piping from cat: in bash a piped
        # while loop runs in a subshell and the changes to LINE would be lost
        while read Y; do
                echo "$Y"
                LINE="$LINE${TAB}"`awk ' BEGIN { FS="|" } $1==key { print $field } ' key="$Y" field=2 "$FILE"`
        done < colheaders.txt
        echo ">> $LINE"
        echo "$LINE" >> output.txt
done < rowheaders.txt

# Make it pretty
awk '
BEGIN { FS="\t" }

{ printf ("%-22s%-11s%-13s%-16s%-15s%-26s%-10s\n", $1, $4, $6, $7, $3, $2, $5) }

' output.txt > output2.txt

Basically:
the input file is input.txt
the output file is output2.txt

and there are some temporary files: input2.txt*, headers.txt, colheaders.txt, rowheaders.txt (just rm them at the end of the script).

Anyway, I had to rush it, so I know there could be simpler solutions, but here you go... Try it yourself...

My output:

Code:

EVENT                 Date       Initial hour Number of users Average of use Average of pages printed  Final hour
INTERNATIONAL CALL    11/01/2009 09:14        21              5 min                                    16:17
INTERNET CONNECTION   11/01/2009 07:30        27              32 min                                   19:00
LOCAL CALL            11/01/2009 07:42        15              7 min                                    16:11
PRINTER USE           12/01/2009 07:30        23                             17                        19:00
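
For comparison, here is a sketch of how the whole job might be done in a single awk pass, with the header list hard-coded as discussed earlier in the thread. The function name transpose_events is made up for illustration, the column widths are borrowed from the printf format in the script above, and the sketch assumes that a label and its value are always separated by at least two spaces. Rows come out in input order rather than sorted.

```shell
# transpose_events FILE  (hypothetical helper, not part of the original script)
# Prints a header row plus one line per EVENT block.
transpose_events() {
    awk '
    # Print the record collected so far, then reset it
    function emit(   i, line) {
        if (name == "") return
        line = sprintf("%-22s", name)
        for (i = 1; i <= n; i++)
            line = line sprintf("%-" w[i] "s", rec[hdr[i]])
        sub(/ +$/, "", line)
        print line
        for (i = 1; i <= n; i++) delete rec[hdr[i]]
        name = ""
    }
    BEGIN {
        # Fixed headers, in the order requested in the first post
        n = split("Date|Initial hour|Number of users|Average of use|Average of pages printed|Final hour", hdr, "|")
        # Assumed column widths, copied from the printf format above
        split("11|13|16|15|26|10", w, "|")
        line = sprintf("%-22s", "EVENT")
        for (i = 1; i <= n; i++) line = line sprintf("%-" w[i] "s", hdr[i])
        sub(/ +$/, "", line)
        print line
    }
    { sub(/[ \t\r]+$/, "") }                 # drop trailing blanks and any CR
    /^[ \t]*$/ { next }                      # skip empty lines
    $0 == "EVENT" { emit(); want_name = 1; next }
    want_name { name = $0; want_name = 0; next }
    # label and value are split at the first run of two or more spaces
    match($0, /  +/) { rec[substr($0, 1, RSTART - 1)] = substr($0, RSTART + RLENGTH) }
    END { emit() }
    ' "$1"
}
```

Usage would be something like `transpose_events input.txt > output2.txt`. No temporary files are created, and a block that lacks one of the labels simply gets an empty cell in that column.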

angheloko

01-14-2009

Registered User

1,305, 26

Join Date: Jun 2007

Last Activity: 11 November 2016, 3:44 AM EST

Location: Beijing China

Posts: 1,305

Thanks Given: 0

Thanked 26 Times in 26 Posts

Hi, this one is a little tricky; I hope this helps.

1> Convert your file into a strict two-column file:

Code:

# cvt.sh
sed -n '/EVENT/{
h
N
h
x
s/\n//
p
}
/EVENT/ !{
p
}' yourfile

2> Use the Perl script below to process it:

Code:

sub _exist{
	my($ref,$value)=(@_);
	my @arr=@{$ref};
	for(my $i=0;$i<=$#arr;$i++){
		return 1 if $arr[$i] eq $value;
	}
	return 0;
}
$/="\n\n";
open FH,"sh cvt.sh|";
my (%res,$n,@seq);
while(<FH>){
	my @arr=split("\n",$_);
	foreach(@arr){
		my @tmp=split(/  +/,$_);
		$res{$.}->{$tmp[0]}=$tmp[1];
		push @seq,$tmp[0] if (_exist(\@seq,$tmp[0])==0);
	}
	$n++;
}
close FH;
# Header line; columns appear in the order they were first seen
print ((join "        ",@seq),"\n");
for($i=1;$i<=$n;$i++){
	my %hash=%{$res{$i}};
	map {printf("%20s",$hash{$_})} @seq;
	print "\n";
}

output:

Code:

EVENT        Date        Initial hour        Number of users        Average of use        Final hour        Average of pages printed
 INTERNET CONNECTION          11/01/2009               07:30                  27              32 min               19:00     
          LOCAL CALL          11/01/2009               07:42                  15               7 min               16:11     
  INTERNATIONAL CALL          11/01/2009               09:14                  21               5 min               16:17     
         PRINTER USE          12/01/2009               07:30                  23                                   19:00                  17
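
The same idea (collect the headers in the order they are first seen, then print one row per block) can also be sketched in awk alone, without the intermediate cvt.sh step. This is only an illustration, not summer_cherry's script: the function name transpose_dynamic is hypothetical, the output is tab-separated rather than padded, and a label is assumed to be separated from its value by at least two spaces.

```shell
# transpose_dynamic FILE  (hypothetical helper)
# Discovers the column headers while reading, then prints a tab-separated table.
transpose_dynamic() {
    awk '
    BEGIN { OFS = "\t" }
    { sub(/[ \t\r]+$/, "") }                 # drop trailing blanks and any CR
    /^[ \t]*$/ { next }                      # skip empty lines
    $0 == "EVENT" { nrec++; want_name = 1; next }
    want_name { val[nrec, "EVENT"] = $0; want_name = 0; next }
    match($0, /  +/) {
        key = substr($0, 1, RSTART - 1)
        # remember each new label in first-seen order
        if (!(key in seen)) { seen[key] = 1; hdr[++nhdr] = key }
        val[nrec, key] = substr($0, RSTART + RLENGTH)
    }
    END {
        line = "EVENT"
        for (i = 1; i <= nhdr; i++) line = line OFS hdr[i]
        print line
        for (r = 1; r <= nrec; r++) {
            line = val[r, "EVENT"]
            for (i = 1; i <= nhdr; i++) line = line OFS val[r, hdr[i]]
            print line
        }
    }
    ' "$1"
}
```

Because the whole table is held in memory until the end, a label that first appears in a late block still gets its own column, with empty cells in the earlier rows.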

summer_cherry

01-14-2009

Hello angheloko and summer_cherry,

Many thanks for taking some of your time to help me. I got some errors testing your code. Explanations below.

angheloko,

Could you tell me what I'm doing wrong, or how you ran the script? I can see it works for you, but for me it doesn't produce the complete result.

If I leave only input.txt and your script in a folder and run it step by step, the behaviour is as follows:

(1)
If I run the first part (# Extract data only and separate into files), it looks good so far and generates:

input2.txt (adding some pipes)
input2.txt.10
input2.txt.18
input2.txt.2
input2.txt.26

(2)
If I run the 2nd part (# Compose the headers), it looks good so far and generates:

colheaders.txt
headers.txt
rowheaders.txt

(3)
If I run the 3rd part (# Create the unformatted output), it looks good so far and generates:

output.txt --> (it looks like the columns are transposed into lines, but some squares appear in the formatting)

Example:

Code:

	input.txt:INTERNET CONNECTION

	input.txt:Date

(4)
If I run the 4th part (# Make it pretty), it generates:

output2.txt

which only shows:

Code:

EVENT     input.txt:EVENT                                
input.txt:Initial hour               07:30     
input.txt:Average of use             32 min    
input.txt:Final hour                 19:00     
input.txt:Date                       11/01/2009
input.txt:INTERNET CONNECTION                  
input.txt:Number of users            27

(5)
If I run the complete script in a folder with other files in it, I get an error.
(It's not very relevant; I just isolated the files and ran it again. For information only.)

Code:

sed: "input.txt", line 30: warning: newline appended
sed: input.txt: cannot open [No such file or directory]
---
�►╚��S┌D�hrar:}' input.txt@t ��.n☺D☻☻j�{D─].:↔3       CDRs_1.pl*��↨
          I┤▬↕_Y∟G�W�╔►�►!�@#a-��╗�Ѭ��]�H�W�▲M�~ao▼═▄7▄�Z/▒/�
�z�%���"L�/��           ͰC-Q�~♥ffr?
File:        �/D*�K█�J┌b
CDRs_1.sh:cvt.sh
�►╚��S┌D�hr:}' input.txt@t ��.n☺D☻☻j�{D─].:↔3 CDRs_1.pl*��↨
          I┤▬↕_Y∟G�W�╔►�►!�@#a-��╗�Ѭ��]�H�W�▲M�~ao▼═▄7▄�Z/▒/�
�z�%���"L�/��           ͰC-Q�~♥ffr?
$            �/D*�K█�J┌b

summer_cherry,

Thanks for your help, really. I tried to test it, and the first part seems to work for me without errors,
but when I try to run the second script I get:

Code:

[root@trm72 cc]# ./script.pl
./script.pl: line 2: sub: command not found
./script.pl: line 3: syntax error near unexpected token `$ref,$value'
'/script.pl: line 3: `  my($ref,$value)=(@_);
[root@trm72 cc]#

Could you tell me what I'm doing wrong, or how you ran the scripts? I can see it works for you.
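
Two things in the logs above may be worth checking; both are guesses from the symptoms rather than confirmed causes. First, "sub: command not found" is the kind of message a shell prints when it is handed Perl source, so the script probably needs to be started with perl (for example `perl script.pl`). Second, the misplaced quote in the "line 3" message, the "squares" in output.txt and the record files numbered 2, 10, 18, 26 (eight lines apart, although each record has six lines) would all be consistent with DOS (CRLF) line endings in input.txt and script.pl. A minimal sketch for detecting and removing carriage returns, with hypothetical helper names has_cr and strip_cr:

```shell
# has_cr FILE: succeed if FILE contains at least one carriage return
has_cr() {
    [ -n "$(tr -cd '\r' < "$1")" ]
}

# strip_cr FILE: write FILE to standard output without carriage returns
strip_cr() {
    tr -d '\r' < "$1"
}

# Example usage (file names taken from the thread; input.unix.txt is made up):
#   has_cr input.txt && strip_cr input.txt > input.unix.txt
#   perl script.pl      # run the Perl script with perl, not with the shell
```

If has_cr reports nothing for either file, the cause lies elsewhere and the intermediate files would be the next thing to look at.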

Thanks for your help again.

cgkmal

01-14-2009

Hi cgkmal,

Could you post the flat file (source file) and the outputs of the script (input2.txt, etc.) so we can isolate which part caused the error?

In my case, when I run the script:

input.txt (source file):

Code:

EVENT
INTERNET CONNECTION
Date                       11/01/2009
Initial hour               07:30
Number of users            27
Average of use             32 min
Final hour                 19:00

EVENT
LOCAL CALL
Date                       11/01/2009
Initial hour               07:42
Number of users            15
Average of use             7 min
Final hour                 16:11

EVENT
INTERNATIONAL CALL
Date                       11/01/2009
Initial hour               09:14
Number of users            21
Average of use             5 min
Final hour                 16:17

EVENT
PRINTER USE
Date                       12/01/2009
Initial hour               07:30
Number of users            23
Average of pages printed   17
Final hour                 19:00

input2.txt (processed input.txt):

Code:

EVENT|INTERNET CONNECTION
Date|11/01/2009
Initial hour|07:30
Number of users|27
Average of use|32 min
Final hour|19:00
EVENT|LOCAL CALL
Date|11/01/2009
Initial hour|07:42
Number of users|15
Average of use|7 min
Final hour|16:11
EVENT|INTERNATIONAL CALL
Date|11/01/2009
Initial hour|09:14
Number of users|21
Average of use|5 min
Final hour|16:17
EVENT|PRINTER USE
Date|12/01/2009
Initial hour|07:30
Number of users|23
Average of pages printed|17
Final hour|19:00

input2.txt.(n) (separated records):

Code:

EVENT|INTERNET CONNECTION
Date|11/01/2009
Initial hour|07:30
Number of users|27
Average of use|32 min
Final hour|19:00

headers.txt (row and column headers):

Code:

Average of pages printed
Average of use
Date
EVENT
Final hour
INTERNATIONAL CALL
INTERNET CONNECTION
Initial hour
LOCAL CALL
Number of users
PRINTER USE

colheaders.txt (column headers):

Code:

Average of pages printed
Average of use
Date
Final hour
Initial hour
Number of users

rowheaders.txt (row headers):

Code:

INTERNATIONAL CALL
INTERNET CONNECTION
LOCAL CALL
PRINTER USE

output.txt (awk friendly table - tab-delimited):

Code:

EVENT   Average of pages printed        Average of use  Date    Final hour      Initial hour    Number of users
INTERNATIONAL CALL              5 min   11/01/2009      16:17   09:14   21
INTERNET CONNECTION             32 min  11/01/2009      19:00   07:30   27
LOCAL CALL              7 min   11/01/2009      16:11   07:42   15
PRINTER USE     17              12/01/2009      19:00   07:30   23

output2.txt (formatted output):

Code:

EVENT                 Date       Initial hour Number of users Average of use Average of pages printed  Final hour
INTERNATIONAL CALL    11/01/2009 09:14        21              5 min                                    16:17
INTERNET CONNECTION   11/01/2009 07:30        27              32 min                                   19:00
LOCAL CALL            11/01/2009 07:42        15              7 min                                    16:11
PRINTER USE           12/01/2009 07:30        23                             17                        19:00

Screen looks like this while running the script:

Code:

---
Row: INTERNATIONAL CALL
File: input2.txt.13
Average of pages printed
Average of use
Date
Final hour
Initial hour
Number of users
>> INTERNATIONAL CALL           5 min   11/01/2009      16:17   09:14   21
---
Row: INTERNET CONNECTION
File: input2.txt.1
Average of pages printed
Average of use
Date
Final hour
Initial hour
Number of users
>> INTERNET CONNECTION          32 min  11/01/2009      19:00   07:30   27
---
Row: LOCAL CALL
File: input2.txt.7
Average of pages printed
Average of use
Date
Final hour
Initial hour
Number of users
>> LOCAL CALL           7 min   11/01/2009      16:11   07:42   15
---
Row: PRINTER USE
File: input2.txt.19
Average of pages printed
Average of use
Date
Final hour
Initial hour
Number of users
>> PRINTER USE  17              12/01/2009      19:00   07:30   23

angheloko
