Transpose data from columns to lines for each event
Hi everyone,
Maybe somebody could help me with this.
I have a two-column text file that records the services used by customers at a commercial venue.
The record for each use of a service begins with "EVENT" in column 1.
I would like to transpose the info of each block into one line. That is, the different
labels in column 1 should appear only once, as a header, and the data in column 2
should appear below its respective header, one line per block.
Source file (not all blocks have the same entries in column 1; some have more than others):
Code:
EVENT
INTERNET CONNECTION
Date 11/01/2009
Initial hour 07:30
Number of users 27
Average of use 32 min
Final hour 19:00
EVENT
LOCAL CALL
Date 11/01/2009
Initial hour 07:42
Number of users 15
Average of use 7 min
Final hour 16:11
EVENT
INTERNATIONAL CALL
Date 11/01/2009
Initial hour 09:14
Number of users 21
Average of use 5 min
Final hour 16:17
EVENT
PRINTER USE
Date 12/01/2009
Initial hour 07:30
Number of users 23
Average of pages printed 17
Final hour 19:00
I would like to tabulate it as follows:
Code:
EVENT                 Date       Initial hour Number of users Average of use Average of pages printed  Final hour
INTERNET CONNECTION   11/01/2009 07:30        27              32 min                                   19:00
LOCAL CALL            11/01/2009 07:42        15              7 min                                    16:11
INTERNATIONAL CALL    11/01/2009 09:14        21              5 min                                    16:17
PRINTER USE           12/01/2009 07:30        23                             17                        19:00
Thanks for your answer. Yes, the words in column 1 that will become headers are always the same. The only thing is that some EVENT blocks have fewer entries than others; that is, not every block will have a value below every header.
Sorry for the delay. I was pretty busy today...doing documentation (oh the pain! every programmer's nightmare). Anyway, here's a very crude implementation:
Code:
# Extract data only and separate into files
# (columns in input.txt are assumed to be separated by two or more spaces)
sed 's/^EVENT//;s/^[A-Z][A-Z][A-Z]*/EVENT|&/;/^ *$/d;s/ *$//;s/ \{2,\}/|/g' input.txt > input2.txt
grep -n '^EVENT' input2.txt | cut -d: -f1 | while read X; do
    START=$X
    ((END=X+5))    # assumes every block has exactly 6 lines
    sed -n "${START},${END}p" input2.txt > input2.txt.$X
done
# Compose the headers
# (sort -u instead of "sort | uniq -ud", which prints nothing with GNU uniq)
sed 's/ \{2,\}/|/g;/^ *$/d' input.txt | awk -F"|" ' { print $1 } ' | sort -u > headers.txt
grep -v '[A-Z][A-Z][A-Z]*' headers.txt > colheaders.txt
grep '[A-Z][A-Z][A-Z]*' headers.txt | sed '/^EVENT/d' > rowheaders.txt
# Create the unformatted (tab-delimited) output
TAB=`printf '\t'`
LINE="EVENT${TAB}"`tr "\n" "\t" < colheaders.txt`
echo "$LINE" > output.txt
cat rowheaders.txt | while read X; do
    echo "---"
    echo "Row: $X"
    FILE=`grep "$X" input2.txt.* | cut -d: -f1`
    echo "File: $FILE"
    LINE="$X"
    # Read via redirection, not "cat | while": in bash a piped loop runs
    # in a subshell and the changes to LINE would be lost
    while read Y; do
        echo "$Y"
        LINE="$LINE${TAB}"`awk ' BEGIN { FS="|" } $1==key { print $field } ' key="$Y" field=2 "$FILE"`
    done < colheaders.txt
    echo ">> $LINE"
    echo "$LINE" >> output.txt
done
# Make it pretty
awk '
BEGIN { FS="\t" }
{ printf ("%-22s%-11s%-13s%-16s%-15s%-26s%-10s\n", $1, $4, $6, $7, $3, $2, $5) }
' output.txt > output2.txt
Basically:
the input file is input.txt,
the output file is output2.txt,
and there are some temporary files: input2.txt*, headers.txt, colheaders.txt, rowheaders.txt (just rm them at the end of the script).
Anyway, I had to rush it, so I know there could be simpler solutions, but here you go... Try it yourself...
My output:
Code:
EVENT                 Date       Initial hour Number of users Average of use Average of pages printed  Final hour
INTERNATIONAL CALL    11/01/2009 09:14        21              5 min                                    16:17
INTERNET CONNECTION   11/01/2009 07:30        27              32 min                                   19:00
LOCAL CALL            11/01/2009 07:42        15              7 min                                    16:11
PRINTER USE           12/01/2009 07:30        23                             17                        19:00
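For comparison, here is a minimal single-pass awk sketch of the same transposition. It is not one of the scripts posted in this thread. It assumes that each block starts with a line holding only EVENT, that the next line is the service name, and that labels and values are separated by two or more spaces. The function name transpose_events is made up for illustration. Columns come out in order of first appearance, tab-separated, and missing values are left empty.

```shell
#!/bin/sh
# Hypothetical helper: print one tab-separated row per EVENT block.
# Assumptions (not confirmed by the original posts): "EVENT" sits alone
# on a line, the service name is on the next line, and label/value
# pairs are separated by two or more spaces.
transpose_events() {
    awk '
    function trim(s) { sub(/^[ \t\r]+/, "", s); sub(/[ \t\r]+$/, "", s); return s }
    {
        line = trim($0)
        if (line == "") next
        if (line == "EVENT") { n++; want_name = 1; next }      # new block
        if (want_name) { val[n, "EVENT"] = line; want_name = 0; next }
        if (match(line, /[ ][ ]+/)) {                          # 2+ spaces
            key = substr(line, 1, RSTART - 1)
            if (!(key in seen)) { seen[key] = 1; order[++k] = key }
            val[n, key] = substr(line, RSTART + RLENGTH)
        }
    }
    END {
        printf "EVENT"
        for (i = 1; i <= k; i++) printf "\t%s", order[i]
        printf "\n"
        for (r = 1; r <= n; r++) {
            printf "%s", val[r, "EVENT"]
            for (i = 1; i <= k; i++) printf "\t%s", val[r, order[i]]
            printf "\n"
        }
    }' "$@"
}
```

It could be called as `transpose_events input.txt > output.txt`; the tab-separated result can then be fed to a printf-based formatting step like the "Make it pretty" part above.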
1> Save the sed script below as cvt.sh
Code:
sed -n '/EVENT/{
h
N
h
x
s/\n//
p
}
/EVENT/ !{
p
}' yourfile
2> Use the Perl script below to process it
Code:
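#!/usr/bin/perl
# Return 1 if $value is already in the array referenced by $ref.
sub _exist{
    my($ref,$value)=@_;
    for my $item (@{$ref}){
        return 1 if $item eq $value;
    }
    return 0;
}
$/="\n\n";    # paragraph mode: blocks must be separated by a blank line
open FH,"sh cvt.sh|" or die "cannot run cvt.sh: $!";
my (%res,$n,@seq);
while(<FH>){
    my @arr=split("\n",$_);
    foreach(@arr){
        next unless /\S/;
        # columns are separated by two or more spaces
        my @tmp=split(/ {2,}/,$_,2);
        $res{$.}->{$tmp[0]}=$tmp[1];
        push @seq,$tmp[0] if (_exist(\@seq,$tmp[0])==0);
    }
    $n++;
}
close FH;
# Header row first, then one row per block, in order of first appearance.
printf("%20s",$_) for @seq;
print "\n";
for(my $i=1;$i<=$n;$i++){
    my %hash=%{$res{$i}};
    printf("%20s",defined $hash{$_} ? $hash{$_} : "") for @seq;
    print "\n";
}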
output:
Code:
EVENT                 Date       Initial hour Number of users Average of use Final hour Average of pages printed
INTERNET CONNECTION   11/01/2009 07:30        27              32 min         19:00
LOCAL CALL            11/01/2009 07:42        15              7 min          16:11
INTERNATIONAL CALL    11/01/2009 09:14        21              5 min          16:17
PRINTER USE           12/01/2009 07:30        23                             19:00      17
Many thanks for taking the time to help me. I got some errors testing your code; explanations below.
angheloko,
Could you tell me what I'm doing wrong, or how you ran the scripts? I can see it works for you, but for me it doesn't show the complete answer.
If I leave only input.txt and your script in a folder and run it step by step, the behaviour is as follows:
(1)
If I run the first part (# Extract data only and separate into files), it looks good so far and generates:
input2.txt (adding some pipes)
input2.txt.10
input2.txt.18
input2.txt.2
input2.txt.26
(2)
If I run the 2nd part (# Compose the headers), it looks good so far and generates:
colheaders.txt
headers.txt
rowheaders.txt
(3)
If I run the 3rd part (# Create the unformatted output), it looks good so far and generates:
output.txt --> (it looks like the columns are transposed into lines, but some squares appear in the formatting)
Example:
Code:
input.txt:INTERNET CONNECTION
input.txt:Date
(4)
If I run the 4th part (# Make it pretty), it generates:
output2.txt
which only shows
Code:
EVENT input.txt:EVENT
input.txt:Initial hour 07:30
input.txt:Average of use 32 min
input.txt:Final hour 19:00
input.txt:Date 11/01/2009
input.txt:INTERNET CONNECTION
input.txt:Number of users 27
(5)
If I run the complete script in a folder with other files in it, I get an error.
(It's not too relevant; I just isolated the files and ran it again. For information only.)
Thanks for your help, really. I tried to test it: the first part seems to work for me without errors,
but when I try to run the second script I get
Code:
[root@trm72 cc]# ./script.pl
./script.pl: line 2: sub: command not found
./script.pl: line 3: syntax error near unexpected token `$ref,$value'
'/script.pl: line 3: ` my($ref,$value)=(@_);
[root@trm72 cc]#
Could you tell me what I'm doing wrong, or how you ran the scripts? I can see it works for you.
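The "sub: command not found" message is what a shell prints when it is asked to interpret Perl source, so one likely cause is that script.pl has no "#!" line and was executed directly. The stray quote at the start of the third error line also hints at carriage returns (DOS line endings) in the file, which might likewise explain the "squares" seen in output.txt. A small check along those lines, assuming a POSIX shell; check_script is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: report two possible causes of the errors above.
check_script() {
    f=$1
    # 1. Without "#!" on line 1, the calling shell may try to run the
    #    file as a shell script instead of handing it to perl.
    if ! head -n 1 "$f" | grep -q '^#!'; then
        echo "no shebang"
    fi
    # 2. A carriage return anywhere means DOS (CRLF) line endings.
    cr=$(printf '\r')
    if grep -q "$cr" "$f"; then
        echo "CRLF line endings"
    fi
}
```

If `check_script script.pl` prints either message, running the script as `perl script.pl` sidesteps the missing shebang, and `tr -d '\r' < script.pl > script_unix.pl` is one way to strip the carriage returns (the same applies to input.txt).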
Could you post the flat file (source file) and the outputs of the script (input2.txt, etc.) so we can isolate which part caused the error?
In my case, when I run the script:
input.txt (source file):
Code:
EVENT
INTERNET CONNECTION
Date 11/01/2009
Initial hour 07:30
Number of users 27
Average of use 32 min
Final hour 19:00
EVENT
LOCAL CALL
Date 11/01/2009
Initial hour 07:42
Number of users 15
Average of use 7 min
Final hour 16:11
EVENT
INTERNATIONAL CALL
Date 11/01/2009
Initial hour 09:14
Number of users 21
Average of use 5 min
Final hour 16:17
EVENT
PRINTER USE
Date 12/01/2009
Initial hour 07:30
Number of users 23
Average of pages printed 17
Final hour 19:00
input2.txt (processed input.txt):
Code:
EVENT|INTERNET CONNECTION
Date|11/01/2009
Initial hour|07:30
Number of users|27
Average of use|32 min
Final hour|19:00
EVENT|LOCAL CALL
Date|11/01/2009
Initial hour|07:42
Number of users|15
Average of use|7 min
Final hour|16:11
EVENT|INTERNATIONAL CALL
Date|11/01/2009
Initial hour|09:14
Number of users|21
Average of use|5 min
Final hour|16:17
EVENT|PRINTER USE
Date|12/01/2009
Initial hour|07:30
Number of users|23
Average of pages printed|17
Final hour|19:00
input2.txt.(n) (separated records):
Code:
EVENT|INTERNET CONNECTION
Date|11/01/2009
Initial hour|07:30
Number of users|27
Average of use|32 min
Final hour|19:00
headers.txt (row and column headers):
Code:
Average of pages printed
Average of use
Date
EVENT
Final hour
INTERNATIONAL CALL
INTERNET CONNECTION
Initial hour
LOCAL CALL
Number of users
PRINTER USE
colheaders.txt (column headers):
Code:
Average of pages printed
Average of use
Date
Final hour
Initial hour
Number of users
rowheaders.txt (row headers):
Code:
INTERNATIONAL CALL
INTERNET CONNECTION
LOCAL CALL
PRINTER USE
output.txt (awk friendly table - tab-delimited):
Code:
EVENT Average of pages printed Average of use Date Final hour Initial hour Number of users
INTERNATIONAL CALL 5 min 11/01/2009 16:17 09:14 21
INTERNET CONNECTION 32 min 11/01/2009 19:00 07:30 27
LOCAL CALL 7 min 11/01/2009 16:11 07:42 15
PRINTER USE 17 12/01/2009 19:00 07:30 23
output2.txt (formatted output):
Code:
EVENT                 Date       Initial hour Number of users Average of use Average of pages printed  Final hour
INTERNATIONAL CALL    11/01/2009 09:14        21              5 min                                    16:17
INTERNET CONNECTION   11/01/2009 07:30        27              32 min                                   19:00
LOCAL CALL            11/01/2009 07:42        15              7 min                                    16:11
PRINTER USE           12/01/2009 07:30        23                             17                        19:00
Screen looks like this while running the script:
Code:
---
Row: INTERNATIONAL CALL
File: input2.txt.13
Average of pages printed
Average of use
Date
Final hour
Initial hour
Number of users
>> INTERNATIONAL CALL 5 min 11/01/2009 16:17 09:14 21
---
Row: INTERNET CONNECTION
File: input2.txt.1
Average of pages printed
Average of use
Date
Final hour
Initial hour
Number of users
>> INTERNET CONNECTION 32 min 11/01/2009 19:00 07:30 27
---
Row: LOCAL CALL
File: input2.txt.7
Average of pages printed
Average of use
Date
Final hour
Initial hour
Number of users
>> LOCAL CALL 7 min 11/01/2009 16:11 07:42 15
---
Row: PRINTER USE
File: input2.txt.19
Average of pages printed
Average of use
Date
Final hour
Initial hour
Number of users
>> PRINTER USE 17 12/01/2009 19:00 07:30 23