Data Normalization


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Data Normalization
# 1  
Old 02-17-2013
Data Normalization

Hi, there
Need help on rearranging the data.
I have data in the following format.

LAC
=
040
DN
=
24001001
EQN
=
920-
2-
0-
1
CAT
=
MS
OPTRCL
=
3
TRARSTR
=
ACTTRACL
TRACLACT
TRACLMOD
LNATT
=
PB
COS
=
CLIP



Immediate row before the symbol "=" is like Column heading. The rows after "=" are like corresponding values.

I need to rearrange the data as follows.

LAC|DN|EQN|CAT|OPTRCL|TRARSTR|LNATT|COS
040|24001001|920- 2- 0- 1|MS|3|ACTTRACL TRACLACT TRACLMOD|PB|CLIP


Please help me out.
Thanks
# 2  
Old 02-17-2013
Since you didn't use code tags, I'm not sure that I have correctly interpreted what your input looks like, but this might work:
Code:
awk '
/=/ {   if(hc) f[hc] = s
        h[++hc] = last
        last = s = ""
        lc = fc = 0
        next
}
{       if(lc) {
                if(fc)  s = s " " last
                else {  fc = 1
                        s = last
                }
        } 
        lc = 1
        last = $0
}
END {   for(i = 1; i <= hc; i++) printf("%s%s", h[i], i == hc ? "\n" : "|")
        if(lc && sc) f[hc] = s " " last
        else    f[hc] = last
        for(i = 1; i <= hc; i++) printf("%s%s", f[i], i == hc ? "\n" : "|")
}' data

As always, if you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 02-17-2013
PLEASE use code tags for code and data as required by forum rules!

This has become uglier than what I went for, but at least for your special problem, it might work:
Code:
awk     '$1=="=" {header = header M1 "|"        # if "=" encountered, use "dragging" copy, i.e. last line's $1
                  row = row "|"                 # insert "|" separator into row
                  next                          # no further action on this line
                 }
                 {row = row " " $1; M1 = $1}    # collect everything into row; keep a "dragging" copy of $1 in M1
         END     {gsub (/ [^ ]*\| /, "|", row)  # remove header fields from row
                  sub  (/^\|/, "", row)         # remove leading "|" from row
                  sub  (/\|$/,"", header)       # remove trailing "|" from header
                  print header; print row}      # print it
        ' file
LAC|DN|EQN|CAT|OPTRCL|TRARSTR|LNATT|COS
040|24001001|920- 2- 0- 1|MS|3|ACTTRACL TRACLACT TRACLMOD|PB|CLIP

---------- Post updated at 11:21 ---------- Previous update was at 10:57 ----------

Well, this might be a bit more straightforward and easier to understand...

Code:
awk     '       {tot = tot " " $1}                      # collect everything into one long line
         END    {gsub (/ [^ ]* =/, "|&", tot)           # separate entries by "|"
                 n = split (tot, Ar1, "\| ")            # split line by "|", remember field count
                 for (i=2; i<=n; i++)                   # every field but first (artificial, empty field)
                        {split (Ar1[i], Ar2, " = ")     # split into header/row part
                         head = head (i>2?"|":_) Ar2[1] # and create header 
                         row  = row  (i>2?"|":_) Ar2[2] # and row (both avoiding leading "|")
                        }
                 print head; print row                  # print both
                }
        ' file


Last edited by RudiC; 02-17-2013 at 09:12 AM.. Reason: correction of typo in script
This User Gave Thanks to RudiC For This Post:
# 4  
Old 02-17-2013
The second one is not giving proper output, but the first one does. Can you please explain the first code.?
# 5  
Old 02-17-2013
This is the output of the second proposal - what is wrong, what are you missing?
Code:
LAC|DN|EQN|CAT|OPTRCL|TRARSTR|LNATT|COS
040|24001001|920- 2- 0- 1|MS|3|ACTTRACL TRACLACT TRACLMOD|PB|CLIP

Added comments to above post...

Last edited by RudiC; 02-17-2013 at 07:20 AM..
This User Gave Thanks to RudiC For This Post:
# 6  
Old 02-17-2013
For the second one, the output is as follows.

Code:
LAC|=|040||DN|=|24001001||EQN|=|920-|2-|0-|1||CAT|=|MS||OPTRCL|=|3||SUBTRCL|=|3||NUMCAL|=|1||ORIG1|=|1||ORIG2|=|1||TRARSTR|=|ACTTRACL|TRACLACT|TRACLMOD||LNATT|=|PB||COS|=|CLIP||ADDINF|=|KEYWORD||LAC|=|040||DN|=|24001003||EQN|=|1010-|1-|4-13||CAT|=|MS||OPTRCL|=|12||NUMCAL|=|1||ORIG1|=|7||ORIG2|=|7||BLK|=|ACCSPORI||TRARSTR|=|RSTSCI1|RSTSCI4||LNATT|=|PB||COS|=|CLIP||LAC|=|040||DN|=|24001004||EQN|=|920-|2-|5-|6||CAT|=|MS||OPTRCL|=

Anyways, thanks for your concern. I wrote a big script to rearrange this data.

Code:
rm output.txt 2> /dev/null;
heading="";
valu="";
preline="#";
outhead="";
lnum=0;
lines=$(cat har4.txt | wc -l);
#echo "No. of lines: "$lines;
limit=`expr $lines - 2`;
#echo "Limit is: "$limit;
while read line
do
lnum=`expr $lnum + 1`;
if [ $line == "LAC" ]; then
outhead=$outhead"|"$heading;
echo $outhead >> output.txt;
echo $valu >> output.txt;
outhead="";
#echo "In first if";
heading="LAC";
#echo "Heading: "$heading;
valu="";
else
if [ $line == "=" ]; then
#echo "in second if";
if [ $preline != "LAC" ]; then
heading=$heading"|"$preline;
fi
#echo "Heading: "$heading;
a=$(echo $valu | awk -F "|" '{print length($NF)+1;}');
#echo "a value: "$a;
valu=$(echo $valu | awk -v ln=$a '{print substr($0, 1, length($0)-ln)}');
#echo "value: "$valu;
fi
fi
if [ $line != "LAC" ] && [ $line != "=" ]; then
if [ $lnum -le $limit ]; then
#echo "Lnum is: "$lnum;
nexfstlineno=`expr $lnum + 1`;
nexscndlineno=`expr $lnum + 2`;
#echo "Next Fst No: "$nexfstlineno;
#echo "Next Scnd No: "$nexscndlineno;
nexfstline=$(sed -n "${nexfstlineno}{p;q;}" har4.txt);
nexscndline=$(sed -n "${nexscndlineno}{p;q;}" har4.txt);
#echo "Next first line: "$nexfstline;
#echo "Next second line: "$nexscndline;
if [ $nexscndline == "=" ] && [ $nexfstline != "=" ] && [ $preline != "=" ]; then
valu=$valu"#"$line;
else
if [ $nexscndline != "=" ] && [ $nexfstline != "=" ] && [ $preline != "=" ]; then
valu=$valu"#"$line;
#echo "Value is: "$valu;
else
#echo "in third if";
valu=$valu"|"$line;
#echo "in third if, value is: "$valu
fi
fi
else
valu=$valu"|"$line;
#echo "in third if, value is: "$valu;
fi
fi
preline=$line;
done < har4.txt
awk '{gsub("-#","-");gsub("#"," ");print}' output.txt > final_output.txt;

here har4.txt is the input file.

output is:

Code:
|LAC|DN|EQN|CAT|OPTRCL|NUMBCH|NUMCAL|ORIG1|ORIG2|LNATT|COS|ADDINF|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV
|040|24001279|900-1-1-5|IBA|1|2|2|95|1|MSN|INHTROFF STIPROT FCTPROT|MULTIEQN|CMSPEECH|CMUNRST|CMAUDIO|CMAUDI7K|TEL3K|TEL7K|FAX4|TTX64K|VIDEOTEX|VIDEOTEL
|LAC|DN|EQN|CAT|OPTRCL|NUMBCH|NUMCAL|ORIG1|ORIG2|LNATT|COS|ADDINF|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV
|040|24001280|900-1-1-5|IBA|1|2|2|95|1|MSN|INHTROFF STIPROT FCTPROT|MULTIEQN|CMSPEECH|CMUNRST|CMAUDIO|CMAUDI7K|TEL3K|TEL7K|FAX4|TTX64K|VIDEOTEX|VIDEOTEL
|LAC|DN|EQN|CAT|OPTRCL|NUMBCH|NUMCAL|ORIG1|ORIG2|LNATT|COS|ADDINF|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV
|040|24001281|900-1-2-4|IBA|3|2|2|1|1|MSN|INHTROFF MSNDEFDN STIPROT FCTPROT|MULTIEQN|CMSPEECH|CMUNRST|CMAUDIO|CMAUDI7K|TEL3K|TEL7K|FAX4|TTX64K|VIDEOTEX|VIDEOTEL
|LAC|DN|EQN|CAT|OPTRCL|NUMBCH|NUMCAL|ORIG1|ORIG2|LNATT|COS|ADDINF|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV|SERV
|040|24001282|900-1-2-4|IBA|1|2|2|95|1|MSN|INHTROFF STIPROT FCTPROT|MULTIEQN|CMSPEECH|CMUNRST|CMAUDIO|CMAUDI7K|TEL3K|TEL7K|FAX4|TTX64K|VIDEOTEX|VIDEOTEL

# 7  
Old 02-18-2013
Quote:
Originally Posted by RudiC
PLEASE use code tags for code and data as required by forum rules!

This has become uglier than what I went for, but at least for your special problem, it might work:
Code:
awk     '$1=="=" {header = header M1 "|"        # if "=" encountered, use "dragging" copy, i.e. last line's $1
                  row = row "|"                 # insert "|" separator into row
                  next                          # no further action on this line
                 }
                 {row = row " " $1; M1 = $1}    # collect everything into row; keep a "dragging" copy of $1 in M1
         END     {gsub (/ [^ ]*\| /, "|", row)  # remove header fields from row
                  sub  (/^\|/, "", row)         # remove leading "|" from row
                  sub  (/\|$/,"", header)       # remove trailing "|" from header
                  print header; print row}      # print it
        ' file
LAC|DN|EQN|CAT|OPTRCL|TRARSTR|LNATT|COS
040|24001001|920- 2- 0- 1|MS|3|ACTTRACL TRACLACT TRACLMOD|PB|CLIP

---------- Post updated at 11:21 ---------- Previous update was at 10:57 ----------

Well, this might be a bit more straightforward and easier to understand...

Code:
awk     '       {tot = tot " " $1}                      # collect everything into one long line
         END    {gsub (/ [^ ]* =/, "|&", tot)           # separate entries by "|"
                 n = split (tot, Ar1, "\| ")            # split line by "|", remember field count
                 for (i=2; i<=n; i++)                   # every field but first (artificial, empty field)
                        {split (Ar1[i], Ar2, " = ")     # split into header/row part
                         head = head (i>2?"|":_) Ar2[1] # and create header 
                         row  = row  (i>2?"|":_) Ar2[2] # and row (both avoiding leading "|")
                        }
                 print head; print row                  # print both
                }
        ' file



Code:
sub(/^ \|/,"", row)

is needed as well for this code.
Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk --> math-operation in data-record and joining with second file data

Hi! I have a pretty complex job - at least for me! i have two csv-files with meassurement-data: fileA ...... (2 Replies)
Discussion started by: IMPe
2 Replies

2. Shell Programming and Scripting

Converting variable space width data into CSV data in bash

Hi All, I was wondering how I can convert each line in an input file where fields are separated by variable width spaces into a CSV file. Below is the scenario what I am looking for. My Input data in inputfile.txt 19 15657 15685 Sr2dReader 107.88 105.51... (4 Replies)
Discussion started by: vharsha
4 Replies

3. Shell Programming and Scripting

Normalization using awk

I made my explanation precise in the CODE below. I can do this manually. But is there a way to automate this? If I give 4 or 10 or any number of inputs. It should calculate the CODE and print the different outputs with normalization value ? some thing like script.sh input1 input2 input3 input4... (12 Replies)
Discussion started by: quincyjones
12 Replies

4. Shell Programming and Scripting

Normalization using awk

Hi I have a file with chr22_190_200 XXY 0 0 chr22_201_210 XXY 0 30 chr22_211_220 XXY 3 0 chr22_221_230 XXY 0 0 chr22_231_240 XXY 5 0 chr22_241_250 ABC 0 0 chr22_251_260 ABC 22 11 ... (12 Replies)
Discussion started by: Diya123
12 Replies

5. Shell Programming and Scripting

Normalization Using Shell Scripting.

Hi All, I am having a file having below three lines or maybe more than 3 lines. The first line will be always constant. ### Line 1 #### Transformation||Transformation Mapplet Name||Transformation Group||Partition Index||Transformation Row ID||Error Sequence||Error Timestamp||Error UTC... (4 Replies)
Discussion started by: satyaranjon
4 Replies

6. Shell Programming and Scripting

Extract data based on match against one column data from a long list data

My input file: data_5 Ali 422 2.00E-45 102/253 140/253 24 data_3 Abu 202 60.00E-45 12/23 140/23 28 data_1 Ahmad 256 7.00E-45 120/235 140/235 22 data_4 Aman 365 8.00E-45 15/65 140/65 20 data_10 Jones 869 9.00E-45 65/253 140/253 18... (12 Replies)
Discussion started by: patrick87
12 Replies

7. UNIX for Dummies Questions & Answers

converting a tabular format data to comma seperated data in KSH

Hi, Could anyone help me in changing a tabular format output to comma seperated file pls in K-sh. Its very urgent. E.g : username empid ------------------------ sri 123 to username,empid sri,123 Thanks, Hema:confused: (2 Replies)
Discussion started by: Hemamalini
2 Replies

8. Shell Programming and Scripting

how to verify that copied data to remote system is identical with local data.

I have created simple shell script #!/bin/sh echo `date`; echo "Start .... find . -mtime +95 -print > /tmp/files.txt for file in `cat /tmp/files.txt` do echo "copying file - $file" /usr/local/bin/scp -p -P 2222 $file remote.hostname:/file/path echo "copid file -... (3 Replies)
Discussion started by: ynilesh
3 Replies

9. UNIX for Dummies Questions & Answers

Howto capture data from rs232port andpull data into oracle database-9i automatically

Hi, i willbe very much grateful to u if u help me out.. if i simply connect pbx machine to printer by serial port RS232 then we find this view: But i want to capture this data into database automatically when the pbx is running.The table in database will contain similar to this view inthe... (1 Reply)
Discussion started by: boss
1 Replies
Login or Register to Ask a Question