    #1  
Old 09-10-2010
Registered User
 

Join Date: Sep 2010
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
looking for expert sed/script help for translation/substitution

Hi All,

I'm looking for some expert help with sed or a shell script to work out the best way to transform one XML format into another. There are a few complexities around the translation, though.

The extra complexities are:
1) Take the start and stop times (YYYYMMDDHHMMSS), convert the start time to Unix time, and output the difference in seconds between the two times.
2) Look up oid, tsid and sid in an external file, keyed by channel. For example, one of the lines in the file is 2:806:27e2=channel1
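
For point 1, a minimal sketch of the time conversion, assuming GNU date is available (the -d option is not POSIX). The helper name to_epoch is purely illustrative. It drops the zone offset and reads the timestamp as UTC, which is what reproduces the start_time of 1284098400 in the sample output; honouring the +0100 offset would give a value 3600 seconds lower.

```shell
# to_epoch: hypothetical helper; converts YYYYMMDDHHMMSS to Unix time.
# Assumes GNU date. Anything after the 14 digits (e.g. " +0100") is
# discarded and the timestamp is interpreted as UTC.
to_epoch() {
    stamp=$(printf '%s\n' "$1" |
        sed 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\).*$/\1-\2-\3 \4:\5:\6/')
    date -u -d "$stamp" +%s
}

start=$(to_epoch "20100910060000 +0100")
stop=$(to_epoch "20100910061000 +0100")
echo "start_time=$start duration=$((stop - start))"
```

With the sample input this prints start_time=1284098400 duration=600.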

Is there any way to write piped sed commands that can do this? If not, any ideas on what the script should look like?

Thanks in advance.

Input File

Code:
<programme start="20100910060000 +0100" stop="20100910061000 +0100" channel="channel1">
<title lang="en">This is the title</title>
<desc>This is the description</desc>
</programme>

Output File

Code:
<service oid="0002" tsid="0806" sid="27e2">
<event id="0">
<name lang="OFF" string="This is the title"/>
<text lang="OFF" string="This is the description"/>
<time start_time="1284098400" duration="600"/>
</event>
</service>

Look up file for oid, tsid and sid

Code:
2:806:27e2=channel1
2:756:37a3=channel2
5:4a06:42e5=channel3
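
The lookup can be done in plain POSIX shell. The sketch below assumes the file layout shown above (oid:tsid:sid=channel, with the three values in hex); the function name lookup_ids and its arguments are illustrative and not part of the original question.

```shell
# lookup_ids CHANNEL FILE: print the <service> tag for CHANNEL.
# Hypothetical helper; assumes lines of the form oid:tsid:sid=channel.
lookup_ids() {
    while IFS='=' read -r ids channel; do
        [ "$channel" = "$1" ] || continue
        oid=${ids%%:*}          # text before the first colon
        rest=${ids#*:}          # text after the first colon
        tsid=${rest%%:*}
        sid=${rest#*:}
        # The 0x prefix makes printf read each value as hexadecimal.
        printf '<service oid="%04x" tsid="%04x" sid="%04x">\n' \
            "0x$oid" "0x$tsid" "0x$sid"
        return 0
    done < "$2"
    return 1                    # channel not found
}
```

Called as lookup_ids channel1 FILE, where FILE holds the three lines above, it prints <service oid="0002" tsid="0806" sid="27e2">. The exact string comparison means channel1 does not also match channel10.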


Last edited by hotbaws11; 09-10-2010 at 08:47 AM..
    #2  
Old 09-10-2010
Registered User
 

Join Date: Aug 2010
Posts: 60
Thanks: 12
Thanked 0 Times in 0 Posts
Roughly how many records are in each file?
    #3  
Old 09-10-2010
Registered User
 

Join Date: Sep 2010
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
One of my files has 22k programmes, and I only expect to process one file per day.
    #4  
Old 09-10-2010
Registered User
 

Join Date: Aug 2010
Posts: 60
Thanks: 12
Thanked 0 Times in 0 Posts
This should be enough to get you started on the right track.


Code:
# Read both input files into variables.
XMLINPUTFILE=$(cat /tmp/xmlinputfile.txt)
OIDTSIDSIDREF=$(cat /tmp/oidtsidsid.txt)

# Extract start, stop and channel from the <programme> tag (single record only).
# The " +0100" zone offset is dropped.
STARTTIME=$(echo "$XMLINPUTFILE" | sed -e '/programme start/!d' -e 's/.*<programme start="//' -e 's/ .*//')
STOPTIME=$(echo "$XMLINPUTFILE" | sed -e '/programme start/!d' -e 's/.*stop="//' -e 's/ .*//')
CHANNELNO=$(echo "$XMLINPUTFILE" | sed -e '/programme start/!d' -e 's/.*channel="//' -e 's/".*//')

# Find the lookup line for this channel; the "=" and "$" stop channel1 matching channel10.
OIDCHANNEL=$(echo "$OIDTSIDSIDREF" | grep "=$CHANNELNO\$")

# Split oid:tsid:sid and upper-case the hex digits, since bc only accepts A-F.
OIDVAL=$(echo "$OIDCHANNEL" | sed -e 's/:.*//' -e 'y/abcdef/ABCDEF/')
TSIDVAL=$(echo "$OIDCHANNEL" | sed -e 's/^[^:]*://' -e 's/:.*//' -e 'y/abcdef/ABCDEF/')
SIDVAL=$(echo "$OIDCHANNEL" | sed -e 's/.*://' -e 's/=.*//' -e 'y/abcdef/ABCDEF/')

# Convert hex to decimal.
OIDVAL=$(echo "ibase=16; $OIDVAL" | bc)
TSIDVAL=$(echo "ibase=16; $TSIDVAL" | bc)
SIDVAL=$(echo "ibase=16; $SIDVAL" | bc)

# Print each value as four zero-padded, lower-case hex digits.
OIDVAL=$(printf '%04x' "$OIDVAL")
TSIDVAL=$(printf '%04x' "$TSIDVAL")
SIDVAL=$(printf '%04x' "$SIDVAL")

# Reformat YYYYMMDDHHMMSS as "YYYY-MM-DD HH:MM:SS" for date -d.
# date -d is a GNU extension; with the offset dropped, the result depends on the local time zone.
DATEFMT='s/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:\5:\6/'
STARTSTRING=$(echo "$STARTTIME" | sed -e "$DATEFMT")
STOPSTRING=$(echo "$STOPTIME" | sed -e "$DATEFMT")
STARTUNIX=$(date -d "$STARTSTRING" +%s)
STOPUNIX=$(date -d "$STOPSTRING" +%s)
DIFFERENCE=$((STOPUNIX - STARTUNIX))

echo "<service oid=\"$OIDVAL\" tsid=\"$TSIDVAL\" sid=\"$SIDVAL\">"
echo "<time start_time=\"$STARTUNIX\" duration=\"$DIFFERENCE\"/>"

Have fun

---------- Post updated at 04:46 AM ---------- Previous update was at 04:41 AM ----------

A note on the first OIDVAL and TSIDVAL lines: the final sed expression there must be the transliteration 'y/abcdef/ABCDEF/', as in the first SIDVAL line, and not the substitution 's/abcdef/ABCDEF/g', which was a typo.
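
The difference is easy to check on a value such as 4a06 from the lookup file. The s command only matches the literal six-character string abcdef, while y maps each character individually. The upper-case form matters because bc treats lower-case letters as variable names rather than hex digits.

```shell
# Substitution: "abcdef" does not occur in the input, so nothing changes.
echo 4a06 | sed -e 's/abcdef/ABCDEF/g'   # prints 4a06

# Transliteration: every a-f is mapped to A-F one character at a time.
echo 4a06 | sed -e 'y/abcdef/ABCDEF/'    # prints 4A06
```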
    #5  
Old 09-12-2010
Registered User
 

Join Date: Aug 2010
Posts: 33
Thanks: 0
Thanked 7 Times in 6 Posts
I think it's best to use Perl and XML::Simple for that. Awk has limitations with data like this, which is not line-oriented and really needs a parser.

---------- Post updated at 04:43 PM ---------- Previous update was at 04:42 PM ----------

I'm not sure, but maybe this can also help: xmlsh
    #6  
Old 09-12-2010
Registered User
 

Join Date: Sep 2010
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
Thanks for the responses, guys. Basically I need a shell script (sh) because a) that's all I know and b) I think it's the only thing available on the box I'll be running this on. I also have no idea how to install any new languages on the box.

I've managed to get quite good performance (less than an hour) by:
1) Extracting all the information from the source file into a 'raw' format using multiple sed commands. The raw format is something like event_num~channel~startdate~enddate~title~desc
2) Then using the usual read-file method to read each line and output it in XML format. I need the read-file method to a) convert startdate/enddate into Unix time, which I can't get to work in sed itself (i.e. taking a sed variable and passing it to the date function), and b) look up sid, onid and tsid in the external file.
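
A sketch of step 2 under stated assumptions: the raw file uses the ~-separated layout above, the dates are YYYYMMDDHHMMSS with no offset and are read as UTC, and GNU date is available. The function name raw_to_events is made up for illustration. The service lookup and any escaping of quotes or ampersands in titles are left out.

```shell
# raw_to_events: hypothetical helper; reads raw lines on stdin in the form
#   event_num~channel~startdate~enddate~title~desc
# and prints one <event> element per line. Assumes GNU date (-d).
raw_to_events() {
    fmt='s/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:\5:\6/'
    while IFS='~' read -r num channel start stop title desc; do
        # Reformat each timestamp for date, then convert to Unix time.
        start_unix=$(date -u -d "$(printf '%s\n' "$start" | sed "$fmt")" +%s)
        stop_unix=$(date -u -d "$(printf '%s\n' "$stop" | sed "$fmt")" +%s)
        printf '<event id="%s">\n' "$num"
        printf '<name lang="OFF" string="%s"/>\n' "$title"
        printf '<text lang="OFF" string="%s"/>\n' "$desc"
        printf '<time start_time="%s" duration="%s"/>\n' \
            "$start_unix" "$((stop_unix - start_unix))"
        printf '</event>\n'
    done
}

# Example with one raw record built from the sample programme.
printf '%s\n' '0~channel1~20100910060000~20100910061000~This is the title~This is the description' |
    raw_to_events
```

Each record costs several process forks here. GNU date can also read many dates from a file with -f, which may be worth trying if the loop turns out to be the slow part.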

I don't have the script handy, otherwise I would post it here. Again, if you have any suggestions on how I could improve my sed statements to do a) and b) above, that would make it even faster.
    #7  
Old 09-12-2010
Registered User
 

Join Date: Aug 2010
Posts: 60
Thanks: 12
Thanked 0 Times in 0 Posts
If you look at the script I posted, you'll see it already does the Unix time conversion.

Look at the very top and you'll see the two required filenames in the /tmp directory.

Copy your XML file and the oidtsidsid values file to those names.

Run the script.

You'll get the service oid line and the start time and duration as output.

Runtime is about 3 seconds.
All times are GMT -4.