Change XML file structure script


 
# 1  
Old 10-05-2011

Hi all,

Maybe someone can help me. I want to transform the structure of an XML file.

I have this input.xml:

Code:
<?xml version="1.0" encoding="utf-8"?>
<votings>
  <file name="Reference 19762">
    <case id="No. 3 Div. 870">
      <j v="1">Peter</j>
      <j v="1">Ely</j>
      <j v="9">Mark</j>
    </case>
    <case id="No. 3 Div. 887">
      <j v="1">Mary</j>
      <j v="9">Peter</j>
      <j v="1">Ely</j>
      <j v="1">Perry</j>
      <j v="1">Mark</j>
    </case>
  </file>
</votings>

and the required output should be:
Code:
<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="6" x:FullColumns="1"
   x:FullRows="1" ss:DefaultColumnWidth="60" ss:DefaultRowHeight="15">
   <Column ss:Width="30.75"/>
   <Column ss:Width="67.5" ss:Span="1"/>
   <Row ss:AutoFitHeight="0">
    <Cell ss:Index="2"><Data ss:Type="String">No. 3 Div. 870</Data></Cell>
    <Cell><Data ss:Type="String">No. 3 Div. 887</Data></Cell>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Ely</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Mark</Data></Cell>
    <Cell><Data ss:Type="Number">9</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Mary</Data></Cell>
    <Cell><Data ss:Type="Number">0</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Perry</Data></Cell>
    <Cell><Data ss:Type="Number">0</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Peter</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
    <Cell><Data ss:Type="Number">9</Data></Cell>
   </Row>
  </Table>

As you can see, input.xml contains 2 "case" blocks with names (there could be more than 2 "case" blocks, e.g. 5, 7, 8, etc.).
Some names are repeated across the "case" blocks (Peter, Ely and Mark appear in both).

The blocks of the output should then be built as follows:

Block 1:
Code:
<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="6" x:FullColumns="1"

The two values that vary here are ExpandedColumnCount and ExpandedRowCount:
ExpandedColumnCount = number of "case" blocks + 1 = 2 + 1 = 3
ExpandedRowCount = number of unique names + 1 = 5 + 1 = 6
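
Both counts can be derived directly from the input. A minimal sketch, assuming the layout of input.xml shown above (one `<case>` or `<j>` element per line); the temporary file and the variable names `cols` and `rows` are only illustrative:

```shell
#!/bin/bash
# Recreate the sample input in a temporary file (the path is arbitrary).
xml=$(mktemp)
cat > "$xml" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<votings>
  <file name="Reference 19762">
    <case id="No. 3 Div. 870">
      <j v="1">Peter</j>
      <j v="1">Ely</j>
      <j v="9">Mark</j>
    </case>
    <case id="No. 3 Div. 887">
      <j v="1">Mary</j>
      <j v="9">Peter</j>
      <j v="1">Ely</j>
      <j v="1">Perry</j>
      <j v="1">Mark</j>
    </case>
  </file>
</votings>
EOF

# ExpandedColumnCount = number of <case> blocks + 1 (the column holding the names)
cols=$(( $(grep -c '<case id=' "$xml") + 1 ))

# ExpandedRowCount = number of unique names + 1 (the header row with the case ids)
rows=$(( $(awk -F '[<>]' '/<j /{print $3}' "$xml" | sort -u | wc -l) + 1 ))

echo "ExpandedColumnCount=$cols ExpandedRowCount=$rows"
rm -f "$xml"
```

For the sample input this prints `ExpandedColumnCount=3 ExpandedRowCount=6`, matching the arithmetic above.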


Block 2 (the first "Row" block):
Code:
   <Row ss:AutoFitHeight="0">
    <Cell ss:Index="2"><Data ss:Type="String">No. 3 Div. 870</Data></Cell>
    <Cell><Data ss:Type="String">No. 3 Div. 887</Data></Cell>
   </Row>

The cell values (No. 3 Div. 870 and No. 3 Div. 887) should be taken from the id attribute of each "case" block.

Blocks 3,4,5...N ("Row" blocks for each unique name):

Code:
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Ely</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
   </Row>

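The name and the numbers are taken from the "case" blocks: the script needs to find the unique names and emit one "Row" block per name,
with all the values for that name inside the same "Row" block, as shown above. A name that is missing from a "case" block gets 0 in that column (see Mary and Perry in the required output).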

I really hope you could help me with this. I'm a kind of beginner in this type of scripts.

Thanks in advance.

Regards.

Last edited by cgkmal; 10-05-2011 at 05:32 AM..
# 2  
Old 10-05-2011
I read your requirements three times and I still could not figure it out.

You have a large project.

Quote:
I really hope you could help me with this. I'm a kind of beginner in this type of scripts.
Why do you have to write a solution using unix shell script?

Try using a compiled language.
# 3  
Old 10-05-2011
Quote:
Originally Posted by Shell_Life
I read your requirements three times and I still could not figure it out.
Hi, thanks for answering.

Sorry for my explanation. In the input file some names are repeated; in the output each name should appear only once, with its corresponding values. Maybe it is easier to understand by just comparing the input with the output.

Quote:
Originally Posted by Shell_Life
You have a large project.
I'm not sure. I've seen several questions about more complex XML transformations here, so I hope this one is not too complex. Maybe awk, sed, or a combination of those with bash would do.



Quote:
Originally Posted by Shell_Life
Why do you have to write a solution using unix shell script?

Try using a compiled language.
Because that is what I understand and what I thought could be the solution. I really don't know any compiled language well enough to try something like this.

Many thanks for any help.

Regards.
# 4  
Old 10-05-2011
Like Shell_Life said, it is complex!
And yes, it is possible with scripts, though not that easy... But to what extent have you implemented this?

btw, I also read your requirements 2 - 3 times and couldn't catch all of them!

--ahamed
# 5  
Old 10-06-2011
Hi ahamed,

I want to parse the input this way because the output will be opened in MS Excel, but in a different layout: as I said before, without repeating the names. Each name is shown only once, and every value associated with it goes in the same block. One block per name.

Sorry for my explanation; I hope somebody can follow my English.

Thanks in advance.

Regards.

---------- Post updated 10-06-11 at 03:08 AM ---------- Previous update was 10-05-11 at 11:51 AM ----------

Hi again,

Answering my own question, in case anybody is interested in the future.

I had to break the conversion down step by step, and it's not pure bash: I used individual awk commands for
each section of the script.

Probably the same result could be obtained with a single awk program; I'd like to see how to merge this code into one.

Well, the code I got working is:
Code:
#########################################################################################################################
### Begin of script of XML conversion ###################################################################################
#########################################################################################################################

Voting_Info="input.xml"

## (1) - Get the case ids between double quotes, e.g. <case id="No. 3 Div. 870">, and store them in an array ###
oldIFS=$IFS
IFS=$'\n'
Cases=($(awk -F "[\"]" '/id=/{print $2}' "$Voting_Info"))
IFS=$oldIFS

## (2) - Count the "case" blocks and add 1 to get the value for "ExpandedColumnCount"
CasesNumber=$(( ${#Cases[@]} + 1 ))

## (3) - Get the unique names between ">" and "</j>", e.g. ">Mary</j>", and store them in an array ###
UniqNames=($(awk -F "[><]" '/v=/{a[$3];} END{for (i in a) print i;}' "$Voting_Info" | sort))
RowsNumber=$(( ${#UniqNames[@]} + 1 ))  # Unique names + header row, for "ExpandedRowCount"

## (4) Print first lines of output
echo "  <Table ss:ExpandedColumnCount=\"$CasesNumber\" ss:ExpandedRowCount=\"$RowsNumber\" x:FullColumns=\"1\""
echo "   x:FullRows=\"1\" ss:DefaultColumnWidth=\"60\" ss:DefaultRowHeight=\"15\">"
echo "   <Column ss:Width=\"30.75\"/>"
echo "   <Column ss:Width=\"67.5\" ss:Span=\"1\"/>"

## (5) - Print first block, that is the "Cases" names block.
    echo "   <Row ss:AutoFitHeight=\"0\">"
    echo "    <Cell ss:Index=\"2\"><Data ss:Type=\"String\">""${Cases[0]}""</Data></Cell>"
#for i in "${Cases[@]}"
for ((i=1;i<${#Cases[*]};i++))
   do
    echo "    <Cell><Data ss:Type=\"String\">""${Cases[$i]}""</Data></Cell>"
done

    echo "   </Row>"

## (6) - Loop to get the values of each name within all "case" blocks
for j in "${UniqNames[@]}"
   do
    echo "   <Row ss:AutoFitHeight=\"0\">"
    echo "    <Cell><Data ss:Type=\"String\">$j</Data></Cell>"
    # v is reset to 0 at each <case>, set when the name field ($4) equals Z, and printed at </case>
    awk -v Z="$j" -F"[\"><]+" '/case id/{v=0}/\/case/{print "    <Cell><Data ss:Type=\"Number\">" v "</Data></Cell>"}$4 == Z{v=$3}' "$Voting_Info"
    echo "   </Row>"
done
## (7) - Print last line to complete output
echo "  </Table>"

Hope this helps.

Thanks as always for your help and time.

Regards
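
On the idea of a single awk program: the following is one possible sketch, not a tested replacement for the script above. It assumes the same one-element-per-line layout as the sample input.xml, collects the case ids and votes in one pass, and prints the table in the `END` block. The array names (`id`, `names`, `vote`, `seen`) and the temporary file are made up for the example.

```shell
#!/bin/bash
# Recreate the sample input in a temporary file (the path is arbitrary).
xml=$(mktemp)
cat > "$xml" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<votings>
  <file name="Reference 19762">
    <case id="No. 3 Div. 870">
      <j v="1">Peter</j>
      <j v="1">Ely</j>
      <j v="9">Mark</j>
    </case>
    <case id="No. 3 Div. 887">
      <j v="1">Mary</j>
      <j v="9">Peter</j>
      <j v="1">Ely</j>
      <j v="1">Perry</j>
      <j v="1">Mark</j>
    </case>
  </file>
</votings>
EOF

out=$(awk -F '"' '
/<case id=/ { id[++ncase] = $2 }      # remember each case id, in input order
/<j v=/ {                             # vote line: <j v="VALUE">NAME</j>
    name = $3
    sub(/^>/, "", name)               # drop the ">" left over from the tag
    sub(/<\/j>.*$/, "", name)         # drop the closing tag
    if (!(name in seen)) { seen[name] = 1; names[++nname] = name }
    vote[name, ncase] = $2
}
END {
    # Insertion sort, so the sketch does not depend on gawk-only asort().
    for (i = 2; i <= nname; i++) {
        tmp = names[i]
        for (k = i - 1; k >= 1 && names[k] > tmp; k--) names[k + 1] = names[k]
        names[k + 1] = tmp
    }
    printf "<Table ss:ExpandedColumnCount=\"%d\" ss:ExpandedRowCount=\"%d\" x:FullColumns=\"1\"\n", ncase + 1, nname + 1
    print "   x:FullRows=\"1\" ss:DefaultColumnWidth=\"60\" ss:DefaultRowHeight=\"15\">"
    print "   <Column ss:Width=\"30.75\"/>"
    print "   <Column ss:Width=\"67.5\" ss:Span=\"1\"/>"
    print "   <Row ss:AutoFitHeight=\"0\">"
    for (c = 1; c <= ncase; c++) {
        attr = (c == 1) ? " ss:Index=\"2\"" : ""   # only the first header cell is indexed
        printf "    <Cell%s><Data ss:Type=\"String\">%s</Data></Cell>\n", attr, id[c]
    }
    print "   </Row>"
    for (i = 1; i <= nname; i++) {
        print "   <Row ss:AutoFitHeight=\"0\">"
        printf "    <Cell><Data ss:Type=\"String\">%s</Data></Cell>\n", names[i]
        for (c = 1; c <= ncase; c++) {
            v = ((names[i], c) in vote) ? vote[names[i], c] : 0   # 0 when the name is absent
            printf "    <Cell><Data ss:Type=\"Number\">%s</Data></Cell>\n", v
        }
        print "   </Row>"
    }
    print "</Table>"
}' "$xml")

printf '%s\n' "$out"
rm -f "$xml"
```

Because the votes are stored per (name, case) pair, the zero entries fall out of a simple lookup instead of a second pass over the file for every name.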
# 6  
Old 10-06-2011
If the rules are always the same and the pattern-matching criteria are unchanged, i.e. the actual file matches the sample you posted except for the values, this is what I have.

Please note I have only tried to meet your requirement; I haven't thought about tuning or efficiency.

Code:
datafile=xmlfile
ExpandedColumnCount=$(( $(grep -c '<case id' ${datafile}) + 1 ))
ExpandedRowCount=$(( $(awk -F '[<>]' ' /<j/ {print $3| "sort -u|wc -l"}' ${datafile}) + 1))


cat <<-ENDCAT1
<Table ss:ExpandedColumnCount="${ExpandedColumnCount}" ss:ExpandedRowCount="${ExpandedRowCount}" x:FullColumns="1"
    x:FullRows="1" ss:DefaultColumnWidth="60" ss:DefaultRowHeight="15">
    <Column ss:Width="30.75"/>
    <Column ss:Width="67.5" ss:Span="1"/>
    <Row ss:AutoFitHeight="0">
$(awk -F '"' ' /case id=/ {print $2}' ${datafile} | sed 's|^|      <Cell ss:Index="2"><Data ss:Type="String">|g' | sed 's|$|</Data></Cell>|g')
    </Row>
ENDCAT1

awk -F '[<>]' ' /<j/ {print $3| "sort -u"}' ${datafile} | while IFS= read -r name
do
cat <<-ENDCAT2
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">${name}</Data></Cell>
$(awk -F '["<>]' -v n="$name" '/<j/ && $5 == n {print $3}' ${datafile} | sed 's|^|      <Cell><Data ss:Type="Number">|g' | sed 's|$|</Data></Cell>|g')
    </Row>
ENDCAT2
done

echo '</Table>'


O/P

Code:
<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="6" x:FullColumns="1"
    x:FullRows="1" ss:DefaultColumnWidth="60" ss:DefaultRowHeight="15">
    <Column ss:Width="30.75"/>
    <Column ss:Width="67.5" ss:Span="1"/>
    <Row ss:AutoFitHeight="0">
      <Cell ss:Index="2"><Data ss:Type="String">No. 3 Div. 870</Data></Cell>
      <Cell ss:Index="2"><Data ss:Type="String">No. 3 Div. 887</Data></Cell>
    </Row>
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">Ely</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
    </Row>
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">Mark</Data></Cell>
      <Cell><Data ss:Type="Number">9</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
    </Row>
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">Mary</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
    </Row>
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">Perry</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
    </Row>
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">Peter</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
      <Cell><Data ss:Type="Number">9</Data></Cell>
    </Row>
</Table>


Please note that this leaves out one thing:
putting in a zero entry when a name doesn't appear in a given case block. But I think you can try that yourself.
# 7  
Old 10-06-2011
Hello anchal_khare,

Many thanks for replying and for taking the time to share your knowledge. Your code works beautifully!

I've learned more than one new thing from your code.
1-) I didn't know that "cat" could be used that way. Does that usage have a name?
2-) Using "while read" together with awk commands avoids several steps and the extra processing and memory that array variables need.
3-) I had forgotten how good "sed" is at inserting text directly at the beginning or end of a line. Great.
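
On point 1: the `cat <<-ENDCAT1 ... ENDCAT1` construct is called a here-document. The shell feeds the lines up to the delimiter to the command's standard input; with an unquoted delimiter, variables and `$(...)` are expanded first, and with a quoted delimiter the text is passed through literally. The `<<-` form additionally strips leading tab characters (not spaces) from each line. A small sketch, where `count` and the two function names are made up for illustration:

```shell
#!/bin/bash
count=3

# Unquoted delimiter: $count and $(( ... )) are expanded inside the here-document.
print_expanded() {
cat <<EOF
<Table ss:ExpandedColumnCount="$count" ss:ExpandedRowCount="$(( count * 2 ))">
EOF
}

# Quoted delimiter: the text is passed through exactly as written.
print_literal() {
cat <<'EOF'
<Table ss:ExpandedColumnCount="$count">
EOF
}

expanded=$(print_expanded)
literal=$(print_literal)
echo "$expanded"
echo "$literal"
```

The first `echo` shows the values substituted in; the second shows `$count` untouched.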

I only modified the following in your code.

I changed this:
Code:
$(awk -F '"' ' /case id=/ {print $2}' ${datafile} | sed 's|^|      <Cell ss:Index="2"><Data ss:Type="String">|g' | sed 's|$|</Data></Cell>|g')

to this:
Code:
$(awk -F '"' ' /case id=/ {print $2}' ${datafile} | sed q | sed 's|^|    <Cell ss:Index="2"><Data ss:Type="String">|g' | sed 's|$|</Data></Cell>|g')
$(awk -F '"' ' /case id=/ {print $2}' ${datafile} | sed 1d | sed 's|^|    <Cell><Data ss:Type="String">|g' | sed 's|$|</Data></Cell>|g')

and this:
Code:
$(awk -F '["<>]' -v n=$name '/<j/ && $5 == n {print $3}' ${datafile} | sed 's|^|      <Cell><Data ss:Type="Number">|g' | sed 's|$|</Data></Cell>|g')

to this (to get zeros when there is no match):
Code:
$(awk -v Z="$name" -F"[\"><]+" '/case id/{v=0}/\/case/{print "    <Cell><Data ss:Type=\"Number\">" v "</Data></Cell>"}$4 == Z{v=$3}' ${datafile})
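
The zero-filling one-liner is dense, so here is the same idea spread over several lines. This is a sketch on a cut-down copy of the sample input (the reduced file and the variable `cells` are only for illustration); it compares the name field exactly and prints bare values instead of full `<Cell>` lines.

```shell
#!/bin/bash
# Cut-down copy of the sample input, written to a temporary file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<votings>
  <file name="Reference 19762">
    <case id="No. 3 Div. 870">
      <j v="1">Peter</j>
      <j v="9">Mark</j>
    </case>
    <case id="No. 3 Div. 887">
      <j v="1">Mary</j>
      <j v="9">Peter</j>
    </case>
  </file>
</votings>
EOF

name="Mary"
# With the separator ["><]+ a vote line splits into $3 = value and $4 = name.
cells=$(awk -v Z="$name" -F '["><]+' '
    /<case id/ { v = 0 }      # new case block: assume no vote yet
    $4 == Z    { v = $3 }     # vote line for this name: keep its value
    /<\/case/  { print v }    # end of the block: emit the value, 0 if none was seen
' "$xml")

echo "$cells"
rm -f "$xml"
```

Mary is missing from the first block and has a 1 in the second, so the two printed lines are 0 and 1.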

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I think I'll be able to adapt your code to process the other file I need to handle. Once this input.xml and input1.xml have
been converted in a similar way, their outputs will become part of a final output.xml, as shown below:

Code:
Line1..............
Line2..............
..
.
Line30............

Output of processing input1.xml # Output from other input.xml

Output of processing input2.xml # The output your script already does

Line31............
Line31............
..
.
.
Line50............

To get there, could you help me with suggestions on the following? (Assume your code is Routine2, Routine1 is the code I have
to add to process input1.xml, and both routines already work.)


I'd like a suggestion or idea on how to wrap your script in a loop that processes every somename_ci.xml in the folder with Routine1 and every
somename_vi.xml in the folder with Routine2.


The final code should look something like this:
Code:
# The files come in pairs with the same name, differing only in the ending "_ci" or "_vi",
# e.g. December_ci.xml and December_vi.xml, June_ci.xml and June_vi.xml, etc.

For i=1 to CountOfAllFilesInFolder/2 # divided by 2 because each pair of files produces one output
 
file1=somename_ci.xml # somename is the string that will vary
file2=somename_vi.xml # somename is the string that will vary
  do
      Code to add the first 30 fixed lines # I'll try to imitate the way you use "cat" for these fixed lines
      Code to add lines after processing somename_ci.xml # Routine1, I'll add it later.
      Code to add lines after processing somename_vi.xml # Routine2, your code...
      Code to add the last 20 fixed lines # I'll try to imitate the way you use "cat" for these fixed lines
done

I hope you can give me an idea of how to write this last part of my code.

Thanks in advance.

Regards.

Last edited by cgkmal; 10-06-2011 at 06:25 PM..
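
One way the pairing loop described above might be written is to iterate over the `_ci` files and derive the matching `_vi` name, rather than counting files and dividing by two. This is only a sketch: `routine1`, `routine2`, `print_header`, `print_footer`, `process_pairs` and the `_output.xml` suffix are hypothetical placeholders, and the fixed lines are shortened to two each.

```shell
#!/bin/bash
# Hypothetical stand-ins: replace these with the real routines and fixed lines.
routine1() { echo "<!-- Routine1 output for $(basename "$1") -->"; }
routine2() { echo "<!-- Routine2 output for $(basename "$1") -->"; }
print_header() {
cat <<'EOF'
Line1..............
Line30............
EOF
}
print_footer() {
cat <<'EOF'
Line31............
Line50............
EOF
}

# Build one output file per *_ci.xml / *_vi.xml pair found in directory $1.
process_pairs() {
    local dir=$1 file1 file2 base
    for file1 in "$dir"/*_ci.xml; do
        [ -e "$file1" ] || continue        # the glob matched nothing
        base=${file1%_ci.xml}              # strip the suffix, e.g. .../December
        file2=${base}_vi.xml
        if [ ! -e "$file2" ]; then
            echo "skipping $file1: $file2 not found" >&2
            continue
        fi
        {
            print_header
            routine1 "$file1"
            routine2 "$file2"
            print_footer
        } > "${base}_output.xml"           # assumed naming for the combined file
    done
}

# Demo on empty placeholder files in a scratch directory.
dir=$(mktemp -d)
touch "$dir/December_ci.xml" "$dir/December_vi.xml" "$dir/June_ci.xml" "$dir/June_vi.xml"
process_pairs "$dir"
ls "$dir"
```

Grouping the four commands in `{ ...; }` lets a single redirection collect the header, both routine outputs and the footer into one file per pair.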