Change XML file structure script

10-05-2011

Hi to all,

Maybe someone could help me. I want to transform the structure of an XML file.

I have this input.xml:

Code:

<?xml version="1.0" encoding="utf-8"?>
<votings>
  <file name="Reference 19762">
    <case id="No. 3 Div. 870">
      <j v="1">Peter</j>
      <j v="1">Ely</j>
      <j v="9">Mark</j>
    </case>
    <case id="No. 3 Div. 887">
      <j v="1">Mary</j>
      <j v="9">Peter</j>
      <j v="1">Ely</j>
      <j v="1">Perry</j>
      <j v="1">Mark</j>
    </case>
  </file>
</votings>

and the required output should be:

Code:

<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="6" x:FullColumns="1"
   x:FullRows="1" ss:DefaultColumnWidth="60" ss:DefaultRowHeight="15">
   <Column ss:Width="30.75"/>
   <Column ss:Width="67.5" ss:Span="1"/>
   <Row ss:AutoFitHeight="0">
    <Cell ss:Index="2"><Data ss:Type="String">No. 3 Div. 870</Data></Cell>
    <Cell><Data ss:Type="String">No. 3 Div. 887</Data></Cell>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Ely</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Mark</Data></Cell>
    <Cell><Data ss:Type="Number">9</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Mary</Data></Cell>
    <Cell><Data ss:Type="Number">0</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Perry</Data></Cell>
    <Cell><Data ss:Type="Number">0</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Peter</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
    <Cell><Data ss:Type="Number">9</Data></Cell>
   </Row>
  </Table>

As you can see, input.xml contains 2 "case" blocks of names (there could be more than 2 "case" blocks, e.g. 5, 7, 8, etc.).
Some names appear in both "case" blocks (Peter, Ely and Mark appear in both).

Then, in the output, the "Row" blocks should be built as follows:

Block 1:

Code:

<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="6" x:FullColumns="1"

The variable values here are:
ExpandedColumnCount = Number of "Cases" blocks + 1 = 2 +1 = 3
ExpandedRowCount = Number of unique names + 1 = 5 + 1 = 6
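A minimal sketch of how both counts could be computed in a single awk pass. It assumes each <case id="..."> and each <j v="...">Name</j> sits on its own line, as in input.xml above; the function name table_dims is only illustrative.

```shell
# Sketch: compute ExpandedColumnCount and ExpandedRowCount from a voting file.
# Assumes one <case id="..."> and one <j v="...">Name</j> per line.
table_dims() {   # usage: table_dims input.xml
    awk -F '[<>]' '
        /<case id=/ { cases++ }          # each "case" block becomes one data column
        /<j /       { names[$3] = 1 }    # $3 is the text between ">" and "</j>"
        END {
            n = 0
            for (k in names) n++         # number of distinct names
            printf "ExpandedColumnCount=%d ExpandedRowCount=%d\n", cases + 1, n + 1
        }
    ' "$1"
}
```

For the sample input, table_dims input.xml should report 3 columns (2 cases + 1) and 6 rows (5 distinct names + 1).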

Block 2 (the first "Row" block):

Code:

   <Row ss:AutoFitHeight="0">
    <Cell ss:Index="2"><Data ss:Type="String">No. 3 Div. 870</Data></Cell>
    <Cell><Data ss:Type="String">No. 3 Div. 887</Data></Cell>
   </Row>

The values "No. 3 Div. 870" and "No. 3 Div. 887" should be taken from the "id" attribute of each "case" block.
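As a sketch, this header row could be produced by one awk command that gives ss:Index="2" only to the first cell, so column A stays free for the names. It assumes one <case id="..."> per line; header_row is an illustrative name.

```shell
# Sketch: build the header <Row> from the case ids.
# Only the first cell gets ss:Index="2"; the following cells continue from there.
header_row() {   # usage: header_row input.xml
    echo '   <Row ss:AutoFitHeight="0">'
    awk -F '"' '/<case id=/ {
        idx = (n++ == 0) ? " ss:Index=\"2\"" : ""   # true only for the first case
        printf "    <Cell%s><Data ss:Type=\"String\">%s</Data></Cell>\n", idx, $2
    }' "$1"
    echo '   </Row>'
}
```

With the sample input this should print the same four lines as the block above, however many "case" blocks there are.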

Blocks 3,4,5...N ("Row" blocks for each unique name):

Code:

   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Ely</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
    <Cell><Data ss:Type="Number">1</Data></Cell>
   </Row>

The name and the values are taken from the "case" blocks, but the script needs to find the unique names and output one "Row" block per name,
with all of that name's values (one per "case" block, in order, and 0 where the name doesn't appear in a case, as for Mary and Perry in the first case) inside the same "Row" block, as shown above.

I really hope you can help me with this. I'm something of a beginner with this type of script.

Thanks in advance.

Regards.

Last edited by cgkmal; 10-05-2011 at 05:32 AM..

cgkmal

10-05-2011

I read your requirements three times and I still could not figure it out.

You have a large project.

Quote:

I really hope you could help me with this. I'm a kind of beginner in this type of scripts.

Why do you have to write a solution using unix shell script?

Try using a compiled language.

Shell_Life

10-05-2011

Quote:

Originally Posted by Shell_Life

I read your requirements three times and I still could not figure it out.

Hi, thanks for the answer.

Sorry for my explanation. The input file contains repeated names, and in the output each name should appear only once, with its corresponding values. Maybe it is easier to understand by just comparing the input and the output.

Quote:

Originally Posted by Shell_Life

You have a large project.

I'm not sure. I've seen several questions about more complex XML transformations here, so I hope this is not too complex. Maybe awk, sed, or a combination of those with bash.

Quote:

Originally Posted by Shell_Life

Why do you have to write a solution using unix shell script?

Try using a compiled language.

Because this is what I understand and what I thought could be the solution. I really don't know a compiled language well enough to try something like this.

Many thanks for any help.

Regards.

cgkmal

10-05-2011

Like Shell_Life said, it is complex!
And yes, it is possible with scripts, but not that easy... To what extent have you implemented this so far?

BTW, I also read your requirements 2-3 times and couldn't catch all of them!

--ahamed

ahamed101

10-06-2011

Hi ahamed,

I want to transform the input this way because the output will be opened in MS Excel, but with a different layout: as I said before, without repeating the names, showing each name only once and putting all of its associated values in the same block. One block for each name.

Sorry for my explanation; I hope somebody can follow my English.

Thanks in advance.

Regards.

---------- Post updated 10-06-11 at 03:08 AM ---------- Previous update was 10-05-11 at 11:51 AM ----------

Hi again,

Answering my own question, in case anybody is interested in the future.

I had to break the conversion into separate steps, so it's not pure bash. I used individual awk commands to produce
each section of the output.

Probably the same result could be obtained with a single awk program; I'd like to see how to join this code into one awk program.

Well, the code I got working is:

Code:

#########################################################################################################################
### Begin of script of XML conversion ###################################################################################
#########################################################################################################################

Voting_Info="input.xml"

## (1) - Get case ids between double quotes, e.g. <case id="No. 3 Div. 870">, and store them in an array variable
##       (mapfile stores one line per element, so ids containing spaces stay intact; needs bash 4+)
mapfile -t Cases < <(awk -F '"' '/<case id=/{print $2}' "$Voting_Info")

## (2) - Count "case" blocks and add 1 to get the value for "ExpandedColumnCount"
CasesNumber=$(( ${#Cases[@]} + 1 ))

## (3) - Get unique names between ">" and "</j>", e.g. ">Mary</j>", sorted, and store them in an array variable
mapfile -t UniqNames < <(awk -F '[<>]' '/<j /{a[$3]} END{for (i in a) print i}' "$Voting_Info" | sort)

##       Count unique names and add 1 (the header row) to get the value for "ExpandedRowCount" (was hardcoded to 6)
NamesNumber=$(( ${#UniqNames[@]} + 1 ))

## (4) - Print first lines of output
echo "  <Table ss:ExpandedColumnCount=\"$CasesNumber\" ss:ExpandedRowCount=\"$NamesNumber\" x:FullColumns=\"1\""
echo "   x:FullRows=\"1\" ss:DefaultColumnWidth=\"60\" ss:DefaultRowHeight=\"15\">"
echo "   <Column ss:Width=\"30.75\"/>"
echo "   <Column ss:Width=\"67.5\" ss:Span=\"1\"/>"

## (5) - Print first block, that is the "Cases" names block (only the first cell gets ss:Index="2")
echo "   <Row ss:AutoFitHeight=\"0\">"
echo "    <Cell ss:Index=\"2\"><Data ss:Type=\"String\">${Cases[0]}</Data></Cell>"
for ((i=1; i<${#Cases[@]}; i++)); do
    echo "    <Cell><Data ss:Type=\"String\">${Cases[$i]}</Data></Cell>"
done
echo "   </Row>"

## (6) - Loop to get the value of each name within all case blocks (0 if the name is absent from a case).
##       $5 == Z is an exact match, so e.g. "Pete" would not also match "Peter".
for j in "${UniqNames[@]}"; do
    echo "   <Row ss:AutoFitHeight=\"0\">"
    echo "    <Cell><Data ss:Type=\"String\">$j</Data></Cell>"
    awk -v Z="$j" -F '["<>]' '/<case id=/{v=0} /<j / && $5 == Z{v=$3} /<\/case>/{print "    <Cell><Data ss:Type=\"Number\">" v "</Data></Cell>"}' "$Voting_Info"
    echo "   </Row>"
done

## (7) - Print last line to complete output
echo "  </Table>"
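The same steps can also be folded into a single awk program. A minimal sketch, assuming the one-element-per-line layout of input.xml (xml_to_table is an illustrative name, and this is not the code posted above): it collects case ids, distinct names and values in one pass, then sorts the names and prints the whole table in the END block, writing 0 where a name is missing from a case.

```shell
# Sketch: the whole conversion as one awk program (plain POSIX awk, no gawk-only functions).
# Assumes one <case id="..."> and one <j v="...">Name</j> per line, as in input.xml.
xml_to_table() {   # usage: xml_to_table input.xml > table.xml
    awk -F '["<>]' '
        /<case id=/ { id[++nc] = $3 }                # $3 = text between the quotes
        /<j / {                                      # $3 = v value, $5 = name
            if (!($5 in seen)) { seen[$5] = 1; name[++nn] = $5 }
            val[nc, $5] = $3                         # value of this name in case nc
        }
        END {
            for (i = 2; i <= nn; i++) {              # insertion sort: rows in name order
                k = name[i]
                for (j = i - 1; j > 0 && name[j] > k; j--) name[j + 1] = name[j]
                name[j + 1] = k
            }
            printf "  <Table ss:ExpandedColumnCount=\"%d\" ss:ExpandedRowCount=\"%d\" x:FullColumns=\"1\"\n", nc + 1, nn + 1
            print "   x:FullRows=\"1\" ss:DefaultColumnWidth=\"60\" ss:DefaultRowHeight=\"15\">"
            print "   <Column ss:Width=\"30.75\"/>"
            print "   <Column ss:Width=\"67.5\" ss:Span=\"1\"/>"
            print "   <Row ss:AutoFitHeight=\"0\">"       # header row: the case ids
            for (c = 1; c <= nc; c++)
                printf "    <Cell%s><Data ss:Type=\"String\">%s</Data></Cell>\n", (c == 1 ? " ss:Index=\"2\"" : ""), id[c]
            print "   </Row>"
            for (r = 1; r <= nn; r++) {              # one row per name, 0 where absent
                print "   <Row ss:AutoFitHeight=\"0\">"
                printf "    <Cell><Data ss:Type=\"String\">%s</Data></Cell>\n", name[r]
                for (c = 1; c <= nc; c++)
                    printf "    <Cell><Data ss:Type=\"Number\">%s</Data></Cell>\n", (((c, name[r]) in val) ? val[c, name[r]] : 0)
                print "   </Row>"
            }
            print "  </Table>"
        }
    ' "$1"
}
```

Because the order of awk's for (k in array) loop is unspecified, the sketch sorts the names itself in END rather than relying on array order or an external sort.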

Hope this helps.

Thanks as always for your help and time.

Regards

cgkmal

10-06-2011

If the rules are always the same and the pattern-matching criteria don't change (that is, the actual file has the same layout as the sample you posted, only with different values), this is what I have.

Please note I have only tried to meet your requirement; I haven't thought about tuning or efficiency.

Code:

datafile=xmlfile

# Columns = number of <case> blocks + 1 (the name column)
ExpandedColumnCount=$(( $(grep -c '<case id' ${datafile}) + 1 ))
# Rows = number of distinct names + 1 (the header row)
ExpandedRowCount=$(( $(awk -F '[<>]' ' /<j/ {print $3| "sort -u|wc -l"}' ${datafile}) + 1))


cat <<-ENDCAT1
<Table ss:ExpandedColumnCount="${ExpandedColumnCount}" ss:ExpandedRowCount="${ExpandedRowCount}" x:FullColumns="1"
    x:FullRows="1" ss:DefaultColumnWidth="60" ss:DefaultRowHeight="15">
    <Column ss:Width="30.75"/>
    <Column ss:Width="67.5" ss:Span="1"/>
    <Row ss:AutoFitHeight="0">
$(awk -F '"' ' /case id=/ {print $2}' ${datafile} | sed 's|^|      <Cell ss:Index="2"><Data ss:Type="String">|g' | sed 's|$|</Data></Cell>|g')
    </Row>
ENDCAT1

# One <Row> per distinct name (sorted); the inner awk prints the name's v value from each case where it appears
awk -F '[<>]' ' /<j/ {print $3| "sort -u"}' ${datafile} | while IFS= read -r name
do
cat <<-ENDCAT2
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">${name}</Data></Cell>
$(awk -F '["<>]' -v n=$name '/<j/ && $5 == n {print $3}' ${datafile} | sed 's|^|      <Cell><Data ss:Type="Number">|g' | sed 's|$|</Data></Cell>|g')
    </Row>
ENDCAT2
done

echo '</Table>'

O/P

Code:

<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="6" x:FullColumns="1"
    x:FullRows="1" ss:DefaultColumnWidth="60" ss:DefaultRowHeight="15">
    <Column ss:Width="30.75"/>
    <Column ss:Width="67.5" ss:Span="1"/>
    <Row ss:AutoFitHeight="0">
      <Cell ss:Index="2"><Data ss:Type="String">No. 3 Div. 870</Data></Cell>
      <Cell ss:Index="2"><Data ss:Type="String">No. 3 Div. 887</Data></Cell>
    </Row>
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">Ely</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
    </Row>
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">Mark</Data></Cell>
      <Cell><Data ss:Type="Number">9</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
    </Row>
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">Mary</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
    </Row>
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">Perry</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
    </Row>
    <Row ss:AutoFitHeight="0">
      <Cell><Data ss:Type="String">Peter</Data></Cell>
      <Cell><Data ss:Type="Number">1</Data></Cell>
      <Cell><Data ss:Type="Number">9</Data></Cell>
    </Row>
</Table>

Please note, this doesn't handle one thing:
putting an entry with zero when a name doesn't appear in a case block. But I think you can try that yourself.

clx

10-06-2011

Hello anchal_khare,

Many thanks for replying and for giving some of your time to share your knowledge. Your code works beautifully!

With your code I've learned several new things:
1-) I didn't know that "cat" could be used that way; does this use of cat have a name?
2-) Using "while read" together with awk commands avoids several steps, extra processing, and the memory used by array variables.
3-) I had forgotten how handy "sed" is for inserting text directly at the beginning or end of a line.
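The "cat <<-ENDCAT1 ... ENDCAT1" construct is the shell's here-document. A small sketch of how it behaves (the variable count is only for illustration): with an unquoted delimiter the shell expands variables, arithmetic and command substitutions inside the text before passing it to cat; quoting the delimiter turns expansion off; and the "<<-" form also strips leading TAB characters (not spaces) from each line.

```shell
# Here-document sketch: unquoted delimiter = expansion on, quoted delimiter = literal text.
count=3

# $count and $(( ... )) are expanded before cat sees the text.
expanded=$(cat <<EOF
<Table ss:ExpandedColumnCount="${count}"> <!-- $(( count - 1 )) cases -->
EOF
)

# With 'EOF' quoted, the text is passed through exactly as written.
literal=$(cat <<'EOF'
<Table ss:ExpandedColumnCount="${count}">
EOF
)

printf '%s\n' "$expanded" "$literal"
```

This is why the $(awk ... | sed ...) lines inside ENDCAT1 and ENDCAT2 are executed and replaced by their output.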

I only modified the following in your code (the additions are the "sed q" / "sed 1d" split and the awk command that prints zeros).

I changed this:

Code:

$(awk -F '"' ' /case id=/ {print $2}' ${datafile} | sed 's|^|      <Cell ss:Index="2"><Data ss:Type="String">|g' | sed 's|$|</Data></Cell>|g')

to this:

Code:

$(awk -F '"' ' /case id=/ {print $2}' ${datafile} | sed q | sed 's|^|    <Cell ss:Index="2"><Data ss:Type="String">|g' | sed 's|$|</Data></Cell>|g')
$(awk -F '"' ' /case id=/ {print $2}' ${datafile} | sed 1d | sed 's|^|    <Cell><Data ss:Type="String">|g' | sed 's|$|</Data></Cell>|g')

and this:

Code:

$(awk -F '["<>]' -v n=$name '/<j/ && $5 == n {print $3}' ${datafile} | sed 's|^|      <Cell><Data ss:Type="Number">|g' | sed 's|$|</Data></Cell>|g')

to this (to get zeros when there is no match):

Code:

$(awk -v Z="$name" -F"[\"><]+" '/case id/{v=0}/\/case/{print "    <Cell><Data ss:Type=\"Number\">" v "</Data></Cell>"}$0 ~ Z{v=$3}' ${datafile})

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I think I'll be able to build on your code to process another file I need. After input.xml and input1.xml have
been transformed in a similar way, both results will become part of a final output.xml, as shown below:

Code:

Line1..............
Line2..............
..
.
Line30............

Output of processing input1.xml # Output from other input.xml

Output of processing input2.xml # The output your script already does

Line31............
Line32............
..
.
.
Line50............

In order to get that, could you help me with suggestions on the following? (Assume your code is routine2 and routine1 is code I have
to add to process input1.xml, and that both routines already work.)

A suggestion or idea on how to wrap your script in a loop that processes every somename_ci.xml in the folder with routine1 and every
somename_vi.xml in the folder with routine2.

The final code should look like this:

Code:

# The files come in pairs with the same name, differing only in the "_ci" / "_vi" ending,
# e.g. December_ci.xml and December_vi.xml, June_ci.xml and June_vi.xml, etc.

For i=1 to CountOfAllFilesInFolder/2 # divided by 2 since each pair of files produces one output

file1=somename_ci.xml # somename is the string that will vary
file2=somename_vi.xml # somename is the string that will vary
  do
      Code to add first 30 fixed lines #I'll try to emulate the way you use "cat" to add these fixed lines
      Code to add lines after processing somename_ci.xml #routine1, I'll add it later.
      Code to add lines after processing somename_vi.xml #routine2, your code....
      Code to add last 20 fixed lines  #I'll try to emulate the way you use "cat" to add these fixed lines
done

I hope you can give me an idea of how to write this last part of my code.
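One possible shape for that loop, as a sketch only: instead of counting files and dividing by 2, it loops over every *_ci.xml, derives the matching *_vi.xml name with parameter expansion, and skips a file that has no partner. routine1, routine2, header_lines, footer_lines and the "<name>_output.xml" file name are hypothetical placeholders for your own code.

```shell
# Sketch: process every somename_ci.xml / somename_vi.xml pair in a folder.
# routine1, routine2, header_lines, footer_lines are hypothetical placeholders.
shopt -s nullglob   # if no *_ci.xml file exists, the loop simply does not run

header_lines() { echo "Line1"; }    # placeholder: the 30 fixed lines at the top
footer_lines() { echo "Line50"; }   # placeholder: the 20 fixed lines at the bottom
routine1()     { echo "ci: $1"; }   # placeholder: code that converts somename_ci.xml
routine2()     { echo "vi: $1"; }   # placeholder: code that converts somename_vi.xml

process_pairs() {                   # usage: process_pairs DIRECTORY
    local ci base vi
    for ci in "$1"/*_ci.xml; do
        base=${ci%_ci.xml}          # "dir/December_ci.xml" -> "dir/December"
        vi=${base}_vi.xml
        if [[ ! -f $vi ]]; then     # skip a _ci file that has no _vi partner
            echo "skipping $ci: $vi not found" >&2
            continue
        fi
        {
            header_lines
            routine1 "$ci"
            routine2 "$vi"
            footer_lines
        } > "${base}_output.xml"    # one output file per pair
    done
}
```

Running process_pairs . in the folder would then write December_output.xml, June_output.xml, and so on; the braces group the four steps so their combined output goes to a single file.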

Thanks in advance.

Regards.

Last edited by cgkmal; 10-06-2011 at 06:25 PM..

cgkmal
