Extract strings from XML files and create a new XML


 
# 1  
Old 06-09-2015

Hello everybody,

I have a two-part task involving some XML files, which is pretty challenging for my current beginner-level UNIX knowledge. I need to extract some strings from multiple XML files and create a new XML file containing the lines that match the searched strings.

The original XML files contain the source code for creating PDF files. Here is an abstract example; I explain the challenge after it.

Code:
<Header>My favorite restaurant</Header>
   <breakfast_menu>
      <food>
         <name>Belgian Waffles</name>
         <price>$5.95</price>
         <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
         <calories>650</calories>
       </food>
       <food>
         <name>Strawberry Belgian Waffles</name>
         <price>$7.95</price>
         <description>Light Belgian waffles covered with strawberries and whipped cream</description>
         <calories>900</calories>
       </food>
       <food>
         <name>Berry-Berry American Pie</name>
         <price>$8.95</price>
         <description>Light American Pie covered with an assortment of fresh berries and whipped cream</description>
         <calories>900</calories>
       </food>
       <food>
          <name>French Toast</name>
          <price>$4.50</price>
          <description>Thick slices made from our homemade sourdough bread</description>
          <calories>600</calories></food><food><name>Homestyle Breakfast</name>
          <price>$6.95</price>
          <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
          <calories>950</calories>
          </food>
   </breakfast_menu>
<Footer>My favorite restaurant</Footer>

So, the UNIX script should extract the Header, the entire rows that contain 'Belgian' and 'American' and the Footer and put them in a new XML file. The list with the searched strings is provided through a separate Input file. I hope I managed to create a clear requirement. Please let me know if any extra information is needed.

Thank you very much,
Milano
Moderator's Comments:
Thank you for providing a much better input sample. But, please do not erase CODE tags that I have entered for you twice now!

Last edited by Don Cragun; 06-11-2015 at 02:07 PM. Reason: Add CODE and ICODE tags, again.
# 2  
Old 06-09-2015
Hello and welcome to the forum, milano.churchil
  1. This is not valid XML.
  2. Please use code tags, as you agreed to when you accepted the forum rules.
  3. What have you tried so far?
Have a nice day.
# 3  
Old 06-09-2015
Is this a homework assignment?

Homework must be posted in the homework & coursework questions forum and must include a fully filled out questionnaire from the homework template.
# 4  
Old 06-11-2015
Hello! This is not homework; it is something that I need for work. Please let me know if it is necessary to change the topic or add more information. Thank you!

Milano

---------- Post updated at 04:59 AM ---------- Previous update was at 04:56 AM ----------

So far I have tried the 'csplit' command, but it doesn't work for what I need, because there are multiple strings to be found and extracted into a new XML file.
# 5  
Old 06-11-2015
Quote:
So, the UNIX script should extract the Header, the entire rows that contain 'Belgian' and 'American' and the Footer and put them in a new XML file. The list with the searched strings is provided through a separate Input file. I hope I managed to create a clear requirement. Please let me know if any extra information is needed.
What is the pathname of the "separate Input file"?
What is the format of the "separate Input file"?
What is the pathname of your "original XML file"?
What pathnames do you want to be given to the output file (or files) that are to be created?
Show us a sample "separate Input file".
Show us the exact output file (or files) you want to create with the updated XML file you have provided in post #1 in this thread and the separate Input file that you will provide.

And, PLEASE, use CODE tags when displaying all sample input files, all sample output files, and all sample code segments!
# 6  
Old 06-12-2015
Hello,

1. The pathname of the input file is C:/temp/input.txt
2. The format of the input file is .txt
3. The pathname of the XML file is C:/temp/output.txt
4. The pathname of the output file is C:/temp/output.xml

Input file input.txt:
Code:
'Belgian'
'American'

Output file output.xml:
Code:
<Header>My favorite restaurant</Header>
         <name>Belgian Waffles</name>
         <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
         <name>Strawberry Belgian Waffles</name>
         <description>Light Belgian waffles covered with strawberries and whipped cream</description>
         <name>Berry-Berry American Pie</name>
         <description>Light American Pie covered with an assortment of fresh berries and whipped cream</description>
<Footer>My favorite restaurant</Footer>


I hope this is better now! Thank you again!

Milano
# 7  
Old 06-12-2015
Better, but still a bit vague. For EXACTLY your setup, this might work:
Code:
grep -iE "$(tr -d "'" <C:/temp/input.txt | tr '\n' '|')header|footer" C:/temp/output.txt
<Header>My favorite restaurant</Header>
         <name>Belgian Waffles</name>
         <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
         <name>Strawberry Belgian Waffles</name>
         <description>Light Belgian waffles covered with strawberries and whipped cream</description>
         <name>Berry-Berry American Pie</name>
         <description>Light American Pie covered with an assortment of fresh berries and whipped cream</description>
<Footer>My favorite restaurant</Footer>

Redirect to C:/temp/output.xml if happy.
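
For anyone adapting the one-liner above, here is a minimal sketch of the same idea split into two steps: build an alternation pattern from the keyword file, then filter the XML file line by line. The function names build_pattern and extract_lines are hypothetical, and the handling of carriage returns and blank lines is an assumption based on the Windows-style paths in this thread, not something the original poster confirmed. Keywords are still treated as regular expressions, as in the original command.

```shell
#!/bin/sh
# Sketch only: build_pattern and extract_lines are illustrative helper names.

# Turn a keyword file (one quoted word per line) into "word1|word2|header|footer".
build_pattern() {
    # tr -d strips single quotes and carriage returns (CRLF input is an assumption);
    # grep . drops empty lines, so no empty alternative ends up in the pattern
    # (an empty alternative would match every line of the XML file).
    words=$(tr -d "'\r" < "$1" | grep . | tr '\n' '|')
    printf '%sheader|footer' "$words"
}

# Print every line of the XML file ($2) that matches a keyword from the
# keyword file ($1), plus the Header and Footer lines, ignoring case.
extract_lines() {
    grep -iE "$(build_pattern "$1")" "$2"
}

# Usage with the pathnames given in post #6:
# extract_lines C:/temp/input.txt C:/temp/output.txt > C:/temp/output.xml
```

Like the original command, this is purely line-based: it does not parse XML, so it only works when each element of interest sits on its own line.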