Extract strings from XML files and create a new XML

06-16-2015

Hello RudiC,

Thank you for your reply! It doesn't work for me. I assume that the grep command you gave me is missing the XML file from which the information should be extracted.

Milano

milano.churchil

06-16-2015

You assume incorrectly. The code RudiC provided does exactly what you asked for, given the filenames you provided. But, of course, we're making assumptions about the utilities you have installed on your system, the shell you're using, and the operating system you're using.

What operating system are you using?
What version of UNIX/Linux utilities are you using?
What shell are you using?
What output did RudiC's code produce on your system?
Are you sure that the filenames you provided contain data in the same format as your sample data? (For instance, does C:/temp/input.txt contain <carriage-return><newline> line terminators instead of the <newline> line terminators expected by UNIX and Linux system utilities?)
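The line-terminator question can be checked directly. The following is a minimal sketch, assuming a POSIX shell and a hypothetical pattern file created in a scratch directory (it is not the poster's real C:/temp/input.txt). It counts the lines that end in a carriage return and writes a cleaned copy. With CRLF terminators, the carriage return would end up inside every alternative of the pattern built by tr, so none of the strings could match an XML file that uses plain newline terminators.

```shell
# Scratch directory with a hypothetical pattern file saved with CRLF line endings.
work=$(mktemp -d)
printf "'Belgian Waffles'\r\n'American Pie'\r\n" > "$work/input.txt"

# A literal carriage return, used to build a "CR at end of line" pattern.
cr=$(printf '\r')

# Count the lines that end in a carriage return.
crlf_lines=$(grep -c "$cr\$" "$work/input.txt")
echo "lines ending in CR: $crlf_lines"

# Write a copy with every carriage return removed, then count again.
tr -d '\r' < "$work/input.txt" > "$work/input.unix.txt"
clean_lines=$(grep -c "$cr\$" "$work/input.unix.txt")
echo "lines ending in CR after cleanup: $clean_lines"
```

If the first count is not zero, running the original grep against the cleaned copy instead of the original pattern file is worth a try.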

Don Cragun

06-19-2015

Quote:

Originally Posted by RudiC

Better, but still a bit vague. For EXACTLY your setup, this might work:

Code:

grep -iE "$(tr -d "'" <C:/temp/input.txt | tr '\n' '|')header|footer" C:/temp/output.txt
<Header>My favorite restaurant</Header>
         <name>Belgian Waffles</name>
         <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
         <name>Strawberry Belgian Waffles</name>
         <description>Light Belgian waffles covered with strawberries and whipped cream</description>
         <name>Berry-Berry American Pie</name>
         <description>Light American Pie covered with an assortment of fresh berries and whipped cream</description>
<Footer>My favorite restaurant</Footer>

Redirect to C:/temp/output.xml if happy.

Yes, this was my mistake. It works well for my example. I will test it on larger files too. Now I am trying to delete the extracted rows from the original file. I hope I will manage it.
Thank you!

I tried this to remove the lines that were extracted from the XML file, but I got this error

Code:

unexpected end of file

when I try to run the script.

Code:

sed -i 'iE "$(tr -d "'" <C:/temp/input.txt | tr '\n' '|') " C:/temp/output.txt'./infile

I put the command just after the grep command that extracts the lines I am interested in. Any advice?

Many thanks,
Milano

milano.churchil

06-19-2015

You can't do that: replace one command (grep) with another (sed), keep an identical parameter set, and hope that it works.
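As for the "unexpected end of file" message itself, it appears to come from the shell rather than from sed. A single-quoted string cannot contain a single quote, so the quote inside "'" ends the string that starts after -i, and the final quote before ./infile is left without a partner. The sketch below, which assumes bash is installed, stores the failing line verbatim in a scratch file and lets bash -n parse it without executing anything.

```shell
# Scratch directory; the quoted 'EOF' delimiter keeps the shell from
# interpreting the failing line while it is written to the file.
work=$(mktemp -d)
cat > "$work/broken.sh" <<'EOF'
sed -i 'iE "$(tr -d "'" <C:/temp/input.txt | tr '\n' '|') " C:/temp/output.txt'./infile
EOF

# bash -n only checks the syntax; no command in broken.sh is run.
if bash -n "$work/broken.sh" 2> "$work/parse.err"; then
    status=ok
else
    status=syntax-error
fi
echo "$status"
cat "$work/parse.err"
```

The exact wording of the diagnostic depends on the shell and its version, so only the exit status is checked here.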
To remove those selected lines from the original file, redirect the grep result to a temp file and try

Code:

grep -vfTMP C:/temp/output.txt
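Spelled out as two steps, the suggestion might look like the sketch below. The scratch directory, the sample menu and the file names menu.xml, input.txt and rest.xml are hypothetical stand-ins for the files in this thread; TMP is the temp file named above.

```shell
# Scratch directory with hypothetical sample data (not the poster's real files).
work=$(mktemp -d)
cat > "$work/menu.xml" <<'EOF'
<Header>My favorite restaurant</Header>
         <name>Belgian Waffles</name>
         <name>French Toast</name>
<Footer>My favorite restaurant</Footer>
EOF
printf '%s\n' "'Belgian Waffles'" > "$work/input.txt"

# Step 1: save the lines selected by the original grep in a temp file.
grep -iE "$(tr -d "'" <"$work/input.txt" | tr '\n' '|')header|footer" \
    "$work/menu.xml" > "$work/TMP"

# Step 2: print every line of the XML that does not match a saved line.
grep -v -f "$work/TMP" "$work/menu.xml" > "$work/rest.xml"
cat "$work/rest.xml"
```

With this sample data only the French Toast line is left in rest.xml, because the header, the footer and the Belgian Waffles line were saved in TMP by step 1.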

RudiC

06-19-2015

The obvious simple thing to do (to extract all of the lines that the 1st grep did NOT extract) would be to just rerun that script adding a -v option:

Code:

grep -viE "$(tr -d "'" <C:/temp/input.txt | tr '\n' '|')header|footer" C:/temp/output.txt

Or, you could use RudiC's suggestion (but I would add a -F in case some of the strings extracted by the first grep contain characters that are special in a BRE):

Code:

grep -vFfTMP C:/temp/output.txt
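A small sketch of why -F can matter, using a made-up menu line that contains square brackets (the data and the scratch directory are assumptions for illustration). Read as a BRE, [new] is a bracket expression that matches a single character, so the saved line no longer matches itself and is not removed.

```shell
# Scratch directory with two hypothetical lines; the first contains [ and ].
work=$(mktemp -d)
printf '%s\n' '<name>Waffles [new]</name>' '<name>French Toast</name>' > "$work/menu.xml"

# TMP stands for the file that holds the already extracted line.
printf '%s\n' '<name>Waffles [new]</name>' > "$work/TMP"

# Count the lines that survive the removal, without and with -F.
bre_count=$(grep -c -v -f "$work/TMP" "$work/menu.xml")
fixed_count=$(grep -c -v -F -f "$work/TMP" "$work/menu.xml")
echo "without -F: $bre_count line(s) left; with -F: $fixed_count line(s) left"
```

This prints "without -F: 2 line(s) left; with -F: 1 line(s) left". Note that -F still matches substrings; if one saved line could be contained in a longer line that should be kept, adding -x restricts the match to whole lines.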

If you are going to be running this script regularly, I would seriously consider rewriting it to use awk instead of grep. If you use awk you could produce both output files in one pass without needing to run grep twice and without needing to read your input XML file twice:

Code:

awk -v ERE="$(tr -d "'" <C:/temp/input.txt | tr '\n' '|')header|footer" '
BEGIN {ERE = tolower(ERE)}
      {print > ((tolower($0) ~ ERE) ? "matched.xml" : "unmatched.xml")}
' C:/temp/output.txt

Change matched.xml and unmatched.xml to the pathnames of the files you want to contain the matched lines and the unmatched lines, respectively. I assume that you already know that neither of those output files can be the input file for this awk script!
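If the goal is to end up with the original XML file minus the matched lines, one way to respect that restriction is to send the unmatched lines to a scratch file and move it over the input only after awk has finished. This is a sketch under assumptions: the scratch directory, the sample data and the names menu.xml and unmatched.tmp are hypothetical, and the output pathnames are passed in with -v, which would mangle pathnames that contain backslashes.

```shell
# Scratch directory with hypothetical sample data (not the poster's real files).
work=$(mktemp -d)
cat > "$work/menu.xml" <<'EOF'
<Header>My favorite restaurant</Header>
         <name>Belgian Waffles</name>
         <name>French Toast</name>
<Footer>My favorite restaurant</Footer>
EOF
printf '%s\n' "'Belgian Waffles'" > "$work/input.txt"

# Pre-create the scratch output so the mv below works even if every line matches.
: > "$work/unmatched.tmp"

# Same one-pass split as above, but the unmatched lines go to a scratch file
# that replaces the input only after awk has exited successfully.
awk -v ERE="$(tr -d "'" <"$work/input.txt" | tr '\n' '|')header|footer" \
    -v matched="$work/matched.xml" -v unmatched="$work/unmatched.tmp" '
BEGIN {ERE = tolower(ERE)}
      {print > ((tolower($0) ~ ERE) ? matched : unmatched)}
' "$work/menu.xml" && mv "$work/unmatched.tmp" "$work/menu.xml"

cat "$work/menu.xml"
```

With this sample data, menu.xml is left with only the French Toast line, and matched.xml holds the header, the Belgian Waffles line and the footer.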

Don Cragun

06-22-2015

Thank you very much! It works very well. The advice about using awk is great, too.

Many thanks,
Milano

milano.churchil
