Insert text after match in XML file


 
# 1  
Old 11-08-2017

Having a little trouble getting this to work just right.

I have XML files in which I want to split some data.

Each file contains two <Name> tags.

I would like to take only the first tag and split its data.

Example tag, from this:
Code:
TAB<Name>smith, john</Name>

to
Code:
TAB<Name>smith, john</Name>
TAB<LastName>smith</LastName>
TAB<FirstName>john</FirstName>

I can get the replacement and the tab to work, but it adds the FirstName and LastName lines after both matches of "/Name>" instead of just the first match.

Any help would be greatly appreciated.
Moderator's Comments:
Mod Comment Please use CODE tags when displaying sample input, output, AND code segments.

Last edited by Don Cragun; 11-08-2017 at 09:12 PM.. Reason: Add missing CODE and ICODE tags.
# 2  
Old 11-08-2017
Please tell us what operating system and shell you're using, and show us the code you have that is replacing all occurrences of the <Name> tag data.
# 3  
Old 11-08-2017

Running Cygwin on Windows.

Bash shell.

Here is my not-so-elegant code that gets the job done.

Code:
ls *.xml > /cygdrive/x/$$tmp
while read filename ; do
	# Get the Name tag, first occurrence
	a=`less $filename|grep Name|head -n1`
	# Get only "lastname, firstname"
	b=`echo $a|cut -d"<" -f2|cut -d">" -f2`
	# Last name
	d=`echo $b|cut -d"," -f1`
	# First name
	e=`echo $b|cut -d"," -f2`
	# Remove the space before the first name
	e=$(sed -e 's/^[[:space:]]*//' <<<"$e")
	f="<LastName>$d</LastName>"
	g="<FirstName>$e</FirstName>"
	sed -bi "/tagaftername/i\    $f" $filename
	sed -bi "/tagaftername/i\    $g" $filename
done < /cygdrive/x/$$tmp
rm -f /cygdrive/x/$$tmp

I couldn't get the first-occurrence code to work, so I decided to reverse things and go up instead of down: I match the tag right after Name, which happens to be unique, and insert the new lines before it.
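For illustration, the "first match only" behaviour can be sketched with a flag in awk. This is a minimal sketch under stated assumptions, not code from the posters in this thread: the file names sample.xml and sample.new and the <Marker/> line are made up for the example.

```shell
#!/bin/bash
# Hypothetical sample file containing two <Name> tags.
printf '<Rec>\n\t<Name>smith, john</Name>\n\t<Other>x</Other>\n\t<Name>doe, jane</Name>\n</Rec>\n' > sample.xml

# Print every line. After the first line that contains </Name>,
# print one extra line and set a flag so later matches are ignored.
awk '
{ print }
!done && /<\/Name>/ {
	print "\t<Marker/>"
	done = 1
}' sample.xml > sample.new

cat sample.new
```

The extra line appears only after the first </Name>; the second <Name> line passes through unchanged because the flag is already set.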
# 4  
Old 11-08-2017
I would be tempted to try a different approach. The following invokes awk once, no matter how many XML files you have to process. This should be a lot faster than invoking ls once and then, for each file processed, invoking less, grep, and head once each, cut four times, and sed three times. Try:
Code:
#!/bin/bash
awk -F'[<>]' '
function copyback(filename) {
	if(filename == "")
		return
	for(i = 1; i <= lc; i++)
		print line[i] > filename
	close(filename)
	lc = 0
}
FNR == 1 {
	copyback(lastfile)
	print "Processing " FILENAME
	found = 0
	lastfile = FILENAME
}
found {	line[++lc] = $0
	next
}
$2 == "Name" && $4 == "/Name" {
	line[++lc] = $0
	n = split($3, names, /, */)
	line[++lc] = sprintf("\t<LastName>%s</LastName>", names[1])
	if(n >= 2)
		line[++lc] = sprintf("\t<FirstName>%s</FirstName>", names[2])
	found = 1
}
END {	copyback(lastfile)
}' *.xml

If someone else wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
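To see why the test $2 == "Name" && $4 == "/Name" picks out the tag, it may help to look at how -F'[<>]' splits a line. The following is a small sketch on a made-up sample line in the format from the first post; the variable name out and the output labels are only for illustration.

```shell
#!/bin/bash
# Split a sample line on "<" and ">" and show the fields the script relies on.
out=$(printf '\t<Name>smith, john</Name>\n' |
awk -F'[<>]' '{
	# $1 is the leading tab, $2 the opening tag name, $3 the tag data,
	# and $4 the closing tag name.
	n = split($3, names, /, */)
	printf "$2=[%s] $3=[%s] $4=[%s] last=[%s] first=[%s]\n", $2, $3, $4, names[1], names[2]
}')
echo "$out"
# prints: $2=[Name] $3=[smith, john] $4=[/Name] last=[smith] first=[john]
```

Because the comparison is an exact string match on $2 and $4, lines such as <LastName>...</LastName> do not match.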

Last edited by Don Cragun; 11-08-2017 at 11:02 PM.. Reason: Correct indentation in awk script.
# 5  
Old 11-09-2017
One issue, and kind of a big one: your code cuts everything out of the XML file before the first instance of the Name tag. So if the Name tag is at line 50, everything before that line gets deleted in the new file.

Also, is there a way to keep the line endings as CRLF, similar to sed -b?

I'll give you this: the code is very fast. 5K files took 26 seconds vs. 15 minutes for my method.
# 6  
Old 11-09-2017
Sorry about that. If you had given us a sample input file and the output that should be produced from that input, I would have caught that early lines in input files were being dropped. There is nothing in my code that would remove <carriage-return> characters from existing lines in the file, but it didn't put <carriage-return>s in the lines it adds (and if you want DOS format text files, that is something we would have expected you to state explicitly in your requirements). Does the following replacement come closer to meeting your requirements?
Code:
#!/bin/bash
awk -F'[<>]' '
function copyback(filename) {
	if(filename == "")
		return
	for(i = 1; i <= lc; i++)
		print line[i] > filename
	close(filename)
	lc = 0
}
FNR == 1 {
	copyback(lastfile)
	print "Processing " FILENAME
	found = 0
	lastfile = FILENAME
}
{	line[++lc] = $0
}
found {	next
}
$2 == "Name" && $4 == "/Name" {
	# line deleted here.
	n = split($3, names, /, */)
	line[++lc] = sprintf("\t<LastName>%s</LastName>\r", names[1])
	if(n >= 2)
		line[++lc] = sprintf("\t<FirstName>%s</FirstName>\r", names[2])
	found = 1
}
END {	copyback(lastfile)
}' *.xml

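Changes from the previous version: every line is now saved before the found test (so lines before the first Name tag are kept), the line that saved $0 in the Name rule is removed, and the added lines now end with \r.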
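If the input files might be a mix of DOS and Unix formats, one possible variation is to reuse the line ending of the matched line instead of always adding \r. This is only a sketch under assumptions: the files dos.xml and unix.xml are made-up one-line samples, awk is run once per file to keep the example short, and the awk in use is assumed to pass carriage returns through unchanged.

```shell
#!/bin/bash
# Hypothetical inputs: one with DOS (CRLF) and one with Unix (LF) line endings.
printf '\t<Name>smith, john</Name>\r\n' > dos.xml
printf '\t<Name>smith, john</Name>\n' > unix.xml

for f in dos.xml unix.xml; do
	awk -F'[<>]' '
	{ print }
	!found && $2 == "Name" && $4 == "/Name" {
		# Reuse the ending of the matched line for the lines being added.
		eol = ($0 ~ /\r$/) ? "\r" : ""
		n = split($3, names, /, */)
		printf "\t<LastName>%s</LastName>%s\n", names[1], eol
		if (n >= 2)
			printf "\t<FirstName>%s</FirstName>%s\n", names[2], eol
		found = 1
	}' "$f" > "$f.out"
done
```

With a CRLF input line, the trailing carriage return ends up in $5, so the comparison of $4 with "/Name" still succeeds.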
# 7  
Old 11-09-2017
Works great. I do see that the lines you add end in CRLF, but all other lines get changed to LF.

I've just added a unix2dos step at the very end, once processing is done; it only added 1 min for 5K files.
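Where unix2dos is not available, a similar normalization can be sketched in awk. This is an assumption-laden illustration, not the command used above: mixed.txt and crlf.txt are made-up file names, and how carriage returns are read and written may depend on the awk build and platform (Cygwin in particular).

```shell
#!/bin/bash
# Hypothetical input with one CRLF line and two LF lines.
printf 'a\r\nb\nc\n' > mixed.txt

# Strip any existing trailing carriage return, then write every line with CRLF.
awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' mixed.txt > crlf.txt
```

Stripping the carriage return first keeps lines that already end in CRLF from ending up with two carriage returns.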

To recap, converting 5,000 XML files:

My code: 15 min
Your code: 1 min
unix2dos: 1 min

I really appreciate your assistance.
Thank you.