Finding specific string in file and storing in another file


 
# 1  
Old 01-29-2019

The text in the input file looks like this:

Code:
<title>
	<band height="21"  isSplitAllowed="true" >
	<staticText>
	<reportElement
				x="1"
				y="1"
				width="313"
				height="20"
				key="staticText-1"/>
	    		<box></box>
				<textElement>
				<font fontName="Arial" pdfFontName="Helvetica-Bold" size="14" isBold="true" isUnderline="true"/>
				</textElement>
		    	<text><![CDATA[**4) Computation of Tier I and Tier II Capital :]**]></text>
				</staticText>
			</band>
		</title>

Output file should have:
4) Computation of Tier I and Tier II Capital :

The file has many <title> and CDATA tags, but I want to copy only the text that sits inside a CDATA section under a <title> tag and save it to another file.
# 2  
Old 01-29-2019
Code:
sed -nr '/\<title/,/\/title/ H; /\/title/{x; s/.*CDATA[^ ]+\s+([^:]+:).*/\1/p}' file >newfile
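The one-liner above relies on GNU sed extensions (-r and \s). As a rough alternative for other systems, here is a minimal awk sketch. It assumes that <title> and </title> each appear on their own line and that every CDATA section starts and ends on a single line, as in the sample from post #1. The function name extract_title_cdata and the sample data are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the CDATA payload found on any line
# between a <title> line and the following </title> line.
# Assumes each CDATA section starts and ends on the same line.
extract_title_cdata() {
    awk '
        /<title>/   { in_title = 1 }
        in_title && match($0, /<!\[CDATA\[.*\]\]>/) {
            # skip the 9-character "<![CDATA[" prefix and the 3-character "]]>" suffix
            print substr($0, RSTART + 9, RLENGTH - 12)
        }
        /<\/title>/ { in_title = 0 }
    ' "$1"
}

# Small made-up sample for a dry run
sample=$(mktemp)
cat > "$sample" <<'EOF'
<title>
	<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
</title>
<detail>
	<text><![CDATA[not wanted]]></text>
</detail>
EOF

extract_title_cdata "$sample"
```

To store the result as asked in post #1, redirect the output, for example extract_title_cdata file >newfile. Like any line-based approach, this stops working as soon as the layout of the XML changes.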


Last edited by nezabudka; 01-29-2019 at 06:36 AM..
# 3  
Old 01-29-2019
This looks like XML, but the syntax shown here is incorrect: the CDATA section seems to be malformed. (Was that just added manually to point out the data you want?)

Syntax used here: [CDATA[**sometext]**]
Normal CDATA syntax: [CDATA[some text]]

You can fix that by writing an intermediate file before using an XML parser, like this:

Code:
sed -e 's/\[CDATA\[\*\*/[CDATA[/' -e 's/\]\*\*\]/]]/' data.xml >data.tmp.xml

With the syntax fixed, you can extract the wanted data as follows:

Code:
xmllint --nocdata --xpath  "//title/band/staticText/text/text()" data.tmp.xml

or, as you likely want each result on a separate line:
Code:
xmllint --nocdata --shell  <<<'cat //title/band/staticText/text/text()' data.tmp.xml \
     | grep -vE '^(/ > ?)?( +-+)?$'

Note
Parsing XML files with sed/awk gives up the advantages of a robust, clearly specified text format and invites errors on any simple change in whitespace or ordering (reindented or compressed output, for example). Such changes are to be expected at any time, given the nature of the file format.
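To make the point concrete, here is a small sketch with made-up data: a line-based sed extractor (the name cdata_by_line is hypothetical) finds the text when the element sits on one line, and finds nothing once the CDATA section is wrapped onto a second line, although an XML parser would still return the text in both cases.

```shell
#!/bin/sh
# Hypothetical line-based extractor: prints the payload of a CDATA section
# only when "<![CDATA[" and "]]>" appear on the same input line.
cdata_by_line() {
    sed -n 's/.*<!\[CDATA\[\(.*\)]]>.*/\1/p'
}

# Layout 1: the whole element on one line
layout_one='<text><![CDATA[Heading A]]></text>'

# Layout 2: the CDATA section wrapped onto a second line
layout_two='<text><![CDATA[Heading
A]]></text>'

printf '%s\n' "$layout_one" | cdata_by_line   # finds the heading
printf '%s\n' "$layout_two" | cdata_by_line   # finds nothing
```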

--- Post updated at 12:34 PM ---

Hmmmm.... xmlstarlet is more convenient than xmllint:
Code:
xmlstarlet sel -t -v "//title/band/staticText/text" data.tmp.xml


Last edited by stomp; 01-29-2019 at 08:42 AM..
# 4  
Old 01-29-2019
The CDATA syntax is correct in the file, like this: <![CDATA[4) Computation of Tier I and Tier II Capital :]]>
# 5  
Old 01-29-2019
xmllint finally got newlines to separate node sets 4 months ago (on Linux / libxml2).

Commit da35eeae, "Add newlines to 'xmllint --xpath' output" (GNOME / libxml2 on GitLab)

It may take some years until this has propagated to the Linux distributions.