extract strings between tags


 
# 1  
Old 08-04-2009

Hi,

I have data as follows in a text file

<key='data1'>
<String>abcdef</String>
<String>abcdef1</String>
<String>abcdef2</String>
</key>

<key='data2'>
<String>abcdef</String>
<String>abcdef1</String>
<String>abcdef2</String>
<String>abcdef3</String>
</key>

Is there a way I can get only the entries between <String> and </String> inside the data1 tag?

Appreciate any help.
# 2  
Old 08-04-2009
It would be better if you also posted your expected output, but try this:

Code:
 
sed -n 's/.*<String>\(.*\)<\/String>.*/\1/p' yourfile

# 3  
Old 08-05-2009
Hi.

I don't use XMLish files, but I ran across this utility. If you have access to xml_grep, this task can be straightforward. I modified your data file to make it well-formed XML and to differentiate between data1 and data2, then ran this script:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate extracting data from an XML file with xml_grep.
# Reference for XPath: http://en.wikipedia.org/wiki/XPath_1.0
# xml_grep: http://xmltwig.com/tool/

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) xml_grep
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat "$FILE"

echo
echo " Results:"
xml_grep --text_only --cond '*[@name="data1"]/String' "$FILE"

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0 
GNU bash 3.2.39
/usr/bin/xml_grep version 0.7

 Data file data1:
<project>
<key name="data1">
<String>abcdef</String>
<String>abcdef1</String>
<String>abcdef2</String>
</key>

<key name="data2">
<String>abcdefg</String>
<String>abcdefg1</String>
<String>abcdefg2</String>
<String>abcdefg3</String>
</key>
</project>

 Results:
abcdef
abcdef1
abcdef2

The xml_grep Perl script was available in the Debian repository for me. The site URL is listed in the script above. Good luck ... cheers, drl
# 4  
Old 08-05-2009
Code:
 
sed -e 's/\(<[^<][^<]*>\)//g' file.xml
 
OR
 
sed -e 's/\(<[^<][^<]*>\)//g; /^$/d' file.xml

# 5  
Old 08-05-2009
gawk
Code:
awk 'BEGIN{RS="";FS="</String>"}
/data1/{
 for(i=1;i<=NF;i++){
    if($i ~ /String/){
        gsub(/.*String>/,"",$i)
        print $i
    }    
 } 
}' file
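
The awk program above sets RS="" (paragraph mode), so it depends on a blank line separating the data1 and data2 blocks. As a rough sketch of a variant that does not need the blank line, a flag can track whether the current line is inside the data1 block. The variable names data, inside and result are made up for this illustration, and the sample data is a shortened copy of the file from post #1.

```shell
# Sample data with no blank line between the two blocks (illustrative only).
data="<key='data1'>
<String>abcdef</String>
<String>abcdef1</String>
</key>
<key='data2'>
<String>abcdefg</String>
</key>"

# inside is 1 only between <key='data1'> and the next </key>.
# On <String> lines seen while inside, drop the tags and print the rest.
# The dots in /<key=.data1.>/ stand in for the single quotes around data1.
result=$(printf '%s\n' "$data" | awk '
/<key=.data1.>/ { inside = 1; next }
/<\/key>/       { inside = 0 }
inside && /<String>/ {
    gsub(/<\/?String>/, "")
    print
}')
printf '%s\n' "$result"
```

This assumes each <String> element sits on a single line, as in the sample data.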

# 6  
Old 08-05-2009
Thank you all for your replies.

For the sed commands I am getting this output:

C:\Perl>sed -e 's/\(<[^<][^<]*>\)//g' dump.xml
The filename, directory name, or volume label syntax is incorrect.

The output is the same for all the sed commands.

I tried the awk code and I got this error:

String found where operator expected at awk.pl line 9, near "}'"
(Might be a runaway multi-line '' string starting on line 1)
(Missing semicolon on previous line?)
syntax error at awk.pl line 9, near "}'"
Execution of awk.pl aborted due to compilation errors.

Line 9 is the last line of the script, where I gave my filename, i.e., }' dump.xml.

Not sure what is wrong. Appreciate any help.
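
A possible explanation, inferred from the C:\Perl> prompt rather than confirmed anywhere in the thread: the Windows command prompt does not treat single quotes as quoting, so the < and > inside the sed expression are read as redirections. The second error text looks like Perl's, which suggests the awk program was saved as awk.pl and run with perl instead of awk. One workaround that sidesteps shell quoting is to keep the sed commands in their own file and pass it with -f. The file name strip-tags.sed and the scratch directory below are hypothetical, and the sketch is written for a POSIX shell.

```shell
workdir=$(mktemp -d)

# Two sample lines taken from the thread's data.
printf '%s\n' '<String>abcdef</String>' '</key>' > "$workdir/dump.xml"

# Keep the sed commands in a script file so no shell ever parses < or >.
cat > "$workdir/strip-tags.sed" <<'EOF'
s/<[^<][^<]*>//g
/^$/d
EOF

result=$(sed -f "$workdir/strip-tags.sed" "$workdir/dump.xml")
printf '%s\n' "$result"
rm -rf "$workdir"
```

The same idea should carry over to awk: save the program text (without the surrounding quotes) in a file and run it with awk -f followed by the data file name.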
# 7  
Old 08-06-2009
Thanks for all the posts. I finally got it working. However, I am getting output from both the 'data1' and 'data2' tags.

My expected output is only what is inside the data1 tag, i.e.:

abcdef
abcdef1
abcdef2

Thanks in advance for any help.
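
One way to limit the output to the data1 block, offered as a sketch and not as something posted in the thread, is to combine a sed address range with a substitution. It assumes the opening <key='data1'> and closing </key> tags each sit on their own line, as in the sample data from post #1. The scratch file and the result variable are only there to make the example self-contained.

```shell
# Recreate the sample data from post #1 in a scratch file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<key='data1'>
<String>abcdef</String>
<String>abcdef1</String>
<String>abcdef2</String>
</key>

<key='data2'>
<String>abcdef</String>
<String>abcdef1</String>
<String>abcdef2</String>
<String>abcdef3</String>
</key>
EOF

# Only on lines from <key='data1'> through the next </key>,
# strip the <String> tags and print the lines where that succeeded.
result=$(sed -n "/<key='data1'>/,/<\/key>/ s/.*<String>\(.*\)<\/String>.*/\1/p" "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The double quotes around the sed expression are there only so the single quotes in key='data1' can appear literally. From the Windows command prompt the expression would need different quoting or a script file.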
