extract strings between tags


 
# 8  
Old 08-06-2009
Quote:
Originally Posted by userscript
...
I finally got it working. However, I am getting output from both the 'data1' and 'data2' tags.

My expected output is just from the 'data1' tag, i.e.:

abcdef
abcdef1
abcdef2

...
Here's one way to do it using Perl:

Code:
$
$ cat file1
<key='data1'>
<String>abcdef</String>
<String>abcdef1</String>
<String>abcdef2</String>
</key>
<key='data2'>
<String>ABCDEF</String>
<String>ABCDEF1</String>
<String>ABCDEF2</String>
<String>ABCDEF3</String>
</key>
$
$ perl -ne 'print $1,"\n" if $_ =~ m/data1/i...m/\/key/i and />(.*)</' file1
abcdef
abcdef1
abcdef2
$
$

tyler_durden
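
For readers unfamiliar with Perl's range (flip-flop) operator, the selection logic in the one-liner above can be spelled out with an explicit flag. The following is a minimal awk sketch, not part of the original reply. It assumes the same layout as file1, with each <String> element on its own line; the function name extract_data1 and the flag name inside are made up for illustration. Unlike the Perl version, which uses the /i modifier, this match is case-sensitive.

```shell
# Sketch only: extract_data1 and "inside" are hypothetical names.
extract_data1() {
  awk '
    /data1/   { inside = 1 }            # opening tag of the wanted block
    inside && match($0, />.*</) {       # text sits between a ">" and a "<"
      print substr($0, RSTART + 1, RLENGTH - 2)
    }
    /<\/key>/ { inside = 0 }            # closing tag ends the block
  ' "$@"
}

# Feed it the sample data from the post on standard input.
extract_data1 <<'EOF'
<key='data1'>
<String>abcdef</String>
<String>abcdef1</String>
<String>abcdef2</String>
</key>
<key='data2'>
<String>ABCDEF</String>
<String>ABCDEF1</String>
</key>
EOF
```

With that input, only the three data1 values should be printed, because the flag is cleared at the first </key> and the data2 opening tag never sets it again.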
# 9  
Old 08-06-2009
I tried the following code; however, it doesn't seem to work:


open (FILE, "/path/dump.xml") || die ("Can't open dump.xml\n");
while (<FILE>)
{
#$sentence=~/data1/ - option1
$sentence="<key='data1'>"
#$sentence = "<key='data1'>" - option1
if($sentence eq "<key='data1'>")
{
sed -e 's/\(<[^<][^<]*>\)//g; /^$/d' dump.xml
}
else
{
print 'no match';
}
}

I am getting the following error, but I am not able to figure out where the mistake is:

syntax error at sed.pl line 9, near ")
{"
Execution of sed.pl aborted due to compilation errors.

Appreciate any help.
# 10  
Old 08-06-2009
Quote:
Originally Posted by userscript
I tried the following code; however, it doesn't seem to work:

...
...
if($sentence eq "<key='data1'>")
{
sed -e 's/\(<[^<][^<]*>\)//g; /^$/d' dump.xml
}
else
...
...

I am getting the following error, but I am not able to figure out where the mistake is:

syntax error at sed.pl line 9, near ")
{"
Execution of sed.pl aborted due to compilation errors.

...
Well, you have put a sed command in what looks like a Perl script.

It's like adding a COBOL statement in a Java program.
Or a Visual Basic statement in a C program.

What do you think would happen?

tyler_durden
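
Since sed is a command run by the shell, the if/else from the quoted script can live in a shell script instead of a Perl one. Below is a rough sketch of one possible reading of what the script was meant to do: check whether the file contains the data1 opening tag, and only then run the sed command. The function name strip_tags_if_data1 and the temporary sample file are assumptions added for illustration. Note that the sed command itself is unchanged, so it still strips the tags from every block in the file, not just data1.

```shell
#!/bin/sh
# Sketch only: strip_tags_if_data1 is a hypothetical helper name.
strip_tags_if_data1() {
  # grep -q exits 0 if the literal opening tag occurs anywhere in the file
  if grep -qF "<key='data1'>" "$1"; then
    # the sed command from the quoted script, now run by the shell
    sed -e 's/\(<[^<][^<]*>\)//g; /^$/d' "$1"
  else
    echo 'no match'
  fi
}

# Try it on a temporary copy of the sample data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<key='data1'>
<String>abcdef</String>
<String>abcdef1</String>
<String>abcdef2</String>
</key>
<key='data2'>
<String>ABCDEF</String>
<String>ABCDEF1</String>
<String>ABCDEF2</String>
<String>ABCDEF3</String>
</key>
EOF
strip_tags_if_data1 "$sample"   # prints the values from both blocks
rm -f "$sample"
```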
# 11  
Old 08-07-2009
You said you finally got it working.
What was wrong and what did you fix?
Can you give details?

Now try this:
Code:
sed '1,/<\/key>/! d; s/\(<[^<][^<]*>\)//g; /^$/d;' file.xml
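
The address 1,/<\/key>/ keeps everything from line 1 up to the first closing tag, so the command above depends on the data1 block coming first in the file. If that cannot be relied on, a possible variant is to start the range at the opening tag itself. This is a sketch under that assumption, not something from the thread: extract_key is a hypothetical function name, and the key name is interpolated into the sed pattern as-is, so it should only contain plain characters.

```shell
# Sketch only: extract_key KEY [FILE...] is a hypothetical helper.
extract_key() {
  key=$1
  shift
  # -n suppresses default output; inside the range, strip the tags,
  # drop lines that end up empty, and print what is left.
  sed -n "/<key='$key'>/,/<\/key>/{ s/<[^<][^<]*>//g; /^\$/d; p; }" "$@"
}

# The data1 block is deliberately placed second here.
extract_key data1 <<'EOF'
<key='data2'>
<String>ABCDEF</String>
</key>
<key='data1'>
<String>abcdef</String>
<String>abcdef1</String>
</key>
EOF
```

Calling extract_key data2 on the same input would select the other block instead.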
