Extracting text between two strings, first instance only


 
# 1  
Old 11-06-2009

There are a lot of ways to extract text from between two strings, but what if those strings occur multiple times and you only want the text between the first pair? I can't seem to find anything that works here. I'm using sed to process the text after it's extracted, so I'd prefer a sed answer, but whatever works is fine with me.

It's an XML file, and the text is between string tags (hope that doesn't cause any confusion). The text may be 1 or 100 lines long and may contain whitespace, line breaks, indentation, etc. That shouldn't matter much, but the tags can sit almost anywhere relative to the actual text rather than in a clean "^tagTEXTtag$" layout. I want everything, including whitespace and blank lines, between the first open and close tags.

Code:
         <string>This is
         the text 
         that I want
          
          </string>

          <string>text I don't want</string>

         <string>more text
I don't want</string>

# 2  
Old 11-07-2009
See here or here for two similar examples using gawk. Use "exit" to stop the search after the first instance.
# 3  
Old 11-07-2009
Thanks. It took me a while to figure out where to put the "exit". I'm not sure this is correct, but it works.
Code:
awk 'BEGIN{ RS="</string>"}{gsub(/.*<string>/,"")}1{print $RS;exit}' textfile

# 4  
Old 11-07-2009
You can trim it a little:

Code:
awk 'BEGIN{ RS="</string>"}{gsub(/.*<string>/,"");print;exit}' infile
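One caveat with the one-liner above: a multi-character RS is an extension found in gawk and mawk, and POSIX leaves its behaviour unspecified. For an awk without that extension, a rough line-by-line sketch might look like the following. The function name first_block_awk and the variable names are made up for illustration; the sketch simply tracks whether it is inside the first block and exits at the first closing tag.

```shell
# Hypothetical helper: print the text between the first <string> and the
# first </string> that follows it, reading a file argument or stdin.
first_block_awk() {
    awk '
    # Not inside the block yet: look for the opening tag on this line
    # and keep only what comes after it.
    !inside && (i = index($0, "<string>")) {
        inside = 1
        $0 = substr($0, i + length("<string>"))
    }
    inside {
        # Closing tag found: print what precedes it and stop reading.
        if ((j = index($0, "</string>")) > 0) {
            print substr($0, 1, j - 1)
            exit
        }
        print
    }
    ' "$@"
}
```

It would be called as `first_block_awk textfile`. Because it uses index() and substr() rather than a greedy regex, it should also cope with an open and close tag on the same line.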

# 5  
Old 11-07-2009
Or you can use Perl:

Code:
$
$ cat -n f3
     1  <string>This is
     2  the text
     3  that I want
     4  ...
     5  </string>
     6
     7  <string>text I don't want</string>
     8
     9  <string>more text
    10  I don't want</string>
$
$ perl -ne 'BEGIN{undef $/} /<string>(.*?)<\/string>/s and print $1' f3
This is
the text
that I want
...
$
$

tyler_durden
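
The original question asked for a sed answer, which none of the replies give. A minimal sketch, assuming no second `<string>` tag appears on the same line as the first one (the greedy `.*<string>` would otherwise skip ahead to the last one on that line): select the range from the first opening tag to the first closing tag, strip the tags, and quit as soon as the closing tag has been handled. The function name first_block_sed is made up for illustration.

```shell
# Hypothetical helper: print the text between the first <string> and the
# first </string>, reading a file argument or stdin.
first_block_sed() {
    sed -n '
/<string>/,/<\/string>/{
    # Drop everything up to and including the opening tag.
    s/.*<string>//
    /<\/string>/{
        # Drop the closing tag and anything after it, print, then quit
        # so that later blocks are never read.
        s/<\/string>.*//
        p
        q
    }
    p
}' "$@"
}
```

Called as `first_block_sed textfile`, its output can be piped straight into further sed processing. Whitespace before the closing tag is kept, so a closing tag on its own indented line yields a final whitespace-only line.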