The UNIX and Linux Forums  
#1
05-28-2007
spindoctor
Registered User
Join Date: May 2007
Posts: 31
Copying Text between two unique text patterns

Dear Colleagues:
I have .rtf files containing a collection of newspaper articles. Each article starts with a variation of the phrase "Document * of 20" and is separated from the next article by the character string "===================".

I would like to take the text of each news article from between these two patterns and write it to a separate, uniquely named file. I've been playing around with sed, grep, cut and csplit, but nothing seems to work. I have developed regular expressions that capture the two lines "Document * of 20" and "===================" independently, but I can't figure out how to capture and work with the text between the two lines. I hope you can help.
Yours,
Simon J. Kiss
Queen's University
#2
05-28-2007
ennstate
Registered User
Join Date: Mar 2007
Location: Chennai
Posts: 222
Hi Simon,
There may well be a smarter solution, but I used the following approach to solve this problem.

Assume the file /tmp/MyNewArticleFile.rtf has the following contents:

cat /tmp/MyNewArticleFile.rtf
HTML Code:
Times of India
Edition-1
Date:27 th May

Document 1 of 20

All blah blah goes here
Ad Page
Blah

================================

Document 2 of 20

All blah blah goes here
Ad Page
Blah

================================

Document 3 of 20

All blah blah goes here
Ad Page
Blah

================================
Document 4 of 20

All blah blah goes here
Ad Page
Blah

================================
End of the Edition
Thanks
Editor
I have written the following script, which processes the above file to generate the output.
It assumes the file contains 20 documents.
Code:
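#!/bin/ksh
# Write each article (header line through separator) to its own file.
page=1
while (( page <= 20 )); do
    # Anchor on "Document N of" so that page 1 does not also match 10-19.
    sed -n "/^Document $page of /,/^=====/p" /tmp/MyNewArticleFile.rtf > "/tmp/ArticleSplitPage-$page"
    ((page=page+1))
done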
When I run the above script, I get 20 files, one per document number.

cat /tmp/ArticleSplitPage-1
HTML Code:
Document 1 of 20

All blah blah goes here
Ad Page
Blah

================================
Thanks,
Nagarajan Ganesan.
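If the number of documents is not known in advance, the same idea can be done in a single pass with awk. This is only a sketch: the function name split_articles and the article-N.txt file names are made up for illustration, and it assumes the input is plain text laid out like the sample above, with separator lines that start with "====".

```shell
#!/bin/sh
# Sketch only: split_articles and the "article-N.txt" naming are
# hypothetical, not something from the posts above.
# Usage: split_articles INPUT_FILE [OUTPUT_DIR]
split_articles() {
    infile=$1
    outdir=${2:-.}
    awk -v dir="$outdir" '
        /^Document .* of [0-9]+/ {      # first line of an article
            n++
            out = dir "/article-" n ".txt"
            writing = 1
        }
        /^====/ {                       # separator line ends the article
            if (writing) close(out)
            writing = 0
            next                        # do not copy the separator itself
        }
        writing { print > out }         # copy lines while inside an article
    ' "$infile"
}
```

Calling split_articles /tmp/MyNewArticleFile.rtf /tmp would then write /tmp/article-1.txt, /tmp/article-2.txt and so on, skipping anything before the first "Document" line and after the last separator.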
#3
05-28-2007
drl (Forum Advisor)
Registered User
Join Date: Apr 2007
Location: Saint Paul, MN USA / BSD, CentOS, Debian, OS X, Solaris
Posts: 711
Hi.

For the sample data file "data1":
Code:
Document * of 20
Hello

=====
Document one of 20

World

=====
Document 44 of 20

Now is

=====
Document "Climatology Review" in of 20

with no Documents at the beginning of the time.

=====
I ran this script:
Code:
#!/bin/sh

# @(#) s1       Demonstrate csplit.

F=${1-data1}

# Start a new piece at each matching line (-z and {*} are GNU csplit extensions).
csplit -k -s -z "$F" "/^Document.*of/" '{*}'

echo
for file in xx*
do
        echo
        echo "File: $file"
        head -n 3 "$file" |
        cat -n
done

exit 0
To produce this:
Code:
% ./s1


File: xx00
     1  Document * of 20
     2  Hello
     3

File: xx01
     1  Document one of 20
     2
     3  World

File: xx02
     1  Document 44 of 20
     2
     3  Now is

File: xx03
     1  Document "Climatology Review" in of 20
     2
     3  with no Documents at the beginning of the time.
This assumes that the lines "=====" are visual sugar ... cheers, drl
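If the "=====" lines should not end up in the pieces, and the files need a more descriptive name than xx00, something along these lines might do. It is a sketch under assumptions: split_and_trim and the article_ prefix are hypothetical names, and it relies on GNU csplit just like the script above (-z and {*} are GNU extensions; -f sets the output prefix).

```shell
#!/bin/sh
# Sketch only: split_and_trim and the "article_" prefix are hypothetical.
# Assumes GNU csplit, since -z and '{*}' are GNU extensions.
# Usage: split_and_trim INPUT_FILE [OUTPUT_PREFIX]
split_and_trim() {
    infile=$1
    prefix=${2:-article_}
    # One piece per "Document ... of" line, named PREFIX00, PREFIX01, ...
    csplit -k -s -z -f "$prefix" "$infile" '/^Document.*of/' '{*}' || return 1
    for piece in "$prefix"*
    do
        # Drop lines that consist only of "=" characters.
        sed '/^==*$/d' "$piece" > "$piece.tmp" && mv "$piece.tmp" "$piece"
    done
}
```

Any text before the first "Document" line lands in the first piece, so the numbering of the articles depends on whether the file has such a header.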
#4
01-16-2009
praveen21
Registered User
Join Date: Jan 2009
Posts: 1
How to grep the text between patterns

Hi,
I have a small problem, but I just couldn't find the solution.

I have a file with multiple entries such as:

<id>QIIC.QA</id>
<id>.AEX</id>
<id>QIIC</id>
..
I want the output to be:
QIIC.QA
.AEX
QIIC


I also want to check which values are repeated, and how many times.
Please help.
Thanks.
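One possible way to approach this, offered as a sketch rather than a tested answer for the real file: strip the tags with sed, then count duplicates with sort and uniq -c. The function name count_ids is made up, and the sketch assumes at most one <id>...</id> pair per line, as in the sample.

```shell
#!/bin/sh
# Sketch only: count_ids is a hypothetical helper name.
# Assumes at most one <id>...</id> pair per input line.
# Usage: count_ids INPUT_FILE
count_ids() {
    # Print only the text between <id> and </id>, then count each
    # distinct value and list the most frequent first.
    sed -n 's:.*<id>\(.*\)</id>.*:\1:p' "$1" | sort | uniq -c | sort -rn
}
```

Each output line is a count followed by the value, so any line with a count above 1 is a repeated entry. Leaving off everything after the sed command gives just the bare list of values.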
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.