Copying Text between two unique text patterns
Dear Colleagues:
I have .rtf files containing a collection of newspaper articles. Each article starts with a variation of the phrase "Document * of 20" and is separated from the next article by the character string "===================".

I would like to take the text of each article from between these two patterns and write it to a separate, uniquely named file. I've been playing around with sed, grep, cut and csplit, but nothing seems to be working. I have regular expressions that match the two lines "Document * of 20" and "===================" independently, but I can't figure out how to capture and work with the text between them.

I hope you can help.

Yours,
Simon J. Kiss
Queen's University
Hi Simon,
There may be a smarter solution, but I used the following approach to solve this problem. Assume the file /tmp/MyNewArticleFile.rtf has these contents:

cat /tmp/MyNewArticleFile.rtf

Code:
    Times of India
    Edition-1
    Date:27 th May
    Document 1 of 20
    All blah blah goes here
    Ad Page
    Blah
    ================================
    Document 2 of 20
    All blah blah goes here
    Ad Page
    Blah
    ================================
    Document 3 of 20
    All blah blah goes here
    Ad Page
    Blah
    ================================
    Document 4 of 20
    All blah blah goes here
    Ad Page
    Blah
    ================================
    End of the Edition
    Thanks
    Editor

The assumption here is that the document has 20 pages, marked "Document 1 of 20" through "Document 20 of 20".

Code:
    #!/bin/ksh
    # Write each page, from its "Document N of 20" line through the next
    # line of "=" characters, to its own file.
    page=1
    while [[ $page -le 20 ]] ; do
        # Match the whole marker so that page 1 does not also match "Document 10", "Document 11", ...
        sed -n "/Document $page of 20/,/==========*/p" /tmp/MyNewArticleFile.rtf > /tmp/ArticleSplitPage-$page
        ((page=page+1))
    done

cat /tmp/ArticleSplitPage-1

Code:
    Document 1 of 20
    All blah blah goes here
    Ad Page
    Blah
    ================================

Nagarajan Ganesan.
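If the number of articles is not known in advance, a single awk pass can do the same split without a fixed loop count. This is only a sketch under some assumptions: the function name split_articles and the output prefix are made up for illustration, the input is plain text, and each "Document N of M" marker and each "=====" separator starts at the beginning of its own line.

```shell
#!/bin/sh
# split_articles INPUT OUTPREFIX   (hypothetical helper name)
# Copies each article, from its "Document N of M" line through the
# closing line of "=" characters, into OUTPREFIX-N.
split_articles() {
    awk -v prefix="$2" '
        /^Document [0-9]+ of [0-9]+/ {       # marker line: start a new article
            if (out != "") close(out)
            out = prefix "-" $2              # $2 is the article number
        }
        out != "" { print > out }            # inside an article: copy the line
        /^=====+/ && out != "" {             # separator line: article is finished
            close(out)
            out = ""
        }
    ' "$1"
}
```

Calling split_articles /tmp/MyNewArticleFile.rtf /tmp/ArticleSplitPage would create one file per marker found. Lines outside any article, such as the edition header and footer, are skipped.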
Hi
I have a very small problem, but I just couldn't find the solution. I have a file with multiple entries such as:

<id>QIIC.QA</id>
<id>.AEX</id>
<id>QIIC</id>
..

I want the output to be:

QIIC.QA
.AEX
QIIC

I then want to check which values are repeated, and how many times. Please help. Thanks.
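One possible approach, sketched here under the assumption that each line of the file holds at most one <id>...</id> entry: strip the tags with sed, then let sort and uniq -c count the repeats. The function name count_ids is made up for illustration.

```shell
#!/bin/sh
# count_ids FILE   (hypothetical helper name)
# Prints each distinct <id> value with the number of times it occurs,
# most frequent first. Assumes at most one <id>...</id> per line.
count_ids() {
    sed -n 's|.*<id>\(.*\)</id>.*|\1|p' "$1" |   # keep only the text between the tags
        sort |                                    # group identical values together
        uniq -c |                                 # prefix each value with its count
        sort -rn                                  # highest count first
}
```

Running only the sed command gives the plain list of values. To show just the values that occur more than once, the output of count_ids can be piped through awk '$1 > 1'.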