Copy and merge text between two patterns


 
# 1  
Old 06-06-2011

I need to:
1. Find all the files in the folder "folder_name" and in its 4 or 5 subfolders that contain a certain pattern, "pattern1".
2. From these files, copy all the text between "pattern1" and a different pattern, "pattern2", and merge it into "mergefile".
3. Get rid of every HTML tag.
4. If possible, write the name of the source file as the first line of each new portion of text merged from it.
5. Take into account that "pattern1" and "pattern2" (both are always present) may appear more than once in the same file.

Many thanks for any help (even with just part of the task)
mjomba

Example of an input file: "xsargg777.html"
pattern1 = "Lectio altera"
pattern2 = "Ad Laudes matutinas"
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html>

#menu li {float:left; padding:4; margin:0 1px 0 0; position:relative; width:111px; height:1px; z-index:100;}
#menu li a, #menu li a:visited {text-decoration:none;}
<p class="re"><a name="_hlk121580295"></a>Lectio altera</p>
Ex Scriptis sancti Petri Can&iacute;sii presb&yacute;teri <font color="red">
cuius alter ap&oacute;stolus </a></p>
<font color="red">V/.</font> In corde prud&eacute;ntis</p>
<p class="rebo"><a name="_hlk91924987"></a>Ad Laudes matutinas</p>
<font color="red">Ant.</font> Qui docti f&uacute;erint, fulg&eacute;bunt quasi splendor firmam&eacute;nti, </p>
<p class="rebo"><a name="_hlk91925031"></a>Ad Vesperas</p>
<font color="red">Ant.</font> O doctor &oacute;ptime, Eccl&eacute;si&aelig; sanct&aelig; .</p>
&nbsp;</p>
<p class="rebo">Die 23 decembris</p>
<p class="rebo">Ad Officium lectionis</p>
<p class="re"><a name="_hlk121580428"></a>Lectio altera</p>
<i>Et soc&iacute;etas nostra sit cum Deo Patre et Iesu Christo F&iacute;lio eius. </i>
<font color="red">Responsorium<br>
R/.</font> Iste est Io&aacute;nnes, qui supra pectus D&oacute;mini in cena rec&uacute;buit: .<br>
<p class="re">Hymnus <a href="breviarionlineff3175.html?formato=1&amp;archivo=zte_deum.htm">Te Deum</a>.</p>
<p class="renm">Oratio</p>
Deus, qui per be&aacute;tum ap&oacute;stolum Io&aacute;nnem Verbi tui nobis arc&aacute;na reser&aacute;sti, </p>
<p class="rebo"><a name="_hlk91927253"></a>Ad Laudes matutinas</p>
<p class="re">Hymnus</p>
<div class="indhym">
...etc.etc.

Example of output file:
Code:
FILENAME=xsargg777.html
Lectio altera
Ex Scriptis sancti Petri Can&iacute;sii presb&yacute;teri 
cuius alter ap&oacute;stolus 
V/. In corde prud&eacute;ntis
Ad Laudes matutinas
Lectio altera
Et soc&iacute;etas nostra sit cum Deo Patre et Iesu Christo F&iacute;lio eius. 
Responsorium
R/. Iste est Io&aacute;nnes, qui supra pectus D&oacute;mini in cena rec&uacute;buit: 
Hymnus Te Deum
Oratio
Deus, qui per be&aacute;tum ap&oacute;stolum Io&aacute;nnem Verbi tui nobis arc&aacute;na reser&aacute;sti, 
Ad Laudes matutinas

# 2  
Old 06-06-2011
To get rid of the HTML tags you can use:

Code:
 
sed -n '/</p' input_file | sed -e :a -e 's/<[^>]*>//g;/</N;//ba'

The remaining part of your question is an exercise for you.

regards
Ravi
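
A note on the command above: the first `sed -n '/</p'` keeps only lines that contain a `<`, so any line of plain text without a tag is dropped before the tags are stripped. A minimal sketch of the second stage used on its own, which keeps such lines and also handles a tag split across two lines; `strip_tags` is a hypothetical helper name, not something from the thread:

```shell
#!/bin/sh
# strip_tags (hypothetical name): remove HTML tags from standard input.
# When a "<" is still unclosed at the end of a line, the next line is
# appended (N) and the substitution is retried, so multi-line tags go too.
strip_tags() {
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
}

# Demo input: one tag split over two lines, then a line with no tag at all.
printf '%s\n' '<p class="re"' '>Lectio altera</p>' 'plain text line' | strip_tags
```

This leaves HTML entities such as `&iacute;` untouched, which matches the example output in the first post.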
# 3  
Old 06-06-2011
1. Find all the files in the folder "folder_name" and in its 4 or 5 subfolders that contain a certain pattern, "pattern1".
Code:
grep -r -l 'pattern1' folder_name
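
The `-r` option is not available in every grep. Where it is missing, `find` can do the directory walk instead. A hedged sketch, assuming the input files all end in `.html` (the thread shows only one file name, so that filter is a guess) and that the pattern is a literal string rather than a regular expression; `list_matching_files` is a hypothetical name:

```shell
#!/bin/sh
# list_matching_files DIR STRING (hypothetical helper):
# print the path of every *.html file under DIR that contains STRING.
# -F treats STRING as literal text; -l prints file names only.
list_matching_files() {
    find "$1" -type f -name '*.html' -exec grep -lF -- "$2" {} +
}
```

For the example in the first post this would be called as `list_matching_files folder_name 'Lectio altera'`.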

# 4  
Old 06-06-2011
Code:
 nawk '/Lectio altera/,/Ad Laudes/ {gsub(/<[^>]*>/,"");print}' filename

Thanks
Sha
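
Putting the replies together, the whole task could be sketched roughly as follows. This is only an outline under some assumptions: `merge_sections` is a made-up name, both patterns are treated as literal strings without backslashes, every tag opens and closes on the same line (as in the sample input), the grep in use supports `-r`, and file names contain no newlines.

```shell
#!/bin/sh
# merge_sections DIR START END (hypothetical helper):
# for every file under DIR that contains START, print a FILENAME= line,
# then every block of lines from START through END with tags removed.
merge_sections() {
    grep -rlF -- "$2" "$1" | sort | while IFS= read -r file; do
        printf 'FILENAME=%s\n' "${file##*/}"      # base name only, as in the example
        awk -v begin_pat="$2" -v end_pat="$3" '
            { is_end = index($0, end_pat) }       # test for END before the line is altered
            index($0, begin_pat) { inside = 1 }   # a block opens on each START line
            inside {
                gsub(/<[^>]*>/, "")               # strip tags contained in this line
                print
            }
            is_end { inside = 0 }                 # the block closes after the END line
        ' "$file"
    done
}
```

A possible call is `merge_sections folder_name 'Lectio altera' 'Ad Laudes matutinas' > mergefile`, with `mergefile` kept outside `folder_name` so that it is not searched itself. Because the `inside` flag is switched on and off at every match, files where the two patterns appear several times yield one block per occurrence.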
