Finding strings through multiple lines


 
# 1  
Old 09-21-2011
It is working for the data that you provided:
Code:
[root@rhel2 ~]# cat file
112  <TABLE name="something>
113  <ROW>
123  </ROW>
124  </TABLE>
125  <TABLE name="somethingelse>
126  <ROW>
129  </ROW>
130  </TABLE>
493  <TABLE name="Headers_11" number="15">
494  <ROW>
495  <dest_semi_11><![CDATA[destination]]></dest_semi_11>
496  <num_semi_11><![CDATA[number sent]]></num_semi_11>
497  <cost_semi_11><![CDATA[cost]]></cost_semi_11>
498  </ROW>
499  </TABLE>

Code:
[root@rhel2 ~]# perl -0pe 's/\d+\s+<TABLE name=[^>]*>\n\d+\s+<ROW>\n\d+\s+<\/ROW>\n\d+\s+<\/TABLE>\n//g' file
493  <TABLE name="Headers_11" number="15">
494  <ROW>
495  <dest_semi_11><![CDATA[destination]]></dest_semi_11>
496  <num_semi_11><![CDATA[number sent]]></num_semi_11>
497  <cost_semi_11><![CDATA[cost]]></cost_semi_11>
498  </ROW>
499  </TABLE>

Can you post the output of:
Code:
cat -Te sample_file
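
For anyone following along, here is a small sketch of what that command is meant to reveal. It assumes GNU cat, as on the RHEL box above (-T is not available in every cat implementation). The input line is made up for illustration: a tab after the line number and a Windows-style CRLF line ending.

```shell
# Hypothetical input: one numbered line with a TAB and a CRLF ending.
# -T prints each TAB as ^I; -e marks every line end with $ and
# displays other control characters, so a carriage return shows as ^M.
shown=$(printf '2841\t<TABLE name="x">\r\n' | cat -Te)
printf '%s\n' "$shown"
```

A ^M in front of the $ would mean the lines end in \r\n, which a pattern written as >\n will not match.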

# 2  
Old 09-21-2011
I have just created a file containing only that sample, and the output I got was correct, as you said.
However, when I use it on the larger file it doesn't seem to work.
The line numbers there go into the thousands; I don't know if that makes a difference?

---------- Post updated at 01:02 PM ---------- Previous update was at 11:49 AM ----------

Maybe it would be easier if I gave a better example covering the full range of the data?
Code:
  2823  <TABLE name="TotalEventSemiSum_13>
  2824  </TABLE>
  2827  <TABLE name="TotalEventSemiSum_22>
  2828  </TABLE>
  2831  <TABLE name="TotalEventSemiSum_57>
  2832  <ROW>
  2837  </ROW>
  2838  </TABLE>
  2841  <TABLE name="TotalEventSemiSum_58>
  2842  <ROW>
  2843  <EventSemiSumTotal_58><![CDATA[£20.40]]></EventSemiSumTotal_58>
  2844  <EventSemiSumDescription_58><![CDATA[Roaming text messages non-EU]]></EventSemiSumDescription_58>
  2845  <EventSemiSumTotalSent_58><![CDATA[51]]></EventSemiSumTotalSent_58>
  2846  <source_58><![CDATA[07775884968]]></source_58>
  2847  </ROW>
  2848  </TABLE>
  2851  <TABLE name="TotalEventSemiSum_16>
  2852  <ROW>
  2857  </ROW>
  2858  </TABLE>

Output needed:

Code:
  2841  <TABLE name="TotalEventSemiSum_58>
  2842  <ROW>
  2843  <EventSemiSumTotal_58><![CDATA[£20.40]]></EventSemiSumTotal_58>
  2844  <EventSemiSumDescription_58><![CDATA[Roaming text messages non-EU]]></EventSemiSumDescription_58>
  2845  <EventSemiSumTotalSent_58><![CDATA[51]]></EventSemiSumTotalSent_58>
  2846  <source_58><![CDATA[07775884968]]></source_58>
  2847  </ROW>
  2848  </TABLE>

Does that help?
Thank you for your continued support; you are all very helpful on here :)
# 3  
Old 09-21-2011
Try:
Code:
perl -0pe 's/\s+\d+\s+<TABLE name=[^>]*>\n(\s+\d+\s+<ROW>\n\s+\d+\s+<\/ROW>\n)?\s+\d+\s+<\/TABLE>//g' file

I hope you can see now that simplifying the sample data does not always help in getting to the solution :)
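
To make the one-liner above easier to read, the same pattern can be written with Perl's /x modifier, which ignores unescaped whitespace in the pattern and allows comments. This is only a sketch: sample_numbered.txt and the function name are made up, the data is a shortened copy of the sample from post #2, and the optional group is written as non-capturing (?: ) where the original used plain parentheses. Under /x the space in "TABLE name" has to be escaped.

```shell
# Hypothetical sample file, shortened from the data posted in the thread.
cat > sample_numbered.txt <<'EOF'
  2823  <TABLE name="TotalEventSemiSum_13>
  2824  </TABLE>
  2831  <TABLE name="TotalEventSemiSum_57>
  2832  <ROW>
  2837  </ROW>
  2838  </TABLE>
  2841  <TABLE name="TotalEventSemiSum_58>
  2842  <ROW>
  2846  <source_58><![CDATA[07775884968]]></source_58>
  2847  </ROW>
  2848  </TABLE>
EOF

# Remove TABLE blocks that are empty or hold only an empty ROW.
strip_numbered_empty_tables() {
  perl -0pe '
    s{
      \s+ \d+ \s+ <TABLE\ name=[^>]*> \n   # numbered opening TABLE line
      (?: \s+ \d+ \s+ <ROW>  \n            # optional: a ROW line followed
          \s+ \d+ \s+ </ROW> \n )?         #   directly by its closing line
      \s+ \d+ \s+ </TABLE>                 # numbered closing TABLE line
    }{}gx' "$1"
}

strip_numbered_empty_tables sample_numbered.txt
```

Only the TotalEventSemiSum_58 block should survive, because its ROW has content between the opening and closing tags, so neither the optional group nor the closing TABLE line can match.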
# 4  
Old 09-21-2011
Many thanks for all your help!

---------- Post updated at 03:12 PM ---------- Previous update was at 02:57 PM ----------

Could you let me know just one more thing: which part of that code indicates that the line number precedes the text? I need to change it so that it works when there is no line number and the first character of each line is then "<".
Example:
Code:
<TABLE name="something>
<ROW> 
</ROW> 
</TABLE> 
<TABLE name="somethingelse> 
<ROW> 
</ROW> 
</TABLE> 
<TABLE name="Headers_11" number="15"> 
<ROW> 
<dest_semi_11><![CDATA[destination]]></dest_semi_11> 
<num_semi_11><![CDATA[number sent]]></num_semi_11> 
<cost_semi_11><![CDATA[cost]]></cost_semi_11> 
</ROW> 
</TABLE>

Thanks

Last edited by legolad; 09-21-2011 at 11:35 AM..
# 5  
Old 09-21-2011
Try:
Code:
perl -0pe 's/\s*(\d+\s+)?<TABLE name=[^>]*>\n(\s*(\d+\s+)?<ROW>\n\s*(\d+\s+)?<\/ROW>\n)?\s*(\d+\s+)?<\/TABLE>//g' file

# 6  
Old 09-21-2011
That didn't work, sorry. It needs to do exactly the same thing as before, but the file no longer has line numbers in it.

P.S. I mean that the input file has no line numbers in it, not that I want them removed from the output.

Last edited by legolad; 09-21-2011 at 11:53 AM..
# 7  
Old 09-21-2011
Code:
[root@rhel2 ~]# cat file
<TABLE name="TotalEventSemiSum_13>
</TABLE>
<TABLE name="TotalEventSemiSum_22>
</TABLE>
<TABLE name="TotalEventSemiSum_57>
<ROW>
</ROW>
</TABLE>
<TABLE name="TotalEventSemiSum_58>
<ROW>
<EventSemiSumTotal_58><![CDATA[£20.40]]></EventSemiSumTotal_58>
<EventSemiSumDescription_58><![CDATA[Roaming text messages non-EU]]></EventSemiSumDescription_58>
<EventSemiSumTotalSent_58><![CDATA[51]]></EventSemiSumTotalSent_58>
<source_58><![CDATA[07775884968]]></source_58>
</ROW>
</TABLE>
<TABLE name="TotalEventSemiSum_16>
<ROW>
</ROW>
</TABLE>

Code:
[root@rhel2 ~]# perl -0pe 's/\s*(\d+\s+)?<TABLE name=[^>]*>\n(\s*(\d+\s+)?<ROW>\n\s*(\d+\s+)?<\/ROW>\n)?\s*(\d+\s+)?<\/TABLE>//g' file

<TABLE name="TotalEventSemiSum_58>
<ROW>
<EventSemiSumTotal_58><![CDATA[£20.40]]></EventSemiSumTotal_58>
<EventSemiSumDescription_58><![CDATA[Roaming text messages non-EU]]></EventSemiSumDescription_58>
<EventSemiSumTotalSent_58><![CDATA[51]]></EventSemiSumTotalSent_58>
<source_58><![CDATA[07775884968]]></source_58>
</ROW>
</TABLE>
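
Two optional refinements, offered as a sketch and not as part of the solution above. First, -0 on its own sets the input record separator to the NUL character, which reads the whole file in one go only while the file contains no NUL bytes; -0777 is the documented switch for slurping a file. Second, the pattern above leaves the newline after each removed </TABLE> behind, which is where the blank line at the top of the output comes from. The variant below anchors each match at the start of a line with /m, consumes the trailing newline, and tolerates trailing blanks after a tag. The file name sample_plain.txt, the function name, and the trailing space after the first <ROW> are made up for illustration.

```shell
# Hypothetical sample; note the trailing space after the first <ROW>.
printf '%s\n' \
  '<TABLE name="TotalEventSemiSum_13>' \
  '</TABLE>' \
  '<TABLE name="TotalEventSemiSum_57>' \
  '<ROW> ' \
  '</ROW>' \
  '</TABLE>' \
  '<TABLE name="TotalEventSemiSum_58>' \
  '<ROW>' \
  '<source_58><![CDATA[07775884968]]></source_58>' \
  '</ROW>' \
  '</TABLE>' > sample_plain.txt

# -0777 slurps the file; /m lets ^ match at every line start.
# [ \t]* allows blanks around the optional line number and after each tag,
# and the final \n removes the whole last line of the block.
strip_empty_tables_clean() {
  perl -0777 -pe 's/^[ \t]*(?:\d+[ \t]+)?<TABLE name=[^>\n]*>[ \t]*\n(?:[ \t]*(?:\d+[ \t]+)?<ROW>[ \t]*\n[ \t]*(?:\d+[ \t]+)?<\/ROW>[ \t]*\n)?[ \t]*(?:\d+[ \t]+)?<\/TABLE>[ \t]*\n//mg' "$1"
}

strip_empty_tables_clean sample_plain.txt
```

This assumes every line, including the last one, ends with a newline.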
