awk to get text between 2 strings


 
# 1  
Old 06-28-2013

Hi,

I am trying out different scenarios; one of them is getting the text between the following two strings.

Code:
Type of msg:          -in_full [+]
>date
>alr text
>ID_on_exit
AWXX-Ready to commit (98) msg type: (10)

I need to get every occurrence of a block with the same start line and end line: starting with "Type of msg ..." and ending with the last line above.

Both the first and the last string must be at the beginning of the line. I mention this because that AWXX code sometimes appears in the middle of other lines, and I am not interested in those.


I tried something like this, but I am not really good with awk

Code:
perl -lne '{if(/"Type of msg:          -in_full \[+\]"/){$#A=-1;$f=1;} if(/^AWXX-Ready to commit (98) msg type: (10)/ && ($f)){print join("\n",@A,$_);next}($f)?push(@A,$_):next;}' privacy.log6_27_13


Any help is greatly appreciated.

---------- Post updated 06-28-13 at 12:06 AM ---------- Previous update was 06-27-13 at 04:37 PM ----------

I think I found my problem.

Is there any limit on the size of a file that sed can process? I am using sed now, and it works only on small files, not on my 100 MB logs.

---------- Post updated at 12:18 AM ---------- Previous update was at 12:06 AM ----------

Guys, now switching back to AWK

I am using the following:

Code:
awk '/^Type of msg:          -in_full \[+\]/{s=x}{s=s$0"\n"}/^AWXX-Ready to commit \(98\)/{print s}' test6

For some reason, it prints the whole file instead of just what I am looking for. HELP!!
# 2  
Old 06-28-2013
Your message is inconsistent and not clear.

The title of this thread says you're using awk, but your original post contained a perl script; not an awk script.

You gave us a five-line example of text that you apparently want copied verbatim, with no examples of lines that should be skipped.

You say sed isn't working, but you haven't shown us any sed code nor any sed error messages.

It isn't clear from your description whether you want to start copying input lines to output for any line starting with Type of msg: and continuing through any line that starts with AWXX or if you only want to copy lines that exactly match:
Code:
Type of msg:          -in_full [+]

through lines that exactly match:
Code:
AWXX-Ready to commit (98) msg type: (10)

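Either way, your awk script wasn't working because + is a special character in extended regular expressions and has to be escaped, just like the [ and ( that you had already escaped. (The - is only special inside a bracket expression, so it needs no escaping here.)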
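To illustrate the escaping point, here is a minimal sketch run against a single line copied from the sample. The variable names (line, unescaped, escaped) are made up for this illustration, and " +" is used in the pattern only to absorb the run of spaces.

```shell
# One marker line, as shown in the thread.
line='Type of msg:          -in_full [+]'

# \[+\] means "one or more [ characters, then ]", so it does NOT match "[+]".
unescaped=$(printf '%s\n' "$line" |
  awk '/^Type of msg: +-in_full \[+\]/ {n++} END {print n+0}')

# \[\+\] matches the three literal characters "[+]".
escaped=$(printf '%s\n' "$line" |
  awk '/^Type of msg: +-in_full \[\+\]/ {n++} END {print n+0}')

echo "unescaped + matched: $unescaped line(s)"
echo "escaped + matched:   $escaped line(s)"
```

With the + left unescaped the start pattern never fires, so a script that accumulates lines in s never resets it and ends up printing everything seen so far.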

With the trivial example you gave us, the following awk scripts both seem to do what you want. Hopefully, one of them will enable you to create an awk script that will work with your actual input files.

matching line starts:
Code:
awk '/^Type of msg:/ {p = 1}
p
/^AWXX/ {p = 0}' test6

matching exact lines:
Code:
awk '$0 == "Type of msg:          -in_full [+]" {p = 1}
p
$0 == "AWXX-Ready to commit (98) msg type: (10)" {p = 0}' test6

If you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or /bin/nawk instead of /bin/awk.
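A possible third variant, in case the marker lines can carry extra text after the marker: compare literal prefixes with index() so that no regex characters need escaping at all. This is only a sketch; the sample lines and the names first, last and sample are assumptions made for the illustration, not taken from a real log.

```shell
first='Type of msg:          -in_full [+]'
last='AWXX-Ready to commit (98)'

# Hypothetical input: markers with trailing text, plus one AWXX line that
# does not start in column 1 and therefore must not end a block.
sample=$(mktemp)
printf '%s\n' \
  'noise before' \
  "$first extra" \
  '>date' \
  "$last msg type: (10) extra" \
  'noise between' \
  " x $last msg type: (10)" > "$sample"

# index(s, t) == 1 is true only when s begins with the literal string t.
out=$(awk -v first="$first" -v last="$last" '
index($0, first) == 1 { p = 1 }   # start marker at beginning of line
p                                 # print while inside a block
index($0, last) == 1  { p = 0 }   # end marker at beginning of line
' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

Only the three lines from the start marker through the end marker are printed; the two noise lines and the indented AWXX line are skipped.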
# 3  
Old 06-28-2013
Thanks Don,

Yeah, I just saw that I posted the perl line and not my sed and awk attempts.

Well, the thing is that it is still not working. I am running Linux here.

Basically, I need to match both lines exactly, and only lines that start with that text (even if they have more text at the end).

So we must match all the lines starting like this
Code:
Type of msg:          -in_full [+]

And ending like this
Code:
AWXX-Ready to commit (98) msg type: (10)

In both cases, they can have additional text at the end.



I have a question about this.

If I have a scenario like this:
Code:
Type of msg:          -in_full [+]
 >date >alr text
 >ID_on_exit 
AWXX-Ready to commit (98) msg type: (10)
Type of msg:          -in_full [+] 
>date >alr text 
>ID_on_exit 
AW AWXX-Ready to commit (96) msg type: (10)
Type of msg:          -in_full [+] 
>date >alr text 
>ID_on_exit 
AWXX-Ready to commit (98) msg type: (10)

Will sed or awk take everything from the beginning to the end, or does it reset the buffer when it finds another "starting pattern"?
# 4  
Old 06-28-2013
Most sed or awk expressions only consider one line at a time, but a few will cross lines.

Usefully, you can alter what awk considers a "line" to be.

None of which really lets us help you much. Please show more of your input and expected output so we can stop guessing.
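On the point about changing what awk treats as a "line", here is a rough sketch using the record separator RS. It assumes input where blocks are separated by blank lines, which is NOT the format shown in this thread, so the data below is purely hypothetical.

```shell
# Hypothetical input: two blocks separated by a blank line.
out=$(printf '%s\n' 'Type of msg: A' '>date' 'AWXX-Ready (98)' '' \
                    'Type of msg: B' '>date' 'other ending' |
  awk 'BEGIN { RS = "" }   # paragraph mode: each blank-line-separated block is one record
       # keep a block only if it starts with the start marker and
       # contains the end marker at the beginning of one of its lines
       /^Type of msg:/ && /\nAWXX-Ready/ { print; print "--" }')

printf '%s\n' "$out"
```

Setting RS to the empty string is the portable way to get multi-line records; a multi-character RS works in gawk and mawk but is not guaranteed by every awk.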
# 5  
Old 06-28-2013
Quote:
Originally Posted by ocramas
Will sed or awk take everything from the beginning to the end, or does it reset the buffer when it finds another "starting pattern"?

The sed and awk utilities can be programmed either to take everything from the first starting pattern to the last ending pattern, or to do something else when they find a second starting pattern before an ending pattern has matched the earlier one.

Again, with the sample input you have provided every single line will be copied from your input file to your output file because there are no lines in your input file before your starting pattern, there are no lines between an ending pattern and a following starting pattern in your input file, and there are no lines after your ending pattern.

For every example you have shown us so far, the command:
Code:
cat input > output

meets all of your requirements.

Please show us some real input and show us the output you want to get from it (where the output file is not just a copy of the input file). And, explicitly specify what, if anything, needs to happen if a starting pattern is seen when the previous starting pattern wasn't followed by an ending pattern.
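As one possible answer to that last question, here is a sketch that buffers each block and throws the buffer away when a new starting line arrives before an ending line. That discard policy is an assumption, not something stated in the thread, and the input is a condensed version of the posted sample; buf and inblk are names chosen for the illustration.

```shell
out=$(printf '%s\n' \
  'Type of msg:          -in_full [+]' '>date' \
  'AWXX-Ready to commit (98) msg type: (10)' \
  'Type of msg:          -in_full [+]' '>date' \
  'AW AWXX-Ready to commit (96) msg type: (10)' \
  'Type of msg:          -in_full [+]' '>ID_on_exit' \
  'AWXX-Ready to commit (98) msg type: (10)' |
  awk '
/^Type of msg:/ { buf = ""; inblk = 1 }   # new start: drop any unterminated block
inblk           { buf = buf $0 "\n" }     # collect lines while inside a block
inblk && /^AWXX-Ready to commit \(98\)/ { # end marker at beginning of line
  printf "%s", buf                        # emit the completed block
  inblk = 0
}')

printf '%s\n' "$out"
```

The middle block, whose AWXX text is not at the start of a line, is never completed, so it is dropped when the third starting line resets the buffer.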
# 6  
Old 06-28-2013
Did you try this:
Code:
awk '/^Type of msg:/,/^AWXX-Ready/' file
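A quick check of how such a range pattern behaves, on made-up input (the "noise" and "trailer" lines are hypothetical):

```shell
out=$(printf '%s\n' 'noise' 'Type of msg: A' '>date' 'AWXX-Ready (98)' \
                    'more noise' 'Type of msg: B' 'AWXX-Ready (98)' 'trailer' |
  awk '/^Type of msg:/,/^AWXX-Ready/')   # print from each start line through the next end line

printf '%s\n' "$out"
```

Lines outside a start/end pair are skipped. One caveat: if the end pattern never matches after a start line, the range stays open and everything up to the end of the input is printed.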

# 7  
Old 06-29-2013
Quote:
Originally Posted by Don Cragun
Please show us some real input and show us the output you want to get from it (where the output file is not just a copy of the input file). And, explicitly specify what, if anything, needs to happen if a starting pattern is seen when the previous starting pattern wasn't followed by an ending pattern.

Hi Don,


Look at this example; it is really weird to me.

My file is the following:
Code:
Type of msg:          -in_full [+]
 >date >alr text
 >ID_on_exit 
AWXX-Ready to commit (98) msg type: (10)
Type of msg:          -in_full [+] 
>date >alr text 
>ID_on_exit 
AW AWXX-Ready to commit (96) msg type: (10)
Type of msg:          -in_full [+] 
>date >alr text 
>ID_on_exit 
AWXX-Ready to commit (98) msg type: (10)


When I run the following command, it prints everything. That is basically what is happening.
Code:
[519] /tmp/script_test$ awk '/^Type of msg:/,/^AWXX-Ready to commit (96)/' test
Type of msg:          -in_full [+]
 >date >alr text
 >ID_on_exit 
AWXX-Ready to commit (98) msg type: (10)
Type of msg:          -in_full [+] 
>date >alr text 
>ID_on_exit 
AW AWXX-Ready to commit (96) msg type: (10)
Type of msg:          -in_full [+] 
>date >alr text 
>ID_on_exit 
AWXX-Ready to commit (98) msg type: (10)

I have also tried writing the 96 as \(96\), and writing the entire first line "Type of msg: ..." with \ before the brackets and the plus sign.

---------- Post updated at 08:54 PM ---------- Previous update was at 08:53 PM ----------

Hi RudiC, yes, I just tried those things; for some reason it shows more than I expect it to show.

---------- Post updated at 09:13 PM ---------- Previous update was at 08:54 PM ----------

I just did it.

The following statement shows only the real matches, and not all the other information.


Code:
perl -lne '{if(/Type of msg:          -in_full \[\+\]/){$#A=-1;$f=1;} if(/^AWXX-Ready to commit \(96\)/ && ($f)){print join("\n",@A,$_);next}($f)?push(@A,$_):next;}' test

Thanks guys
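For the record, a likely reason the awk range command printed the whole test file, sketched against the 12-line sample posted above (trailing blanks dropped; test_file, total, unescaped and escaped are names made up for this check):

```shell
test_file=$(mktemp)
cat > "$test_file" <<'EOF'
Type of msg:          -in_full [+]
 >date >alr text
 >ID_on_exit
AWXX-Ready to commit (98) msg type: (10)
Type of msg:          -in_full [+]
>date >alr text
>ID_on_exit
AW AWXX-Ready to commit (96) msg type: (10)
Type of msg:          -in_full [+]
>date >alr text
>ID_on_exit
AWXX-Ready to commit (98) msg type: (10)
EOF

total=$(grep -c '' "$test_file")

# 1. Unescaped parentheses form a group, so the end pattern looks for
#    "commit 96" (no parentheses) and never matches.
unescaped=$(awk '/^Type of msg:/,/^AWXX-Ready to commit (96)/' "$test_file" | grep -c '')

# 2. Escaped, the text is right, but the only (96) line starts with "AW ",
#    so the ^ anchor still prevents a match.
escaped=$(awk '/^Type of msg:/,/^AWXX-Ready to commit \(96\)/' "$test_file" | grep -c '')

echo "lines in file: $total, printed unescaped: $unescaped, printed escaped: $escaped"
rm -f "$test_file"
```

In both cases the end of the range is never found, so the range opened by the first line runs to the end of the file.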
