Using awk/sed to extract text between Strings


 
# 1  
Old 12-21-2009
Using awk/sed to extract text between Strings

Dear Unix Gurus,

I've got a data file with a few hundred lines (see truncated sample)...

Code:
 
BEGIN_SCAN1
  TASK_NAME=LA48 PDD Profiles
  PROGRAM=ArrayScan
   1.00  21.220E+00
   2.00  21.280E+00
 END_DATA
 END_SCAN1
 BEGIN_SCAN2
  TASK_NAME=LA48 PDD Profiles
  194.00  2.1870E+00
   196.00  2.1190E+00
   198.00  2.0590E+00
   200.00  2.0070E+00
  END_DATA
 END_SCAN2
 BEGIN_SCAN3
  TASK_NAME=LA48 PDD Profiles
  198.00  1.8420E+00
  200.00  1.7850E+00
  END_DATA
 END_SCAN3
.....
.....
.....
 BEGIN_SCAN10
  TASK_NAME=LA48 PDD Profiles
  PROGRAM=ArrayScan
  MEAS_DATE=25-Sep-2006 11:19
  200.00  1.7610E+00
  END_DATA
 END_SCAN10
 BEGIN_SCAN11
  TASK_NAME=LA48 PDD Profiles
  PROGRAM=ArrayScan
  MEAS_DATE=25-Sep-2006 11:19
  200.00  1.7610E+00
  END_DATA
END_SCAN11
 BEGIN_SCAN12
  TASK_NAME=LA48 PDD Profiles
  PROGRAM=ArrayScan
  MEAS_DATE=25-Sep-2006 11:19
  200.00  1.7610E+00
  END_DATA
 END_SCAN12
 BEGIN_SCAN13
  TASK_NAME=LA48 PDD Profiles
  PROGRAM=ArrayScan
  MEAS_DATE=25-Sep-2006 11:19
  200.00  1.7610E+00
  END_DATA
END_SCAN13
.....
....
END_SCANn

What I want to do is extract all the text between the strings "BEGIN_SCANx" and "END_SCANx", where x is 1, 2, 3, ..., 10, 11, 12, and so on up to n, and dump each block into a separate file.

I've tried extracting the information by looping over the file and using:
Code:
 
sed -n '/BEGIN_SCANx/,/END_SCANx/p' inputfile > outputfilex

However, my problem is that when "x" is "1", the script extracts not only the text between "BEGIN_SCAN1" and "END_SCAN1" but also the text between BEGIN_SCAN11 and END_SCAN11, BEGIN_SCAN12 and END_SCAN12, and BEGIN_SCAN13 and END_SCAN13.

In this instance, how do I get the script to select the text between BEGIN_SCAN1 and END_SCAN1 only?

thanks!!
# 2  
Old 12-21-2009
Code:
sed -n '/BEGIN_SCAN1$/,/END_SCAN1$/p' inputfile > outputfilex

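1$ means the line must end with the character "1": the $ anchors the match to the end of the line, so BEGIN_SCAN1$ no longer matches BEGIN_SCAN11, BEGIN_SCAN12 and so on.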
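To get every scan into its own file with this approach, the anchored pattern can be put in a loop. The following is only a sketch: the file names sample_scans.txt and scan_N.txt are made up for illustration, the list of scan numbers is hard-coded, and the [[:space:]]* before the $ is an extra assumption to tolerate trailing blanks that the original data may not need.

```shell
# Small sample in the same shape as the data file (hypothetical file name).
cat > sample_scans.txt <<'EOF'
 BEGIN_SCAN1
   1.00  21.220E+00
 END_SCAN1
 BEGIN_SCAN11
   200.00  1.7610E+00
END_SCAN11
EOF

# One sed pass per scan number. The \$ reaches sed as a plain $, which
# anchors the match to the end of the line, so SCAN1 cannot match SCAN11.
for x in 1 11; do
    sed -n "/BEGIN_SCAN${x}[[:space:]]*\$/,/END_SCAN${x}[[:space:]]*\$/p" \
        sample_scans.txt > "scan_${x}.txt"
done
```

This reads the input once per scan number, which is fine for a few hundred lines; for large files a single awk pass is likely the better choice.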
# 3  
Old 12-21-2009
Code:
awk '!/SCAN/{print}' in_file

Using perl:
Code:
perl -wln -e 'print if(!/scan/i)'

regards
# 4  
Old 12-21-2009
Quote:
Originally Posted by tintin72
What I want to do is extract only all text between the strings "BEGIN_SCANx" and "END_SCANx", where x is 1, 2, 3, .......10, 11, 12, and so on up to n and dump each into separate files.
This should give you the sections in separate files (file1, file2 and so on):

Code:
awk '/BEGIN_SCAN/{s++}{print > ("file" s)}' file
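A slightly more defensive variant of the one-liner above, offered as a sketch rather than a definitive version: it closes each output file once its END_SCAN line has been written (useful when there are many scans) and skips any filler lines that sit between scans. The names sample_scans.txt and scan_N.txt are hypothetical, and the files are numbered by order of appearance, which matches x only if the scans are in order with no gaps.

```shell
# Small sample in the same shape as the data file (hypothetical file name).
cat > sample_scans.txt <<'EOF'
 BEGIN_SCAN1
   1.00  21.220E+00
 END_SCAN1
.....
 BEGIN_SCAN2
   194.00  2.1870E+00
 END_SCAN2
EOF

awk '
    # A BEGIN_SCAN line starts a new output file.
    /BEGIN_SCAN/ { if (out) close(out); n++; out = "scan_" n ".txt" }
    # Print only while inside a scan (out is empty between scans).
    out          { print > out }
    # The END_SCAN line has already been printed by the rule above.
    /END_SCAN/   { close(out); out = "" }
' sample_scans.txt
```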

# 5  
Old 12-21-2009
Thanks guys,

Your answers worked, except for the one given by gaurav1086:

Code:
 
awk '!/SCAN/{print}' in_file

Your script prints all lines that do not contain the word SCAN! I think you may have misunderstood what I wanted, which was to print ALL the text between, and including, the two specific strings BEGIN_SCANx and END_SCANx.

Not to worry, the other answers do this just fine.

cheers!
# 6  
Old 12-21-2009
Hello,
your file has BEGIN_SCANx and END_SCANx on separate lines, so my command prints every line that does not match the pattern /SCAN/.
Yes, I did misunderstand: you wanted each section of text in a different file.
Glad the other answers are working.
Cheers.

Last edited by gaurav1086; 12-21-2009 at 01:26 PM..
 