Post 302381890 by tintin72 on Monday 21st of December 2009 08:37:51 AM
Using awk/sed to extract text between Strings

Dear Unix Gurus,

I've got a data file with a few hundred lines (see truncated sample)...

Code:
 
BEGIN_SCAN1
  TASK_NAME=LA48 PDD Profiles
  PROGRAM=ArrayScan
   1.00  21.220E+00
   2.00  21.280E+00
 END_DATA
 END_SCAN1
 BEGIN_SCAN2
  TASK_NAME=LA48 PDD Profiles
  194.00  2.1870E+00
   196.00  2.1190E+00
   198.00  2.0590E+00
   200.00  2.0070E+00
  END_DATA
 END_SCAN2
 BEGIN_SCAN3
  TASK_NAME=LA48 PDD Profiles
  198.00  1.8420E+00
  200.00  1.7850E+00
  END_DATA
 END_SCAN3
.....
.....
.....
 BEGIN_SCAN10
  TASK_NAME=LA48 PDD Profiles
  PROGRAM=ArrayScan
  MEAS_DATE=25-Sep-2006 11:19
  200.00  1.7610E+00
  END_DATA
 END_SCAN10
 BEGIN_SCAN11
  TASK_NAME=LA48 PDD Profiles
  PROGRAM=ArrayScan
  MEAS_DATE=25-Sep-2006 11:19
  200.00  1.7610E+00
  END_DATA
END_SCAN11
 BEGIN_SCAN12
  TASK_NAME=LA48 PDD Profiles
  PROGRAM=ArrayScan
  MEAS_DATE=25-Sep-2006 11:19
  200.00  1.7610E+00
  END_DATA
 END_SCAN12
 BEGIN_SCAN13
  TASK_NAME=LA48 PDD Profiles
  PROGRAM=ArrayScan
  MEAS_DATE=25-Sep-2006 11:19
  200.00  1.7610E+00
  END_DATA
END_SCAN13
.....
....
END_SCANn

What I want to do is extract all the text between the strings "BEGIN_SCANx" and "END_SCANx", where x is 1, 2, 3, ..., 10, 11, 12, and so on up to n, and write each block to a separate file.

I've tried extracting the information by looping over the file and using:
Code:
 
sed -n '/BEGIN_SCANx/,/END_SCANx/p' inputfile > outputfilex

However, my problem is that when "x" is "1", the script extracts not only the text between "BEGIN_SCAN1" and "END_SCAN1" but also the text between BEGIN_SCAN11 and END_SCAN11, BEGIN_SCAN12 and END_SCAN12, and BEGIN_SCAN13 and END_SCAN13.

In this instance, how do I get the script to select only the text between BEGIN_SCAN1 and END_SCAN1?
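
The extra blocks most likely appear because the pattern BEGIN_SCAN1 is an unanchored regular expression, so it also matches the first eleven characters of BEGIN_SCAN11, BEGIN_SCAN12, and so on. One possible fix, as a minimal sketch, is to require that nothing except optional whitespace follows the number. The sample input here is a hypothetical, trimmed-down version of the data in the post, and the sketch assumes the scan number is always the last thing on the marker line:

```shell
# Work in a scratch directory so the sketch does not overwrite real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input, modelled on the data in the post.
cat > inputfile <<'EOF'
BEGIN_SCAN1
   1.00  21.220E+00
 END_DATA
 END_SCAN1
 BEGIN_SCAN11
  200.00  1.7610E+00
  END_DATA
END_SCAN11
EOF

x=1
# [[:space:]]*$ means "only whitespace, then end of line" after the number,
# so BEGIN_SCAN1 no longer matches BEGIN_SCAN11.
sed -n "/BEGIN_SCAN${x}[[:space:]]*\$/,/END_SCAN${x}[[:space:]]*\$/p" inputfile > "outputfile${x}"
```

With this input, outputfile1 should contain only the four lines from BEGIN_SCAN1 to END_SCAN1. The double quotes let the shell expand ${x}, and the backslash before $ keeps the end-of-line anchor from being read as a shell variable.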

thanks!!
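
An alternative that avoids looping over the file once per scan is a single awk pass that switches output files whenever a marker line is seen. This is only a sketch under assumptions: the sample input is hypothetical, the output names follow the outputfilex pattern from the post, and each marker is assumed to be the only word on its line (a trailing carriage return from a DOS-format file would stop the markers from matching):

```shell
# Work in a scratch directory so the sketch does not overwrite real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input, modelled on the data in the post.
cat > inputfile <<'EOF'
BEGIN_SCAN1
   1.00  21.220E+00
 END_DATA
 END_SCAN1
 BEGIN_SCAN2
  194.00  2.1870E+00
  END_DATA
 END_SCAN2
 BEGIN_SCAN11
  200.00  1.7610E+00
  END_DATA
END_SCAN11
EOF

awk '
  # A BEGIN_SCAN<number> line opens a new output file named after the number.
  $1 ~ /^BEGIN_SCAN[0-9]+$/ {
      if (out != "") close(out)
      n = $1
      sub(/^BEGIN_SCAN/, "", n)
      out = "outputfile" n
  }
  # While inside a block, copy every line (markers included) to that file.
  out != "" { print > out }
  # An END_SCAN<number> line closes the current block.
  $1 ~ /^END_SCAN[0-9]+$/ {
      if (out != "") close(out)
      out = ""
  }
' inputfile
```

Because awk compares the whole first field against the pattern, BEGIN_SCAN1 and BEGIN_SCAN11 are treated as different markers, and closing each file as its block ends keeps the number of open files small when n is large.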
 
