Need help either with awk or sed to get text between words


 
# 1  
Old 01-17-2012
Need help either with awk or sed to get text between words

Hello All,

My requirement is to get the text between two words, START and END, something like HTML tags.
E.g. input file:
START
Line1
Line2
Line3
CLOSE

START
Line4
Line5
Line6
END

START
Line7
START
Line8
END
Line9
END

Output Required:
START
Line4
Line5
Line6
END

START
Line8
END

START
Line7
Line9
END

The order of the two output blocks can change. That is not a problem.


Thanks and Regards,
Suneel C Koneru.
# 2  
Old 01-17-2012
This may be difficult without more information. You first have a START tag without an END. How do you know not to continue until reaching the end of file? Do you reset at a blank line?

---------- Post updated at 07:41 AM ---------- Previous update was at 07:22 AM ----------

I was able to recreate your output:
Code:
mute@eeepc:~$ ./tags file
START
Line4
Line5
Line6
END

START
Line8
END

START
Line7
Line9
END

mute@eeepc:~$

I wrote this awk program and commented it to help. It does not reset on a blank line, which is what I asked about above. In fact, if you add a matching END tag to the end of the file, you'll also get the lines which were not yet printed (the first, unclosed block).

Code:
#!/usr/bin/awk -f

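# p is the nesting depth: how many START blocks are currently open
# c[p] is the number of lines stored so far for the block at depth p
# (there is no "next" here, so the START line itself is stored by the last rule)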
$0 == "START" {p++; c[p]=0}

# if we reach END and are within a block, print out each stored record
$0 == "END"&&p {
	for (i=0;i<c[p];i++)
		print a[p,i]
	# in case of a nested block, we'll need to decrement to continue
	# capturing for it
	p--
	print $0 "\n"	# print END, and extra newline
	next		# don't process this line any further
}
# if in a block, store it in our array
p { a[p,c[p]++] = $0 }
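
To illustrate the remark about the unclosed first block, here is a rough sketch that runs the program above on a shortened input, once as-is and once with an extra END appended. The file names tags.awk and sample.txt and the temporary directory are assumptions made for the example, not part of the original post.

```shell
#!/bin/sh
# Sketch only: tags.awk and sample.txt are made-up names in a temp directory.
tmp=$(mktemp -d)

# Same program as in the post, without the comments.
cat > "$tmp/tags.awk" <<'EOF'
$0 == "START" {p++; c[p]=0}
$0 == "END"&&p {
	for (i=0;i<c[p];i++)
		print a[p,i]
	p--
	print $0 "\n"
	next
}
p { a[p,c[p]++] = $0 }
EOF

# Shortened input: the first block is closed with CLOSE instead of END.
cat > "$tmp/sample.txt" <<'EOF'
START
Line1
CLOSE
START
Line4
END
EOF

# As-is: only the properly closed block is printed.
without_end=$(awk -f "$tmp/tags.awk" "$tmp/sample.txt")

# With one more END appended, the pending outer block is flushed as well.
with_end=$( { cat "$tmp/sample.txt"; echo END; } | awk -f "$tmp/tags.awk")

printf '%s\n' "--- as-is ---" "$without_end" "--- extra END ---" "$with_end"
rm -rf "$tmp"
```

The second run prints the Line4 block first and then START, Line1, CLOSE, END, because everything after the first START was still being held at the outer nesting level.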

If this is to be embedded in a shell script, you can of course condense the code:
Code:
awk '/^START$/{p++;c[p]=0}/^END$/&&p{for(i=0;i<c[p];i++)print a[p,i];p--;print $0 RS;next}p{a[p,c[p]++]=$0}' file
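
As a rough idea of what embedding it in a shell script could look like, the one-liner can be wrapped in a function. The name extract_blocks is hypothetical and only used for this sketch; the input is the nested part of the sample file.

```shell
#!/bin/sh
# Hypothetical wrapper: extract_blocks is an illustrative name, not from the thread.
# Reads the named files (or stdin when no file is given) and prints every
# START ... END block that has been closed.
extract_blocks() {
    awk '/^START$/{p++;c[p]=0}/^END$/&&p{for(i=0;i<c[p];i++)print a[p,i];p--;print $0 RS;next}p{a[p,c[p]++]=$0}' "$@"
}

# Feed the nested part of the sample input through the function.
result=$(printf '%s\n' START Line7 START Line8 END Line9 END | extract_blocks)
printf '%s\n' "$result"
```

The inner block (Line8) comes out first, followed by the outer block with Line7 and Line9, which matches the required output.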


Last edited by neutronscott; 01-17-2012 at 08:46 AM.. Reason: show 1-liner
# 3  
Old 01-17-2012
absolutely what I wanted......

Kudos to you, Scott, this is absolutely what I wanted.
The first block (Line1, Line2 and Line3) was supposed to be omitted.

Thank you very much.
You saved my day.

- Suneel

---------- Post updated at 06:29 PM ---------- Previous update was at 06:24 PM ----------

Can this be modified to work with any combination of upper- and lower-case letters in the input file and still return the same output?
# 4  
Old 01-17-2012
You mean that instead of START you can have Start, etc.? This is not pretty in awk.

Replace (and treat the END test the same way)
Code:
$0 == "START"

with either
Code:
toupper($0) == "START"

or

Code:
/^[Ss][Tt][Aa][Rr][Tt]$/
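
For illustration, here is a sketch of the toupper() variant applied to both tags. The mixed-case sample input is invented for this example.

```shell
#!/bin/sh
# Sketch: same program as before, with both comparisons made case-insensitive
# via toupper(). The mixed-case sample input is invented for illustration.
result=$(printf '%s\n' Start Line4 Line5 End start Line7 END |
awk '
toupper($0) == "START" {p++; c[p]=0}
toupper($0) == "END" && p {
    for (i = 0; i < c[p]; i++)
        print a[p,i]
    p--
    print $0 "\n"   # the tag is printed as it was written in the input
    next
}
p { a[p,c[p]++] = $0 }
')
printf '%s\n' "$result"
```

Only the comparison is case-insensitive; the stored lines are untouched, so the tags come out in whatever case the input used.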


Last edited by neutronscott; 01-17-2012 at 09:22 AM.. Reason: toupper() works too...
# 5  
Old 01-17-2012
getting error

I have now tried the code from your "tags" file and I get an error:
> sh -x tags.sh test.txt
+ tags.sh == START '{p++'
tags.sh: line 5: tags.sh: command not found
+ c[p]='0}'
+ tags.sh == END
tags.sh: line 8: tags.sh: command not found
tags.sh: line 9: syntax error near unexpected token `('
tags.sh: line 9: ` for (i=0;i<c[p];i++)'


> cat tags.sh
#!/bin/bash
#!/usr/bin/awk -f

# p=print, counts the "block" we are in
# c[p] then is the record number (line count) within that block
$0 == "START" {p++; c[p]=0}

# if we reach END and are within a block, print out each stored record
$0 == "END"&&p {
for (i=0;i<c[p];i++)
print a[p,i]
# in case of a nested block, we'll need to decrement to continue
# capturing for it
p--
print $0 "\n" # print END, and extra newline
next # don't process this line any further
}

I'm running SUSE Linux with /bin/bash.
# 6  
Old 01-17-2012
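That entire file is an awk program, not a shell script, so sh cannot run it. Notice the #!/usr/bin/awk -f already on the first line; remove the #!/bin/bash line you added above it. Your copy also appears to be missing the last line of the program (p { a[p,c[p]++] = $0 }).
You can:
  1. Set it executable (chmod a+x tags), and run it as ./tags input-file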
  2. Run as: awk -f tags.awk input-file
  3. Use the condensed 1-line version without storing any code in a separate file:
    awk 'tolower($0)=="start"{p++;c[p]=0}tolower($0)=="end"&&p{for(i=0;i<c[p];i++)print a[p,i];p--;print $0 RS;next}p{a[p,c[p]++]=$0}' input-file
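
The three options can be tried side by side with something like the sketch below. The temporary directory and the tiny input are assumptions for the example; the names tags and input-file follow the list above.

```shell
#!/bin/sh
# Sketch only: a temporary directory is used here, and the file names
# tags and input-file follow the names used in the post.
tmp=$(mktemp -d)

# The awk shebang must be the very first line of the file.
cat > "$tmp/tags" <<'EOF'
#!/usr/bin/awk -f
tolower($0) == "start" {p++; c[p]=0}
tolower($0) == "end" && p {
	for (i=0;i<c[p];i++)
		print a[p,i]
	p--
	print $0 "\n"
	next
}
p { a[p,c[p]++] = $0 }
EOF
printf '%s\n' Start Line4 End > "$tmp/input-file"

# 1. Executable script (only works where awk really lives at /usr/bin/awk).
chmod a+x "$tmp/tags"
if [ -x /usr/bin/awk ]; then
    out1=$("$tmp/tags" "$tmp/input-file") || echo "could not execute $tmp/tags directly" >&2
fi

# 2. Pass the program file to awk explicitly.
out2=$(awk -f "$tmp/tags" "$tmp/input-file")

# 3. Inline one-liner, no separate program file.
out3=$(awk 'tolower($0)=="start"{p++;c[p]=0}tolower($0)=="end"&&p{for(i=0;i<c[p];i++)print a[p,i];p--;print $0 RS;next}p{a[p,c[p]++]=$0}' "$tmp/input-file")

printf '%s\n' "$out2"
rm -rf "$tmp"
```

Options 2 and 3 do not depend on where awk is installed, which makes them the safer choice if the script has to run on several machines.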
# 7  
Old 01-19-2012
that solved it.......

Thanks, buddy. Sometimes my brain just doesn't work even for small things when I'm struggling with bigger ones.

Anyway, thank you very much.