How to extract every repeated string between two specific strings?


 
# 8  
Old 10-08-2014
You haven't told us what OS or shell you're using. You have told us some suggestions aren't working, but you haven't supplied any details about how they failed. You have shown us sample input with 3 SOH ... ETX pairs, but the sample output you say you want from that input only has two output files. You haven't said what should happen to text before the 1st SOH, between one ETX and the next SOH, nor after the last ETX. Nonetheless, the following seems to do what you want (making several wild assumptions):
Code:
awk '
{	while(length)
		if(soh) {
			# We have already seen SOH...
			# Copy text until we find ETX.
			if(etx = index($0, "ETX")) {
				# ETX found...  print through ETX, close output
				# file, clear soh, and throw away the part of the
				# line we have already processed.
				printf("%s\n", substr($0, soh, etx - soh + 3)) > f
				close(f)
				soh = 0
				$0 = substr($0, etx + 3)
			} else {# ETX not found...  print rest of the line.
				printf("%s\n", substr($0, soh)) > f
				soh = 1
				next
			}
		} else {# Look for SOH...
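			if(soh = index($0, "SOH")) {
				# SOH found... drop any text before it (so a stray ETX
				# earlier on the line cannot match) and set output filename...
				$0 = substr($0, soh)
				soh = 1
				f = "file" ++nof ".txt"
				continue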
			} else {# SOH not found...
				next
			}
		}
}' file

If you want to use this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
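Since the sample input and the wanted output disagree on the number of sections, it may be worth counting the markers before splitting anything. A minimal sketch, assuming the markers are the literal strings SOH and ETX as in the sample; count_markers is a name made up for this illustration:

```shell
# count_markers [FILE...]: print how many times SOH and ETX occur.
# Hypothetical helper; assumes the markers are the literal strings SOH and ETX.
count_markers() {
  awk '
  {
    # gsub() returns the number of substitutions; "&" puts the match back,
    # so the line itself is left unchanged.
    soh += gsub(/SOH/, "&")
    etx += gsub(/ETX/, "&")
  }
  END { printf("SOH=%d ETX=%d\n", soh, etx) }' "$@"
}

# Example run on the sample data from the first post.
tmp=${TMPDIR:-/tmp}/markers.$$
printf '%s\n' 'SOH' 'bla bla bla' 'bla bla bla' 'ETX                SOH' \
    'bla bla bla' 'ETX' 'SOH' 'bla bla bla' 'ETX' > "$tmp"
count_markers "$tmp"    # prints: SOH=3 ETX=3
rm -f "$tmp"
```

Unequal counts would point to a section that is never closed, or to an ETX with no SOH before it, which are exactly the cases the question leaves open.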
These 2 Users Gave Thanks to Don Cragun For This Post:
# 9  
Old 10-08-2014
Hello,

Sorry for the poor description. The OS is HP-UX and the shell I use is /sbin/sh, the default shell on HP-UX. I have a big file that contains many tagged blocks of text:

Code:
SOH
...
ETX
SOH
...
ETX
.
.
.
ETX

I need to split this one big file into separate text files, with each tagged block in its own file. Thanks for your help. I'm going to try it.
# 10  
Old 10-08-2014
/sbin/sh is not the default shell of HP-UX, unless HP has recently changed its strategy. The default shell is /usr/bin/sh, which is a POSIX-compatible shell, or /usr/bin/ksh.
As its directory suggests, /sbin/sh is not intended for users other than root: it is a basic Bourne shell and nothing more, compiled statically so that root can work in maintenance mode with only / mounted.
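To see which shell an account is actually given at login, the seventh field of its passwd entry can be inspected. A minimal sketch, assuming the account is a local one listed in /etc/passwd; login_shell is a made-up helper name:

```shell
# login_shell USER [PASSWD_FILE]: print the seventh field (the login shell)
# of USER's entry. Hypothetical helper for illustration; systems that keep
# accounts in NIS or LDAP will not list every user in /etc/passwd.
login_shell() {
  awk -F: -v user="$1" '$1 == user { print $7; exit }' "${2:-/etc/passwd}"
}

# Example: show the login shell recorded for root on this machine.
login_shell root
```

Comparing that value with the output of echo "$SHELL" in the session where the scripts are run should show whether /sbin/sh or /usr/bin/sh is really in use.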
# 11  
Old 10-08-2014
Hi.

Standard HPUX utility csplit is designed for this. For example, assuming a well-formatted file, this script:
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate context splitting, csplit.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f "$C" ] && "$C" csplit

# Remove debris from previous runs.
rm -f xx*

FILE=${1-data1}

pl " Input data file $FILE:"
cat "$FILE"

pl " Results:"
# -z only in GNU/Linux.
# csplit -k -z "$FILE" '/^SOH/' '{*}'
csplit -k "$FILE" '/^SOH/' '{*}'
ls -lgo xx*

EXAMPLE=xx02
pl " Content for example file $EXAMPLE:"
cat "$EXAMPLE"

exit 0

will produce:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: HP-UX, B.11.11, 9000/785
Distribution        : GenericSysName [HP Release B.11.11] (see /etc/issue)
GNU bash 4.2.37
csplit - ( /usr/bin/csplit Nov 14 2000 )

-----
 Input data file data1:
SOH
Stuff in first section.
ETX
SOH
Stuff in second section.
ETX
SOH
Stuff in last section.
ETX

-----
 Results:
0
32
33
31
-rw-r--r--   1       0 Oct  8 18:29 xx00
-rw-r--r--   1      32 Oct  8 18:29 xx01
-rw-r--r--   1      33 Oct  8 18:29 xx02
-rw-r--r--   1      31 Oct  8 18:29 xx03

-----
 Content for example file xx02:
SOH
Stuff in second section.
ETX
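
Without the GNU-only -z option, csplit leaves the empty piece xx00 seen above whenever the file starts with SOH. A portable sketch for cleaning that up afterwards, assuming the default xx prefix; drop_empty_pieces is a made-up name and the scratch directory is only there for the demonstration:

```shell
# drop_empty_pieces [PREFIX]: delete zero-length csplit output files.
# Hypothetical helper; PREFIX defaults to the csplit default "xx".
drop_empty_pieces() {
  for piece in "${1:-xx}"*; do
    # -s is true only for files that exist and are larger than zero bytes.
    [ -s "$piece" ] || rm -f "$piece"
  done
}

# Example: split a small well-formatted file in a scratch directory,
# then remove the empty xx00.
work=${TMPDIR:-/tmp}/csplit_demo.$$
mkdir -p "$work" && cd "$work" || exit 1
printf '%s\n' SOH 'Stuff in first section.' ETX \
              SOH 'Stuff in second section.' ETX > data1
csplit -k data1 '/^SOH/' '{*}'
drop_empty_pieces
ls xx*
```

After this the numbering starts at xx01, so any later renaming step has to take that offset into account.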

Best wishes ... cheers, drl
These 2 Users Gave Thanks to drl For This Post:
# 12  
Old 10-08-2014
Hi drl,
Unfortunately, if you look at the sample data shown in the 1st post in this thread, the data:
Code:
SOH
bla bla bla
bla bla bla
ETX                SOH
bla bla bla
ETX
SOH
bla bla bla
ETX

is NOT what you call well-formatted. The spaces between the ETX and the SOH on the line that contains both (marked in red in the original post) apparently are not supposed to appear in any of the output files.
# 13  
Old 10-08-2014
Hi, Don.

Yes, I noticed that in the original post, but it was re-posted in #9, so that was the form I used.

I certainly agree that csplit as written in my script would not handle the original data, so I leave it up to sembii to decide which format is correct, which, in turn, would suggest the appropriate solution ... cheers, drl
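
If the layout from the first post is the real one, one option is to normalize it first so that every SOH starts a line, and only then hand it to csplit. A rough sketch, assuming at most one ETX ... SOH pair per line and that whatever lies between the two markers can be discarded; normalize_markers is a made-up name:

```shell
# normalize_markers [FILE...]: put ETX and the following SOH on separate
# lines, dropping whatever sits between them. Hypothetical helper; assumes
# at most one "ETX ... SOH" pair per input line.
normalize_markers() {
  # The backslash-newline in the replacement is the portable way to make
  # sed insert a line break.
  sed 's/ETX.*SOH/ETX\
SOH/' "$@"
}

# Example run on the data from the first post.
printf '%s\n' 'SOH' 'bla bla bla' 'bla bla bla' 'ETX                SOH' \
    'bla bla bla' 'ETX' 'SOH' 'bla bla bla' 'ETX' | normalize_markers
```

The normalized stream can be piped straight into csplit by giving - as the file operand, for example normalize_markers infile | csplit -k - '/^SOH/' '{*}'.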
# 14  
Old 10-08-2014
How about this:

Code:
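awk '
# A multi-character RS is an extension (gawk and mawk accept it);
# POSIX leaves it unspecified and some awks use only the first character.
length {
  # Drop everything after the first ETX in this record.
  sub("ETX.*", "ETX")
  # RS consumed the leading SOH, so put it back when printing.
  f = "file" ++n ".txt"
  print RS $0 > f
  close(f)
}' RS="SOH" infile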

input:
Code:
SOH
bla bla bla
bla bla bla
ETX IGNORE THIS    SOH
bla bla bla
ETX
SOH
bla bla bla
ETX

output:
Code:
----file1.txt---
SOH
bla bla bla
bla bla bla
ETX

----file2.txt---
SOH
bla bla bla
ETX

----file3.txt---
SOH
bla bla bla
ETX
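
Whichever of the approaches in this thread is used, the result can be sanity-checked by confirming that every output file starts with SOH and ends with ETX. A small sketch under that assumption; check_sections is a made-up name and the two files it is run on here are created only for the demonstration:

```shell
# check_sections FILE...: report every file whose first line does not start
# with SOH or whose last line does not end with ETX. Returns non-zero if
# any file fails. Hypothetical helper for illustration.
check_sections() {
  status=0
  for f in "$@"; do
    first=$(sed -n '1p' "$f")
    last=$(sed -n '$p' "$f")
    case $first in
      SOH*) ;;
      *) echo "$f: does not start with SOH"; status=1 ;;
    esac
    case $last in
      *ETX) ;;
      *) echo "$f: does not end with ETX"; status=1 ;;
    esac
  done
  return $status
}

# Example: one complete section and one that was cut off before ETX.
work=${TMPDIR:-/tmp}/check_demo.$$
mkdir -p "$work" || exit 1
printf '%s\n' SOH 'bla bla bla' ETX > "$work/file1.txt"
printf '%s\n' SOH 'bla bla bla'     > "$work/file2.txt"
check_sections "$work"/file*.txt || echo "at least one file is incomplete"
```

Here only file2.txt is reported, because its last line is not ETX.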

This User Gave Thanks to Chubler_XL For This Post: