String pattern matching and position (UNIX for Dummies Questions & Answers). Post 302915209 by Scrutinizer on Monday 1st of September 2014 01:11:08 PM (GMT -4)
Indeed, it is best to keep the original file unchanged. Awk can easily be adjusted to work with the original file, for example by adapting Jotne's suggestion:

Code:
awk -F"[Tt][Rr]" '{gsub(/\n/,x); for (i=1;i<NF;i++) {p+=length($i); print ++a, p+1+(a-1)*2}}' RS=± file

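RS=± sets the record separator to a character that does not occur in the file, so the whole file is read as a single record. Because of that, this may only work with gawk and possibly mawk, which have very generous line-length limits.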
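As a sketch (not Jotne's original code), the same arithmetic can be written without relying on a record-separator character that never occurs in the file: the lines are joined in memory with their newlines dropped, split() does the field splitting, and the hard-coded 2 becomes the pattern width w. The helper name field_split_positions is hypothetical, and the pattern is assumed to be non-empty and free of regex metacharacters, since split() treats a multi-character separator as a regular expression.

```shell
# field_split_positions PATTERN [FILE...]
# Hypothetical helper (sketch): prints "count position" for each
# non-overlapping, case-insensitive match of PATTERN, counting characters
# with the newlines removed.  Assumes PATTERN is non-empty and contains no
# regex metacharacters.
field_split_positions() {
  pat=$1; shift
  awk -v pattern="$pat" '
    { text = text tolower($0) }        # $0 never includes the newline
    END {
      pattern = tolower(pattern)
      w = length(pattern)
      n = split(text, field, pattern)  # n fields means n - 1 matches
      p = 0                            # characters in the fields seen so far
      for (i = 1; i < n; i++) {
        p += length(field[i])
        # i - 1 earlier matches, each w characters wide, precede this one
        print i, p + 1 + (i - 1) * w
      }
    }' "$@"
}
```

For example, `printf 'at\nrx\n' | field_split_positions tr` should print `1 2`, because the match spans the line break. Like the RS=± version, it keeps the whole file in memory.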

A Perl solution could look like this:
Code:
perl -0777 -ne 's/\n//g; print (++$c," ",(pos() +1 -2)."\n") while /tr/gi' file

But while it may be even less likely than awk to run into line-length limitations, it reads the entire file into memory just like the awk approach, which with 200M records means a footprint of at least 200 MB...

I came up with an approach similar to Don's, but it uses index() rather than match(), and it works for patterns of any length:

Code:
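awk -v pattern="tr" '
BEGIN {
  pattern=tolower(pattern)    # input is lowercased too, so matching is case-insensitive
  pat_width=length(pattern)
}

{
  curline=tolower($0)
  # prepend the unmatched tail of the previous line(s), so that a match
  # spanning a line break is found
  chunk=rest curline
  # offset: number of characters (newlines not counted) before chunk starts
  offset=basepos - length(rest)
  while (pos=index(chunk,pattern)) {
    print ++count, offset + pos
    # resume searching after the end of this match (non-overlapping)
    chunk=substr(chunk, pos+pat_width)
    offset+=pos+pat_width-1
  }
  # carry over at most pat_width-1 trailing unmatched characters
  start=length(chunk) - pat_width + 2
  if (start < 1) start=1
  rest=substr(chunk, start)
  basepos+=length(curline)
}
' file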

Also note that all the approaches so far look for the next match only AFTER the end of the previous match, so overlapping occurrences are missed.

This next approach also finds occurrences of the pattern that overlap a previous match:

Code:
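awk -v pattern="trt" '
BEGIN {
  pattern=tolower(pattern)    # input is lowercased too, so matching is case-insensitive
  pat_width=length(pattern)
}

{
  curline=tolower($0)
  # prepend the last pat_width-1 characters seen so far, so that a match
  # spanning a line break is found
  full=rest curline
  chunk=full
  offset=basepos - length(rest)
  while (pos=index(chunk,pattern)) {
    print ++count, offset + pos
    # resume searching one character after the START of this match,
    # so overlapping occurrences are reported as well
    chunk=substr(chunk, pos+1)
    offset+=pos
  }
  start=length(full) - pat_width + 2
  if (start < 1) start=1
  rest=substr(full, start)
  basepos+=length(curline)
}
' file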

If we take the last part of Don's example, trtRTrTR, and search for "trt", this approach finds 3 matches (at positions 1, 3 and 5 within that string), while the others find only two (at 1 and 5).

Output:
Code:
1 1
2 3
3 5
4 893
5 895
6 897

Whereas the previous approach (also using the pattern "trt") finds:
Code:
1 1
2 5
3 893
4 897
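To make the difference between the two outputs concrete, here is a small sketch that searches a single string and differs only in how far it advances after each match: by the pattern width (non-overlapping, like the first approaches) or by one character (overlapping, like the last one). The helper name match_positions and its STEP argument are hypothetical, and PATTERN is assumed to be non-empty.

```shell
# match_positions STRING PATTERN STEP
# Hypothetical helper: prints the 1-based start of every case-insensitive
# occurrence of PATTERN in STRING.  After a match at position pos, the
# search restarts at pos + STEP:
#   STEP = length(PATTERN) -> non-overlapping matches
#   STEP = 1               -> overlapping matches
match_positions() {
  awk -v s="$1" -v pattern="$2" -v step="$3" 'BEGIN {
    s = tolower(s); pattern = tolower(pattern)
    offset = 0                      # characters of the string already skipped
    while (pos = index(s, pattern)) {
      print offset + pos
      s = substr(s, pos + step)
      offset += pos + step - 1
    }
  }'
}
```

`match_positions trtRTrTR trt 3` prints 1 and 5, while `match_positions trtRTrTR trt 1` prints 1, 3 and 5, matching the first lines of the two outputs above.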
