Post 302914987 by biowizz on Saturday 30th of August 2014 02:16:23 AM
String pattern matching and position

I am not an expert with Linux, but following various posts on this forum, I have been trying to write a script to match a pattern of characters occurring together in a file.
My file has approximately 200 million characters (upper and lower case), with about 50 characters per line. I have merged all the lines together to make it one line using

Code:
tr -d '\n' < input.txt > oneLineInput.txt

I now have all the characters in my file on a single line, without spaces.

I am trying to count the number of times the specific characters occur together. For example, in the file below

Code:
IamTryingtobuildascriptfortrestingthetyposinmysentence

I am trying to look for the pattern 'tr' that occurs in the sentence. The script I have now is

Code:
grep -o -i -e tr oneLineInput.txt | sort | uniq -c

The above script works perfectly fine for a small file, but when I try to run it on my actual file with more than 200 million characters, it takes ages to finish the task (I lost patience and did not check the total time taken).

Is there a way I can optimize the code?
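
One possible direction, offered as a sketch rather than a measured fix: since 'tr' is a literal string and not a regular expression, grep can be told so with -F, and running it in the C locale may avoid multibyte character handling. The function name count_pattern is hypothetical and used only for illustration, and whether this is fast enough on a 200-million-character line depends on the grep implementation.

```shell
# count_pattern PATTERN FILE   (hypothetical helper name, for illustration only)
# Prints one line per case variant of PATTERN found in FILE, with its count.
#   -F  treat PATTERN as a fixed string, not a regular expression
#   -i  ignore case, as in the original command
#   -o  print each match on its own line
# LC_ALL=C makes grep and sort work on plain bytes.
count_pattern() {
    LC_ALL=C grep -o -i -F -e "$1" -- "$2" | LC_ALL=C sort | uniq -c
}

# Usage (assumes the merged file from above):
#   count_pattern tr oneLineInput.txt
```

If only the total is needed and not the breakdown by upper and lower case, replacing `sort | uniq -c` with `wc -l` skips the sort entirely.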

Next, I have been trying to get the position of each match. For example, in the above example file, 'tr' starts at the 4th and 27th positions. I just want the numbers as output.

Is it possible?
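
A minimal sketch of one way to get the positions, assuming GNU grep: with -o and -b together it prints the byte offset of each match, counted from 0. Adding 1 gives the 1-based position used in the example above. The function name match_positions is hypothetical. The offsets only equal character positions if the file is a single line of single-byte characters, as oneLineInput.txt is described to be.

```shell
# match_positions PATTERN FILE   (hypothetical helper name, for illustration only)
# Prints the 1-based start position of every case-insensitive match.
# Assumes GNU grep, where -b combined with -o reports the offset of the
# match itself, in the form "OFFSET:MATCH" with OFFSET counted from 0.
match_positions() {
    LC_ALL=C grep -o -b -i -F -e "$1" -- "$2" | awk -F: '{ print $1 + 1 }'
}

# Usage (assumes the merged file from above):
#   match_positions tr oneLineInput.txt
```

Note that grep -o reports non-overlapping matches only. That makes no difference for 'tr', but a pattern such as 'aa' in 'aaa' would be reported once, not twice.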

Thank you.
 
