Hi!
I'm trying to figure out how to find words with exactly X doubles and no more. I'm searching a dictionary (one word per line). For instance, if you want to find words containing only one pair of double letters, you could do something like this:
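The command from the original post is not preserved here. A minimal sketch of the kind of two-step grep pipeline being described, assuming POSIX basic regular expressions with backreferences (the function name one_double is made up for illustration):

```shell
# Hypothetical helper, not the poster's original command.
# Reads one word per line on stdin and prints words with exactly one double.
one_double() {
    # Step 1: keep lines containing at least one doubled character.
    # Step 2: drop lines containing a second double somewhere after the first.
    grep '\(.\)\1' | grep -v '\(.\)\1.*\(.\)\2'
}

printf 'cat\nbook\ncoffee\n' | one_double   # prints "book"
```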
That'll get rid of words with two, or more, doubles. But when you want to search for two or three sets of doubles, it gets a bit unwieldy.
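Again the original commands are missing, so this is only a guess at their shape. Assuming the same two-step approach, matching exactly two doubles needs one more group in each pattern, which shows how quickly it grows (two_doubles is a made-up name):

```shell
# Hypothetical helper, not the poster's original command.
# Reads one word per line on stdin and prints words with exactly two doubles.
two_doubles() {
    # Keep lines with at least two doubles, then drop lines with three or more.
    grep '\(.\)\1.*\(.\)\2' | grep -v '\(.\)\1.*\(.\)\2.*\(.\)\3'
}

printf 'book\ncoffee\nbookkeeper\n' | two_doubles   # prints "coffee"
```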
And so on...
It seems to me that there must be a way to specify the max number of doubles in a single regex, but I cannot figure out how. I've found a number of pages online that talk about finding doubles, but none of them mention how to limit them to only the desired amount. I thought maybe a negative backreference could do it, but either I'm writing it wrong or it just doesn't work. I still get words with more than X doubles.
And I tried a bunch of other stuff, but can't figure it out, so I'm turning to you all.
To find words that have at least one double but not more than 3:
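The command itself is missing from this copy of the post. The following is a reconstruction, assuming it consisted of the four sed commands explained later in the thread (/\(.\)\1/!d, s//&/4, t and p) run with the -n option. The wrapper function name is made up, and the commands are passed as separate -e arguments so the label-less t is not misread on seds that treat the rest of the line as a label:

```shell
# Reconstruction under the assumptions stated above, not the verbatim original.
# Reads one word per line on stdin; prints words with 1 to 3 doubles.
one_to_three_doubles() {
    # /\(.\)\1/!d  delete lines that contain no double at all
    # s//&/4       replace the 4th double with itself (succeeds only if it exists)
    # t            if that substitution succeeded, skip to the end (no print)
    # p            otherwise print the line
    sed -n -e '/\(.\)\1/!d' -e 's//&/4' -e 't' -e 'p'
}

printf 'cat\nbook\nbookkeeper\naabbccdd\n' | one_to_three_doubles
# prints "book" and "bookkeeper"
```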
The number can be parameterized with a shell variable.
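One way this might look, as a sketch only: the maximum is passed as the first argument (a hypothetical interface), and the occurrence number given to the s command is that maximum plus one. The s expression is double-quoted so the shell can expand the arithmetic.

```shell
# Hypothetical helper: print words (one per line on stdin) that contain
# at least one double and at most $1 doubles.
doubles_up_to() {
    # The substitution targets occurrence number max+1; if it exists,
    # t skips the print, so only lines with 1..max doubles are printed.
    sed -n -e '/\(.\)\1/!d' -e "s//&/$(($1 + 1))" -e 't' -e 'p'
}

printf 'cat\nbook\ncoffee\nbookkeeper\n' | doubles_up_to 2
# prints "book" and "coffee"
```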
Regards,
Alister
Hey Alister!
I don't really know sed very well, but let me see if I can figure this out. I think it's worth pointing out that I have no background with this stuff. I'm just learning on my own, in my spare time, such as it is.
In the first regex, it looks like you're finding any instance of doubles, then saying do not delete, presumably so the pattern gets passed to the second regex?
In the second regex, it looks like you're saying substitute "nothing" with the found pattern. I'm guessing the "4" is a quantifier? And the "t"? I have no idea.
I feel I have a vague notion of what you're doing, but can't entirely parse the two regexes. But, here we are with two (three?) regexes again. Is it really not possible to do this in a single grep ERE, or PCRE?
/\(.\)\1/!d :
Delete the pattern space for every line that does not match the pattern. The ! negates the address (the pattern), not the action d. So for any line that does not contain at least one pair of consecutive identical characters, the rest of the script is skipped and the next line from the input stream is loaded into the pattern space.
s//&/4:
For every line with at least one double (the lines that survived the previous command), try to substitute the 4th occurrence of the last-used pattern with the matched string itself. The empty regex // means "reuse the last regular expression", which here is the doubles pattern from the first command; it does not mean "nothing". Remember that the 4 selects the 4th match of the pattern, not the 4th repeat of the string that matched earlier.
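To see the occurrence flag in isolation, here is a small sketch that brackets the Nth double instead of replacing it with itself. The helper name and the bracket marking are assumptions made for illustration; the p flag on s prints the line only when the substitution succeeds.

```shell
# Hypothetical helper: bracket the Nth double ($1) of each line on stdin,
# printing only lines that actually have an Nth double.
mark_nth_double() {
    # // reuses the regex from the address; ${1}p means "Nth occurrence, then print".
    sed -n -e '/\(.\)\1/!d' -e "s//[&]/${1}p"
}

printf 'bookkeeper\n' | mark_nth_double 3   # prints "bookk[ee]per"
printf 'bookkeeper\n' | mark_nth_double 4   # prints nothing: there is no 4th double
```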
t:
That's a branch (flow-control) command. It says that if a substitution has succeeded since the last input line was read, go to the end of the script (since no label is given). This ensures that a line with 4 or more pairs of doubles is not printed (the -n option suppresses the automatic print).
p:
And, if the line manages to cross that last barrier, just print it. This way you are assured that the line has from 1 to 3 double pairs.
Oh, seems like a long time since I used sed in my scripts.
------
@alister: Good one.
Last edited by elixir_sinari; 06-24-2013 at 02:12 PM..