Hello all- New to this forum, and relatively new to using grep at the Terminal command line to work with regular expressions. I've got a background in math and some programming experience, so it's not been too difficult to learn the basics of searching through my word lists for particular types of matches. Now I've got a new problem that I don't know how to solve, and I'm not sure it's even the sort of thing that these tools can let me do.
I've got a long list of words. I would like to be able to find, for example, all of the sets of words within it that differ by exactly one letter. I know I can find all the three letter words that match the pattern
but would I manually have to do this for
and
and
and so on, if I wanted all three-letter strings in the list that are exactly one letter off (in the last position)? I would need then to do
,
and so on, for all the strings that were the same except for the middle position. Is there a smart way to do this? I would also eventually want to find sets of words that are the same except for exactly two letters, exactly three letters, and so on.
Any help would be greatly appreciated. Thanks for reading.
Not sure I understand. Unfortunately you don't mention your OS nor your shell version. Given you're using a recent shell that has the braceexpand (-B) option, would this help:
Hi dtalvacchio,
Welcome to the UNIX & Linux Forum. I agree with RudiC... I'm not sure that I understand what you're trying to do. In addition to telling us what operating system and shell you're using (i.e., output from the commands: uname -a (or, if you consider the node name of your system private, uname -srvp) and echo $SHELL), please show us a sample word list and the output you're trying to produce from that list.
I've got a long list of words. I would like to be able to find, for example, all of the sets of words within it that differ by exactly one letter.
What exactly do you mean by that? It is obvious that "abc" and "abd" differ by one character but what about "abc" and "abcd"? What about "abbc"? And what about reversals? Are "dog" and "god" different by two characters per your requirement or are they identical?
Plus, either your requirement is trivial: search for all words where one character is arbitrary - or you will have difficulties because two words each one character off another word (and hence in the same set) will not be one character off each other in every case: "abd" is one character off "abc" and "xbc" is also one character off "abc" but "abd" and "xbc" differ in two characters.
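To make the distinction above concrete, here is a minimal sketch that counts the positions at which two equal-length words differ. The function name hamming is made up for illustration, and treating words of unequal length as "not comparable" is an assumption, since the thread has not settled how "abc" and "abcd" should be handled.

```shell
# hamming WORD1 WORD2
# Hypothetical helper: prints the number of positions at which the two
# words differ, or "n/a" when their lengths differ (an assumption).
hamming() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    if (length(a) != length(b)) { print "n/a"; exit }
    d = 0
    for (i = 1; i <= length(a); i++)
      if (substr(a, i, 1) != substr(b, i, 1)) d++   # same position, different letter
    print d
  }'
}
```

With this definition, hamming abc abd and hamming abc xbc both print 1, while hamming abd xbc prints 2, so "one character off" is not transitive. Under the same definition "dog" and "god" differ in two positions.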
I hope this helps.
bakunin
Last edited by bakunin; 09-07-2015 at 09:35 AM..
Reason: Forgot to ask another question.
Hello, thank you all for your replies, and apologies for the delay in responding in kind. Let me try to address all of your questions/comments. First, Don, hopefully the attached screenshot answers the bit about the OS and shell.
Now if I can clarify the problem I'm trying to solve. For the sake of simplicity let's say we're only dealing with a long list of three-letter words. What I am looking for are sets of words that share exactly (as opposed to at least) two letters in the same position. So if we roughly imagine a list of all three-letter words that are in some reliable English-language dictionary, one such set that would result from the search would be:
ACE
ADE
AGE
ALE
APE
ARE
ATE
AVE
AWE
AXE
AYE
And what I want is all such sets, for every permutation of the two fixed letters. Does that clarify the problem?
I'm afraid you'll have to search (e.g. with grep) for every single combination of characters; for the set above it would be grep "A.E" or, to be more precise, grep "A[[:alpha:]]E". Writing an awk script might mean less work for the OS, since it runs just one command, but it would still need to open and close many, many files...