Comparing lines within a word list


 
# 1  
Old 09-06-2015

Hello all- New to this forum, and relatively new to using grep at the Terminal command line to work with regular expressions. I've got a background in math and some programming experience, so it's not been too difficult to learn the basics of searching through my word lists for particular types of matches. Now I've got a new problem that I don't know how to solve, and I'm not sure it's even the sort of thing that these tools can let me do.

I've got a long list of words. I would like to be able to find, for example, all of the sets of words within it that differ by exactly one letter. I know I can find all the three letter words that match the pattern
Code:
/AA./

but would I manually have to do this for
Code:
/AB./

and
Code:
/AC./

and
Code:
/AD./

and so on, if I wanted all three-letter strings in the list that are exactly one letter off (in the last position)? I would need then to do
Code:
/A.A/

,
Code:
/A.B/

and so on, for all the strings that were the same except for the middle position. Is there a smart way to do this? I would also eventually want to find sets of words that are the same except for exactly two letters, exactly three letters, and so on.

Any help would be greatly appreciated. Thanks for reading.

Best-
Dominick
# 2  
Old 09-06-2015
Not sure I understand. Unfortunately you mention neither your OS nor your shell version. Assuming you're using a recent shell that has the braceexpand (-B) option, would this help:
Code:
for i in A{A..Z}.; do echo grep "$i" file; done
grep AA. file
grep AB. file
grep AC. file
grep AD. file
.
.
.
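
A minimal sketch of how the echoed commands above could actually be run, assuming a POSIX shell and a word list with one uppercase word per line. The function name print_last_letter_sets and the file name words.txt are hypothetical, not from the thread. The letters are spelled out so the loop does not depend on brace expansion, and grep -x is used so that a pattern only matches whole lines.

```shell
# Hypothetical helper: for a given first letter, print every set of two or
# more three-letter words that differ only in the last position.
# Usage: print_last_letter_sets words.txt A
print_last_letter_sets() {
    file=$1
    first=$2
    for second in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z; do
        # -x matches whole lines only, so "AB." cannot match inside a longer word
        matches=$(grep -x "${first}${second}[[:alpha:]]" "$file" || true)
        # Count the non-empty lines in the result
        count=$(printf '%s\n' "$matches" | grep -c . || true)
        # A single match is not a "set", so require at least two words
        if [ "$count" -ge 2 ]; then
            printf '%s%s.: %s\n' "$first" "$second" \
                "$(printf '%s' "$matches" | tr '\n' ' ')"
        fi
    done
}
```

This still runs one grep per pattern, so it reads the file 26 times per first letter; that is tolerable for a word list but scales poorly to longer words.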

This User Gave Thanks to RudiC For This Post:
# 3  
Old 09-06-2015
Hi dtalvacchio,
Welcome to the UNIX & Linux Forum. I agree with RudiC... I'm not sure that I understand what you're trying to do. In addition to telling us what operating system and shell you're using (i.e., output from the commands: uname -a (or, if you consider the node name of your system private, uname -srvp) and echo $SHELL), please show us a sample word list and the output you're trying to produce from that list.
This User Gave Thanks to Don Cragun For This Post:
# 4  
Old 09-06-2015
Quote:
Originally Posted by dtalvacchio
I've got a long list of words. I would like to be able to find, for example, all of the sets of words within it that differ by exactly one letter.
What exactly do you mean by that? It is obvious that "abc" and "abd" differ by one character but what about "abc" and "abcd"? What about "abbc"? And what about reversals? Are "dog" and "god" different by two characters per your requirement or are they identical?

Plus, either your requirement is trivial: search for all words where one character is arbitrary - or you will have difficulties because two words each one character off another word (and hence in the same set) will not be one character off each other in every case: "abd" is one character off "abc" and "xbc" is also one character off "abc" but "abd" and "xbc" differ in two characters.
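
To make the point above concrete, here is a small sketch that counts the positions at which two words differ. It assumes "differ by one letter" means equal length and a position-by-position comparison, which is only one possible reading of the requirement; the function name hamming is a placeholder.

```shell
# hamming WORD1 WORD2: print the number of positions at which two
# equal-length words differ, or "-" if their lengths differ.
hamming() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (length(a) != length(b)) { print "-"; exit }
        d = 0
        for (i = 1; i <= length(a); i++)
            if (substr(a, i, 1) != substr(b, i, 1)) d++
        print d
    }'
}
```

Under that reading, hamming abd abc and hamming xbc abc both print 1, while hamming abd xbc prints 2, so "one letter apart" is not a transitive relation. Likewise hamming dog god prints 2, and hamming abc abcd prints "-" because the lengths differ.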

I hope this helps.

bakunin

Last edited by bakunin; 09-07-2015 at 09:35 AM.. Reason: Forgot to ask another question.
# 5  
Old 09-07-2015
Hi.

Possibly of some help:
Code:
NAME
       agrep - search a file for a string or regular expression, with
       approximate matching capabilities

DESCRIPTION
       agrep searches the input filenames (standard input is the default, but
       see a warning under LIMITATIONS) for records containing strings which
       either exactly or approximately match a pattern.  A record is by
       default a line, but it can be defined differently using the -d option
       (see below).  Normally, each record found is copied to the standard
       output.  Approximate matching allows finding records that contain the
       pattern with several errors including substitutions, insertions, and
       deletions.  For example, Massechusets matches Massachusetts with two
       errors (one substitution and one insertion).  Running agrep -2
       Massechusets foo outputs all lines in foo containing any string with at
       most 2 errors from Massechusets.

-- man agrep, q.v.

The agrep code is in repositories for CentOS, Debian, Ubuntu, OpenSuSE, etc.

Best wishes ... cheers, drl
# 6  
Old 09-09-2015
Hello, thank you all for your replies, and apologies for the delay in responding in kind. Let me try to address all of your questions/comments. First, Don, hopefully the attached screenshot answers the bit about the OS and shell.

Now if I can clarify the problem I'm trying to solve. For the sake of simplicity let's say we're only dealing with a long list of three-letter words. What I am looking for are sets of words that share exactly (as opposed to at least) two letters in the same position. So if we roughly imagine a list of all three-letter words that are in some reliable English-language dictionary, one such set that would result from the search would be:

ACE
ADE
AGE
ALE
APE
ARE
ATE
AVE
AWE
AXE
AYE

And what I want is all such sets, for every permutation of the two fixed letters. Does that clarify the problem?

Thanks very much-- Dominick
[Attached image: os-shell.jpg]
# 7  
Old 09-09-2015
I'm afraid you'll have to search (e.g. with grep) for every single combination of chars; for the set above it would be grep "A.E" or, to be more precise, grep "A[[:alpha:]]E". Writing an awk script might cause less effort for the OS, as it runs just one command, but it would still need to open and close many, many files...
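
One possible alternative, offered as a sketch and not as something proposed in the thread: read the list once and, for each word, record every pattern obtained by replacing a single position with ".". Words that end up under the same pattern agree everywhere except that one position. This assumes one word per line; the function name one_letter_sets is hypothetical.

```shell
# one_letter_sets FILE: print every set of two or more words that are
# identical except for exactly one position, keyed by pattern (e.g. A.E).
one_letter_sets() {
    awk '
    {
        word = $0
        if (seen[word]++) next        # skip duplicate entries in the list
        n = length(word)
        for (i = 1; i <= n; i++) {
            # Key is the word with position i replaced by "."
            key = substr(word, 1, i - 1) "." substr(word, i + 1)
            count[key]++
            members[key] = members[key] " " word
        }
    }
    END {
        # Only patterns shared by at least two words form a set
        for (key in count)
            if (count[key] >= 2)
                print key ":" members[key]
    }' "$1" | LC_ALL=C sort
}
```

Because duplicates are skipped, any two words listed under one pattern differ in exactly that one position. The word length is taken from each line, so the same pass also covers lists that mix three-letter and longer words. Memory use grows with the number of words times their length.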
 