Finding similar strings between two files


 
# 1  
Old 04-09-2013

Hi,
I have a file1 like this:
Code:
ABAT
ABCA1
ABCC1
ABCC5
ABCC8
ABCE1
ABHD2
ABL1
CAMTA1
ACBD3
ACCN1

And I have a second file like this:
Code:
chr19   46118590        46119564        MACS_peak_1499  3100.00 chr19   46122009        46148405        CYP2B7P1        -2445
chr1    7430312 7430990 MACS_peak_15    2848.88 chr1    6767970 7752353 CAMTA1  678
chr8    129223169       129223828       MACS_peak_3189  2087.99 chr8    129231543       129231616       MIR1208 -7715

I want a script that goes through the 9th column of my second file, finds the strings that also appear in my first file, and appends an extra column to the end of each line containing one of those strings. The extra string should be "estrogen regulated". For example, my output should be a file like this:
Code:
chr19   46118590        46119564        MACS_peak_1499  3100.00 chr19   46122009        46148405        CYP2B7P1        -2445
chr1    7430312 7430990 MACS_peak_15    2848.88 chr1    6767970 7752353 CAMTA1  678         estrogen regulated
chr8    129223169       129223828       MACS_peak_3189  2087.99 chr8    129231543       129231616       MIR1208 -7715

I'm not sure whether I have made this clear. Please let me know if you have more questions.
Thanks in advance

Last edited by a_bahreini; 04-09-2013 at 08:03 PM..
# 2  
Old 04-09-2013
You need to explicitly define what "similar" means for this assignment.
# 3  
Old 04-09-2013
I meant exactly the same string. In my example, CMTA1 in the column 9 of the second line in my second file is exactly the same as CMTA1 in my first file. Please let me know if that makes sense to you.
# 4  
Old 04-09-2013
Quote:
Originally Posted by a_bahreini
I meant exactly the same string. In my example, CMTA1 in the column 9 of the second line in my second file is exactly the same as CMTA1 in my first file. Please let me know if that makes sense to you.
It makes sense, but it doesn't match your example. In your example, you have CAMTA1 in column 9 of the second file, but there is no line with that value in the first file. CMTA1 is not the same as CAMTA1.
# 5  
Old 04-09-2013
Oh, I'm sorry. I corrected my post.
# 6  
Old 04-09-2013
OK. I can't really tell from your example, but it looks like you intend to have tabs as field separators. I will assume that the tabs were changed to spaces as part of a copy and paste process. Assuming that it is true, the following should do what you want:
Code:
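awk '
# First file (FNR == NR): remember each name in column 1 as an array key.
FNR == NR {
        a[$1]
        next
}
# Second file: if column 9 is a known name, append the label as a new last field.
$9 in a {
        $(NF + 1) = "estrogen regulated"
}
# Print every line, modified or not.
1' OFS="\t" file1 file2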

As always, if you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of awk.
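To try the command above without touching real data, here is a minimal sketch that builds tab-separated sample files from a subset of the thread's example and runs the same awk program on them. The temporary directory and the output name annotated.txt are assumptions made for illustration only; they are not part of the original posts.

```shell
# Work in a throwaway directory (hypothetical location, for illustration only).
workdir=$(mktemp -d)

# file1: one gene name per line (subset of the example list).
printf '%s\n' ABAT ABCA1 CAMTA1 ACBD3 > "$workdir/file1"

# file2: tab-separated records with the gene name in column 9.
printf 'chr19\t46118590\t46119564\tMACS_peak_1499\t3100.00\tchr19\t46122009\t46148405\tCYP2B7P1\t-2445\n' > "$workdir/file2"
printf 'chr1\t7430312\t7430990\tMACS_peak_15\t2848.88\tchr1\t6767970\t7752353\tCAMTA1\t678\n' >> "$workdir/file2"

# Same program as in the post: load names from file1, then label matching lines of file2.
awk '
FNR == NR {
        a[$1]
        next
}
$9 in a {
        $(NF + 1) = "estrogen regulated"
}
1' OFS="\t" "$workdir/file1" "$workdir/file2" > "$workdir/annotated.txt"

cat "$workdir/annotated.txt"
```

Only the CAMTA1 line should gain the extra column. Note that assigning to a field makes awk rebuild that line using OFS, so matched lines are written with tab separators, while unmatched lines pass through exactly as they were read.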
This User Gave Thanks to Don Cragun For This Post:
# 7  
Old 04-09-2013
That worked. Thanks a lot!
 