Search strings and highlight them using Perl or bash/awk/sed


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Search strings and highlight them using Perl or bash/awk/sed
# 1  
Old 08-10-2013
Question Search strings and highlight them using Perl or bash/awk/sed

Hi,

I have two files: a.doc and b.txt

Quote:
(1) a.doc with DNA sequences with lenghth at the end of sequence (no. of characters)

Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49
Quote:
(2) b.txt
GACAAGT
AAAAAG
TAATG
CAAGC
ACG
GCTTG
I wish to search the strings from file b.txt in a.doc and want to highlight them in a.doc with different colours using Perl or bash./awk/sed?

Please guide me. Smilie
Thanks!!!!!
# 2  
Old 08-10-2013
Try:
Code:
perl -lpe 'BEGIN{open b, "b.txt";chomp(@b=<b>)}{for $i (@b) {s/$i/\033[31m$i\033[0m/g}}' a.doc

This User Gave Thanks to bartus11 For This Post:
# 3  
Old 08-11-2013
Thanks Smilie
I would highly appreciate if you can explain this script.

I am getting following error while running the above script (1.pl):

Quote:
String found where operator expected at 1.pl line 1, near "lpe 'BEGIN{open b, "b.txt";chomp(@b=<b>)}{for $i (@b) {s/$i/\033[31m$i\033[0m/g}}'"
<Do you need to predeclare lpe?>
Bareword found where operator expected at 1.pl line 1, near "'BEGIN{open b, "b.txt";chomp(@b=<b>)}{for $i (@b) {s/$i/\033[31m$i\033[0m/g}}' a"
<Missing operator before a?>
syntax error at 1.pl line 1, near "lpe 'BEGIN{open b, "b.txt";chomp(@b=<b>)}{for $i (@b) {s/$i/\033[31m$i\033[0m/g}}' "
Execution of 1.pl aborted due to compilation errors.

Last edited by bioinfo; 08-11-2013 at 12:33 AM..
# 4  
Old 08-11-2013
I need some help here to help out Smilie
Whit this script I get only one single line, correct is 5, why?
Code:
awk 'NR==FNR {a[$0]=$0;next} {for (i in a) {if ($0~a[i]) print}}' b.txt a.txt
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48

correct answer is
Code:
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49

Running a test like this show correctly all possibility
Code:
awk 'NR==FNR {a[$0]=$0;next} {for (i in a) {print $0,a[i]}}' b.txt a.txt
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 ACG
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 TAATG
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 AAAAAG
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 GACAAGT
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 CAAGC
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21 GCTTG
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 ACG
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 TAATG
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 AAAAAG
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 GACAAGT
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 CAAGC
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50 GCTTG
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 ACG
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 TAATG
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 AAAAAG
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 GACAAGT
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 CAAGC
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48 GCTTG
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 ACG
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 TAATG
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 AAAAAG
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 GACAAGT
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 CAAGC
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49 GCTTG


Last edited by Jotne; 08-11-2013 at 05:49 AM..
# 5  
Old 08-11-2013
bioinfo, don't put my code in a file. Simply run it in a terminal as I posted it, replacing a.doc and b.txt for the filenames you have (in case they differ from those two).

Last edited by bartus11; 08-11-2013 at 09:37 AM..
This User Gave Thanks to bartus11 For This Post:
# 6  
Old 08-11-2013
@bartus11
Here is what I got when run your code
Code:
perl -lpe 'BEGIN{open b, "b.txt";chomp(@b=<b>)}{for $i (@b) {s/$i/\033[31m$i\033[0m/g}}' a.txt
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49

As you see, it show the line with hits, but only highlight one hit, look at my post #4.
OP request is color on all data in bold.


For some strange reason this has same problem, it only highlight one hits.
Code:
awk 'NR==FNR {a[$0]=$0;next} {for (i in a) {gsub(a[i],"\033[1;31m&\033[0m",$0)}}1' b.txt a.txt
Seq1 -------------------------------TTAAAAAGTTTGAGTTCTAAA---------------- 21
Seq2 -----CTTGGCTCTTTCGTAAGTTTTTCATTAAGGAACTTGAATACACGGTTT----AC- 50
Seq3 TTAAACTTTTTTCAACCCTAATG-----CGGTTTGAACCATTAACC-----------TAAC 48
Seq4 --------GAAAGGAGCGGAGTG-GTCACGTGACAAGTTCTCAGACGCACGTGC--TTGT 49


Last edited by Jotne; 08-11-2013 at 09:36 AM..
This User Gave Thanks to Jotne For This Post:
# 7  
Old 08-11-2013
Jotne, post output of:
Code:
cat -ev a.txt
cat -ev b.txt

These 2 Users Gave Thanks to bartus11 For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to search and lossless replace strings using sed?

Hello, I would like to replace all occurencies of long data types by others (coresponding int) using 'sed' in the extensive source code of a software package written for 32 bit CPUs. I must use regular expressions to avoid wrong replacements like s/unsigned]+long/ulong/gLeaving out... (2 Replies)
Discussion started by: Mick P. F.
2 Replies

2. Shell Programming and Scripting

Rsync script to rewrite suffix - BASH, awk, sed, perl?

trying to write up a script to put the suffix back. heres what I have but can't get it to do anything :( would like it to be name.date.suffix rsync -zrlpoDtub --suffix=".`date +%Y%m%d%k%M%S`.~" --bwlimit=1024 /mymounts/test1/ /mymounts/test2/ while IFS=. read -r -u 9 -d '' name... (1 Reply)
Discussion started by: jmituzas
1 Replies

3. Shell Programming and Scripting

Display records between two search strings using sed

I have input file like AAA AAA CCC CCC CCC EEE EEE EEE EEE FFF FFF GGG GGG i was trying to retrieve data between two strings using sed. sed -n /CCC/,/FFF/p input_file Am getting output like CCC CCC CCC (1 Reply)
Discussion started by: NareshN
1 Replies

4. Shell Programming and Scripting

Advance search using sed/awk/perl

Hi, I have a file with more than 50,000 lines of records and each record is 50 bytes in length. I need to search every record in this file between positions 11-19 (9 bytes) and 32-40 (9 bytes) and in case any of the above 2 fields is alpha-numeric, i need to replace the whole 9 bytes of that... (7 Replies)
Discussion started by: kikionline
7 Replies

5. Shell Programming and Scripting

Using Bash/Sed to delete between identical strings

Hi. I'm hoping that someone can help me with a bash script to delete a block of lines from a file. What I want to do is delete every line between two stings that are the same, including the line the first string is on but not the second. (Marked lines to match with !) For example if I... (2 Replies)
Discussion started by: Zykr
2 Replies

6. Shell Programming and Scripting

Using Awk to Search Two Strings on One Line

If i wanted to search for two strings that are on lines in the log, how do I do it? The following code searches for just one string that is one one line. awk '/^/ {split($2,s,",");a=$1 FS s} /failure agaf@fafa/ {b=a} END{print b}' urfile What if I wanted to search for "failure agaf@fafa"... (3 Replies)
Discussion started by: SkySmart
3 Replies

7. Shell Programming and Scripting

Help with sed search&replace between two strings

Hi, i read couple of threads here on forum, and googled about what bugs me, yet i still can't find solution. Problem is below. I need to change this string (with sed if it is possible): This is message text that is being quoted to look like this: This is message text that is being quotedI... (2 Replies)
Discussion started by: angrybb
2 Replies

8. UNIX for Advanced & Expert Users

bash/grep/awk/sed: How to extract every appearance of text between two specific strings

I have a text wich looks like this: clid=2 cid=6 client_database_id=35 client_nickname=Peter client_type=0|clid=3 cid=22 client_database_id=57 client_nickname=Paul client_type=0|clid=5 cid=22 client_database_id=7 client_nickname=Mary client_type=0|clid=6 cid=22 client_database_id=6... (3 Replies)
Discussion started by: Pioneer1976
3 Replies

9. Shell Programming and Scripting

Awk search for a element in the list of strings

Hi, how do I match a particular element in a list and replace it with blank? awk 'sub///' $FILE list="AL, AK, AZ, AR, CA, CO, CT, DE, FL, GA, HI, ID, IL, IN, IA, KS, KY, LA, ME, MD, MA, MI, MN, MS, MO, MT, NE, NV, NH, NJ, NM, NY, NC, ND, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VT, VA, WA,... (2 Replies)
Discussion started by: grossgermany
2 Replies

10. Shell Programming and Scripting

awk search for Quoted strings (')

Hi All, I have files: 1. abc.sql 'This is a sample file for testing' This does not have quotations this also does not have quotations. and this 'has quotations'. here I need to list the hard coded strings 'This is a sample file for testing' and 'has quotations'. So i have... (13 Replies)
Discussion started by: kprattip
13 Replies
Login or Register to Ask a Question