Sponsored Content
Top Forums UNIX for Dummies Questions & Answers Reporting characters after string Post 302970690 by Xterra on Sunday 10th of April 2016 07:32:34 PM
Old 04-10-2016
Scrutinizer

Thanks! It works; however, it is taking a lot of time to run. I will be searching hundreds of strings among thousands of files; and I suspect it will take way too long. This is the time I got when using your script in one real dataset searching for only one string:
Code:
real    0m28.877s
user    0m28.734s
sys     0m0.015s

Same dataset with my old script:
Code:
awk 'FNR==1{n=substr(FILENAME,1,index(FILENAME,".")-1)} { print n "\t" (/AATTCCGGATCG|CGATCCGGAATT/ ? "ATCG" : "NOT Present" ) }'

I got this:
Code:
real    0m0.074s
user    0m0.031s
sys     0m0.030s

It is pretty fast but it does not report the last 4 characters so it's no good.
I kinda get what I want using the following sed scripts:
Code:
sed -n 's/^[ATCG]*AATTCCGG\(.\)\(.\)\(.\)\(.\)[ATCG]*$/\1\2\3\4/p' file
sed -n 's/^[ATCG]*\(.\)\(.\)\(.\)\(.\)CCGGAATT[ATCG]*$/\4\3\2\1/p' file | tr ATGC TACG

The timing for the individual script, again using the same dataset mentioned above:
Code:
real    0m0.482s
user    0m0.405s
sys     0m0.138s

I would like to combine both sed script into one. Maybe using an IF statement in bash. Even though I would like to avoid bash if at all possible
I am also not sure how to modify so it can output the last three characters after the string

Almost there
Code:
 sed -n '0,/AATTCCGG/s/^[ATCG]*AATTCCGG\(.\)\(.\)\(.\)\(.\)[ATCG]*$/\1\2\3\4/p; 0,/CCGGAATT/s/^[ATCG]*\(.\)\(.\)\(.\)\(.\)CCGGAATT[ATCG]*$/\4\3\2\1/p; y/ATCG/TAGC/'

However, the last script y/ATCG/TAGC/, is being ignored
I solved the problem with y/ATCG/TAGC/
Code:
sed -n '0,/AATTCCGG/s/^[ATCG]*AATTCCGG\(.\)\(.\)\(.\)\(.\)[ATCG]*$/\1\2\3\4/p; /AATTCCGG/!y/ATCG/TAGC/; s/^[ATCG]*\(.\)\(.\)\(.\)\(.\)GGCCTTAA[ATCG]*$/\4\3\2\1/p'

But if I add 0,/y/ATCG/TAGC/ to limit the extent of the script to the first occurrence exclusively
I guess I got it:
Code:
sed -n '0,/AATTCCGG/s/^[ATCG]*AATTCCGG\(.\)\(.\)\(.\)\(.\)[ATCG]*$/\1\2\3\4/p; 0,/CCGGAATT/y/ATCG/TAGC/; s/^[ATCG]*\(.\)\(.\)\(.\)\(.\)GGCCTTAA[ATCG]*$/\4\3\2\1/p'


Last edited by Xterra; 04-10-2016 at 09:16 PM.. Reason: Final version
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Removing characters from a string

I need help to strip out the first two characters of the variable $FileName. Please help. FileName=`find . -mtime +0 -name '*'` Contents of variable $FileName: ./SRIZVI4.MCR_IDEAS_REPORT.LAST.052705.075405.csv I want to strip out "./" and place the contents in another variable. How do I... (3 Replies)
Discussion started by: mh53j_fe
3 Replies

2. Shell Programming and Scripting

Looking for a string in files and reporting matches

Can someone please help me figure out what the command syntax I need to use is? Here is what I am wanting to do. I have hundreds of thousands of files I need to look for a specific search string in. These files are spread across multiple subdirectories from one main directory. I would like... (4 Replies)
Discussion started by: btrotter
4 Replies

3. Shell Programming and Scripting

Add string after another string with special characters

Hello everyone, I'm writing a script to add a string to an XML file, right after a specified string that only occurs once in the file. For testing purposes I created a file 'testfile' that looks like this: 1 2 3 4 5 6 6 7 8 9 And this is the script as far as I've managed: ... (2 Replies)
Discussion started by: heliode
2 Replies

4. Programming

string with invalid characters

This is a pretty straight-forward question. Within a program of mine, I have a string that's going to be used as a filename, but it might have some invalid characters in it that wouldn't be valid in a filename. If there are any invalid characters, I want to get rid of them and essentially squeeze... (4 Replies)
Discussion started by: cleopard
4 Replies

5. Shell Programming and Scripting

get certain characters in a string

Hi Everyone, I have a.txt 12341" <sip:191@vo.my>;asdf=q" 116aaaa<sip:00091@vo.my>;penguin would like to get the output 191 00091 Please advice. Thanks (4 Replies)
Discussion started by: jimmy_y
4 Replies

6. UNIX for Dummies Questions & Answers

Count the characters in a string

Hi all, I like to know how to get the count of each character in a given word. Using the commands i can easily get the output. How do it without using the commands ( in shell programming or any programming) if you give outline of the program ( pseudo code ) i used the following commands ... (3 Replies)
Discussion started by: itkamaraj
3 Replies

7. Programming

C++ Special Characters in a String?

Hello. How can i put all of the special characters on my keyboard into a string in c++ ? I tried this but it doesn't work. string characters("~`!@#$%^&*()_-+=|\}]{ How can i accomplish this? Thanks in advance. (1 Reply)
Discussion started by: cbreiny
1 Replies

8. Shell Programming and Scripting

remove characters from string based on occurrence of a string

Hello Folks.. I need your help .. here the example of my problem..i know its easy..i don't all the commands in unix to do this especiallly sed...here my string.. dwc2_dfg_ajja_dfhhj_vw_dec2_dfgh_dwq desired output is.. dwc2_dfg_ajja_dfhhj it's a simple task with tail... (5 Replies)
Discussion started by: victor369
5 Replies

9. UNIX for Beginners Questions & Answers

Extract characters from a string name

Hi All, I am trying to extract only characters from a string value eg: abcdedg1234.cnf How can I extract only characters abcdedg and assign to a variable. Please help. Thanks (2 Replies)
Discussion started by: abhi_123
2 Replies

10. Shell Programming and Scripting

Outputting characters after a given string and reporting the characters in the row below --sed

I have this fastq file: @M04961:22:000000000-B5VGJ:1:1101:9280:7106 1:N:0:86 GGGGGGGGGGGGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCA +test-1 GGGGGGGGGGGGGGGGGCCGGGGGFF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;;8... (10 Replies)
Discussion started by: Xterra
10 Replies
All times are GMT -4. The time now is 08:40 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy