Your script with the latest mods:
my sed script:
This is what I got:
I was wondering if there is any way I can limit either script to, let's say, the first 10 occurrences only? That would significantly reduce the running time and still allow me to 'sample' the data sufficiently to identify the consensus string for each file.
Thanks a TON!
10 occurrences per line ? Or per file?
What is your expected output?
Could you repeat the results with mawk, do you know how to install it?
--
Your GNU sed script will only find one occurrence per line, and one occurrence per set of files of the regular and reversed/complemented versions (the latter because only part of the file is reversed). Any additional patterns will not be shown, nor will it show which files or records they belong to. Is that as intended? The sample in post #1 printed the filename and could take multiple files...
Since it only reverses part of the file(s) but searches the whole file(s), there is a risk that it will find a reversed match in a non-reversed part of the file, which would be a false positive.
To counteract that, you would need something like this, using GNU sed:
So the sed script is looking for very different things than the awk script is, and it is only suited to checking whether there is one occurrence of either pattern in a single (set of) files. For multiple files you would need a shell loop, which would significantly slow down processing, whereas the awk version can scan multiple files at once.
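To illustrate the false-positive point: one way to avoid matching a reversed pattern in non-reversed text is to reverse-complement each line on its own and test both orientations of that same line. The following is only a rough awk sketch, not the sed or awk code from this thread. The function name `scan_both_strands`, the plain A/C/G/T alphabet, and the pattern passed in are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report, per line, whether a pattern matches the line
# as-is ("forward") or its reverse complement ("reverse").
# Usage: scan_both_strands PATTERN file...
scan_both_strands() {
  pattern=$1; shift
  awk -v pat="$pattern" '
    BEGIN { comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C" }

    # Walk the string backwards and complement each base; any character
    # that is not A, C, G or T is copied unchanged.
    function revcomp(s,    i, c, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        out = out ((c in comp) ? comp[c] : c)
      }
      return out
    }

    # Both tests run on every line, so a line can be reported twice.
    $0 ~ pat          { print FILENAME ":" FNR ": forward" }
    revcomp($0) ~ pat { print FILENAME ":" FNR ": reverse" }
  ' "$@"
}
```

Because awk takes several files at once, something like `scan_both_strands GATTC *.txt` needs no shell loop. Note that this tests every line, including header and quality lines if the input has them.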
Last edited by Scrutinizer; 04-12-2016 at 01:31 AM..
I will install mawk and report back
Sorry, my code should be as follows:
I can search all occurrences in each and every line using global:
I am still wondering how you would limit your awk script to only 10 occurrences in the file.
Last edited by Xterra; 04-12-2016 at 03:54 PM..
Reason: comment
No, that will not fly. This new sed code will match multiple lines per file, but whether you use global or not, it will still only find one occurrence of a regular match or one reversed/complemented match per line, the latter only if there is no regular match on that line...
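For comparison, here is a small awk sketch that does count every non-overlapping occurrence on a line, by calling `match()` repeatedly on the remainder of the line. It is not the script from this thread; the function name `count_occurrences` and the pattern are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: for each line with at least one match, print
# "file:line: count" where count is the number of non-overlapping matches.
# Usage: count_occurrences PATTERN file...
count_occurrences() {
  pattern=$1; shift
  awk -v pat="$pattern" '
    {
      s = $0; c = 0
      # match() finds the leftmost match in s and sets RSTART and RLENGTH.
      while (length(s) > 0 && match(s, pat)) {
        c++
        # Continue after this match; step at least one character so an
        # empty match cannot loop forever.
        s = substr(s, RSTART + (RLENGTH > 0 ? RLENGTH : 1))
      }
      if (c) print FILENAME ":" FNR ": " c
    }
  ' "$@"
}
```

Overlapping occurrences are not counted, since the search resumes after the end of each match.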
--
You could limit to 10 matches per file, like so, try:
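The general idea of a per-file limit can be sketched in awk as below. This is not the code that was posted here, only a minimal illustration: the function name `first_matches` and the pattern are assumptions, and it counts matching lines rather than individual occurrences within a line.

```shell
#!/bin/sh
# Hypothetical helper: print at most MAX matching lines per input file.
# Usage: first_matches PATTERN MAX file...
first_matches() {
  pattern=$1; max=$2; shift 2
  awk -v pat="$pattern" -v max="$max" '
    FNR == 1 { n = 0 }              # first line of a file: reset the counter
    n >= max { nextfile; next }     # limit reached: stop reading this file
    $0 ~ pat { print FILENAME ": " $0; n++ }
  ' "$@"
}
```

The time saving comes from `nextfile`, which abandons the current file instead of reading it to the end; gawk and recent mawk versions support it. The `next` after it is intended as a fallback for awks that lack `nextfile`, where the limit still holds but the rest of the file is read anyway. Example: `first_matches ACGT 10 *.txt`.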
Last edited by Scrutinizer; 04-12-2016 at 11:00 PM..
Reason: Swapped Function for the faster option...
I have this fastq file:
@M04961:22:000000000-B5VGJ:1:1101:9280:7106 1:N:0:86
GGGGGGGGGGGGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCA
+test-1
GGGGGGGGGGGGGGGGGCCGGGGGFF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;;8... (10 Replies)
Hi All,
I am trying to extract only characters from a string value eg: abcdedg1234.cnf
How can I extract only characters abcdedg and assign to a variable.
Please help.
Thanks (2 Replies)
Hello Folks..
I need your help ..
here the example of my problem..i know its easy..i don't all the commands in unix to do this especiallly sed...here my string..
dwc2_dfg_ajja_dfhhj_vw_dec2_dfgh_dwq
desired output is..
dwc2_dfg_ajja_dfhhj
it's a simple task with tail... (5 Replies)
Hello. How can i put all of the special characters on my keyboard into a string in c++ ?
I tried this but it doesn't work.
string characters("~`!@#$%^&*()_-+=|\}]{
How can i accomplish this?
Thanks in advance. (1 Reply)
Hi all,
I like to know how to get the count of each character in a given word. Using the commands i can easily get the output. How do it without using the commands ( in shell programming or any programming)
if you give outline of the program ( pseudo code )
i used the following commands
... (3 Replies)
Hi Everyone,
I have a.txt
12341" <sip:191@vo.my>;asdf=q"
116aaaa<sip:00091@vo.my>;penguin
would like to get the output
191
00091
Please advice.
Thanks (4 Replies)
This is a pretty straight-forward question. Within a program of mine, I have a string that's going to be used as a filename, but it might have some invalid characters in it that wouldn't be valid in a filename. If there are any invalid characters, I want to get rid of them and essentially squeeze... (4 Replies)
Hello everyone,
I'm writing a script to add a string to an XML file, right after a specified string that only occurs once in the file. For testing purposes I created a file 'testfile' that looks like this:
1
2
3
4
5
6
6
7
8
9
And this is the script as far as I've managed:
... (2 Replies)
Can someone please help me figure out what the command syntax I need to use is?
Here is what I am wanting to do.
I have hundreds of thousands of files I need to look for a specific search string in.
These files are spread across multiple subdirectories from one main directory.
I would like... (4 Replies)
I need help to strip out the first two characters of the variable $FileName. Please help.
FileName=`find . -mtime +0 -name '*'`
Contents of variable $FileName:
./SRIZVI4.MCR_IDEAS_REPORT.LAST.052705.075405.csv
I want to strip out "./" and place the contents in another variable. How do I... (3 Replies)