Wow, Chubler_XL, I stand in awe... After thirty years or so using Unix, Linux, and awk (among others, see, this is my work AND my hobby too), I am completely stupefied at:
and
Please, don't get me wrong: it is amazing to copy your code, paste it into my terminal, and see the expected output... It is like hearing "Bazinga!" in the background!
Would you please give us mere mortals a bit of feedback?
I have searched the internet (including these forums) and perhaps I'm not using the right wording.
What I'm looking for is a function (preferably in C) that analyzes the similarity of two numerical or near-numerical values, and returns either a true/false (match/nomatch) or a return code that... (4 Replies)
Hi all
I have two files, X.txt and Y.txt. Both files contain the same number of sentences. The content of X.txt is
The filter described above may be combined.
and the content of Y.txt is
The filter describ+ed above may be combin+ed.
Some of the words are separated with "+"... (2 Replies)
Hello,
I'm new to shell scripting, but I have to write a script that inserts the license header from a text file into the files in our project. For the Java classes it runs without problems, but not for XML files. In XML files I have to put the license header after the XML header (?xml... (1 Reply)
Hi all,
I have a file like this
ID 3BP5L_HUMAN Reviewed; 393 AA.
AC Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3;
DT 05-FEB-2008, integrated into UniProtKB/Swiss-Prot.
DT 05-JUL-2004, sequence version 1.
DT 05-SEP-2012, entry version 71.
FT COILED 59 140 ... (1 Reply)
Hi,
I have one file with one column and several hundred entries
File1:
NA1
NA2
NA3
And now I need to run a command within a mapping aligner tool to insert these sample names into a sequence alignment file (SAM) such that they look like this
@RG ID:Library1 SM:NA1 PL:Illumina ... (7 Replies)
I hope this makes sense and is possible.
I am trying to match $1 of panel_genes.txt with $3 of RefSeqGene.txt and when a match is found the value in $6 of RefSeqGene.txt
Example: ACTA2 is $1 of panel_genes.txt
ACTA2 NM_001613.2
ACTA2 NM_001141945.1
awk 'FNR==NR {... (4 Replies)
The bash script below connects to a site, downloads a file, and searches that file based on user input, which could be multiple values (all of that seems to work). What I am not able to figure out is how to display "match found" or "no match found" on the screen and write a file to a directory (C:\Users\cmccabe\Desktop\wget)... (4 Replies)
In the file below I am trying to use grep, or something similar, to pull only the lines where AF= is less than 0.4. Thank you :).
grep
grep "AF=" ,+ .4 file
file
12 112036782 . T C 34.0248 PASS ... (3 Replies)
Having a little trouble getting this to work just right.
I have XML files in which I want to split some data.
I have 2 <name> tags within the file
I would like to take only the first tag and split the data.
tag example.
From this.
TAB<Name>smith, john</Name>
to
TAB<Name>smith,... (8 Replies)
hi all,
trying this using shell/bash with sed/awk/grep
I have two files, one containing one column, the other containing multiple columns (comma delimited).
file1.txt
abc12345
def12345
ghi54321
...
file2.txt
abc1,text1,texta
abc,text2,textb
def123,text3,textc
gh,text4,textd... (6 Replies)
Discussion started by: shogun1970
bp_mask_by_search
BP_MASK_BY_SEARCH(1p) User Contributed Perl Documentation BP_MASK_BY_SEARCH(1p)
NAME
mask_by_search - mask sequence(s) based on its alignment results
SYNOPSIS
mask_by_search.pl -f blast genomefile blastfile.bls > maskedgenome.fa
DESCRIPTION
Mask sequence based on significant alignments of another sequence. You need to provide the report file and the entire sequence data which
you want to mask. By default this will assume you have done a TBLASTN (or TFASTY) and try and mask the hit sequence assuming you've
provided the sequence file for the hit database. If you would like to do the reverse and mask the query sequence specify the -t/--type
query flag.
This reads the whole sequence file into memory, so it may fall over for large genomes. I'm using DB_File to avoid keeping everything in
memory. One solution is to split the genome into pieces (do this BEFORE you run the DB search, though, because you want to use the exact
file you BLASTed with as input to this program).
The double-dash (--) options below can be written as --format=fasta or --format fasta, or you can just say -f fasta.
By -f/--format I mean that either form is acceptable. The =s, =n, or =c suffix indicates the kind of value the argument expects: a string, a number, or a single character.
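The three equivalent spellings can be tried out with a small stand-alone sketch. This uses Python's argparse, not the script's own Perl option handling, and the names build_parser, mask_demo and report_format are hypothetical, chosen only for illustration.

```python
import argparse


def build_parser():
    # Hypothetical stand-in parser with one option that mirrors -f/--format.
    parser = argparse.ArgumentParser(prog="mask_demo")
    parser.add_argument("-f", "--format", dest="report_format", default=None)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # All three spellings should yield the same parsed value.
    for argv in (["--format=fasta"], ["--format", "fasta"], ["-f", "fasta"]):
        print(argv, "->", parser.parse_args(argv).report_format)
```

Each of the three argument lists is parsed to the same value, which is the behaviour the paragraph above describes for the real script.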
Options:
-f/--format=s Search report format (fasta,blast,axt,hmmer,etc)
-sf/--sformat=s Sequence format (fasta,genbank,embl,swissprot)
--hardmask (boolean) Hard mask the sequence
with the maskchar [default is lowercase mask]
--maskchar=c Character to mask with [default is N], change
to 'X' for protein sequences
-e/--evalue=n Evalue cutoff for HSPs and Hits, only
mask sequence if alignment has specified evalue
or better
-o/--out/
--outfile=file Output file to save the masked sequence to.
-t/--type=s Alignment seq type you want to mask, the
'hit' or the 'query' sequence. [default is 'hit']
--minlen=n Minimum length of an HSP for it to be used
in masking [default 0]
-h/--help See this help information
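To make the effect of the masking options concrete, here is a minimal Python sketch. It is not the script's actual Perl implementation. The function name mask_sequence, the (start, end, evalue) tuple layout, and the use of 1-based inclusive coordinates are all assumptions made for illustration; the parameters only loosely mirror --hardmask, --maskchar, -e/--evalue and --minlen.

```python
def mask_sequence(seq, hsps, hardmask=False, maskchar="N", evalue=None, minlen=0):
    """Mask regions of seq covered by HSPs.

    hsps is a list of (start, end, evalue) tuples. Coordinates are assumed
    to be 1-based and inclusive (an assumption of this sketch).
    """
    chars = list(seq)
    for start, end, hsp_evalue in hsps:
        # Skip HSPs worse than the cutoff ("specified evalue or better").
        if evalue is not None and hsp_evalue > evalue:
            continue
        # Skip HSPs shorter than the minimum length.
        if end - start + 1 < minlen:
            continue
        for i in range(start - 1, min(end, len(chars))):
            # Hard mask replaces the residue; the default soft mask lowercases it.
            chars[i] = maskchar if hardmask else chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    genome = "ACGTACGTACGT"                 # made-up sequence
    hsps = [(3, 6, 1e-20), (10, 11, 0.5)]   # made-up alignment regions
    print(mask_sequence(genome, hsps))                              # soft mask
    print(mask_sequence(genome, hsps, hardmask=True))               # hard mask with N
    print(mask_sequence(genome, hsps, hardmask=True, evalue=1e-5))  # weak HSP ignored
```

With the made-up data above, the soft mask lowercases positions 3-6 and 10-11, the hard mask replaces them with N, and adding an evalue cutoff of 1e-5 leaves the second region untouched because its evalue (0.5) is worse than the cutoff.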
AUTHOR - Jason Stajich
Jason Stajich, jason-at-bioperl-dot-org.
perl v5.14.2 2012-03-02 BP_MASK_BY_SEARCH(1p)