Hello,
I am searching a large (~25 GB) set of DNA sequence data in FASTA short-read format for short tandem repeats, meaning any 2-6 base motif repeated in tandem a number of times given as an input variable. It seems like a reasonably simple job, but I'm having trouble developing a regex that works. As a start, I have:
The substring constraints have to do with downstream requirements. But I'm having trouble expressing in the regex that I want repeats of one discrete motif, not (for example) 5 or more consecutive runs of ANY 2-6 bases, which obviously matches every read.
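One possible way to tie the repeat to a single motif is a backreference: capture the 2-6 base unit once, then require that same captured text to follow. The sketch below is only an illustration under assumptions, not the regex from the post. The names find_strs, min_repeats, min_unit and max_unit are hypothetical, reads are assumed to be uppercase A/C/G/T strings, and the downstream substring constraints are not handled.

```python
import re

def find_strs(read, min_repeats=5, min_unit=2, max_unit=6):
    """Yield (start, motif, copies) for each tandem repeat found in one read."""
    # ([ACGT]{2,6}?) captures a candidate motif, shortest length tried first.
    # \1{n-1,} then demands that the SAME captured motif follows at least
    # n-1 more times, so the total is at least n tandem copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    for m in pattern.finditer(read):
        motif = m.group(1)
        yield m.start(), motif, len(m.group(0)) // len(motif)

# Small demonstration on a made-up read: six copies of "AC" after "TTG".
hits = list(find_strs("TTG" + "AC" * 6 + "GGT"))
```

Note that a long single-base run such as AAAAAAAAAA also matches (as motif AA), so an extra check on the motif may be wanted if homopolymers should be excluded. For a file of this size, the function would be applied one read at a time while streaming through the file rather than loading it whole.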
Hello
On my CD-ROM, the file names are limited to 8 characters. On a Linux system, the same CD shows file names longer than 8 characters.
What's wrong?
Thanks
Urs (3 Replies)
I have a text file that I want to search for repeated lines and print those lines. These would be lines in the file that appear more than once. Is there a way to do this?
Thanks (4 Replies)
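A minimal sketch of one way to list lines that occur more than once, assuming the file fits comfortably in memory as a table of counts. The function name repeated_lines is hypothetical, and the demo uses an in-memory list in place of a real file.

```python
from collections import Counter

def repeated_lines(lines):
    """Return each distinct line that appears more than once, in first-seen order."""
    # Count every line with its trailing newline removed.
    counts = Counter(line.rstrip("\n") for line in lines)
    return [line for line, n in counts.items() if n > 1]

# Demo on an in-memory stand-in; an open text file can be passed the same way.
dupes = repeated_lines(["a\n", "b\n", "a\n", "c\n", "b\n"])
```

Because an open file object iterates line by line, it can be passed directly as the lines argument.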
I need to search a very large file, 13 GB in size. I am looking for a record that has a value in byte 4200. How can I view the file, or how can I search for the value in byte 4200? (1 Reply)
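If "byte 4200" means a position counted from the start of the file, one option is to seek straight to that offset instead of reading the 13 GB before it. This is only a sketch under that assumption. The function name read_at is hypothetical, and the post does not say whether the position is per record or per file, or whether counting starts at 1.

```python
import io

def read_at(fh, offset, length=1):
    """Return `length` bytes starting at 0-based `offset` of a binary file object."""
    fh.seek(offset)          # jump directly to the offset, no scan of earlier data
    return fh.read(length)

# Demo on an in-memory stand-in; a real file would be opened in "rb" mode.
demo = io.BytesIO(bytes(range(256)) * 20)   # 5120 bytes of known content
value = read_at(demo, 4199)                 # byte 4200 counted from 1 is offset 4199
```

If the data consists of fixed-length records, the same call could be repeated at the record start plus 4199 for each record, but that layout is an assumption the post does not confirm.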
Hey All
Can anyone please suggest a procedure to search for part of a line in a very large file to which log entries are written at very high speed?
I have tried grep and egrep:
grep 'text text text' <file-name>
egrep 'text text text' <file-name>
here 'text text text' is... (4 Replies)
I tried to make the title/subject detailed, but well.. have to keep it short as well.
I want to take a large list of strings and search through a large list of files to hopefully find numerous matches. I am not sure of the quickest way to do this, though.
// List of files
file1.txt... (2 Replies)
I've got a simple log file that looks something like this:
And I need to append to it so that it looks like this:
So I just want to add a timestamp and a static (non-variable) word to each line in the file. Is there an easy scripted way to cat the file and append that data to each line? (4 Replies)
Without using conventional file-searching commands like find, is it possible to locate a file if I only know that it contains a particular text such as "Hello world"? (5 Replies)
If the word DOG (followed by several random digits) appears in column 1, how can I find out how many times that same DOG* word appears in column 2? This is a very large file.
Thanks! (7 Replies)
Hello,
I use Ubuntu 12.04.
I want to write a short program using awk to select some lines in a file based on a second file.
My first file has this format with about 400,000 lines and 47 fields:
SNP1 1 12.1
SNP2 1 13.2
SNP3 1 45.2
SNP4 1 23.4
My second file has this format:
SNP2
SNP3... (1 Reply)
Discussion started by: Homa
re-pcr
RE-PCR(1) General Commands Manual RE-PCR(1)
NAME
re-PCR -- Find sequence tagged sites (STS) in DNA sequences
SYNOPSIS
re-PCR [-hV] -p hash-file [-g gaps] [-n mism] [-lq] [primer ...]
re-PCR [-hV] -P hash-file [-g gaps] [-n mism] [-l] [-m margin] [-O+|-] [-C batchcnt] [-o outfile] [-r+|-] [primers-file ...]
re-PCR [-hV] -s hash-file [-g gaps] [-n mism] [-lq] [-m margin] [-o outfile] [-r+|-] [left right lo[-hi] [...]]
re-PCR [-hV] -S hash-file [-g gaps] [-n mism] [-lq] [-m margin] [-O+|-] [-C batchcnt] [-o outfile] [-r+|-] [stsfile ...]
DESCRIPTION
Implements reverse searching (called Reverse e-PCR) to make it feasible to search the human genome sequence and other large genomes by performing STS and primer searches.
OPTIONS
-p=hash-file
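Perform primer lookup using hash-file (primers given on the command line, see SYNOPSIS)
-P=hash-file
Perform primer lookup using hash-file (primers read from primers-file, see SYNOPSIS)
-s=hash-file
Perform STS lookup using hash-file (STS given on the command line, see SYNOPSIS)
-S=hash-file
Perform STS lookup using hash-file (STSes read from stsfile, see SYNOPSIS)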
-n=mism Set max allowed mismatches per primer for lookup
-g=gaps Set max allowed indels per primer for lookup
-m=margin Set variability for STS size for lookup
-l Use presize alignments (only if gaps>0)
-G Print alignments in comments
-d=min-max
Set default STS size
-r=+|- Enable/disable reverse STS lookup
-O=+|- Enable/disable syscall optimisation
-C=batchcnt
Set number of STSes per batch
-o=outfile
Set output file name
-q Quiet (no progress indicator)
EXAMPLE
famap -tN -b genome.famap org/chr_*.fa
fahash -b genome.hash -w 12 -f3 ${PWD}/genome.famap
re-PCR -s genome.hash -n1 -g1 ACTATTGATGATGA AGGTAGATGTTTTT 120-200
See famap(1) and fahash(1)
SEE ALSO
/usr/share/doc/ncbi-epcr/README.txt
bioperl(1), e-pcr(1), famap(1) and fahash(1)
AUTHORS
This manual page was written by Andreas Tille <tille@debian.org> for the Debian system (but may be used by others). Permission is granted
to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by
the Free Software Foundation.
On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL.
April 2008 RE-PCR(1)