Mean score value by ID over a defined genomic region
Post 302954990 by fadista on Monday 14th of September 2015 05:23:46 AM
If a genomic region in file2 overlaps any of the genomic regions in file1, I would like to average the file2 scores by file1 ID.

file1 fields explanation:
chromosome startPosition endPosition ID

file2 fields explanation:
chromosome startPosition endPosition score


One decimal place is enough. Thank you.
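
One possible way to approach this is sketched below in Python. It is only a sketch under several assumptions that the post does not state: both files are whitespace-delimited with the four fields listed above, coordinates are inclusive (so regions that merely touch count as overlapping), and a file2 region is counted once per ID even if it overlaps several file1 regions carrying that ID. The function names and the sample rows are made up for illustration and are not the poster's data.

```python
from collections import defaultdict


def parse_rows(text):
    """Split whitespace-delimited text into (chrom, start, end, value) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        rows.append((fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return rows


def mean_score_by_id(regions, scored):
    """Average the scores of file2 regions that overlap file1 regions, per ID.

    regions: (chrom, start, end, ID) tuples from file1
    scored:  (chrom, start, end, score) tuples from file2
    Assumption: coordinates are inclusive, so touching ends count as overlap.
    """
    # Group file1 regions by chromosome so only same-chromosome pairs are compared.
    by_chrom = defaultdict(list)
    for chrom, start, end, region_id in regions:
        by_chrom[chrom].append((start, end, region_id))

    totals = defaultdict(float)
    counts = defaultdict(int)
    for chrom, start, end, score in scored:
        # Collect each overlapped ID once, even if several of its regions match.
        hit_ids = {rid for r_start, r_end, rid in by_chrom.get(chrom, ())
                   if start <= r_end and r_start <= end}
        for rid in hit_ids:
            totals[rid] += float(score)
            counts[rid] += 1

    # One decimal place, as requested in the post.
    return {rid: round(totals[rid] / counts[rid], 1) for rid in totals}


# Hypothetical sample data, not taken from the original post.
file1_text = """chr1 10 100 gene1
chr2 3000 5000 gene2"""
file2_text = """chr1 50 60 2.0
chr1 90 120 3.0
chr1 200 300 9.0
chr2 4000 4100 1.5"""

result = mean_score_by_id(parse_rows(file1_text), parse_rows(file2_text))
for region_id in sorted(result):
    print(f"{region_id}\t{result[region_id]:.1f}")
```

With these sample rows, gene1 averages the two overlapping scores 2.0 and 3.0, the chr1 200-300 region is ignored because it overlaps nothing in file1, and gene2 gets its single score. The nested comparison is quadratic per chromosome, which is fine for small files; for large inputs a sorted sweep or a dedicated interval tool would be a better fit.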
 
