Program to match the id and replace one letter in the content
Post 302840221 by Don Cragun on Monday 5th of August 2013 03:11:01 AM (Shell Programming and Scripting forum)
Quote:
Originally Posted by kaav06
Hi Don,
I expect the output file to be around 250 KB to 300 KB. Every sequence has one header line starting with >sp. Each sequence line has 60 letters. The change can happen anywhere in the sequence; it is not restricted to the first line. Each new record starts on a new line with >sp, and the end of each sequence is marked with *.

Thanks,
Kaavya


Hi Don,

The position count should start after the header line.

There is no asterisk in your sample input. What do you mean by "the end of the sequence will have *"?

I know that position 1 is the first character of the sequence. What I asked was: what is the position of the first character of the second line of the sequence? Is it 61 or 62? (That is, do the newlines in the sequence count?)
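Don's question comes down to two counting conventions. Below is a minimal Python sketch of the requested edit, assuming the layout Kaavya describes (a header line such as `>sp|ID|...`, 60 letters per sequence line, a trailing `*`) and assuming the ID to match is one of the `|`-separated fields of the header's first word. The names `locate`, `header_matches` and `replace_letter` are hypothetical. The `count_newlines` flag switches between the "61" and "62" answers.

```python
from typing import List, Tuple


def locate(pos: int, width: int = 60, count_newlines: bool = False) -> Tuple[int, int]:
    """Map a 1-based sequence position to a 0-based (line, column) pair.

    count_newlines=False: newlines are ignored, so position 61 is the
                          first letter of the second sequence line.
    count_newlines=True:  each line also counts its newline, so the first
                          letter of the second line is position 62.
    """
    if pos < 1:
        raise ValueError("positions start at 1")
    step = width + 1 if count_newlines else width
    line, col = divmod(pos - 1, step)
    if col == width:  # only possible when newlines are counted
        raise ValueError(f"position {pos} falls on a newline")
    return line, col


def header_matches(header: str, seq_id: str) -> bool:
    """True if seq_id is one of the '|'-separated fields of the header's
    first word, e.g. 'P12345' in '>sp|P12345|NAME_HUMAN ...' (assumed layout)."""
    words = header[1:].split()
    return bool(words) and seq_id in words[0].split("|")


def replace_letter(text: str, seq_id: str, pos: int, new: str,
                   width: int = 60, count_newlines: bool = False) -> str:
    """Return text with the letter at `pos` of record `seq_id` replaced by `new`.

    Positions are counted from the first sequence line after the header.
    """
    if len(new) != 1:
        raise ValueError("replacement must be a single letter")
    target_line, col = locate(pos, width, count_newlines)
    out: List[str] = []
    in_target, line_no, replaced = False, 0, False
    for raw in text.splitlines(keepends=True):
        if raw.startswith(">"):
            # A header starts a new record; reset the sequence-line counter.
            in_target, line_no = header_matches(raw, seq_id), 0
        else:
            if in_target and line_no == target_line:
                letters = raw.rstrip("\r\n")
                if col >= len(letters) or letters[col] == "*":
                    raise ValueError(f"position {pos} is past the end of {seq_id}")
                raw = raw[:col] + new + raw[col + 1:]
                replaced = True
            line_no += 1
        out.append(raw)
    if not replaced:
        raise ValueError(f"{seq_id} not found or too short for position {pos}")
    return "".join(out)


if __name__ == "__main__":
    fasta = ">sp|P1|DEMO\n" + "A" * 60 + "\n" + "CCCCC*\n"
    print(locate(61))                       # newlines not counted
    print(locate(62, count_newlines=True))  # newlines counted
    print(replace_letter(fasta, "P1", 61, "G"), end="")
```

Matching on a whole header field rather than a plain substring avoids treating `P1` as a match for `P10`; if the real headers use a different layout, `header_matches` is the piece to adapt.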
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Find out the match data content?!

Hi, Long list of Input file1 content: 1285_t 4860_i 4817_v 8288_c 9626_a . . . Long list of Input file2 content: 1285_t chris germany 8288_c steve england 9626_a dave swiss 9260_s stephanie denmark . . . (14 Replies)
Discussion started by: patrick87

2. Shell Programming and Scripting

Extract all content that match exactly only specific word

Input: 21 templeta parent 35718 36554 . - . ID=parent_cluster_50.21.11; Name=Partial%20parent%20for%20training%20set; 21 templeta kids 35718 36554 . - . ID=_52; Parent=parent_cluster_5085.21.11; 21 templeta ... (7 Replies)
Discussion started by: patrick87

3. Shell Programming and Scripting

Column content match and add suffix

My input chr3 galGal3_xenoRefFlat CDS 4178235 4178264 0.000000 + 0 gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19; T6J4_19"; chr3 galGal3_xenoRefFlat exon 4178235 4178264 0.000000 + . gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19;... (2 Replies)
Discussion started by: jacobs.smith

4. Shell Programming and Scripting

Letter Frequency Decryption Program in Perl

Hello, :/ (0 Replies)
Discussion started by: jvr42

5. Shell Programming and Scripting

Upper case letter match

Hi, I'm able to search for a string in a file (e.g. grep -w "$a" input.txt), but I have to search for the uppercase form of a string in a file, where the uppercased file content matches something like below: upper("$a") == the converted-to-uppercase string in (input.txt). Can someone please provide... (5 Replies)
Discussion started by: p_satyambabu

6. Shell Programming and Scripting

Sorting content between match pattern and move on with awk and sed

S 0.0 0.0 (reg, inst050) k e f d c S 0.0 0.0 (mux, m030) k g r s x v S 0.0 0.0 (reg, inst020) q s n m (12 Replies)
Discussion started by: ctphua

7. Shell Programming and Scripting

Replace specific letter in a file by other letter

Good afternoon all, I want to ask how to change a letter in my file to another letter on a specific line, e.g. data.txt 1 1 1 0 0 0 0. For example, I want to change the 4th line to the character 1. How could I do it with SED or AWK? I have tried to run this code but actually did not... (3 Replies)
Discussion started by: weslyarfan

8. Shell Programming and Scripting

Replace the first letter of each line by a capital

Hi, I need to replace, as the title says, the first letter of each line (when it's not a number) by the same letter, but capital. For instance : hello Who 123pass Would become : Hello Who 123pass Is there a way with sed to do that ? Or other unix command ? Thank you :) (7 Replies)
Discussion started by: ganon551

9. UNIX for Dummies Questions & Answers

Replace space in column with letter for several rows

I have a PDB file, which has the following format: TITLE Protein X MODEL 1 ATOM 1 N PRO 24 45.220 71.410 43.810 1.00 0.00 ATOM 2 H1 PRO 24 45.800 71.310 42.000 1.00 0.00 TER ENDMDL Column 22 is the chain... (5 Replies)
Discussion started by: Egy

10. Shell Programming and Scripting

awk command to get file content until 2 occurrence of pattern match

AWK command to get file content until 3 occurrence of pattern match, INPUT FILE: JMS_BODY_FIELD:JMSText = <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <custOptIn xmlns="http://com/walm/ta/cu/ccs/xml2"> <person>Romi</person> <appName>SAP</appName> </custOptIn> ... (4 Replies)
Discussion started by: prince1987
Grinder::KmerCollection(3pm)				User Contributed Perl Documentation			      Grinder::KmerCollection(3pm)

NAME
    Grinder::KmerCollection - A collection of kmers from sequences

SYNOPSIS
    my $col = Grinder::KmerCollection->new( -k => 10, -file => 'seqs.fa' );

DESCRIPTION
    Manage a collection of kmers found in various sequences. Store information about which sequence each kmer was found in and its starting position on that sequence.

AUTHOR
    Florent Angly <florent.angly@gmail.com>

APPENDIX
    The rest of the documentation details each of the object methods. Internal methods are usually preceded with an underscore (_).

new
    Title   : new
    Usage   : my $col = Grinder::KmerCollection->new( -k => 10, -file => 'seqs.fa', -revcom => 1 );
    Function: Build a new kmer collection
    Args    : -k       set the kmer length (default: 10 bp)
              -revcom  count kmers before and after reverse-complementing sequences (default: 0)
              -seqs    count kmers in the provided arrayref of sequences (Bio::Seq objects)
              -ids     if specified, index the sequences provided to -seqs using the IDs in this arrayref instead of the sequences' $seq->id() method
              -file    count kmers in the provided file of sequences
              -weights if specified, assign the abundance of each sequence from the values in this arrayref
    Returns : Grinder::KmerCollection object

k
    Usage   : $col->k;
    Function: Get the length of the kmers
    Args    : None
    Returns : Positive integer

weights
    Usage   : $col->weights({'seq1' => 3, 'seq10' => 0.45});
    Function: Get or set the weight of each sequence. Each sequence is given a weight of 1 by default.
    Args    : hashref where the keys are sequence IDs and the values are the weights of the corresponding sequences (e.g. their relative abundance)
    Returns : Grinder::KmerCollection object

collection_by_kmer
    Usage   : $col->collection_by_kmer;
    Function: Get the collection of kmers, indexed by kmer
    Args    : None
    Returns : A hashref of hashref of arrayref:
              hash->{kmer}->{ID of sequences with this kmer}->[starts of kmer on sequence]

collection_by_seq
    Usage   : $col->collection_by_seq;
    Function: Get the collection of kmers, indexed by sequence ID
    Args    : None
    Returns : A hashref of hashref of arrayref:
              hash->{ID of sequences with this kmer}->{kmer}->[starts of kmer on sequence]

add_file
    Usage   : $col->add_file('seqs.fa');
    Function: Process the kmers in the given file of sequences.
    Args    : filename
    Returns : Grinder::KmerCollection object

add_seqs
    Usage   : $col->add_seqs([$seq1, $seq2]);
    Function: Process the kmers in the given sequences.
    Args    : * arrayref of Bio::Seq objects
              * arrayref of IDs to use for the indexing of the sequences
    Returns : Grinder::KmerCollection object

filter_rare
    Usage   : $col->filter_rare( 2 );
    Function: Remove kmers occurring at less than the specified (weighted) abundance
    Args    : integer
    Returns : Grinder::KmerCollection object

filter_shared
    Usage   : $col->filter_shared( 2 );
    Function: Remove kmers occurring in fewer than the specified number of sequences
    Args    : integer
    Returns : Grinder::KmerCollection object

counts
    Usage   : $col->counts
    Function: Calculate the total count of each kmer. Counts are affected by the weights you gave to the sequences.
    Args    : * restrict sequences to search to specified sequence ID (optional)
              * starting position from which counting should start (optional)
              * 0 to report counts (default), 1 to report frequencies (normalize to 1)
    Returns : * arrayref of the different kmers
              * arrayref of the corresponding total counts

sources
    Usage   : $col->sources()
    Function: Return the sources of a kmer and their (weighted) abundance.
    Args    : * kmer to get the sources of
              * sources to exclude from the results (optional)
              * 0 to report counts (default), 1 to report frequencies (normalize to 1)
    Returns : * arrayref of the different sources
              * arrayref of the corresponding total counts
              If the kmer requested does not exist, the arrays will be empty.

kmers
    Usage   : $col->kmers('seq1');
    Function: This is the inverse of sources(). Return the kmers found in a sequence (given its ID) and their (weighted) abundance.
    Args    : * sequence ID to get the kmers of
              * 0 to report counts (default), 1 to report frequencies (normalize to 1)
    Returns : * arrayref of the different kmers
              * arrayref of the corresponding total counts
              If the sequence ID requested does not exist, the arrays will be empty.

positions
    Usage   : $col->positions()
    Function: Return the positions of the given kmer on a given sequence.
              An error is reported if the kmer requested does not exist.
    Args    : * desired kmer
              * desired sequence with this kmer
    Returns : Arrayref of the different positions. The array will be empty if the desired combination of kmer and sequence was not found.

perl v5.14.2                      2012-01-17                      Grinder::KmerCollection(3pm)
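To make the collection_by_kmer / collection_by_seq layout and the weighted counting concrete, here is a small Python analogue. It is a sketch under stated assumptions, not Grinder's Perl implementation: sequences are plain strings keyed by ID instead of Bio::Seq objects, start positions are recorded 1-based (the page does not state Grinder's convention), a weighted count is assumed to be weight × number of occurrences, and -revcom and the optional counts() arguments are omitted. The class and its members are hypothetical and only mirror the documented method names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class KmerCollection:
    """Toy analogue of the documented structure:
    by_kmer[kmer][seq_id] -> list of start positions (1-based, assumed)."""

    def __init__(self, k: int = 10) -> None:
        self.k = k
        self.by_kmer: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
        self.weights: Dict[str, float] = {}  # like $col->weights({...})

    def add_seqs(self, seqs: Dict[str, str]) -> "KmerCollection":
        # Slide a window of length k over each sequence and record every start.
        for seq_id, seq in seqs.items():
            for i in range(len(seq) - self.k + 1):
                self.by_kmer[seq[i:i + self.k]][seq_id].append(i + 1)
        return self

    def weight(self, seq_id: str) -> float:
        return float(self.weights.get(seq_id, 1.0))  # default weight of 1

    def by_seq(self) -> Dict[str, Dict[str, List[int]]]:
        # Invert the index: seq_id -> kmer -> starts (cf. collection_by_seq).
        out: Dict[str, Dict[str, List[int]]] = defaultdict(dict)
        for kmer, sources in self.by_kmer.items():
            for seq_id, starts in sources.items():
                out[seq_id][kmer] = starts
        return out

    def counts(self, freq: bool = False) -> Tuple[List[str], List[float]]:
        # Weighted total per kmer; optionally normalized so the totals sum to 1.
        kmers = sorted(self.by_kmer)
        totals = [sum(self.weight(s) * len(p) for s, p in self.by_kmer[km].items())
                  for km in kmers]
        if freq:
            grand = sum(totals)
            totals = [t / grand for t in totals] if grand else totals
        return kmers, totals

    def filter_rare(self, min_count: float) -> "KmerCollection":
        # Drop kmers whose weighted total is below min_count.
        kmers, totals = self.counts()
        for km, t in zip(kmers, totals):
            if t < min_count:
                del self.by_kmer[km]
        return self

    def filter_shared(self, min_seqs: int) -> "KmerCollection":
        # Drop kmers found in fewer than min_seqs distinct sequences.
        for km in [km for km, src in self.by_kmer.items() if len(src) < min_seqs]:
            del self.by_kmer[km]
        return self


if __name__ == "__main__":
    col = KmerCollection(k=3).add_seqs({"seq1": "ACGTAC", "seq2": "CGTT"})
    col.weights = {"seq2": 3}
    print(col.counts())
```

The nested dictionary plays the role of the documented hash->{kmer}->{ID}->[starts] structure, so building the by-sequence view is a single pass that swaps the two outer keys.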
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.