Deleting sequences based on character frequency

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Deleting all characters from 350th character to 450th character from the log file

Hi All, I have a big log file and I want to delete all characters from the 350th to the 450th character position. Please advise or share sample code.

2. Shell Programming and Scripting

Trimming sequences based on specific pattern

My files look like this. And I need to cut the sequences at the last "A" found in the following 'pattern' (highlighted for easier identification; the pattern in the actual file is not highlighted). The expected result should look like this. Thus, all the sequences would end with AGCCCTA...

3. Shell Programming and Scripting

Removing low frequency sequences

If I have a file with the following information And I would like to remove all the sequences with Freq less than 3, so I end up having the following file: I am currently using awk to accomplish this task but I am not getting the results I actually want. Any help will be greatly appreciated.

4. Shell Programming and Scripting

Trimming sequences based on Reference

My file looks something like this. What I need is to look for the Reference sequence (">Reference1") and, based on the length of that sequence, trim all the entries in that file. So, the resulting file will contain all sequences with the same length, like this. Thus, all sequences will keep...

5. Shell Programming and Scripting

Extract sequences based on the list

Hi, I have a file with more than 28000 records and it looks like below.. >mm10_refflat_ABCD range=chr1:1234567-2345678 tgtgcacactacacatgactagtacatgactagac....so on >mm10_refflat_BCD range=chr1:3234567-4545678... tgtgcacactacacatgactagtatgtgcacactacacatgactagta . . . . . so on ...

6. Shell Programming and Scripting

Selecting sequences based on scores

I have two files with thousands of sequences of different lengths. infile1 contains the actual sequences and infile2 the scores for each A, T, G and C in infile1. Something like this: infile1: >HZVJKYI01ECH5R TTGATGTGCCAGCTGCCGTTGGTGTGCCAA >HZVJKYI01AQWJ8 GGATATGATGATGAACTGGTTTGGCACACC...

7. Shell Programming and Scripting

Eliminating sequences based on Distances

I have to remove sequences from a file based on the distance value. I am attaching the file containing the distances (Distance.xls) The second file looks something like this: Sequences.txt >Sample1 Freq 59 ggatatgatgatgaactggt >Sample1 Freq 54 ggatatgatgttgaactggt >Sample1 Freq 44...

8. Shell Programming and Scripting

Print sequences from file2 based on match to, AND in same order as, file1

I have a list of IDs in file1 and a list of sequences in file2. I can print sequences from file2, but I'm asking for help in printing the sequences in the same order as the IDs appear in file1. file1: EN_comp12952_c0_seq3:367-1668 ES_comp17168_c1_seq6:1-864 EN_comp13395_c3_seq14:231-1088...

9. UNIX for Dummies Questions & Answers

Filling positions based on frequency

I have files with hundreds of sequences with frequency values reported as "Freq X" and missing characters represented by a dash ("-"), something like this >39sample Freq 4 TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGTAGCAGCACTA >22sample Freq 15 T-GATGTCGTGGGTTTCCCGTCAACACCGGCAAATAGTAGCAGCACTA...

10. Shell Programming and Scripting

Outputting sequences based on length with sed

I have this file: >ID1 AA >ID2 TTTTTT >ID-3 AAAAAAAAA >ID4 TTTTTTGGAGATCAGTAGCAGATGACAG-GGGGG-TGCACCCC And I am trying to use this script to output sequences longer than 15 characters: sed -r '/^>/N;{/^.{,15}$/d}' The desired output would be this: >ID4...

LEARN ABOUT HPUX

tr

tr(1)							      General Commands Manual							     tr(1)

NAME

       tr - translate characters

SYNOPSIS

       tr [-Acs] string1 string2

       tr -s [-Ac] string1

       tr -d [-Ac] string1

       tr -ds [-Ac] string1 string2

DESCRIPTION

       tr copies the standard input to the standard output with substitution or deletion of selected characters.  Input characters from string1 are
       replaced with the corresponding characters in string2.  If necessary, string1 and string2 can be quoted to avoid pattern matching by the
       shell.

       tr recognizes the following command line options:

       -A     Translates on a byte-by-byte basis.  When this flag is specified, tr does not support extended characters.

       -c     Complements the set of characters in string1, which is the set of all characters in the current character set, as defined by the
              current setting of LC_CTYPE, except for those actually specified in the string1 argument.  These characters are placed in the
              array in ascending collation sequence, as defined by the current setting of LC_COLLATE.

       -d     Deletes all occurrences of input characters or collating elements found in the array specified in string1.

              If -c and -d are both specified, all characters except those specified by string1 are deleted.  The contents of string2 are
              ignored, unless -s is also specified.  Note, however, that the same string cannot be used for both the -d and the -s flags;
              when both flags are specified, both string1 (used for deletion) and string2 (used for squeezing) are required.

              If -d is not specified, each input character or collating element found in the array specified by string1 is replaced by
              the character or collating element in the same relative position specified by string2.

       -s     Replaces any character specified in string1 that occurs as a string of two or more repeating characters with a single instance
              of the character in string2.

              If string2 contains a character class, the argument's array contains all of the characters in that character class.

              In a case conversion, however, the string2 array contains only those characters defined as the second characters in each of
              the toupper or tolower character pairs, as appropriate.
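The options above can be sketched as follows. This is a minimal illustration with made-up input strings, not text from the manual page; it assumes a POSIX-style tr and sets LC_ALL=C so that ranges and character classes cover plain ASCII only. The -A flag is left out because it is not available in every implementation.

```shell
# Hypothetical sample inputs; LC_ALL=C keeps ranges and classes ASCII-only.
export LC_ALL=C

# -d: delete every character listed in string1 (here, the digits).
printf 'seq123\n' | tr -d '0-9'                            # prints: seq

# -c with -d: delete everything EXCEPT the digits and the newline.
printf 'seq123\n' | tr -cd '0-9\n'                         # prints: 123

# -s with a single string: squeeze each run of spaces into one space.
printf 'a   b    c\n' | tr -s ' '                          # prints: a b c

# -d with -s: delete the digits (string1), then squeeze runs of
# whitespace (string2, given as a character class).
printf 'a1  b22   c\n' | tr -ds '[:digit:]' '[:space:]'    # prints: a b c

# Case conversion with -s: map upper to lower case, then squeeze the
# repeated lowercase letters that result (LL becomes l, OO becomes o).
printf 'HELLO  WOORLD\n' | tr -s '[:upper:]' '[:lower:]'   # prints: helo  world
```

In the last command the two spaces survive because a space is not part of string2, so it is never squeezed.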

       The following abbreviation conventions can be used to introduce ranges of characters, repeated  characters  or  single-character  collating
       elements into the strings:

       c1-c2 or [c1-c2]
              Stands for the range of collating elements c1 through c2, inclusive, as defined by the current setting of the LC_COLLATE
              locale category.

       [:class:]
              Stands for all the characters belonging to the defined character class, as defined by the current setting of the LC_CTYPE
              locale category.  The following character class names will be accepted when specified in string1: alnum, alpha, blank,
              cntrl, digit, graph, lower, print, punct, space, upper, or xdigit.  Character classes are expanded in collation order.

              When the -d and -s flags are specified together, any of the character class names are accepted in string2; otherwise, only
              the character class names lower or upper are accepted in string2, and then only if the corresponding character class (upper
              and lower, respectively) is specified in the same relative position in string1.  Such a specification is interpreted as a
              request for case conversion.

              When [:lower:] appears in string1 and [:upper:] appears in string2, the arrays contain the characters from the toupper
              mapping in the LC_CTYPE category of the current locale.  When [:upper:] appears in string1 and [:lower:] appears in string2,
              the arrays contain the characters from the tolower mapping in the LC_CTYPE category of the current locale.

       [=c=]  Stands for all the characters or collating elements belonging to the same equivalence class as c, as defined by the current
              setting of the LC_COLLATE locale category.  An equivalence class expression is allowed only in string1, or in string2 when
              it is being used by the combined -d and -s options.

       [a*n]  Stands for n repetitions of a.  If the first digit of n is 0, n is considered octal; otherwise, n is treated as a decimal
              value.  A zero or missing n is interpreted as large enough to extend the string2-based sequence to the length of the
              string1-based sequence.
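The abbreviation conventions above can be tried out with a few one-liners. This is a minimal sketch with made-up input strings; it assumes a POSIX-style tr and sets LC_ALL=C so that ranges follow ASCII order.

```shell
# Hypothetical sample inputs; LC_ALL=C makes a-c and a-z plain ASCII ranges.
export LC_ALL=C

# c1-c2: a range in each string; a-c maps position by position onto A-C.
printf 'abcabc xyz\n' | tr 'a-c' 'A-C'      # prints: ABCABC xyz

# [:class:]: delete every punctuation character.
printf 'a,b;c!\n' | tr -d '[:punct:]'       # prints: abc

# [a*n]: x repeated 3 times, so a, b and c all become x.
printf 'abcabc xyz\n' | tr 'abc' '[x*3]'    # prints: xxxxxx xyz

# [a*] with no count: repeat . as often as needed to match string1.
printf 'abcabc xyz\n' | tr 'a-z' '[.*]'     # prints: ...... ...
```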

       The escape character \ can be used as in the shell to remove special meaning from any character in a string.  In addition, \ followed
       by 1, 2, or 3 octal digits represents the character whose ASCII code is given by those digits.

       An ASCII NULL character in string1 or string2 can be represented only as an escaped character, i.e. as \000, but is treated like other
       characters and translated correctly if so specified.  NULL characters in the input are not stripped out unless the option -d "\000" is
       given.
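A short sketch of the escape rules, using made-up input strings and assuming a POSIX-style tr. The strings are single-quoted so that the shell passes the backslashes through to tr unchanged.

```shell
# \011 is the octal code for a horizontal tab: turn tabs into spaces.
printf 'a\tb\n' | tr '\011' ' '        # prints: a b

# \\ is an escaped backslash: turn backslashes into forward slashes.
printf 'C:\\temp\n' | tr '\\' '/'      # prints: C:/temp

# \000 is the only way to write a NULL character: strip it from the input.
printf 'a\000b\n' | tr -d '\000'       # prints: ab
```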

EXTERNAL INFLUENCES

   Environment Variables
       LANG provides a default value for the internationalization variables that are unset or null.  If LANG is unset or null, the default
       value of "C" (see lang(5)) is used.  If any of the internationalization variables contains an invalid setting, tr will behave as if
       all internationalization variables are set to "C".  See environ(5).

       LC_ALL, if set to a non-empty string value, overrides the values of all the other internationalization variables.

       LC_CTYPE determines the interpretation of text as single and/or multi-byte characters, the classification of characters as printable,
       and the characters matched by character class expressions in regular expressions.

       LC_MESSAGES determines the locale that should be used to affect the format and contents of diagnostic messages written to standard
       error and informative messages written to standard output.

       NLSPATH determines the location of message catalogues for the processing of LC_MESSAGES.

RETURN VALUE

       tr exits with one of the following values:

       0      All input was processed successfully.

       >0     An error occurred.

EXAMPLES

       For the ASCII character set and default collation sequence, create a list of all the words in file1, one per line in file2, where a word is
       taken  to be a maximal string of alphabetics.  Quote the strings to protect the special characters from interpretation by the shell (012 is
       the ASCII code for a new-line (line feed) character):
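A sketch of this command, run against a small hypothetical file1 created on the spot. LC_ALL=C is an assumption added here to pin the ASCII character set and default collation that the description calls for.

```shell
# Hypothetical sample input standing in for file1.
printf 'The quick, brown fox\n' > file1

# -c complements the letters, so every non-letter becomes a newline
# (\012); -s squeezes each run of newlines into a single one.
LC_ALL=C tr -cs 'A-Za-z' '[\012*]' < file1 > file2

# file2 now holds one word per line: The, quick, brown, fox.
cat file2
```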

       Same as above, but for all character sets and collation sequences:
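The locale-independent variant can be sketched by swapping the explicit ranges for the alpha character class. The contents of file1 below are a hypothetical sample.

```shell
# Hypothetical sample input standing in for file1.
printf 'one two--three\n' > file1

# [:alpha:] follows the current locale instead of hard-coding A-Z and a-z.
tr -cs '[:alpha:]' '[\012*]' < file1 > file2

# file2 now holds one word per line: one, two, three.
cat file2
```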

       Translate all lower case characters in file1 to upper case and write the result to standard output.
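A sketch of the case conversion, again with a hypothetical sample file1:

```shell
# Hypothetical sample input standing in for file1.
printf 'hello, world\n' > file1

# lower in string1 paired with upper in string2 requests case conversion.
tr '[:lower:]' '[:upper:]' < file1     # prints: HELLO, WORLD
```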

       Use an equivalence class to identify accented variants of the base character in file1, strip them of diacritical marks and write the result
       to file2:
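A sketch of the equivalence-class form, using e as the base character. Whether accented variants are actually folded depends on the tr implementation and on the locale's collation data: GNU tr, for instance, treats each equivalence class as containing only the character itself, so the sample below sticks to plain ASCII input and makes no claim about accented output.

```shell
# Hypothetical sample input standing in for file1 (ASCII only).
printf 'eel\n' > file1

# Every member of the equivalence class of e is mapped to plain e;
# [e*] repeats e often enough to cover the whole class.
tr '[=e=]' '[e*]' < file1 > file2

cat file2                              # prints: eel
```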

       Translate each digit in file1 to a # (number sign), and write the result to file2.

       The * (asterisk) tells tr to repeat the # (number sign) enough times to make the second string as long as the first one.
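The digit example can be sketched as follows, with a hypothetical sample file1:

```shell
# Hypothetical sample input standing in for file1.
printf 'room 101, floor 7\n' > file1

# string1 has ten members (0 through 9); [#*] expands to ten # signs.
tr '0-9' '[#*]' < file1 > file2

cat file2                              # prints: room ###, floor #
```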

AUTHOR

       tr was developed by OSF and HP.

SEE ALSO

       ed(1), sh(1), ascii(5), environ(5), lang(5), regexp(5).

STANDARDS CONFORMANCE

																	     tr(1)
