SQUIZZ(1)							   User Manuals 							 SQUIZZ(1)

NAME
squizz - Sequence format checker

SYNOPSIS
squizz [-AShlns] [-c format] [-f format] file

OPTIONS
The following command line options are allowed:

-A
    Restrict detection/verification to alignment formats (conflicts with the -S option).

-S
    Restrict detection/verification to sequence formats (conflicts with the -A option).

-c format
    Convert the detected sequence/alignment into format. This option implies strict alignment checking.

-f format
    Assume the input format is format. Do not try to detect the format, just verify that the given one is correct.

-h
    Display usage.

-l
    List all supported formats.

-n
    Count and report detected entries. This option is only available when detection is restricted to a single type (with the -A or -S option) and strict checks are enabled (without the -s option).

-s
    Disable strict format checks (enabled by default).

DESCRIPTION
squizz is a sequence format file checker, but it has some conversion capabilities too.

squizz can detect the most common sequence and alignment formats:

* EMBL, FASTA, GCG, GDE, GENBANK, IG, NBRF, PIR (codata), RAW, and SWISSPROT.
* CLUSTAL, FASTA, MSF, NEXUS, PHYLIP (interleaved and sequential) and STOCKHOLM.

squizz can also perform some conversions, provided the input format is supported. Only three types of conversion are available: sequence to sequence, alignment to alignment, and alignment to sequence. The remaining one, sequence to alignment, requires multiple alignment algorithms and cannot be handled by formatting tools.

Strict format checks validate the previously detected objects by performing some sanity checks:

- sequence strings must exist.
- an alignment is made of more than one sequence.
- alignment sequence strings must have the same length.
- alignment sequence names must exist, and be unique.

SEE ALSO
seqfmt(5), alifmt(5)

AUTHOR
Nicolas Joly (njoly@pasteur.fr), Institut Pasteur.

Unix 2009-05-19 SQUIZZ(1)
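
The strict format checks listed in the squizz(1) DESCRIPTION can be illustrated with a minimal Python sketch. This is not squizz's implementation: the function names, the (name, string) pair representation and the problem messages are all assumptions made for illustration only.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of squizz. Entries are modelled as (name, string) pairs.

def check_sequence(seq):
    """Return a list of problems found in a single sequence string."""
    problems = []
    if not seq:
        problems.append("sequence string is missing")
    return problems


def check_alignment(entries):
    """Return a list of problems found in an alignment.

    entries is a list of (name, aligned_string) pairs.
    """
    problems = []
    # An alignment is made of more than one sequence.
    if len(entries) < 2:
        problems.append("alignment must contain more than one sequence")
    # Every sequence string must exist.
    for _, seq in entries:
        problems.extend(check_sequence(seq))
    # All aligned strings must have the same length.
    if len({len(seq) for _, seq in entries}) > 1:
        problems.append("aligned sequence strings differ in length")
    # Names must exist and be unique.
    names = [name for name, _ in entries]
    if any(not name for name in names):
        problems.append("sequence name is missing")
    if len(set(names)) != len(names):
        problems.append("sequence names are not unique")
    return problems
```

An empty list means that every check passed; each returned string names one failed check.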

SEQFMT(5)							   User Manuals 							 SEQFMT(5)

NAME
seqfmt - Sequence formats

DESCRIPTION
This document illustrates some common formats used for sequence representation.

EMBL

    ID   MMVASPHOS  standard; RNA; EST; 140 BP.
    AC   X97897;
    DE   M.musculus mRNA for protein homologous to
    DE   vasodilator-stimulated phosphoprotein
    SQ   Sequence 140 BP; 25 A; 58 C; 39 G; 17 T; 1 other;
         ttctcccaga agctgactct atggngaccc cgagagagac tgagcagaac        60
         ccccgcaccc ctgcacttcc aatcaggggc gccccgggag cactccccgt       120
         ccgccctccg cgcagccatg                                        140
    //

FASTA

    >MMVASPHOS
    ttctcccagaagctgactctatggngaccccgagagagactgagcagaacctggagccag
    ccccgcacccctgcacttccaatcaggggcgccccgggagcactccccgtggcgcgccgc
    ccgccctccgcgcagccatg

GCG

    !!NA_SEQUENCE 1.0
    (No documentation)
    dna1.txt  Length: 88  Nov 22, 2001 14:38  Type: N  Check: 3818  ..
           1  TAGTCGTAGT CGGAGCGATG CTGACGATGA CGATGACGAT CGTAGCTGAT
          51  CGATCGAGCT GATGCTGATC GAGCTAGCTG ATCGATCG

GDE

    #sample1
    TTCAAGAGAAACAGCGGCCAAGGAAAAGACTCGGCATGATTGTCCATAGCTTACAAAGCG
    #sample2
    TTCAAGAGAAACAGCGGCTGGGGGAAAGACTCGTCCTGATTGCCTGTAGATGGTAAAGCG

GENBANK

    LOCUS       HUMHBV1      130 bp    DNA             PRI       17-JUN-1993
    DEFINITION  Human DNA/endogenous Hepatitis B virus (HBV) DNA, left host
                viral junction.
    ACCESSION   M15770
    BASE COUNT       32 a     43 c     29 g     26 t
    ORIGIN
            1 agcgggcagt gcagctgctt ggacagcagg ggtgtttctt caacccaggc
           61 ctcctgtcac aacaggccca ttcaattctg aacctgcaag ccaactccaa
          121 cctcttttcc cagggggaac caaaaaccct
    //

IG

    ; comment
    U03518
    AACCTGCGGAAGGATCATTACCGAGTGCGGGTCCTTTGGGCCCAACCTCCCATCCGTGTC
    TATTGTACCCTGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGGGGCGCCTCTG
    TGAGTTGATTGAATGCAATCAGTTAAAACTTTCAACAATGGATCTCTTGGTTCCGGC1

NBRF

    >P1;CCHU
    cytochrome c [validated] - human
    MGDVEKGKKIFIMKCSQCHTVEKGGKHKTGPNLHGLFGRKTGQAPGYSYTAANKNKGIIW
    GEDTLMEYLENPKKYIPGTKMIFVGIKKKEERADLIAYLKKATNE*

PIR

    ENTRY            CCHU       #type complete
    TITLE            cytochrome c [validated] - human
    ACCESSIONS       A31764; A05676; I55192; A00001
    SUMMARY          #length 105  #molecular-weight 11749  #checksum 3247
    SEQUENCE
                    5        10        15        20        25        30
          1 M G D V E K G K K I F I M K C S Q C H T V E K G G K H K T G
         31 P N L H G L F G R K T G Q A P G Y S Y T A A N K N K G I I W
         61 G E D T L M E Y L E N P K K Y I P G T K M I F V G I K K K E
         91 E R A D L I A Y L K K A T N E
    ///

RAW

    ttctcccagaagctgactctatggngaccccgagagagactgagcagaacctggagccag
    ccccgcacccctgcacttccaatcaggggcgccccgggagcactccccgtggcgcgccgc
    ccgccctccgcgcagccatg

Warning: This format cannot handle more than one sequence per file.

SWISSPROT

    ID   100K_RAT       STANDARD;      PRT;   149 AA.
    AC   Q62671;
    DE   100 kDa protein (EC 6.3.2.-).
    SQ   SEQUENCE   149 AA;  17004 MW;  D06484B8BC29112E CRC64;
         MMSARGDFLN YALSLMRSHN DEHSDVLPVL DVCSLKHVAY VFQALIYWIK PQLERKRTRE
         LLELGIDNED SEHENDDDTS QSATLNDKDD ESLPAETGQN SITIRPPDDQ HLPTANTCIS
         RLYVPLYSSK QILKQKLLLA IKTKNFGFV
    //

SEE ALSO
squizz(1), alifmt(5)

AUTHOR
Nicolas Joly (njoly@pasteur.fr), Institut Pasteur.

Unix 2009-05-19 SEQFMT(5)
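
As an illustration of how the FASTA layout shown in seqfmt(5) can be read (a header line starting with ">" followed by one or more sequence lines), here is a minimal Python sketch. It is not taken from squizz: the function name parse_fasta and the choice to keep only the first word of the header as the name are assumptions made for this example.

```python
# Illustrative sketch only: parse_fasta is a hypothetical helper,
# not part of squizz.

def parse_fasta(text):
    """Return a list of (name, sequence) pairs read from FASTA text."""
    entries = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous entry, if there is one.
            if name is not None:
                entries.append((name, "".join(chunks)))
            header = line[1:].strip()
            # Assumption: the name is the first word of the header.
            name = header.split()[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence data may span several lines
    if name is not None:
        entries.append((name, "".join(chunks)))
    return entries


# The FASTA example from seqfmt(5).
SAMPLE = """>MMVASPHOS
ttctcccagaagctgactctatggngaccccgagagagactgagcagaacctggagccag
ccccgcacccctgcacttccaatcaggggcgccccgggagcactccccgtggcgcgccgc
ccgccctccgcgcagccatg
"""

records = parse_fasta(SAMPLE)
```

Here records holds a single (name, sequence) pair, with the three sequence lines joined into one string.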