07-22-2013
awk code to reconstruct sequence from alignment
Hi Everyone,
I need some help constructing one long 'Sbjct' string from the following input, with the pieces ordered by increasing 'Sbjct' start number (e.g. 26325115, 33716368, 33769033, 34869860, etc.).
The different 'Sbjct' strings should be separated by 'NNNN', like this:
(Sbjct:26325115-26325094)NNNN(Sbjct:33716368-33716347)NNNN(Sbjct:33769033-33769073)NNNN(Sbjct:34869860-34869889)
The expected output is shown under 'Example Output'.
--------- Example Input (just a small segment of the whole file)-----------
Score = 44.1 bits (22), Expect = 0.30
Identities = 28/30 (93%)
Strand = Plus / Plus
Query: 1684 atcaaaatgaccaaaatatttcattaaaaa 1713
|||||||||| |||||| ||||||||||||
Sbjct: 34869860 atcaaaatgaacaaaatgtttcattaaaaa 34869889
Score = 44.1 bits (22), Expect = 0.30
Identities = 22/22 (100%)
Strand = Plus / Minus
Query: 1758 ttagggtttagagttaaggggt 1779
||||||||||||||||||||||
Sbjct: 26325115 ttagggtttagagttaaggggt 26325094
Score = 44.1 bits (22), Expect = 0.30
Identities = 22/22 (100%)
Strand = Plus / Minus
Query: 1687 aaaatgaccaaaatatttcatt 1708
||||||||||||||||||||||
Sbjct: 33716368 aaaatgaccaaaatatttcatt 33716347
Score = 44.1 bits (22), Expect = 0.30
Identities = 38/42 (90%), Gaps = 1/42 (2%)
Strand = Plus / Plus
Query: 1734 ccctagggttaactaattcaaaccttagggtttagagttaag 1775
||||||| ||||||||| |||| ||||||||||||||||||
Sbjct: 33769033 ccctaggattaactaatctaaac-ttagggtttagagttaag 33769073
----------------------
---------------------- Example Output -----------
Whole Sbjct string
ttagggtttagagttaaggggtNNNNaaaatgaccaaaatatttcattNNNNccctaggattaactaatctaaac-ttagggtttagagttaagNNNNatcaaaatgaacaaaatgtttcattaaaaa
Thanks for your help.
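One possible approach, as a minimal sketch rather than a definitive solution: pull the start number and sequence from every 'Sbjct:' line, sort numerically on the start number, then join the sequences with 'NNNN'. The function name `reconstruct_sbjct` is made up for illustration. The sketch assumes every 'Sbjct:' line has the layout shown in the example input (label, start, sequence, end) and that each alignment fits on a single 'Sbjct:' line; alignments that wrap over several 'Sbjct:' lines would come out as separate pieces.

```shell
# Hypothetical helper: reconstruct_sbjct FILE
# Assumes lines of the form "Sbjct: <start> <sequence> <end>".
reconstruct_sbjct() {
    # 1. Keep only the Sbjct lines and print "<start> <sequence>".
    awk '$1 == "Sbjct:" { print $2, $3 }' "$1" |
        # 2. Order the pieces by increasing start number.
        sort -n -k1,1 |
        # 3. Join the sequences, putting NNNN between consecutive pieces.
        awk '{ printf "%s%s", (NR > 1 ? "NNNN" : ""), $2 }
             END { if (NR) print "" }'
}
```

Called as `reconstruct_sbjct input.txt` (the file name is a placeholder), it writes the joined string to standard output. Gap characters such as '-' are kept exactly as they appear in the 'Sbjct' line, and Plus / Minus hits are ordered by their first number, as in the example output above.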