05-09-2009
Need help with a script that prints letters from one file according to another file!
Basically, what I want to do is pull out the DNA sequence for a particular gene name.
I have 2 files (FILE1 and FILE2) and I want the output written to a separate file (FILE3).
FILE1 and FILE2 are MASSIVE, so I am only posting examples from each file.
FILE1 looks like this (tab-delimited, 4 columns):
##gff-version 1
1154 10 + AAD6
418 7429 + AAH1
702 759 + AAT1
584 10 - ABF2
642 4894 - ACC1
651 7213 - ACN9
1055 3454 - ADE1
The next file, FILE2, looks like this:
>1154
ATCTCACTCGTAATTCTACATAATTTTGTTTATGCTTTTATTGTCATTTTATATATTGTCAGTCATTATCCTATTACATTATCAATCCTTGCATTTCAGC TTCCACTTATTTCGATGACCGCTTCTCATAACTTATGTCATCTTCTAACACCGTATATGATAATGTACCAGTAGTATGAC
>584
GCAAGCTTTATAGTGACAACAATAAGGTATCACTCGGTTACAATTACCCCCACTTCCCCT
What I want to do is match column 1 of FILE1 against the >number header lines in FILE2. For example, 1154 in FILE1 matches >1154 in FILE2. Next, I want to find the letter at the position given in column 2 (for 1154 that is the 10th letter, which happens to be G). If column 3 of FILE1 is +, it should print the 8 letters in front of that position (i.e. the 8 letters in front of G, which are TCTCACTC).
If column 3 is -, it should do the same thing from the reverse end. So for ABF2 on "584", instead of starting at the "G" at the beginning of >584, it starts at the "T" at the end. The position of ABF2 will then be 25 letters away from that "T", so the letter will be "C". Then it takes the letters behind it, so CCACTTCC.
The output file will print out column 4 of FILE1, the top 8 letters from FILE2 and column 3 from FILE1.
The final file (FILE3) will look like this:
AAD6 TCTCACTC +
ABF2 CCACTTCC -
Could someone give me some help with this? I am new to Perl and I have been put in a situation where I have to program at a very high level.
Thanks
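The steps described above can be sketched roughly as follows. This is in Python rather than Perl, and it is only one reading of the description, not a finished solution. The function names (`read_fasta`, `upstream`, `extract`) are made up for illustration. The + case reproduces the AAD6 line exactly. For the - case the sketch assumes "reverse" means reading the sequence backwards and then applying the same rule (no complementing of bases); under that assumption ABF2 comes out as CCCCTTCA, not the CCACTTCC shown in the post, so the rule for - would need to be pinned down before relying on it.

```python
def read_fasta(lines):
    """Map each '>id' header to its sequence, joining wrapped lines."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = []
        elif name is not None:
            # Drop stray spaces left over from line wrapping.
            seqs[name].append(line.replace(" ", ""))
    return {key: "".join(parts) for key, parts in seqs.items()}


def upstream(seq, pos, strand, n=8):
    """Return up to n letters immediately before the 1-based position pos.

    ASSUMPTION: for '-' the sequence is simply read backwards first.
    """
    if strand == "-":
        seq = seq[::-1]
    start = max(0, pos - 1 - n)  # fewer than n letters near the start
    return seq[start:pos - 1]


def extract(gff_lines, seqs, n=8):
    """Build 'gene<TAB>letters<TAB>strand' rows for ids found in seqs."""
    rows = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the '##gff-version' header and blank lines
        seq_id, pos, strand, gene = line.split()[:4]
        if seq_id in seqs:
            letters = upstream(seqs[seq_id], int(pos), strand, n)
            rows.append(f"{gene}\t{letters}\t{strand}")
    return rows


# Small demo using the sample data from the post (1154 is shortened).
file1 = ["##gff-version 1", "1154\t10\t+\tAAD6", "584\t10\t-\tABF2"]
file2 = [
    ">1154",
    "ATCTCACTCGTAATTCTACATAATTTTG",
    ">584",
    "GCAAGCTTTATAGTGACAACAATAAGGTATCACTCGGTTACAATTACCCCCACTTCCCCT",
]
for row in extract(file1, read_fasta(file2)):
    print(row)
# AAD6    TCTCACTC    +
# ABF2    CCCCTTCA    -   (under the reversal assumption above)
```

For the real files, the same functions can be fed open file handles instead of lists, e.g. `read_fasta(open("FILE2"))`, with the rows written to FILE3. Note that this holds all of FILE2 in memory, which may not be practical if the file is very large.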
JOIN(1) User Commands JOIN(1)
NAME
join - join lines of two files on a common field
SYNOPSIS
join [OPTION]... FILE1 FILE2
DESCRIPTION
For each pair of input lines with identical join fields, write a line to standard output. The default join field is the first, delimited
by blanks.
When FILE1 or FILE2 (not both) is -, read standard input.
-a FILENUM
also print unpairable lines from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2
-e EMPTY
replace missing input fields with EMPTY
-i, --ignore-case
ignore differences in case when comparing fields
-j FIELD
equivalent to '-1 FIELD -2 FIELD'
-o FORMAT
obey FORMAT while constructing output line
-t CHAR
use CHAR as input and output field separator
-v FILENUM
like -a FILENUM, but suppress joined output lines
-1 FIELD
join on this FIELD of file 1
-2 FIELD
join on this FIELD of file 2
--check-order
check that the input is correctly sorted, even if all input lines are pairable
--nocheck-order
do not check that the input is correctly sorted
--header
treat the first line in each file as field headers, print them without trying to pair them
-z, --zero-terminated
line delimiter is NUL, not newline
--help display this help and exit
--version
output version information and exit
Unless -t CHAR is given, leading blanks separate fields and are ignored, else fields are separated by CHAR. Any FIELD is a field number
counted from 1. FORMAT is one or more comma or blank separated specifications, each being 'FILENUM.FIELD' or '0'. Default FORMAT outputs
the join field, the remaining fields from FILE1, the remaining fields from FILE2, all separated by CHAR. If FORMAT is the keyword 'auto',
then the first line of each file determines the number of fields output for each line.
Important: FILE1 and FILE2 must be sorted on the join fields. E.g., use "sort -k 1b,1" if 'join' has no options, or use "join -t ''" if
'sort' has no options. Note, comparisons honor the rules specified by 'LC_COLLATE'. If the input is not sorted and some lines cannot be
joined, a warning message will be given.
AUTHOR
Written by Mike Haertel.
REPORTING BUGS
GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report join translation bugs to <http://translationproject.org/team/>
COPYRIGHT
Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
SEE ALSO
comm(1), uniq(1)
Full documentation at: <http://www.gnu.org/software/coreutils/join>
or available locally via: info '(coreutils) join invocation'
GNU coreutils 8.28 January 2018 JOIN(1)