Need help with a file that prints letters from a file according to another file! Post: 302314741


Shell Programming and Scripting: Post 302314741 by kylle345 on Saturday 9th of May 2009 10:25:01 PM


So basically what I want to do is pull out DNA sequences for a particular gene name.

I have 2 files (FILE1 and FILE2) and I want the output written to a separate file (FILE3).

FILE1 and FILE2 are MASSIVE, so I am only posting examples from each file.

So FILE1 looks like this (tab-delimited, 4 columns):

##gff-version 1

1154 10 + AAD6
418 7429 + AAH1
702 759 + AAT1
584 10 - ABF2
642 4894 - ACC1
651 7213 - ACN9
1055 3454 - ADE1
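A minimal sketch of reading a FILE1-style table, in Python rather than Perl. The function name `read_annotations` is hypothetical, and the sketch assumes that lines starting with `#` are headers and that every data row has the four columns shown above (sequence id, position, strand, gene name).

```python
import io

def read_annotations(handle):
    """Yield (seq_id, position, strand, gene) tuples from a FILE1-style table."""
    for line in handle:
        line = line.strip()
        # Skip blank lines and lines starting with '#', e.g. '##gff-version 1'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # splits on tabs as well as spaces
        if len(fields) < 4:
            continue  # assumption: rows with fewer than 4 columns are ignored
        seq_id, position, strand, gene = fields[:4]
        yield seq_id, int(position), strand, gene

# Two rows from the sample above, held in memory instead of a real file.
sample = io.StringIO("##gff-version 1\n\n1154\t10\t+\tAAD6\n584\t10\t-\tABF2\n")
rows = list(read_annotations(sample))
print(rows)
```

Because the function is a generator, a very large FILE1 can be processed row by row without loading it all at once.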

The next file, FILE2, looks like this:

>1154
ATCTCACTCGTAATTCTACATAATTTTGTTTATGCTTTTATTGTCATTTTATATATTGTCAGTCATTATCCTATTACATTATCAATCCTTGCATTTCAGC TTCCACTTATTTCGATGACCGCTTCTCATAACTTATGTCATCTTCTAACACCGTATATGATAATGTACCAGTAGTATGAC
>584
GCAAGCTTTATAGTGACAACAATAAGGTATCACTCGGTTACAATTACCCCCACTTCCCCT
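One possible way to load a FILE2-style file into a lookup table keyed by the number after `>`, sketched in Python. The name `read_sequences` is hypothetical. The sketch assumes that a sequence may be wrapped over several lines and that any whitespace inside a sequence line is not part of the sequence.

```python
import io

def read_sequences(handle):
    """Return {seq_id: sequence} from a FILE2-style ('>id' header) file."""
    parts = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: everything after '>' is taken as the id.
            seq_id = line[1:].strip()
            parts[seq_id] = []
        elif seq_id is not None:
            # Remove internal whitespace so wrapped lines join cleanly.
            parts[seq_id].append("".join(line.split()))
    return {key: "".join(chunks) for key, chunks in parts.items()}

# Shortened, made-up layout of the sample records, just to show the joining.
sample = io.StringIO(">1154\nATCTCACTCG TAATT\nCTACA\n>584\nGCAAGCTTTA\n")
seqs = read_sequences(sample)
print(seqs["1154"])
```

This keeps every sequence in memory, which may not suit very large inputs; for those, one option is to keep only the ids that actually appear in FILE1.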

What I want to do is match column 1 of FILE1 against the >number header in FILE2. So for example, 1154 from FILE1 will match up with >1154 in FILE2. Next, I want it to find the position given in column 2 (so for 1154, it will find the 10th letter, which happens to be G).

If column 3 of FILE1 is +, it will print the 8 letters in front of that position (i.e. the 8 letters in front of G would be TCTCACTC).

But if column 3 is -, it will do the same counting from the reverse end. So for ABF2 on "584", instead of starting at "G" (the start of >584), it will start at "T" (the end). The position of ABF2 will then be 10 letters away from "T", so the letter will be "C". Then it will take the 8 letters behind it... so CCACTTCC.

The output file will print out column 4 of FILE1, the 8 letters taken from FILE2, and column 3 of FILE1.

The final file (FILE3) will look like this:

AAD6 TCTCACTC +
ABF2 CCACTTCC -
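The extraction rule and the output line can be sketched as follows, again in Python. The names `upstream_letters` and `format_row` are hypothetical. For `+` the sketch reproduces TCTCACTC from the post. For `-` it assumes the sequence is simply reversed (not complemented) and the same rule is applied. On the posted >584 sample that reading gives CCCCTTCA, not the CCACTTCC quoted above, so the minus-strand rule should be checked against the real data before relying on it. The tab separator in the output is also an assumption.

```python
def upstream_letters(sequence, position, strand, width=8):
    """Return the `width` letters just before the 1-based `position`.

    For '-' the sequence is reversed first, so `position` counts from the end.
    ASSUMPTION: plain reversal only, no base complementing.
    """
    if position < 1:
        raise ValueError("position must be 1 or greater")
    if strand == "-":
        sequence = sequence[::-1]
    elif strand != "+":
        raise ValueError(f"unknown strand: {strand!r}")
    end = position - 1           # 0-based index of the letter at `position`
    start = max(0, end - width)  # fewer letters are returned near the start
    return sequence[start:end]

def format_row(gene, letters, strand):
    """Build one FILE3 line: gene name, letters, strand (tab-separated)."""
    return "\t".join([gene, letters, strand])

# Start of the >1154 sample and the full >584 sample from the post.
seq_1154 = "ATCTCACTCGTAATTCTACATAATTTTG"
seq_584 = "GCAAGCTTTATAGTGACAACAATAAGGTATCACTCGGTTACAATTACCCCCACTTCCCCT"

plus_letters = upstream_letters(seq_1154, 10, "+")
minus_letters = upstream_letters(seq_584, 10, "-")
print(format_row("AAD6", plus_letters, "+"))
print(format_row("ABF2", minus_letters, "-"))
```

If the real data needs the reverse complement for `-` entries, the reversal step is the only place that would have to change.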

Could someone give me some help with this? I am new to Perl and have been put in a situation where I have to program at a very high level.

Thanks



