Make sure that "file1.txt" is changed to point to a file that has 89 columns.
I tested it and it works fine here with 89 columns, given that column 1 is the id, column 6 is the Chr data, column 88 is the genotype, and column 89 is the reference.
I added the reference characters to the end of my file so that they are in column 89, and I get these results:
Here is the code in test5.pl:
Notice that mine says "file3.txt" since I joined "file1.txt" and "file2.txt".
Hi, I am now facing two issues while trying to expand this program. ddreggors, you have been wonderful so far, and I am hoping you can help me with these issues.
The first is that for some of the data, the genotype is not available and is given as "NA" in the file. Thus, when the reference allele is "A", the program wrongly assumes that the criterion is met, and the row is included in the output. Is there any way for me to specify that when genotype = "NA", the program should skip to the next row?
Secondly, I would like to attach the following to the end of the program so that the number of each type of mutation is recorded:
However, simply saving the output of the first part of the program to a file and opening that file again for the second part does not seem to work. Any help would be greatly appreciated.
Thanks a lot,
Drossy
To make sure that the lines with "NA" are not used, simply change:
to this:
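The reason "NA" slips through is that the genotype check is a regular-expression match: with a non-ref allele of "A", the pattern /A/i finds the "A" inside "NA". A minimal sketch of the skip, assuming (as in the later version of the script in this thread) that the non-ref allele is the first whitespace-separated field and the genotype the second; the sample rows are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample rows: non-ref allele, genotype, A1, A2
my @rows = ("A NA 12 20", "A AT 12 20", "C GG 12 20");
my @kept;

foreach my $row (@rows) {
    my @f = split /\s+/, $row;
    next if $f[1] eq "NA";                 # skip missing genotypes before any matching
    next unless $f[1] =~ /\Q$f[0]\E/i;     # genotype contains the non-ref allele
    push @kept, $row;
}
print "$_\n" for @kept;   # prints only "A AT 12 20"
```

Without the "NA" line, the first row would also be kept, because "NA" =~ /A/i is true.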
As to the second problem, I am not sure I understand...
Quote:
Secondly I would like to attach the following on to the end of the program so that the number of each mutation is recorded
You say "attach" this code to the end of what we have already written. How are you "attaching" it?
Are you copying and pasting?
If so, are you removing the perl "shebang" line at the top?
Are you simply calling this script from inside the other script at the end?
Can you verify that the file you are opening has the expected data?
Are you getting errors, or no results at all?
---------- Post updated at 09:36 PM ---------- Previous update was at 08:54 PM ----------
After looking at the code you want to attach, I see some problems...
You should always have the following at the top of your code:
This way you will get warnings and errors if the code is flawed.
I see that you do not properly declare and initialize your variables, and the lines above would have immediately shown you that as well.
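A minimal sketch of what the two pragmas expect; the variable names and data are illustrative only. If you remove the "my" in front of $countWithA, perl refuses to compile the script with a "Global symbol ... requires explicit package name" error instead of silently creating a global.

```perl
#!/usr/bin/perl
use strict;    # undeclared variables become compile-time errors
use warnings;  # e.g. warns when an undefined value is printed

# Declare every variable with "my"; initialize the counter to 0 so that
# printing it never uses an undefined value.
my @genotypes  = ("AT", "GG", "AA");
my $countWithA = 0;

foreach my $g (@genotypes) {
    ++$countWithA if $g =~ /A/;
}
print "Genotypes containing A = $countWithA\n";   # Genotypes containing A = 2
```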
You will want something like this:
Then there are two lines above, and I am not sure what you are trying to do with them...
First, naming the scalar the same as the array (@dna vs. $dna) is not a good idea.
Keep in mind that to access an element of the "@data" array you write "$data[x]" (where x is the index number of the item).
Example:
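A short illustration (the contents are made up): $dna[0] reaches into the array @dna, while $dna is a completely separate scalar that merely shares the name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @dna = ("A C", "G T");   # an array
my $dna = "unrelated";      # a different variable with the same name

print "$dna[0]\n";   # first element of @dna: A C
print "$dna\n";      # the scalar: unrelated
```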
At the very least, it is confusing. Then you join the array into one long string and remove the spaces, so you end up with:
And finally, you never actually USE "$dna"; you only use "@dna" later.
Beyond that, the code looks good; the foreach with the if/elsif/else block is fine.
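A sketch of what that join does, assuming the intent was one continuous sequence string; the name $sequence is hypothetical, chosen so it does not clash with @dna, and the input lines are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @dna = ("AC GT\n", "TT A\n");   # e.g. lines read from a file

# Join all lines into one string, then strip every whitespace character
# (spaces and newlines) so only the bases remain.
my $sequence = join("", @dna);
$sequence =~ s/\s+//g;

print "$sequence\n";                       # ACGTTTA
# Later code must then use $sequence, not @dna.
print length($sequence), " bases\n";       # 7 bases
```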
Last edited by ddreggors; 06-11-2012 at 10:41 PM..
Sorry I was not clearer. I meant that I want the output of the first part of the program to be the input for the second part. Basically, I would like to count the number of each type of mutation that is given as output.
I hope that's a little clearer.
Thanks again for all the help
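One way to do this without the intermediate file is to count each mutation type at the moment a row passes the filters, so the "output" of the first part feeds the second directly. A minimal sketch with a hash of counts; the column layout (allele, genotype, A1, A2, mutation type) and the sample rows are assumptions for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout: non-ref allele, genotype, A1, A2, mutation type
my @rows = (
    "A AT 12 20 intronic",
    "A AA 15 18 exonic",
    "A NA 30 40 exonic",     # skipped: genotype missing
    "C GG 12 20 intronic",   # skipped: genotype lacks the allele
    "G GT 11 12 intronic",
);

my %count;   # mutation type => number of rows that passed the filters
foreach my $row (@rows) {
    my ($allele, $geno, $a1, $a2, $type) = split /\s+/, $row;
    next if $geno eq "NA";
    next unless $geno =~ /\Q$allele\E/i;
    next unless $a1 >= 10 && $a2 && $a1 / $a2 >= 0.5;
    $count{$type}++;   # count immediately instead of writing to a file
}

print "$_ mutations = $count{$_}\n" for sort keys %count;
```

A hash also replaces the long list of separate counter variables: any new mutation type gets its own key automatically.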
---------- Post updated at 10:39 AM ---------- Previous update was at 10:02 AM ----------
I see where you have tried to adopt some of what I have done, but there are some points that you are missing.
Let's start here:
At the top you are trying to open a file, and you appear to want to "READ" it because you try to set $dnafile with the filehandle. However, your method is slightly off: the mode for reading is "<". If you are writing (">"), you would not try to GET content from the file into a variable (unless you are using read & write mode, "+<" or "+>").
should be something more like (if reading the file):
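A minimal sketch of the read form, with a lexical filehandle and the OS error from $!; the file name dna.txt is a placeholder, and the sketch creates (and afterwards deletes) it only so that it runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dnafile = "dna.txt";   # placeholder name

# Create a small sample file so the example is self-contained.
open(my $out, ">", $dnafile) or die "Cannot write $dnafile: $!";
print $out "ACGT\nTTGA\n";
close($out);

# "<" opens for reading; the third argument is the file name itself.
open(my $in, "<", $dnafile) or die "Cannot open $dnafile: $!";
my @lines = <$in>;   # list context: every line of the file
close($in);

print "Read ", scalar(@lines), " lines from $dnafile\n";   # Read 2 lines from dna.txt
unlink $dnafile;     # clean up the sample file
```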
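The open function is easiest to use when you stick to the 3-argument form. Also, if you want to read from or write to a variable rather than a file, you put a backslash in front of the "$" symbol in the open call; this passes a reference to the variable rather than escaping it...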
Taken from perldoc.org:
Quote:
open($fh, ">", \$variable) || ..
That is not to say open "only" takes 3 arguments, but it is most often used that way. You can read more on that function at: Perl Doc - Open Function
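To illustrate the quoted form: the backslash makes \$variable a reference, so open writes to or reads from the string held in that variable instead of a file on disk. A minimal sketch (the names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $buffer = "";

# Write into the scalar $buffer as if it were a file.
open(my $mem, ">", \$buffer) or die "Cannot open in-memory file: $!";
print $mem "line one\nline two\n";
close($mem);

# Read it back line by line.
open(my $rd, "<", \$buffer) or die "Cannot open in-memory file: $!";
my @got = <$rd>;
close($rd);

print "buffer holds ", scalar(@got), " lines\n";   # buffer holds 2 lines
```

To open a file whose name is stored in a variable, pass the variable itself (no backslash), e.g. open(my $fh, "<", $dnafile).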
The "or die $!" says: if we couldn't open the file, exit with the error that was returned while trying to open it. The "$!" variable contains the actual error that was returned to perl by the OS. You will want to see that error, not "Couldn't open file". A generic error message is confusing and makes debugging very hard; the actual error is far more helpful in tracking down problems.
I suggest changing to this style for the rest of your file operations as well.
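A small sketch of the difference, using a file name that is assumed not to exist; instead of dying, it stores both messages so they can be compared:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = "no_such_file_here.txt";   # assumed not to exist
my ($generic, $detailed);

unless (open(my $fh, "<", $file)) {
    $generic  = "Couldn't open file";       # tells you nothing about why
    $detailed = "Cannot open $file: $!";    # $! holds the OS error text
}
print "$generic\n$detailed\n";
```

On a typical Linux system the second line reads "Cannot open no_such_file_here.txt: No such file or directory", which points straight at the problem.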
The errors you see are actually correct, because you try to use "COUNTBASE", which is the filehandle (not the file name or even the file itself), AFTER you have closed it.
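A minimal sketch of that mistake, using an in-memory file so it runs anywhere: once the handle is closed, reading from it returns nothing and, under use warnings, perl reports "readline() on closed filehandle". The warning is captured here only to show it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "ACGT\n";
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };   # collect warnings

open(my $fh, "<", \$text) or die "Cannot open: $!";
close($fh);

my $line = <$fh>;   # too late: the handle is already closed
print defined $line ? "got a line\n" : "nothing read\n";
print "warning: $warnings[0]" if @warnings;

# The fix is to read everything you need first, then close.
```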
Given that as a start, I am not sure exactly what you are trying to do with that section of code.
Consider this code: RESULTS:
Now consider this code (more like yours):
RESULTS:
That code is not what you want: you have declared "data" as a scalar (a string), not an array!
Because of this, the read happens in scalar context, and you only managed to get the first line into the string variable.
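A minimal sketch of the two contexts, with an in-memory string standing in for the real input file; the names $first and @all are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = "row1\nrow2\nrow3\n";   # stands in for the real input file

open(my $fh, "<", \$file) or die "Cannot open: $!";
my $first = <$fh>;    # scalar context: reads only ONE line
close($fh);

open($fh, "<", \$file) or die "Cannot open: $!";
my @all = <$fh>;      # list context: reads ALL remaining lines
close($fh);

print "scalar: $first";                          # scalar: row1
print "array has ", scalar(@all), " lines\n";    # array has 3 lines
```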
What I think you wanted was something more like this though...
RESULTS:
Last edited by ddreggors; 06-12-2012 at 10:32 PM..
I have attached the file, but this will also give you a quick look at the changes. Every line with a "+" is a change I added; every line with a "-" is a line I removed.
Okay, I now have a new requirement that I would like to add to my program.
Given that this is the original file and the corresponding program:
Code:
Non-ref "A" "A1" "A2" ....... (column6)
A AT 5 15 INTRONIC
Code:
#!/usr/bin/perl
use strict;
use warnings;
#initialize data
my(@data,$row,@dataline,$dnafile);
#open file to be read or quit
open(FILE1,"<",'file.txt')or die $!;
#initialize counts of variables
my $countExonic=0;
my $countIntergenic=0;
my $countIntronic=0;
my $countUpstream=0;
my $countDownstream=0;
my $countUTR5=0;
my $countUTR3=0;
my $countFrameshift=0;
my $countNonsynonymous=0;
my $countSyn=0;
my $countStopgain=0;
my $countStoploss=0;
my $countSplicing=0;
my $countErrors=0;
#set data equal to the previously opened file
@data=<FILE1>;close(FILE1);
#loop to grep rows of file that have genotype that matches non-ref allele
#$dataline[1] is genotype, $dataline[0] is non-ref allele, $dataline[7] is
#mutation type
#selects for ratio above .5
#the number for each type of mutation occurring in the grepped data is recorded
foreach $row(@data) {
    next unless $row=~/^\d/;
    @dataline=split(/\s+/,$row);
    next unless $dataline[1] ne "NA";
    next unless $dataline[1]=~/$dataline[0]/i;
    next unless $dataline[2]>=10;
    next unless $dataline[2]/$dataline[3]>=.5;
    if($dataline[7] =~ /exonic/){
        ++$countExonic;
    }elsif($dataline[7]=~/inter/){
        ++$countIntergenic;
    }elsif($dataline[7]=~/intron/){
        ++$countIntronic;
    }elsif($dataline[7]=~/upstream/){
        ++$countUpstream;
    }elsif($dataline[7]=~/downstream/){
        ++$countDownstream;
    }elsif($dataline[7]=~/UTR5/){
        ++$countUTR5;
    }elsif($dataline[7]=~/UTR3/){
        ++$countUTR3;
    }elsif($dataline[7]=~/frame/){
        ++$countFrameshift;
    }elsif($dataline[7]=~/nonsyn/){
        ++$countNonsynonymous;
    }elsif($dataline[7]=~/syn/){
        ++$countSyn;
    }elsif($dataline[7]=~/gain/){
        ++$countStopgain;
    }elsif($dataline[7]=~/loss/){
        ++$countStoploss;
    }elsif($dataline[7]=~/splic/){
        ++$countSplicing;
    }else{
        ++$countErrors;
    }
}
#print out results of mutation counts
print "nc_RNA Exonic Mutations=$countExonic\n\n";
print "Intergenic Mutations=$countIntergenic\n\n";
print "Intronic Mutations=$countIntronic\n\n";
print "Upstream Mutations=$countUpstream\n\n";
print "Downstream Mutations=$countDownstream\n\n";
print "UTR5 Mutations=$countUTR5\n\n";
print "UTR3 Mutations=$countUTR3\n\n";
print "Frameshift Mutations=$countFrameshift\n\n";
print "Nonsynonymous Mutations=$countNonsynonymous\n\n";
print "Synonymous Mutations=$countSyn\n\n";
print "Stop Gain Mutations=$countStopgain\n\n";
print "Stop Loss Mutations=$countStoploss\n\n";
print "Splicing Mutations=$countSplicing\n\n";
print "Others=$countErrors\n\n";
exit;
And say the new file being read is as follows:
Code:
Non-ref "A" "A1" "A2" "B" "B1" "B2"
A AT 5 15 AA 13 14
I want to be able to choose whether the checks are performed on the "A" columns or the "B" columns, depending on what I type on the command line.
In other words, the original code:
Code:
next unless $dataline[1] ne "NA";
next unless $dataline[1]=~/$dataline[0]/i;
next unless $dataline[2]>=10;
next unless $dataline[2]/$dataline[3]>=.5;
Should change so that [1] is instead the column of "A" or "B" depending on the input, [2] is "A"+1 or "B"+1, and [3] is "A"+2 or "B"+2.
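A minimal sketch of one way to do this, assuming the header line names the columns exactly as in the sample above (with the quotes, e.g. "A" and "B") and that the group letter is passed as the first command-line argument (for example perl script.pl B). Looking the column up by its header and then using $col, $col + 1 and $col + 2 replaces the fixed indexes [1], [2] and [3]. The in-memory sample data is hypothetical, and skipping the header with shift is an assumption (the original script instead skips rows that do not start with a digit):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $group = uc($ARGV[0] // "A");   # which sample group to test; default A

# Sample data standing in for the real file
my @lines = (
    qq{Non-ref "A" "A1" "A2" "B" "B1" "B2"},
    "A AT 5 15 AA 13 14",
);

# Find the column whose header is "A" or "B" (quotes included).
my @header = split /\s+/, shift @lines;
my ($col) = grep { $header[$_] eq qq{"$group"} } 0 .. $#header;
die "No column named $group in header\n" unless defined $col;

my @passed;
foreach my $row (@lines) {
    my @f = split /\s+/, $row;
    # genotype, first count, second count for the chosen group
    my ($geno, $n1, $n2) = @f[$col, $col + 1, $col + 2];
    next if $geno eq "NA";
    next unless $geno =~ /\Q$f[0]\E/i;
    next unless $n1 >= 10;
    next unless $n2 && $n1 / $n2 >= .5;   # also avoids division by zero
    push @passed, $row;
}
print scalar(@passed), " row(s) passed for group $group\n";
```

With the sample row, group A fails the ">= 10" check (A1 is 5), while running it with B passes (13 >= 10 and 13/14 >= .5).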