Need help comparing Base Pairs within PERL


 
# 1  
Old 06-05-2012

Hi, I am working on a multi-step project and have been finding it difficult to come up with the correct approach.
The data I have been given resembles:

Code:
 
Index      Chr       Genotype   Mutation Type
1           Chr1           TT            Intronic
2           Chr1           AA            Exonic
3           Chr1           AG            Exonic
4           Chr1           CC            Frameshift
5           Chr1           CA            Intronic
...         ...              ....

My goal here is to compare a large file such as this one to a set of references that are each a single letter (T, A, C, G). I need to split the genotype in the given file into two individual letters and then compare these to the reference. If either letter matches the reference, then the program should move on to the next reference. If, however, neither letter matches, then the program should output the index number and mutation type that correspond to that data.
I am still fairly new to this, so all help would be greatly appreciated.
Thanks!
# 2  
Old 06-05-2012
Can you post sample data and desired output?
# 3  
Old 06-05-2012
Yes, sorry for not specifying more.
Say the given reference was:
Code:
 
Index 
1)T
2)A
3)C
4)G
5)C

and that the matrix in my first post remains the same. Indexes 3 and 4 would then not have an allele in their genotype that matches the reference for that position. My desired output in this situation would be:

Code:
 
3   Chr1   Exonic
4   Chr1   Frameshift

I hope this explanation is more useful.
Thanks again
# 4  
Old 06-05-2012
If I understand correctly, you will want something like this:

Code:
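#!/usr/bin/perl

use strict;
use warnings;

my ($index, $chr, $geno, $mutation);
my $ref = "A";    # static reference base, applied to every row
# lexical filehandle with a three-argument open
open(my $fh, "<", "file.txt") or die "Cannot open file.txt: $!";
while (<$fh>) {
	next if /Index/;    # skip the header line
	# captures: $1 index, $2 chromosome, $3 genotype, $4 mutation type
	next unless $_ =~ /^(\d*)\s*([a-z]*\d)\s*([a-z]*)\s*([a-z]*)/i;
	$index = $1;
	$geno = $3;
	next if $geno =~ /$ref/;    # genotype contains the reference base
	print "Genotype '" . $geno . "' (Index: " . $index . ") does not match the reference type '" . $ref . "'.\n";
}
close($fh);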


Which will result in this:
Code:
Genotype 'TT' (Index: 1) does not match the reference type 'A'.
Genotype 'CC' (Index: 4) does not match the reference type 'A'.


Obviously, I have printed my own formatted line; however, you could simply print '$_' to leave the line as is ( print $_; ). No extra "\n" is needed, because the line read from the file still carries its trailing newline.
# 5  
Old 06-05-2012
Thank you, ddreggors, for your help. It is similar to that, but the reference is not the same for each index. For example, for index 1, 'T' is the reference, and thus the genotype 'TT' for that index fits the requirement of having one or more alleles that match the reference. In index 3 and 4, however, the reference alleles are 'C' and 'G' respectively, and in both of these cases, the coordinating genotype does not have an allele that matches the reference. It is in these cases that I wish for output.
Sorry if I had not specified that well.

Last edited by drossy; 06-05-2012 at 11:52 AM..
# 6  
Old 06-05-2012
@drossy

I am not exactly sure I follow but that's ok. Fortunately for us, I do not have to understand genome reference indexes or alleles for that matter to understand logic.

Using the example I have given you, you should be able to see that while I set $ref as a static value (A), you can follow the exact same logic inside the while loop to pull a value or an array of values from another file. If all needed values exist in "THIS" file, then I have already given you what you need.
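As a rough illustration of pulling the reference values from a second source, here is a minimal sketch. It assumes the reference lines look like `1)T`, as in the listing posted earlier in this thread; the helper name `parse_reference` and the in-memory sample lines are hypothetical, and a real script would read the lines from a file instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn lines such as "1)T" into a hash of index => base.
sub parse_reference {
    my @lines = @_;
    my %ref;
    for my $line (@lines) {
        # Capture the index before ")" and the single base after it;
        # anything else (for example a header line) is skipped.
        next unless $line =~ /^\s*(\d+)\)\s*([ACGT])\s*$/i;
        $ref{$1} = uc $2;
    }
    return %ref;
}

# Sample reference lines copied from this thread (assumed format).
my %ref = parse_reference("Index", "1)T", "2)A", "3)C", "4)G", "5)C");
print "$_ => $ref{$_}\n" for sort { $a <=> $b } keys %ref;
```

With the reference held in a hash, the lookup inside the while loop becomes `$ref{$index}` in place of the static `$ref`.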

Example:
Code:
next unless $_ =~ /^(\d*)\s*([a-z]*\d)\s*([a-z]*)\s*([a-z]*)/i;
$index = $1;
$geno = $3;

These lines do something very nice for you: they match the whole line and capture each column, separated by one or more whitespace characters, into its own variable. All text matched inside parentheses is "kept".

You can see that I have given friendly names to the index and genotype columns, but $2 would contain the "Chr" column and $4 would contain the "Mutation Type" column with that regex match. You can easily give them friendly names to reuse as well...


Example:

Code:
next unless $_ =~ /^(\d*)\s*([a-z]*\d)\s*([a-z]*)\s*([a-z]*)/i;
$index = $1;
$chr = $2;
$geno = $3;
$mutation = $4;

Going a bit further, if you need to split geno into 2 separate characters you can now take the $geno variable and do something like this:

Example:
Code:
my ($geno1, $geno2) = split(//, $geno);   # an empty pattern splits into single characters
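Building on the split, the either-allele check could be sketched as follows. This assumes a two-letter genotype and a single-letter reference; `has_ref_allele` is a hypothetical name used only for illustration, not something from the earlier posts.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any letter of the genotype equals the reference base.
sub has_ref_allele {
    my ($geno, $ref) = @_;
    # An empty pattern splits the string into single characters.
    my @alleles = split //, uc($geno);
    # grep in scalar context returns the number of matching alleles.
    my $hits = grep { $_ eq uc($ref) } @alleles;
    return $hits;
}

for my $pair (['TT', 'T'], ['AG', 'C'], ['CA', 'C']) {
    my ($geno, $ref) = @$pair;
    printf "%s vs %s: %s\n", $geno, $ref,
        has_ref_allele($geno, $ref) ? 'match' : 'no match';
}
```

Comparing whole letters with `eq` avoids treating the reference as a regular expression, which matters little for A/C/G/T but keeps the check explicit.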


The framework is all here for you to do everything you want now, but I fear that for me (or others) to give you a better solution, the problem would need to be explained more logically.


Maybe I am slow, but I do not see (based on your explanation) the correlation you are trying to make with these references/alleles.

You say:
Quote:
In index 3 and 4, however, the reference alleles are 'C' and 'G' respectively, and in both of these cases, the coordinating genotype does not have an allele that matches the reference.
What is considered a "coordinating genotype"?
What are the reference alleles?
Is the last letter in the pair of 2 the reference?


More precisely it would be easier to phrase as:

The second letter in that column must match the first.

Cheers
# 7  
Old 06-06-2012
I still don't believe I am explaining it correctly:
The index number for the given data must match that of the reference.
Data:
Code:
Index      Chr       Genotype   Mutation Type
1           Chr1           TT            Intronic
2           Chr1           AA            Exonic
3           Chr1           AG            Exonic
4           Chr1           CC            Frameshift
5           Chr1           CA            Intronic


Reference:
Code:
1)T
2)A
3)C
4)G
5)C

For example: the genotype at position 1 is TT and the reference is T. Because at least one of the letters in the genotype matches the reference, the program should move on.
At position 3, the genotype is AG and the reference is C. Neither of the letters that make up the genotype matches the reference, so the program should take note of this and display '3 Chr1 AG Exonic' in the output.

Thanks again

Moderator's Comments:
Mod Comment edit by bakunin: please use [CODE]..[/CODE] tags when posting code or file content. It makes the content easier to read and it stays formatted. Thank you.

Last edited by bakunin; 06-07-2012 at 03:56 PM..
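
The thread ends without a worked answer to the clarified question, so here is one possible sketch of the per-index comparison described in post #7. It is not code from any poster: it assumes the data and reference formats shown above, keeps both inputs in memory as sample strings for simplicity, and uses a hypothetical helper named `find_mismatches`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data copied from this thread; real input would come from two files.
my $data = <<'END';
Index      Chr       Genotype   Mutation Type
1           Chr1           TT            Intronic
2           Chr1           AA            Exonic
3           Chr1           AG            Exonic
4           Chr1           CC            Frameshift
5           Chr1           CA            Intronic
END

my $reference = <<'END';
1)T
2)A
3)C
4)G
5)C
END

# Hypothetical helper: return the data rows whose genotype lacks the reference base.
sub find_mismatches {
    my ($data_text, $ref_text) = @_;

    # Build index => reference base from lines such as "3)C".
    my %ref;
    for my $line (split /\n/, $ref_text) {
        $ref{$1} = uc $2 if $line =~ /^\s*(\d+)\)\s*([ACGT])\s*$/i;
    }

    my @rows;
    for my $line (split /\n/, $data_text) {
        # Split into at most four whitespace-separated columns.
        my ($index, $chr, $geno, $mutation) = split ' ', $line, 4;
        # Skip the header and any line that does not start with a numeric index.
        next unless defined $mutation && $index =~ /^\d+$/;
        next unless exists $ref{$index};
        # index() returns -1 when the reference base is absent from the genotype.
        next if index(uc($geno), $ref{$index}) >= 0;
        push @rows, join('   ', $index, $chr, $geno, $mutation);
    }
    return @rows;
}

print "$_\n" for find_mismatches($data, $reference);
```

For the sample data this reports the rows for indexes 3 and 4, since AG does not contain C and CC does not contain G. For large files, the same loop could read the data file line by line instead of holding it in a string; only the reference hash needs to stay in memory.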