perl: comparison of field line by line in two files
Post 302636231 by Thelost on Monday 7th of May 2012 07:25:10 AM

Hi everybody,
First, I apologize if my question seems too silly, but I have spent 4 days struggling with this. I have looked at books and forums, and I also asked a friend who is a software developer for help; he told me that doing it in Perl is a bad idea. Still, this is my problem.
I moved to another lab for a couple of months, where they use Perl as a tool to analyse DNA data. (At my own lab I only ever use existing software, command lines to modify files so that they can be used correctly, and some tools that people in my lab wrote previously.) In the weeks I have been working here I have seen the power of writing your own scripts to solve problems.
The problem is that I must compare two files and select the lines of one of them whose fields meet a few requirements, which are comparisons with the fields of the other file.

My files are as follows (these are only a few lines, of course):
File 1
Code:
Start	End	Origin	HomeCluster	BAPSIndex	Strain
1	58292	5	5	1	TW20.dna
87840	87883	5	5	1	TW20.dna
247298	253176	5	5	1	TW20.dna
395979	400031	5	5	1	TW20.dna
404314	404824	5	5	1	TW20.dna

File 2
Code:
Coordinate	type	RefAllele	Strain	SNPAllele
358909	Int	<T>	5083_6_1	>A<
2074234	syn	<G>	5083_6_1	>A<
31160	non	<G>	5083_6_12	>A<

I must find the lines of file 2 whose Coordinate falls within the range defined by Start and End in file 1, and whose strain also matches. That is, I must compare each line of file 2 with each line of file 1.
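The condition described above can be written as a single test in which the range check and the strain check must both hold. The following is only a minimal sketch: the helper name `snp_in_region` and the hash keys are hypothetical, and the SNP line is made up so that it falls inside one of the sample regions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true only when the SNP coordinate lies inside the
# region AND both records name the same strain.
sub snp_in_region {
    my ($snp, $region) = @_;
    return $region->{start} <= $snp->{coord}
        && $snp->{coord}    <= $region->{end}
        && $region->{strain} eq $snp->{strain};
}

# One line of file 1, split on tabs into named fields.
my %region;
@region{qw(start end origin home baps strain)} =
    split /\t/, "247298\t253176\t5\t5\t1\tTW20.dna";

# One line in the layout of file 2 (made-up values, for illustration only).
my %snp;
@snp{qw(coord type ref strain allele)} =
    split /\t/, "250000\tsyn\t<G>\tTW20.dna\t>A<";

print snp_in_region(\%snp, \%region) ? "match\n" : "no match\n";
```

Using `&&` rather than two separate branches means a line is selected only when both requirements are met.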
I have started the script many times and the variables are defined, but I cannot get results. I have tried arrays and hashes, without success.
I include the script (the part that works) and the conditions that must be met.

Code:
#!/usr/bin/perl -w
# insideRecombinantSNP.pl
# Script to analyze the SNPs inside the recombinant regions.
# If a file is not in your working directory, you have to write the complete path.
use warnings;

print "Coordinate\tType\tReference Allele\tStrain\t\tStrain Allele\n";

open IN, "resultsnplinev2.out" or die;     # file 1 and file 2: the files being compared
open INN, "turkish_segments_tabularv2.txt" or die;

while(<IN>){
    # SNP line: keep the whole record, its coordinate and its strain
    if(m/^line\s+(\d+\s+\S+\s+\S+\s+\S+\s+\S+)/){
        $lineSNP=$1;
        $lineSNP =~m/^(\d+)\s+\S+\s+\S+\s+\S+\s+\S+/;
        $SNPcoor=$1;
        $lineSNP =~m/^\d+\s+\S+\s+\S+\s+(\S+)\s+\S+/;
        $SNPstrain=$1;
    }
    while(<INN>){
        # Region line: keep the strain and the left and right coordinates
        if(m/^(\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+.*)/){
            $recline=$1;
            $recline =~m/^\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(.*)/;
            $recstrain=$1;
            $recline =~m/^(\d+)\s+\d+\s+\S+\s+\S+\s+\S+\s+.*/;
            $leftcoor=$1;
            $recline =~m/^\d+\s+(\d+)\s+\S+\s+\S+\s+\S+\s+.*/;
            $rightcoor=$1;
        }
    }
    # Conditions that must be met
    if (($leftcoor<=$SNPcoor) && ($SNPcoor<=$rightcoor)){
        print "$lineSNP\n";
    }elsif ($recstrain eq $SNPstrain){
        print "$lineSNP\n";
    }
}
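Two things in the loop above stand out. The inner `while(<INN>)` reads the region file to its end during the first pass of the outer loop, so every later SNP line has nothing left to compare against. The range and strain tests also run outside the inner loop, so they only see the last region captured, and `elsif` prints a line when either test holds rather than both. A minimal sketch that avoids this is to load the regions into memory once and then test each SNP line against all of them. The names `load_regions` and `select_snps` are hypothetical, the optional `line` prefix is an assumption taken from the regex in the script, and the demo uses in-memory copies of the sample data with one strain changed so that a match exists.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse file-1 lines (Start End Origin HomeCluster BAPSIndex Strain) into
# records, skipping the header and anything that lacks numeric coordinates.
sub load_regions {
    my ($fh) = @_;
    my @regions;
    while (my $line = <$fh>) {
        chomp $line;
        my ($start, $end, undef, undef, undef, $strain) = split ' ', $line;
        next unless defined $strain && $start =~ /^\d+$/ && $end =~ /^\d+$/;
        push @regions, { start => $start, end => $end, strain => $strain };
    }
    return \@regions;
}

# Return the file-2 lines whose coordinate falls inside a region of the same strain.
sub select_snps {
    my ($fh, $regions) = @_;
    my @selected;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^line\s+//;    # assumption: records may be prefixed with "line"
        my ($coord, undef, undef, $strain) = split ' ', $line;
        next unless defined $strain && $coord =~ /^\d+$/;
        for my $r (@$regions) {
            if ($r->{start} <= $coord && $coord <= $r->{end} && $r->{strain} eq $strain) {
                push @selected, $line;
                last;    # one matching region is enough; avoid duplicate output
            }
        }
    }
    return @selected;
}

# Demo on in-memory copies of the sample data. With real files, open them by
# path instead, e.g. open(my $fh, '<', 'turkish_segments_tabularv2.txt').
my $file1 = join "\n",
    "Start\tEnd\tOrigin\tHomeCluster\tBAPSIndex\tStrain",
    "1\t58292\t5\t5\t1\tTW20.dna",
    "247298\t253176\t5\t5\t1\tTW20.dna",
    "";
my $file2 = join "\n",
    "Coordinate\ttype\tRefAllele\tStrain\tSNPAllele",
    "358909\tInt\t<T>\t5083_6_1\t>A<",
    "31160\tnon\t<G>\tTW20.dna\t>A<",    # strain changed to TW20.dna for illustration
    "";

open my $reg_fh, '<', \$file1 or die "cannot open in-memory file 1: $!";
open my $snp_fh, '<', \$file2 or die "cannot open in-memory file 2: $!";

my $regions = load_regions($reg_fh);
print "$_\n" for select_snps($snp_fh, $regions);
```

Holding the regions in an array keeps the comparison a plain nested loop over every pair of lines, which should be fine for files of moderate size; for very large inputs, grouping the regions by strain in a hash first would reduce the work per SNP.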


Any idea, any hint or suggestion ...


Moderator's Comments:
Mod Comment How to use code tags

Last edited by Franklin52; 05-07-2012 at 08:27 AM. Reason: Please use code tags
 
