I am a beginner in Perl and I am trying to write a Perl script. Basically, I want to separate the gene entries from the phenotype entries in a text file that contains a huge number of records, and copy them into a separate file. Gene entries have a * symbol at the start of the line after *FIELD* TI. A sample record is given below.
Code:
*RECORD*
*FIELD* NO
100050
*FIELD* TI
100050 AARSKOG SYNDROME, AUTOSOMAL DOMINANT
*FIELD* TX
It would be really great if someone could help me with this.
Last edited by Scrutinizer; 05-03-2012 at 02:09 AM.
Reason: code tags
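A minimal Perl sketch (not from the thread) of one way to apply the rule described above: split a record into its `*FIELD*` sections, then treat it as a gene entry when the `TI` text starts with `*`. It assumes every field begins with a line of the form `*FIELD* XX`; the helper names `parse_fields` and `is_gene` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one record's text into a hash keyed by field code (NO, TI, TX, ...).
# Assumption: each field starts with a line like "*FIELD* TI".
sub parse_fields {
    my ($text) = @_;
    my (%field, $current);
    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^\*FIELD\*\s+(\S+)/) {
            $current = $1;              # start collecting a new field
            $field{$current} = '';
        } elsif (defined $current) {
            $field{$current} .= "$line\n";   # lines before the first field (e.g. *RECORD*) are skipped
        }
    }
    return \%field;
}

# Hypothetical rule from the question: a TI text starting with "*" marks a gene entry.
sub is_gene {
    my ($field) = @_;
    return 0 unless defined $field->{TI};
    return $field->{TI} =~ /^\*/ ? 1 : 0;
}
```

On the sample record from the question, `is_gene` returns 0, because the title `100050 AARSKOG SYNDROME, AUTOSOMAL DOMINANT` has no leading `*`. Keeping every field in a hash also makes it easy to print either the whole record or just the title later.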
Hi.
Welcome to the forum.
Code:
Advice for forum posts, general:
To obtain the best answers quickly for processing datasets --
extracting, transforming, filtering, you should, after having
searched for answers (man pages, Google, etc.):
1. Post representative samples of your data (i.e. data that
should "succeed" and data that should "fail")
2. Post what you expect the results to be, in addition to
describing them. Be clear about how the results are to be
obtained, e.g. "add field 2 from file1 to field 3 from file2",
"delete all lines that contain 'possum'", etc.
3. Post what you have attempted to do so far. Post scripts,
programs, etc. within CODE tags. If you have a specific
question about an error, please post the shortest example of the
code, script, etc. that exhibits the problem.
4. Place the data and expected output within CODE tags, so that
they are more easily readable.
Special cases, exceptions, etc., are very important to include
in the samples.
Also, is there some reason that you need to use perl? ... cheers, drl
The following prints the next line if the pattern is matched; however, your spec is more than a little bit vague...
Code:
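#!/usr/bin/perl
use strict;
use warnings;
# Print the line that follows each "*FIELD* TI" header (the entry's title).
while (<DATA>) {
    if (/^\*FIELD\*\s+TI\s*$/) {
        my $gene_record = <DATA>;    # the title is on the next line
        print $gene_record if defined $gene_record;
    }
}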
__DATA__
*RECORD*
*FIELD* NO
100050
*FIELD* TI
100050 AARSKOG SYNDROME, AUTOSOMAL DOMINANT
*FIELD* TX
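A variant of the script above that keeps only titles beginning with `*` could look like the following sketch (not Skrynesaver's code). It reads from a file named on the command line instead of `__DATA__`; the sub name `gene_titles` and the file name in the usage comment are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the title lines that follow "*FIELD* TI" and start with "*".
# Assumption (from the thread): a leading "*" marks a gene entry.
sub gene_titles {
    my ($fh) = @_;
    my @titles;
    while (my $line = <$fh>) {
        next unless $line =~ /^\*FIELD\*\s+TI\s*$/;
        my $title = <$fh>;             # the title is on the next line
        last unless defined $title;    # guard against a truncated file
        $title =~ s/\s+\z//;           # strip trailing newline / CR / spaces
        push @titles, $title if $title =~ /^\*/;
    }
    return @titles;
}

# Usage: perl gene_titles.pl omim.txt   (file name is hypothetical)
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for gene_titles($in);
    close $in;
}
```

Because each kept line is the whole title line, the record number and the description stay together on one line.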
Hi corona688 and Skrynesaver,
Thank you so much for the replies, and especially for the program Skrynesaver gave; it was very helpful. However, with that program, even entries without a "*" after the line *FIELD* TI are printed. I want only the entries with a "*". Let me be more specific about the input I have and the output I expect.
My text file has 21,000 records, each with an explanation. Every entry starts with the line *RECORD*. I have attached part of the file so you can see exactly what I mean.
WHAT I REQUIRE FROM THE PROGRAM
1. Separate the gene entries from the phenotype entries: copy all the gene records into one new file and the phenotype records into another.
Gene entries start with a * (asterisk) symbol on the line after *FIELD* TI. Phenotype entries may have no symbol, or may start with # or %.
2. I want the output records copied into separate files: gene entries in one file and phenotype entries in another.
For example, the output file with just the entries should look like:
3. I actually need two files for the gene entries. One file will have the separated gene records, with the record number and description on the same line, as in the example above.
The second file should have all the information for each gene record, as in the example below.
The file with the full records should look like this:
Code:
*RECORD*
*FIELD* NO
100050
*FIELD* TI
100050 AARSKOG SYNDROME, AUTOSOMAL DOMINANT
*FIELD* TX
Grier et al. (1983) reported father and 2 sons with typical Aarskog
syndrome, including short stature, hypertelorism, and shawl scrotum.
They tabulated the findings in 82 previous cases. X-linked recessive
inheritance has repeatedly been suggested (see 305400). The family
reported by Welch (1974) had affected males in 3 consecutive
generations. Thus, there is either genetic heterogeneity or this is an
autosomal dominant with strong sex-influence and possibly ascertainment
bias resulting from use of the shawl scrotum as a main criterion.
Stretchable skin was present in the cases of Grier et al. (1983).
Teebi et al. (1993) reported the case of an affected mother and 4 sons
(including a pair of monozygotic twins) by 2 different husbands. They
suggested that the manifestations were as severe in the mother as in the
sons and that this suggested autosomal dominant inheritance. Actually,
the mother seemed less severely affected, compatible with X-linked
inheritance.
*FIELD* RF
1. Grier, R. E.; Farrington, F. H.; Kendig, R.; Mamunes, P.: Autosomal
dominant inheritance of the Aarskog syndrome. Am. J. Med. Genet. 15:
39-46, 1983.
2. Teebi, A. S.; Rucquoi, J. K.; Meyn, M. S.: Aarskog syndrome: report
of a family with review and discussion of nosology. Am. J. Med. Genet. 46:
501-509, 1993.
3. Welch, J. P.: Elucidation of a 'new' pleiotropic connective tissue
disorder. Birth Defects Orig. Art. Ser. X(10): 138-146, 1974.
*FIELD* CS
Growth:
Mild to moderate short stature
Head:
Normocephaly
Hair:
Widow's peak
Facies:
Maxillary hypoplasia;
Broad nasal bridge;
Anteverted nostrils;
Long philtrum;
Broad upper lip;
Curved linear dimple below the lower lip
Eyes:
Hypertelorism;
Ptosis;
Down-slanted palpebral fissures;
Ophthalmoplegia;
Strabismus;
Hyperopic astigmatism;
Large cornea
Ears:
Floppy ears;
Lop-ears
Mouth:
Cleft lip/palate
GU:
Shawl scrotum;
Saddle-bag scrotum;
Cryptorchidism
Limbs:
Brachydactyly;
Digital contractures;
Clinodactyly;
Mild syndactyly;
Transverse palmar crease;
Lymphedema of the feet
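One possible way to produce the three outputs described above (full gene records, a one-line gene list, and phenotype records) is to read one record at a time by setting Perl's input record separator `$/` to `*RECORD*`. This is a sketch under stated assumptions: every record starts with `*RECORD*`, the title is the first line after `*FIELD* TI`, and only a leading `*` marks a gene entry, so titles with `#`, `%` or no symbol go to the phenotype file. The sub name `split_records` and the output file names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write gene records (full text + one-line title list) and phenotype records
# to three separate filehandles.
sub split_records {
    my ($in, $gene_full, $gene_list, $pheno) = @_;
    local $/ = "*RECORD*";                 # each read returns text up to the next *RECORD*
    while (my $chunk = <$in>) {
        chomp $chunk;                      # chomp removes the trailing $/ ("*RECORD*")
        next unless $chunk =~ /\S/;        # skip the empty piece before the first record
        my $record = "*RECORD*$chunk";     # put this record's own header back in front
        my ($title) = $record =~ /^\*FIELD\*[ \t]+TI[ \t\r]*\n([^\r\n]*)/m;
        next unless defined $title;        # records without a TI field are skipped
        if ($title =~ /^\*/) {
            print {$gene_full} $record;
            print {$gene_list} "$title\n";
        } else {
            print {$pheno} $record;
        }
    }
}

# Usage: perl split_records.pl omim.txt   (all file names here are hypothetical)
if (@ARGV) {
    open my $in,    '<', $ARGV[0]          or die "Cannot open $ARGV[0]: $!";
    open my $full,  '>', 'genes_full.txt'  or die "genes_full.txt: $!";
    open my $list,  '>', 'genes_list.txt'  or die "genes_list.txt: $!";
    open my $pheno, '>', 'phenotypes.txt'  or die "phenotypes.txt: $!";
    split_records($in, $full, $list, $pheno);
    close $_ for ($in, $full, $list, $pheno);
}
```

Reading whole records this way avoids tracking state line by line: once the title decides where a record belongs, the entire record is written in one go, so the full-record file keeps the original layout.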
I posted a few days ago, thanks for the responses. My original question was for renaming files of sort 3p2325294.dgn in a directory containing multiple files. I need to drop the first 2 characters and the last in a unix script using Perl. How does it differ from using the Unix... (1 Reply)