Sponsored Content
Top Forums UNIX for Dummies Questions & Answers Getting non unique lines from concatenated files Post 302507543 by pawannoel on Thursday 24th of March 2011 06:40:44 AM
Old 03-24-2011
@Bartus11
Hi hope you are well.
Can I ask you another question continuing from with same data set for which you have kindly provided other answers
Code:
chr01   levure5 SNP     12745   12745   0.000000        .        .        genotype=S;reference=C;coverage=91;refAlleleCounts=44;refAlleleStarts=28;refAlleleMeanQV=19;novelAlleleCounts=40;novelAlleleStarts=25;novelAlleleMeanQV=18;diColor1=22;diColor2=11;het=1;flag=
chr01   levure6 SNP     12745   12745   0.000000        .       .        genotype=S;reference=C;coverage=62;refAlleleCounts=29;refAlleleStarts=19;refAlleleMeanQV=19;novelAlleleCounts=32;novelAlleleStarts=20;novelAlleleMeanQV=20;diColor1=11;diColor2=22;het=1;flag=
chr01   levure7 SNP     12745   12745   0.000000        .       .        genotype=S;reference=C;coverage=24;refAlleleCounts=9;refAlleleStarts=8;refAlleleMeanQV=23;novelAlleleCounts=13;novelAlleleStarts=12;novelAlleleMeanQV=20;diColor1=11;diColor2=22;het=1;flag=
chr01   levure8 SNP     12745   12745   0.000000        .       .        genotype=S;reference=C;coverage=37;refAlleleCounts=18;refAlleleStarts=17;refAlleleMeanQV=18;novelAlleleCounts=18;novelAlleleStarts=13;novelAlleleMeanQV=20;diColor1=11;diColor2=22;het=1;flag=
chr01   levure5 SNP     16254   16254   0.000000        .       .        genotype=R;reference=G;coverage=111;refAlleleCounts=82;refAlleleStarts=41;refAlleleMeanQV=18;novelAlleleCounts=28;novelAlleleStarts=9;novelAlleleMeanQV=18;diColor1=10;diColor2=32;het=1;flag=
chr01   levure6 SNP     16254   16254   0.000000        .       .        genotype=R;reference=G;coverage=96;refAlleleCounts=72;refAlleleStarts=38;refAlleleMeanQV=17;novelAlleleCounts=24;novelAlleleStarts=6;novelAlleleMeanQV=15;diColor1=10;diColor2=32;het=1;flag=
chr01   levure7 SNP     16254   16254   0.000000        .       .        genotype=R;reference=G;coverage=32;refAlleleCounts=20;refAlleleStarts=18;refAlleleMeanQV=19;novelAlleleCounts=12;novelAlleleStarts=5;novelAlleleMeanQV=17;diColor1=10;diColor2=32;het=1;flag=
chr01   levure8 SNP     16254   16254   0.000000        .       .        genotype=R;reference=G;coverage=45;refAlleleCounts=33;refAlleleStarts=25;refAlleleMeanQV=20;novelAlleleCounts=10;novelAlleleStarts=6;novelAlleleMeanQV=19;diColor1=10;diColor2=32;het=1;flag=
chr01   levure5 SNP     16511   16511   0.000000        .       .        genotype=A;reference=G;coverage=42;refAlleleCounts=0;refAlleleStarts=0;refAlleleMeanQV=0;novelAlleleCounts=35;novelAlleleStarts=16;novelAlleleMeanQV=19;diColor1=12;diColor2=12;het=0;flag=h4,h10,h9,
chr01   levure6 SNP     16511   16511   0.000000        .       .        genotype=A;reference=G;coverage=32;refAlleleCounts=0;refAlleleStarts=0;refAlleleMeanQV=0;novelAlleleCounts=23;novelAlleleStarts=11;novelAlleleMeanQV=17;diColor1=12;diColor2=12;het=0;flag=h4,h10,h9,

Last time you helped me sort the data accoring to any ;delimited parameter of choice in the last field of each line. This time I want to know details about the genotype parameter in this feild.
So taking the above example what I want is to know is the count of each type of genotype. So my expected output for the above would be:
Code:
S=4
R=4
A=2

Genotype is always denoted by a capital A-Z letter, so I reckon the regex can be restricted to that pattern.

Would be nice if you can help out on this.

Have a nice day.

Cheers Smilie
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Lines Concatenated with awk

Hello, I have a bash shell script and I use awk to print certain columns of one file and direct the output to another file. If I do a less or cat on the file it looks correct, but if I email the file and open it with Outlook the lines outputted by awk are concatenated. Here is my awk line:... (6 Replies)
Discussion started by: xadamz23
6 Replies

2. Shell Programming and Scripting

Comparing 2 files and return the unique lines in first file

Hi, I have 2 files file1 ******** 01-05-09|java.xls| 02-05-08|c.txt| 08-01-09|perl.txt| 01-01-09|oracle.txt| ******** file2 ******** 01-02-09|windows.xls| 02-05-08|c.txt| 01-05-09|java.xls| 08-02-09|perl.txt| 01-01-09|oracle.txt| ******** (8 Replies)
Discussion started by: shekhar_v4
8 Replies

3. UNIX for Advanced & Expert Users

In a huge file, Delete duplicate lines leaving unique lines

Hi All, I have a very huge file (4GB) which has duplicate lines. I want to delete duplicate lines leaving unique lines. Sort, uniq, awk '!x++' are not working as its running out of buffer space. I dont know if this works : I want to read each line of the File in a For Loop, and want to... (16 Replies)
Discussion started by: krishnix
16 Replies

4. Shell Programming and Scripting

Compare multiple files and print unique lines

Hi friends, I have multiple files. For now, let's say I have two of the following style cat 1.txt cat 2.txt output.txt Please note that my files are not sorted and in the output file I need another extra column that says the file from which it is coming. I have more than 100... (19 Replies)
Discussion started by: jacobs.smith
19 Replies

5. UNIX for Dummies Questions & Answers

getting unique lines from 2 files

hi i have used comm -13 <(sort 1.txt) <(sort 2.txt) option to get the unique lines that are present in file 2 but not in file 1. but some how i am getting the entire file 2. i would expect few but not all uncommon lines fro my dat. is there anything wrong with the way i used the command? my... (1 Reply)
Discussion started by: anurupa777
1 Replies

6. Shell Programming and Scripting

compare 2 files and return unique lines in each file (based on condition)

hi my problem is little complicated one. i have 2 files which appear like this file 1 abbsss:aa:22:34:as akl abc 1234 mkilll:as:ss:23:qs asc abc 0987 mlopii:cd:wq:24:as asd abc 7866 file2 lkoaa:as:24:32:sa alk abc 3245 lkmo:as:34:43:qs qsa abc 0987 kloia:ds:45:56:sa acq abc 7805 i... (5 Replies)
Discussion started by: anurupa777
5 Replies

7. Shell Programming and Scripting

Print only lines where fields concatenated match strings

Hello everyone, Maybe somebody could help me with an awk script. I have this input (field separator is comma ","): 547894982,M|N|J,U|Q|P,98,101,0,1,1 234900027,M|N|J,U|Q|P,98,101,0,1,1 234900023,M|N|J,U|Q|P,98,54,3,1,1 234900028,M|H|J,S|Q|P,98,101,0,1,1 234900030,M|N|J,U|F|P,98,101,0,1,1... (2 Replies)
Discussion started by: Ophiuchus
2 Replies

8. Shell Programming and Scripting

Look up 2 files and print the concatenated output

file 1 Sun Mar 17 00:01:33 2013 submit , Name="1234" Sun Mar 17 00:01:33 2013 submit , Name="1344" Sun Mar 17 00:01:33 2013 submit , Name="1124" .. .. .. .. Sun Mar 17 00:01:33 2013 submit , Name="8901" file 2 Sun Mar 17 00:02:47 2013 1234 execute SUCCEEDED Sun Mar 17... (24 Replies)
Discussion started by: aravindj80
24 Replies

9. UNIX for Dummies Questions & Answers

Print unique lines without sort or unique

I would like to print unique lines without sort or unique. Unfortunately the server I am working on does not have sort or unique. I have not been able to contact the administrator of the server to ask him to add it for several weeks. (7 Replies)
Discussion started by: cokedude
7 Replies

10. UNIX for Beginners Questions & Answers

Print number of lines for files in directory, also print number of unique lines

I have a directory of files, I can show the number of lines in each file and order them from lowest to highest with: wc -l *|sort 15263 Image.txt 16401 reference.txt 40459 richtexteditor.txt How can I also print the number of unique lines in each file? 15263 1401 Image.txt 16401... (15 Replies)
Discussion started by: spacegoose
15 Replies
Bio::Variation::SNP(3pm)				User Contributed Perl Documentation				  Bio::Variation::SNP(3pm)

NAME
Bio::Variation::SNP - submitted SNP SYNOPSIS
$SNP = Bio::Variation::SNP->new (); DESCRIPTION
Inherits from Bio::Variation::SeqDiff and Bio::Variation::Allele, with additional methods that are (db)SNP specific (ie, refSNP/subSNP IDs, batch IDs, validation methods). FEEDBACK
Mailing Lists User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated. bioperl-l@bioperl.org - General discussion http://bioperl.org/wiki/Mailing_lists - About the mailing lists Support Please direct usage questions or support issues to the mailing list: bioperl-l@bioperl.org rather than to the module maintainer directly. Many experienced and reponsive experts will be able look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible. Reporting Bugs Report bugs to the Bioperl bug tracking system to help us keep track the bugs and their resolution. Bug reports can be submitted via the web: https://redmine.open-bio.org/projects/bioperl/ AUTHOR
Allen Day <allenday@ucla.edu> APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _ get/set-able methods Usage : $is = $snp->method() Function: for getting/setting attributes Returns : a value. probably a scalar. Args : if you're trying to set an attribute, pass in the new value. Methods: -------- id type observed seq_5 seq_3 ncbi_build ncbi_chr_hits ncbi_ctg_hits ncbi_seq_loc ucsc_build ucsc_chr_hits ucsc_ctg_hits heterozygous heterozygous_SE validated genotype handle batch_id method locus_id symbol mrna protein functional_class is_subsnp Title : is_subsnp Usage : $is = $snp->is_subsnp() Function: returns 1 if $snp is a subSNP Returns : 1 or undef Args : NONE subsnp Title : subsnp Usage : $subsnp = $snp->subsnp() Function: returns the currently active subSNP of $snp Returns : Bio::Variation::SNP Args : NONE add_subsnp Title : add_subsnp Usage : $subsnp = $snp->add_subsnp() Function: pushes the previous value returned by subsnp() onto a stack, accessible with each_subsnp(). Sets return value of subsnp() to a new Bio::Variation::SNP object, and returns that object. Returns : Bio::Varitiation::SNP Args : NONE each_subsnp Title : each_subsnp Usage : @subsnps = $snp->each_subsnp() Function: returns a list of the subSNPs of a refSNP Returns : list Args : NONE perl v5.14.2 2012-03-02 Bio::Variation::SNP(3pm)
All times are GMT -4. The time now is 11:09 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy