Retrieve data from one file comparing the ID in the second file


 
# 1  
Old 09-18-2012

Hi all,

I have one file with IDs
Code:
Q8NDM7
P0C1S8
Q8TF30
Q9BRP8
O00258
Q6AWC2
Q9ULE0
Q702N8
A4UGR9
Q13426
Q6P2D8
Q9ULM3
A8MXQ7

I want to compare the ID file with a second file that holds complete information about these IDs, as well as about other IDs that are not in the ID file. As a result, I want only the information for the entries listed in the ID file. The second file contains records such as:
Code:
ID   3BP5L_HUMAN             Reviewed;         393 AA.
AC   Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3;
DT   05-FEB-2008, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2004, sequence version 1.
DT   05-SEP-2012, entry version 71.
DE   RecName: Full=SH3 domain-binding protein 5-like;
DE            Short=SH3BP-5-like;
GN   Name=SH3BP5L; Synonyms=KIAA1720; ORFNames=UNQ2766/PRO7133;
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Brain;
RX   MEDLINE=21082932; PubMed=11214970; DOI=10.1093/dnares/7.6.347;
RA   Nagase T., Kikuno R., Hattori A., Kondo Y., Okumura K., Ohara O.;
RT   "Prediction of the coding sequences of unidentified human genes. XIX.
RT   The complete sequences of 100 new cDNA clones from brain which code
RT   for large proteins in vitro.";
RL   DNA Res. 7:347-355(2000).
RN   [2] //

# 2  
Old 09-18-2012
We need more records from the second file to see how they are separated. Also please post desired output for that sample data.
# 3  
Old 09-18-2012
As @bartus11 says, your two files aren't connected by any means. Usually (depending on the grep version you have installed) grep -f file1 file2 would do the job of finding all lines in file2 that contain an ID from file1.
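
To illustrate what that grep call returns, here is a minimal sketch. The file names ids.txt and records.dat and the few demo lines are placeholders made up for the example, not the poster's real files. -F treats each ID as a fixed string and -w (supported by GNU and BSD grep) restricts matches to whole words.

```shell
# Demo data: file names and contents are placeholders for illustration.
printf '%s\n' Q7L8J4 Q13426 > ids.txt
printf '%s\n' \
  'ID   3BP5L_HUMAN             Reviewed;         393 AA.' \
  'AC   Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3;' \
  'DR   IntAct; Q7L8J4; 2.' \
  'ID   A16L1_HUMAN             Reviewed;         607 AA.' \
  'AC   Q676U5; A3EXK9; A3EXL0; B6ZDH0;' > records.dat

# -F: fixed strings, -w: whole words only, -f: read the patterns from ids.txt
grep -wFf ids.txt records.dat

# Restrict the search to AC lines so cross-references (DR) are ignored.
grep '^AC' records.dat | grep -wFf ids.txt
```

Note that grep prints only the matching lines (here the AC line and, in the first call, the DR line as well), not the whole record, so on its own it does not produce the complete entries.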
# 4  
Old 09-18-2012
So you want to extract everything from an ID entry to the next ID entry if the AC entry in the record contains an "ID" which is present in your file?
Code:
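#!/usr/bin/perl

use strict;
use warnings;

# Load the wanted IDs, one per line.
open(my $id_file, "<", "id_file") or die "Cannot open id_file: $!"; # list of ids
my @ids = <$id_file>;
close $id_file;
chomp(@ids);
my %id_check = map { $_ => 1 } @ids;

open(my $records, "<", "tmp.dat") or die "Cannot open tmp.dat: $!"; # records of the form above
my $head;
my $in_record = 0;
while (<$records>) {
    if (/^ID/) {
        # New record: hold its ID line back until the AC lines are checked.
        $head      = $_;
        $in_record = 0;
        next;
    }
    if (/^AC/ && !$in_record) {
        # A record can have several AC lines; a match on any of them counts.
        my @entries = /\s+([^;]+);/g;
        if (grep { $id_check{$_} } @entries) {
            $in_record = 1;
            print $head;
        }
        else {
            $head .= $_;    # keep this AC line in case a later one matches
        }
    }
    print if $in_record;
}
close $records;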

# 5  
Old 09-18-2012
Hi all,

Thanks for the replies.

Here is a sample of two records; each record ends with a line containing only "//":


Code:
ID   3BP5L_HUMAN             Reviewed;         393 AA.
AC   Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3;
DT   05-FEB-2008, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2004, sequence version 1.
DT   05-SEP-2012, entry version 71.
DE   RecName: Full=SH3 domain-binding protein 5-like;
DE            Short=SH3BP-5-like;
GN   Name=SH3BP5L; Synonyms=KIAA1720; ORFNames=UNQ2766/PRO7133;
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Brain;
RX   MEDLINE=21082932; PubMed=11214970; DOI=10.1093/dnares/7.6.347;
RA   Nagase T., Kikuno R., Hattori A., Kondo Y., Okumura K., Ohara O.;
RT   "Prediction of the coding sequences of unidentified human genes. XIX.
RT   The complete sequences of 100 new cDNA clones from brain which code
RT   for large proteins in vitro.";
RL   DNA Res. 7:347-355(2000).
RN   [2]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Amygdala;
RX   MEDLINE=21154917; PubMed=11230166; DOI=10.1101/gr.GR1547R;
RA   Wiemann S., Weil B., Wellenreuther R., Gassenhuber J., Glassl S.,
RA   Ansorge W., Boecher M., Bloecker H., Bauersachs S., Blum H.,
RA   Lauber J., Duesterhoeft A., Beyer A., Koehrer K., Strack N.,
RA   Mewes H.-W., Ottenwaelder B., Obermaier B., Tampe J., Heubner D.,
RA   Wambutt R., Korn B., Klein M., Poustka A.;
RT   "Towards a catalog of human genes and proteins: sequencing and
RT   analysis of 500 novel complete protein coding human cDNAs.";
RL   Genome Res. 11:422-435(2001).
RN   [3]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RX   MEDLINE=22887296; PubMed=12975309; DOI=10.1101/gr.1293003;
RA   Clark H.F., Gurney A.L., Abaya E., Baker K., Baldwin D.T., Brush J.,
RA   Chen J., Chow B., Chui C., Crowley C., Currell B., Deuel B., Dowd P.,
RA   Eaton D., Foster J.S., Grimaldi C., Gu Q., Hass P.E., Heldens S.,
RA   Huang A., Kim H.S., Klimowski L., Jin Y., Johnson S., Lee J.,
RA   Lewis L., Liao D., Mark M.R., Robbie E., Sanchez C., Schoenfeld J.,
RA   Seshagiri S., Simmons L., Singh J., Smith V., Stinson J., Vagts A.,
RA   Vandlen R.L., Watanabe C., Wieand D., Woods K., Xie M.-H.,
RA   Yansura D.G., Yi S., Yu G., Yuan J., Zhang M., Zhang Z., Goddard A.D.,
RA   Wood W.I., Godowski P.J., Gray A.M.;
RT   "The secreted protein discovery initiative (SPDI), a large-scale
RT   effort to identify novel human secreted and transmembrane proteins: a
RT   bioinformatics assessment.";
RL   Genome Res. 13:2265-2270(2003).
RN   [4]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RX   PubMed=14702039; DOI=10.1038/ng1285;
RA   Ota T., Suzuki Y., Nishikawa T., Otsuki T., Sugiyama T., Irie R.,
RA   Wakamatsu A., Hayashi K., Sato H., Nagai K., Kimura K., Makita H.,
RA   Sekine M., Obayashi M., Nishi T., Shibahara T., Tanaka T., Ishii S.,
RA   Yamamoto J., Saito K., Kawai Y., Isono Y., Nakamura Y., Nagahari K.,
RA   Murakami K., Yasuda T., Iwayanagi T., Wagatsuma M., Shiratori A.,
RA   Sudo H., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M.,
RA   Takahashi M., Kanda K., Yokoi T., Furuya T., Kikkawa E., Omura Y.,
RA   Abe K., Kamihara K., Katsuta N., Sato K., Tanikawa M., Yamazaki M.,
RA   Ninomiya K., Ishibashi T., Yamashita H., Murakawa K., Fujimori K.,
RA   Tanai H., Kimata M., Watanabe M., Hiraoka S., Chiba Y., Ishida S.,
RA   Ono Y., Takiguchi S., Watanabe S., Yosida M., Hotuta T., Kusano J.,
RA   Kanehori K., Takahashi-Fujii A., Hara H., Tanase T.-O., Nomura Y.,
RA   Togiya S., Komai F., Hara R., Takeuchi K., Arita M., Imose N.,
RA   Musashino K., Yuuki H., Oshima A., Sasaki N., Aotsuka S.,
RA   Yoshikawa Y., Matsunawa H., Ichihara T., Shiohata N., Sano S.,
RA   Moriya S., Momiyama H., Satoh N., Takami S., Terashima Y., Suzuki O.,
RA   Nakagawa S., Senoh A., Mizoguchi H., Goto Y., Shimizu F., Wakebe H.,
RA   Hishigaki H., Watanabe T., Sugiyama A., Takemoto M., Kawakami B.,
RA   Yamazaki M., Watanabe K., Kumagai A., Itakura S., Fukuzumi Y.,
RA   Fujimori Y., Komiyama M., Tashiro H., Tanigami A., Fujiwara T.,
RA   Ono T., Yamada K., Fujii Y., Ozaki K., Hirao M., Ohmori Y.,
RA   Kawabata A., Hikiji T., Kobatake N., Inagaki H., Ikema Y., Okamoto S.,
RA   Okitani R., Kawakami T., Noguchi S., Itoh T., Shigeta K., Senba T.,
RA   Matsumura K., Nakajima Y., Mizuno T., Morinaga M., Sasaki M.,
RA   Togashi T., Oyama M., Hata H., Watanabe M., Komatsu T.,
RA   Mizushima-Sugano J., Satoh T., Shirai Y., Takahashi Y., Nakagawa K.,
RA   Okumura K., Nagase T., Nomura N., Kikuchi H., Masuho Y., Yamashita R.,
RA   Nakai K., Yada T., Nakamura Y., Ohara O., Isogai T., Sugano S.;
RT   "Complete sequencing and characterization of 21,243 full-length human
RT   cDNAs.";
RL   Nat. Genet. 36:40-45(2004).
RN   [5]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX   PubMed=16710414; DOI=10.1038/nature04727;
RA   Gregory S.G., Barlow K.F., McLay K.E., Kaul R., Swarbreck D.,
RA   Dunham A., Scott C.E., Howe K.L., Woodfine K., Spencer C.C.A.,
RA   Jones M.C., Gillson C., Searle S., Zhou Y., Kokocinski F.,
RA   McDonald L., Evans R., Phillips K., Atkinson A., Cooper R., Jones C.,
RA   Hall R.E., Andrews T.D., Lloyd C., Ainscough R., Almeida J.P.,
RA   Ambrose K.D., Anderson F., Andrew R.W., Ashwell R.I.S., Aubin K.,
RA   Babbage A.K., Bagguley C.L., Bailey J., Beasley H., Bethel G.,
RA   Bird C.P., Bray-Allen S., Brown J.Y., Brown A.J., Buckley D.,
RA   Burton J., Bye J., Carder C., Chapman J.C., Clark S.Y., Clarke G.,
RA   Clee C., Cobley V., Collier R.E., Corby N., Coville G.J., Davies J.,
RA   Deadman R., Dunn M., Earthrowl M., Ellington A.G., Errington H.,
RA   Frankish A., Frankland J., French L., Garner P., Garnett J., Gay L.,
RA   Ghori M.R.J., Gibson R., Gilby L.M., Gillett W., Glithero R.J.,
RA   Grafham D.V., Griffiths C., Griffiths-Jones S., Grocock R.,
RA   Hammond S., Harrison E.S.I., Hart E., Haugen E., Heath P.D.,
RA   Holmes S., Holt K., Howden P.J., Hunt A.R., Hunt S.E., Hunter G.,
RA   Isherwood J., James R., Johnson C., Johnson D., Joy A., Kay M.,
RA   Kershaw J.K., Kibukawa M., Kimberley A.M., King A., Knights A.J.,
RA   Lad H., Laird G., Lawlor S., Leongamornlert D.A., Lloyd D.M.,
RA   Loveland J., Lovell J., Lush M.J., Lyne R., Martin S.,
RA   Mashreghi-Mohammadi M., Matthews L., Matthews N.S.W., McLaren S.,
RA   Milne S., Mistry S., Moore M.J.F., Nickerson T., O'Dell C.N.,
RA   Oliver K., Palmeiri A., Palmer S.A., Parker A., Patel D., Pearce A.V.,
RA   Peck A.I., Pelan S., Phelps K., Phillimore B.J., Plumb R., Rajan J.,
RA   Raymond C., Rouse G., Saenphimmachak C., Sehra H.K., Sheridan E.,
RA   Shownkeen R., Sims S., Skuce C.D., Smith M., Steward C.,
RA   Subramanian S., Sycamore N., Tracey A., Tromans A., Van Helmond Z.,
RA   Wall M., Wallis J.M., White S., Whitehead S.L., Wilkinson J.E.,
RA   Willey D.L., Williams H., Wilming L., Wray P.W., Wu Z., Coulson A.,
RA   Vaudin M., Sulston J.E., Durbin R.M., Hubbard T., Wooster R.,
RA   Dunham I., Carter N.P., McVean G., Ross M.T., Harrow J., Olson M.V.,
RA   Beck S., Rogers J., Bentley D.R.;
RT   "The DNA sequence and biological annotation of human chromosome 1.";
RL   Nature 441:315-321(2006).
RN   [6]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RA   Mural R.J., Istrail S., Sutton G.G., Florea L., Halpern A.L.,
RA   Mobarry C.M., Lippert R., Walenz B., Shatkay H., Dew I., Miller J.R.,
RA   Flanigan M.J., Edwards N.J., Bolanos R., Fasulo D., Halldorsson B.V.,
RA   Hannenhalli S., Turner R., Yooseph S., Lu F., Nusskern D.R.,
RA   Shue B.C., Zheng X.H., Zhong F., Delcher A.L., Huson D.H.,
RA   Kravitz S.A., Mouchard L., Reinert K., Remington K.A., Clark A.G.,
RA   Waterman M.S., Eichler E.E., Adams M.D., Hunkapiller M.W., Myers E.W.,
RA   Venter J.C.;
RL   Submitted (JUL-2005) to the EMBL/GenBank/DDBJ databases.
RN   [7]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Colon, and Lung;
RX   PubMed=15489334; DOI=10.1101/gr.2596504;
RG   The MGC Project Team;
RT   "The status, quality, and expansion of the NIH full-length cDNA
RT   project: the Mammalian Gene Collection (MGC).";
RL   Genome Res. 14:2121-2127(2004).
RN   [8]
RP   PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-343; SER-350 AND
RP   SER-362, AND MASS SPECTROMETRY.
RC   TISSUE=Cervix carcinoma;
RX   PubMed=18669648; DOI=10.1073/pnas.0805139105;
RA   Dephoure N., Zhou C., Villen J., Beausoleil S.A., Bakalarski C.E.,
RA   Elledge S.J., Gygi S.P.;
RT   "A quantitative atlas of mitotic phosphorylation.";
RL   Proc. Natl. Acad. Sci. U.S.A. 105:10762-10767(2008).
RN   [9]
RP   PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-362, AND MASS
RP   SPECTROMETRY.
RC   TISSUE=Leukemic T-cell;
RX   PubMed=19690332; DOI=10.1126/scisignal.2000007;
RA   Mayya V., Lundgren D.H., Hwang S.-I., Rezaul K., Wu L., Eng J.K.,
RA   Rodionov V., Han D.K.;
RT   "Quantitative phosphoproteomic analysis of T cell receptor signaling
RT   reveals system-wide modulation of protein-protein interactions.";
RL   Sci. Signal. 2:RA46-RA46(2009).
CC   -!- SIMILARITY: Belongs to the SH3BP5 family.
CC   -!- SEQUENCE CAUTION:
CC       Sequence=BAB21811.1; Type=Erroneous initiation;
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution-NoDerivs License
CC   -----------------------------------------------------------------------
DR   EMBL; AB051507; BAB21811.1; ALT_INIT; mRNA.
DR   EMBL; AL136569; CAB66504.1; -; mRNA.
DR   EMBL; AY358453; AAQ88818.1; -; mRNA.
DR   EMBL; AK056382; BAB71171.1; -; mRNA.
DR   EMBL; AL732583; CAI18798.1; -; Genomic_DNA.
DR   EMBL; CH471257; EAW57534.1; -; Genomic_DNA.
DR   EMBL; BC010871; AAH10871.1; -; mRNA.
DR   EMBL; BC017254; AAH17254.1; -; mRNA.
DR   IPI; IPI00028359; -.
DR   RefSeq; NP_085148.1; NM_030645.1.
DR   UniGene; Hs.298573; -.
DR   ProteinModelPortal; Q7L8J4; -.
DR   IntAct; Q7L8J4; 2.
DR   MINT; MINT-1688351; -.
DR   PhosphoSite; Q7L8J4; -.
DR   DMDM; 74749902; -.
DR   PRIDE; Q7L8J4; -.
DR   Ensembl; ENST00000366472; ENSP00000355428; ENSG00000175137.
DR   GeneID; 80851; -.
DR   KEGG; hsa:80851; -.
DR   UCSC; uc001iev.1; human.
DR   CTD; 80851; -.
DR   GeneCards; GC01M249104; -.
DR   H-InvDB; HIX0160026; -.
DR   HGNC; HGNC:29360; SH3BP5L.
DR   HPA; HPA038068; -.
DR   neXtProt; NX_Q7L8J4; -.
DR   PharmGKB; PA142670923; -.
DR   eggNOG; NOG263345; -.
DR   GeneTree; ENSGT00390000018500; -.
DR   HOGENOM; HOG000190360; -.
DR   HOVERGEN; HBG057307; -.
DR   InParanoid; Q7L8J4; -.
DR   OMA; GVRGGRH; -.
DR   OrthoDB; EOG4PZJ78; -.
DR   GenomeRNAi; 80851; -.
DR   NextBio; 71284; -.
DR   ArrayExpress; Q7L8J4; -.
DR   Bgee; Q7L8J4; -.
DR   CleanEx; HS_SH3BP5L; -.
DR   Genevestigator; Q7L8J4; -.
DR   InterPro; IPR007940; SH3-bd_5.
DR   PANTHER; PTHR19423; SH3_bd_5; 1.
DR   Pfam; PF05276; SH3BP5; 1.
PE   1: Evidence at protein level;
KW   Coiled coil; Complete proteome; Phosphoprotein; Reference proteome.
FT   CHAIN         1    393       SH3 domain-binding protein 5-like.
FT                                /FTId=PRO_0000317508.
FT   COILED       59    140       Potential.
FT   COILED      169    272       Potential.
FT   COMPBIAS     37     40       Poly-Gly.
FT   COMPBIAS     41     44       Poly-Ser.
FT   COMPBIAS     52     55       Poly-Glu.
FT   MOD_RES     343    343       Phosphoserine.
FT   MOD_RES     350    350       Phosphoserine.
FT   MOD_RES     362    362       Phosphoserine.
FT   MOD_RES     378    378       Phosphoserine (By similarity).
SQ   SEQUENCE   393 AA;  43499 MW;  3693431765F90FDC CRC64;
     MAELRQVPGG RETPQGELRP EVVEDEVPRS PVAEEPGGGG SSSSEAKLSP REEEELDPRI
     QEELEHLNQA SEEINQVELQ LDEARTTYRR ILQESARKLN TQGSHLGSCI EKARPYYEAR
     RLAKEAQQET QKAALRYERA VSMHNAAREM VFVAEQGVMA DKNRLDPTWQ EMLNHATCKV
     NEAEEERLRG EREHQRVTRL CQQAEARVQA LQKTLRRAIG KSRPYFELKA QFSQILEEHK
     AKVTELEQQV AQAKTRYSVA LRNLEQISEQ IHARRRGGLP PHPLGPRRSS PVGAEAGPED
     MEDGDSGIEG AEGAGLEEGS SLGPGPAPDT DTLSLLSLRT VASDLQKCDS VEHLRGLSDH
     VSLDGQELGT RSGGRRGSDG GARGGRHQRS VSL
//
ID   A16L1_HUMAN             Reviewed;         607 AA.
AC   Q676U5; A3EXK9; A3EXL0; B6ZDH0; Q6IPN1; Q6UXW4; Q6ZVZ5; Q8NCY2;
AC   Q96JV5; Q9H619;
DT   12-APR-2005, integrated into UniProtKB/Swiss-Prot.
DT   12-APR-2005, sequence version 2.
DT   05-SEP-2012, entry version 92.
DE   RecName: Full=Autophagy-related protein 16-1;
DE   AltName: Full=APG16-like 1;
GN   Name=ATG16L1; Synonyms=APG16L; ORFNames=UNQ9393/PRO34307;
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), AND VARIANT ALA-300.
RC   TISSUE=Fetal brain;
RX   PubMed=15620219; DOI=10.1080/10425170400004104;
RA   Zheng H., Ji C., Li J., Jiang H., Ren M., Lu Q., Gu S., Mao Y.,
RA   Xie Y.;
RT   "Cloning and analysis of human Apg16L.";
RL   DNA Seq. 15:303-305(2004).
RN   [2]
RP   NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS 2 AND 5), AND ASSOCIATION OF
RP   VARIANT ALA-300 WITH SUSCEPTIBILITY TO IBD10.
RX   PubMed=17200669; DOI=10.1038/ng1954;
RA   Hampe J., Franke A., Rosenstiel P., Till A., Teuber M., Huse K.,
RA   Albrecht M., Mayr G., De La Vega F.M., Briggs J., Guenther S.,
RA   Prescott N.J., Onnie C.M., Haesler R., Sipos B., Foelsch U.R.,
RA   Lengauer T., Platzer M., Mathew C.G., Krawczak M., Schreiber S.;
RT   "A genome-wide association scan of nonsynonymous SNPs identifies a
RT   susceptibility variant for Crohn disease in ATG16L1.";
RL   Nat. Genet. 39:207-211(2007).
RN   [3]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 3).
RX   MEDLINE=22887296; PubMed=12975309; DOI=10.1101/gr.1293003;
RA   Clark H.F., Gurney A.L., Abaya E., Baker K., Baldwin D.T., Brush J.,
RA   Chen J., Chow B., Chui C., Crowley C., Currell B., Deuel B., Dowd P.,
RA   Eaton D., Foster J.S., Grimaldi C., Gu Q., Hass P.E., Heldens S.,
RA   Huang A., Kim H.S., Klimowski L., Jin Y., Johnson S., Lee J.,
RA   Lewis L., Liao D., Mark M.R., Robbie E., Sanchez C., Schoenfeld J.,
RA   Seshagiri S., Simmons L., Singh J., Smith V., Stinson J., Vagts A.,
RA   Vandlen R.L., Watanabe C., Wieand D., Woods K., Xie M.-H.,
RA   Yansura D.G., Yi S., Yu G., Yuan J., Zhang M., Zhang Z., Goddard A.D.,
RA   Wood W.I., Godowski P.J., Gray A.M.;
RT   "The secreted protein discovery initiative (SPDI), a large-scale
RT   effort to identify novel human secreted and transmembrane proteins: a
RT   bioinformatics assessment.";
RL   Genome Res. 13:2265-2270(2003).
RN   [4]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 4), AND NUCLEOTIDE
RP   SEQUENCE [LARGE SCALE MRNA] OF 55-607 (ISOFORM 2).
RC   TISSUE=Brain, Placenta, and Small intestine;
RX   PubMed=14702039; DOI=10.1038/ng1285;
RA   Ota T., Suzuki Y., Nishikawa T., Otsuki T., Sugiyama T., Irie R.,
RA   Wakamatsu A., Hayashi K., Sato H., Nagai K., Kimura K., Makita H.,
RA   Sekine M., Obayashi M., Nishi T., Shibahara T., Tanaka T., Ishii S.,
RA   Yamamoto J., Saito K., Kawai Y., Isono Y., Nakamura Y., Nagahari K.,
RA   Murakami K., Yasuda T., Iwayanagi T., Wagatsuma M., Shiratori A.,
RA   Sudo H., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M.,
RA   Takahashi M., Kanda K., Yokoi T., Furuya T., Kikkawa E., Omura Y.,
RA   Abe K., Kamihara K., Katsuta N., Sato K., Tanikawa M., Yamazaki M.,
RA   Ninomiya K., Ishibashi T., Yamashita H., Murakawa K., Fujimori K.,
RA   Tanai H., Kimata M., Watanabe M., Hiraoka S., Chiba Y., Ishida S.,
RA   Ono Y., Takiguchi S., Watanabe S., Yosida M., Hotuta T., Kusano J.,
RA   Kanehori K., Takahashi-Fujii A., Hara H., Tanase T.-O., Nomura Y.,
RA   Togiya S., Komai F., Hara R., Takeuchi K., Arita M., Imose N.,
RA   Musashino K., Yuuki H., Oshima A., Sasaki N., Aotsuka S.,
RA   Yoshikawa Y., Matsunawa H., Ichihara T., Shiohata N., Sano S.,
RA   Moriya S., Momiyama H., Satoh N., Takami S., Terashima Y., Suzuki O.,
RA   Nakagawa S., Senoh A., Mizoguchi H., Goto Y., Shimizu F., Wakebe H.,
RA   Hishigaki H., Watanabe T., Sugiyama A., Takemoto M., Kawakami B.,
RA   Yamazaki M., Watanabe K., Kumagai A., Itakura S., Fukuzumi Y.,
RA   Fujimori Y., Komiyama M., Tashiro H., Tanigami A., Fujiwara T.,
RA   Ono T., Yamada K., Fujii Y., Ozaki K., Hirao M., Ohmori Y.,
RA   Kawabata A., Hikiji T., Kobatake N., Inagaki H., Ikema Y., Okamoto S.,
RA   Okitani R., Kawakami T., Noguchi S., Itoh T., Shigeta K., Senba T.,
RA   Matsumura K., Nakajima Y., Mizuno T., Morinaga M., Sasaki M.,
RA   Togashi T., Oyama M., Hata H., Watanabe M., Komatsu T.,
RA   Mizushima-Sugano J., Satoh T., Shirai Y., Takahashi Y., Nakagawa K.,
RA   Okumura K., Nagase T., Nomura N., Kikuchi H., Masuho Y., Yamashita R.,
RA   Nakai K., Yada T., Nakamura Y., Ohara O., Isogai T., Sugano S.;
RT   "Complete sequencing and characterization of 21,243 full-length human
RT   cDNAs.";
RL   Nat. Genet. 36:40-45(2004).
RN   [5]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX   PubMed=15815621; DOI=10.1038/nature03466;
RA   Hillier L.W., Graves T.A., Fulton R.S., Fulton L.A., Pepin K.H.,
RA   Minx P., Wagner-McPherson C., Layman D., Wylie K., Sekhon M.,
RA   Becker M.C., Fewell G.A., Delehaunty K.D., Miner T.L., Nash W.E.,
RA   Kremitzki C., Oddy L., Du H., Sun H., Bradshaw-Cordum H., Ali J.,
RA   Carter J., Cordes M., Harris A., Isak A., van Brunt A., Nguyen C.,
RA   Du F., Courtney L., Kalicki J., Ozersky P., Abbott S., Armstrong J.,
RA   Belter E.A., Caruso L., Cedroni M., Cotton M., Davidson T., Desai A.,
RA   Elliott G., Erb T., Fronick C., Gaige T., Haakenson W., Haglund K.,
RA   Holmes A., Harkins R., Kim K., Kruchowski S.S., Strong C.M.,
RA   Grewal N., Goyea E., Hou S., Levy A., Martinka S., Mead K.,
RA   McLellan M.D., Meyer R., Randall-Maher J., Tomlinson C.,
RA   Dauphin-Kohlberg S., Kozlowicz-Reilly A., Shah N.,
RA   Swearengen-Shahid S., Snider J., Strong J.T., Thompson J., Yoakum M.,
RA   Leonard S., Pearman C., Trani L., Radionenko M., Waligorski J.E.,
RA   Wang C., Rock S.M., Tin-Wollam A.-M., Maupin R., Latreille P.,
RA   Wendl M.C., Yang S.-P., Pohl C., Wallis J.W., Spieth J., Bieri T.A.,
RA   Berkowicz N., Nelson J.O., Osborne J., Ding L., Meyer R., Sabo A.,
RA   Shotland Y., Sinha P., Wohldmann P.E., Cook L.L., Hickenbotham M.T.,
RA   Eldred J., Williams D., Jones T.A., She X., Ciccarelli F.D.,
RA   Izaurralde E., Taylor J., Schmutz J., Myers R.M., Cox D.R., Huang X.,
RA   McPherson J.D., Mardis E.R., Clifton S.W., Warren W.C.,
RA   Chinwalla A.T., Eddy S.R., Marra M.A., Ovcharenko I., Furey T.S.,
RA   Miller W., Eichler E.E., Bork P., Suyama M., Torrents D.,
RA   Waterston R.H., Wilson R.K.;
RT   "Generation and annotation of the DNA sequences of human chromosomes 2
RT   and 4.";
RL   Nature 434:724-731(2005).
RN   [6]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RA   Mural R.J., Istrail S., Sutton G.G., Florea L., Halpern A.L.,
RA   Mobarry C.M., Lippert R., Walenz B., Shatkay H., Dew I., Miller J.R.,
RA   Flanigan M.J., Edwards N.J., Bolanos R., Fasulo D., Halldorsson B.V.,
RA   Hannenhalli S., Turner R., Yooseph S., Lu F., Nusskern D.R.,
RA   Shue B.C., Zheng X.H., Zhong F., Delcher A.L., Huson D.H.,
RA   Kravitz S.A., Mouchard L., Reinert K., Remington K.A., Clark A.G.,
RA   Waterman M.S., Eichler E.E., Adams M.D., Hunkapiller M.W., Myers E.W.,
RA   Venter J.C.;
RL   Submitted (JUL-2005) to the EMBL/GenBank/DDBJ databases.
RN   [7]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 114-607 (ISOFORM 2).
RC   TISSUE=Mammary gland;
RX   PubMed=15489334; DOI=10.1101/gr.2596504;
RG   The MGC Project Team;
RT   "The status, quality, and expansion of the NIH full-length cDNA
RT   project: the Mammalian Gene Collection (MGC).";
RL   Genome Res. 14:2121-2127(2004).
RN   [8]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 513-607.
RC   TISSUE=Testis;
RX   PubMed=17974005; DOI=10.1186/1471-2164-8-399;
RA   Bechtel S., Rosenfelder H., Duda A., Schmidt C.P., Ernst U.,
RA   Wellenreuther R., Mehrle A., Schuster C., Bahr A., Bloecker H.,
RA   Heubner D., Hoerlein A., Michel G., Wedler H., Koehrer K.,
RA   Ottenwaelder B., Poustka A., Wiemann S., Schupp I.;
RT   "The full-ORF clone resource of the German cDNA consortium.";
RL   BMC Genomics 8:399-399(2007).
RN   [9]
RP   PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-287; SER-290 AND
RP   SER-304, AND MASS SPECTROMETRY.
RC   TISSUE=Cervix carcinoma;
RX   PubMed=17924679; DOI=10.1021/pr070152u;
RA   Yu L.-R., Zhu Z., Chan K.C., Issaq H.J., Dimitrov D.S., Veenstra T.D.;
RT   "Improved titanium dioxide enrichment of phosphopeptides from HeLa
RT   cells and high confident phosphopeptide identification by cross-
RT   validation of MS/MS and MS/MS/MS spectra.";
RL   J. Proteome Res. 6:4150-4162(2007).
RN   [10]
RP   PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-287, AND MASS
RP   SPECTROMETRY.
RC   TISSUE=Cervix carcinoma;
RX   PubMed=18669648; DOI=10.1073/pnas.0805139105;
RA   Dephoure N., Zhou C., Villen J., Beausoleil S.A., Bakalarski C.E.,
RA   Elledge S.J., Gygi S.P.;
RT   "A quantitative atlas of mitotic phosphorylation.";
RL   Proc. Natl. Acad. Sci. U.S.A. 105:10762-10767(2008).
RN   [11]
RP   ASSOCIATION OF VARIANT ALA-300 WITH SUSCEPTIBILITY TO IBD10.
RX   PubMed=17435756; DOI=10.1038/ng2032;
RA   Rioux J.D., Xavier R.J., Taylor K.D., Silverberg M.S., Goyette P.,
RA   Huett A., Green T., Kuballa P., Barmada M.M., Datta L.W.,
RA   Shugart Y.Y., Griffiths A.M., Targan S.R., Ippoliti A.F.,
RA   Bernard E.-J., Mei L., Nicolae D.L., Regueiro M., Schumm L.P.,
RA   Steinhart A.H., Rotter J.I., Duerr R.H., Cho J.H., Daly M.J.,
RA   Brant S.R.;
RT   "Genome-wide association study identifies new susceptibility loci for
RT   Crohn disease and implicates autophagy in disease pathogenesis.";
RL   Nat. Genet. 39:596-604(2007).
CC   -!- FUNCTION: Plays an essential role in autophagy (By similarity).
CC   -!- SUBUNIT: Homooligomer. Interacts with ATG5. Part of either the
CC       minor and major complexes respectively composed of 4 sets of
CC       ATG12-ATG5 and ATG16L1 (400 kDa) or 8 sets of ATG12-ATG5 and
CC       ATG16L1 (800 kDa) (By similarity).
CC   -!- INTERACTION:
CC       Q9GZQ8:MAP1LC3B; NbExp=2; IntAct=EBI-535909, EBI-373144;
CC       Q9BXW4:MAP1LC3C; NbExp=4; IntAct=EBI-535909, EBI-2603996;
CC   -!- SUBCELLULAR LOCATION: Cytoplasm (By similarity). Preautophagosomal
CC       structure membrane; Peripheral membrane protein (By similarity).
CC       Note=Localized to preautophagosomal structure (PAS) where it is
CC       involved in the membrane targeting of ATG5 (By similarity).
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=5;
CC       Name=1; Synonyms=APG16L beta;
CC         IsoId=Q676U5-1; Sequence=Displayed;
CC       Name=2;
CC         IsoId=Q676U5-2; Sequence=VSP_013386;
CC         Note=May be produced at very low levels due to a premature stop
CC         codon in the mRNA, leading to nonsense-mediated mRNA decay;
CC       Name=3;
CC         IsoId=Q676U5-3; Sequence=VSP_013387, VSP_013388;
CC         Note=No experimental confirmation available;
CC       Name=4;
CC         IsoId=Q676U5-4; Sequence=VSP_013389, VSP_013390;
CC         Note=No experimental confirmation available;
CC       Name=5;
CC         IsoId=Q676U5-5; Sequence=VSP_013389, VSP_013386;
CC         Note=No experimental confirmation available;
CC   -!- DISEASE: Genetic variations in ATG16L1 are associated with
CC       susceptibility to inflammatory bowel disease type 10 (IBD10)
CC       [MIM:611081]. IBD is characterized by a chronic relapsing
CC       intestinal inflammation. IBD is subdivided into Crohn disease (CD)
CC       and ulcerative colitis phenotypes. IBD10 individuals show the
CC       phenotype characteristic to CD. It may involve any part of the
CC       gastrointestinal tract, but most frequently the terminal ileum and
CC       colon. CD is commonly classified as autoimmune disease.
CC   -!- SIMILARITY: Belongs to the WD repeat ATG16 family.
CC   -!- SIMILARITY: Contains 7 WD repeats.
CC   -!- SEQUENCE CAUTION:
CC       Sequence=BAB15448.1; Type=Erroneous translation; Note=Wrong choice of CDS;
CC       Sequence=BAB55412.1; Type=Erroneous initiation;
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution-NoDerivs License
CC   -----------------------------------------------------------------------
DR   EMBL; AY398617; AAR32130.1; -; mRNA.
DR   EMBL; EF079889; ABN48554.1; -; mRNA.
DR   EMBL; EF079890; ABN48555.1; -; mRNA.
DR   EMBL; AY358182; AAQ88549.1; -; mRNA.
DR   EMBL; AK026330; BAB15448.1; ALT_SEQ; mRNA.
DR   EMBL; AK027854; BAB55412.1; ALT_INIT; mRNA.
DR   EMBL; AK123876; BAC85713.1; -; mRNA.
DR   EMBL; AC013726; -; NOT_ANNOTATED_CDS; Genomic_DNA.
DR   EMBL; CH471063; EAW71034.1; -; Genomic_DNA.
DR   EMBL; BC071846; AAH71846.1; -; mRNA.
DR   EMBL; AL834526; CAD39182.1; -; mRNA.
DR   IPI; IPI00432751; -.
DR   IPI; IPI00446614; -.
DR   IPI; IPI00470446; -.
DR   IPI; IPI00555905; -.
DR   IPI; IPI00797150; -.
DR   RefSeq; NP_001177195.1; NM_001190266.1.
DR   RefSeq; NP_001177196.1; NM_001190267.1.
DR   RefSeq; NP_060444.3; NM_017974.3.
DR   RefSeq; NP_110430.5; NM_030803.6.
DR   RefSeq; NP_942593.2; NM_198890.2.
DR   UniGene; Hs.529322; -.
DR   ProteinModelPortal; Q676U5; -.
DR   SMR; Q676U5; 310-606.
DR   DIP; DIP-27552N; -.
DR   IntAct; Q676U5; 20.
DR   MINT; MINT-1141152; -.
DR   STRING; Q676U5; -.
DR   PhosphoSite; Q676U5; -.
DR   DMDM; 62510482; -.
DR   PRIDE; Q676U5; -.
DR   DNASU; 55054; -.
DR   Ensembl; ENST00000347464; ENSP00000318259; ENSG00000085978.
DR   Ensembl; ENST00000373525; ENSP00000362625; ENSG00000085978.
DR   Ensembl; ENST00000392017; ENSP00000375872; ENSG00000085978.
DR   Ensembl; ENST00000392020; ENSP00000375875; ENSG00000085978.
DR   GeneID; 55054; -.
DR   KEGG; hsa:55054; -.
DR   UCSC; uc002vty.3; human.
DR   UCSC; uc002vtz.3; human.
DR   UCSC; uc002vua.3; human.
DR   CTD; 55054; -.
DR   GeneCards; GC02P234118; -.
DR   HGNC; HGNC:21498; ATG16L1.
DR   HPA; HPA012577; -.
DR   MIM; 610767; gene.
DR   MIM; 611081; phenotype.
DR   neXtProt; NX_Q676U5; -.
DR   Orphanet; 206; Crohn disease.
DR   PharmGKB; PA134902949; -.
DR   eggNOG; COG2319; -.
DR   GeneTree; ENSGT00670000097918; -.
DR   HOGENOM; HOG000112569; -.
DR   HOVERGEN; HBG050534; -.
DR   OrthoDB; EOG4SXNC8; -.
DR   GenomeRNAi; 55054; -.
DR   NextBio; 58531; -.
DR   ArrayExpress; Q676U5; -.
DR   Bgee; Q676U5; -.
DR   CleanEx; HS_ATG16L1; -.
DR   Genevestigator; Q676U5; -.
DR   GermOnline; ENSG00000085978; Homo sapiens.
DR   GO; GO:0005776; C:autophagic vacuole; ISS:UniProtKB.
DR   GO; GO:0034045; C:pre-autophagosomal structure membrane; IEA:UniProtKB-SubCell.
DR   GO; GO:0000045; P:autophagic vacuole assembly; NAS:UniProtKB.
DR   GO; GO:0051260; P:protein homooligomerization; NAS:UniProtKB.
DR   GO; GO:0015031; P:protein transport; IEA:UniProtKB-KW.
DR   Gene3D; G3DSA:2.130.10.10; WD40/YVTN_repeat-like; 2.
DR   InterPro; IPR013923; Autophagy-rel_prot_16.
DR   InterPro; IPR020472; G-protein_beta_WD-40_rep.
DR   InterPro; IPR015943; WD40/YVTN_repeat-like_dom.
DR   InterPro; IPR001680; WD40_repeat.
DR   InterPro; IPR019775; WD40_repeat_CS.
DR   InterPro; IPR017986; WD40_repeat_dom.
DR   Pfam; PF08614; ATG16; 1.
DR   Pfam; PF00400; WD40; 5.
DR   PRINTS; PR00320; GPROTEINBRPT.
DR   SMART; SM00320; WD40; 7.
DR   SUPFAM; SSF50978; WD40_like; 1.
DR   PROSITE; PS00678; WD_REPEATS_1; 3.
DR   PROSITE; PS50082; WD_REPEATS_2; 6.
DR   PROSITE; PS50294; WD_REPEATS_REGION; 1.
PE   1: Evidence at protein level;
KW   Alternative splicing; Autophagy; Coiled coil; Complete proteome;
KW   Cytoplasm; Membrane; Phosphoprotein; Polymorphism; Protein transport;
KW   Reference proteome; Repeat; Transport; WD repeat.
FT   CHAIN         1    607       Autophagy-related protein 16-1.
FT                                /FTId=PRO_0000050848.
FT   REPEAT      320    359       WD 1.
FT   REPEAT      364    403       WD 2.
FT   REPEAT      406    445       WD 3.
FT   REPEAT      447    484       WD 4.
FT   REPEAT      486    525       WD 5.
FT   REPEAT      532    573       WD 6.
FT   REPEAT      575    607       WD 7.
FT   COILED       78    230       Potential.
FT   MOD_RES     287    287       Phosphoserine.
FT   MOD_RES     289    289       Phosphoserine (By similarity).
FT   MOD_RES     290    290       Phosphoserine.
FT   MOD_RES     304    304       Phosphoserine.
FT   VAR_SEQ      70    213       Missing (in isoform 4 and isoform 5).
FT                                /FTId=VSP_013389.
FT   VAR_SEQ     266    284       Missing (in isoform 2 and isoform 5).
FT                                /FTId=VSP_013386.
FT   VAR_SEQ     334    368       Missing (in isoform 4).
FT                                /FTId=VSP_013390.
FT   VAR_SEQ     443    470       IKTVFAGSSCNDIVCTEQCVMSGHFDKK -> EEIQSLCLC
FT                                ICLDVSVEVCVCTSEPAFM (in isoform 3).
FT                                /FTId=VSP_013387.
FT   VAR_SEQ     471    607       Missing (in isoform 3).
FT                                /FTId=VSP_013388.
FT   VARIANT     300    300       T -> A (associated with susceptibility to
FT                                IBD10; dbSNP:rs2241880).
FT                                /FTId=VAR_021834.
FT   VARIANT     307    307       E -> K (in dbSNP:rs1866878).
FT                                /FTId=VAR_053386.
FT   CONFLICT    151    151       K -> R (in Ref. 6; BAB55412).
FT   CONFLICT    328    328       V -> A (in Ref. 6; BAB55412).
FT   CONFLICT    529    529       P -> T (in Ref. 6; BAB55412).
SQ   SEQUENCE   607 AA;  68265 MW;  5A5816AE2CF03CA0 CRC64;
     MSSGLRAADF PRWKRHISEQ LRRRDRLQRQ AFEEIILQYN KLLEKSDLHS VLAQKLQAEK
     HDVPNRHEIS PGHDGTWNDN QLQEMAQLRI KHQEELTELH KKRGELAQLV IDLNNQMQRK
     DREMQMNEAK IAECLQTISD LETECLDLRT KLCDLERANQ TLKDEYDALQ ITFTALEGKL
     RKTTEENQEL VTRWMAEKAQ EANRLNAENE KDSRRRQARL QKELAEAAKE PLPVEQDDDI
     EVIVDETSDH TEETSPVRAI SRAATKRLSQ PAGGLLDSIT NIFGRRSVSS FPVPQDNVDT
     HPGSGKEVRV PATALCVFDA HDGEVNAVQF SPGSRLLATG GMDRRVKLWE VFGEKCEFKG
     SLSGSNAGIT SIEFDSAGSY LLAASNDFAS RIWTVDDYRL RHTLTGHSGK VLSAKFLLDN
     ARIVSGSHDR TLKLWDLRSK VCIKTVFAGS SCNDIVCTEQ CVMSGHFDKK IRFWDIRSES
     IVREMELLGK ITALDLNPER TELLSCSRDD LLKVIDLRTN AIKQTFSAPG FKCGSDWTRV
     VFSPDGSYVA AGSAEGSLYI WSVLTGKVEK VLSKQHSSSI NAVAWSPSGS HVVSVDKGCK
     AVLWAQY
//

The expected output for one record, if a matching entry is found (here "Q7L8J4"), is:

Code:
ID   3BP5L_HUMAN             Reviewed;         393 AA.
AC   Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3;
DT   05-FEB-2008, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2004, sequence version 1.
DT   05-SEP-2012, entry version 71.
DE   RecName: Full=SH3 domain-binding protein 5-like;
DE            Short=SH3BP-5-like;
GN   Name=SH3BP5L; Synonyms=KIAA1720; ORFNames=UNQ2766/PRO7133;
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Brain;
RX   MEDLINE=21082932; PubMed=11214970; DOI=10.1093/dnares/7.6.347;
RA   Nagase T., Kikuno R., Hattori A., Kondo Y., Okumura K., Ohara O.;
RT   "Prediction of the coding sequences of unidentified human genes. XIX.
RT   The complete sequences of 100 new cDNA clones from brain which code
RT   for large proteins in vitro.";
RL   DNA Res. 7:347-355(2000).
RN   [2]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Amygdala;
RX   MEDLINE=21154917; PubMed=11230166; DOI=10.1101/gr.GR1547R;
RA   Wiemann S., Weil B., Wellenreuther R., Gassenhuber J., Glassl S.,
RA   Ansorge W., Boecher M., Bloecker H., Bauersachs S., Blum H.,
RA   Lauber J., Duesterhoeft A., Beyer A., Koehrer K., Strack N.,
RA   Mewes H.-W., Ottenwaelder B., Obermaier B., Tampe J., Heubner D.,
RA   Wambutt R., Korn B., Klein M., Poustka A.;
RT   "Towards a catalog of human genes and proteins: sequencing and
RT   analysis of 500 novel complete protein coding human cDNAs.";
RL   Genome Res. 11:422-435(2001).
RN   [3]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RX   MEDLINE=22887296; PubMed=12975309; DOI=10.1101/gr.1293003;
RA   Clark H.F., Gurney A.L., Abaya E., Baker K., Baldwin D.T., Brush J.,
RA   Chen J., Chow B., Chui C., Crowley C., Currell B., Deuel B., Dowd P.,
RA   Eaton D., Foster J.S., Grimaldi C., Gu Q., Hass P.E., Heldens S.,
RA   Huang A., Kim H.S., Klimowski L., Jin Y., Johnson S., Lee J.,
RA   Lewis L., Liao D., Mark M.R., Robbie E., Sanchez C., Schoenfeld J.,
RA   Seshagiri S., Simmons L., Singh J., Smith V., Stinson J., Vagts A.,
RA   Vandlen R.L., Watanabe C., Wieand D., Woods K., Xie M.-H.,
RA   Yansura D.G., Yi S., Yu G., Yuan J., Zhang M., Zhang Z., Goddard A.D.,
RA   Wood W.I., Godowski P.J., Gray A.M.;
RT   "The secreted protein discovery initiative (SPDI), a large-scale
RT   effort to identify novel human secreted and transmembrane proteins: a
RT   bioinformatics assessment.";
RL   Genome Res. 13:2265-2270(2003).
RN   [4]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RX   PubMed=14702039; DOI=10.1038/ng1285;
RA   Ota T., Suzuki Y., Nishikawa T., Otsuki T., Sugiyama T., Irie R.,
RA   Wakamatsu A., Hayashi K., Sato H., Nagai K., Kimura K., Makita H.,
RA   Sekine M., Obayashi M., Nishi T., Shibahara T., Tanaka T., Ishii S.,
RA   Yamamoto J., Saito K., Kawai Y., Isono Y., Nakamura Y., Nagahari K.,
RA   Murakami K., Yasuda T., Iwayanagi T., Wagatsuma M., Shiratori A.,
RA   Sudo H., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M.,
RA   Takahashi M., Kanda K., Yokoi T., Furuya T., Kikkawa E., Omura Y.,
RA   Abe K., Kamihara K., Katsuta N., Sato K., Tanikawa M., Yamazaki M.,
RA   Ninomiya K., Ishibashi T., Yamashita H., Murakawa K., Fujimori K.,
RA   Tanai H., Kimata M., Watanabe M., Hiraoka S., Chiba Y., Ishida S.,
RA   Ono Y., Takiguchi S., Watanabe S., Yosida M., Hotuta T., Kusano J.,
RA   Kanehori K., Takahashi-Fujii A., Hara H., Tanase T.-O., Nomura Y.,
RA   Togiya S., Komai F., Hara R., Takeuchi K., Arita M., Imose N.,
RA   Musashino K., Yuuki H., Oshima A., Sasaki N., Aotsuka S.,
RA   Yoshikawa Y., Matsunawa H., Ichihara T., Shiohata N., Sano S.,
RA   Moriya S., Momiyama H., Satoh N., Takami S., Terashima Y., Suzuki O.,
RA   Nakagawa S., Senoh A., Mizoguchi H., Goto Y., Shimizu F., Wakebe H.,
RA   Hishigaki H., Watanabe T., Sugiyama A., Takemoto M., Kawakami B.,
RA   Yamazaki M., Watanabe K., Kumagai A., Itakura S., Fukuzumi Y.,
RA   Fujimori Y., Komiyama M., Tashiro H., Tanigami A., Fujiwara T.,
RA   Ono T., Yamada K., Fujii Y., Ozaki K., Hirao M., Ohmori Y.,
RA   Kawabata A., Hikiji T., Kobatake N., Inagaki H., Ikema Y., Okamoto S.,
RA   Okitani R., Kawakami T., Noguchi S., Itoh T., Shigeta K., Senba T.,
RA   Matsumura K., Nakajima Y., Mizuno T., Morinaga M., Sasaki M.,
RA   Togashi T., Oyama M., Hata H., Watanabe M., Komatsu T.,
RA   Mizushima-Sugano J., Satoh T., Shirai Y., Takahashi Y., Nakagawa K.,
RA   Okumura K., Nagase T., Nomura N., Kikuchi H., Masuho Y., Yamashita R.,
RA   Nakai K., Yada T., Nakamura Y., Ohara O., Isogai T., Sugano S.;
RT   "Complete sequencing and characterization of 21,243 full-length human
RT   cDNAs.";
RL   Nat. Genet. 36:40-45(2004).
RN   [5]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX   PubMed=16710414; DOI=10.1038/nature04727;
RA   Gregory S.G., Barlow K.F., McLay K.E., Kaul R., Swarbreck D.,
RA   Dunham A., Scott C.E., Howe K.L., Woodfine K., Spencer C.C.A.,
RA   Jones M.C., Gillson C., Searle S., Zhou Y., Kokocinski F.,
RA   McDonald L., Evans R., Phillips K., Atkinson A., Cooper R., Jones C.,
RA   Hall R.E., Andrews T.D., Lloyd C., Ainscough R., Almeida J.P.,
RA   Ambrose K.D., Anderson F., Andrew R.W., Ashwell R.I.S., Aubin K.,
RA   Babbage A.K., Bagguley C.L., Bailey J., Beasley H., Bethel G.,
RA   Bird C.P., Bray-Allen S., Brown J.Y., Brown A.J., Buckley D.,
RA   Burton J., Bye J., Carder C., Chapman J.C., Clark S.Y., Clarke G.,
RA   Clee C., Cobley V., Collier R.E., Corby N., Coville G.J., Davies J.,
RA   Deadman R., Dunn M., Earthrowl M., Ellington A.G., Errington H.,
RA   Frankish A., Frankland J., French L., Garner P., Garnett J., Gay L.,
RA   Ghori M.R.J., Gibson R., Gilby L.M., Gillett W., Glithero R.J.,
RA   Grafham D.V., Griffiths C., Griffiths-Jones S., Grocock R.,
RA   Hammond S., Harrison E.S.I., Hart E., Haugen E., Heath P.D.,
RA   Holmes S., Holt K., Howden P.J., Hunt A.R., Hunt S.E., Hunter G.,
RA   Isherwood J., James R., Johnson C., Johnson D., Joy A., Kay M.,
RA   Kershaw J.K., Kibukawa M., Kimberley A.M., King A., Knights A.J.,
RA   Lad H., Laird G., Lawlor S., Leongamornlert D.A., Lloyd D.M.,
RA   Loveland J., Lovell J., Lush M.J., Lyne R., Martin S.,
RA   Mashreghi-Mohammadi M., Matthews L., Matthews N.S.W., McLaren S.,
RA   Milne S., Mistry S., Moore M.J.F., Nickerson T., O'Dell C.N.,
RA   Oliver K., Palmeiri A., Palmer S.A., Parker A., Patel D., Pearce A.V.,
RA   Peck A.I., Pelan S., Phelps K., Phillimore B.J., Plumb R., Rajan J.,
RA   Raymond C., Rouse G., Saenphimmachak C., Sehra H.K., Sheridan E.,
RA   Shownkeen R., Sims S., Skuce C.D., Smith M., Steward C.,
RA   Subramanian S., Sycamore N., Tracey A., Tromans A., Van Helmond Z.,
RA   Wall M., Wallis J.M., White S., Whitehead S.L., Wilkinson J.E.,
RA   Willey D.L., Williams H., Wilming L., Wray P.W., Wu Z., Coulson A.,
RA   Vaudin M., Sulston J.E., Durbin R.M., Hubbard T., Wooster R.,
RA   Dunham I., Carter N.P., McVean G., Ross M.T., Harrow J., Olson M.V.,
RA   Beck S., Rogers J., Bentley D.R.;
RT   "The DNA sequence and biological annotation of human chromosome 1.";
RL   Nature 441:315-321(2006).
RN   [6]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RA   Mural R.J., Istrail S., Sutton G.G., Florea L., Halpern A.L.,
RA   Mobarry C.M., Lippert R., Walenz B., Shatkay H., Dew I., Miller J.R.,
RA   Flanigan M.J., Edwards N.J., Bolanos R., Fasulo D., Halldorsson B.V.,
RA   Hannenhalli S., Turner R., Yooseph S., Lu F., Nusskern D.R.,
RA   Shue B.C., Zheng X.H., Zhong F., Delcher A.L., Huson D.H.,
RA   Kravitz S.A., Mouchard L., Reinert K., Remington K.A., Clark A.G.,
RA   Waterman M.S., Eichler E.E., Adams M.D., Hunkapiller M.W., Myers E.W.,
RA   Venter J.C.;
RL   Submitted (JUL-2005) to the EMBL/GenBank/DDBJ databases.
RN   [7]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Colon, and Lung;
RX   PubMed=15489334; DOI=10.1101/gr.2596504;
RG   The MGC Project Team;
RT   "The status, quality, and expansion of the NIH full-length cDNA
RT   project: the Mammalian Gene Collection (MGC).";
RL   Genome Res. 14:2121-2127(2004).
RN   [8]
RP   PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-343; SER-350 AND
RP   SER-362, AND MASS SPECTROMETRY.
RC   TISSUE=Cervix carcinoma;
RX   PubMed=18669648; DOI=10.1073/pnas.0805139105;
RA   Dephoure N., Zhou C., Villen J., Beausoleil S.A., Bakalarski C.E.,
RA   Elledge S.J., Gygi S.P.;
RT   "A quantitative atlas of mitotic phosphorylation.";
RL   Proc. Natl. Acad. Sci. U.S.A. 105:10762-10767(2008).
RN   [9]
RP   PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-362, AND MASS
RP   SPECTROMETRY.
RC   TISSUE=Leukemic T-cell;
RX   PubMed=19690332; DOI=10.1126/scisignal.2000007;
RA   Mayya V., Lundgren D.H., Hwang S.-I., Rezaul K., Wu L., Eng J.K.,
RA   Rodionov V., Han D.K.;
RT   "Quantitative phosphoproteomic analysis of T cell receptor signaling
RT   reveals system-wide modulation of protein-protein interactions.";
RL   Sci. Signal. 2:RA46-RA46(2009).
CC   -!- SIMILARITY: Belongs to the SH3BP5 family.
CC   -!- SEQUENCE CAUTION:
CC       Sequence=BAB21811.1; Type=Erroneous initiation;
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution-NoDerivs License
CC   -----------------------------------------------------------------------
DR   EMBL; AB051507; BAB21811.1; ALT_INIT; mRNA.
DR   EMBL; AL136569; CAB66504.1; -; mRNA.
DR   EMBL; AY358453; AAQ88818.1; -; mRNA.
DR   EMBL; AK056382; BAB71171.1; -; mRNA.
DR   EMBL; AL732583; CAI18798.1; -; Genomic_DNA.
DR   EMBL; CH471257; EAW57534.1; -; Genomic_DNA.
DR   EMBL; BC010871; AAH10871.1; -; mRNA.
DR   EMBL; BC017254; AAH17254.1; -; mRNA.
DR   IPI; IPI00028359; -.
DR   RefSeq; NP_085148.1; NM_030645.1.
DR   UniGene; Hs.298573; -.
DR   ProteinModelPortal; Q7L8J4; -.
DR   IntAct; Q7L8J4; 2.
DR   MINT; MINT-1688351; -.
DR   PhosphoSite; Q7L8J4; -.
DR   DMDM; 74749902; -.
DR   PRIDE; Q7L8J4; -.
DR   Ensembl; ENST00000366472; ENSP00000355428; ENSG00000175137.
DR   GeneID; 80851; -.
DR   KEGG; hsa:80851; -.
DR   UCSC; uc001iev.1; human.
DR   CTD; 80851; -.
DR   GeneCards; GC01M249104; -.
DR   H-InvDB; HIX0160026; -.
DR   HGNC; HGNC:29360; SH3BP5L.
DR   HPA; HPA038068; -.
DR   neXtProt; NX_Q7L8J4; -.
DR   PharmGKB; PA142670923; -.
DR   eggNOG; NOG263345; -.
DR   GeneTree; ENSGT00390000018500; -.
DR   HOGENOM; HOG000190360; -.
DR   HOVERGEN; HBG057307; -.
DR   InParanoid; Q7L8J4; -.
DR   OMA; GVRGGRH; -.
DR   OrthoDB; EOG4PZJ78; -.
DR   GenomeRNAi; 80851; -.
DR   NextBio; 71284; -.
DR   ArrayExpress; Q7L8J4; -.
DR   Bgee; Q7L8J4; -.
DR   CleanEx; HS_SH3BP5L; -.
DR   Genevestigator; Q7L8J4; -.
DR   InterPro; IPR007940; SH3-bd_5.
DR   PANTHER; PTHR19423; SH3_bd_5; 1.
DR   Pfam; PF05276; SH3BP5; 1.
PE   1: Evidence at protein level;
KW   Coiled coil; Complete proteome; Phosphoprotein; Reference proteome.
FT   CHAIN         1    393       SH3 domain-binding protein 5-like.
FT                                /FTId=PRO_0000317508.
FT   COILED       59    140       Potential.
FT   COILED      169    272       Potential.
FT   COMPBIAS     37     40       Poly-Gly.
FT   COMPBIAS     41     44       Poly-Ser.
FT   COMPBIAS     52     55       Poly-Glu.
FT   MOD_RES     343    343       Phosphoserine.
FT   MOD_RES     350    350       Phosphoserine.
FT   MOD_RES     362    362       Phosphoserine.
FT   MOD_RES     378    378       Phosphoserine (By similarity).
SQ   SEQUENCE   393 AA;  43499 MW;  3693431765F90FDC CRC64;
     MAELRQVPGG RETPQGELRP EVVEDEVPRS PVAEEPGGGG SSSSEAKLSP REEEELDPRI
     QEELEHLNQA SEEINQVELQ LDEARTTYRR ILQESARKLN TQGSHLGSCI EKARPYYEAR
     RLAKEAQQET QKAALRYERA VSMHNAAREM VFVAEQGVMA DKNRLDPTWQ EMLNHATCKV
     NEAEEERLRG EREHQRVTRL CQQAEARVQA LQKTLRRAIG KSRPYFELKA QFSQILEEHK
     AKVTELEQQV AQAKTRYSVA LRNLEQISEQ IHARRRGGLP PHPLGPRRSS PVGAEAGPED
     MEDGDSGIEG AEGAGLEEGS SLGPGPAPDT DTLSLLSLRT VASDLQKCDS VEHLRGLSDH
     VSLDGQELGT RSGGRRGSDG GARGGRHQRS VSL
//

Right now I am receiving the following error. Here kaavya.pl contains the following program:

Code:
#!/usr/bin/perl

use strict;
use warnings;

open(my $id_file, "<", "id_file"); # list of ids
my $in_record=0;
my @ids=<$id_file>;
close $id_file;
chomp(@ids);
my %id_check;
map {$_++} @id_check{@ids};
open(my $records, "<", "tmp.dat"); # records of the form above
my $head;
while(<$records>){
    $head=$_ if (/^ID/);
    if (/^AC/){
        $in_record=0;
        my @entries=$_=~/\s+([^;]+);/g;
        for my$id(@entries){
            $in_record=1 if ($id_check{$id});
        }
    print $head if $in_record;
    }
print if $in_record;
}

Code:
bash-3.2$ perl kaavya.pl
readline() on closed filehandle $records at kaavya.pl line 15.
bash-3.2$

# 6  
Old 09-18-2012
Hi again Manigrover,

Have you copied the records to tmp.dat (or changed the file names used in the open statements within the script)?

You can also modify the script to report a failure to open the files, as follows:
Code:
#!/usr/bin/perl

use strict;
use warnings;

open(my $id_file, '<', 'id_file') || die "Could not open id_file\n\t$!";
my $in_record=0;
my @ids=<$id_file>;
close $id_file;
chomp(@ids);
my %id_check;
map {$_++} @id_check{@ids};
open(my $records, '<', 'tmp.dat') || die "Could not open tmp.dat\n\t$!";
my $head;
while(<$records>){
    $head=$_ if (/^ID/);
    if (/^AC/){
        $in_record=0;
        my @entries=$_=~/\s+([^;]+);/g;
        for my $id (@entries){
            $in_record=1 if ($id_check{$id});
        }
    print $head if $in_record;
    }
print if $in_record;
}

# 7  
Old 09-18-2012
Thank you

---------- Post updated at 04:21 AM ---------- Previous update was at 04:07 AM ----------

Hi Skrynesaver,

I am having a problem with the output. For every alternate record, the information is missing. An example of my output is given below: I do not get the information for AC Q8IZP0; instead, the output goes straight to the next record. This happens for every alternate record.

Code:
ID   ABI1_HUMAN              Reviewed;         508 AA.
ID   ABI1_HUMAN              Reviewed;         508 AA.
AC   Q8IZP0; A9Z1Y6; B4DQ58; O15147; O76049; O95060; Q5T2R3; Q5T2R4;
ID   ABI3_HUMAN              Reviewed;         366 AA.
AC   Q9P2A4; C9IZN8; Q9H0P6;
DT   19-JUL-2004, integrated into UniProtKB/Swiss-Prot.
DT   18-MAY-2010, sequence version 2.
DT   05-SEP-2012, entry version 93.
DE   RecName: Full=ABI gene family member 3;
DE   AltName: Full=New molecule including SH3;
DE            Short=Nesh;
GN   Name=ABI3; Synonyms=NESH;
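
A likely cause, judging only from the sample output above (an assumption, since the full input was not posted): the script resets $in_record at every AC line, so a record whose accessions continue onto a second AC line is cut off there. In addition, $in_record is never cleared at the // terminator, so the ID line of the record that follows a match gets printed twice. Below is a minimal sketch of one way around both, assuming every record ends with a line starting with //. The name filter_records and the command-line arguments are hypothetical and not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy to $out every record (ID line through the closing "//") that has
# at least one accession, on any of its AC lines, present in %$wanted.
sub filter_records {
    my ($wanted, $in, $out) = @_;
    my @record;      # lines of the record currently being read
    my $keep = 0;    # set once any AC line of this record matches
    while (my $line = <$in>) {
        push @record, $line;
        if ($line =~ /^AC\s/) {
            # A record may have several AC lines, so $keep is not reset here.
            for my $acc ($line =~ /([^\s;]+);/g) {
                $keep = 1 if $wanted->{$acc};
            }
        }
        elsif ($line =~ m{^//}) {
            # End of record: print it whole or drop it, then start afresh.
            print {$out} @record if $keep;
            @record = ();
            $keep   = 0;
        }
    }
    return;
}

# Command-line use (hypothetical): perl filter_records.pl id_file tmp.dat
if (@ARGV == 2) {
    my ($id_name, $data_name) = @ARGV;
    open(my $id_fh, '<', $id_name) or die "Could not open $id_name: $!\n";
    my @ids = <$id_fh>;
    close $id_fh;
    s/\s+//g for @ids;    # strip newlines and stray whitespace
    my %wanted = map { $_ => 1 } grep { length } @ids;
    open(my $data_fh, '<', $data_name) or die "Could not open $data_name: $!\n";
    filter_records(\%wanted, $data_fh, \*STDOUT);
    close $data_fh;
}
```

Buffering the whole record before deciding means the ID line is printed exactly once and nothing after a second AC line is lost, at the cost of holding one record in memory at a time.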
