Extract particular lines from a file


 
# 1  
Old 09-19-2012

Hi all,

I have a file containing many records, each in the format shown below:
Code:
ID   A16L2_HUMAN             Reviewed;         619 AA.
AC   Q8NAA4; A5PL30; B2RPK5; Q658V4; Q6PID3; Q8NBG0;
DT   20-MAY-2008, integrated into UniProtKB/Swiss-Prot.
DT   20-MAY-2008, sequence version 2.
DT   05-SEP-2012, entry version 77.
DE   RecName: Full=Autophagy-related protein 16-2;
DE   AltName: Full=APG16-like 2;
DE   AltName: Full=WD repeat-containing protein 80;
GN   Name=ATG16L2; Synonyms=WDR80;
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORMS 1 AND 2).
RC   TISSUE=Astrocyte, and Spleen;
RX   PubMed=14702039; DOI=10.1038/ng1285;
RA   Ota T., Suzuki Y., Nishikawa T., Otsuki T., Sugiyama T., Irie R.,
RA   Wakamatsu A., Hayashi K., Sato H., Nagai K., Kimura K., Makita H.,
RA   Sekine M., Obayashi M., Nishi T., Shibahara T., Tanaka T., Ishii S.,
RA   Yamamoto J., Saito K., Kawai Y., Isono Y., Nakamura Y., Nagahari K.,
RA   Murakami K., Yasuda T., Iwayanagi T., Wagatsuma M., Shiratori A.,
RA   Sudo H., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M.,
RA   Takahashi M., Kanda K., Yokoi T., Furuya T., Kikkawa E., Omura Y.,
RA   Abe K., Kamihara K., Katsuta N., Sato K., Tanikawa M., Yamazaki M.,
RA   Ninomiya K., Ishibashi T., Yamashita H., Murakawa K., Fujimori K.,
RA   Tanai H., Kimata M., Watanabe M., Hiraoka S., Chiba Y., Ishida S.,
RA   Ono Y., Takiguchi S., Watanabe S., Yosida M., Hotuta T., Kusano J.,
RA   Kanehori K., Takahashi-Fujii A., Hara H., Tanase T.-O., Nomura Y.,
RA   Togiya S., Komai F., Hara R., Takeuchi K., Arita M., Imose N.,
RA   Musashino K., Yuuki H., Oshima A., Sasaki N., Aotsuka S.,
RA   Yoshikawa Y., Matsunawa H., Ichihara T., Shiohata N., Sano S.,
RA   Moriya S., Momiyama H., Satoh N., Takami S., Terashima Y., Suzuki O.,
RA   Nakagawa S., Senoh A., Mizoguchi H., Goto Y., Shimizu F., Wakebe H.,
RA   Hishigaki H., Watanabe T., Sugiyama A., Takemoto M., Kawakami B.,
RA   Yamazaki M., Watanabe K., Kumagai A., Itakura S., Fukuzumi Y.,
RA   Fujimori Y., Komiyama M., Tashiro H., Tanigami A., Fujiwara T.,
RA   Ono T., Yamada K., Fujii Y., Ozaki K., Hirao M., Ohmori Y.,
RA   Kawabata A., Hikiji T., Kobatake N., Inagaki H., Ikema Y., Okamoto S.,
RA   Okitani R., Kawakami T., Noguchi S., Itoh T., Shigeta K., Senba T.,
RA   Matsumura K., Nakajima Y., Mizuno T., Morinaga M., Sasaki M.,
RA   Togashi T., Oyama M., Hata H., Watanabe M., Komatsu T.,
RA   Mizushima-Sugano J., Satoh T., Shirai Y., Takahashi Y., Nakagawa K.,
RA   Okumura K., Nagase T., Nomura N., Kikuchi H., Masuho Y., Yamashita R.,
RA   Nakai K., Yada T., Nakamura Y., Ohara O., Isogai T., Sugano S.;
RT   "Complete sequencing and characterization of 21,243 full-length human
RT   cDNAs.";
RL   Nat. Genet. 36:40-45(2004).
RN   [2]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RA   Mural R.J., Istrail S., Sutton G.G., Florea L., Halpern A.L.,
RA   Mobarry C.M., Lippert R., Walenz B., Shatkay H., Dew I., Miller J.R.,
RA   Flanigan M.J., Edwards N.J., Bolanos R., Fasulo D., Halldorsson B.V.,
RA   Hannenhalli S., Turner R., Yooseph S., Lu F., Nusskern D.R.,
RA   Shue B.C., Zheng X.H., Zhong F., Delcher A.L., Huson D.H.,
RA   Kravitz S.A., Mouchard L., Reinert K., Remington K.A., Clark A.G.,
RA   Waterman M.S., Eichler E.E., Adams M.D., Hunkapiller M.W., Myers E.W.,
RA   Venter J.C.;
RL   Submitted (JUL-2005) to the EMBL/GenBank/DDBJ databases.
RN   [3]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORMS 1 AND 3).
RC   TISSUE=Eye;
RX   PubMed=15489334; DOI=10.1101/gr.2596504;
RG   The MGC Project Team;
RT   "The status, quality, and expansion of the NIH full-length cDNA
RT   project: the Mammalian Gene Collection (MGC).";
RL   Genome Res. 14:2121-2127(2004).
RN   [4]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] OF 297-619.
RC   TISSUE=Stomach;
RX   PubMed=17974005; DOI=10.1186/1471-2164-8-399;
RA   Bechtel S., Rosenfelder H., Duda A., Schmidt C.P., Ernst U.,
RA   Wellenreuther R., Mehrle A., Schuster C., Bahr A., Bloecker H.,
RA   Heubner D., Hoerlein A., Michel G., Wedler H., Koehrer K.,
RA   Ottenwaelder B., Poustka A., Wiemann S., Schupp I.;
RT   "The full-ORF clone resource of the German cDNA consortium.";
RL   BMC Genomics 8:399-399(2007).
CC   -!- FUNCTION: May play a role in autophagy (By similarity).
CC   -!- SUBCELLULAR LOCATION: Cytoplasm (By similarity).
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=3;
CC       Name=1;
CC         IsoId=Q8NAA4-1; Sequence=Displayed;
CC       Name=2;
CC         IsoId=Q8NAA4-2; Sequence=VSP_033904;
CC         Note=No experimental confirmation available;
CC       Name=3;
CC         IsoId=Q8NAA4-3; Sequence=VSP_033905;
CC         Note=No experimental confirmation available;
CC   -!- SIMILARITY: Belongs to the WD repeat ATG16 family.
CC   -!- SIMILARITY: Contains 7 WD repeats.
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution-NoDerivs License
CC   -----------------------------------------------------------------------
DR   EMBL; AK093017; BAC04021.1; -; mRNA.
DR   EMBL; AK090597; BAC03485.1; -; mRNA.
DR   EMBL; CH471076; EAW74876.1; -; Genomic_DNA.
DR   EMBL; CH471076; EAW74874.1; -; Genomic_DNA.
DR   EMBL; BC036713; AAH36713.1; -; mRNA.
DR   EMBL; BC137489; AAI37490.1; -; mRNA.
DR   EMBL; BC137490; AAI37491.1; -; mRNA.
DR   EMBL; BC142718; AAI42719.1; -; mRNA.
DR   EMBL; BC146660; AAI46661.1; -; mRNA.
DR   EMBL; AL832974; CAH56355.1; -; mRNA.
DR   IPI; IPI00300536; -.
DR   IPI; IPI00892538; -.
DR   IPI; IPI00892678; -.
DR   RefSeq; NP_203746.1; NM_033388.1.
DR   UniGene; Hs.653186; -.
DR   ProteinModelPortal; Q8NAA4; -.
DR   SMR; Q8NAA4; 327-619.
DR   IntAct; Q8NAA4; 3.
DR   PhosphoSite; Q8NAA4; -.
DR   DMDM; 189027648; -.
DR   PRIDE; Q8NAA4; -.
DR   Ensembl; ENST00000321297; ENSP00000326340; ENSG00000168010.
DR   GeneID; 89849; -.
DR   KEGG; hsa:89849; -.
DR   UCSC; uc001otd.3; human.
DR   CTD; 89849; -.
DR   GeneCards; GC11P072525; -.
DR   H-InvDB; HIX0009913; -.
DR   HGNC; HGNC:25464; ATG16L2.
DR   neXtProt; NX_Q8NAA4; -.
DR   PharmGKB; PA142672575; -.
DR   eggNOG; COG2319; -.
DR   GeneTree; ENSGT00670000097918; -.
DR   HOGENOM; HOG000112569; -.
DR   HOVERGEN; HBG050534; -.
DR   InParanoid; Q8NAA4; -.
DR   OrthoDB; EOG44F68R; -.
DR   PhylomeDB; Q8NAA4; -.
DR   GenomeRNAi; 89849; -.
DR   NextBio; 76342; -.
DR   ArrayExpress; Q8NAA4; -.
DR   Bgee; Q8NAA4; -.
DR   CleanEx; HS_ATG16L2; -.
DR   Genevestigator; Q8NAA4; -.
DR   GO; GO:0005737; C:cytoplasm; IEA:UniProtKB-SubCell.
DR   GO; GO:0006914; P:autophagy; IEA:UniProtKB-KW.
DR   GO; GO:0015031; P:protein transport; IEA:UniProtKB-KW.
DR   Gene3D; G3DSA:2.130.10.10; WD40/YVTN_repeat-like; 1.
DR   InterPro; IPR013923; Autophagy-rel_prot_16.
DR   InterPro; IPR020472; G-protein_beta_WD-40_rep.
DR   InterPro; IPR015943; WD40/YVTN_repeat-like_dom.
DR   InterPro; IPR001680; WD40_repeat.
DR   InterPro; IPR019775; WD40_repeat_CS.
DR   InterPro; IPR017986; WD40_repeat_dom.
DR   Pfam; PF08614; ATG16; 1.
DR   Pfam; PF00400; WD40; 4.
DR   PRINTS; PR00320; GPROTEINBRPT.
DR   SMART; SM00320; WD40; 7.
DR   SUPFAM; SSF50978; WD40_like; 1.
DR   PROSITE; PS00678; WD_REPEATS_1; 3.
DR   PROSITE; PS50082; WD_REPEATS_2; 4.
DR   PROSITE; PS50294; WD_REPEATS_REGION; 1.
PE   2: Evidence at transcript level;
KW   Alternative splicing; Autophagy; Coiled coil; Complete proteome;
KW   Cytoplasm; Polymorphism; Protein transport; Reference proteome;
KW   Repeat; Transport; WD repeat.
FT   CHAIN         1    619       Autophagy-related protein 16-2.
FT                                /FTId=PRO_0000337110.
FT   REPEAT      334    373       WD 1.
FT   REPEAT      378    417       WD 2.
FT   REPEAT      420    454       WD 3.
FT   REPEAT      455    498       WD 4.
FT   REPEAT      500    539       WD 5.
FT   REPEAT      546    585       WD 6.
FT   REPEAT      589    619       WD 7.
FT   COILED      116    227       Potential.
FT   VAR_SEQ       1    106       Missing (in isoform 2).
FT                                /FTId=VSP_033904.
FT   VAR_SEQ     276    619       Missing (in isoform 3).
FT                                /FTId=VSP_033905.
FT   VARIANT     220    220       R -> W (in dbSNP:rs11235604).
FT                                /FTId=VAR_043605.
FT   CONFLICT    124    124       E -> G (in Ref. 1; BAC03485).
FT   CONFLICT    256    256       A -> V (in Ref. 3; AAH36713).
FT   CONFLICT    343    343       R -> G (in Ref. 1; BAC03485).
FT   CONFLICT    492    492       G -> E (in Ref. 1; BAC04021).
FT   CONFLICT    508    508       L -> R (in Ref. 4; CAH56355).
SQ   SEQUENCE   619 AA;  68998 MW;  7F2962AA3C6BB850 CRC64;
     MAGPGVPGAP AARWKRHIVR QLRLRDRTQK ALFLELVPAY NHLLEKAELL DKFSKKLQPE
     PNSVTPTTHQ GPWEESELDS DQVPSLVALR VKWQEEEEGL RLVCGEMAYQ VVEKGAALGT
     LESELQQRQS RLAALEARVA QLREARAQQA QQVEEWRAQN AVQRAAYEAL RAHVGLREAA
     LRRLQEEARD LLERLVQRKA RAAAERNLRN ERRERAKQAR VSQELKKAAK RTVSISEGPD
     TLGDGMRERR ETLALAPEPE PLEKEACEKW KRPFRSASAT SLTLSHCVDV VKGLLDFKKR
     RGHSIGGAPE QRYQIIPVCV AARLPTRAQD VLDAHLSEVN AVRFGPNSSL LATGGADRLI
     HLWNVVGSRL EANQTLEGAG GSITSVDFDP SGYQVLAATY NQAAQLWKVG EAQSKETLSG
     HKDKVTAAKF KLTRHQAVTG SRDRTVKEWD LGRAYCSRTI NVLSYCNDVV CGDHIIISGH
     NDQKIRFWDS RGPHCTQVIP VQGRVTSLSL SHDQLHLLSC SRDNTLKVID LRVSNIRQVF
     RADGFKCGSD WTKAVFSPDR SYALAGSCDG ALYIWDVDTG KLESRLQGPH CAAVNAVAWC
     YSGSHMVSVD QGRKVVLWQ
//

For every record in the file, I want to extract these lines:
Code:
AC   Q8NAA4; A5PL30; B2RPK5; Q658V4; Q6PID3; Q8NBG0;
FT   COILED      116    227       Potential.
FT   VARIANT     220    220       R -> W (in dbSNP:rs11235604).

There may be one or more FT COILED and FT VARIANT lines per record, and every record ends with "//".

Thanks in advance
# 2  
Old 09-19-2012
Code:
skrynesaver@busbox ~/$ grep -E '^(AC|FT *(COILED|VARIANT))' tmp.dat
AC   Q8NAA4; A5PL30; B2RPK5; Q658V4; Q6PID3; Q8NBG0;
FT   COILED      116    227       Potential.
FT   VARIANT     220    220       R -> W (in dbSNP:rs11235604).

You should really be supplying some indication that you are trying to resolve these issues yourself.
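
If the matches from different records need to stay distinguishable, one possible variation on the one-liner above is to keep the "//" terminator as well, so each block of output still ends where its record ends. This is only a sketch: the function name extract_fields is made up for illustration, and the patterns assume the two-letter line codes start in column 1, as in the sample record.

```shell
# Hypothetical helper (name is an assumption, not from the thread):
# print the AC line and any FT COILED / FT VARIANT lines of each record,
# then the "//" terminator so the output stays grouped per record.
extract_fields() {
    awk '
        # Wanted lines: accession numbers and the two feature types.
        /^AC / || /^FT +(COILED|VARIANT) / { print; next }
        # End-of-record marker: pass it through as a separator.
        /^\/\/$/ { print }
    ' "$@"
}
```

Called as extract_fields tmp.dat, it should print the AC, FT COILED and FT VARIANT lines of each record, followed by "//". Continuation lines such as /FTId=... are skipped, which matches the output requested in post #1.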