finding keywords in many files using grep
Post 302607279 by raghulrajan on Wednesday 14th of March 2012 04:36:28 AM
Hi
I never knew I could work like this with an OS. I have recently migrated to Linux.
I think they should start teaching Linux/Unix seriously from middle school level.
Thanks for the reply.
I did not get the desired result.
Below is what I did; I am not sure whether I did it properly.
When I execute the command it shows >, and I don't know what this means.

Code:
raghul@raghul-Studio-1749:~/TOOLS/hhblits/db/blits1$ head -2 *.hhr 
==> CONTIG07340_NA______CONTIG22498_1_1_53_645_196_.hhr <== 
Query         CONTIG07340|NA______CONTIG22498_1_1_53_645_196_ 
Match_columns 152 
 
==> ISOTIG00171_NA______CONTIG00242_1_4_182_1892_502_PLUS.hhr <== 
Query         ISOTIG00171|NA______CONTIG00242_1_4_182_1892_502_PLUS 
Match_columns 210 
 
==> ISOTIG00273_NA______CONTIG60455_1_1_66_494_142_.hhr <== 
Query         ISOTIG00273|NA______CONTIG60455_1_1_66_494_142_ 
Match_columns 84 
 
raghul@raghul-Studio-1749:~/TOOLS/hhblits/db/blits1$ ls -1d *\.hhr 2>/dev/null | while read *.hhr do head -2 "${*.hhr}" done >output.txt 
>
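
For reference, the lone `>` is the shell's continuation prompt: without a `;` or newline before `do` and `done`, those words are read as arguments to `read`, so the shell keeps waiting for the loop to be completed. In addition, `*.hhr` is not a valid variable name. The following is only a minimal sketch of what the loop was probably meant to do, assuming the goal is the first two lines of every `.hhr` file; the function name `first_lines` is made up for illustration.

```shell
# first_lines DIR: print the first two lines of each .hhr file in DIR.
# Hypothetical helper; the name and the DIR argument are assumptions.
first_lines() {
    for f in "$1"/*.hhr; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        head -n 2 -- "$f"
    done
}
```

Running `first_lines . > output.txt` in the `blits1` directory would then collect the `Query` and `Match_columns` lines in one file. Unlike `head -2 *.hhr`, this prints no `==> name <==` headers, because `head` only adds them when it is given several files at once.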

---------- Post updated at 03:36 AM ---------- Previous update was at 03:27 AM ----------

Hi @dagio
Thanks for the reply, but the output files are empty.
I have posted a message; I think it explains the nature of the task.
raghul

Last edited by Corona688; 03-14-2012 at 12:10 PM.. Reason: Code tags for code, please.
 
