Sponsored Content
Top Forums Programming Run sed and awk in multiple files in adirectory Post 303018347 by MadeInGermany on Monday 4th of June 2018 11:15:31 AM
Old 06-04-2018
Proposal: a bash script with a for loop
Code:
#!/bin/bash
fmask="ERR598955_orfm.fna"
for f in $fmask.*
do
  [ -f "$f" ] || continue # ensure this is an existing file
  ext=${f#$fmask} # strip off the leading fmask
  pattern=$(awk '{print $1; exit}' "$f") # pattern is the 1st word in the 1st line
  sed -n '1,/^'"${pattern}"'$/ d ; /^>/,$ p' "$f" > "first_out.$ext"
done

You might need to work on it...
This User Gave Thanks to MadeInGermany For This Post:
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

when I try to run rm on multiple files I have problem to delete files with space

Hello when I try to run rm on multiple files I have problem to delete files with space. I have this command : find . -name "*.cmd" | xargs \rm -f it doing the work fine but when it comes across files with spaces like : "my foo file.cmd" it refuse to delete it why? (1 Reply)
Discussion started by: umen
1 Replies

2. Shell Programming and Scripting

How to run multiple awk files

I'm trying some thing like this. But not working It worked for bash files Now I want some thing like that along with multiple input files by redirecting their outputs as inputs of next command like below Could you guyz p0lz help me on this #!/usr/bin/awk -f BEGIN { } script1a.awk... (2 Replies)
Discussion started by: repinementer
2 Replies

3. Shell Programming and Scripting

Split line to multiple files Awk/Sed/Shell Script help

Hi, I need help to split lines from a file into multiple files. my input look like this: 13 23 45 45 6 7 33 44 55 66 7 13 34 5 6 7 87 45 7 8 8 9 13 44 55 66 77 8 44 66 88 99 6 I want to split every 3 lines from this file to be written to individual files. (3 Replies)
Discussion started by: saint2006
3 Replies

4. UNIX for Dummies Questions & Answers

best method of replacing multiple strings in multiple files - sed or awk? most simple preferred :)

Hi guys, say I have a few files in a directory (58 text files or somthing) each one contains mulitple strings that I wish to replace with other strings so in these 58 files I'm looking for say the following strings: JAM (replace with BUTTER) BREAD (replace with CRACKER) SCOOP (replace... (19 Replies)
Discussion started by: rich@ardz
19 Replies

5. UNIX for Dummies Questions & Answers

Using AWK: Extract data from multiple files and output to multiple new files

Hi, I'd like to process multiple files. For example: file1.txt file2.txt file3.txt Each file contains several lines of data. I want to extract a piece of data and output it to a new file. file1.txt ----> newfile1.txt file2.txt ----> newfile2.txt file3.txt ----> newfile3.txt Here is... (3 Replies)
Discussion started by: Liverpaul09
3 Replies

6. UNIX for Dummies Questions & Answers

renaming multiple files using sed or awk one liner

hi, I have a directory "test" under which there are 3 files a.txt,b.txt and c.txt. I need to rename those files to a.pl,b.pl and c.pl respectively. is it possible to achieve this in a sed or awk one liner? i have searched but many of them are scripts. I need to do this in a one liner. I... (2 Replies)
Discussion started by: pandeesh
2 Replies

7. Shell Programming and Scripting

Bash Scipting (New); Run multiple greps > multiple files

Hi everyone, I'm new to the forums, as you can probably tell... I'm also pretty new to scripting and writing any type of code. I needed to know exactly how I can grep for multiple strings, in files located in one directory, but I need each string to output to a separate file. So I'd... (19 Replies)
Discussion started by: LDHB2012
19 Replies

8. UNIX for Dummies Questions & Answers

Run one script on multiple files and print out multiple files.

How can I run the following command on multiple files and print out the corresponding multiple files. perl script.pl genome.gff 1.txt > 1.gff However, there are multiples files of 1.txt, from 1----100.txt Thank you so much. No duplicate posting! Continue here. (0 Replies)
Discussion started by: grace_shen
0 Replies

9. Shell Programming and Scripting

Run one script on multiple files and print out multiple files.

How can I Run one script on multiple files and print out multiple files. FOR EXAMPLE i want to run script.pl on 100 files named 1.txt ....100.txt under same directory and print out corresponding file 1.gff ....100.gff.THANKS (4 Replies)
Discussion started by: grace_shen
4 Replies

10. Shell Programming and Scripting

How to use a loop for multiple files in a folder to run awk command?

Dear folks I have two data set which there names are "final.map" and "1.geno" and look like this structures: final.map: gi|358485511|ref|NC_006088.3| 2044 gi|358485511|ref|NC_006088.3| 2048 gi|358485511|ref|NC_006088.3| 2187 gi|358485511|ref|NC_006088.3| 17654 ... (2 Replies)
Discussion started by: sajmar
2 Replies
PSI-CD-HIT-2D.PL(1)						   User Commands					       PSI-CD-HIT-2D.PL(1)

NAME
psi-cd-hit-2d.pl - runs similar algorithm like CD-HIT but using BLAST to calculate similarities in db1 or db2 format DESCRIPTION
Usage psi-cd-hit-2d [Options] Options -i in_dbname, required -o out_dbname, required -c clustering threshold (sequence identity), default 0.3 -ce clustering threshold (blast expect), default -1, it means by default it doesn't use expect threshold, but with positive value, the program cluster seqs if similarities meet either identity threshold or expect threshold -L coverage of shorter sequence ( aligned / full), default 0.0 -M coverage of longer sequence ( aligned / full), default 0.0 -R (1/0) use psi-blast profile? default 0 perform psi-blast / pdb-blast type search -G (1/0) use global identity? default 1 sequence identity calculated as total identical residues of local alignments / length of shorter seq if you prefer to use -G 0, it is suggested that you also use -L, such as -L 0.8, to prevent very short matches. -d length of description line in the .clstr file, default 30 if set to 0, it takes the fasta defline and stops at first space -l length_of_throw_away_sequences, default 10 -p profile search para, default "-a 2 -d nr80 -j 3 -F F -e 0.001 -b 500 -v 500" -bfdb profile database, default nr80 -s blast search para, default "-F F -e 0.000001 -b 100000 -v 100000" -be blast expect cutoff, default 0.000001 -b filename of list of hosts to run this program in parallel with ssh calls, you need provide a list of hosts -pbs No of jobs to send each time by PBS querying system you can not use both ssh and pbs at same time -k (1/0) keep blast raw output file, default 1 -rs steps of save restart file and clustering output, default 5000 everytime after process 5000 sequences, program write a restart file and current clustering information -restart restart file, readin a restart file if program crash, stoped, termitated, you can restart it by add a option "-restart sth.restart" -rf steps of re format blast database, default 200,000 if program clustered 200,000 seqs, it remove them from seq pool, and re format blast db to save time -local dir of local blast db, when run in parallel with ssh (not pbs), I can copy blast dbs to local drives on each node to save blast db reading time BUT, IT MAY NOT FASTER -J job, job_file, exe specific jobs like parse blast outonly DON'T use it, it is only used by this program itself -single files of ids those you known that they are singletons so I won't run them as queries -i2 second input database -blastn run blastn, default 0 -lo how long can seq in db2 > db1 in a cluster, default 0 means, that seq in db2 should <= seqs in db1 in a cluster ============================== by Weizhong Li, liwz@sdsc.edu ============================== If you find cd-hit useful, please kindly cite: "Clustering of highly homologous sequences to reduce thesize of large protein database", Weizhong Li, Lukasz Jaroszewski & Adam GodzikBioinformatics, (2001) 17:282-283 "Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences", Weizhong Li & Adam Godzik Bioinformatics, (2006) 22:1658-1659 psi-cd-hit-2d.pl 4.6-2012-04-25 April 2012 PSI-CD-HIT-2D.PL(1)
All times are GMT -4. The time now is 07:45 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy