Shell Programming and Scripting: Speeding up search and replace in a for loop
Post 302657687 by pbluescript on Monday 18th of June 2012 08:52:16 AM
Quote:
Originally Posted by hergp
Wow, I did not expect ex to be so much more efficient than sed.

Do you have the total run-time in seconds for the three approaches too, pbluescript?
Sure. These were all submitted to an LSF queue; each node has at least 8 cores, 2.8 GHz or faster Intel Xeon CPUs and 16 GB of RAM, running RHEL 5.3. Here is some extra information about each job, including the actual run time:

My method: 195,389 seconds
Max Memory : 5 MB
Max Swap : 266 MB
Max Processes : 5
Max Threads : 6

hergp's method: 209,240 seconds
Max Memory : 2676 MB
Max Swap : 2870 MB
Max Processes : 4
Max Threads : 5

alister's method: 42,573 seconds
Max Memory : 121 MB
Max Swap : 392 MB
Max Processes : 5
Max Threads : 6

Measured by actual run time, the awk + ex method looks even better: it finished about 4.6 times faster than my method and about 4.9 times faster than hergp's.
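The reported run times are easier to compare in hours. A small awk sketch that does that arithmetic on the numbers listed above (the function name runtime_summary is just for illustration):

```shell
#!/bin/sh
# Hypothetical helper: converts the run times reported above (seconds)
# to hours and computes how much faster alister's awk + ex job was.
runtime_summary() {
    awk 'BEGIN {
        mine = 195389; hergp = 209240; alister = 42573
        printf "mine:    %.1f h\n", mine / 3600
        printf "hergp:   %.1f h\n", hergp / 3600
        printf "alister: %.1f h\n", alister / 3600
        printf "alister vs mine:  %.1fx faster\n", mine / alister
        printf "alister vs hergp: %.1fx faster\n", hergp / alister
    }'
}
```

Calling runtime_summary prints 54.3 h, 58.1 h and 11.8 h for the three jobs, and speed-ups of 4.6x and 4.9x for the awk + ex run.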

---------- Post updated at 08:52 AM ---------- Previous update was at 08:44 AM ----------

Quote:
Originally Posted by Scrutinizer
@pbluescript, is file.txt free format? Could you post a sample?
Sure. The actual commands I ran were slightly different from what I originally posted: each line contains the ID in two places (gene_id and transcript_id), but I only wanted the gene_id one to change.
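A small illustration of why the pattern includes the gene_id key, using the first sample line below and the mapping uc007aet.1 → mKIAA1889 taken from the sample result. This is a demonstration sketch, not one of the commands from the thread; the pattern here escapes the dot (\.) so it matches only a literal dot.

```shell
#!/bin/sh
# Demonstration: anchoring the substitution on "gene_id" keeps
# transcript_id untouched, while a bare ID match rewrites both fields.
line='gene_id "uc007aet.1"; transcript_id "uc007aet.1";'

# Unanchored: matches the quoted ID anywhere, so with g both fields change.
unanchored=$(printf '%s\n' "$line" | sed 's#"uc007aet\.1"#"mKIAA1889"#g')

# Anchored on the gene_id key: only the gene_id value changes.
anchored=$(printf '%s\n' "$line" | sed 's#gene_id "uc007aet\.1"#gene_id "mKIAA1889"#g')

printf '%s\n%s\n' "$unanchored" "$anchored"
# gene_id "mKIAA1889"; transcript_id "mKIAA1889";
# gene_id "mKIAA1889"; transcript_id "uc007aet.1";
```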

sed in a for loop version:
Code:
# executed inside the loop, once per OLD,NEW pair (the loop itself is not shown)
sed -i "s#gene_id \"$OLD\"#gene_id \"$NEW\"#g" file.txt
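The loop around that sed command was not posted (the thread title only says it was a for loop). A minimal sketch of one way to drive it, assuming conversion.csv holds one OLD,NEW pair per line as in the sample below and that sed supports -i (GNU sed does); the function name sed_loop is hypothetical. It makes the cost visible: every pair starts a new sed process that reads and rewrites all of file.txt, so the work grows with the number of pairs times the file size.

```shell
#!/bin/sh
# Hypothetical reconstruction of the "sed in a loop" approach; the
# original loop was not posted. Assumes GNU sed (for -i) and a CSV
# with one OLD,NEW pair per line.
sed_loop() {
    csv=$1     # e.g. conversion.csv
    target=$2  # e.g. file.txt
    while IFS=, read -r OLD NEW; do
        # One full read-and-rewrite of $target per pair.
        sed -i "s#gene_id \"$OLD\"#gene_id \"$NEW\"#g" "$target"
    done < "$csv"
}
```

Usage: sed_loop conversion.csv file.txt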

alister's version:

Code:
# one ex substitute command per OLD,NEW line of conversion.csv, then "x" to write and quit
awk -F, '{print "%s#gene_id \""$1"\"#gene_id \""$2"\"#g"} END {print "x"}' conversion.csv | ex -s file.txt
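Roughly how alister's pipeline works: awk turns each OLD,NEW line of conversion.csv into one ex substitute command over the whole buffer (%s#...#...#g), and the final x tells ex to write the file if it changed and quit. ex therefore reads file.txt once, applies every substitution in memory and writes it once, instead of rewriting the file once per pair, which is a plausible reason for the large speed-up (and for its higher memory use). A sketch that wraps the same pipeline in two functions so the generated ex script can be inspected before it is applied; the names ex_script and ex_replace are hypothetical.

```shell
#!/bin/sh
# Wraps alister's awk | ex pipeline; the function names are hypothetical.

# Print the ex script that awk generates, without touching any file.
ex_script() {
    awk -F, '{print "%s#gene_id \""$1"\"#gene_id \""$2"\"#g"} END {print "x"}' "$1"
}

# Apply the generated script: ex loads the target once, runs every
# substitution in memory, then "x" writes the file (if changed) and exits.
ex_replace() {
    ex_script "$1" | ex -s "$2"
}
```

For example, ex_script conversion.csv | head previews the first generated commands, and ex_replace conversion.csv file.txt edits file.txt in place.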

Here is a sample of what I started with:

Code:
chr1    mm9_knownGene   exon    3195985 3197398 0.000000        -       .       gene_id "uc007aet.1"; transcript_id "uc007aet.1";
chr1    mm9_knownGene   exon    3203520 3205713 0.000000        -       .       gene_id "uc007aet.1"; transcript_id "uc007aet.1";
chr1    mm9_knownGene   stop_codon      3206103 3206105 0.000000        -       .       gene_id "uc007aeu.1"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   CDS     3206106 3207049 0.000000        -       2       gene_id "uc007aeu.1"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   exon    3204563 3207049 0.000000        -       .       gene_id "uc007aeu.1"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   CDS     3411783 3411982 0.000000        -       1       gene_id "uc007aeu.1"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   exon    3411783 3411982 0.000000        -       .       gene_id "uc007aeu.1"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   CDS     3660633 3661429 0.000000        -       0       gene_id "uc007aeu.1"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   start_codon     3661427 3661429 0.000000        -       .       gene_id "uc007aeu.1"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   exon    3660633 3661579 0.000000        -       .       gene_id "uc007aeu.1"; transcript_id "uc007aeu.1";

Here is a sample of conversion.csv:

Code:
uc007afh.1,Lypla1
uc007afg.1,Lypla1
uc007afi.2,Tcea1
uc011wht.1,Tcea1
uc011whu.1,Tcea1
uc007afn.1,Atp6v1h
uc007afm.1,Atp6v1h
uc007afo.1,Oprk1
uc007afp.1,Oprk1
uc007afq.1,Oprk1

Here is a sample of the final result:

Code:
chr1    mm9_knownGene   exon    3195985 3197398 0.000000        -       .       gene_id "mKIAA1889"; transcript_id "uc007aet.1";
chr1    mm9_knownGene   exon    3203520 3205713 0.000000        -       .       gene_id "mKIAA1889"; transcript_id "uc007aet.1";
chr1    mm9_knownGene   stop_codon      3206103 3206105 0.000000        -       .       gene_id "Xkr4"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   CDS     3206106 3207049 0.000000        -       2       gene_id "Xkr4"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   exon    3204563 3207049 0.000000        -       .       gene_id "Xkr4"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   CDS     3411783 3411982 0.000000        -       1       gene_id "Xkr4"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   exon    3411783 3411982 0.000000        -       .       gene_id "Xkr4"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   CDS     3660633 3661429 0.000000        -       0       gene_id "Xkr4"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   start_codon     3661427 3661429 0.000000        -       .       gene_id "Xkr4"; transcript_id "uc007aeu.1";
chr1    mm9_knownGene   exon    3660633 3661579 0.000000        -       .       gene_id "Xkr4"; transcript_id "uc007aeu.1";
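Because both sed -i and ex edit file.txt in place, it can be worth keeping a copy of the original and checking that only the gene_id values changed. A hypothetical check (the function names and the saved copy are assumptions, not part of the thread) that compares the transcript_id values of two files line by line:

```shell
#!/bin/sh
# Hypothetical sanity check: succeeds only if two files carry the same
# transcript_id values in the same order.

# Print the quoted value after 'transcript_id' on each line.
transcript_ids() {
    sed -n 's/.*transcript_id "\([^"]*\)".*/\1/p' "$1"
}

# Compare the transcript_id columns of an original and an edited file.
same_transcripts() {
    [ "$(transcript_ids "$1")" = "$(transcript_ids "$2")" ]
}
```

Usage, assuming a copy was saved before editing: same_transcripts file.txt.orig file.txt && echo "transcript_id unchanged"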


Last edited by Scrutinizer; 06-18-2012 at 10:38 AM. Reason: code tags
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Perl: Search for string on line then search and replace text

Hi All, I have a file that I need to be able to find a pattern match on a line, search that line for a text pattern, and replace that text. An example of 4 lines in my file is: 1. MatchText_randomNumberOfText moreData ReplaceMe moreData 2. MatchText_randomNumberOfText moreData moreData... (4 Replies)
Discussion started by: Crypto

2. UNIX for Dummies Questions & Answers

Speeding up a Shell Script (find, grep and a for loop)

Hi all, I'm having some trouble with a shell script that I have put together to search our web pages for links to PDFs. The first thing I did was: ls -R | grep .pdf > /tmp/dave_pdfs.out which generates a list of all of the PDFs on the server. For the sake of argument, say it looks like... (8 Replies)
Discussion started by: Dave Stockdale

3. Shell Programming and Scripting

awk - replace number of string length from search and replace for a serialized array

Hello, I really would appreciate some help with a bash script for some string manipulation on an SQL dump: I'd like to be able to rename "sites/WHATEVER/files" to "sites/SOMETHINGELSE/files" within the sql dump. This is quite easy with sed: sed -e... (1 Reply)
Discussion started by: otrotipo

4. Programming

PERL, search and replace inside foreach loop

Hello All, I'm a hardware engineer. I have written this script to automate my job. I got stuck in the following location. CODE: .. .. ... foreach $key(keys %arr_hash) { my ($loc,$ind,$add) = split /,/, $arr_hash{$key}; &create_verilog($key, $loc, $ind ,$add); } sub create_verilog{... (2 Replies)
Discussion started by: riyasnr007

5. UNIX for Dummies Questions & Answers

Speeding/Optimizing GREP search on CSV files

Hi all, I have a problem searching hundreds of CSV files: the search takes too long (over 5 min). The CSV files are "," delimited and have 30 fields per line, but I always grep the same 4 fields, so is there a way to grep just those 4 fields to speed up the search? Example:... (11 Replies)
Discussion started by: Whit3H0rse

6. Shell Programming and Scripting

perl search and replace - search in first line and replace in 2nd line

Dear All, I want to search for a particular string and replace the value on the next line. Following is the test file. The search string is tmp,??? ,10:1 "???" may contain any 3 characters, which should remain the same, and the next line should be replaced with ,10:50 tmp,123 --- if match tmp,??? then... (3 Replies)
Discussion started by: arvindng

7. Shell Programming and Scripting

search replace with loop and variable

Hi, could anyone help me with this, tried several times but still not getting it right or having enough grounding to do it outside of javascript: Using awk or sed or bash: need to go through a text file using a for next loop, replacing substrings in the file that consist of a potentially multi... (3 Replies)
Discussion started by: wind

8. Shell Programming and Scripting

Nested search in a file and replace the inner search

Hi Team, I am new to unix, please help me in this. I have a file named properties. The content of the file is : ##Mobile props east.url=https://qa.east.corp.com/prop/end west.url=https://qa.west.corp.com/prop/end south.url=https://qa.south.corp.com/prop/end... (2 Replies)
Discussion started by: tolearn

9. Shell Programming and Scripting

Speeding up substitutions

Hi all, I have a lookup table from which I am looking up values (from col1) and replacing them by corresponding values (from col2) in another file. lookup file a,b c,d So just replace a by b, and replace c by d. mainfile a,fvvgeggsegg,dvs a,fgeggefddddddddddg... (7 Replies)
Discussion started by: senhia83

10. Shell Programming and Scripting

Help speeding up script

This is my first experience writing unix script. I've created the following script. It does what I want it to do, but I need it to be a lot faster. Is there any way to speed it up? cat 'Tax_Provision_Sample.dat' | sort | while read p; do fn=`echo $p|cut -d~ -f2,4,3,8,9`; echo $p >> "$fn.txt";... (20 Replies)
Discussion started by: JohnN6
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.