Help with changing header of tsv with 30 million lines | Unix Linux Forums | UNIX for Dummies Questions & Answers

Tags
awk, header, replace header, sed

#1  plumb_r, 12-21-2012

Hi
My 30-million-line file has a header:

Code:
chr    start   end strand  ref_context repeat_masked   s1_smpl_context  s1_c_count  s1_ct_count s1_non_ct_count s1_m%   s1_score    s1_snp   s1_indels   s2_smpl_context s2_c_count  s2_ct_count s2_non_ct_count  s2_m%   s2_score    s2_snp  s2_indels   s3_smpl_context s3_c_count   s3_ct_count s3_non_ct_count s3_m%   s3_score    s3_snp  s3_indels...

I want to replace every s1 to s4 with L1 to L4, and every s5 to s8 with W1 to W4, in the header only, using a shell script so that I don't have to open the file in an editor.

I realise I can use something like this, which prints the rewritten header and then quits:
Code:
sed -Ee '1s/s([1-4])/L\1/g' -e '1s/s([5-8])/W\1/g' -e '1y/5678/1234/' -e '1q' file

but what form would a shell script need to take to do this on multiple files?
#2  Yoda (Forum Advisor), 12-21-2012
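If your sed is working as expected, then how about using a for loop (without the 1q, so that the data lines are written back as well):

Code:
for file in *      # narrow the glob (e.g. *.tsv) if the directory holds other files
do
   # No '1q' here: with it, sed prints only the header and the mv would discard the data lines.
   sed -E -e '1s/s([1-4])/L\1/g' -e '1s/s([5-8])/W\1/g' -e '1y/5678/1234/' "$file" > tmp &&
   mv tmp "$file"
done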
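
To answer the "what form would a shell script take" part more directly, here is a minimal sketch of the same loop wrapped in a function that takes the files as arguments. The function name rename_headers and the *.tsv glob in the example call are assumptions for illustration, not something from the thread. It writes to a temporary file created with mktemp next to each original and only moves it into place if sed succeeded.

```shell
#!/bin/sh
# Rewrite the header (line 1) of each file given as an argument,
# copying every other line through unchanged.
# rename_headers is a hypothetical name chosen for this sketch.
rename_headers() {
    for file in "$@"
    do
        if [ ! -f "$file" ]; then
            echo "skipping $file: not a regular file" >&2
            continue
        fi
        # Temporary file beside the original, so mv stays on one filesystem.
        tmp=$(mktemp "$file.XXXXXX") || return 1
        # No 'q' command: sed edits line 1 and passes the remaining lines through.
        if sed -E -e '1s/s([1-4])/L\1/g' -e '1s/s([5-8])/W\1/g' \
                  -e '1y/5678/1234/' "$file" > "$tmp"
        then
            mv "$tmp" "$file"
        else
            rm -f "$tmp"
            return 1
        fi
    done
}

# Example call (the glob is an assumption about how the files are named):
# rename_headers *.tsv
```

Note that this copies all 30 million lines of every file, and that the replaced file gets the permissions of the mktemp file rather than those of the original.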

#3  binlib, 12-21-2012
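I will assume you want to change the header but keep the rest of the 30 million lines. Because the new header has exactly the same length as the old one, it can be overwritten in place. The following opens file for reading on standard input and, without truncating it, for reading and writing on standard output: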

Code:
sed -Ee '1s/s([1-4])/L\1/g' -e '1s/s([5-8])/W\1/g' -e '1y/5678/1234/' -e '1q' 0<file 1<>file

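This will be very fast, because sed quits after the first line and only the header bytes are rewritten; the rest of the file is never copied. Be careful: there is no backup, and a replacement header of a different length would corrupt the file.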
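
For several files, the same in-place idea could be sketched as below. This is only an illustration under stated assumptions: the function name rename_header_inplace is made up, and the byte-count check is an added safety guard that is not part of the one-liner above. It refuses to touch a file if the rewritten header would not be exactly as long as the old one, which matters as soon as someone changes the sed expressions.

```shell
#!/bin/sh
# Overwrite the header of each file in place, but only when the new
# header has exactly as many bytes as the old one.
# rename_header_inplace is a hypothetical name chosen for this sketch.
rename_header_inplace() {
    for file in "$@"
    do
        # Command substitution strips the trailing newline of the header.
        old=$(head -n 1 "$file") || return 1
        new=$(printf '%s\n' "$old" |
              sed -E -e 's/s([1-4])/L\1/g' -e 's/s([5-8])/W\1/g' \
                     -e 'y/5678/1234/') || return 1
        old_len=$(printf '%s' "$old" | wc -c)
        new_len=$(printf '%s' "$new" | wc -c)
        if [ "$old_len" -ne "$new_len" ]; then
            echo "skipping $file: header length would change" >&2
            continue
        fi
        # 1<> opens the file read-write without truncating it, so printf
        # overwrites only the bytes of the first line.
        printf '%s\n' "$new" 1<> "$file"
    done
}

# Example call (the glob is an assumption about how the files are named):
# rename_header_inplace *.tsv
```

As with the one-liner, nothing is backed up, so it is worth trying this on a copy first.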
The Following User Says Thank You to binlib For This Useful Post:
plumb_r (12-24-2012)