Help with changing header of tsv with 30 million lines


 
# 1  
Old 12-21-2012

Hi,
My 30-million-line file has this header:
Code:
chr    start   end strand  ref_context repeat_masked   s1_smpl_context  s1_c_count  s1_ct_count s1_non_ct_count s1_m%   s1_score    s1_snp   s1_indels   s2_smpl_context s2_c_count  s2_ct_count s2_non_ct_count  s2_m%   s2_score    s2_snp  s2_indels   s3_smpl_context s3_c_count   s3_ct_count s3_non_ct_count s3_m%   s3_score    s3_snp  s3_indels...

I want to replace all instances of s1 to s4 with L1 to L4 and then all instances of s5 to s8 with W1 to W4 in the header using a shell script so I don't have to use an editor.

I realise I can use something like
Code:
sed -Ee '1s/s([1-4])/L\1/g' -e '1s/s([5-8])/W\1/g' -e '1y/5678/1234/' -e '1q' file

but what form would a shell script need to take to do this on multiple files?
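
For reference, a quick check of what the three expressions do, run on a shortened, made-up header rather than the real file (the column names below are only a sample):

```shell
# Shortened, made-up header; not the real 30-million-line file.
header='chr start s1_c_count s4_snp s5_c_count s8_snp'

# s1-s4 become L1-L4, s5-s8 become W5-W8, then y/// maps 5678 to 1234.
out=$(printf '%s\n' "$header" |
    sed -E -e '1s/s([1-4])/L\1/g' -e '1s/s([5-8])/W\1/g' -e '1y/5678/1234/')

printf '%s\n' "$out"
# prints: chr start L1_c_count L4_snp W1_c_count W4_snp
```

Note that the y command changes every 5, 6, 7 or 8 on line 1, not only the ones after a W, so this only works because the header has no other such digits.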
# 2  
Old 12-21-2012
If your sed works as expected, how about using a for loop:
Code:
for file in *
do
   # No '1q' here: quitting after line 1 would leave only the header in tmp
   sed -Ee '1s/s([1-4])/L\1/g' -e '1s/s([5-8])/W\1/g' -e '1y/5678/1234/' "$file" > tmp &&
   mv tmp "$file"
done
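
If you want it as a reusable script, here is a minimal sketch of the same loop as a function that takes the file names as arguments. The name fix_header is made up for illustration, and it assumes mktemp is available on your system (it is not part of POSIX):

```shell
# Hypothetical helper; the name fix_header is not from this thread.
fix_header() {
    for file in "$@"
    do
        # Temp file next to the original, so mv stays on the same filesystem.
        tmp=$(mktemp "${file}.XXXXXX") || return 1
        # Only line 1 is edited; every other line passes through unchanged.
        if sed -E -e '1s/s([1-4])/L\1/g' -e '1s/s([5-8])/W\1/g' -e '1y/5678/1234/' "$file" > "$tmp"
        then
            mv "$tmp" "$file"
        else
            rm -f "$tmp"
            return 1
        fi
    done
}

# Usage: fix_header sample1.tsv sample2.tsv
```

This still copies all 30 million lines once per file, and the replaced file ends up with the temp file's permissions, so check those afterwards if they matter.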

# 3  
Old 12-21-2012
I will assume you want to change the header but keep the rest of the 30 million lines. Since the new header has the same length as the old one, the following opens file for reading on stdin and for read/write, without truncation, on stdout:
Code:
sed -Ee '1s/s([1-4])/L\1/g' -e '1s/s([5-8])/W\1/g' -e '1y/5678/1234/' -e '1q' 0<file 1<>file

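This will be very fast, because sed quits after line 1 and only the header bytes are rewritten, and it replaces the file in place. Be careful: there is no backup, and if the new header were a different length, the header or the first data lines would be corrupted.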
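
If you want a safety net, here is a minimal sketch that builds the new header first and only overwrites when the length is unchanged. The name fix_header_inplace is made up for illustration, and the length check counts characters, which I am assuming equals bytes because the header is plain ASCII:

```shell
# Hypothetical helper; the name fix_header_inplace is not from this thread.
fix_header_inplace() {
    file=$1
    old=$(head -n 1 "$file") || return 1
    new=$(printf '%s\n' "$old" |
        sed -E -e 's/s([1-4])/L\1/g' -e 's/s([5-8])/W\1/g' -e 'y/5678/1234/') || return 1
    # Refuse to write if the length differs: the data lines would be damaged.
    if [ "${#old}" -ne "${#new}" ]
    then
        echo "header length would change, skipping $file" >&2
        return 1
    fi
    # 1<> opens the file read/write without truncating it,
    # so only the first line's bytes are overwritten.
    printf '%s\n' "$new" 1<> "$file"
}

# Usage: fix_header_inplace sample1.tsv
```

With these particular substitutions the length never changes, so the check only matters if you edit the expressions later.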