Help with changing header of tsv with 30 million lines

Hi,
My 30-million-line file has this header:
Code:
chr    start   end strand  ref_context repeat_masked   s1_smpl_context  s1_c_count  s1_ct_count s1_non_ct_count s1_m%   s1_score    s1_snp   s1_indels   s2_smpl_context s2_c_count  s2_ct_count s2_non_ct_count  s2_m%   s2_score    s2_snp  s2_indels   s3_smpl_context s3_c_count   s3_ct_count s3_non_ct_count s3_m%   s3_score    s3_snp  s3_indels...

I want to replace all instances of s1 through s4 with L1 through L4, and then all instances of s5 through s8 with W1 through W4, in the header, using a shell script so I don't have to open the file in an editor.

I realise I can use something like
Code:
sed -Ee '1s/s([1-4])/L\1/g' -e '1s/s([5-8])/W\1/g' -e '1y/5678/1234/' -e '1q' file

but what form would a shell script need to take to do this on multiple files?
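One minimal sketch of such a script, assuming GNU sed (for -E and the in-place -i); note that the 1q from the one-liner above has to be dropped, since it would throw away every line after the header:

Code:
#!/bin/sh
# Rewrite the header (line 1) of every file named on the command line.
# GNU sed assumed; the whole file is still copied once, which is
# unavoidable when the length of the first line changes.
for f in "$@"; do
    sed -i -E -e '1s/s([1-4])/L\1/g' \
              -e '1s/s([5-8])/W\1/g' \
              -e '1y/5678/1234/' "$f"
done

Invoked as, say, sh fixheader.sh *.tsv. The y/// step runs on line 1 only, where, assuming digits 5-8 occur only in the sN column names as in the header shown above, the only 5-8 digits left by that point are the ones following W.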
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Strip 3 header lines and 4 trailer lines

Hello friends, I want to remove 3 header lines and 4 trailer lines. I am using the following; is it correct? sed '1,3d';'4,$ d' filename (9 Replies)
Discussion started by: ganesh123
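As quoted, that command would not do what the post wants: the second expression, '4,$ d', deletes from line 4 to the end of the file. A hedged sketch of one way to drop 3 header and 4 trailer lines (head -n -4 is a GNU extension; the awk line is a portable fallback):

Code:
# Drop the first 3 lines, then drop the last 4.
sed '1,3d' filename | head -n -4

# Portable alternative: skip 3 lines, then print with a 4-line lag
# so the final 4 lines never leave the buffer.
awk 'NR > 3 { if (++n > 4) print buf[n % 4]; buf[n % 4] = $0 }' filename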

2. UNIX for Advanced & Expert Users

Changing a header in a shared library

Hello, will changing a header in a shared library under Solaris (say, adding a new class data member) result in having to recompile not only that library but also all of the libraries that depend on the one that changed, because of the change in the object's size? What about adding a virtual function?... (0 Replies)
Discussion started by: Linker

3. Shell Programming and Scripting

Tail 86000 lines from 1.2 million line file?

I have a log file that is about 1.2 million lines long and about 300 MB. We need a way to clean up this file and keep only the last few thousand lines. If I use the tail command we run out of memory, as the file is too big. I do have a keyword to match on. For example, we want to keep every line... (8 Replies)
Discussion started by: robsonde
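Hedged, since tail normally streams and should not exhaust memory: the two obvious routes are keeping a fixed line count or cutting at the keyword the post mentions (big.log and KEYWORD are placeholders here):

Code:
# Keep only the last 86000 lines.
tail -n 86000 big.log > big.log.new && mv big.log.new big.log

# Or keep everything from the first occurrence of the keyword onward.
sed -n '/KEYWORD/,$p' big.log > big.log.new && mv big.log.new big.log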

4. Shell Programming and Scripting

print lines except the header

I am using the command awk -F ";" '{if($10>80 && NR>1) print $0 }' txt_file_* to print the lines whose 10th field is greater than 80, leaving out the first line of the file, which is the header. But this is not working; the first line is still coming out in the output. Please correct me. Thanks. (2 Replies)
Discussion started by: madfox
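A hedged guess at the cause, since the snippet is cut off: with several files matching txt_file_*, NR counts lines across all of them, so NR>1 skips only the very first file's header. The per-file counter FNR skips every header, and the print $0 action can be left implicit:

Code:
# FNR restarts at 1 for each input file, so each file's header is skipped.
awk -F ';' 'FNR > 1 && $10 > 80' txt_file_*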

5. UNIX for Dummies Questions & Answers

Changing email header information by tweaking sendmail

How can I tweak the sendmail configuration files so that the "Received:" field is removed from the email header? Or else, can I change Received: (from enswitch@localhost) in the email header to something like Received: (from xyz@localhost)? (2 Replies)
Discussion started by: proactiveaditya
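A hedged pointer only, since sendmail configuration is easy to get wrong: the Received: header sendmail inserts is defined by the HReceived: line in sendmail.cf, so that line is the place to start. Hand edits are lost whenever the .cf file is regenerated from the .mc sources, and stripping Received: also defeats mail-loop detection.

Code:
# Locate the definition of the Received: header in the live config.
grep -n '^HReceived:' /etc/mail/sendmail.cf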

6. Shell Programming and Scripting

Adding header once every 5 lines

Hi, I need help in creating a report file. The input file is like this: 1 A 2 B 3 V 4 X 5 m 6 O 7 X 8 p 9 a 10 X. There is a header which I have to print, and I save the result as an output file. The header has multiple lines and is something like: New New S.No Name (15 Replies)
Discussion started by: aravindan
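A minimal sketch, assuming the two header lines quoted above are the whole header (the post is truncated, so the real header may differ):

Code:
# Emit the header before every group of 5 input lines.
awk 'NR % 5 == 1 { print "New    New"; print "S.No   Name" }
     { print }' input.txt > report.txt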

7. Shell Programming and Scripting

Matching 10 Million file records with 10 Million in other file

Dear All, I have two files, each containing 10 million records, separated by commas (CSV format). One file is input.txt, the other is status.txt. input.txt contains several fields, among them one unique id field (a primary key, we can say). status.txt contains two fields only: 1. the unique id and 2. the status... (8 Replies)
Discussion started by: vguleria
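The usual awk lookup-table join fits here; a sketch that assumes the unique id is the first comma-separated field in both files (the post does not say which field it is):

Code:
# Pass 1 (NR==FNR) loads id -> status from status.txt into memory;
# pass 2 appends the status to every input.txt line whose id matched.
awk -F ',' 'NR == FNR { status[$1] = $2; next }
            $1 in status { print $0 "," status[$1] }' status.txt input.txt

Ten million keys sit comfortably in memory on current hardware; for much larger inputs, sorting both files and using join(1) avoids holding the table in RAM.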

8. Shell Programming and Scripting

find numeric duplicates from 300 million lines....

These are numeric ids: 222932017099186177 222932014385467392 222932017371820032 222932017409556480. I have a text file with 300 million lines like the ones shown above. I want to find the duplicates in this file. Please suggest the quickest way. sort | uniq -d will... (3 Replies)
Discussion started by: pamu
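Two hedged options, the trade-off being memory against sort time:

Code:
# Single pass, no sort: print each id the second time it is seen.
# Note ~300 million distinct keys can need tens of GB of RAM.
awk 'seen[$0]++ == 1' ids.txt

# Disk-backed alternative; LC_ALL=C makes the byte-wise sort faster.
LC_ALL=C sort ids.txt | uniq -d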

9. Shell Programming and Scripting

Print header and some lines

Hi, I have to print the first header line of df -h (Filesystem Size Used Avail Use% Mounted on) and only the lines which contain a network path. Filesystem Size Used Avail Use% Mounted on /test/sda3 35G 1.8G 32G 6% / /test/sda10 7.8G 1.1G ... (3 Replies)
Discussion started by: netdbaind
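A sketch, with the caveat that "network path" is a guess here: NFS-style mount sources contain a colon (host:/share), so matching on that in the first field gives:

Code:
# Keep the df header plus mounts whose source looks like host:/share.
df -h | awk 'NR == 1 || $1 ~ /:/'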

10. Shell Programming and Scripting

Find header in a text file and prepend it to all lines until another header is found

I've been struggling with this one for quite a while and cannot seem to find a solution for this find/replace scenario. Perhaps I'm getting rusty. I have a file that contains a number of metrics (exactly 3 fields per line) from a few appliances that are collected in parallel. To identify the... (3 Replies)
Discussion started by: verdepollo
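The snippet cuts off before showing what a header line looks like, so this sketch assumes headers start with '#'; that marker is pure guesswork and should be replaced with whatever really distinguishes them in the file:

Code:
# Remember the most recent header and prepend it to every data line.
awk '/^#/ { hdr = $0; next } { print hdr, $0 }' metrics.txt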