Removing letters after a certain character within a range of columns


 
# 1  
Old 01-03-2017

Hi there,

I am trying to remove all letters after the : character in specific columns, from the 10th column through the 827th. I used sed and cut to do so, but I am sure someone in the Unix community can think of a better one-liner.


The file is huge, but it has this structure (827 columns, 605278 rows):
Code:
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Control1    Control2    Case1    Case2
chr1    65872    .    T    G    2480.51    .    AC=65;AN=92;SF=0,1;VRT=1    GT:GQ:DP:AD:PL    1/1:3:243:176,66:27,3,0    0/1:21:148:135,13:21,0,21    0/1:9:250:201,49:9,0,115    .
chr1    65893    .    G    A    433.77    .    AC=7;AN=10;SF=0,1;VRT=1        .    .    0/1:173:144,29:198,0,91:91    0/1:143:100,43:180,0,233:99

I am trying to get this structure
Code:
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Control1    Control2    Case1    Case2
chr1    65872    .    T    G    2480.51    .    AC=65;AN=92;SF=0,1;VRT=1     GT:GQ:DP:AD:PL    1/1     0/1    0/1    .
chr1    65893    .    G    A    433.77    .    AC=7;AN=10;SF=0,1;VRT=1         .    .    0/1     0/1

Thanks in advance.
# 2  
Old 01-03-2017
Welcome to the forum!

Can you please show what you have tried so far? Thanks.

---------- Post updated at 02:37 PM ---------- Previous update was at 02:32 PM ----------

Here is a rough sample

Code:
>cat inputfile
a-b-c-d
e-f-g-h

# Print the first two "-"-separated fields of each line
awk -F"-" '{ for (i=1; i<=2; i++) printf "%s%s", $i, (i<2 ? "-" : "\n") }' inputfile

# 3  
Old 01-03-2017
Thanks for the quick reply,

I did
Code:
 sed 's/:.*//' file.txt > file1.txt

However, this kept only the first of those columns; everything after it on the line came out empty.

Another bad idea was to cut the first 3 characters from file.txt into file1.txt and then re-join the desired columns from file.txt to make file3.txt.

It can be done, but it is a very clumsy solution on my part. I thought of using awk to find ":" and replace it with "" in columns 10 through 827, but I didn't know how to do so.
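
The sed result described above follows from the pattern itself: `.*` is greedy, so `:.*` matches from the first colon on the line to the end of the line, not to the end of the field. A minimal sketch on a shortened, made-up row (the tab separator and the sample values are assumptions, not taken from the real file):

```shell
# One tab-separated sample row (shortened; values are illustrative only)
row=$(printf 'chr1\tGT:GQ\t1/1:3:243\t0/1:21:148')

# ".*" is greedy: it matches from the first ":" to the end of the line
greedy=$(printf '%s\n' "$row" | sed 's/:.*//')

# Stopping at whitespace trims each field separately, but in EVERY column
per_field=$(printf '%s\n' "$row" | sed 's/:[^[:space:]]*//g')

printf '%s\n%s\n' "$greedy" "$per_field"
```

The second form still cannot limit itself to columns 10 and up (it also shortens the FORMAT column), which is why a field-aware tool such as awk is the more natural fit here.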
# 4  
Old 01-03-2017
Please also explain what you mean by "remove all letters after : character". The sample data you provided did not contain any letters in the fields you specified and instead of removing letters after a <colon> character, you removed all punctuation and decimal digits from those fields after the 1st <colon> character and also removed that <colon> character.

Your input field separator seems to be a sequence of four <space> characters. Your output field separator seems to be a sequence of four or five <space> characters (although I didn't check all of the output sequences). Is there any reason why a single <space> is insufficient as a field separator in your output?
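
One way to settle the separator question is to count the fields per line under a candidate separator; if every line reports the same count, the guess is probably right. A rough sketch, where `field_counts` is a hypothetical helper name and the tab separator is only an assumption about the real file:

```shell
# Hypothetical helper, not from the thread.
# $1 = field separator as awk understands it, $2 = input file
field_counts() {
    # Print NF for every line, then reduce to the distinct values
    awk -F"$1" '{ print NF }' "$2" | sort -nu
}

# Small demo on made-up data: two lines have 3 fields, one has 2
sample=$(mktemp)
printf 'a\tb\tc\n1\t\t3\nx\ty\n' > "$sample"
field_counts '\t' "$sample"
rm -f "$sample"
```

On the real data this would be called as `field_counts '\t' file.txt`; a single number in the output would suggest that all rows agree on the column count.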
# 5  
Old 01-03-2017
Assuming "letters" stands for "characters", and interpreting your desired output like "remove everything in a field after the first colon", and using <TAB> as the output field separator, would this come close to what you need:
Code:
awk 'NR > 1 {for (i=1; i<=NF; i++) sub (/:.*$/, _, $i)} 1' OFS="\t" file
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Control1    Control2    Case1    Case2
chr1	65872	.	T	G	2480.51	.	AC=65;AN=92;SF=0,1;VRT=1	GT	1/1	0/1	0/1	.
chr1	65893	.	G	A	433.77	.	AC=7;AN=10;SF=0,1;VRT=1	.	.	0/1	0/1

EDIT: I made a typo: the loop should start at the 10th field only:
Code:
awk 'NR > 1 {for (i=10; i<=NF; i++) sub (/:.*$/, _, $i)} 1' OFS="\t" file
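
For reference, the same idea can be wrapped so the starting column is a parameter. This is only a sketch: `trim_after_colon` is a hypothetical name, and it assumes the input is tab-separated (the one-liner above relies on awk's default whitespace splitting instead). It writes `""` where the one-liner uses `_`, an uninitialized variable that awk treats as the empty string.

```shell
# Hypothetical helper, not from the thread.
# $1 = first column to trim, $2 = tab-separated input file
trim_after_colon() {
    awk -F'\t' -v OFS='\t' -v first="$1" '
        # Leave the header (line 1) alone; trim fields first..NF elsewhere
        NR > 1 { for (i = first; i <= NF; i++) sub(/:.*$/, "", $i) }
        1   # print every line, modified or not
    ' "$2"
}

# Small demo on made-up data: trim from column 3 onward
sample=$(mktemp)
printf 'ID\tFORMAT\tS1\tS2\n' > "$sample"
printf 'v1\tGT:GQ\t1/1:3:243\t.\n' >> "$sample"
trim_after_colon 3 "$sample"
rm -f "$sample"
```

For the file in this thread the call would be `trim_after_colon 10 file`. Splitting on tabs explicitly keeps empty fields in place, which matters if some rows have a blank column.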


Last edited by RudiC; 01-03-2017 at 12:16 PM.. Reason: corrected typo...
# 6  
Old 01-03-2017
Thanks Don Cragun and RudiC,

I apologize for not explaining this properly; it is exactly as RudiC described.
RudiC, your code worked as expected. Is there a way to ignore certain columns, e.g. #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT (columns 1-9)? Thanks again for your code and explanation.

# 7  
Old 01-03-2017
I'm glad it helped and my assumptions / interpretations were adequate. But your new request again is ambiguous: Modify the header? Remove ALL columns 1 - 9 in ALL rows? Remove a selection of columns?
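
To illustrate just one of these readings (dropping columns 1 - 9 from every row, header included), a sketch with cut might look like the following. It assumes tab-separated input, and `keep_from_column` is a hypothetical helper name; whether this is what was meant is exactly the open question.

```shell
# Hypothetical helper, not from the thread.
# $1 = first column to keep, $2 = tab-separated input file
keep_from_column() {
    # cut splits on tabs by default; "N-" selects field N through the last
    cut -f "$1-" "$2"
}

# Small demo on made-up data: keep columns 2 and up
sample=$(mktemp)
printf 'a\tb\tc\n1\t2\t3\n' > "$sample"
keep_from_column 2 "$sample"
rm -f "$sample"
```

For the file in this thread that would be `keep_from_column 10 file`, and its output could then be piped through the awk command with the loop starting at field 1.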