Delete parts of a string of characters in one given column of a tab-delimited file


 
# 1  
Old 03-02-2009

I would like to remove characters from column 7, so that an input file that looks like this:


>HWI-EAS422_12:4:1:69:89 GGTTTAAATATTGCACAAAAGGTATAGAGCGT U0 1 0 0 ref_chr8.fa 6527777 F DD

becomes something like this in the output file:

>HWI-EAS422_12:4:1:69:89 GGTTTAAATATTGCACAAAAGGTATAGAGCGT U0 1 0 0 chr8 6527777 F DD


That is, in column 7, "ref_chr8.fa" becomes just "chr8".

Note: on some lines, "chr" is followed by a letter or by a two-digit number instead of a single digit, e.g. "ref_chrY.fa" should become "chrY" and "ref_chr10.fa" should become "chr10".

Thanks in advance for your help!
# 2  
Old 03-02-2009
Code:
echo 'HWI-EAS422_12:4:1:69:89 GGTTTAAATATTGCACAAAAGGTATAGAGCGT U0 1 0 0 ref_chr8.fa 6527777 F DD' | nawk '{n=split($7, a, "[_.]"); $7=a[2]}1'

# 3  
Old 03-02-2009

Code:
awk '{ sub(/.*_/,"",$7); sub(/\..*/,"",$7); print }' FILE

# 4  
Old 03-02-2009
Code:
echo 'HWI-EAS422_12:4:1:69:89 GGTTTAAATATTGCACAAAAGGTATAGAGCGT U0 1 0 0 ref_chr8.fa 6527777 F DD' | perl -pe 's/ref_(chr\w+)\.fa/$1/'

# 5  
Old 03-02-2009
Thanks for all your suggestions.
vgersh99 and ShawnMilo, I forgot to mention that the rest of the line is different for every line in my file.

cfajohnson, your suggestion is good, but I am losing the tab delimiters on the lines that have been modified, and I need them for the rest of my process.
Here is what happens when I apply your script.

My file originally looks like this:
>HWI-EAS422_12:4:1:69:89 GGTTTAAATATTGCACAAAAGGTATAGAGCGT U0 1 0 0 ref_chr8.fa 6527777 F DD
>HWI-EAS422_12:4:1:1296:114 GAGATTGATCTTAAGCCTTTGGCACAGTTAAC U0 1 0 0 ref_chr12.fa 4777762 R DD
>HWI-EAS422_12:4:1:223:1514 GAATGATGTTGTTTGCTTAGACATGATTTTGT NM 0 0 0
>HWI-EAS422_12:4:1:1150:122 GAGCTTACATTGGACTATGAAAGAGGACAATT U0 1 0 0 ref_chr16.fa 30593383 F DD
>HWI-EAS422_12:4:1:190:83 GGTTTATCAAATACTCTGAAAATAAAATGGGC R0 19 2 0
>HWI-EAS422_12:4:1:151:1463 GATCTGGGACCCTTAATTTTTGGGAATCTGTT U1 0 1 0 ref_chr17.fa 52460364 R DD 16T
>HWI-EAS422_12:4:1:567:228 GATTTAACCGAAGATGATTTCGATTTTCTGAC NM 0 0 0
>HWI-EAS422_12:4:1:954:124 GATATGTATACCAGTGGAAGACAATGGAGAAT U0 1 0 0 ref_chr10.fa 57535899 F DD
>HWI-EAS422_12:4:1:193:486 GCACAGAGAGAGACAAAGGTGCCAACCTTGCT U0 1 0 0 ref_chr22.fa 32814752 R DD
>HWI-EAS422_12:4:1:621:157 GTCGAGCTTCTGGCCATCGGCATCGGCCATGA NM 0 0 0


and it becomes:

>HWI-EAS422_12:4:1:69:89 GGTTTAAATATTGCACAAAAGGTATAGAGCGT U0 1 0 0 chr8 6527777 F DD
>HWI-EAS422_12:4:1:1296:114 GAGATTGATCTTAAGCCTTTGGCACAGTTAAC U0 1 0 0 chr12 4777762 R DD
>HWI-EAS422_12:4:1:223:1514 GAATGATGTTGTTTGCTTAGACATGATTTTGT NM 0 0 0
>HWI-EAS422_12:4:1:1150:122 GAGCTTACATTGGACTATGAAAGAGGACAATT U0 1 0 0 chr16 30593383 F DD
>HWI-EAS422_12:4:1:190:83 GGTTTATCAAATACTCTGAAAATAAAATGGGC R0 19 2 0
>HWI-EAS422_12:4:1:151:1463 GATCTGGGACCCTTAATTTTTGGGAATCTGTT U1 0 1 0 chr17 52460364 R DD 16T
>HWI-EAS422_12:4:1:567:228 GATTTAACCGAAGATGATTTCGATTTTCTGAC NM 0 0 0
>HWI-EAS422_12:4:1:954:124 GATATGTATACCAGTGGAAGACAATGGAGAAT U0 1 0 0 chr10 57535899 F DD
>HWI-EAS422_12:4:1:193:486 GCACAGAGAGAGACAAAGGTGCCAACCTTGCT U0 1 0 0 chr22 32814752 R DD
>HWI-EAS422_12:4:1:621:157 GTCGAGCTTCTGGCCATCGGCATCGGCCATGA NM 0 0 0

How can I resolve this issue?
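
A possible fix, sketched under the assumption that the file really is tab-delimited: by default awk splits on any whitespace and rebuilds a modified line with single spaces, so setting both FS and OFS to a tab should keep the delimiters. The file names sample.txt and sample.out are placeholders, and the NF >= 7 guard is an addition here so that the shorter "NM" lines are passed through untouched.

```shell
# Build a two-line, tab-delimited sample from lines quoted in this thread.
# sample.txt / sample.out are hypothetical names used only for this sketch.
printf '>HWI-EAS422_12:4:1:69:89\tGGTTTAAATATTGCACAAAAGGTATAGAGCGT\tU0\t1\t0\t0\tref_chr8.fa\t6527777\tF\tDD\n' > sample.txt
printf '>HWI-EAS422_12:4:1:223:1514\tGAATGATGTTGTTTGCTTAGACATGATTTTGT\tNM\t0\t0\t0\n' >> sample.txt

# FS=OFS="\t": split on tabs and rejoin with tabs when a field is changed.
# NF >= 7: only touch lines that actually have a 7th column.
awk 'BEGIN { FS = OFS = "\t" }
     NF >= 7 { sub(/^ref_/, "", $7); sub(/\.fa$/, "", $7) }
     { print }' sample.txt > sample.out
```

Running `cut -f7 sample.out` afterwards is a quick way to check that column 7 is still reachable by tab position.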
# 6  
Old 03-02-2009
Quote:
Originally Posted by matlavmac
Thanks for all your suggestions.
vgersh99 and ShawnMilo, I forgot to mention that the rest of the line is different for every line in my file.

The Perl one-liner I posted ignores the rest of the line, so it shouldn't make a difference. Did you try it? Maybe I'm misunderstanding your requirements.
# 7  
Old 03-02-2009
Yes, I need this transformation applied to every line, though. I will try it.
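
For running a substitution of this kind over a whole file rather than one echoed line, here is a minimal sketch using sed in place of Perl (a swapped-in tool, not the one-liner posted above). Like that one-liner, it edits the matching text where it finds it and never splits the line into fields, so tabs are left alone. It is not anchored to column 7, so it assumes "ref_chr….fa" appears nowhere else on a line. The file names reads.txt and reads.out are placeholders.

```shell
# Two tab-delimited sample lines taken from the thread; file names are hypothetical.
printf '>HWI-EAS422_12:4:1:954:124\tGATATGTATACCAGTGGAAGACAATGGAGAAT\tU0\t1\t0\t0\tref_chr10.fa\t57535899\tF\tDD\n' > reads.txt
printf '>HWI-EAS422_12:4:1:567:228\tGATTTAACCGAAGATGATTTCGATTTTCTGAC\tNM\t0\t0\t0\n' >> reads.txt

# Replace the first "ref_chrXX.fa" on each line with "chrXX";
# lines without a match are printed unchanged.
sed 's/ref_\(chr[0-9A-Za-z]*\)\.fa/\1/' reads.txt > reads.out
```

The character class `[0-9A-Za-z]*` is meant to cover the cases mentioned in the first post (chr8, chr10, chrY).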