find and replace in first column


 
# 1  
Old 04-30-2010

Dear All,
I need help with finding and replacing a string.

In the following example, a file named <filename.gff> has 9 columns:
Code:
1       ensembl chromosome      1       300239041       .       .       .       ID=1;Name=chromosome:AGPv1:1:1:300239041:1
1       ensembl exon    3       104     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07
1       ensembl gene    3       3807    .       +       .       ID=GRMZM2G060082;Name=GRMZM2G060082;biotype=protein_coding
1       ensembl mRNA    3       3807    .       +       .       ID=GRMZM2G060082_T01;Parent=GRMZM2G060082;Name=GRMZM2G060082_T01;biotype=protein_coding
1       ensembl intron  105     199     .       +       .       Parent=GRMZM2G060082_T01
3       ensembl exon    200     313     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E06
3       ensembl CDS     230     313     .       +       .       Parent=GRMZM2G060082_T01;Name=CDS.98360
3       ensembl intron  314     421     .       +       .       Parent=GRMZM2G060082_T01
3       ensembl CDS     422     604     .       +       0       Parent=GRMZM2G060082_T01;Name=CDS.98361
3       ensembl exon    422     604     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E05



I need to convert every "1" in column one to "chr1" and every "3" in column one to "chr3".
I used the command
Code:
sed 's/1/chr1/' <filename.gff>

to change the first occurrence of 1 on each line to chr1, and this worked dandy for the lines that start with 1. The same command will not work for "3", since "3" also occurs in the 4th field of line 2 and in some other places. How can I find and replace 3 with chr3 only in the first field (column)?

thanks much
Siva
# 2  
Old 04-30-2010
Code:
xxx@yyyy:~/test> cat filname.gff
1       ensembl chromosome      1       300239041       .       .       .       ID=1;Name=chromosome:AGPv1:1:1:300239041:1
1       ensembl exon    3       104     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07
1       ensembl gene    3       3807    .       +       .       ID=GRMZM2G060082;Name=GRMZM2G060082;biotype=protein_coding
1       ensembl mRNA    3       3807    .       +       .       ID=GRMZM2G060082_T01;Parent=GRMZM2G060082;Name=GRMZM2G060082_T01;biotype=protein_coding
1       ensembl intron  105     199     .       +       .       Parent=GRMZM2G060082_T01
3       ensembl exon    200     313     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E06
3       ensembl CDS     230     313     .       +       .       Parent=GRMZM2G060082_T01;Name=CDS.98360
3       ensembl intron  314     421     .       +       .       Parent=GRMZM2G060082_T01
3       ensembl CDS     422     604     .       +       0       Parent=GRMZM2G060082_T01;Name=CDS.98361
3       ensembl exon    422     604     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E05

If you only want to change the first column, anchor the pattern to the start of the line:
Code:
xxx@yyyy:~/test> sed 's/^\([13]\)/chr\1/' filname.gff
chr1       ensembl chromosome      1       300239041       .       .       .       ID=1;Name=chromosome:AGPv1:1:1:300239041:1
chr1       ensembl exon    3       104     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E07
chr1       ensembl gene    3       3807    .       +       .       ID=GRMZM2G060082;Name=GRMZM2G060082;biotype=protein_coding
chr1       ensembl mRNA    3       3807    .       +       .       ID=GRMZM2G060082_T01;Parent=GRMZM2G060082;Name=GRMZM2G060082_T01;biotype=protein_coding
chr1       ensembl intron  105     199     .       +       .       Parent=GRMZM2G060082_T01
chr3       ensembl exon    200     313     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E06
chr3       ensembl CDS     230     313     .       +       .       Parent=GRMZM2G060082_T01;Name=CDS.98360
chr3       ensembl intron  314     421     .       +       .       Parent=GRMZM2G060082_T01
chr3       ensembl CDS     422     604     .       +       0       Parent=GRMZM2G060082_T01;Name=CDS.98361
chr3       ensembl exon    422     604     .       +       .       Parent=GRMZM2G060082_T01;Name=GRMZM2G060082_E05
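
For reference, a small self-contained sketch of why the anchored pattern leaves the other columns alone. The two sample lines and the function name add_chr_prefix are made up for illustration, and the fields are assumed to be tab-separated as is usual for GFF.

```shell
# Hypothetical helper: prefix a leading 1 or 3 with "chr".
# ^ anchors the match to the start of the line, \([13]\) captures the digit,
# and \1 writes the captured digit back after "chr".
add_chr_prefix() {
    sed 's/^\([13]\)/chr\1/'
}

# Two made-up, tab-separated lines. The 3 and 313 in later columns
# are not at the start of the line, so they are left untouched.
printf '1\tensembl\texon\t3\t104\n3\tensembl\texon\t200\t313\n' | add_chr_prefix
```

Because sed 's/…/…/' without the g flag replaces at most one match per line, and ^ can only match at position 0, nothing after the first character is ever considered.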


Last edited by n.wawerek; 04-30-2010 at 06:29 AM.. Reason: failed to read correct the first time
# 3  
Old 05-01-2010
Hello
Thanks, I am reading up a bit on regular expressions and patterns. Hope to get better at this.

Siva
# 4  
Old 05-01-2010
Or:
Code:
sed 's/^[13]/chr&/'
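
In the replacement, & stands for whatever the pattern matched, so this is the same substitution without the capture group. Both sed versions only test the first character of the line, so a line starting with 10 or 31 would get the prefix as well. If the prefix should go only on a first column that is exactly 1 or 3, a field-based awk version is one option. This is a sketch that swaps sed for awk and assumes tab-separated columns; the function name prefix_first_column and the sample lines are invented for the example.

```shell
# Hypothetical helper: prefix column 1 with "chr" only when it is exactly 1 or 3.
# Assumes tab-separated input; setting FS and OFS keeps the tabs on output.
prefix_first_column() {
    awk 'BEGIN { FS = OFS = "\t" }
         $1 == "1" || $1 == "3" { $1 = "chr" $1 }
         { print }'
}

# Made-up sample: the line whose first column is 10 is printed unchanged.
printf '1\tensembl\tgene\t3\t3807\n3\tensembl\tCDS\t230\t313\n10\tensembl\texon\t1\t3\n' | prefix_first_column
```

If the file turns out to be space-separated rather than tab-separated, the FS/OFS assignment would have to be changed accordingly.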
