Remove the first character from the fourth column only if the column has four characters


 
# 1  
Old 08-13-2013

I have a file as follows:

Code:
ATOM   5181  N  AMET K 406      12.440   6.552  25.691  0.50  7.37           N   
ATOM   5182  CA AMET K 406      13.685   5.798  25.578  0.50  5.87           C   
ATOM   5183  C  AMET K 406      14.045   5.179  26.909  0.50  5.07           C   
ATOM   5184  O   MET K 406      14.595   4.083  27.003  0.50  7.07           O   
ATOM   5185  CB  MET K 406      14.812   6.674  25.044  0.50  6.80           C   
ATOM   5191  C  BMET K 406      14.044   5.177  26.910  0.50  5.15           C   
ATOM   5192  O  BMET K 406      14.589   4.078  27.004  0.50  7.09           O  
ATOM   5197  N   ALA K 407      13.718   5.884  27.972  1.00  5.30           N   
ATOM   5198  CA  ALA K 407      14.077   5.408  29.309  1.00  6.16           C 
ATOM   5202  N  AARG K 408      12.186   3.982  29.147  0.50  6.55           N   
ATOM   5203  CA AARG K 408      11.407   2.745  29.387  0.50  7.31           C

I would like to remove the first character from the fourth column, but only if that column has four characters. The file needs to be edited in place.

Desired output:

Code:
ATOM   5181  N   MET K 406      12.440   6.552  25.691  0.50  7.37           N   
ATOM   5182  CA  MET K 406      13.685   5.798  25.578  0.50  5.87           C   
ATOM   5183  C   MET K 406      14.045   5.179  26.909  0.50  5.07           C   
ATOM   5184  O   MET K 406      14.595   4.083  27.003  0.50  7.07           O   
ATOM   5185  CB  MET K 406      14.812   6.674  25.044  0.50  6.80           C   
ATOM   5191  C   MET K 406      14.044   5.177  26.910  0.50  5.15           C   
ATOM   5192  O   MET K 406      14.589   4.078  27.004  0.50  7.09           O  
ATOM   5197  N   ALA K 407      13.718   5.884  27.972  1.00  5.30           N   
ATOM   5198  CA  ALA K 407      14.077   5.408  29.309  1.00  6.16           C 
ATOM   5202  N   ARG K 408      12.186   3.982  29.147  0.50  6.55           N   
ATOM   5203  CA  ARG K 408      11.407   2.745  29.387  0.50  7.31           C

# 2  
Old 08-13-2013
You could use substr in awk:

Code:
awk '{ print substr($0,1,16) " " substr($0,18) }' infile
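
A conditional variant, as a rough sketch: the command above blanks column 17 on every line, which is fine when the layout is fixed. If you want to touch only the lines whose fourth whitespace-separated field is four characters long, something like the following might do. The temporary directory and the file name sample.pdb are made up for the demo, and it still assumes the extra character always sits in column 17.

```shell
# Sketch only: sample.pdb is a hypothetical file holding two lines from the post.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.pdb" <<'EOF'
ATOM   5181  N  AMET K 406      12.440   6.552  25.691  0.50  7.37           N
ATOM   5184  O   MET K 406      14.595   4.083  27.003  0.50  7.07           O
EOF

# Blank column 17 only when the fourth field is four characters long;
# every line is printed, changed or not.
result=$(awk 'length($4) == 4 { $0 = substr($0, 1, 16) " " substr($0, 18) } { print }' "$tmpdir/sample.pdb")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Replacing the character with a space, rather than deleting it, keeps the remaining columns where they were.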

# 3  
Old 08-13-2013
Thanks for your answer. I need in-place editing. Is it possible?
# 4  
Old 08-13-2013
Hello,

Could you please try the following and let me know if this helps you.


The first command is as follows.

Code:
sed 's/.MET/MET/g; s/.ARG/ARG/g' remove_char_4th_column

The output will be as follows.

Code:
ATOM   5181  N  MET K 406      12.440   6.552  25.691  0.50  7.37           N
ATOM   5182  CA MET K 406      13.685   5.798  25.578  0.50  5.87           C
ATOM   5183  C  MET K 406      14.045   5.179  26.909  0.50  5.07           C
ATOM   5184  O  MET K 406      14.595   4.083  27.003  0.50  7.07           O
ATOM   5185  CB MET K 406      14.812   6.674  25.044  0.50  6.80           C
ATOM   5191  C  MET K 406      14.044   5.177  26.910  0.50  5.15           C
ATOM   5192  O  MET K 406      14.589   4.078  27.004  0.50  7.09           O
ATOM   5197  N   ALA K 407      13.718   5.884  27.972  1.00  5.30           N
ATOM   5198  CA  ALA K 407      14.077   5.408  29.309  1.00  6.16           C
ATOM   5202  N  ARG K 408      12.186   3.982  29.147  0.50  6.55           N
ATOM   5203  CA ARG K 408      11.407   2.745  29.387  0.50  7.31           C


The second command is as follows.

Code:
sed 's/AMET/MET/g; s/BMET/MET/g; s/AARG/ARG/g' remove_char_4th_column


The output will be as follows.


Code:
ATOM   5181  N  MET K 406      12.440   6.552  25.691  0.50  7.37           N
ATOM   5182  CA MET K 406      13.685   5.798  25.578  0.50  5.87           C
ATOM   5183  C  MET K 406      14.045   5.179  26.909  0.50  5.07           C
ATOM   5184  O   MET K 406      14.595   4.083  27.003  0.50  7.07           O
ATOM   5185  CB  MET K 406      14.812   6.674  25.044  0.50  6.80           C
ATOM   5191  C  MET K 406      14.044   5.177  26.910  0.50  5.15           C
ATOM   5192  O  MET K 406      14.589   4.078  27.004  0.50  7.09           O
ATOM   5197  N   ALA K 407      13.718   5.884  27.972  1.00  5.30           N
ATOM   5198  CA  ALA K 407      14.077   5.408  29.309  1.00  6.16           C
ATOM   5202  N  ARG K 408      12.186   3.982  29.147  0.50  6.55           N
ATOM   5203  CA ARG K 408      11.407   2.745  29.387  0.50  7.31           C


Here, the input you provided is in a file named remove_char_4th_column.



Thanks,
R. Singh
# 5  
Old 08-13-2013
Hi Ravinder singh,

Thank you for your answer. I need in-place editing because I have a lot of files like this. In the given example, the strings are AMET, BMET, AARG and ALA. In other files, the strings are different, so I think your code would be difficult for me to use on multiple files.
# 6  
Old 08-13-2013
If the format is fixed, the solution from Chubler should work.
To edit the file, do:
Code:
awk '{ print substr($0,1,16) " " substr($0,18) }' orgfile > newfile ; mv newfile orgfile

This replaces the file, much as sed -i does. I am not sure whether it can be done any other way.
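
Since there are many files, the same temp-file-and-move step could be wrapped in a loop. A minimal sketch, assuming the files sit in one directory and match *.pdb; the demo directory, the file names, and the shortened sample lines are only there so the example runs on its own.

```shell
# Sketch only: the demo directory and the *.pdb names are assumptions, not from the thread.
workdir=$(mktemp -d)
printf '%s\n' 'ATOM   5181  N  AMET K 406      12.440' > "$workdir/a.pdb"
printf '%s\n' 'ATOM   5192  O  BMET K 406      14.589' > "$workdir/b.pdb"

for f in "$workdir"/*.pdb; do
    # Write to a temporary file first; replace the original only if awk succeeded.
    awk '{ print substr($0, 1, 16) " " substr($0, 18) }' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat "$workdir"/*.pdb
```

Using && instead of ; means a failed awk run leaves the original file untouched.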
# 7  
Old 08-13-2013
For an in-place change, try this. It works for the given pattern.
Code:
sed -i 's/\(.* \).*\(... [A-Z] [0-9].*\)/\1\2/g' infile

Code:
-bash-3.2$ sed 's/\(.* \).*\(... [A-Z] [0-9].*\)/\1\2/g' infile
ATOM   5181  N  MET K 406      12.440   6.552  25.691  0.50  7.37           N
ATOM   5182  CA MET K 406      13.685   5.798  25.578  0.50  5.87           C
ATOM   5183  C  MET K 406      14.045   5.179  26.909  0.50  5.07           C
ATOM   5184  O   MET K 406      14.595   4.083  27.003  0.50  7.07           O
ATOM   5185  CB  MET K 406      14.812   6.674  25.044  0.50  6.80           C
ATOM   5191  C  MET K 406      14.044   5.177  26.910  0.50  5.15           C
ATOM   5192  O  MET K 406      14.589   4.078  27.004  0.50  7.09           O
ATOM   5197  N   ALA K 407      13.718   5.884  27.972  1.00  5.30           N
ATOM   5198  CA  ALA K 407      14.077   5.408  29.309  1.00  6.16           C
ATOM   5202  N  ARG K 408      12.186   3.982  29.147  0.50  6.55           N
ATOM   5203  CA ARG K 408      11.407   2.745  29.387  0.50  7.31           C

The pattern assumes there will be a single letter (here it is K) followed by a space and a number.

--ahamed

Last edited by ahamed101; 08-13-2013 at 04:27 AM..
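
Note that the sed command above deletes the extra character, so the affected lines end up one column shorter than the desired output. A possible variant that overwrites it with a space instead is sketched below. It assumes the character is an uppercase letter in column 17 followed directly by a three-letter uppercase residue name; the sample lines are shortened from the post.

```shell
# Sketch only: two shortened sample lines from the post.
input='ATOM   5182  CA AMET K 406      13.685
ATOM   5185  CB  MET K 406      14.812'

# Overwrite column 17 with a space when it is a letter directly followed by
# a three-letter residue name; lines without such a letter are left alone.
fixed=$(printf '%s\n' "$input" | sed 's/^\(.\{16\}\)[A-Z]\([A-Z][A-Z][A-Z] \)/\1 \2/')
printf '%s\n' "$fixed"
```

With GNU sed, adding -i and a file name in place of the pipe would apply the same substitution in place.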
