Search field in text file and replace value


 
# 1  
02-13-2013

Hi there,

First of all, this is my first post here. Thank you in advance for your help.

Here is what I am trying to do. I have a text file in which the fields of each row are separated by tabs.

It looks like this:
Code:
ATOM      1  N   HSE A  26       3.033 -10.429  -2.262  1.00 17.07           N1+
ATOM      2  CA  HSE A  26       3.226 -11.674  -3.040  1.00 14.73           C  
ATOM      3  CB  HSE A  26       4.705 -11.978  -3.127  1.00 15.52           C  
ATOM      4  CG  HSE A  26       5.055 -13.031  -4.057  1.00 15.51           C  
ATOM      5  ND1 HSE A  26       4.959 -14.364  -3.715  1.00 15.39           N  
ATOM      6  CE1 HSE A  26       5.349 -15.091  -4.746  1.00 17.55           C  
ATOM      7  NE2 HSE A  26       5.765 -14.285  -5.726  1.00 21.97           N  
ATOM      8  CD2 HSE A  26       5.577 -12.980  -5.296  1.00 18.48           C  
ATOM      9  C   HSE A  26       2.538 -12.795  -2.235  1.00 13.15           C  
ATOM     10  O   HSE A  26       2.537 -12.755  -1.031  1.00 13.11           O  
ATOM     11  H1  HSE A  26       3.422 -10.546  -1.337  1.00 17.07           H  
ATOM     12  H2  HSE A  26       2.046 -10.227  -2.189  1.00 17.07           H  
ATOM     13  H3  HSE A  26       3.499  -9.664  -2.729  1.00 17.07           H  
ATOM     14  HA  HSE A  26       2.818 -11.585  -4.047  1.00 14.73           H  
ATOM     15  HB2 HSE A  26       5.049 -12.273  -2.136  1.00 15.52           H  
ATOM     16  HB3 HSE A  26       5.221 -11.068  -3.435  1.00 15.52           H  
ATOM     17  HD2 HSE A  26       5.808 -12.085  -5.855  1.00 18.48           H  
ATOM     18  HE2 HSE A  26       6.146 -14.573  -6.616  1.00 21.97           H  
ATOM     19  HE1 HSE A  26       5.334 -16.170  -4.790  1.00 17.55           H  
ATOM     20  N   PRO A  27       1.965 -13.801  -2.950  1.00 14.19           N  
ATOM     21  CA  PRO A  27       1.227 -14.887  -2.217  1.00 14.75           C  
ATOM     22  CB  PRO A  27       0.797 -15.859  -3.316  1.00 17.54           C  
ATOM     23  CG  PRO A  27       0.763 -15.036  -4.490  1.00 19.69           C  
ATOM     24  CD  PRO A  27       1.755 -13.904  -4.376  1.00 16.62           C  
ATOM     25  C   PRO A  27       2.086 -15.623  -1.216  1.00 13.14           C  
ATOM     26  O   PRO A  27       1.601 -16.109  -0.212  1.00 13.57           O  
ATOM     27  HA  PRO A  27       0.404 -14.463  -1.642  1.00 14.75           H  
ATOM     28  HB2 PRO A  27      -0.187 -16.278  -3.104  1.00 17.54           H  
ATOM     29  HB3 PRO A  27       1.520 -16.668  -3.427  1.00 17.54           H  
ATOM     30  HG2 PRO A  27      -0.239 -14.623  -4.609  1.00 19.69           H  
ATOM     31  HG3 PRO A  27       1.011 -15.643  -5.360  1.00 19.69           H  
ATOM     32  HD2 PRO A  27       2.684 -14.143  -4.893  1.00 16.62           H  
ATOM     33  HD3 PRO A  27       1.343 -12.979  -4.779  1.00 16.62           H

First, I find the last row that starts with ATOM and store the value of its 6th field plus one:
Code:
last=$(( $(grep '^ATOM' test.pdb | tail -n1 | awk '{ print $6 }') + 1 ))

Then I want to search for a value I define in the 6th column and replace it with another value, on every row up to the last row that starts with ATOM. Can AWK or SED be used to target exactly that field in the row? I am new to shell scripting, so sorry if the question is trivial.

Thanks for help,

Max

---------- Post updated at 06:41 PM ---------- Previous update was at 06:18 PM ----------

Sorry, the fields are not separated by tabs. The file uses the fixed-width PDB ATOM record format:

Code:
COLUMNS        DATA  TYPE    FIELD        DEFINITION
-------------------------------------------------------------------------------------
 1 -  6        Record name   "ATOM  "
 7 - 11        Integer       serial       Atom  serial number.
13 - 16        Atom          name         Atom name.
17             Character     altLoc       Alternate location indicator.
18 - 20        Residue name  resName      Residue name.
22             Character     chainID      Chain identifier.
23 - 26        Integer       resSeq       Residue sequence number.
27             AChar         iCode        Code for insertion of residues.
31 - 38        Real(8.3)     x            Orthogonal coordinates for X in Angstroms.
39 - 46        Real(8.3)     y            Orthogonal coordinates for Y in Angstroms.
47 - 54        Real(8.3)     z            Orthogonal coordinates for Z in Angstroms.
55 - 60        Real(6.2)     occupancy    Occupancy.
61 - 66        Real(6.2)     tempFactor   Temperature  factor.
77 - 78        LString(2)    element      Element symbol, right-justified.
79 - 80        LString(2)    charge       Charge  on the atom.

I am interested in columns 23-26 (the residue sequence number).
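Since the record is fixed-width, the residue number can be read by character position instead of by whitespace-separated field. That matters because `awk '{ print $6 }'` counts fields: once a residue number reaches four digits it runs into the chain ID (`A1000`), and field 6 becomes the x coordinate. A minimal sketch, assuming the column layout above; the function name `resseq` and the four-digit record are hypothetical:

```shell
#!/bin/bash
# resseq: print the residue sequence number (columns 23-26) of every ATOM record.
# substr() reads by character position, so merged fields do not matter;
# "+ 0" turns the space-padded text into a plain number.
resseq() {
    awk '/^ATOM/ { print substr($0, 23, 4) + 0 }'
}

# Hypothetical record with a four-digit residue number: chain ID and resSeq touch.
line='ATOM   1181  N   ASN A1000      10.938  11.671  38.632  1.00  8.17           N'

printf '%s\n' "$line" | awk '{ print $6 }'   # -> 10.938 (the x coordinate)
printf '%s\n' "$line" | resseq               # -> 1000
```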
# 2  
02-13-2013
Try:

Code:
 awk '/^ATOM/ { v=substr($0,23,4) } END{print v}' file

To replace values try:
Code:
awk -v F=27 -v T=28 '/^ATOM/&&substr($0,23,4)+0==F{$0=substr($0,1,22) sprintf("%4d",T) substr($0,27)}1' file
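Run on two records from the sample, the one-liner rewrites only the ATOM line whose columns 23-26 equal `F`, and `%4d` keeps that field four characters wide, so nothing after it shifts. A sketch wrapping it in a shell function (the name `replace_resseq` is hypothetical), with a space after `-v`, which every awk accepts:

```shell
#!/bin/bash
# replace_resseq FROM TO: on ATOM records whose residue number (columns 23-26)
# equals FROM, write TO into those same four columns, right-justified.
# All other lines are printed unchanged.
replace_resseq() {
    awk -v F="$1" -v T="$2" '
        /^ATOM/ && substr($0, 23, 4) + 0 == F {
            $0 = substr($0, 1, 22) sprintf("%4d", T) substr($0, 27)
        }
        1'
}

# Only the second line changes (residue 27 becomes 28).
printf '%s\n' \
    'ATOM     19  HE1 HSE A  26       5.334 -16.170  -4.790  1.00 17.55           H  ' \
    'ATOM     20  N   PRO A  27       1.965 -13.801  -2.950  1.00 14.19           N  ' |
    replace_resseq 27 28
```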


Last edited by Chubler_XL; 02-13-2013 at 08:40 PM..
# 3  
02-13-2013
Hi,

Thanks for your answer. I just figured out a way to do it:
Code:
#!/bin/bash

: > out.txt                      # create or empty the output file
while IFS= read -r line; do
  id=$(echo "$line" | awk '{ print $6 }')
  atom=$(echo "$line" | awk '{ print $1 }')
  if [[ "$atom" != "ATOM" ]]; then
    echo "$line" >> out.txt
  elif [[ $id -gt $1 ]]; then
    rid=$(( id - 2 ))
    echo "$line" | sed "s/$id/$rid/" >> out.txt
  else
    echo "$line" >> out.txt
  fi
done < "$2"

where $1 is the residue ID above which the numbers should change (every larger ID is decreased by 2) and $2 is the input file name.
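The same loop can be sketched so that it edits columns 23-26 by position instead of calling `sed`: bash substring expansion (`${line:22:4}` is columns 23-26, since offsets start at 0) reads the number, and `printf -v field '%4d'` writes it back four characters wide. This keeps the behavior of the script above (residue numbers greater than the first argument are decreased by 2); the function name `shift_resseq` is hypothetical:

```shell
#!/bin/bash
# shift_resseq THRESHOLD: read PDB lines on stdin; for ATOM records whose
# residue number (columns 23-26) is greater than THRESHOLD, subtract 2.
# Every other line is copied through unchanged.
shift_resseq() {
    local threshold=$1 line id field
    # IFS= and -r keep spaces and backslashes intact; the second test
    # also handles a last line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == ATOM* ]]; then
            id=$(( ${line:22:4} ))                    # "  27" -> 27
            if (( id > threshold )); then
                printf -v field '%4d' "$(( id - 2 ))" # right-justify in 4 columns
                line="${line:0:22}${field}${line:26}"
            fi
        fi
        printf '%s\n' "$line"
    done
}

# Usage with the original arguments ($1 = threshold, $2 = input file):
# shift_resseq "$1" < "$2" > out.txt
```

Because only that slice of the line is rebuilt, a serial number or coordinate containing the same digits is never touched, and `100` becomes `  98` rather than `98`.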

---------- Post updated at 08:00 PM ---------- Previous update was at 07:51 PM ----------

The only problem I just noticed: if
Code:
ATOM   1181  N   ASN A 100      10.938  11.671  38.632  1.00  8.17           N

is changed to 99, I get

Code:
ATOM   1181  N   ASN A 99      10.938  11.671  38.632  1.00  8.17           N

but I need an additional space (the line I need first, the line I get second):

Code:
ATOM   1181  N   ASN A  99      10.938  11.671  38.632  1.00  8.17           N
ATOM   1181  N   ASN A 99      10.938  11.671  38.632  1.00  8.17           N
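The space goes missing because the text `100` is replaced by the shorter text `99`, while columns 23-26 must hold the number right-justified in exactly four characters. A printf width does this; a minimal illustration (`pad4` is a hypothetical helper):

```shell
#!/bin/bash
# pad4 N: print N right-justified in four columns, as columns 23-26 require.
pad4() {
    printf '%4d' "$1"
}

for n in 99 100 1000; do
    printf '[%s]\n' "$(pad4 "$n")"
done
# prints:
# [  99]
# [ 100]
# [1000]
```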

# 4  
02-13-2013
The danger with your code is that if the ID also appears in another field (like the serial number, location or name), it will replace that occurrence instead.
# 5  
02-14-2013
Quote:
Originally Posted by Chubler_XL
The danger with your code is that if the ID also appears in another field (like the serial number, location or name), it will replace that occurrence instead.
Yeah, you are right. I just worked this out too.
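Row 27 of the sample data shows the problem: its serial number and residue number are both `27`, so a substitution like `s/27/25/` (the form the script builds from `$id` and `$rid`) changes the serial number, while a position-based edit changes only columns 23-26. The value 25 is just an example:

```shell
#!/bin/bash
# Row 27 of the sample: serial number 27 and residue number 27.
line='ATOM     27  HA  PRO A  27       0.404 -14.463  -1.642  1.00 14.75           H  '

# Text-based: sed changes the first "27" on the line, which is the serial number.
by_text=$(printf '%s\n' "$line" | sed 's/27/25/')

# Position-based: only columns 23-26 (the residue number) are rewritten.
by_position=$(printf '%s\n' "$line" |
    awk '{ $0 = substr($0, 1, 22) sprintf("%4d", 25) substr($0, 27) } 1')

printf '%s\n' "$by_text" "$by_position"
```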

---------- Post updated 02-14-13 at 12:09 AM ---------- Previous update was 02-13-13 at 08:03 PM ----------

I am pretty sure my code is horrible and that there are much better ways to do this, but it works now. I took into account that the number could appear elsewhere in the line, so that can no longer happen. I thought I would share it in case it inspires somebody else, or somebody can tell me how to solve it in a more appropriate manner.

Code:
#!/bin/bash

while getopts ":f:o:c:" opt; do
  flags=1
  case $opt in
    f)
      inputfile=$OPTARG
      ;;
    o)
      outputfile=$OPTARG
      ;;
    c)
      if [[ $OPTARG == "" ]]
      then
        consec=0
      else
        consec=$OPTARG
      fi
      ;;
    :)
      echo "Option -$OPTARG requires an argument" >&2
      exit 1
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      exit 1
      ;;
  esac
done


if [[ $flags == "" ]]
then
	echo "Usage: PDBid_change -f inputfile -o outputfile [-c startid]"
	echo "-f input file name"
	echo "-o output file name"
	echo "-c starting value for the new IDs (the first residue gets value + 1)"
	exit 1
fi

if [[ $inputfile == "" ]]
then
	echo "Please provide an input file name (-f filename)"
	exit 1
elif ! [[ -e "$inputfile" ]]
then
	echo "Input file does not exist"
	exit 1
fi

if [[ $outputfile == "" ]]
then
	echo "Please provide an output file name (-o filename)"
	exit 1
fi

: > "$outputfile"   # create or truncate the output file


function write_pdb() {
if  [[ ${#newid} == ${#position} ]]
then
	if  [[ ${#newid} == 1 ]]
	then
		echo "$line" | sed "s/A   $id/A   $newid/" >> $1
	elif [[ ${#newid} == 2 ]]
	then
		echo "$line" | sed "s/A  $id/A  $newid/" >> $1
	elif [[ ${#newid} == 3 ]]
	then
		echo "$line" | sed "s/A $id/A $newid/" >> $1
	else
		echo "$line" | sed "s/A$id/A$newid/" >> $1
	fi
elif [[ ${#newid} == 1 && ${#position} == 2 ]]
then
	echo "$line" | sed "s/A  $id/A   $newid/" >> $1
elif [[ ${#newid} == 1 && ${#position} == 3 ]]
then
	echo "$line" | sed "s/A $id/A   $newid/" >> $1
elif [[ ${#newid} == 1 && ${#position} == 4 ]]
then
	echo "$line" | sed "s/A$id/A   $newid/" >> $1
elif [[ ${#newid} == 2 && ${#position} == 3 ]]
then
	echo "$line" | sed "s/A $id/A  $newid/" >> $1
elif [[ ${#newid} == 2 && ${#position} == 4 ]]
then
	echo "$line" | sed "s/A$id/A  $newid/" >> $1
else
	echo "$line" | sed "s/A$id/A $newid/" >> $1
fi
}


while IFS= read -r line; do

	atom=$(echo "$line" | awk '{ print $1 }')
	if [[ "$atom" != "ATOM" && "$atom" != "TER" ]]
	then
		echo "$line" >> "$outputfile"
	else
		# First ATOM/TER record: remember its residue number and the start value.
		if [[ "$position" == "" ]]
		then
			position=$(echo "$line" | awk '{ print $6 }')
			if [[ "$consec" != "" ]]
			then
				previous=$consec
			else
				previous=$(( position - 5 ))
			fi
		fi

		id=$(echo "$line" | awk '{ print $6 }')
		if [[ "$position" == "$id" ]]
		then
			# Same residue as before: keep the same new ID.
			newid=$(( previous + 1 ))
			write_pdb "$outputfile"
		elif [[ $atom == "TER" ]]
		then
			# TER records carry the residue number in field 5.
			id=$(echo "$line" | awk '{ print $5 }')
			write_pdb "$outputfile"
		else
			# A new residue starts: advance the new ID by one.
			previous=$newid
			position=$id
			newid=$(( previous + 1 ))
			write_pdb "$outputfile"
		fi
	fi
done < "$inputfile"
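For comparison, the renumbering this script performs (consecutive residue numbers starting at the `-c` value plus one, with TER records keeping the number of the residue they close) can be sketched as a single awk program that edits columns 23-26 by position. This is an illustrative alternative, not a drop-in replacement: it assumes every ATOM and TER record carries its residue number in columns 23-26, ignores insertion codes, and does not reproduce the script's default when `-c` is omitted. The function name `renumber` is hypothetical:

```shell
#!/bin/bash
# renumber START: renumber residues consecutively, beginning at START + 1.
# A new number is assigned whenever columns 23-26 change between consecutive
# ATOM/TER records; a TER line normally repeats the last residue number, so
# it keeps the current value. Other lines (and short TER lines) pass through.
renumber() {
    awk -v start="$1" '
        /^(ATOM|TER)/ && length($0) >= 26 {
            r = substr($0, 23, 4)               # original residue number (text)
            if (!seen || r != prev) {           # first residue, or a new one
                n++
                prev = r
                seen = 1
            }
            $0 = substr($0, 1, 22) sprintf("%4d", start + n) substr($0, 27)
        }
        1'
}

# Residues 26, 26, 27 become 1, 1, 2.
printf '%s\n' \
    'ATOM      1  N   HSE A  26       3.033 -10.429  -2.262  1.00 17.07           N1+' \
    'ATOM      2  CA  HSE A  26       3.226 -11.674  -3.040  1.00 14.73           C  ' \
    'ATOM     20  N   PRO A  27       1.965 -13.801  -2.950  1.00 14.19           N  ' |
    renumber 0
```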

# 6  
02-14-2013
You could try this for write_pdb():

Code:
function write_pdb() {
    echo $line | awk -vF=$id -vT=$newid '
        substr($0,23,4)+0==F{$0=substr($0,1,22) sprintf("%4d",T) substr($0,27)}1' >> $1
}

# 7  
02-14-2013
Thank you very much, it works like a charm. Since I am new to awk, could you explain your syntax? As a beginner I find awk commands hard to read.

In addition, to make it do exactly what I want, I altered it to:

Code:
echo "$line" | awk -v F=$id -v T=$newid '
        substr($0,23,4)+0==F{$0=substr($0,1,22) sprintf("%4d",T) substr($0,27)}1' >> $1

I added double quotes around $line to keep the spacing intact, so that the replacement lands in the right position, and a space after -v when setting a variable, because my version of awk requires it.

Thank you again and have a nice day.
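On the syntax question: here is the same awk program spread over several lines, with a comment on each part. `F` and `T` are the old and new residue numbers passed in with `-v`; the values 26 and 30 are only for illustration. The last lines show what the double quotes around `$line` protect against: unquoted, the shell splits the line into words and `echo` rejoins them with single spaces, so every column shifts.

```shell
#!/bin/bash
line='ATOM      3  CB  HSE A  26       4.705 -11.978  -3.127  1.00 15.52           C  '

result=$(echo "$line" | awk -v F=26 -v T=30 '
    # Pattern: the action in braces runs only when this test is true.
    # substr($0, 23, 4) is the 4 characters starting at column 23 (the
    # residue number); adding 0 turns "  26" into the number 26.
    substr($0, 23, 4) + 0 == F {
        # Rebuild the line ($0): columns 1-22 as they were, T right-justified
        # in 4 characters, then everything from column 27 to the end.
        $0 = substr($0, 1, 22) sprintf("%4d", T) substr($0, 27)
    }
    # A pattern without an action prints the line. "1" is always true,
    # so every line is printed, changed or not.
    1')
printf '%s\n' "$result"

# Without quotes the shell splits $line into words and echo joins them
# with single spaces, so the columns no longer line up:
flat=$(echo $line)
printf '%s\n' "$flat"
```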