Problem getting sed to work with variables

# 1  
02-28-2018

Hello,

I am processing text files looking for a string and replacing the first occurrence of the string with something else.

For the text,
Code:
id	Name
1	methyl-(2-methylpropoxy)-oxoammonium
2	N-amino-N-(methylamino)-2-nitrosoethanamine
3	3-methoxy-3-methyloxazolidin-3-ium
4	1,3-dihydroxypropan-2-yl-methyl-methyleneammonium
5	(1R)-1,2,3,3-tetraamino-2-propen-1-ol
6	2-(ethoxyamino)guanidine
7	O-[(2S)-2-aminoazopropyl]hydroxylamine
8	N-$l^{1}-oxidanyl-N-[(2-methylpropan-2-yl)oxy]methanamine
9	(1R)-1,2,3,3-tetraamino-2-propen-1-ol
10	1-amino-1-ethoxyguanidine

I am replacing the first instance of (1R)-1,2,3,3-tetraamino-2-propen-1-ol with 0_(1R)-1,2,3,3-tetraamino-2-propen-1-ol

If I run the following sed command,
Code:
sed '0,/(1R)-1,2,3,3-tetraamino-2-propen-1-ol/s//0_(1R)-1,2,3,3-tetraamino-2-propen-1-ol/' input > output.txt

I get the expected result.
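For reference, here is a minimal sketch of what the `0,/regex/s//replacement/` form does. It assumes GNU sed, because the `0,/regex/` address is a GNU extension. The range runs from the start of input to the first line matching the regex. The empty regex in `s//replacement/` reuses that same regex, so only the first occurrence is rewritten. Here `&` stands for the matched text, and `sample.txt` and `output.txt` are illustrative file names:

```shell
#!/bin/sh
# Minimal sketch; assumes GNU sed (the 0,/regex/ address is a GNU extension).
name='(1R)-1,2,3,3-tetraamino-2-propen-1-ol'

# Two tab-separated lines sharing the same name, as in the sample data.
printf '5\t%s\n9\t%s\n' "$name" "$name" > sample.txt

# 0,/re/ addresses line 1 through the first line matching re.
# The empty regex in s//.../ reuses re, and & inserts the matched text,
# so only the first occurrence gets the 0_ prefix.
sed "0,/$name/s//0_&/" sample.txt > output.txt
cat output.txt
```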

If I use shell variables instead,
Code:
current_name="(1R)-1,2,3,3-tetraamino-2-propen-1-ol";
new_name="0_(1R)-1,2,3,3-tetraamino-2-propen-1-ol";
sed -e "0,/$current_name/s//$new_name/" input > output.txt

I still get the expected result. However, when I assign current_name and new_name from a bash array and other bash variables,
Code:
current_name="${FIELD[1]}"
new_name='dup_'$name_count'_'$current_name

I do not get the modified output: the output file is identical to the input, so apparently sed is not able to match the pattern. There are many non-standard characters in the data, so I don't know whether that is the issue. The only difference I can see is that when I assign new_name="0_(1R)-1,2,3,3-tetraamino-2-propen-1-ol" I can quote the string, but when I assign current_name="${FIELD[1]}" I cannot quote or escape special characters such as ( in the string.

It seems like I am just missing some combination of single and double quotes, but I haven't been able to get past this.

Suggestions would be appreciated.
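Quoting at assignment time should not matter here. Once assigned, `$current_name` holds the same characters whether it came from a quoted literal or from `${FIELD[1]}` (barring invisible characters). What matters is which of those characters are special in a sed regex. A quick check is sketched below. It assumes GNU grep, which, like sed, reads a pattern as a basic regex unless told otherwise. It uses a bracketed name that appears in the script output later in the thread:

```shell
#!/bin/sh
# Minimal sketch: does a name still match itself when used as a regex?
# grep without -F reads the pattern as a basic regex, as sed does.
paren='(1R)-1,2,3,3-tetraamino-2-propen-1-ol'
bracket='2-[2-hydroxyethyl(methyl)amino]ethanol'

# Parentheses are ordinary characters in a basic regex: 1 match.
n_paren=$(printf '%s\n' "$paren" | grep -c -- "$paren" || true)

# [...] is a bracket expression matching ONE character from a set,
# so this name no longer matches its own text: 0 matches.
n_bracket=$(printf '%s\n' "$bracket" | grep -c -- "$bracket" || true)

# -F compares the pattern as a fixed string: 1 match.
n_fixed=$(printf '%s\n' "$bracket" | grep -cF -- "$bracket" || true)

echo "paren=$n_paren bracket=$n_bracket fixed=$n_fixed"
```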

LMHmedchem

Last edited by LMHmedchem; 02-28-2018 at 01:56 AM.
# 2  
02-28-2018
What's the contents of ${FIELD[1]}? How did you define it?

RudiC
# 3  
02-28-2018
Quote:
Originally Posted by RudiC
What's the contents of ${FIELD[1]}? How did you define it?
The script begins by reading a file and looking for duplicate values in a specific column. These are retrieved by,
Code:
# set input field separator to newline so each line is stored in an array element
IFS=$'\n'
# use sort and uniq to output duplicate lines in to array
dup_list=( $(cat "$base_file" | sort -k2 | uniq  -f1 -D) )

so the file is sorted on column 2 and uniq ignores the first column.

For the above data the output would be,
Code:
5	(1R)-1,2,3,3-tetraamino-2-propen-1-ol
9	(1R)-1,2,3,3-tetraamino-2-propen-1-ol
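Here is a sketch of another way to load those lines. It assumes bash 4+ (for `mapfile`) and GNU coreutils (for `uniq -D`). `mapfile -t` stores one line per element without changing `IFS` for the rest of the script. It also skips the filename globbing that the unquoted `$( ... )` performs; several names, e.g. `O-[(2S)-2-aminoazopropyl]hydroxylamine`, contain glob characters. `sample.txt` is an illustrative file:

```shell
#!/usr/bin/env bash
# Minimal sketch; assumes bash 4+ (mapfile) and GNU coreutils (uniq -D).
# sample.txt is an illustrative tab-separated file: id<TAB>name.
printf 'id\tName\n5\t(1R)-1,2,3,3-tetraamino-2-propen-1-ol\n6\t2-(ethoxyamino)guanidine\n9\t(1R)-1,2,3,3-tetraamino-2-propen-1-ol\n' > sample.txt

# Sort on the name column, then keep every line whose name repeats:
# uniq -f1 skips the id field, -D prints all members of each duplicate group.
# mapfile -t stores one line per element and drops the trailing newline;
# IFS is left untouched and no glob expansion takes place.
mapfile -t dup_list < <(sort -k2 sample.txt | uniq -f1 -D)

printf 'found %d duplicate lines\n' "${#dup_list[@]}"
printf '%s\n' "${dup_list[@]}"
```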

Then I iterate over the array to parse the lines and capture individual names,
Code:
for dup_name in "${dup_list[@]}"
do
   # parse on tab
   unset FIELD; IFS=$'\t' read -a FIELD <<< "$dup_name"
   # assign second column to current name
   current_name="${FIELD[1]}"
done

When I echo $current_name I get the correct value but it doesn't work with the sed command I posted.
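`echo` does not reveal invisible characters such as a trailing carriage return, so a value can print correctly and still fail to match. Here is a sketch of one way to check, assuming bash and the standard `od` utility. The `\r` is added on purpose to simulate a line read from a DOS-format file:

```shell
#!/usr/bin/env bash
# Minimal sketch: make invisible characters in a variable visible.
# The trailing $'\r' simulates a name read from a DOS (CRLF) file.
current_name=$'(1R)-1,2,3,3-tetraamino-2-propen-1-ol\r'

echo "$current_name"                  # the \r does not show up here
printf '%s' "$current_name" | od -c   # od -c prints it as \r

# Detect a trailing carriage return and strip it if present.
if [[ $current_name == *$'\r' ]]; then
    has_cr=yes
    current_name=${current_name%$'\r'}
else
    has_cr=no
fi
echo "trailing CR found: $has_cr"
```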

LMHmedchem
# 4  
02-28-2018
Your code works for me:
Code:
current_name="${FIELD[1]}"
sed -e "0,/$current_name/s//$new_name/" $base_file 
id    Name
1    methyl-(2-methylpropoxy)-oxoammonium
2    N-amino-N-(methylamino)-2-nitrosoethanamine
3    3-methoxy-3-methyloxazolidin-3-ium
4    1,3-dihydroxypropan-2-yl-methyl-methyleneammonium
5    dup_0_(1R)-1,2,3,3-tetraamino-2-propen-1-ol
6    2-(ethoxyamino)guanidine
7    O-[(2S)-2-aminoazopropyl]hydroxylamine
8    N-$l^{1}-oxidanyl-N-[(2-methylpropan-2-yl)oxy]methanamine
9    (1R)-1,2,3,3-tetraamino-2-propen-1-ol
10    1-amino-1-ethoxyguanidine

One possible problem is that your input data has DOS line terminators (<CR> = 0x0D = ^M = \r); have you tried without them?
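Here is a sketch for checking a file for DOS line endings and stripping them, using only `printf`, `grep` and `tr`. `dos.txt` and `unix.txt` are illustrative names:

```shell
#!/bin/sh
# Minimal sketch; dos.txt and unix.txt are illustrative file names.
# Create a two-line file with CRLF line endings for the demonstration.
printf 'id\tName\r\n5\t(1R)-1,2,3,3-tetraamino-2-propen-1-ol\r\n' > dos.txt

cr=$(printf '\r')
# Count lines containing a carriage return (0 means plain Unix endings).
crlf_lines=$(grep -c "$cr" dos.txt || true)
echo "lines with CR: $crlf_lines"

# tr -d deletes every CR byte; the result has Unix line endings.
tr -d '\r' < dos.txt > unix.txt
remaining=$(grep -c "$cr" unix.txt || true)
echo "lines with CR after tr: $remaining"
```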

BTW, your approach seems somewhat complicated. Does it do anything else or is its sole purpose to add a counter to the first instance of duplicates?

RudiC

Last edited by RudiC; 02-28-2018 at 02:07 PM.
# 5  
02-28-2018
Attention: sed treats the search pattern as a regular expression (RE), so any RE-special character, or the / separator, in the variable will cause a malfunction.
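One way to keep the sed approach is to escape the values before building the command. This minimal sketch assumes GNU sed, a basic regex (no `-E`) and `/` as the delimiter. In the pattern, backslash-escape `[ \ / . * ^ $`; in the replacement, escape `\ / &`. Characters such as `(`, `{` and `+` are left alone, because escaping them would make them special in a GNU basic regex. The helper names `escape_pattern` and `escape_replacement` are hypothetical:

```shell
#!/bin/sh
# Minimal sketch; assumes GNU sed (0,/re/ address), basic regexes (no -E)
# and / as the s command delimiter. Helper names are hypothetical.

# Escape characters that are special in a basic regex, plus the / delimiter.
# The helper uses | as its own delimiter so the \ and / in the set stay literal.
escape_pattern() {
    printf '%s\n' "$1" | sed 's|[[\/.*^$]|\\&|g'
}

# Escape characters that are special in the replacement: \ / and &.
escape_replacement() {
    printf '%s\n' "$1" | sed 's|[\/&]|\\&|g'
}

current_name='2-[2-hydroxyethyl(methyl)amino]ethanol'
new_name="dup_0_$current_name"

pat=$(escape_pattern "$current_name")
rep=$(escape_replacement "$new_name")

# Only the first line is renamed, even though the name contains [ and ].
printf '1\t%s\n2\t%s\n' "$current_name" "$current_name" |
    sed "0,/$pat/s//$rep/" > escaped_demo.txt
cat escaped_demo.txt
```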
Is the goal to index all the duplicates?
Then consider this robust awk solution:
Code:
awk '
  BEGIN { FS=OFS="\t" }
  NR==FNR { if (dup[$2]++==1) dup[$2]++; next }
  dup[$2]>1 { $2=("dup_" --dup[$2]-1 "_" $2) }
  { print }
' input input

With a trick, the dup array both detects the duplicates and counts the index (backwards, though).
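Forward numbering may be preferred, with `dup_0_` on the first occurrence as in the script output later in the thread. In that case the same two-pass idea can count on the first pass and number on the second. This sketch assumes a tab-separated file with the name in column 2; `input.txt` and `renamed.txt` are illustrative names:

```shell
#!/bin/sh
# Minimal sketch; input.txt is an illustrative tab-separated file (id<TAB>name).
printf 'id\tName\n5\t(1R)-1,2,3,3-tetraamino-2-propen-1-ol\n6\t2-(ethoxyamino)guanidine\n9\t(1R)-1,2,3,3-tetraamino-2-propen-1-ol\n' > input.txt

awk '
  BEGIN { FS = OFS = "\t" }
  # Pass 1 (first copy of the file): count each name.
  NR == FNR { cnt[$2]++; next }
  # Pass 2: prefix names that occur more than once, numbering from 0.
  cnt[$2] > 1 { $2 = "dup_" (seen[$2]++) "_" $2 }
  { print }
' input.txt input.txt > renamed.txt
cat renamed.txt
```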

MadeInGermany
# 6  
02-28-2018
Similar approach:
Code:
 awk '{LINE[NR] = $0; CNT[$2]++} END {for (i=1; i<=NR; i++) {$0 = LINE[i]; if (CNT[$2]-- > 1) $2 = "0_" $2; print}}' OFS="\t" file

RudiC
# 7  
02-28-2018
Quote:
Originally Posted by RudiC
One problem might be that your input data do have DOS line terminators (<CR> = 0x0D = ^M = \r); did you try without?
These are unix files, so there shouldn't be an issue with EOL. I am a bit mystified as to why it works from the command line but not from my script.

Quote:
Originally Posted by RudiC
BTW, your approach seems somewhat complicated. Does it do anything else or is its sole purpose to add a counter to the first instance of duplicates?
There are a number of things that need to be done. I need to identify and rename duplicates in several files. Every name needs to be unique, so I am finding the dups and adding an indexed prefix to each instance. There could be more than one duplicate string.

Quote:
Originally Posted by MadeInGermany
Attention: sed uses RE, so any RE-special character or the / separator will cause a malfunction.
I suspect that something like this may be the issue, but I am not sure why RudiC is able to run it.

Quote:
Originally Posted by MadeInGermany
Is the goal to index all the duplicates?
I think your code would work well if I only had one file to change. I need to change the name and then look up the name in several other files and propagate the change so that all files have the revised name.

This is the current script

Code:
#!/bin/bash

# base file to check for duplicate names
base_file=$1

# set input field separator to newline so each line is stored in an array element
IFS=$'\n'
# use sort and uniq to output duplicate lines in to array
dup_list=( $(cat "$base_file" | sort -k2 | uniq  -f1 -D) )

# count for indexed name
name_count=0
# to identify when we have a new duplicate
current_dup=''

# loop on duplicate names
for dup_name in "${dup_list[@]}"
do

   # use second field for name
   unset FIELD; IFS=$'\t' read -a FIELD <<< "$dup_name"

   # set name value
   current_name="${FIELD[1]}"

   # if no current dup has been set
   if [ "$current_dup" == "" ]; then
      # set base to check for new duplicate
      current_dup=$current_name
      #create new dup name
      # name count is already 0 so no need to increment
      new_name='dup_'$name_count'_'$current_name
   # if the current name matches the current dup, increment counter
   elif [ "$current_dup" == "$current_name" ]; then
      # increment counter
      name_count=$((name_count+1))
      # create name based on incremented counter
      new_name='dup_'$name_count'_'$current_name
   # if there is a new dup series
   elif [ "$current_dup" != "$current_name" ]; then
      # set base to new duplicate
      current_dup=$current_name
      # reset name counter
      name_count=0
      #create new dup name
      new_name='dup_'$name_count'_'$current_name
   fi

   # test print
   echo $new_name
 
   # find first instance of dup name in base file and replace
   sed "0,/$current_name/s//$new_name/" $base_file > 'revised_'$base_file

   # make changes in other files

done

When I run this on the attached file test_base.txt, I get the printed output I expect,
Code:
dup_0_(1R)-1,2,3,3-tetraamino-2-propen-1-ol
dup_1_(1R)-1,2,3,3-tetraamino-2-propen-1-ol
dup_2_(1R)-1,2,3,3-tetraamino-2-propen-1-ol
dup_0_2-[2-hydroxyethyl(methyl)amino]ethanol
dup_1_2-[2-hydroxyethyl(methyl)amino]ethanol
dup_2_2-[2-hydroxyethyl(methyl)amino]ethanol

All of the duplicates are identified and renamed with an indexed prefix. This works very fast and I have each duplicate name in scope in the do loop where I can work on other files.

At this point I am not able to make changes in other files, which is annoying. I am sure that the logic above is overly complex.

Could the problem also be how I am reading the data into the array?

LMHmedchem
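Reading the script above, three things appear to explain the unchanged output, independent of quoting:

1. Every iteration runs sed on the original `$base_file` and overwrites `revised_$base_file`, so only the last substitution can survive. The `dup_1` and `dup_2` runs would also target the same first occurrence as `dup_0`.
2. The last duplicate, `2-[2-hydroxyethyl(methyl)amino]ethanol`, contains `[...]`, which sed reads as a bracket expression, so that final sed call matches nothing.
3. A name renamed to `dup_0_<name>` still contains `<name>`, so an unanchored pattern would match it again on the next pass.

The sketch below is one way around all three. It accumulates edits in a copy of the file and swaps sed for an exact string comparison in awk. The names are passed through the environment, so no regex or escape processing applies to them. It assumes bash 4+, GNU coreutils and a tab-separated file. The small `test_base.txt` built here is a stand-in, not the attached file:

```shell
#!/usr/bin/env bash
# Minimal sketch, not the thread author's script; assumes bash 4+, GNU
# coreutils and a tab-separated base file with the name in column 2.
# test_base.txt is a small stand-in built for the demonstration.
printf 'id\tName\n5\t(1R)-1,2,3,3-tetraamino-2-propen-1-ol\n9\t(1R)-1,2,3,3-tetraamino-2-propen-1-ol\n12\t2-[2-hydroxyethyl(methyl)amino]ethanol\n14\t2-[2-hydroxyethyl(methyl)amino]ethanol\n' > test_base.txt

base_file=test_base.txt
work_file="revised_$base_file"
cp -- "$base_file" "$work_file"     # every edit accumulates in this copy

mapfile -t dup_list < <(sort -k2 "$base_file" | uniq -f1 -D)

prev=''
for line in "${dup_list[@]}"; do
    IFS=$'\t' read -r _ current_name <<< "$line"
    # Restart the counter whenever a new duplicate name begins.
    if [[ $current_name != "$prev" ]]; then
        name_count=0
        prev=$current_name
    else
        name_count=$((name_count + 1))
    fi
    new_name="dup_${name_count}_${current_name}"

    # Rename the first line whose 2nd field equals the old name exactly.
    # ENVIRON passes the names without regex or escape processing.
    # Renamed lines no longer match, so the next iteration reaches the
    # next occurrence.
    old="$current_name" new="$new_name" awk '
        BEGIN { FS = OFS = "\t" }
        !done && $2 == ENVIRON["old"] { $2 = ENVIRON["new"]; done = 1 }
        { print }
    ' "$work_file" > "$work_file.tmp" && mv -- "$work_file.tmp" "$work_file"

    # The same old/new pair could be applied to the other files here.
done
cat "$work_file"
```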