I do not get the modified output and the file is unchanged. Apparently sed is not able to match the pattern in the file. There are any number of non-standard characters in the data so I don't know if that is an issue or not. The difference that I can see is that when I assign new_name="0_(1R)-1,2,3,3-tetraamino-2-propen-1-ol", I am able to quote the string but when I assign current_name="${FIELD[1]}" I am not able to quote/escape special characters like ( in the string.
It seems like I just am missing some combination of single and double quotes to do the job but I haven't been able to progress past this.
Suggestions would be appreciated.
LMHmedchem
Last edited by LMHmedchem; 02-28-2018 at 01:56 AM..
What's the contents of ${FIELD[1]}? How did you define it?
The script begins by reading a file and looking for duplicate values in a specific column. These are retrieved by,
Code:
# set input field separator to newline so each line is stored in an array element
IFS=$'\n'
# use sort and uniq to output duplicate lines in to array
dup_list=( $(cat "$base_file" | sort -k2 | uniq -f1 -D) )
so the file is sorted on column 2 and uniq ignores the first column.
Then I iterate over the array to parse the lines and capture individual names,
Code:
for dup_name in "${dup_list[@]}"
do
   # parse on tab
   unset FIELD; IFS=$'\t' read -a FIELD <<< "$dup_name"
   # assign second column to current name
   current_name="${FIELD[1]}"
done
When I echo $current_name I get the correct value but it doesn't work with the sed command I posted.
Attention: sed uses RE, so any RE-special character or the / separator will cause a malfunction.
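To make that concrete: in sed's default (basic) regular expressions a bare ( is literal, but characters such as . * [ \ and the / delimiter are not, and on the replacement side & and \ are special too. Below is a minimal sketch of escaping both strings before handing them to sed. The helper names escape_chars, escape_sed_pattern and escape_sed_replacement are made up for this illustration, and the sample name is invented, not taken from the poster's data.

```shell
#!/bin/bash

# Hypothetical helper: put a backslash in front of every character of $1
# that also occurs in the character set given as $2.
escape_chars() {
    local s=$1 set=$2 out='' c i
    for ((i = 0; i < ${#s}; i++)); do
        c=${s:i:1}
        [[ $set == *"$c"* ]] && out+='\'
        out+=$c
    done
    printf '%s' "$out"
}

# Characters that are special in a basic regex, plus the / delimiter.
escape_sed_pattern()     { escape_chars "$1" '\/.*[^$'; }
# Characters that are special in the replacement part of s///.
escape_sed_replacement() { escape_chars "$1" '\/&'; }

# Invented sample name containing several troublesome characters.
current_name='0_(1R)-1,2.3[a]/b*'
new_name="dup_0_$current_name"

pat=$(escape_sed_pattern "$current_name")
rep=$(escape_sed_replacement "$new_name")

# The escaped strings are now safe inside a double-quoted sed script.
printf '17\t%s\n' "$current_name" | sed "s/$pat/$rep/"
```

The same escaped values should also be usable in the GNU-specific 0,/regex/ address form; that part is not exercised here.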
Is the goal to index all the duplicates?
Then consider this robust awk solution
Code:
awk '
BEGIN { FS=OFS="\t" }
NR==FNR { if (dup[$2]++==1) dup[$2]++; next }
dup[$2]>1 { $2=("dup_" --dup[$2]-1 "_" $2) }
{ print }
' input input
With a trick, the dup array both discovers the duplicates and counts the index (backwards, though).
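If the index should count forwards instead (dup_0_ on the first instance), one possible variant keeps the same two-pass idea but uses two arrays. This is only a sketch on made-up sample data; the array names total and seen and the temporary input file are assumptions for the illustration.

```shell
#!/bin/bash

# Made-up sample data: an ID column and a name column, tab-separated.
input=$(mktemp)
printf '%s\t%s\n' 1 'alpha' 2 'beta(1)' 3 'alpha' 4 'alpha' > "$input"

# Pass 1 counts how often each name occurs.
# Pass 2 prefixes every name that occurred more than once, counting up from 0.
indexed=$(awk '
BEGIN { FS = OFS = "\t" }
NR == FNR { total[$2]++; next }
total[$2] > 1 { n = seen[$2]++; $2 = "dup_" n "_" $2 }
{ print }
' "$input" "$input")

printf '%s\n' "$indexed"
rm -f "$input"
```

Because the names are only ever used as array keys and plain strings, regex-special characters in them do not matter here.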
One problem might be that your input data do have DOS line terminators (<CR> = 0x0D = ^M = \r); did you try without?
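A quick way to check for carriage returns, and to make a cleaned copy if any turn up, could look like the sketch below. The helper name has_cr and the sample file are invented for the illustration.

```shell
#!/bin/bash

# Hypothetical helper: succeed if the file contains at least one carriage return.
has_cr() {
    grep -q $'\r' "$1"
}

# Made-up sample with DOS line endings.
sample=$(mktemp)
trap 'rm -f "$sample" "$sample.unix"' EXIT
printf '1\tfoo\r\n2\tbar\r\n' > "$sample"

if has_cr "$sample"; then
    # Write a CR-free copy; the original file is left untouched.
    tr -d '\r' < "$sample" > "$sample.unix"
    echo "carriage returns found, cleaned copy written to $sample.unix"
fi
```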
These are unix files, so there shouldn't be an issue with EOL. I am a bit mystified as to why it works from the command line but not from my script.
Quote:
Originally Posted by RudiC
BTW, your approach seems somewhat complicated. Does it do anything else or is its sole purpose to add a counter to the first instance of duplicates?
There are a number of things that need to be done. I need to identify and re-name duplicates in several files. Every name needs to be unique, so I am finding the dups and adding an indexed prefix to each instance. There could be more than one duplicate string.
Quote:
Originally Posted by MadeInGermany
Attention: sed uses RE, so any RE-special character or the / separator will cause a malfunction.
I suspect that something like this may be the issue but I am not sure why RudiC is able to run it.
Quote:
Originally Posted by MadeInGermany
Is the goal to index all the duplicates?
I think your code would work well if I only had one file to change. I need to change the name and then look up the name in several other files and propagate the change so that all files have the revised name.
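For the per-name renaming step, one regex-free option is to compare the name column as a plain string and rewrite only the first line that matches. The sketch below assumes the name is always in the second tab-separated column; the helper name rename_first_exact and the sample data are made up. Note that a loop which redirects every sed run from the untouched base file into the same output file keeps only the last substitution, so this sketch edits one working copy cumulatively instead.

```shell
#!/bin/bash

# Hypothetical helper: in FILE, change the second tab-separated field of the
# first line where that field equals OLD exactly. No regex is involved, and
# passing the strings through the environment avoids awk escape processing.
rename_first_exact() {
    local old=$1 new=$2 file=$3 tmp
    tmp=$(mktemp) || return 1
    if OLD=$old NEW=$new awk '
        BEGIN { FS = OFS = "\t" }
        # appending "" forces a string comparison, never a numeric one
        !replaced && ($2 "") == (ENVIRON["OLD"] "") { $2 = ENVIRON["NEW"]; replaced = 1 }
        { print }
    ' "$file" > "$tmp"; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# Made-up working copy with one duplicated name.
work=$(mktemp)
trap 'rm -f "$work"' EXIT
printf '%s\t%s\n' 1 'a(1)/b.c' 2 'other' 3 'a(1)/b.c' > "$work"

# Each call renames the next not-yet-renamed instance.
rename_first_exact 'a(1)/b.c' 'dup_0_a(1)/b.c' "$work"
rename_first_exact 'a(1)/b.c' 'dup_1_a(1)/b.c' "$work"
cat "$work"
```

Since an already renamed field no longer equals the old name, the second call moves on to the next instance. The same helper could be pointed at the other files, provided they hold the name in the same column.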
This is the current script
Code:
#!/bin/bash
# base file to check for duplicate names
base_file=$1
# set input field separator to newline so each line is stored in an array element
IFS=$'\n'
# use sort and uniq to output duplicate lines in to array
dup_list=( $(cat "$base_file" | sort -k2 | uniq -f1 -D) )
# count for indexed name
name_count=0
# to identify when we have a new duplicate
current_dup=''
# loop on duplicate names
for dup_name in "${dup_list[@]}"
do
   # use second field for name
   unset FIELD; IFS=$'\t' read -a FIELD <<< "$dup_name"
   # set name value
   current_name="${FIELD[1]}"
   # if no current dup has been set
   if [ "$current_dup" == "" ]; then
      # set base to check for new duplicate
      current_dup=$current_name
      #create new dup name
      # name count is already 0 so no need to increment
      new_name='dup_'$name_count'_'$current_name
   # if the current name matches the current dup, increment counter
   elif [ "$current_dup" == "$current_name" ]; then
      # increment counter
      name_count=$((name_count+1))
      # create name based on incremented counter
      new_name='dup_'$name_count'_'$current_name
   # if there is a new dup series
   elif [ "$current_dup" != "$current_name" ]; then
      # set base to new duplicate
      current_dup=$current_name
      # reset name counter
      name_count=0
      #create new dup name
      new_name='dup_'$name_count'_'$current_name
   fi
   # test print
   echo $new_name
   # find first instance of dup name in base file and replace
   sed "0,/$current_name/s//$new_name/" $base_file > 'revised_'$base_file
   # make changes in other files
done
When I run this on the attached file test_base.txt, I get the printed output I expect,
All of the duplicates are identified and renamed with an indexed prefix. This works very fast and I have each duplicate name in scope in the do loop where I can work on other files.
At this point I am not able to make changes in other files, which is annoying. I am sure that the logic above is overly complex.
Could the problem also be how I am reading the data into the array?
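On the array question: an unquoted $( ... ) inside an array assignment is subject to pathname expansion as well as word splitting, so a name containing * or [ could be altered if it happens to match file names, and read without -r drops backslashes. A sketch of a more defensive way to read the lines follows, assuming bash 4 or later for mapfile and GNU uniq for -D; the sample file contents are invented.

```shell
#!/bin/bash

# Made-up base file: ID column, tab, name column.
base_file=$(mktemp)
trap 'rm -f "$base_file"' EXIT
printf '%s\t%s\n' 1 'a*b [x]' 2 'unique' 3 'a*b [x]' > "$base_file"

# One array element per line; no word splitting or glob expansion happens,
# and the global IFS does not have to be changed.
mapfile -t dup_list < <(sort -t $'\t' -k2 "$base_file" | uniq -f1 -D)

for dup_line in "${dup_list[@]}"; do
    # -r keeps backslashes in the data intact
    IFS=$'\t' read -r -a FIELD <<< "$dup_line"
    printf 'duplicate name: %s\n' "${FIELD[1]}"
done
```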
Bash version 4.4.20 / Ubuntu 16.0.4