Script to process a list of items and uncomment lines with that item in a second file

01-08-2020

Hello,

I have a src code file where I need to uncomment many lines.

The lines I need to uncomment look like this:

Code:

C      CALL l_r(DESNAME,DESOUT, 'Gmax',     ESH(10),  NO_APP, JJ)

The comment marker is the "C" in the first column. It needs to be deleted so that there are 6 spaces preceding "CALL". The key on this line is 'Gmax', which tells me that the line needs to be uncommented.

I have a list of such keys:

Code:

Gmax
Gmin
HS10
HS2
HS9
Hmax

Each key (including the single quotes) will occur only once in the src file being processed. I need to read the list file, find each key in the src file, and uncomment the matching lines. There are about 300 keys in the list file.
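For a single key the task is one sed command: the address `/'Gmax'/` selects the line that contains the quoted key, and `s/^C//` deletes a leading C only on that line. A minimal sketch, assuming the key contains no characters that are special in a sed regex; the helper name `uncomment_key` is hypothetical.

Code:

```bash
#!/bin/bash
# uncomment_key KEY (hypothetical helper): copy stdin to stdout, deleting the
# leading "C" only on lines that contain 'KEY' (single quotes included).
# Assumes KEY contains no sed regex metacharacters.
uncomment_key() {
    sed "/'$1'/s/^C//"
}

# The sample line from the question, uncommented for key Gmax.
printf '%s\n' "C      CALL l_r(DESNAME,DESOUT, 'Gmax',     ESH(10),  NO_APP, JJ)" |
    uncomment_key Gmax
```

Because the substitution is anchored on `^C`, running it a second time leaves an already-uncommented line alone.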

This is what I tried:

Code:

#! /bin/bash

# file with list of items to find in modify_file
list_file=$1
# file for which a modified copy will be created
modify_file=$2
# final output file
new_file='new_'$modify_file

# copy the file being modified
cp -f "$modify_file"  work_on_file.txt

# loop through all the lines in the list_file
while IFS= read -r line
do

   # add single quotes
   look_for=\'$line\'

   echo "looking for line with " "$look_for"

   # look for the value in the current line
   # if the value is found on a commented line, print the substring of $0 skipping the first character
   # otherwise, just print the line
   awk -v find_to_modify="$look_for" ' { if ($0 ~ /^C/ && $0 ~ find_to_modify)
                                          { print substr($0,2); }
                                       else
                                          { print $0; }
                                     } ' work_on_file.txt > tmp_file.txt

   # rename the modified file from this loop for the next loop
   mv tmp_file.txt  work_on_file.txt

done < "$list_file"

# rename final copy
mv work_on_file.txt  "$new_file"

This simply reads the list file and, one key at a time, looks for that key in the file to modify. The awk code checks each line for the key (including the single quotes); if it is found, it prints the line minus its first character, otherwise it prints the line unmodified. After each key is processed, the file awk wrote is renamed to become the file awk works on in the next pass of the loop.

This works as far as I can tell, but I am writing an entire copy of the modified file (and starting a new awk process) for each key in the list file, so it is not very efficient. The file renaming at the end of the loop is a bit kludgy as well. It only takes about 7 seconds to run, so maybe I am being picky and should just let it be, but I thought I would ask whether there were other suggestions.

LMHmedchem

01-09-2020

Try

Code:

awk -v"SQ='" '
FNR == NR       {PAT[NR]="^C.*" SQ $0 SQ
                 MX = NR
                 next
                }
                {for (i=1; i<=MX; i++) if ($0 ~ PAT[i]) {sub (/^./,_)
                                                         break
                                                        }
                }
1
' list_file modify_file
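How this works: `NR` counts lines across all input files while `FNR` restarts at 1 for each file, so `FNR == NR` is true only while awk reads the first file (the list). That block builds one regex per key, `^C.*'key'`, and `next` skips the rest. For the second file every regex is tried, `sub(/^./,_)` removes the first character on a match (the unset variable `_` acts as an empty string), and the lone `1` is an always-true pattern whose default action prints the line. A small demonstration, wrapping the same program in a function whose name, `uncomment_from_list`, is hypothetical, and using throwaway files from mktemp:

Code:

```bash
#!/bin/bash
# uncomment_from_list LIST SRC (hypothetical name): RudiC's program in a function.
uncomment_from_list() {
    awk -v SQ="'" '
    FNR == NR { PAT[NR] = "^C.*" SQ $0 SQ; MX = NR; next }   # list file: one regex per key
    { for (i = 1; i <= MX; i++) if ($0 ~ PAT[i]) { sub(/^./, _); break } }
    1                                                          # always true: print the line
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' Gmax HS10 > "$tmp/list_file"
printf '%s\n' "C      CALL f('Gmax')" "C      CALL f('Gmin')" "C      CALL f('HS10')" > "$tmp/modify_file"
uncomment_from_list "$tmp/list_file" "$tmp/modify_file"
rm -rf "$tmp"
```

One known caveat of the `FNR == NR` idiom: if the list file is empty, the condition stays true for the second file as well, so its lines are stored as patterns and nothing is printed.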

RudiC

01-09-2020

Or do it all in bash; here is a well-documented bash version:

Code:

#! /bin/bash
# bash v3 or higher

# file with list of items to find in modify_file
list_file=$1
# file for which a modified copy will be created
modify_file=$2
# final output file
new_file='new_'$modify_file

echo "reading values from '$list_file'"
# clear (and implicitly declare) an array
list=()
while IFS= read -r line
do
  echo "adding '$line' to list"
  list+=("$line")
done < "$list_file"

echo "reading '$modify_file'"
# loop through all the lines in the modify_file (read on fd 3, written on fd 4)
linecnt=0
while IFS= read -r line <&3
do
   # look for each value in the current line
   # if a value is found on the line, print the line without its first character
   # otherwise just print the line

   echo "processing line $((++linecnt))"
   # preset the new line
   outline=$line
   # loop through the array values
   for i in "${list[@]}"
   do
      look_for="'$i'"
      # =~ is ERE match, == is glob match
      # $look_for in quotes --> do not interpret special characters in it
      #if [[ $line =~ "$look_for" ]]
      if [[ $line == *"$look_for"* ]]
      then
         outline=${line:1}
         # further matches will do the same, so we can break the loop here
         break
      fi
   done
   # print the new line
   printf "%s\n" "$outline" >&4
done 3< "$modify_file" 4> "$new_file"
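The comments on the `[[ ... ]]` test are the crux of the matching: with `==` the right-hand side is a glob and the quoted `"$look_for"` part is matched literally; with `=~` it is an extended regex, where a quoted operand is also taken literally (bash 3.2 and later) but an unquoted one is interpreted. A small demonstration, using a made-up key `'H.1'` that contains the regex metacharacter `.`; the function names are hypothetical:

Code:

```bash
#!/bin/bash
# Glob (==) versus regex (=~) matching inside [[ ]]; function names are hypothetical.
match_glob()         { [[ $1 == *"$2"* ]]; }   # quoted part of a glob: literal
match_regex_quoted() { [[ $1 =~ "$2" ]]; }     # quoted regex operand: literal (bash >= 3.2)
match_regex_raw()    { [[ $1 =~ $2 ]]; }       # unquoted: "." matches any character

line="C      CALL l_r(DESNAME,DESOUT, 'HX1',      ESH(10),  NO_APP, JJ)"
key="'H.1'"                                    # made-up key containing a regex metacharacter

for f in match_glob match_regex_quoted match_regex_raw; do
    if "$f" "$line" "$key"; then echo "$f: match"; else echo "$f: no match"; fi
done
```

Only the last test reports a match, because the unquoted `.` matches the X in `'HX1'`. Keeping the key quoted, as the script does, avoids such accidental matches.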

Last edited by MadeInGermany; 01-09-2020 at 06:09 AM. Reason: break added, cosmetic changes

MadeInGermany

01-09-2020

One line sed option

How about using a single line sed like this:-

Code:

sed "s/^C\(      CALL.*'\(Gmax\|Gmin\|HS10\|HS2\|HS9\|Hmax\)'.*\)/\1/" source_file_name > target_file_name

It's a little messy to read, so:-

- The s command calls substitution.
- The ^ is the start-of-line marker, followed by the literal character C that we want to remove if the line matches the expression between the first / pair.
- The escaped brackets \( and \) wrap a section of the matched line so we can use it later. This outer group is the one we reuse; the alternation further on adds a second, nested group.
- Inside the group come the six spaces and the literal word CALL, so we only match CALL preceded by six spaces.
- We don't care much about what the next part of the line looks like, so we use the single wildcard character . followed by *, which repeats it zero or more times, i.e. any number of characters.
- We then have the literal text 'Gmax' to look for. The ' characters are literals because the whole expression is wrapped in double quotes ("). Your alternative strings need to be grouped and alternated: the group is again wrapped in escaped brackets \( and \) with the strings listed inside, and the alternation separator | also has to be escaped, so you end up with \(Gmax\|Gmin\|HS10\|HS2\|HS9\|Hmax\). (In a basic regular expression a plain | is just a literal character; \| as alternation is a GNU sed extension.)
- We then have the same .* as above to match the rest of the line, and we end the outer group with \).
- After the separating / that ends the expression comes what to substitute. We replace the matched line with the first group, i.e. the part inside the outer \( and \). Here \1 represents that first group, which is everything except the leading C, as required. For completeness, you also have the following available to you:-
  - \0 - the entire text matched (here the whole line; & is the portable way to write this)
  - \1 - the first group matched (in this case the entire line excluding the leading C)
  - \2 - the second group matched, in this case whichever of Gmax, Gmin etc. was found, if that's useful in any way.
- Lines that don't match (no leading C, or not containing one of the keys) are just printed as they are.
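To see what the two groups capture, the replacement can print both of them. A small sketch on the sample line from the question, assuming GNU sed (the `\|` alternation inside a basic regular expression is a GNU extension); `show_groups` is a hypothetical name:

Code:

```bash
#!/bin/bash
# show_groups (hypothetical): rewrite matching lines as "key=<\2> line=<\1>"
# using the same expression as above; needs GNU sed for \| alternation.
show_groups() {
    sed "s/^C\(      CALL.*'\(Gmax\|Gmin\|HS10\|HS2\|HS9\|Hmax\)'.*\)/key=\2 line=\1/"
}

# prints key=Gmax followed by the line without its leading C
printf '%s\n' "C      CALL l_r(DESNAME,DESOUT, 'Gmax',     ESH(10),  NO_APP, JJ)" | show_groups
```

A line whose key is not in the alternation passes through unchanged, which is the "printed as they are" case.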

Does this meet your need? Does the explanation make sense?

You could be brave and use the -i flag and no target file to just update the source file, but I'd recommend testing it first to make sure you are happy.

If the list of alternatives is getting overly complex, you could put them in a reference file, one per line, and build the list for your command, something like:-

Code:

#!/bin/bash

item_list=""
while read item
do
   if [ "${item_list}" = "" ]
   then
      item_list="${item}"
   else
      item_list="${item_list}\|${item}"
   fi
done < reference_file_name

echo "${item_list}"              # Just so you can see

sed "s/^C\(      CALL.*'\(${item_list}\)'.*\)/\1/" source_file_name > target_file_name

Perhaps run this with bash -xv your_script_name to check what it's doing.

I hope that this helps,
Robin

Last edited by rbatte1; 01-09-2020 at 11:47 AM.

01-09-2020

Nice approach indeed!
This could be shortened to the following, which reads the item list file (file2) itself and edits the source file (file1):

Code:

sed -r "/'($(paste -sd\| file2))'/s/^C//" file1
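Unpacked: the command substitution `$(paste -sd\| file2)` joins the lines of the key file with `|`, giving `Gmax|Gmin|...`; `-r` switches GNU sed to extended regexes, so `|` and `( )` need no backslashes (`-E` is the spelling accepted by both GNU and BSD sed); and the address `/'(...)'/` limits `s/^C//` to lines containing one of the quoted keys. A sketch that prints the generated alternation and the result, wrapped in a hypothetical function `uncomment_keys` and using throwaway files from mktemp:

Code:

```bash
#!/bin/bash
# uncomment_keys KEYS SRC (hypothetical name): RudiC's one-liner as a function.
# paste -sd'|' joins the key lines with "|", e.g. "Gmax|HS10"; the address
# /'(...)'/ selects lines containing a quoted key, and s/^C// uncomments them.
uncomment_keys() {
    sed -E "/'($(paste -sd'|' "$1"))'/s/^C//" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' Gmax HS10 > "$tmp/keys"
printf '%s\n' "C      CALL f('Gmax')" "C      CALL f('Gmin')" "C      CALL f('HS10')" > "$tmp/src"
echo "alternation built by paste: $(paste -sd'|' "$tmp/keys")"
uncomment_keys "$tmp/keys" "$tmp/src"
rm -rf "$tmp"
```

The keys go into the regex unescaped, so this relies on them containing no ERE metacharacters, which holds for keys like `Gmax` or `HS10`.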

RudiC

01-09-2020

I defer to your superior offering (and I will steal it if I ever have a similar need).

Kindest regards,
Robin

01-14-2020

Quote (originally posted by RudiC):

Code:

sed -r "/'($(paste -sd\| file2))'/s/^C//" file1

I went with this method, inserted into a script. It worked well (and very quickly) the first time I tried it, but there was no output the second time. I will have to investigate what I did there.

I also made a second try before there were any responses here. It ended up looking more like the code posted by MadeInGermany: I read in the file to be modified and stored it in an array. I then ran a double loop, with the outer loop over my list file and the inner loop over the array holding the file to be modified. Each item in the list was searched for in the lines of the array. When a match was found, that array element was modified to remove the comment and the inner loop was exited with break. The modified array was printed at the end. With this approach each file is read once and the output is written once, instead of once per list item.
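The double loop described above could look roughly like this in bash. It is a reconstruction from the description, not the code actually used; it assumes bash 4 or later for `mapfile`, only strips the C from lines that start with one, and the function name `uncomment_by_array` is hypothetical.

Code:

```bash
#!/bin/bash
# A reconstruction of the double loop described above (not the original code).
# uncomment_by_array LIST SRC (hypothetical name); needs bash 4+ for mapfile.
uncomment_by_array() {
    local key i
    local -a lines
    mapfile -t lines < "$2"                     # read the source file once
    while IFS= read -r key; do                  # outer loop: keys from the list file
        for i in "${!lines[@]}"; do             # inner loop: lines of the source
            # commented line containing the quoted key?
            if [[ ${lines[i]} == C*"'$key'"* ]]; then
                lines[i]=${lines[i]:1}          # drop the leading C
                break                           # each key occurs only once
            fi
        done
    done < "$1"
    printf '%s\n' "${lines[@]}"                 # write the result once
}

# Run only when called with two arguments: list file and source file.
if [[ $# -eq 2 ]]; then
    uncomment_by_array "$1" "$2"
fi
```

In the worst case this still does keys × lines comparisons, but it starts no extra processes and reads and writes each file only once.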

It seems to me that sed must be doing more or less the same thing under the hood. Every list item must be checked against every line in the file to be modified, at least until a match is found. I wasn't able to work out whether it is more efficient to have one file or the other as the inner loop. The only faster approach I could think of would be to identify the 'Gmax' value on each line of the file to be modified and then look up that value in a map holding the list. That would, however, involve much more significant parsing of the lines to extract the 'Gmax' value. It is very nice to have a glob match, especially when there isn't a clear and consistent delimiter. If the list were the inner loop, you could delete each array element when a match was found and thus shorten the search as the process continues, but deleting and shifting array elements also takes resources.
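The map lookup mentioned above may need less parsing than expected: splitting a line on the single-quote character puts every quoted token at an even index, and each token can be checked against an associative array built from the list file. A sketch of that idea in awk, assuming keys never contain a single quote; the function name `uncomment_by_lookup` is hypothetical. Each line then costs one split plus a hash lookup per quoted token, instead of a comparison against every key.

Code:

```bash
#!/bin/bash
# uncomment_by_lookup LIST SRC (hypothetical name): hash-lookup variant.
# Assumes keys never contain a single quote.
uncomment_by_lookup() {
    awk -v q="'" '
    FNR == NR { want[$0] = 1; next }      # list file: remember each key
    /^C/ {                                # only commented lines are candidates
        n = split($0, part, q)            # part[2], part[4], ... are the quoted tokens
        for (i = 2; i <= n; i += 2) {
            if (part[i] in want) { $0 = substr($0, 2); break }
        }
    }
    1                                     # print every line
    ' "$1" "$2"
}

# Run only when called with two arguments: list file and source file.
if [ $# -eq 2 ]; then
    uncomment_by_lookup "$1" "$2"
fi
```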

Does anyone know what sed is doing to achieve the result so quickly? Is it mainly that it is using compiled code?

LMHmedchem
