Script to process a list of items and uncomment lines with that item in a second file


 
# 1  
Old 01-08-2020

Hello,

I have a src code file where I need to uncomment many lines.

The lines I need to uncomment look like,

Code:
C      CALL l_r(DESNAME,DESOUT, 'Gmax',     ESH(10),  NO_APP, JJ)

The comment is the "C" in the first column. This needs to be deleted so that there are 6 spaces preceding "CALL". The key on this line is 'Gmax'. This lets me know that the line needs to be uncommented.

I have a list of such keys
Code:
Gmax
Gmin
HS10
HS2
HS9
Hmax

Each key (including the single quotes) will occur only once in the src file being processed. I need to process the list file to look in the src file and uncomment the proper lines. There are about 300 keys in the list file.

This is what I tried,
Code:
#! /bin/bash

# file with list of items to find in modify_file
list_file=$1
# file for which a modified copy will be created
modify_file=$2
# final output file
new_file='new_'$modify_file

# copy the file being modified
cp -f "$modify_file"  work_on_file.txt

# loop through all the lines in the list_file
while IFS= read -r line
do

   # add single quotes
   look_for=\'$line\'

   echo "looking for line with " $look_for

   # look for the value in the current line
   # if the value is found on a line, print the substring of $0 skipping the first character
   # if the value is not on the line, just print the line
   awk -v find_to_modify="$look_for" ' { if ($0 ~ find_to_modify)
                                            { print substr($0,2); }
                                         else
                                            { print $0; }
                                       } ' work_on_file.txt > new_file

   # rename the modified file from this loop for the next loop
   mv new_file  work_on_file.txt

done < "$list_file"

# rename final copy
mv work_on_file.txt  "$new_file"

This simply reads the list file and, one key at a time, looks for that key in the file to modify. The awk code checks each line for the key (including the single quotes) and, if it is found, prints the substring skipping the first character. When the key is not found on the line, the line is printed unmodified. After the key is processed, the awk-modified file is renamed to be the file awk works on in the next loop.

This works as far as I can tell. I am writing an entire copy of the modified file for each key in the list file, so this is not very efficient. The file renaming at the end of the loop is a bit kludgy as well. This only takes about 7 seconds to run, so maybe I am being picky and should just let it be but I thought I would ask if there were other suggestions.

LMHmedchem
# 2  
Old 01-09-2020
Try


Code:
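awk -v"SQ='" '
# first file (list_file): build one pattern per key, anchored on the leading C
FNR == NR       {PAT[NR]="^C.*" SQ $0 SQ
                 MX = NR
                 next
                }
# second file (modify_file): drop the first character of a matching line
                {for (i=1; i<=MX; i++) if ($0 ~ PAT[i]) {sub (/^./,"")
                                                         break
                                                        }
                }
# print every line, modified or not
1
' list_file modify_file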

This User Gave Thanks to RudiC For This Post:
# 3  
Old 01-09-2020
Or do it all in bash. Here is a well-documented bash version:
Code:
#! /bin/bash
# bash v3 or higher

# file with list of items to find in modify_file
list_file=$1
# file for which a modified copy will be created
modify_file=$2
# final output file
new_file='new_'$modify_file

echo "reading values from '$list_file'"
# clear (and implicitly declare) an array
list=()
while read -r line
do
  echo "adding '$line' to list"
  list+=("$line")
done < "$list_file"

echo "reading '$modify_file'"
# loop through all the lines in the modify_file
while IFS= read -r line <&3
do
   # look for the value in the current line
   # if the value is found on a line, print the substring of $0 skipping the first character
   # otherwise just print the line

   echo "processing line $((++linecnt))"
   # preset the new line
   outline=$line
   # loop through the array values
   for i in "${list[@]}"
   do
      look_for="'$i'"
      # =~ is ERE match, == is glob match
      # $look_for in quotes --> do not interpret special characters in it
      #if [[ $line =~ "$look_for" ]]
      if [[ $line == *"$look_for"* ]]
      then
         outline=${line:1}
         # further matches will do the same, so we can break the loop here
         break
      fi
   done
   # print the new line
   printf "%s\n" "$outline" >&4
done 3< "$modify_file" 4> "$new_file"
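
The comments about =~ versus == and about quoting $look_for can be checked with a short bash sketch. The key 'HS.' used here is hypothetical (it is not in the list from the thread) and was chosen only because it contains a regex metacharacter; the variable names for the results are made up as well. This needs bash 3.2 or later.

```shell
# A sample line and a hypothetical key containing a regex metacharacter (the dot).
line="C      CALL l_r(DESNAME,DESOUT, 'HS2',     ESH(10),  NO_APP, JJ)"
look_for="'HS.'"

# Unquoted right-hand side: treated as an ERE, so the dot matches any character.
if [[ $line =~ $look_for ]]; then regex_unquoted=match; else regex_unquoted=no-match; fi

# Quoted right-hand side: matched as a literal string.
if [[ $line =~ "$look_for" ]]; then regex_quoted=match; else regex_quoted=no-match; fi

# Glob match with the quoted key between two * wildcards: also literal.
if [[ $line == *"$look_for"* ]]; then glob_quoted=match; else glob_quoted=no-match; fi

echo "unquoted =~: $regex_unquoted, quoted =~: $regex_quoted, quoted glob: $glob_quoted"
```

Only the unquoted =~ form matches, because there the dot stands for the 2 in 'HS2'. With plain alphanumeric keys like the ones in the thread all three forms behave the same.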


Last edited by MadeInGermany; 01-09-2020 at 06:09 AM.. Reason: break added, cosmetic changes
This User Gave Thanks to MadeInGermany For This Post:
# 4  
Old 01-09-2020
One line sed option

How about using a single line sed like this:-
Code:
sed "s/^C\(      CALL.*'\(Gmax\|Gmin\|HS10\|HS2\|HS9\|Hmax\)'.*\)/\1/" source_file_name > target_file_name

It's a little messy to read, so:-
  • The s command performs a substitution.
  • The expression between the first pair of / is the condition. It starts with the start-of-line marker ^ followed by the literal character C that we want to remove.
  • The escaped brackets \( and \) wrap a section of the matched line so we can use it later. There are two such groups in this expression: the outer one (everything after the C) and an inner one around the list of keys.
  • Next come the six spaces and the literal word CALL, to be sure we are matching the right kind of line.
  • We don't care much what the next part of the line looks like, so we use the single wildcard character . followed by *, which repeats it zero or more times, so any number of characters.
  • We then have the literal text 'Gmax' to look for. The ' characters are literal because the expression is wrapped in ". The alternative strings need to be grouped, so they are wrapped (again) in escaped brackets \( and \) with the strings listed inside. The alternation separator | also has to be escaped, so it is written as \| (this form is a GNU sed extension).
  • We then have the same .* as above to match the rest of the line, and close the outer group with \).
  • After the separating / that ends the expression comes what to substitute for it. We replace the matched line with the first group, i.e. the bit between the outer \( and \) above. Here \1 represents that first group, which is everything excluding the leading C, as required. For completeness, you also have the following available to you:-
    • & - the entire text matched (GNU sed also accepts \0 for this)
    • \1 - the first group matched (in this case the entire line excluding the leading C)
    • \2 - the second group matched, in this case one of Gmax, Gmin etc., if that's useful in any way.
  • Lines that don't match (no leading C, or not containing one of the keys) are just printed as they are.
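
To see the groups at work, here is a small sketch run against a made-up two-line sample. The sample text, the key 'Zzzz' and the variable names sample and result are assumptions for illustration only, and the replacement deliberately prints both groups instead of just \1. It relies on GNU sed for \| inside the expression.

```shell
# Two sample lines (made up for illustration): one commented CALL with a
# listed key, and one commented CALL whose key is not in the list.
sample="C      CALL l_r(DESNAME,DESOUT, 'Gmax',     ESH(10),  NO_APP, JJ)
C      CALL l_r(DESNAME,DESOUT, 'Zzzz',     ESH(11),  NO_APP, JJ)"

# \1 is everything after the leading C; \2 is the key that matched.
# Both are printed here so the groups are visible.
result=$(printf '%s\n' "$sample" |
   sed "s/^C\(      CALL.*'\(Gmax\|Gmin\)'.*\)/key=\2 line=\1/")

printf '%s\n' "$result"
```

The first line comes back as key=Gmax followed by the line without its leading C; the second line has no listed key, so it is printed untouched.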



Does this meet your need? Does the explanation make sense?


You could be brave and use the -i flag and no target file to just update the source file, but I'd recommend testing it first to make sure you are happy.
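
If you do go the in-place route, sed can keep a copy of the original when -i is given a suffix. A minimal sketch on a throwaway file follows; the temporary file and the .bak suffix are choices made for this example, not something from the thread, and the syntax shown is the GNU sed one.

```shell
# Work on a throwaway file so nothing real is touched.
src=$(mktemp)
printf '%s\n' "C      CALL l_r(DESNAME,DESOUT, 'Gmax',     ESH(10),  NO_APP, JJ)" > "$src"

# -i.bak edits the file in place and keeps the original as "$src.bak".
sed -i.bak "s/^C\(      CALL.*'\(Gmax\|Gmin\)'.*\)/\1/" "$src"

# Compare the two to review exactly what changed
# (diff exits non-zero when the files differ, hence the || true).
diff "$src.bak" "$src" || true
```

Once the diff looks right, the .bak copy can be deleted.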


If the list of alternates is getting overly complex, you could put them in a reference file, one per line, and build the list for your command, something like:-
Code:
#!/bin/bash

item_list=""
while read -r item
do
   if [ "${item_list}" = "" ]
   then
      item_list="${item}"
   else
      item_list="${item_list}\|${item}"
   fi
done < reference_file_name

echo "${item_list}"              # Just so you can see

sed "s/^C\(      CALL.*'\(${item_list}\)'.*\)/\1/" source_file_name > target_file_name


Perhaps run this with bash -xv your_script_name to check what it's doing.


I hope that this helps,
Robin

Last edited by rbatte1; 01-09-2020 at 11:47 AM..
These 3 Users Gave Thanks to rbatte1 For This Post:
# 5  
Old 01-09-2020
Nice approach indeed!
Could be curtailed to
Code:
sed -r "/'($(paste -sd\| file2))'/s/^C//" file1

, including the item list file as well.
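
To see what the command substitution contributes, here is a small sketch with a made-up three-key list. The temporary files and the variable names are assumptions for illustration; in the command above file2 is the list and file1 the source.

```shell
# Made-up list and source files for illustration.
list=$(mktemp)
src=$(mktemp)
printf '%s\n' Gmax Gmin HS10 > "$list"
printf '%s\n' \
   "C      CALL l_r(DESNAME,DESOUT, 'Gmin',     ESH(11),  NO_APP, JJ)" \
   "C     this comment line stays as it is" > "$src"

# paste -s joins all lines of the list into one; -d'|' sets the separator.
alternation=$(paste -sd'|' "$list")
echo "$alternation"

# With -r (extended regex) the | and ( ) need no backslashes.
# The address selects lines holding a quoted key; s/^C// drops the comment marker.
result=$(sed -r "/'($alternation)'/s/^C//" "$src")
printf '%s\n' "$result"

rm -f "$list" "$src"
```

The list collapses to Gmax|Gmin|HS10, so sed sees one alternation however many keys there are. Keys containing regex metacharacters would need escaping first; the keys shown in the thread are plain alphanumerics.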
These 4 Users Gave Thanks to RudiC For This Post:
# 6  
Old 01-09-2020
I defer to your superior offering (and I will steal it if I ever have a similar need).



Kindest regards,
Robin
# 7  
Old 01-14-2020
Quote:
Originally Posted by RudiC
Nice approach indeed!
Could be curtailed to
Code:
sed -r "/'($(paste -sd\| file2))'/s/^C//" file1

, including the item list file as well.
I went with this method inserted into a script. It worked well (and very quickly) the first time I tried it, but there was no output the second time. I will have to investigate what I did there.

I also made a second try before there were any responses here. This ended up looking more like the code posted by MadeInGermany where I read in the file to be modified and stored it in an array. I then did a double loop with the outside loop being my list file and the inside loop being the array with the file to be modified. Each item in the list was searched against the lines in the array. If a match was found, the array element was modified to remove the comment and then there was a break in the inner loop. The modified array was printed at the end. This approach means that each file is read in once and the output was written once, instead of once for each list item.

It seems to me that sed must be doing more or less the same thing under the hood. Every list item must be checked against every item in the file to be modified, at least until a match is found. I wasn't able to rationalize if it was more efficient to have one or the other file be the inner loop. The only approach I could think of that would be faster would be to identify the 'Gmax' value on each line of the file to be modified and then look up that value in a map holding the list. That would, however, involve much more significant parsing of the lines to extract the 'Gmax' value. It is very nice to have a glob match, especially when there isn't a clear and consistent delimiter. If the list was the inner loop, you could delete each array element when a match was found and thus shorten the search as the process continues, but deleting and shifting around array elements also takes resources.
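
The map idea can be sketched in awk, whose arrays are associative. This is only an illustration and rests on an assumption that is not in the thread's code: that a key is always the text between a pair of single quotes, so splitting a line on ' puts the candidate keys in the even-numbered pieces. The file contents, the key 'Zzzz' and the names want and piece are made up for the example.

```shell
# Made-up sample files for illustration.
list=$(mktemp)
src=$(mktemp)
printf '%s\n' Gmax Gmin > "$list"
printf '%s\n' \
   "C      CALL l_r(DESNAME,DESOUT, 'Gmax',     ESH(10),  NO_APP, JJ)" \
   "C      CALL l_r(DESNAME,DESOUT, 'Zzzz',     ESH(11),  NO_APP, JJ)" > "$src"

result=$(awk -v SQ="'" '
# First file: remember every key as an index of the array want.
FNR == NR { want[$0]; next }
# Second file: only commented lines are candidates.
/^C/ {
   n = split($0, piece, SQ)
   # Text between quotes lands in piece[2], piece[4], ...
   for (i = 2; i <= n; i += 2)
      if (piece[i] in want) { $0 = substr($0, 2); break }
}
{ print }
' "$list" "$src")

printf '%s\n' "$result"

rm -f "$list" "$src"
```

Each line then costs one split plus a few hash lookups, instead of a comparison against every key in the list.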

Does anyone know what sed is doing to achieve the result so quickly? Is it mainly that it is using compiled code?

LMHmedchem