Top Forums Shell Programming and Scripting Copy of array by index value fails Post 303012763 by LMHmedchem on Thursday 8th of February 2018 01:58:09 PM
I am using Cygwin under Windows, but I also run the script under openSUSE 13.2.

This is the entire script and it is run with something like,

./_reformat.sh input_file output_file CompoundName Identifier InChI= 015_

It looks for certain conditions and, when they are found, makes some modifications to the record. Most of the files I am processing contain thousands to tens of thousands of records. This is the slow version, which writes each line to the output file as it is processed.
Code:
#!/bin/bash

# file to be processed
input_file=$1
# output file to write
output_file=$2
# sdf tag with name field
name_tag=$3
# sdf tag with substitution field
sub_tag=$4
# string to check for on line following name tag line
check_for=$5
# prefix to add to firstline
prefix=$6

# create output file
touch "$output_file"

# location of line to replace with modified name
replace_line=0
# value collected to build replacement name
sub_value=''
# flag to check next line
check_next=0
# flag to do replacement
replace=0
# flag indicating that the next line should be saved for the sub name
save_next=0

# initialize line counter
i=0

# to preserve spaces
IFS=""

# read file by lines; -r keeps backslashes (used in SMILES strings) literal
while read -r line
do

   # store line in array
   line_array[$i]="$line"
   # increment counter
   i=$((i+1))

   # if check next was set to 1 above, the next line is the one that needs to be evaluated
   if [[ $check_next == "1" ]]; then
      # reset check next, do this here so we reset even if the next line is not a match
      check_next=0
      # check for check_for as part of line
      if [[ $line =~ .*$check_for.* ]]; then
         # save line number
         replace_line=$i
         # set flag to do replacement of name
         replace=1
      fi
   fi

   # find name tag line and check if value on next line includes check_for string
   # check for name_tag as part of line
   if [[ $line =~ .*$name_tag.* ]]; then
      # set flag to check next line
      check_next=1
   fi

   # save the value in the line after sub tag has been found
   # this must come before save_next is set
   if [[ $save_next == "1" ]]; then
      # save the value from this line to use for substitute name
      sub_value=$line
      # reset flag
      save_next=0
   fi

   # look for the line with the sub tag
   if [[ $line =~ .*$sub_tag.* ]]; then
      # set flag to save next line
      save_next=1
   fi

   # when we get to the end of the record
   if [[ $line == '$$$$' ]]; then

      # if replace has been set, make replacements
      if [[ $replace == "1" ]]; then

         # create new first line value from stored substitute value
         new_firstline=$prefix'PubChem_CID_'$sub_value
         # create new name value from stored substitute value
         new_name='PubChem_CID_'$sub_value

         # decrement replace line value by one
         replace_line=$(($replace_line-1))

         # decrement line counter value by one
         i=$(($i-1))

         # loop through stored file
         for ((j=0; j <= $i ; j++)) ; do
            # for the first line, add the new firstline value
            if [[ $j == "0" ]]; then
               printf '%s\n' "$new_firstline" >> "$output_file"
            # when the replace line is found, use the substitute value
            elif [[ $j == "$replace_line" ]]; then
               printf '%s\n' "$new_name" >> "$output_file"
            # output all other lines as normal
            else
               printf '%s\n' "${line_array[$j]}" >> "$output_file"
            fi
         done

      # if replace is not set, output unmodified record
      else
         for ((j=0; j < $i ; j++)) ; do
            printf '%s\n' "${line_array[$j]}" >> "$output_file"
         done
      fi

      # reset for next record
      # line array
      unset line_array
      # line counter
      i=0
      # location of line to replace with modified name
      replace_line=0
      # value collected to build replacement name
      sub_value=''
      # flag to check next line
      check_next=0
      # flag to do replacement
      replace=0
      # flag indicating that the next line should be saved for the sub name
      save_next=0

   fi

done < "$input_file"
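
One likely reason this version is slow is that each `>>` redirection inside the loop reopens the output file for a single line. The following is only a sketch, assuming bash: it buffers one record in an array, writes it with a single `printf`, and applies the output redirection once to the whole loop. The function name `copy_records` and the demo files are hypothetical, and the name substitution logic is left out to keep the idea visible.

```bash
#!/bin/bash
# Hypothetical helper: copy an SDF file record by record, opening the
# output file once for the whole loop instead of once per line.
# Lines after the last '$$$$' are not written in this sketch.
copy_records() {
   local in_file=$1 out_file=$2
   local -a record=()
   local line
   while IFS= read -r line || [[ -n $line ]]; do
      # buffer the current line
      record+=("$line")
      if [[ $line == '$$$$' ]]; then
         # one printf call writes the whole buffered record
         printf '%s\n' "${record[@]}"
         record=()
      fi
   done < "$in_file" > "$out_file"
}

# small demonstration with a dummy two-record file
demo_in=$(mktemp)
demo_out=$(mktemp)
printf '%s\n' 'name1' ' line with  spaces' '$$$$' 'name2' '$$$$' > "$demo_in"
copy_records "$demo_in" "$demo_out"
```

The replacement of the first line and of the name line could be done on the buffered array just before the `printf`, so the rest of the script's logic would not need to change.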

This is an example of input with one record that meets the conditions to be changed,
Code:
015_InChI=1S/C16H9N3O5/c20-16-11-5-14-13(23-7-24-14)4-10(11)15-17-12-2-1-9(19(21)22)3-8(12)6-18(15)16/h1-5H,6-7H2
 OpenBabel05051721102D

 24 28  0  0  0  0  0  0  0  0999 V2000
   -1.3288    3.5365    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3006    2.0368    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4974    1.1324    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9702    1.4167    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.9528    0.2833    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.4626   -1.1343    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.9897   -1.4185    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.0071   -0.2852    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5074   -0.2570    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5170   -1.3527    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.9781   -1.0133    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0026   -2.1090    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.4637   -1.7696    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9004   -0.3346    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.3615    0.0048    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
    6.7981    1.4398    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    7.3859   -1.0909    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    3.8759    0.7611    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.4148    0.4217    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.3904    1.5174    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0708    1.1780    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -5.6593   -2.0386    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -6.8892   -1.1799    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -6.4525    0.2552    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  6  7  1  0  0  0  0
  6 22  1  0  0  0  0
  7  8  2  0  0  0  0
  8  9  1  0  0  0  0
  8  3  1  0  0  0  0
  9 10  2  0  0  0  0
 10 11  1  0  0  0  0
 11 12  2  0  0  0  0
 12 13  1  0  0  0  0
 13 14  2  0  0  0  0
 14 15  1  0  0  0  0
 14 18  1  0  0  0  0
 15 16  2  0  0  0  0
 15 17  1  0  0  0  0
 18 19  2  0  0  0  0
 19 20  1  0  0  0  0
 19 11  1  0  0  0  0
 20 21  1  0  0  0  0
 21  2  1  0  0  0  0
 21  9  1  0  0  0  0
 22 23  1  0  0  0  0
 23 24  1  0  0  0  0
 24  5  1  0  0  0  0
M  CHG  2  15   1  17  -1
M  END
> <order>
281

>  <CompoundName>
InChI=1S/C16H9N3O5/c20-16-11-5-14-13(23-7-24-14)4-10(11)15-17-12-2-1-9(19(21)22)3-8(12)6-18(15)16/h1-5H,6-7H2

>  <Identifier>
101651482

>  <InChI>
InChI=1S/C16H9N3O5/c20-16-11-5-14-13(23-7-24-14)4-10(11)15-17-12-2-1-9(19(21)22)3-8(12)6-18(15)16/h1-5H,6-7H2

>  <InChIKey>
ZIOMULGFTPCQIY-UHFFFAOYSA-N

>  <MolecularFormula>
C16H9N3O5

>  <MonoisotopicMass>
323.0542

>  <SMILES>
C1C2=C(C=CC(=C2)[N+](=O)[O-])N=C3N1C(=O)C4=CC5=C(C=C43)OCO5

$$$$

I was trying to dump the lines of the file to a new array with the code I first posted, but that didn't work.
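
For reference, the code from the first post is not repeated here, so the following is not a diagnosis of it. It is only a minimal sketch, assuming bash, of one common way to copy an indexed array, together with an unquoted variant that often goes wrong. The names `array_copy` and `bad_copy` are made up for the illustration.

```bash
#!/bin/bash
# a small array with embedded spaces and an empty element
line_array=('first line' '  indented line' '' '$$$$')

# quoted [@] expansion copies each element intact
array_copy=("${line_array[@]}")

# unquoted expansion is a common mistake: with the default IFS the
# elements are word-split and empty elements are dropped
bad_copy=(${line_array[@]})

echo "original: ${#line_array[@]} elements"
echo "copy:     ${#array_copy[@]} elements"
echo "bad copy: ${#bad_copy[@]} elements"
```

Note that the quoted form renumbers the elements from 0, which only matters if the source array has gaps in its indices.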

In short, when the value on the line after <CompoundName> contains InChI=, the name value is too long for some of the tools in the chain. I address this by making a new name from the value read from the line following <Identifier> and re-write the record using the substitution name in the required places. If the line following <CompoundName> does not contain InChI=, then the record is written unmodified.
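
The rule in the paragraph above can be sketched as a small bash function. This is an illustration under assumptions, not the script itself: the function name `substitute_name` is hypothetical, the tags are matched with their angle brackets (the script matches the bare tag text), and the InChI string in the demo record is shortened.

```bash
#!/bin/bash
# Hypothetical helper: print the substitute name for one record, or
# print nothing if the record should be written unmodified.
# $1 name tag, $2 substitution tag, $3 string to check for,
# remaining arguments are the lines of the record.
substitute_name() {
   local name_tag=$1 sub_tag=$2 check_for=$3
   shift 3
   local -a rec=("$@")
   local k name_value='' sub_value=''
   for ((k = 0; k < ${#rec[@]} - 1; k++)); do
      # the value of a tag is on the line after the tag line
      [[ ${rec[k]} == *"<$name_tag>"* ]] && name_value=${rec[k+1]}
      [[ ${rec[k]} == *"<$sub_tag>"* ]] && sub_value=${rec[k+1]}
   done
   # only build a new name when the current name contains check_for
   if [[ $name_value == *"$check_for"* && -n $sub_value ]]; then
      printf 'PubChem_CID_%s\n' "$sub_value"
   fi
}

# demo record, data fields only, with a shortened InChI value
record=('>  <CompoundName>' 'InChI=1S/C16H9N3O5' ''
        '>  <Identifier>' '101651482' '' '$$$$')
new_name=$(substitute_name CompoundName Identifier InChI= "${record[@]}")
echo "$new_name"
```

An empty result means the record is left alone; a non-empty result is the value that goes on the line after `<CompoundName>` and, with the prefix added, on the first line of the record.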

This is what the properly modified version of the record would look like,
Code:
015_PubChem_CID_101651482
 OpenBabel05051721102D

 24 28  0  0  0  0  0  0  0  0999 V2000
   -1.3288    3.5365    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3006    2.0368    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4974    1.1324    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9702    1.4167    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.9528    0.2833    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.4626   -1.1343    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.9897   -1.4185    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.0071   -0.2852    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5074   -0.2570    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5170   -1.3527    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.9781   -1.0133    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0026   -2.1090    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.4637   -1.7696    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9004   -0.3346    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.3615    0.0048    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
    6.7981    1.4398    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    7.3859   -1.0909    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    3.8759    0.7611    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.4148    0.4217    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.3904    1.5174    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0708    1.1780    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -5.6593   -2.0386    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -6.8892   -1.1799    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -6.4525    0.2552    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  6  7  1  0  0  0  0
  6 22  1  0  0  0  0
  7  8  2  0  0  0  0
  8  9  1  0  0  0  0
  8  3  1  0  0  0  0
  9 10  2  0  0  0  0
 10 11  1  0  0  0  0
 11 12  2  0  0  0  0
 12 13  1  0  0  0  0
 13 14  2  0  0  0  0
 14 15  1  0  0  0  0
 14 18  1  0  0  0  0
 15 16  2  0  0  0  0
 15 17  1  0  0  0  0
 18 19  2  0  0  0  0
 19 20  1  0  0  0  0
 19 11  1  0  0  0  0
 20 21  1  0  0  0  0
 21  2  1  0  0  0  0
 21  9  1  0  0  0  0
 22 23  1  0  0  0  0
 23 24  1  0  0  0  0
 24  5  1  0  0  0  0
M  CHG  2  15   1  17  -1
M  END
> <order>
281

>  <CompoundName>
PubChem_CID_101651482

>  <Identifier>
101651482

>  <InChI>
InChI=1S/C16H9N3O5/c20-16-11-5-14-13(23-7-24-14)4-10(11)15-17-12-2-1-9(19(21)22)3-8(12)6-18(15)16/h1-5H,6-7H2

>  <InChIKey>
ZIOMULGFTPCQIY-UHFFFAOYSA-N

>  <MolecularFormula>
C16H9N3O5

>  <MonoisotopicMass>
323.0542

>  <SMILES>
C1C2=C(C=CC(=C2)[N+](=O)[O-])N=C3N1C(=O)C4=CC5=C(C=C43)OCO5

$$$$

Sorry for the overly long post. I was trying to solve this myself and thought I just made some syntax error in making a copy of the array.

LMHmedchem
 
