Shell Programming and Scripting: Copy of array by index value fails
Post 303012769 by LMHmedchem on Thursday 8th of February 2018 06:00:21 PM
The files that I am processing are called SDF files, and they contain information about chemical structures, with each structure contained in a record. The first section of the record holds the chemical structure, and the second section holds (can hold) other associated information such as the compound name, identification numbers, measured data, etc., in the form of attribute tags where the tag is on one line and the value is on the next. Unfortunately, the standard is rather loose: the same information can be located in more than one place, and there is no requirement that every record have the same attribute tags or have them in the same order.

Software that reads this type of file is also all over the place as far as where any individual application will look for specific information and what limitations it will have. Since these applications cannot be modified (by me), it is often necessary to modify the input and, as expected, I tend to come to Linux when that happens.

In this case, there is an issue with the chemical name. IUPAC, which creates the nomenclature for chemical names, has not yet come around to understanding that chemical names should be formatted such that they can have a linear notation in standard ASCII or similar. There are many chemical names that cannot be copied and pasted into a computer file name or flat text file. When you get such a name, you need to do something else for that compound. With the data I am working with, someone substituted a different value called the InChi (International Chemical Identifier). This is a computer-compatible string but is unfortunately still not compatible with some applications (it's too long).

Years of working with such data have taught me to avoid names that begin with a number, have special characters, or are longer than 300 characters, but not everyone has come to those same conclusions. I am working with files that are 50+ MB and have thousands of records. There are generally between 50 and 75 records that need to be changed. That's too many to do by hand.

The specific case I am looking for is when the InChi value was used for the chemical name. This is identified by the line following > <CompoundName> containing the string InChi=. Where this is not the case, nothing needs to be done to the record. Where that is the case, I need to create a substitute name from something reasonable. I am using the Identifier value, which is to be found on the line following > <Identifier>.

In short, if the line following > <CompoundName> contains InChi=, I save the value on the line following > <Identifier> and use it to create a new name. That name is written to both the first line of the record (one place where apps look for the name) and to the line following > <CompoundName>. The version on the first line is a bit different but that isn't very important.
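The rule described above could be sketched roughly as follows, using awk called from the shell. This is only an illustration under several assumptions that are not in the post: records are taken to end with a `$$$$` line, the record name is taken to be the first line of each record, the function name `fix_sdf_names` and the `id_` prefix are made up, the same new name is written in both places, and the InChi marker is matched case-insensitively in case the data spells it differently.

```shell
#!/bin/bash
# Sketch only: rename records whose <CompoundName> value is an InChi string.
# Assumed (not from the original post): "$$$$" ends a record, the name is on
# the first line of the record, and the new name is "id_" + Identifier value.
fix_sdf_names() {
    awk -v prefix="id_" '
    { rec[++n] = $0 }                        # buffer the current record line by line
    $0 == "$$$$" {                           # end of record: inspect it, then flush it
        name_at = 0; id = ""
        for (i = 1; i < n; i++) {
            # value line after the CompoundName tag contains "inchi=" (any case)
            if (rec[i] ~ /^> *<CompoundName>/ && index(tolower(rec[i+1]), "inchi="))
                name_at = i + 1
            # value line after the Identifier tag
            if (rec[i] ~ /^> *<Identifier>/)
                id = rec[i+1]
        }
        if (name_at && id != "") {
            rec[1] = prefix id               # name on the first line of the record
            rec[name_at] = prefix id         # value following > <CompoundName>
        }
        for (i = 1; i <= n; i++) print rec[i]
        n = 0                                # start buffering the next record
    }
    END { for (i = 1; i <= n; i++) print rec[i] }   # trailing lines with no terminator
    ' "$@"
}

# Usage (file names are placeholders):
#   fix_sdf_names input.sdf > output.sdf
```

Only one record is held in memory at a time, and records that do not match are printed unchanged. Files with Windows line endings would need the trailing carriage returns removed first, since the `$$$$` comparison is exact.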

My script works, but it can take an hour to process a long file. I thought that I could speed things up by storing the output in an array and then dumping it at the end, as I think this is more or less what apps like awk do. I couldn't get that working.
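As a minimal sketch of the "store it in an array and dump it at the end" idea in bash itself: the file is read once with `mapfile`, elements are changed by index, and everything is written with a single `printf`. The function name and its arguments are hypothetical, and `mapfile` needs bash 4 or later. Whether this is enough to fix the run time is not certain; in scripts like this the slow part is often running external commands or reopening the output file for every line.

```shell
#!/bin/bash
# Sketch only (bash 4+ for mapfile); the function name and arguments are illustrative.
# Usage: replace_line_buffered INPUT OUTPUT LINE_NUMBER NEW_TEXT
replace_line_buffered() {
    local in=$1 out=$2 lineno=$3 text=$4
    local -a lines
    mapfile -t lines < "$in"               # read the whole file into an array in one pass
    lines[lineno-1]=$text                  # array indices are 0-based, line numbers 1-based
    printf '%s\n' "${lines[@]}" > "$out"   # write every element with one redirection
}
```

Quoting `"${lines[@]}"` matters here: without the quotes, lines containing spaces would be split into separate words when the array is written out.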

The number of records that need to be modified is relatively small but the files are big enough to be difficult to manage. The solution should write records that do not comply with the criteria in an unaltered fashion. I have tried to write a version that knows exactly which records need to be modified and so does not process the rest (just writes to output) but that version isn't working yet. It won't be much of an improvement if I can't store the output and have to write it to a file line by line.

LMHmedchem
 
