If statement between different file's arrays


 
# 1  
Old 09-27-2014

Hi guys

Let me first explain what I'm trying to do. My input file looks like this.

Code:
1280
Surfaces
Pt        0.00000000000000    0.00000000000000    0.00000000000000
Pt        2.81138845918055    0.00000000000000    0.00000000000000
Pt        5.62277691836110    0.00000000000000    0.00000000000000
Pt        8.43416537754165    0.00000000000000    0.00000000000000
Pt        1.40569422959028    2.43473382555675    0.00000000000000
Pt        4.21708268877083    2.43473382555675    0.00000000000000
Pt        7.02847114795138    2.43473382555675    0.00000000000000
Pt        0.00000000000000    1.62294556093176    2.30537603326937
Pt        2.81138845918055    1.62294556093176    2.30537603326937
Pt        5.62277691836110    1.62294556093176    2.30537603326937
Pt        8.43416537754165    1.62294556093176    2.30537603326937
Pt        1.40569422959028    4.05767938648850    2.30537603326937
Pt        4.21708268877083    4.05767938648850    2.30537603326937
Pt        7.02847114795138    4.05767938648850    2.30537603326937
Pt        9.83985960713193    4.05767938648850    2.30537603326937
Pt        0.00000000000000    6.49241321204525    2.30537603326937

It goes on for 1280 atoms. I'm trying to make a cluster out of it.
My algorithm for this problem goes as follows.
1. Delete the bottom two layers
2. Prompt for first pivot atom coordinates
3. For the first pivot atom, search for nearest neighbor atoms, put those in a new file.
4. Prompt for the second pivot atom
5. Search for its nearest neighbors, check whether they're already in the new file, and if not, copy those.
Do this for the next 2 pivot atoms too.

I'm sorry for the length, but I'm having problems with more than one step, which is why I laid out that clumsy algorithm.
1. When I try to delete the atoms of the bottom layer:
Code:
doom=($(grep Pt cluster.xyz | awk '{ print $4 }' | grep '[0]\.[0-9]\{14\}')); sed 's/"${doom[*]}"//g' cluster.xyz

It's not working. I also think my approach is wrong: I need to delete the whole line, whereas this line of code, even if it worked, would delete only the z coordinates of the bottom layer.
4. In this step, how do I check whether those nearest-neighbor coordinates are already in the new file? Suppose that for the 2nd pivot atom I put the nearest-neighbor coordinates in 3 new arrays a[i], b[i], and c[i]. I need to check whether they're already in the new file as nearest neighbors of the 1st pivot atom. Let's say I have the new file's coordinates in d[], e[], g[]. Will it work like this?
Code:
if [ ${a[@]} -eq ${d[@]} ] ||  [ ${b[@]} -eq ${e[@]}] && [ ${c[@]} -eq ${g[@]} ] then
      "Copy the coordinates"
else
fi

I don't know how to do the "copy those coordinates" part. I could cat a[], b[] and c[], but they need to be on the same line. And how do I write an if statement for coordinates from 2 different files? a, b, c are in the old file and d, e, g are in the new file. I guess I'm asking for too much; any link to a book or website would be much appreciated. There are so many reading materials out there that I'm just confused.
Thanks a lot guys! Have a nice day!
# 2  
Old 09-27-2014
Quote:
Originally Posted by saleheen
Code:
1280
Surfaces
Pt        0.00000000000000    0.00000000000000    0.00000000000000
Pt        2.81138845918055    0.00000000000000    0.00000000000000
Pt        5.62277691836110    0.00000000000000    0.00000000000000
Pt        8.43416537754165    0.00000000000000    0.00000000000000
Pt        1.40569422959028    2.43473382555675    0.00000000000000
Pt        4.21708268877083    2.43473382555675    0.00000000000000
Pt        7.02847114795138    2.43473382555675    0.00000000000000
Pt        0.00000000000000    1.62294556093176    2.30537603326937
Pt        2.81138845918055    1.62294556093176    2.30537603326937
Pt        5.62277691836110    1.62294556093176    2.30537603326937
Pt        8.43416537754165    1.62294556093176    2.30537603326937
Pt        1.40569422959028    4.05767938648850    2.30537603326937
Pt        4.21708268877083    4.05767938648850    2.30537603326937
Pt        7.02847114795138    4.05767938648850    2.30537603326937
Pt        9.83985960713193    4.05767938648850    2.30537603326937
Pt        0.00000000000000    6.49241321204525    2.30537603326937

It goes on for 1280 atoms. I'm trying to make a cluster out of it.
My algorithm for this problem goes as follows.
1. Delete the bottom two layers
2. Prompt for first pivot atom coordinates
3. For the first pivot atom, search for nearest neighbor atoms, put those in a new file.
4. Prompt for the second pivot atom
5. Search for its nearest neighbors, check whether they're already in the new file, and if not, copy those.
Do this for the next 2 pivot atoms too.
Please bear with me, but my Unix skills are better than my rudimentary knowledge of condensed matter physics.

1) What is - in terms of distinguishing text elements - a "bottom layer"? Can it be described in terms like all lines with "0.0000000" in the fourth field or something such?

2) In the same way of description: what is a "pivot atom"? I guess your table lists platinum atoms and their coordinates, but what makes an atom a "pivot atom"?

3) I guess "nearest neighbor" is probably the one where the absolute value of the sum of the 3 coordinate differences is minimal, yes? Something like

min( || a1 - a2 || + || b1 - b2 || + || c1 - c2 || )

How can there be more than one "nearest neighbors" and how should they be calculated?

Quote:
Originally Posted by saleheen
Code:
doom=($(grep Pt cluster.xyz | awk '{ print $4 }' | grep '[0]\.[0-9]\{14\}')); sed 's/"${doom[*]}"//g' cluster.xyz

I cannot even guess what you are trying to accomplish here. Let me tell you what this thing does:

You filter all the lines containing "Pt" from a file, then use awk to output only the fourth field of these lines. You end up with a stream of lines, each consisting of one coordinate. This stream is piped into another grep, which keeps only the lines containing "0.", followed by 14 other digits. Then you call "sed", but because the sed script is enclosed in single quotes the shell never expands "${doom[*]}": sed gets the text as typed, double quotes included, and treats it as the pattern to search for. No line in your file contains it, so nothing is replaced, the "g" at the end has nothing to act on, and the whole file comes out unchanged.

The "( ... )" around the first pipeline constructs an array "doom" in which every word (separated by blanks) is one element: the elements are the filtered fourth columns from your pipeline. The array is filled, but nothing ever uses its contents. Depending on the shell you use it might even run out of space, because e.g. ksh88 has a limitation of 1024 elements in an array.
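
The quoting effect can be reproduced on a single line. This is only a minimal sketch, assuming a POSIX shell and sed; the variable names "line" and "z" and the sample line are made up for the illustration:

```shell
# One sample line in the format of the input file.
line='Pt 1.40569422959028 2.43473382555675 0.00000000000000'
z='0\.00000000000000'   # regexp for the z value; the dot is escaped

# Single quotes: the shell hands ${z} to sed untouched, so nothing matches.
single=$(printf '%s\n' "$line" | sed 's/"${z}"//g')

# Double quotes: the shell expands ${z} before sed sees the script.
double=$(printf '%s\n' "$line" | sed "s/${z}//g")

printf 'single: %s\n' "$single"
printf 'double: %s\n' "$double"
```

Even the double-quoted variant only blanks out the coordinate; it does not remove the line.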


Quote:
Originally Posted by saleheen
It's not working. And also I think I'm wrong here, I need to delete the whole line, if this line of code would work it'd delete only the z coordinates of the bottom layer.
No. It won't delete anything, just declare an array variable.

I am aware you had a few more questions, but please specify your requirements first. It makes sense to talk about implementation specifics only after this is covered.

I hope this helps.

bakunin
# 3  
Old 09-27-2014
Thanks a lot bakunin! I knew I should be more clear.
1. Yes, the bottom 2 layers are the atoms whose z coordinates start with 0.***** & 2.*****. There are two more layers of atoms above these.
2. I just made up the pivot atoms keyword, what it means is I have some adsorbed species on surface of the catalyst layer. Pivot atoms are those directly under the adsorbed species.
3. You're right, nearest neighbor distance is
Code:
sqrt[(a1-a2)^2 + (b1-b2)^2 +(c1-c2)^2]

For Pt it's 2.82 angstrom. My thinking is that if the above value is within 2.82 angstrom, the code should copy those coordinates. About more than one nearest neighbor, think of it like this: I have 2 layers of atoms, and one of my pivot atoms is on the top layer. I'm looking for all the atoms within a 2.82 angstrom radius, which also includes the atoms of the lower layer that are within 2.82 angstrom of the pivot atom. Now when I choose the second pivot atom (which is itself a neighbor of the 1st pivot atom), some of its nearest neighbors might already be copied because they were nearest neighbors of the 1st pivot atom.
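
The distance test and the "already in the new file" check described above could be sketched with awk roughly as follows. This is an illustration under stated assumptions, not a tested workflow: the function name add_neighbors, its argument order and the default cutoff of 2.82 are chosen here. Lines are compared as whole strings, which works because they are copied verbatim from the same input file, so no floating-point comparison between the two files is needed.

```shell
# add_neighbors INPUT OUTPUT X Y Z [CUTOFF]   (hypothetical helper)
# Append every Pt line of INPUT within CUTOFF of the pivot (X, Y, Z) to
# OUTPUT, skipping lines that OUTPUT already contains.
add_neighbors() {
    src=$1 dst=$2
    touch "$dst"
    awk -v px="$3" -v py="$4" -v pz="$5" -v cutoff="${6:-2.82}" '
        # First file (the output so far): remember every line already copied.
        FILENAME == ARGV[1] { seen[$0] = 1; next }
        # Second file (the input): test each Pt line against the pivot.
        $1 == "Pt" {
            dx = $2 - px; dy = $3 - py; dz = $4 - pz
            if (sqrt(dx*dx + dy*dy + dz*dz) <= cutoff && !($0 in seen)) {
                print
                seen[$0] = 1
            }
        }
    ' "$dst" "$src" > "$dst.new"
    cat "$dst.new" >> "$dst"
    rm -f "$dst.new"
}
```

The pivot coordinates could come from a prompt, e.g. read -r x y z followed by add_neighbors cluster.xyz neighbors.xyz "$x" "$y" "$z", once per pivot atom. The pivot atom itself has distance 0 and is therefore copied too.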

Code:
doom=($(grep Pt cluster.xyz | awk '{ print $4 }' | grep '[0]\.[0-9]\{14\}')); sed 's/"${doom[*]}"//g' cluster.xyz

In this line of code I was trying to store those coordinates in the "doom" array and replace them with blank spaces. But now that I think of it, that's not what I wanted: I wanted to remove those lines as a whole, not just those coordinates.

What I'm trying to do is run aqueous-phase calculations on a cluster of atoms rather than on periodic systems. Every time, I have to build this cluster manually, which is very time consuming. I also have to extend this line of thought to my MD simulation, but that's for later. I guess I'm stuck at the very first step, i.e. deleting the bottom two layers of atoms. Sorry if I made my thinking more complex. Thanks a lot man! Really appreciate it!
# 4  
Old 09-28-2014
Quote:
Originally Posted by saleheen
1. Yes, the bottom 2 layers are the atoms whose z coordinates start with 0.***** & 2.*****. There are two more layer of atoms above these.
OK, step by step! Try this one (replace "<t>" by literal tab characters):
Code:
sed '/Pt[ <t>]*\([0-9]\.[0-9]\{14\}[ <t>]*\)\{2\}[0-2]/ s/$/  bottom!/' /path/to/input > /path/to/output

This should mark the bottom lines with the word "bottom!". I find it easier to verify something i see, therefore i would first modify the lines, verify that there are no false positives, false negatives, etc., and only then proceed. If everything is indeed correct, replace the substitution with a simple delete-command:

Code:
sed '/Pt[ <t>]*\([0-9]\.[0-9]\{14\}[ <t>]*\)\{2\}[0-2]/d' /path/to/input > /path/to/output

You surely want to know how this works: we start with the innermost part:

Code:
[0-9]\.[0-9]\{14\}[ <t>]*

This matches a digit, followed by a ".", followed by another 14 digits, followed by any amount of whitespace (i do not know if your numbers are separated by blanks or tabs). This matches one coordinate.

Code:
Pt[ <t>]*\(<coordinate>\)\{2\}[0-2]

This matches a "Pt", followed by whitespace, followed by two such coordinates, followed by a digit between 0 and 2, hence a 0,1 or 2 in the first digit of the z-coordinate. This should identify the lines of the bottom layers.
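
To see the pattern at work, it can be run on two sample lines. A sketch, assuming a sed that understands the POSIX class [[:space:]], which stands in here for the literal blank and tab so that nothing has to be typed by hand. The second sample line (y = 0.81147278046588, z = 4.61075206653874) is made up for the demo:

```shell
# One line from a bottom layer (z = 2.3...) and one from a higher layer.
sample='Pt        0.00000000000000    1.62294556093176    2.30537603326937
Pt        1.40569422959028    0.81147278046588    4.61075206653874'

# Append a marker to every line whose third coordinate starts with 0, 1 or 2.
marked=$(printf '%s\n' "$sample" |
    sed '/Pt[[:space:]]*\([0-9]\.[0-9]\{14\}[[:space:]]*\)\{2\}[0-2]/ s/$/  bottom!/')

printf '%s\n' "$marked"
```

Only the first line should end in "bottom!"; the second is left alone because its z coordinate starts with a 4.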

We use this regexp in a "one-address-command" of sed. The command (in our case a substitution or a deletion) is executed only on a line which matches the regexp:

Code:
/^a/ s/x/y/g

Change all "x" to "y", but only in lines which begin with "a".

If this does what you need we proceed to the next part of your problem.
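
Should the regexp turn out to be fragile (the pattern expects exactly one digit before each decimal point, so a line whose x or y is something like 43.**** is not matched), a numeric comparison on the fourth field is one alternative. A sketch only: the function name drop_bottom and the threshold 3 are assumptions, the latter chosen because the bottom layers have z values starting with 0. and 2.:

```shell
# drop_bottom FILE [ZMAX]   (hypothetical helper)
# Print FILE without the Pt lines whose z coordinate (4th field) is below
# ZMAX. Header lines (atom count, title) pass through unchanged.
drop_bottom() {
    awk -v zmax="${2:-3}" '!($1 == "Pt" && $4 + 0 < zmax + 0)' "$1"
}
```

Usage would be along the lines of drop_bottom /path/to/input > /path/to/output. Note that the atom count on the first line is passed through as it is and is not adjusted to the number of remaining atoms.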

I hope this helps.

bakunin
# 5  
Old 09-28-2014
Thanks a lot bakunin. Honestly, I'm not clear about the line of code you've given. Too dumb to understand these awesome little magics :) That's probably why I couldn't make it work.

1.
Code:
sed '/Pt[ <t>]*\([0-9]\.[0-9]\{14\}[ <t>]*\)\{2\}[0-2]/

In this line
Code:
sed '/Pt[ <t>]*

what's the last "*" for? And about the <t> after Pt: in my input file, the blank space after 'Pt' consists of one <t> plus 2 blank spaces. Is that a problem? I tried with both but it didn't work.

2.
Code:
\([0-9]\.[0-9]\{14\}[ <t>]*\)

In continuation of the code, what's the first delimiter "\" for? And does this part mean the x & y coordinates of the line after 'Pt'? When you said
Quote:
This matches one coordinate.
you didn't mean it matches the z coordinate, right? At the end there's a tab, which I think stands for the blank space between the x & y coordinates. I have 4 blank spaces between coordinates, so I tried that (I tried the <t> too). Also, my x and y coordinates can have 1 or 2 digits before the decimal point, i.e. it can be 43.**** or 5.****. So in the code should it be
Code:
\([0-9][0-9]\.[0-9]\{14\}[ <t>]*\)

to account for the 2 digits before the decimal point?
At the end there's a "*" and a delimiter "\". What do these do?

3.In the last part
Code:
\{2\}[0-2]/ s/$/  bottom!/

Why is there a delimiter "\" at the front? Was this delimiter introduced to mean that there are some blank spaces between the y coordinate and the 1st digit of the z coordinate, i.e. 0/2?

4.
Quote:
Pt[ <t>]*\(<coordinate>\)\{2\}[0-2]

This matches a "Pt", followed by white space, followed by two such coordinates, followed by a digit between 0 and 2, hence a 0,1 or 2 in the first digit of the z-coordinate. This should identify the lines of the bottom layers.
I understand that by the <coordinate> part you meant
Code:
\([0-9][0-9]\.[0-9]\{14\}[ <t>]*\)

but how does it indicate that there are two such coordinates (x & y) before the first digit of the z coordinate (0/2)?

Sorry to give you a hard time man. Again thanks a lot!
# 6  
Old 09-28-2014
Time for a little introduction to regular expressions:

Regular expressions (or "Regexps") are a tool for matching patterns in texts. They consist of
  • ordinary characters
  • metacharacters

Ordinary characters just stand for themselves - "a" means "search for an a".

Metacharacters change the way ordinary characters are interpreted. I will describe a few, but you are encouraged to research them; the web is full of articles about them.

[...]
defines a "character class". Any one character inside will be matched, but only one of them. Example:

Code:
a[bc]d

will match "abd" and "acd", but not "abcd" or "abbd", etc.. It is possible to reverse the meaning by using the caret "^" as the first character. This means every character NOT listed in the class:

Code:
a[^bc]d

This will match "axd" and "avd" - anything but "abd" and "acd". It is also possible to list consecutive characters by a "-": "[a-z]" means any (lowercase) character a-z, "[0-9]" means any digit 0-9. Notice that i used "[ <t>]" (left bracket, space, tab, right bracket) in my expression above! That defines a character class which matches any whitespace, blanks OR tabs.
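
The two classes can be tried out with grep. A small sketch, assuming a grep with basic regular expressions; the option -x makes grep accept a line only if the whole line matches, and the variable names are made up:

```shell
# Candidate strings, one per line.
words='abd
acd
abcd
axd'

class=$(printf '%s\n' "$words" | grep -x 'a[bc]d')      # one of b or c
negated=$(printf '%s\n' "$words" | grep -x 'a[^bc]d')   # anything but b or c

printf 'a[bc]d matches:\n%s\n' "$class"
printf 'a[^bc]d matches:\n%s\n' "$negated"
```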


*

This acts as a multiplier to the expression before, meaning "zero or more of whatever the expression is". For example:

Code:
ab*c

means an a, followed by zero or more b's, followed by a c. The strings "abc", "abbbbc" and even "ac" (zero or more!) are matched, but not "axc". If you combine this with the character classes you may get:

Code:
x[abc]*y

x, followed by any number of a's, b's or c's, followed by a y. It matches "xabcaababbbabby" and "xay", "xby" and "xcy" and even "xy" (again: zero or more), but not "xdy".
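
Both asterisk examples can be checked the same way. A sketch with grep -x (whole-line matching); the variable names are made up:

```shell
# "ab*c": an a, any number of b's (possibly none), then a c.
star=$(printf '%s\n' abc abbbbc ac axc | grep -x 'ab*c')

# "x[abc]*y": an x, any mix of a, b and c (possibly empty), then a y.
class_star=$(printf '%s\n' xabcaababbbabby xay xy xdy | grep -x 'x[abc]*y')

printf '%s\n' "$star" "$class_star"
```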


\( ... \)

This is a device for grouping and acts similar to brackets in math: it does nothing itself, but it groups together what is inside, for instance to make it manipulatable by the asterisk. Example:

Code:
\(abc\)*d

means any number of repetitions of the string "abc", followed by a "d". This matches "abcd" and "abcabcd", also "d", but not "abd", because before the "d" there is no "abc". Groupings can be nested.
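
A quick check of the grouping example, again as a sketch with made-up variable names. The whole-line option -x matters here: a plain grep without it would also report the line "abd", because that line contains a "d" preceded by zero repetitions of "abc".

```shell
# Whole-line match: only complete repetitions of "abc" may precede the d.
group=$(printf '%s\n' abcd abcabcd d abd | grep -x '\(abc\)*d')

# Unanchored search: every line containing a d qualifies, "abd" included.
loose=$(printf '%s\n' abcd abcabcd d abd | grep -c '\(abc\)*d')

printf '%s\n' "$group"
printf 'lines found without -x: %s\n' "$loose"
```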


\{n\} , \{m,n\}

This is also a multiplier for the expression before, like the asterisk, but limited by one or two numbers. The one-number variant means exactly that many reiterations, the two-number variant means between m and n reiterations. Example:

Code:
ab\{2\}c

is the same as "abbc". This one:

Code:
ab\{2,4\}c

matches "abbc", "abbbc" and "abbbbc" and nothing else. Again, this can be combined with other expressions, like classes. I used:

Code:
[0-9]\{14\}

to match exactly 14 digits. I could have written

Code:
[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]

for the same effect.
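
The counted repetitions can be tried out like this (a sketch with grep -x for whole-line matching; the variable names and the test strings are made up, the 14-digit string being the fractional part of one of the coordinates):

```shell
# Exactly two b's.
exact=$(printf '%s\n' abc abbc abbbc | grep -x 'ab\{2\}c')

# Between two and four b's.
range=$(printf '%s\n' abc abbc abbbc abbbbc abbbbbc | grep -x 'ab\{2,4\}c')

# Exactly 14 digits.
digits=$(printf '%s\n' 81138845918055 123 | grep -x '[0-9]\{14\}')

printf '%s\n' "$exact" "$range" "$digits"
```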


\

The backslash is part of some control constructs, as you have seen above. Apart from that it is the "escape character". You have seen that several characters - metacharacters - do not stand for themselves but for something different. The backslash strips this special meaning from them and makes them ordinary characters again. Like here:

Code:
ab\*c

Because the asterisk is escaped it is meant literally. This regexp matches the string "ab*c". Or this:

Code:
[0-9]\.[0-9]\{14\}

Usually the full stop is a metacharacter and means any one character. But here i want to use it literally and therefore escape it. The expression means a digit, followed by a full stop (decimal point), followed by exactly 14 other digits.
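
The difference the backslash makes can be seen with a few test strings. A sketch using grep -x (whole-line matching); the variable names and the string "2x81138845918055" are made up for the comparison:

```shell
# Escaped asterisk: only the literal string "ab*c" matches.
literal_star=$(printf '%s\n' 'ab*c' abc abbc | grep -x 'ab\*c')

# Escaped full stop: a real decimal point is required.
coord=$(printf '%s\n' 2.81138845918055 2x81138845918055 | grep -x '[0-9]\.[0-9]\{14\}')

# Unescaped full stop: any character is accepted, so both lines are counted.
unescaped=$(printf '%s\n' 2.81138845918055 2x81138845918055 | grep -cx '[0-9].[0-9]\{14\}')

printf '%s\n' "$literal_star" "$coord" "$unescaped"
```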

With this you should be able to decipher the sed script yourself.

I hope this helps.

bakunin

Last edited by bakunin; 09-28-2014 at 04:52 PM..