Hello everybody,
I'm still slowly treading my way into bash scripting (without any prior programming experience), and hence my code is mostly what some might call "creative" if they meant well.
I have created a script that serves its purpose, but it does so very slowly: it needs to work its way through ~1 million lines of text file input and does so with a "while read line" loop, which slows the process down terribly.
If it's in any way possible I'd like to speed up the script and would appreciate any suggestions you may have.
It is supposed to work through column 5 of a text file (.pdb, a Protein Data Bank file) called "atoms_done". That column holds integer values that represent a molecule number. There are several molecule types (column 4) in the file (see example below).
The molecule number (column 5) increases until the value 9999 is reached, after which it jumps back to 0 and is incremented again until 9999. I want to rewrite or edit the existing file to correct just that, so that the numbering keeps increasing above 9999 until the end of the file is reached.
ATOM 11393 OW SOL 1997 10.570 25.370 66.140
ATOM 11394 HW1 SOL 1997 10.990 24.510 65.850
ATOM 11395 HW2 SOL 1997 11.260 25.970 66.540
ATOM 11396 OW SOL 1998 26.270 16.020 58.330
ATOM 11397 HW1 SOL 1998 27.210 16.140 58.670
ATOM 11398 HW2 SOL 1998 25.800 16.900 58.370
ATOM 11399 OW SOL 1999 7.760 28.120 61.090
ATOM 11400 HW1 SOL 1999 6.970 28.740 61.090
ATOM 11401 HW2 SOL 1999 8.260 28.210 61.950
ATOM 11402 OW SOL 2000 36.170 4.250 62.330
ATOM 11403 HW1 SOL 2000 35.280 3.810 62.490
ATOM 11404 HW2 SOL 2000 36.030 5.100 61.830
ATOM 11405 C1 MeO 2001 19.100 14.520 124.300
ATOM 11406 O1 MeO 2001 19.850 14.620 123.120
ATOM 11407 HO1 MeO 2001 19.520 15.360 122.630
ATOM 11408 HC1 MeO 2001 18.190 13.930 124.210
ATOM 11409 HC2 MeO 2001 19.740 14.210 125.120
ATOM 11410 HC3 MeO 2001 18.730 15.500 124.600
ATOM 11411 C1 MeO 2002 19.280 3.410 94.800
ATOM 11412 O1 MeO 2002 20.380 3.410 95.710
ATOM 11413 HO1 MeO 2002 21.020 3.970 95.290
ATOM 11414 HC1 MeO 2002 18.320 3.220 95.290
Thank you for any helpful comments and suggestions.
It's not the read that's slow, it's "echo stuff | awk". You do that twice, so for a file with a million lines, you're running two million separate instances of awk! awk is a full-fledged language in its own right which you're loading, running, and quitting each time you use it. It's capable of processing more than one line, which is the efficient way to use it -- instead of 99% time spent loading/quitting, most time will be spent actually processing. You might as well do it all in awk.
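To illustrate the "do it all in awk" idea, here is a minimal sketch of a single awk process that fixes the numbering in one pass. It is not the original poster's script. It assumes whitespace-separated columns as in the sample above, that a drop in column 5 always means the counter wrapped from 9999 to 0, and that collapsing the spacing to single spaces is acceptable (a strict fixed-width PDB file would need printf formatting instead). The function name renumber is made up for this example.

```shell
#!/bin/sh
# renumber: read PDB-like lines on stdin, write corrected lines to stdout.
# Assumption: column 5 is the molecule number and only ever drops when it wraps.
renumber() {
    awk '
        # Pass anything that is not an ATOM record through untouched.
        $1 != "ATOM" { print; next }
        {
            n = $5 + 0                               # force a numeric value
            if (seen && n < prev) offset += 10000    # counter wrapped around
            prev = n
            seen = 1
            $5 = n + offset    # assigning a field rebuilds the line with single spaces
            print
        }
    '
}
```

It would be used as something like `renumber < atoms_done > atoms_fixed` (the output name is arbitrary), so awk is started once for the whole file instead of once or twice per line.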
You also have lots of useless use of backticks. Why do var=`echo stuff` when you can just do var=stuff ?
Incidentally, the shell can split variables by itself. You could do while read V1 V2 V3 V4 V5 V6 V7 to get rid of that first awk.
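As a rough sketch of that last point, read can split each line into named variables directly, with no awk and no backticks. The variable names and the function name split_demo are invented for this example; the sample lines have eight columns, so eight names are used here (with fewer names, the last variable simply receives the rest of the line).

```shell
#!/bin/sh
# split_demo: print the molecule type and molecule number of each input line.
# read does the field splitting itself; no external command is started per line.
split_demo() {
    while read -r rec serial atom mol num x y z; do
        printf '%s %s\n' "$mol" "$num"
    done
}
```

Called as `split_demo < atoms_done`, this prints lines such as "SOL 1997" for the sample data.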
You could change:
to:
to have the Korn shell perform integer math inline and speed things up even more. The difference it makes is actually quite surprising. Consider this example:
Output:
While not a huge difference for this simple example, it could make an impact if multiple calculations are being done on millions of records.
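The snippets from this post are not shown here, so the following is only a guess at the kind of change meant: replacing a call to the external expr command with the shell's built-in arithmetic expansion, which works in ksh, bash, and any POSIX sh. Both function names are made up for the illustration.

```shell
#!/bin/sh
# Two ways to add 10000 to a value; both print the same result.
add_with_expr() {
    expr "$1" + 10000          # starts an external process on every call
}
add_builtin() {
    echo $(( $1 + 10000 ))     # evaluated by the shell itself, no new process
}
```

Wrapping each variant in a loop and running it under `time` should show the gap on a given system; no timings are claimed here.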
It's not the read that's slow, it's "echo stuff | awk".(...) It's capable of processing more than one line, which is the efficient way to use it -- instead of 99% time spent loading/quitting, most time will be spent actually processing. (...) You also have lots of useless use of backticks. (...) Incidentally, the shell can split variables by itself. You could do while read V1 V2 V3 V4 V5 V6 V7 to get rid of that first awk.
This was extremely helpful and will certainly improve my scripting attempts in the future. Thanks a bunch.
Just for the sake of some other rookie searching on the same issue, this is the code that worked for me...