How to get this script to work on multiple input files
Post 302448467 by Daniel8472 on Thursday 26th of August 2010 05:51:47 AM

Hello guys!

I would like to use awk to perform data extraction from several files. The data files look like this:

Code:
 DWT26R 1 PEP1 CA 1 OH2 SKIPPED: 0 STEP: 1
0.29000E+01 0.55005E-02 0.60012E-03
0.30000E+01 0.11149E+00 0.13603E-01
0.31000E+01 0.39719E+00 0.63013E-01
0.32000E+01 0.94264E+00 0.18784E+00
0.33000E+01 0.17744E+01 0.43749E+00
0.35000E+01 0.32350E+01 0.13273E+01
0.36000E+01 0.34913E+01 0.19104E+01
.
.
.

The first line is unique to each file and contains information I would like to add to the output. I need to search for the highest value in $2 and print it together with the first line of that file. Then the next file needs to be processed the same way.

For a single file it works fine, but how can I do this with multiple files? I think I somehow need to assign the information from the unique first line to the values of each file and store it in an array. At the end I would simply print that array containing this information. However, I have not been able to get it to work so far.

The current code that works for a single file is:

Code:
BEGIN     {
    print "trajectory= traj molecules= mol Peptide= pep resid(CA?)= res contact= so (max)solv/sphere= n Radius(A)= r";
    print "traj", "mol", "pep", "res", "co", "n", "     r"; #just a header for the output
    }



# need to read substrings in order to evaluate the exponential notation
    {
    if (NR==1)    {
            expo=0;
            comp=0;
            co=0;
            max=0;
            maxline=0;    
            traj=$2;
            mol=$1;
            pep=$3;
            res=$5;
            so=$6;
            } #saving file information and resetting comparison set
    else         {
            expo=10^(substr($2,9,3)); #extract exponent
            comp=(substr($2,3,5)/100000); 
            co=comp*expo;
            if (co > max) {max=co; maxline=substr($1,3,5)/100000*10^(substr($1,9,3))} # extract highest value from file
            }
    }



END     { 
    print traj, mol, pep, res, so, max, maxline; #print highest value and information from the first line
    }

Hope you guys can help me out.

Cheers,
Daniel
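
One possible way to handle several files in a single awk run is to test FNR instead of NR: FNR restarts at 1 for every input file, so the header line of each file can trigger both the report for the previous file and the reset for the new one. The sketch below is only an illustration under some assumptions: the wrapper name max_per_file is made up, every file is assumed to start with a header line like the one shown above, and the substr() arithmetic is dropped because awk normally converts strings such as 0.55005E-02 to numbers by itself when you add 0. The explanatory header from the BEGIN block is left out to keep it short.

```shell
# max_per_file is a hypothetical wrapper name, not part of the original script.
# Prints one line per input file: traj mol pep res so max r
max_per_file() {
    awk '
    # Print the result for the file that has just been finished (if any).
    function report() {
        if (seen) print traj, mol, pep, res, so, max, maxline
    }

    # FNR is 1 on the first line of every file, unlike NR.
    FNR == 1 {
        report()                       # flush the previous file
        mol = $1; traj = $2; pep = $3; res = $5; so = $6
        max = ""; maxline = ""; n = 0  # reset the comparison state
        seen = 1
        next
    }

    # Data lines: adding 0 forces numeric conversion of the E-notation.
    NF >= 2 {
        v = $2 + 0
        if (n == 0 || v > max) { max = v; maxline = $1 + 0 }
        n++
    }

    # The last file has no following header, so report it here.
    END { report() }
    ' "$@"
}
```

It could then be called with all data files at once, for example max_per_file file1.dat file2.dat (the file names are placeholders). Storing everything in an array and printing it in END would also work, but printing at each file boundary avoids keeping all results in memory.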
 
