Merge two text files of different size on a common index
I have two text files. Text file 1:
Text file 2:
I need to merge the two, keeping only the rows that appear in both files (the shorter list could serve as the index). The column filePath is the index, so the final file should look like this:
I am guessing this could be done in awk, and certainly in Perl, but I'm not sure how to do the alignment by the index.
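The example contents of both files are missing above, so here is a minimal Python sketch under stated assumptions: both files are whitespace-separated, each starts with a header row, and both headers contain a column literally named filePath (like the `id filePath MDL_NUMBER` layout in a related thread further down). The names `read_table` and `inner_join` are made up for illustration. The idea is an inner join: index one file by filePath in a dictionary, then scan the other file and print only the rows whose key is found.

```python
def read_table(lines):
    """Split whitespace-separated lines into (header, rows), skipping blank lines."""
    rows = [line.split() for line in lines if line.strip()]
    return rows[0], rows[1:]


def inner_join(lines1, lines2, key="filePath"):
    """Keep only rows whose `key` value occurs in both inputs.

    File 2 is loaded into a dict keyed on its `key` column (pass the shorter
    file here to keep memory small); file 1 is then scanned once, so the
    output keeps file 1's row order. If a key repeats in file 2, the last
    occurrence wins.
    """
    head1, rows1 = read_table(lines1)
    head2, rows2 = read_table(lines2)
    k1, k2 = head1.index(key), head2.index(key)
    index = {row[k2]: row for row in rows2}

    def rest(row):
        # File 2's columns without the duplicated key column
        return [v for i, v in enumerate(row) if i != k2]

    merged = [head1 + rest(head2)]
    for row in rows1:
        match = index.get(row[k1])
        if match is not None:  # row appears in both files
            merged.append(row + rest(match))
    return merged
```

Usage, with hypothetical file names: `with open("file1.txt") as f1, open("file2.txt") as f2:` then `for row in inner_join(f1, f2): print("\t".join(row))`. A dictionary lookup avoids sorting either file, which is what makes files of different size and order line up.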
Hi,
I am facing a problem merging two files using awk.
The problem is as follows:
file1:
A|B|C|D|E|F|G|H|I|1
M|N|O|P|Q|R|S|T|U|2
AA|BB|CC|DD|EE|FF|GG|HH|II|1
....
....
....
file2:
1|Mn|op|qr
I'm running on FreeBSD, with a default shell of csh.
I have two files named A and B. Each line of each file contains a file name. How can I write a script that removes all the file names in file B from A?
I tried to use Perl to create a huge regular expression with "|" separating the file...
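Instead of one huge alternation regex, a set lookup does the same job in linear time. This is a minimal Python sketch, assuming one file name per line and exact whole-line matches; `remove_listed` is a hypothetical name.

```python
def remove_listed(names_a, names_b):
    """Return the file names in A that do not appear in B.

    B is loaded into a set, so each check is an average O(1) lookup rather
    than a match against a regex with thousands of alternatives. A's order
    is preserved; surrounding whitespace and blank lines are ignored.
    """
    drop = {line.strip() for line in names_b if line.strip()}
    return [line.strip() for line in names_a
            if line.strip() and line.strip() not in drop]
```

From the shell, `grep -vxFf B A` gives the same result: -f reads the patterns from B, -F treats them as fixed strings, -x requires whole-line matches, and -v inverts the selection.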
I have 100 data files labelled 250.1.txt through 250.100.txt. The second columns of the data files partially match (there is about 90% overlap). Each data file has 4 columns.
I want to merge all these text files by the matching values in the second column. In the output, the first column should...
Hi,
I have two files that I would like to merge and think that there should be a solution using awk. The files look something like this:
file 1
IDX1 IDY1
IDX2 IDY2
IDX3 IDY3
file 2
IDY1 dataA data1
IDY2 dataB data2
IDY3 dataC data3
Desired output
IDX1 IDY1 dataA data1
IDX2 ...
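The desired output above is a lookup join: the second field of file 1 is looked up against the first field of file 2. A minimal Python sketch, assuming whitespace-separated fields and that lines without a match are dropped; `attach_data` is a hypothetical name.

```python
def attach_data(pairs_lines, data_lines):
    """For each 'IDX IDY' line of file 1, append the fields that follow IDY in file 2."""
    data = {}
    for line in data_lines:
        fields = line.split()
        if fields:
            data[fields[0]] = fields[1:]  # IDY -> [dataA, data1, ...]
    out = []
    for line in pairs_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] in data:  # skip IDYs missing from file 2
            out.append(" ".join(fields + data[fields[1]]))
    return out
```

The same idea in awk reads file 2 first and keys an array on its first field: `awk 'NR==FNR {k=$1; $1=""; d[k]=$0; next} ($2 in d) {print $0 d[$2]}' file2 file1`.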
I need to merge two files on the value of an index column.
input file 1
id filePath MDL_NUMBER
1 MFCD00008104.mol MFCD00008104
2 MFCD00012849.mol MFCD00012849
3 MFCD00037597.mol MFCD00037597
4 MFCD00064558.mol MFCD00064558
5 MFCD00064559.mol MFCD00064559
input file 2
...
Hi,
I have two files, A (2190 rows) and B (1100 rows). I want to merge the contents of the two files based on a common field, and I also need the unmatched rows from file A.
file A:
ABC
XYZ
PQR
file B:
>LMN|chr1:11000-12456:
>ABC|chr15:176578-187678:
>PQR|chr3:14567-15866:
output...
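Keeping the unmatched rows of A makes this a left join. The expected output is cut off above, so this Python sketch assumes that matched names should be followed by their location from B (tab-separated) and that unmatched names are printed on their own. It also assumes every record in B looks like `>NAME|location:` as in the example. `merge_keep_unmatched` is a hypothetical name.

```python
def merge_keep_unmatched(names_a, records_b):
    """Left join: every name in A is kept; names found in B get B's location."""
    loc = {}
    for line in records_b:
        line = line.strip()
        if line.startswith(">") and "|" in line:
            name, rest = line[1:].split("|", 1)  # '>ABC|chr15:...' -> 'ABC', 'chr15:...'
            loc[name] = rest
    out = []
    for line in names_a:
        name = line.strip()
        if not name:
            continue
        # Matched rows carry the location; unmatched rows from A are kept as-is
        out.append(f"{name}\t{loc[name]}" if name in loc else name)
    return out
```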
Hi all,
Say I have multiple files x1 x2 x3 x4, all with a common header (date, time, year, age).
How can I merge them into one single file "X" in shell scripting?
Thanks for your suggestions.
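Merging files that share a header means writing the header once and then appending every file's lines after the first. A minimal Python sketch, assuming the header is exactly the first line of each file; the check that all headers match is an extra safeguard, and `concat_with_header` is a hypothetical name.

```python
def concat_with_header(files_lines):
    """Concatenate tables that share a header line, keeping the header once.

    files_lines: one list of lines per input file (x1, x2, ...), in order.
    Raises ValueError if a file's first line differs from the first header.
    """
    out = []
    header = None
    for lines in files_lines:
        if not lines:
            continue  # skip empty files
        if header is None:
            header = lines[0]
            out.append(header)
        elif lines[0] != header:
            raise ValueError("header mismatch: %r" % lines[0])
        out.extend(lines[1:])  # data rows only
    return out
```

In a POSIX shell the same result comes from `head -n 1 x1 > X; for f in x1 x2 x3 x4; do tail -n +2 "$f" >> X; done`.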
Hi, I am trying to selectively merge two files based on keys reported in the 1st column.
File1:
#file1-header1
file1-header2
111 qwe rtz uio
198 asd fgh jkl
165 yxc
789 poi uzt rew
89 lkj
File2:
#file2-header2
file2-header2
165 ghz nko2 ...
Hello,
I have 40 data files in which the first three columns are the same (in theory) and the 4th column differs. Here is an example of three files:
file 2: A_f0_r179_pred.txt
Id Group Name E0
1 V N(,)'1 0.2904
2 V N(,)'2 0.3180
3 V N(,)'3 0.3277
4 V N(,)'4 0.3675
5 V N(,)'5 0.3456
...
Discussion started by: LMHmedchem
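For the 40-file case, where the first three columns should agree and only the 4th differs, one approach is to key every row on its first three fields and place the 4th columns side by side. A minimal Python sketch, assuming a header row like `Id Group Name E0` in each file and that a row missing from any file should be dropped. The per-file labels used as column names (for example, the file names) and the function name `paste_fourth_columns` are hypothetical.

```python
def paste_fourth_columns(tables):
    """Combine files whose first three columns agree, one 4th column per file.

    tables: list of (label, lines) pairs; lines include the header row.
    Rows are matched on their first three fields rather than by position,
    so files in a different row order still line up. Keys missing from any
    file are dropped (intersection). Output follows the first file's order.
    """
    keyed = []
    for _label, lines in tables:
        rows = [ln.split() for ln in lines[1:] if ln.strip()]
        keyed.append({tuple(r[:3]): r[3] for r in rows})
    first_rows = [ln.split() for ln in tables[0][1][1:] if ln.strip()]
    header = tables[0][1][0].split()[:3] + [label for label, _ in tables]
    out = [header]
    for r in first_rows:
        key = tuple(r[:3])
        if all(key in k for k in keyed):
            out.append(list(key) + [k[key] for k in keyed])
    return out
```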
g_msd(1)    GROMACS suite, VERSION 4.5.4-dev-20110404-bc5695c    g_msd(1)
NAME
g_msd - calculates mean square displacements
VERSION 4.5.4-dev-20110404-bc5695c
SYNOPSIS
g_msd -f traj.xtc -s topol.tpr -n index.ndx -o msd.xvg -mol diff_mol.xvg -pdb diff_mol.pdb -[no]h -[no]version -nice int -b time -e time
-tu enum -[no]w -xvg enum -type enum -lateral enum -[no]ten -ngroup int -[no]mw -[no]rmcomm -tpdb time -trestart time -beginfit time
-endfit time
DESCRIPTION
g_msd computes the mean square displacement (MSD) of atoms from a set of initial positions. This provides an easy way to compute the
diffusion constant using the Einstein relation. The time between the reference points for the MSD calculation is set with -trestart. The
diffusion constant is calculated by least-squares fitting a straight line (D*t + c) through the MSD(t) from -beginfit to -endfit (note
that t is time from the reference positions, not simulation time). An error estimate is given, which is the difference of the diffusion
coefficients obtained from fits over the two halves of the fit interval.
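The procedure in the paragraph above can be sketched in plain Python: average squared displacements over reference points, fit a line, and compare fits over the two halves of the interval. This is an illustration under stated assumptions, not GROMACS code. The reference-point spacing is given in frames instead of ps, atoms are weighted equally (as with -nomw), "10%" and "90%" are taken as fractions of the longest lag time, and D is the slope divided by 2 times the number of dimensions (the textbook Einstein relation). All function names are hypothetical.

```python
def msd_curve(positions, dt, restart):
    """MSD versus lag time, averaged over atoms and over time origins.

    positions: list of frames, each a list of (x, y, z) tuples for the same atoms.
    dt: time between frames. restart: a new reference point is taken every
    `restart` frames (a frame-based stand-in for -trestart).
    """
    nframes, natoms = len(positions), len(positions[0])
    sums, counts = [0.0] * nframes, [0] * nframes
    for t0 in range(0, nframes, restart):
        ref = positions[t0]
        for lag in range(nframes - t0):
            sq = 0.0
            for (x, y, z), (x0, y0, z0) in zip(positions[t0 + lag], ref):
                sq += (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
            sums[lag] += sq / natoms
            counts[lag] += 1
    times = [lag * dt for lag in range(nframes)]
    return times, [s / c for s, c in zip(sums, counts)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + c; returns (a, c)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def diffusion_with_error(times, msd, ndim=3, begin=0.1, end=0.9):
    """Fit MSD(t) between begin*t_max and end*t_max; return (D, error estimate).

    The error estimate is |D(first half) - D(second half)| of the fit interval.
    """
    t_lo, t_hi = begin * times[-1], end * times[-1]
    pts = [(t, m) for t, m in zip(times, msd) if t_lo <= t <= t_hi]
    mid = 0.5 * (t_lo + t_hi)

    def d_of(sub):
        xs, ys = zip(*sub)
        return fit_line(xs, ys)[0] / (2 * ndim)  # slope = 2 * ndim * D

    d = d_of(pts)
    first = [p for p in pts if p[0] <= mid]
    second = [p for p in pts if p[0] >= mid]
    return d, abs(d_of(first) - d_of(second))
```

With a single atom whose squared distance from its start grows as 6t (dt = 1, one reference point), this recovers D = 1 with an error estimate of essentially zero, because the MSD is exactly linear.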
There are three mutually exclusive options to determine different types of mean square displacement: -type, -lateral and -ten. Option
-ten writes the full MSD tensor for each group, the order in the output is: trace xx yy zz yx zx zy.
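For reference, here is the textbook form of the Einstein relation that the description relies on. It is stated as an assumption and is not taken from the GROMACS source. The averages run over atoms i and reference points t_0, and n_d is the number of dimensions in which displacement is measured, which is presumably how -type and -lateral enter. Read this way, the "D*t + c" above is shorthand for a line whose slope is 2 n_d D. -ten, which writes tensor components, is not covered by this scalar form.

```latex
\mathrm{MSD}(t) = \bigl\langle \lvert \mathbf{r}_i(t_0 + t) - \mathbf{r}_i(t_0) \rvert^2 \bigr\rangle_{i,\,t_0}
\approx 2\, n_d\, D\, t + c,
\qquad
n_d =
\begin{cases}
1 & \text{with } \texttt{-type x}, \texttt{y} \text{ or } \texttt{z},\\
2 & \text{with } \texttt{-lateral x}, \texttt{y} \text{ or } \texttt{z},\\
3 & \text{otherwise.}
\end{cases}
```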
If -mol is set, g_msd plots the MSD for individual molecules (including making molecules whole across periodic boundaries): for each
individual molecule a diffusion constant is computed for its center of mass. The chosen index group will be split into molecules.
The default way to calculate an MSD is by using mass-weighted averages. This can be turned off with -nomw.
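The difference between the default and -nomw can be written out as follows. This assumes the usual definition of a mass-weighted average over the N atoms i of the chosen group, with masses m_i, at one reference time t_0. The man page itself does not give the formula.

```latex
\mathrm{MSD}_{\mathrm{mw}}(t) = \frac{\sum_{i=1}^{N} m_i \,\lvert \mathbf{r}_i(t_0 + t) - \mathbf{r}_i(t_0) \rvert^2}{\sum_{i=1}^{N} m_i},
\qquad
\mathrm{MSD}_{\mathrm{nomw}}(t) = \frac{1}{N} \sum_{i=1}^{N} \lvert \mathbf{r}_i(t_0 + t) - \mathbf{r}_i(t_0) \rvert^2
```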
With the option -rmcomm, the center of mass motion of a specific group can be removed. For trajectories produced with GROMACS this is
usually not necessary, as mdrun usually already removes the center of mass motion. When you use this option, be sure that the whole system is
stored in the trajectory file.
The diffusion coefficient is determined by linear regression of the MSD, where, unlike for the normal output of D, the times are weighted
according to the number of reference points, i.e. short times have a higher weight. Also when -beginfit=-1, fitting starts at 10% and when
-endfit=-1, fitting goes to 90%. Using this option one also gets an accurate error estimate based on the statistics between individual
molecules. Note that this diffusion coefficient and error estimate are only accurate when the MSD is completely linear between -beginfit
and -endfit.
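The weighting described above can be sketched as a weighted least-squares line fit. This is a minimal illustration that assumes each MSD point's weight is simply the number of reference points that contributed to it, so short lags with many origins count more. `weighted_line_fit` is a hypothetical helper, not a GROMACS function.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a*x + b; returns (a, b).

    ws: one non-negative weight per point, e.g. the number of reference
    points averaged into each MSD value.
    """
    sw = sum(ws)
    mx = sum(w * x for x, w in zip(xs, ws)) / sw  # weighted mean of x
    my = sum(w * y for y, w in zip(ys, ws)) / sw  # weighted mean of y
    sxx = sum(w * (x - mx) ** 2 for x, w in zip(xs, ws))
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in zip(xs, ys, ws))
    a = sxy / sxx
    return a, my - a * mx
```

On exactly linear data every weighting gives the same line. The weights only matter when the MSD curve bends, which is also the case the man page warns about.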
Option -pdb writes a .pdb file with the coordinates of the frame at time -tpdb, with the square root of each molecule's diffusion
coefficient in the B-factor field. This option implies option -mol.
FILES
-f traj.xtc Input
Trajectory: xtc trr trj gro g96 pdb cpt
-s topol.tpr Input
Structure+mass(db): tpr tpb tpa gro g96 pdb
-n index.ndx Input, Opt.
Index file
-o msd.xvg Output
xvgr/xmgr file
-mol diff_mol.xvg Output, Opt.
xvgr/xmgr file
-pdb diff_mol.pdb Output, Opt.
Protein data bank file
OTHER OPTIONS
-[no]h no
Print help info and quit
-[no]version no
Print version info and quit
-nice int 19
Set the nice level
-b time 0
First frame (ps) to read from trajectory
-e time 0
Last frame (ps) to read from trajectory
-tu enum ps
Time unit: fs, ps, ns, us, ms or s
-[no]w no
View output .xvg, .xpm, .eps and .pdb files
-xvg enum xmgrace
xvg plot formatting: xmgrace, xmgr or none
-type enum no
Compute diffusion coefficient in one direction: no, x, y or z
-lateral enum no
Calculate the lateral diffusion in a plane perpendicular to: no, x, y or z
-[no]ten no
Calculate the full tensor
-ngroup int 1
Number of groups to calculate MSD for
-[no]mw yes
Mass weighted MSD
-[no]rmcomm no
Remove center of mass motion
-tpdb time 0
The frame to use for option -pdb (ps)
-trestart time 10
Time between restarting points in trajectory (ps)
-beginfit time -1
Start time for fitting the MSD (ps), -1 is 10%
-endfit time -1
End time for fitting the MSD (ps), -1 is 90%
SEE ALSO
gromacs(7)
More information about GROMACS is available at <http://www.gromacs.org/>.
Mon 4 Apr 2011 g_msd(1)