merge two text files of different size on common index Post: 302518644

Post 302518644 by LMHmedchem on Saturday 30th of April 2011 10:22:03 PM

I have two text files.

text file 1:

Code:

ID  filePath       col1      col2      col3
1   10584588.mol   269.126   190.958   23.237
2   10584549.mol   281.001   200.889   27.7414
3   10584511.mol   408.824   158.316   29.8561
4   10584499.mol   245.632   153.241   25.2815
5   10584459.mol   290.476   133.699   28.631
6   10584426.mol   440.552   150.846   30.1827
7   10584298.mol   243.248   164.409   21.5715
8   10584286.mol   283.078   230.034   24.3697
9   10584278.mol   287.807   198.625   27.7414
10  10584197.mol   224.356   184.317   24.3616

text file 2:

Code:

ID   filePath       SUB_ID     ChBrg_REGID
1    10584588.mol   10584588   9070369
2    10584549.mol   10584549   9070193
3    10584499.mol   10584499   9069982
4    10584459.mol   10584459   9069773
5    10584426.mol   10584426   9069641
6    10584278.mol   10584278   9069060
7    10584197.mol   10584197   9068744

I need to merge the two, keeping only the rows that appear in both files (the shorter list could serve as the index). The column filePath is the index, and the ID values should be the ones from text file 1, so the final file should look like this:

Code:

ID  filePath       SUB_ID     ChBrg_REGID   col1      col2      col3
1   10584588.mol   10584588   9070369       269.126   190.958   23.237
2   10584549.mol   10584549   9070193       281.001   200.889   27.7414
4   10584499.mol   10584499   9069982       245.632   153.241   25.2815
5   10584459.mol   10584459   9069773       290.476   133.699   28.631
6   10584426.mol   10584426   9069641       440.552   150.846   30.1827
9   10584278.mol   10584278   9069060       287.807   198.625   27.7414
10  10584197.mol   10584197   9068744       224.356   184.317   24.3616

I am guessing this could be done in awk, and certainly in perl, but I'm not sure how to do the alignment by the index.
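To make the alignment concrete, here is a minimal sketch of the join in Python rather than awk or perl. It assumes both files are whitespace-delimited with filePath in the second column, and that file 2 is small enough to hold in memory. The function name merge_on_filepath and the inline sample rows are only for illustration.

```python
def merge_on_filepath(lines1, lines2):
    """Inner-join two whitespace-delimited tables on their second column.

    Rows follow the order of the first table and keep its ID column;
    the extra columns of the second table are placed before the
    remaining columns of the first, as in the desired output above.
    """
    # Index table 2 by filePath, keeping only the columns after ID and filePath.
    extra = {}
    for line in lines2:
        fields = line.split()
        if len(fields) >= 2:
            extra[fields[1]] = fields[2:]

    merged = []
    for line in lines1:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key = fields[1]
        if key in extra:  # keep only rows present in both tables
            merged.append(fields[:2] + extra[key] + fields[2:])
    return merged


if __name__ == "__main__":
    # A few rows copied from the two files above (assumed layout).
    file1 = [
        "ID  filePath       col1      col2      col3",
        "1   10584588.mol   269.126   190.958   23.237",
        "3   10584511.mol   408.824   158.316   29.8561",
        "4   10584499.mol   245.632   153.241   25.2815",
    ]
    file2 = [
        "ID   filePath       SUB_ID     ChBrg_REGID",
        "1    10584588.mol   10584588   9070369",
        "3    10584499.mol   10584499   9069982",
    ]
    for row in merge_on_filepath(file1, file2):
        print("\t".join(row))
```

Because the header lines share the key "filePath", they are merged like any other row, so the combined header comes out without special handling. The same idea (load one file into a lookup table keyed on the index, then stream the other) should carry over to an awk associative array or a perl hash.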

LMHmedchem
