Post 302734633 by bala06 on Thursday 22nd of November 2012 01:04:02 PM
[Solved] Removing duplicates from the file and saving as new file

Dear All

I have 200 data files, and each file contains many duplicate lines.

I am looking for an automated awk script that checks each file, removes the duplicates, and saves the result as a new file in its respective folder, for all 200 files.

For example, my data looks like this:


Code:
HETATM 4427 LPA    1     1       5.210   9.727   8.104  1.00  0.00      PROB
HETATM 4428 LPA    1     1       5.151   9.153   8.365  1.00  0.00      PROB
HETATM 4429 LPA    1     1       2.339   7.349   5.955  1.00  0.00      PROB
HETATM 4430 LPA    1     1       2.144   8.104   5.275  1.00  0.00      PROB
HETATM 4431 LPA    1     1       2.473   8.896   5.218  1.00  0.00      PROB
HETATM 4498 LPA    1     1       1.679   7.107   7.511  1.00  0.00      PROB
HETATM 4506 LPA    1     1       2.001   8.185   5.346  1.00  0.00      PROB
HETATM 4507 LPA    1     1       2.363   7.711   4.485  1.00  0.00      PROB
HETATM 4427 LPA    1     1       5.210   9.727   8.104  1.00  0.00      PROB


I have to remove the repeated line (here the record "4427" appears twice) and save the result as a new file.

Kindly advise.

Many Thanks
Balaji

Last edited by Corona688; 11-22-2012 at 02:39 PM..
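
One possible approach, as a minimal sketch rather than a definitive answer: awk can keep the first occurrence of every line with the `!seen[$0]++` idiom, and a small shell function can apply that to every file under a directory. The function name `dedup_dir`, the `*.pdb` file pattern, and the `.dedup` suffix for the new files are all assumptions made for illustration; the original post does not say how the files are named.

```shell
#!/bin/sh
# Sketch only. Hypothetical names: dedup_dir, the "*.pdb" default pattern,
# and the ".dedup" output suffix are assumptions, not from the original post.

# dedup_dir DIR [PATTERN]
# For every regular file under DIR whose name matches PATTERN, write a copy
# without repeated lines to "<file>.dedup" in the same folder.
dedup_dir() {
    dir=$1
    pattern=${2:-*.pdb}
    # Skip earlier ".dedup" outputs so a re-run never processes them.
    find "$dir" -type f -name "$pattern" ! -name '*.dedup' |
    while IFS= read -r f; do
        # seen[$0]++ evaluates to 0 the first time a line is met, so
        # !seen[$0]++ is true only for first occurrences; original order
        # is preserved and the input file is left unchanged.
        awk '!seen[$0]++' "$f" > "$f.dedup"
    done
}
```

Called as `dedup_dir /path/to/data`, this writes `name.pdb.dedup` beside each original. It treats two lines as duplicates only when they match character for character, including spacing. If duplicates should instead be judged by the serial number alone, `awk '!seen[$2]++'` would key on the second field, but that is a different rule and would also drop lines that share a number while differing elsewhere.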
 
