How can I remove those duplicate sequence in UNIX? What command line I should type?
Post 302279203 by Franklin52 on Thursday 22nd of January 2009 05:24:45 AM
Something like this?

Code:
awk '/>/{s=$0;next}!a[$0]++{print s;print}' file
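For anyone reading along, here is the same idea spelled out with comments. This is only a sketch: the function name dedup_fasta and the sample records are made up for illustration, and it assumes every record is exactly two lines (a header starting with > followed by the whole sequence on one line). Unlike the one-liner, the header pattern here is anchored with ^, so a > elsewhere in a line is not treated as a header.

```shell
# dedup_fasta is a hypothetical helper name, not part of any tool.
# It reads the named files, or standard input if none are given.
dedup_fasta() {
  awk '
    # Header line: remember it and move on to the next line.
    /^>/ { header = $0; next }

    # Sequence line: seen[$0]++ is 0 (false) the first time a sequence
    # appears, so the negation is true only for the first occurrence.
    !seen[$0]++ {
      print header   # header that belongs to this first occurrence
      print          # the sequence itself
    }
  ' "$@"
}

# Made-up sample: seq1 and seq2 carry the same sequence.
printf '%s\n' '>seq1' 'ACGT' '>seq2' 'ACGT' '>seq3' 'TTGA' | dedup_fasta
```

With the sample above, the output is the four lines >seq1, ACGT, >seq3, TTGA. The second ACGT is dropped together with its header, so the first occurrence wins. If the sequences wrap over several lines, they would have to be joined into one line per record first.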

Regards
 
