Sponsored Content
Top Forums Shell Programming and Scripting delete duplicated characters in each line Post 302382024 by ivpz on Monday 21st of December 2009 09:33:57 PM
Old 12-21-2009
delete duplicated characters in each line

I'm a biologist trying to analyse some data and I'll appreciate some help with the following problem. I have a column of characters which I'll like to delete the duplicated characters in each line and report only the unique one.No sorting should be done. E.g.

The original data:

GTG
CTC
CTC
CTC
GCGAGC
GCGAGC
GCGAGC
GCGAGC
GATGTG
GATGTG
GATGTG
GATGTG
A
A
C

And I'm hoping to get:

G/T/
C/T/
C/T/
C/T/
G/C/A/
G/C/A/
G/C/A/
G/C/A/
G/A/T/
G/A/T/
G/A/T/
G/A/T/
A/
A/
C/

I've tried using tr, awk with if conditions but getting nowhere.

Thank you.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

grep and delete 2nd duplicated of txt... -part2

Hi, I find out one problem is...the main point is we must delete 2nd duplicated of word in txt file. For example apple orange pink green orange yellow orange red output should be: apple orange pink green yellow orange (16 Replies)
Discussion started by: happyv
16 Replies

2. Shell Programming and Scripting

Delete new line characters from a file

Hi, I have a file with about 25 colums separated with '~', but few of the lines have extra tabs ('^') and new line characters ('$'). Is there a way I can delete those characters if they are anywhere before the 25th column in a line? example: CLUB000650;12345678;0087788667;NOOP MEMBER ... (4 Replies)
Discussion started by: rudoraj
4 Replies

3. Shell Programming and Scripting

Delete characters from each line

Hi, I have a file that has data in the following manner, tt_0.00001.dat 123.000 tt_0.00002.dat 124.000 tt_0.00002.dat 125.000 This is consistent for all the entries in the file. I want to delete the 'tt_' and '.dat' from each line. Could anyone please guide me how to do this using awk or... (2 Replies)
Discussion started by: lost.identity
2 Replies

4. Shell Programming and Scripting

delete first 2 characters for each line, please help

hi, ./R1_970330_210505.sard ./R1_970403_223412.sard ./R1_970626_115235.sard ./R1_970626_214344.sard ./R1_970716_234214.sard ... ... ... for these strings, i wanna remove the ./ for each line how can i do that? i know it could possibly be done by sed, but i really have not idea how... (4 Replies)
Discussion started by: sunnydanniel
4 Replies

5. Shell Programming and Scripting

Delete characters from each line until meet character ":"

Hello, I have file that looks like this : 765327564:line1 94:line2 7865:line3 ..... 765322:linen I want to cut all the digits from the beginning of each line up to ":" character and to have everything like this : line1 line2 line3 ..... linen P.S : content of line1 ...... (8 Replies)
Discussion started by: black_fender
8 Replies

6. Shell Programming and Scripting

How delete characters of specific line with sed?

Hi, I have a text file with some lines like this: /MEDIA/DISK1/23568742.MOV /MEDIA/DISK1/87456321.AVI /MEDIA/DISK2/PART1/45753131.AVI /IMPORT/44452.WAV ... I want to remove the last 12 characters in each line that it ends "AVI". Should look like this: /MEDIA/DISK1/23568742.MOV... (12 Replies)
Discussion started by: inaki
12 Replies

7. UNIX for Dummies Questions & Answers

How to delete text between two characters in line?

Hi, I have a large text file with the following format: >gi|347545744|gb|JN204951.1| Dismorphia spio voucher 5 ATCAAATTCCTTCCTCTCCTTAAA >gi|17544664774|gb|WN204922.32| Rodapara nigens gene region CCGGGCAAATTCCTTCCTCTCCTTAAA >gi|555466400|gb|SG255122.8| Bombyx mandariana genbank 3... (1 Reply)
Discussion started by: euspilapteryx
1 Replies

8. Shell Programming and Scripting

Delete duplicated fields in a line

Hi, I have files with this kind of format (separator is space): A1 B1 C1 D1 E1 F1 D1 C1 G1 H1 A2 B2 C2 D2 E2 F2 D2 C2 G2 H2 A3 B3 C3 D3 E3 F3 G3 D3 C3 H3 A4 B4 C4 D4 E4 F4 G4 D4 C4 H4 I want the output to be: A1 B1 E1 F1 G1 H1 A2 B2 E2 F2 G2 H2 A3 B3 E3 F3 G3 H3 A4 B4 E4 F4 G4... (12 Replies)
Discussion started by: Gr4wk
12 Replies

9. Shell Programming and Scripting

How to delete 'duplicated' column values and make a delimited file too?

Hi, I have the following output from an Oracle SQL statement and I want to remove duplicated column values. I know it is possible using Oracle analytical/statistical functions but unfortunately I don't know how to use any of those. So now, I've gone to PLAN B using awk/sed maybe or any... (5 Replies)
Discussion started by: newbie_01
5 Replies

10. Shell Programming and Scripting

Remove duplicated records and update last line record counts

Hi Gurus, I need to remove duplicate line in file and update TRAILER (last line) record count. the file is comma delimited, field 2 is key to identify duplicated record. I can use below command to remove duplicated. but don't know how to replace last line 2nd field to new count. awk -F","... (11 Replies)
Discussion started by: green_k
11 Replies
expand(1)							   User Commands							 expand(1)

NAME
expand, unexpand - expand TAB characters to SPACE characters, and vice versa SYNOPSIS
expand [-t tablist] [file]... expand [-tabstop] [-tab1, tab2,. . ., tabn] [file]... unexpand [-a] [-t tablist] [file]... DESCRIPTION
The expand utility copies files (or the standard input) to the standard output, with TAB characters expanded to SPACE characters. BACKSPACE characters are preserved into the output and decrement the column count for TAB calculations. expand is useful for pre-processing character files (before sorting, looking at specific columns, and so forth) that contain TAB characters. unexpand copies files (or the standard input) to the standard output, putting TAB characters back into the data. By default, only leading SPACE and TAB characters are converted to strings of tabs, but this can be overridden by the -a option (see the OPTIONS section below). OPTIONS
The following options are supported for expand: -t tablist Specifies the tab stops. The argument tablist must consist of a single positive decimal integer or multiple posi- tive decimal integers, separated by blank characters or commas, in ascending order. If a single number is given, tabs will be set tablist column positions apart instead of the default 8. If multiple numbers are given, the tabs will be set at those specific column positions. Each tab-stop position N must be an integer value greater than zero, and the list must be in strictly ascending order. This is taken to mean that, from the start of a line of output, tabbing to position N causes the next character output to be in the (N+1)th column position on that line. In the event of expand having to process a tab character at a position beyond the last of those specified in a multiple tab-stop list, the tab character is replaced by a single space character in the output. -tabstop Specifies as a single argument, sets TAB characters tabstop SPACE characters apart instead of the default 8. -tab1,tab2,...,tabn Sets TAB characters at the columns specified by -tab1,tab2,...,tabn The following options are supported for unexpand: -a Inserts TAB characters when replacing a run of two or more SPACE characters would produce a smaller output file. -t tablist Specifies the tab stops. The option-argument tablist must be a single argument consisting of a single positive decimal inte- ger or multiple positive decimal integers, separated by blank characters or commas, in ascending order. If a single number is given, tabs will be set tablist column positions apart instead of the default 8. If multiple numbers are given, the tabs will be set at those specific column positions. Each tab-stop position N must be an integer value greater than zero, and the list must be in strictly ascending order. This is taken to mean that, from the start of a line of output, tabbing to posi- tion N will cause the next character output to be in the (N+1)th column position on that line. When the -t option is not specified, the default is the equivalent of specifying -t 8 (except for the interaction with -a, described below). No space-to-tab character conversions occur for characters at positions beyond the last of those specified in a multiple tab-stop list. When -t is specified, the presence or absence of the -a option is ignored; conversion will not be limited to the processing of leading blank characters. OPERANDS
The following ooperand is supported for expand and unexpand: file The path name of a text file to be used as input. ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of expand and unexpand: LANG, LC_ALL, LC_CTYPE, LC_MESSAGES, and NLSPATH. EXIT STATUS
The following exit values are returned: 0 Successful completion >0 An error occurred. ATTRIBUTES
See attributes(5) for descriptions of the following attributes: +-----------------------------+-----------------------------+ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | +-----------------------------+-----------------------------+ |Availability |SUNWesu | +-----------------------------+-----------------------------+ |CSI |enabled | +-----------------------------+-----------------------------+ |Interface Stability |Standard | +-----------------------------+-----------------------------+ SEE ALSO
tabs(1), attributes(5), environ(5), standards(5) SunOS 5.11 1 Feb 1995 expand(1)
All times are GMT -4. The time now is 11:21 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy