Post 302382024 by ivpz on Monday 21st of December 2009 09:33:57 PM (Shell Programming and Scripting)
delete duplicated characters in each line

I'm a biologist trying to analyse some data, and I'd appreciate some help with the following problem. I have a column of character strings, and for each line I'd like to delete the duplicated characters and report only the unique ones, in the order they first appear, each followed by a slash. No sorting should be done. For example:

The original data:

GTG
CTC
CTC
CTC
GCGAGC
GCGAGC
GCGAGC
GCGAGC
GATGTG
GATGTG
GATGTG
GATGTG
A
A
C

And I'm hoping to get:

G/T/
C/T/
C/T/
C/T/
G/C/A/
G/C/A/
G/C/A/
G/C/A/
G/A/T/
G/A/T/
G/A/T/
G/A/T/
A/
A/
C/

I've tried using tr and awk with if conditions, but I'm getting nowhere.

Thank you.
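A minimal Python sketch of one way to do this, assuming one sequence per line and that the kept characters should appear in order of first occurrence, each followed by "/", as in the expected output above. It is not from the thread, and the function names unique_chars, reduce_lines and filter_stream are hypothetical. For context, tr -s only squeezes runs of adjacent identical characters, so it would leave a line like GATGTG untouched; the task needs a per-line record of characters already seen, which the dict below provides.

```python
from typing import Iterable, Iterator, TextIO


def unique_chars(seq: str, sep: str = "/") -> str:
    """Return the distinct characters of `seq` in order of first
    appearance, each followed by `sep`, e.g. "GATGTG" -> "G/A/T/"."""
    # dict keys keep insertion order (guaranteed since Python 3.7), so
    # dict.fromkeys() drops repeated characters without any sorting.
    return "".join(ch + sep for ch in dict.fromkeys(seq))


def reduce_lines(lines: Iterable[str], sep: str = "/") -> Iterator[str]:
    """Yield one reduced string per input line. Surrounding whitespace,
    including the trailing newline, is stripped first; an empty line
    yields an empty string so output stays aligned with input rows."""
    for raw in lines:
        yield unique_chars(raw.strip(), sep)


def filter_stream(src: TextIO, dst: TextIO) -> None:
    """Hypothetical entry point: read sequences from `src` (an open file
    or sys.stdin) and write the reduced lines to `dst`."""
    for out in reduce_lines(src):
        print(out, file=dst)


# Demo on a few lines from the example column in the post.
# Prints G/T/, C/T/, G/C/A/, G/A/T/, A/, C/ (one per line).
sample = ["GTG", "CTC", "GCGAGC", "GATGTG", "A", "C"]
for out in reduce_lines(sample):
    print(out)
```

To use it as a command-line filter, one could save it under a name such as uniq_chars.py (hypothetical), append `import sys` and `filter_stream(sys.stdin, sys.stdout)`, and run `python3 uniq_chars.py < data.txt > result.txt`. Because every input line produces exactly one output line, the result stays row-aligned with the original column.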
 
