12-21-2009
delete duplicated characters in each line
I'm a biologist trying to analyse some data and I'd appreciate some help with the following problem. I have a column of character strings, and for each line I'd like to delete the duplicated characters and report only the unique ones, in order of first appearance. No sorting should be done. E.g.
The original data:
GTG
CTC
CTC
CTC
GCGAGC
GCGAGC
GCGAGC
GCGAGC
GATGTG
GATGTG
GATGTG
GATGTG
A
A
C
And I'm hoping to get:
G/T/
C/T/
C/T/
C/T/
G/C/A/
G/C/A/
G/C/A/
G/C/A/
G/A/T/
G/A/T/
G/A/T/
G/A/T/
A/
A/
C/
I've tried using tr and awk with if conditions, but I'm getting nowhere.
Thank you.
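One possible approach, as a minimal sketch rather than a definitive answer: walk each line one character at a time in awk, remember which characters have already been seen on that line, and print each new character followed by a slash. The function name `dedup_chars` and the file name `data.txt` in the usage note are made up for illustration, and the trailing slash is assumed from the desired output shown above.

```shell
# Hypothetical helper: prints the unique characters of each input line,
# in order of first appearance, each followed by "/".
dedup_chars() {
    awk '{
        out = ""
        split("", seen)                 # clear the per-line lookup table
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)        # i-th character of the line
            if (!(c in seen)) {         # first time this character appears
                seen[c] = 1
                out = out c "/"
            }
        }
        print out
    }' "$@"
}
```

It can be run as `dedup_chars data.txt` or fed from a pipe. The `substr` loop is used instead of splitting on an empty separator because the latter is not supported by every awk.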