05-06-2009
Try:
awk -F, 'arr[$1]++ == 0 { print $0 }' file
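How the one-liner works: -F, splits each line on commas, and arr[$1]++ yields the count for the first field *before* incrementing it. The pattern is therefore true only the first time a given key appears, so only the first line for each key is printed, in the original input order, with no sorting needed. Below is a minimal sketch that wraps it in a shell function. The function name dedup_first_field and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Keep only the first line for each distinct value of the first
# comma-separated field, preserving input order.
# dedup_first_field is a hypothetical helper name for this sketch.
dedup_first_field() {
    # arr[$1]++ returns the count before incrementing, so it is 0
    # (and the pattern is true) only the first time a key is seen.
    awk -F, 'arr[$1]++ == 0 { print $0 }' "$@"
}

# Made-up sample input: the key "fred" appears twice.
printf '%s\n' 'fred,1' 'jim,2' 'fred,3' | dedup_first_field
# prints: fred,1 then jim,2
```

The common shorthand awk -F, '!arr[$1]++' file behaves the same way, because a pattern with no action prints the line. Memory use grows with the number of distinct keys, not with the size of the file.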
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
I have a file:
Fred
Fred
Fred
Jim
Fred
Jim
Jim
If sort is run on the file listed above, shouldn't the output be:
Fred
Fred
Fred
Fred
Jim
Jim
Jim
(3 Replies)
Discussion started by: jimmyflip
2. UNIX for Dummies Questions & Answers
Using the last, uniq, sort and cut commands, determine how many times the different users have logged in.
I know how to use the last command and cut command...
I came up with: last | cut -f1 -d" " | uniq
I don't know if this is right. Can someone please help me? Thanks. (1 Reply)
Discussion started by: jay1228
3. Shell Programming and Scripting
Does anyone have a quick and dirty way of performing a sort and uniq in Perl?
Given an array with data like:
this is bkupArr BOLADVICE_VN
this is bkupArr MLT6800PROD2A
this is bkupArr MLT6800PROD2A
this is bkupArr BOLADVICE_VN_7YR
this is bkupArr MLT6800PROD2A
I want to sort it... (4 Replies)
Discussion started by: reggiej
4. Shell Programming and Scripting
The input file is:
-------------
25060008,0040,03,
25136437,0030,03,
25069457,0040,02,
80303438,0014,03,1st
80321837,0009,03,1st
80321977,0009,03,1st
80341345,0007,03,1st
84176527,0047,03,1st
84176527,0047,03,
20000735,0018,03,1st
25060008,0040,03,
I am using the following in the script... (5 Replies)
Discussion started by: Amruta Pitkar
5. Shell Programming and Scripting
Hello,
I have a large data file:
1234 8888 bbb
2745 8888 bbb
9489 8888 bbb
1234 8888 aaa
4838 8888 aaa
3977 8888 aaa
I need to remove duplicate lines (where the first column is the duplicate). I have been using:
sort file.txt | uniq -w4 > newfile.txt
However, it seems to keep the... (11 Replies)
Discussion started by: palex
6. Shell Programming and Scripting
Hi All,
I have a text file in the format shown below. Some records are duplicated, differing only in the date (field 15). I want to compare duplicate records by subscriber number (field 7) and keep only the record with the latest date.
... (1 Reply)
Discussion started by: nua7
7. Shell Programming and Scripting
I have a flat file, A.txt:
2012/12/04 14:06:07 |trees|Boards 2, 3|denver|mekong|mekong12
2012/12/04 17:07:22 |trees|Boards 2, 3|denver|mekong|mekong12
2012/12/04 17:13:27 |trees|Boards 2, 3|denver|mekong|mekong12
2012/12/04 14:07:39 |rain|Boards 1|tampa|merced|merced11
How do I sort and get... (3 Replies)
Discussion started by: sabercats
8. Shell Programming and Scripting
Hi again,
I have files with the following contents
datetime,ip1,port1,ip2,port2,number
How would I find out how many times each ip1 value appears in a particular file? Then, how would I find out how many times each ip1 and port2 combination appears?
Note that the file may contain 100k lines. (8 Replies)
Discussion started by: LDHB2012
9. UNIX for Dummies Questions & Answers
Hello all,
Need to pick your brains,
I have a 10 GB file where each row is a name; I expect about 50 distinct names in total, so there are many repetitions, in clusters.
So I want to do a
sort -u file
Will it be considerably faster or slower to run uniq first and pipe its output to sort... (3 Replies)
Discussion started by: senhia83
10. Shell Programming and Scripting
Hi All,
Below is the actual file, which I would like to sort and run uniq -u on:
/opt/oracle/work/Antony/Shell_Script> cat emp.1st
2233|a.k. shukula |g.m. |sales |12/12/52 |6000
1006|chanchal singhvi |director |sales |03/09/38 |6700... (8 Replies)
Discussion started by: Antony Ankrose
MMSEG(1) User Contributed Perl Documentation MMSEG(1)
NAME
mmseg - segment Chinese text using maximum matching.
SYNOPSIS
mmseg -d dict_file [option]... [corpus_file]...
DESCRIPTION
mmseg is a tool for segmenting Chinese text into words using the maximum matching algorithm. mmseg segments corpus_file, or standard input if
no filename is specified, and writes the segmented result to standard output.
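For readers unfamiliar with maximum matching: the greedy forward variant scans the text from the left and, at each position, takes the longest dictionary word that matches, falling back to a single character when nothing does. The sketch below is a generic illustration of that technique in awk, not mmseg's actual implementation. The function name fmm_segment, the one-word-per-line dictionary format and the ASCII sample are assumptions. With the words A, AB, BC and C, the string ABC is the ambiguous case described under the -a option (A BC or AB C); plain greedy matching silently picks AB C.

```shell
#!/bin/sh
# Greedy forward maximum matching (generic sketch, not mmseg's code).
# fmm_segment is a hypothetical helper; DICT_FILE holds one word per line
# and must be non-empty. Text to segment is read from standard input.
fmm_segment() {
    awk '
        BEGIN { maxlen = 1 }
        # First file (the dictionary): load words, track the longest one.
        NR == FNR {
            dict[$0] = 1
            if (length($0) > maxlen) maxlen = length($0)
            next
        }
        # Standard input: segment each line independently.
        {
            line = $0
            out = ""
            while (length(line) > 0) {
                # Start with the longest possible candidate and shrink it.
                n = length(line)
                if (n > maxlen) n = maxlen
                while (n > 1) {
                    cand = substr(line, 1, n)
                    if (cand in dict) break
                    n--
                }
                # n == 1 here means a one-character (possibly unknown) word.
                word = substr(line, 1, n)
                out = (out == "") ? word : out " " word
                line = substr(line, n + 1)
            }
            print out
        }
    ' "$1" -
}

# Demo with an ASCII stand-in dictionary.
dict=$(mktemp)
printf '%s\n' A AB BC C > "$dict"
printf 'ABCX\n' | fmm_segment "$dict"    # prints: AB C X
rm -f "$dict"
```

Real Chinese input needs an awk whose length() and substr() count characters rather than bytes, such as gawk running in a UTF-8 locale.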
OPTIONS
-d dict_file
Use dict_file as the lexicon. A default lexicon can be found at /usr/share/sunpinyin-slm/dict.utf8.
-f, --format (text|bin)
Output format: 'text' or 'bin'. Default is 'bin'. In text mode, the word text is output; in binary mode, the word IDs are written to stdout
as binary short integers.
-s, --stok STOK_ID
Sentence token ID. Default is 10. In binary mode, it is written to the output after every sentence.
-i, --show-id
Show ID information. In text output mode, append the ID after each known word; in binary mode, print the ID(s) as text.
-a, --ambiguious-id AMBI-ID
An ambiguity means that ABC can be segmented as either A BC or AB C. If AMBI-ID is specified (AMBI-ID != 0), the sequence ABC is not segmented: in binary mode,
AMBI-ID is written out; in text mode, "<ambi>ABC</ambi>" is output. Default is 0.
NOTES
In binary mode, consecutive IDs of 0 are merged into a single 0. In text mode, no spaces are inserted between unknown words.
AUTHOR
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently maintained by Kov.Chai <tchaikov@gmail.com>.
SEE ALSO
slmseg(1), ids2ngram(1).
perl v5.14.2 2012-06-09 MMSEG(1)