Getting Unique values in a file Post: 302246870

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Unique values from a Terabyte File

Hi, I have been dealing with a files only a few gigs until now and was able to get out by using the sort utility. But now, I have a terabyte file which I want to filter out unique values from. I have a server having 8 processor and 16GB RAM with a 5 TB hdd. Is it worthwhile trying to use...

2. UNIX Desktop Questions & Answers

Fetching unique values from file

After giving grep -A4 "feature 1," <file name> I have extracted the following text feature 1, subfeat 2, type 1, subtype 5, dump '30352f30312f323030392031313a33303a3337'H -- "05/01/2009 11:30:37" -- -- ...

3. UNIX for Dummies Questions & Answers

Extract Unique Values from file

Hello all, I have a file with following sample data 2009-08-26 05:32:01.65 spid5 Process ID 86:214 owns resources that are blocking processes on Scheduler 0. 2009-08-26 05:32:01.65 spid5 Process ID 86:214 owns resources that are blocking processes on Scheduler 0. 2009-08-26...

4. Shell Programming and Scripting

How to count Unique Values from a file.

Hi I have the following info in a file - <Cell id="25D"/> <Cell id="26A"/> <Cell id="26B"/> <Cell id="26C"/> <Cell id="27A"/> <Cell id="27B"/> <Cell id="27C"/> <Cell id="28A"/> I would like to know how would you go about counting all...

5. Shell Programming and Scripting

List unique values and count instances in .csv file

I need to take the second column of a .csv file and count the number of instances of each unique value in that same second column. I'd like the output to be value,count sorted by most instances. Thanks for any guidance! Data example: 317476,317756,0 816063,318861,0 313123,319091,0...

6. Shell Programming and Scripting

Find and count unique date values in a file based on position

Hello, I need some sort of way to extract every date contained in a file, and count how many of those dates there are. Here are the specifics: The date format I'm looking for is mm/dd/yyyy I only need to look after line 45 in the file (that's where the data begins) The columns of...

7. Linux

To get all the columns in a CSV file based on unique values of particular column

cat sample.csv ID,Name,no 1,AAA,1 2,BBB,1 3,AAA,1 4,BBB,1 cut -d',' -f2 sample.csv | sort | uniq this gives only the 2nd column values Name AAA BBB How to I get all the columns of CSV along with this?

8. Shell Programming and Scripting

Extracting unique values of a column from a feed file

Hi Folks, I have the below feed file named abc1.txt in which you can see there is a title and below is the respective values in the rows and it is completely pipe delimited file ,. ...

9. Shell Programming and Scripting

Using grep and a parameter file to return unique values

Hello Everyone! I have updated the first post so that my intentions are easier to understand, and also attached sample files (post #18). I have over 500 text files in a directory. Over 1 GB of data. The data in those files is organised in lines: My intention is to return one line per...

10. Shell Programming and Scripting

How to identify varying unique fields values from a text file in UNIX?

Hi, I have a huge unsorted text file. We wanted to identify the unique field values in a line and consider those fields as a primary key for a table in upstream system. Basically, the process or script should fetch the values from each line that are unique compared to the rest of the lines in...

LEARN ABOUT DEBIAN

sort::fields

Fields(3pm)						User Contributed Perl Documentation					       Fields(3pm)

NAME

       Sort::Fields - Sort lines containing delimited fields

SYNOPSIS

	 use Sort::Fields;
	 @sorted = fieldsort [3, '2n'], @lines;
	 @sorted = fieldsort '+', [-1, -3, 0], @lines;

	 $sort_3_2n = make_fieldsort [3, '2n'], @lines;
	 @sorted = $sort_3_2n->(@lines);

DESCRIPTION

       Sort::Fields provides a general purpose technique for efficiently sorting lists of lines that contain data separated into fields.

       Sort::Fields automatically imports two subroutines, "fieldsort" and "make_fieldsort", and two variants, "stable_fieldsort" and "make_sta-
       ble_fieldsort".	"make_fieldsort" generates a sorting subroutine and returns a reference to it.	"fieldsort" is a wrapper for the
       "make_fieldsort" subroutine.

       The first argument to make_fieldsort is a delimiter string, which is used as a regular expression argument for a "split" operator.  The
       delimiter string is optional.  If it is not supplied, make_fieldsort splits each line using "/s+/".

       The second argument is an array reference containing one or more field specifiers.  The specifiers indicate what fields in the strings will
       be used to sort the data.  The specifier "1" indicates the first field, "2" indicates the second, and so on.  A negative specifier like
       "-2" means to sort on the second field in reverse (descending) order.  To indicate a numeric rather than alphabetic comparison, append "n"
       to the specifier.  A specifier of "0" means the entire string ("-0" means the entire string, in reverse order).

       The order in which the specifiers appear is the order in which they will be used to sort the data.  The primary key is first, the secondary
       key is second, and so on.

       "fieldsort [1, 2], @data" is roughly equivalent to "make_fieldsort([1, 2])->(@data)".  Avoid calling fieldsort repeatedly with the same
       sort specifiers.  If you need to use a particular sort more than once, it is more efficient to call "make_fieldsort" once and reuse the
       subroutine it returns.

       "stable_fieldsort" and "make_stable_fieldsort" are like their "unstable" counterparts, except that the items that compare the same are
       maintained in their original order.

EXAMPLES

       Some sample data (in array @data):

	 123   asd   1.22   asdd
	 32    ewq   2.32   asdd
	 43    rewq  2.12   ewet
	 51    erwt  34.2   ewet
	 23    erww  4.21   ewet
	 91    fdgs  3.43   ewet
	 123   refs  3.22   asdd
	 123   refs  4.32   asdd

	 # alpha sort on column 1
	 print fieldsort [1], @data;

	 123   asd   1.22   asdd
	 123   refs  3.22   asdd
	 123   refs  4.32   asdd
	 23    erww  4.21   ewet
	 32    ewq   2.32   asdd
	 43    rewq  2.12   ewet
	 51    erwt  34.2   ewet
	 91    fdgs  3.43   ewet

	 # numeric sort on column 1
	 print fieldsort ['1n'], @data;

	 23    erww  4.21   ewet
	 32    ewq   2.32   asdd
	 43    rewq  2.12   ewet
	 51    erwt  34.2   ewet
	 91    fdgs  3.43   ewet
	 123   asd   1.22   asdd
	 123   refs  3.22   asdd
	 123   refs  4.32   asdd

	 # reverse numeric sort on column 1
	 print fieldsort ['-1n'], @data;

	 123   asd   1.22   asdd
	 123   refs  3.22   asdd
	 123   refs  4.32   asdd
	 91    fdgs  3.43   ewet
	 51    erwt  34.2   ewet
	 43    rewq  2.12   ewet
	 32    ewq   2.32   asdd
	 23    erww  4.21   ewet

	 # alpha sort on column 2, then alpha on entire line
	 print fieldsort [2, 0], @data;

	 123   asd   1.22   asdd
	 51    erwt  34.2   ewet
	 23    erww  4.21   ewet
	 32    ewq   2.32   asdd
	 91    fdgs  3.43   ewet
	 123   refs  3.22   asdd
	 123   refs  4.32   asdd
	 43    rewq  2.12   ewet

	 # alpha sort on column 4, then numeric on column 1, then reverse
	 # numeric on column 3
	 print fieldsort [4, '1n', '-3n'], @data;

	 32    ewq   2.32   asdd
	 123   refs  4.32   asdd
	 123   refs  3.22   asdd
	 123   asd   1.22   asdd
	 23    erww  4.21   ewet
	 43    rewq  2.12   ewet
	 51    erwt  34.2   ewet
	 91    fdgs  3.43   ewet

	 # now, splitting on either literal period or whitespace
	 # sort numeric on column 4 (fractional part of decimals) then
	 # numeric on column 3 (whole part of decimals)
	 print fieldsort '(?:.|s+)', ['4n', '3n'], @data;

	 51    erwt  34.2   ewet
	 43    rewq  2.12   ewet
	 23    erww  4.21   ewet
	 123   asd   1.22   asdd
	 123   refs  3.22   asdd
	 32    ewq   2.32   asdd
	 123   refs  4.32   asdd
	 91    fdgs  3.43   ewet

	 # alpha sort on column 4, then numeric on the entire line
	 # NOTE: produces warnings under -w
	 print fieldsort [4, '0n'], @data;

	 32    ewq   2.32   asdd
	 123   asd   1.22   asdd
	 123   refs  3.22   asdd
	 123   refs  4.32   asdd
	 23    erww  4.21   ewet
	 43    rewq  2.12   ewet
	 51    erwt  34.2   ewet
	 91    fdgs  3.43   ewet

	 # stable alpha sort on column 4 (maintains original relative order
	 # among items that compare the same)
	 print stable_fieldsort [4], @data;

	 123   asd   1.22   asdd
	 32    ewq   2.32   asdd
	 123   refs  3.22   asdd
	 123   refs  4.32   asdd
	 43    rewq  2.12   ewet
	 51    erwt  34.2   ewet
	 23    erww  4.21   ewet
	 91    fdgs  3.43   ewet

BUGS

       Some rudimentary tests now.

       Perhaps something should be done to catch things like:

	 fieldsort '.', [1, 2], @lines;

       '.' translates to "split /./" -- probably not what you want.

       Passing blank lines and/or lines containing the wrong kind of data (alphas instead of numbers) can result in copious warning messages under
       "-w".

       If the regexp contains memory parentheses ("(...)" rather than "(?:...)"), split will function in "delimiter retention" mode, capturing the
       contents of the parentheses as well as the stuff between the delimiters.  I could imagine how this could be useful, but on the other hand I
       could also imagine how it could be confusing if encountered unexpectedly.  Caveat sortor.

       Not really a bug, but if you are planning to sort a large text file, consider using sort(1).  Unless, of course, your operating system
       doesn't have sort(1).

AUTHOR

       Joseph N. Hall, joseph@5sigma.com

SEE ALSO

       perl(1).

perl v5.8.8							    2008-03-25							       Fields(3pm)

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Unique values from a Terabyte File

Discussion started by: Legend986

2. UNIX Desktop Questions & Answers

Fetching unique values from file

Discussion started by: shivi707

3. UNIX for Dummies Questions & Answers

Extract Unique Values from file

Discussion started by: simonsimon

4. Shell Programming and Scripting

How to count Unique Values from a file.

Discussion started by: Prega