return a list of unique values of a column from csv format file
Post 302361527 by vidyadhar85 on Tuesday 13th of October 2009 11:46:23 AM
So is your CSV file sorted?
If not, uniq won't work on it. Please read the man page for uniq:
Quote:
The uniq command deletes repeated lines in a file. The uniq command reads either standard input or a file specified by the InFile parameter. The command first compares adjacent lines and then removes the second and succeeding duplications of a line. Duplicated lines must be adjacent. (Before issuing the uniq command, use the sort command to make all duplicate lines adjacent.) Finally, the uniq command writes the resultant unique lines either to standard output or to the file specified by the OutFile parameter. The InFile and OutFile parameters must specify different files.
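To see the difference, here is a rough sketch. The sample rows and the temporary file are made up for illustration, and I am assuming the column you want is the second one, so adjust the -f value for your file.

```shell
# Hypothetical sample data: the contents and the temp file are assumptions.
csv=$(mktemp)
printf '%s\n' '1,AAA,1' '2,BBB,1' '3,AAA,1' '4,BBB,1' > "$csv"

# Without sorting, uniq only collapses adjacent duplicates,
# so the alternating values in column 2 all survive.
unsorted=$(cut -d',' -f2 "$csv" | uniq)

# Sorting first makes the duplicates adjacent, so uniq can drop them.
unique=$(cut -d',' -f2 "$csv" | sort | uniq)

printf 'without sort:\n%s\n' "$unsorted"
printf 'with sort:\n%s\n' "$unique"

rm -f "$csv"
```

If you only need the distinct values, sort -u should give the same result as sort | uniq in one step.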
 
