return a list of unique values of a column from csv format file
Shell Programming and Scripting, post 302361291 by phoeberunner, Monday 12th of October 2009 11:17:43 PM (10-13-2009)

Hi all,

I have a huge CSV file with data in the following format:

[HEADER]
Num SNPs, 549997
Total SNPs,555352
Num Samples, 157
[Data]
SNP, SampleID, Allele1, Allele2
A001,AB1,A,A
A002,AB1,A,A
A003,AB1,A,A
...
...
...


I would like to write out a list of the unique SNPs (column 1). Could you let me know how to do this with UNIX commands? Do I need to convert the CSV file to a text file first?

Thank you for your attention!

phoebe
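A CSV file is already plain text, so no conversion step is needed; standard tools such as awk read it directly. Below is a minimal sketch of one answer, assuming the layout shown above: a line starting with [Data], then exactly one column-header line, then comma-separated rows with the SNP ID in column 1. The function name unique_snps is hypothetical.

```shell
#!/bin/sh
# unique_snps (hypothetical helper): print each distinct SNP ID from
# column 1 exactly once, in the order first seen.
# Assumed layout: metadata under [HEADER], a line starting with [Data],
# one column-header line ("SNP, SampleID, Allele1, Allele2"), then
# comma-separated data rows. Reads the named file(s), or stdin if none.
unique_snps() {
  awk -F',' '
    state == 2 && NF { if (!seen[$1]++) print $1; next }  # data row: print unseen IDs
    state == 1       { state = 2; next }                  # skip the column-header line
    /^\[Data\]/      { state = 1 }                        # start of the data section
  ' "$@"
}
```

Usage: `unique_snps data.csv > snp_list.txt`. The seen[] array holds one entry per distinct SNP, so memory tracks the number of SNPs rather than the number of rows. If sorted output is preferred, `sed '1,/^\[Data\]/d' data.csv | tail -n +2 | cut -d',' -f1 | sort -u` is a more traditional pipeline, though it sorts every row and may be slower on a file this size.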
 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

print unique values of a column and sum up the corresponding values in next column

Hi All, I have a file with 3 columns (string string integer): a b 1 x y 2 p k 5 y y 4 ..... ..... Question: I want to get the unique values of column 2, sorted on column 2, and the sum of the 3rd column for the corresponding rows. E.g. the above file should return the... (6 Replies)
Discussion started by: amigarus
6 Replies

2. Shell Programming and Scripting

AWK, Perl or Shell? Unique strings and their maximum values from 3 column data file

I have a file containing data like so: 2012-01-02 GREEN 4 2012-01-02 GREEN 6 2012-01-02 GREEN 7 2012-01-02 BLUE 4 2012-01-02 BLUE 3 2012-01-02 GREEN 4 2012-01-02 RED 4 2012-01-02 RED 8 2012-01-02 GREEN 4 2012-01-02 YELLOW 5 2012-01-02 YELLOW 2 I can't always predict what the... (4 Replies)
Discussion started by: rich@ardz
4 Replies

3. Shell Programming and Scripting

List unique values and count instances in .csv file

I need to take the second column of a .csv file and count the number of instances of each unique value in that same second column. I'd like the output to be value,count sorted by most instances. Thanks for any guidance! Data example: 317476,317756,0 816063,318861,0 313123,319091,0... (4 Replies)
Discussion started by: batcho
4 Replies

4. UNIX for Dummies Questions & Answers

Grep to find matching pattern and return unique values

Request: grep to find a given matching pattern and return unique values, eliminating the duplicate values. I have to retrieve the unique folders from the file contents below, like: /app/oracle/build_lib/pkg320.0_20120927 /app/oracle/build_lib/pkg320.0_20121004_prof... (5 Replies)
Discussion started by: Siva SQL
5 Replies

5. Shell Programming and Scripting

Remove the values from a certain column without deleting the Column name in a .CSV file

(14 Replies)
Discussion started by: dhruuv369
14 Replies

6. Linux

To get all the columns in a CSV file based on unique values of particular column

cat sample.csv gives:
ID,Name,no
1,AAA,1
2,BBB,1
3,AAA,1
4,BBB,1
cut -d',' -f2 sample.csv | sort | uniq gives only the 2nd column values: Name AAA BBB. How do I get all the columns of the CSV along with this? (1 Reply)
Discussion started by: sanvel
1 Reply

7. Shell Programming and Scripting

Extracting unique values of a column from a feed file

Hi Folks, I have the feed file below, named abc1.txt, with a title line followed by the respective values in the rows; it is a completely pipe-delimited file. ... (4 Replies)
Discussion started by: punpun66
4 Replies

8. Shell Programming and Scripting

Using grep and a parameter file to return unique values

Hello Everyone! I have updated the first post so that my intentions are easier to understand, and also attached sample files (post #18). I have over 500 text files in a directory. Over 1 GB of data. The data in those files is organised in lines: My intention is to return one line per... (23 Replies)
Discussion started by: clippertm
23 Replies

9. Shell Programming and Scripting

CSV File: Filter duplicate records from column1 & another column having unique record

Hi Experts, I have a CSV file with 30 to 40 columns; I am pasting just 2 columns for the problem description. I need to print an error if the combination below is not present in the file: check column 1 (DocumentNumber) and filter columns where the value in the DocumentNumber field is the same. For all such rows, the field... (7 Replies)
Discussion started by: as7951
7 Replies
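Items 3 and 6 above ask close variants of the original question: counting how often each value occurs, and keeping whole rows instead of only the de-duplicated column. A minimal awk sketch of both, assuming plain comma-separated input with no quoted fields that contain commas; the function names first_row_per_key and count_per_key are hypothetical.

```shell
#!/bin/sh
# first_row_per_key KEY [file...] (hypothetical helper)
# Prints line 1 (assumed to be a header), then the first full row seen for
# each distinct value in column KEY (1-based), preserving input order.
first_row_per_key() {
  key=$1; shift
  awk -F',' -v k="$key" 'NR == 1 || !seen[$k]++' "$@"
}

# count_per_key KEY [file...] (hypothetical helper)
# Prints "value,count" for column KEY, most frequent first.
# Every line is counted, so remove any header line before calling it.
count_per_key() {
  key=$1; shift
  awk -F',' -v k="$key" '{ n[$k]++ } END { for (v in n) print v "," n[v] }' "$@" |
    sort -t',' -k2,2nr
}
```

For the sample.csv in item 6, `first_row_per_key 2 sample.csv` prints the header followed by 1,AAA,1 and 2,BBB,1. In count_per_key, values with equal counts come out in no particular order.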
Bio::Graphics::Glyph::allele_tower(3pm) 		User Contributed Perl Documentation		   Bio::Graphics::Glyph::allele_tower(3pm)

NAME
       Bio::Graphics::Glyph::allele_tower - The "allele_tower" glyph

SYNOPSIS
       See <Bio::Graphics::Panel> and <Bio::Graphics::Glyph>.

DESCRIPTION
       This glyph draws a letter for each allele found at a SNP position, one above the other (i.e. in a column). For example:

              A
              G

       See also http://www.hapmap.org/cgi-perl/gbrowse/gbrowse 'genotyped SNPs' for an example.

       The common options are available (except height which is calculated based on the number of alleles). In addition, if you give the glyph the minor allele frequency (MAF) and indicate which is the minor allele, the glyph will display these differences.

GETTING THE ALLELES
       To specify the alleles, create an "Alleles" attribute for the feature. There should be two such attributes. For example, for a T/G polymorphism, the GFF load file should look like:

              Chr3 . SNP 12345 12345 . . . SNP ABC123; Alleles T ; Alleles G

       Alternatively, you can pass an "alleles" callback to the appropriate section of the config file. This option should return the two alleles separated by a slash:

              alleles = sub {
                  my $snp = shift;
                  my @d = $snp->get_tag_values('AllelePair');
                  return join "/",@d;
              }

OPTIONS
       .  Glyph Colour
       .  Different colour for alleles on the reverse strand
       .  Print out the complement for alleles on the reverse strand
       .  Major allele shown in bold
       .  Horizontal histogram to show allele frequency

GLYPH COLOR
       The glyph color can be configured to be different if the feature is on the plus or minus strand. Use fgcolor to define the glyph color for the plus strand and bgcolor for the minus strand. For example:

              fgcolor = blue
              bgcolor = red

       For this option to work, you must also set ref_strand to return the strand of the feature:

              ref_strand = sub {shift->strand}

REVERSE STRAND ALLELES
       If the alleles on the negative strand need to be the complement of what is listed in the GFF files (e.g. A/G becomes T/C), set the complement option to have value 1:

              complement = 1

       For this option to work, you must also set ref_strand to return the strand of the feature:

              ref_strand = sub {shift->strand}

MAJOR/MINOR ALLELE
       Use the 'minor_allele' option to return the minor allele for the SNP. If you use this option, the major allele will appear in bold type.

ALLELE FREQUENCY HISTOGRAMS
       Use the 'maf' option to return the minor allele frequency for the SNP. If you use this option, a horizontal histogram will be drawn next to the alleles, to indicate their relative frequencies, e.g.

              A______
              C__

       Note: The 'label' option must be set to 1 (i.e. on) and the 'minor_allele' option must return a valid allele for this to work.

BUGS
       Please report them.

SEE ALSO
       Bio::Graphics::Panel, Bio::Graphics::Glyph, Bio::Graphics::Glyph::arrow, Bio::Graphics::Glyph::cds, Bio::Graphics::Glyph::crossbox, Bio::Graphics::Glyph::diamond, Bio::Graphics::Glyph::dna, Bio::Graphics::Glyph::dot, Bio::Graphics::Glyph::ellipse, Bio::Graphics::Glyph::extending_arrow, Bio::Graphics::Glyph::generic, Bio::Graphics::Glyph::graded_segments, Bio::Graphics::Glyph::heterogeneous_segments, Bio::Graphics::Glyph::line, Bio::Graphics::Glyph::pinsertion, Bio::Graphics::Glyph::primers, Bio::Graphics::Glyph::rndrect, Bio::Graphics::Glyph::segments, Bio::Graphics::Glyph::ruler_arrow, Bio::Graphics::Glyph::toomany, Bio::Graphics::Glyph::transcript, Bio::Graphics::Glyph::transcript2, Bio::Graphics::Glyph::translation, Bio::Graphics::Glyph::allele_tower, Bio::DB::GFF, Bio::SeqI, Bio::SeqFeatureI, Bio::Das, GD

AUTHOR
       Fiona Cunningham <cunningh@cshl.edu> in Lincoln Stein's lab <steinl@cshl.edu>.

       Copyright (c) 2003 Cold Spring Harbor Laboratory

       This package and its accompanying libraries is free software; you can redistribute it and/or modify it under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0. Refer to LICENSE for the full license text. In addition, please see DISCLAIMER.txt for disclaimers of warranty.

perl v5.14.2                      2012-02-20          Bio::Graphics::Glyph::allele_tower(3pm)
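The REVERSE STRAND ALLELES section of the allele_tower page describes complement = 1 turning A/G into T/C. For checking allele strings by hand (for example, alleles pulled from a CSV like the one in the question), the same base mapping can be sketched in shell with tr. The helper name complement_alleles is hypothetical, and this is an illustration of the mapping only, not the glyph's own implementation.

```shell
#!/bin/sh
# complement_alleles (hypothetical helper): read allele strings on stdin and
# replace each base with its complement (A<->T, C<->G), keeping case and
# leaving separators such as "/" unchanged.
# Illustration only; this is not the glyph's own code.
complement_alleles() {
  tr 'ACGTacgt' 'TGCAtgca'
}
```

For example, `printf 'A/G\n' | complement_alleles` prints T/C, matching the example given in that section.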
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.