Count of unique lines in field 4 Post: 302960172

Posted by cmccabe on Wednesday 11th of November 2015 05:42:20 PM in Shell Programming and Scripting (post 302960172)

When I use the below awk to count the unique values in $4 of the sample input, it seems to work: the answer is 3, because $4 takes only 3 distinct values across all the entries. However, when I run the same command on the actual data I get 56,536, and I know the answer should be 56,548. Is there a better way to count the unique lines? Thank you.

input

Code:

chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    1    20
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    2    20
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    3    22
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    4    22
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    5    22
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    1    186
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    2    201
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    3    201
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    271    176
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    272    175
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    273    175
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    274    175
chr1    970621    970740    chr1:970621-970740    AGRN-8|gc=57.1    46    280
chr1    970621    970740    chr1:970621-970740    AGRN-8|gc=57.1    47    280
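
The awk command from the post is not shown here, so the following is not the poster's command, only a minimal sketch of one common way to count distinct values in field 4. It assumes the data is tab-separated; the temporary file and the variable names `sample`, `count_awk`, and `count_sort` are made up for this illustration. The sample holds only the first four columns of a few of the rows above.

```shell
# Hypothetical sample file: first four tab-separated columns of some input rows.
sample=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  chr1 955543 955763 chr1:955543-955763 \
  chr1 955543 955763 chr1:955543-955763 \
  chr1 957571 957852 chr1:957571-957852 \
  chr1 957571 957852 chr1:957571-957852 \
  chr1 970621 970740 chr1:970621-970740 > "$sample"

# Count distinct $4 values: n is incremented only the first time a value is seen.
count_awk=$(awk -F'\t' '!seen[$4]++ { n++ } END { print n + 0 }' "$sample")

# Cross-check with standard tools: extract field 4, keep one copy of each
# value, and count the remaining lines.
count_sort=$(cut -f4 "$sample" | sort -u | wc -l)

echo "awk: $count_awk"
echo "sort -u: $count_sort"
rm -f "$sample"
```

Both counts are 3 for this sample. If the two methods disagree on real data, the tab-separator assumption is one thing that may be worth checking.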

LEARN ABOUT REDHAT

uniq

UNIQ(1) 								FSF								   UNIQ(1)

NAME

       uniq - remove duplicate lines from a sorted file

SYNOPSIS

       uniq [OPTION]... [INPUT [OUTPUT]]

DESCRIPTION

       Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).

       Mandatory arguments to long options are mandatory for short options too.

       -c, --count
	      prefix lines by the number of occurrences

       -d, --repeated
	      only print duplicate lines

       -D, --all-repeated[=delimit-method]
	      print all duplicate lines.  delimit-method={none(default),prepend,separate}.  Delimiting is done with blank lines.

       -f, --skip-fields=N
	      avoid comparing the first N fields

       -i, --ignore-case
	      ignore differences in case when comparing

       -s, --skip-chars=N
	      avoid comparing the first N characters

       -u, --unique
	      only print unique lines

       -w, --check-chars=N
	      compare no more than N characters in lines

       --help
	      display this help and exit

       --version
	      output version information and exit

       A field is a run of whitespace, then non-whitespace characters.	Fields are skipped before chars.
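
As a small illustration of the `-c` option described above, the sketch below counts how often each value occurs. Because uniq only collapses successive identical lines, the input is sorted first. The three input values are taken from field 4 of the forum post's sample; the variable name `counts` and the final awk step, which just swaps the two columns, are choices made for this example.

```shell
# sort groups identical lines together; uniq -c then prefixes each distinct
# line with its number of occurrences; awk prints "value count".
counts=$(printf '%s\n' \
  chr1:957571-957852 \
  chr1:955543-955763 \
  chr1:957571-957852 | sort | uniq -c | awk '{ print $2, $1 }')

echo "$counts"
```

Counting the lines of this output gives the number of distinct values, which is 2 here.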

AUTHOR

       Written by Richard Stallman and David MacKenzie.

REPORTING BUGS

       Report bugs to <bug-coreutils@gnu.org>.

COPYRIGHT

       Copyright (C) 2002 Free Software Foundation, Inc.
       This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
       PARTICULAR PURPOSE.

SEE ALSO

       The full documentation for uniq is maintained as a Texinfo manual.  If the info and uniq programs are properly installed at your site,  the
       command

	      info uniq

       should give you access to the complete manual.

uniq (coreutils) 4.5.3						   February 2003							   UNIQ(1)
