Counts not matching in file


 
# 1  
11-12-2015

I cannot figure out why wc -l reports 56,548 entries (lines) in test.bed, which I expected to all be unique, while perl and awk see only 56,543 unique values in the fourth field. That is also the number my analysis sees. What happened to the 5 missing entries? Thank you.

The file is attached as well.

Code:
cmccabe@DTV-A5211QLM:~/Desktop/NGS/bed/bedtools$ wc -l test.bed
56548 test.bed

cmccabe@DTV-A5211QLM:~/Desktop/NGS/bed/bedtools$ perl -nae '$seen{$F[3]}++;
    END{
        print "There are ", scalar keys %seen, " unique fourth fields\n";
    }' test.bed
There are 56543 unique fourth fields

cmccabe@DTV-A5211QLM:~/Desktop/NGS/bed/bedtools$ awk '$4!=d{c++;d=$4}END{print c}' test.bed
56543
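wc -l counts lines, while the perl and awk commands count distinct values in column 4, so the two numbers differ whenever a key repeats. Note also that the awk one-liner only compares each line with the previous one, so it agrees with the perl hash count only when all lines sharing a key are adjacent (for example, after sorting). A minimal sketch for checking both numbers at once, assuming whitespace-separated columns as in a BED file; count_keys and dup_keys are hypothetical helper names, not part of any tool:

```shell
# Hypothetical helpers for comparing line counts with distinct column-4 keys.
# Assumes whitespace-separated fields (BED files are tab-delimited).

# Print "<total lines> <distinct column-4 values>"
count_keys() {
  awk '{ n[$4]++ } END { d = 0; for (k in n) d++; print NR, d }' "$1"
}

# Print "<count> <key>" for every column-4 value seen more than once,
# sorted by key. Unlike '$4!=d', this does not depend on line order.
dup_keys() {
  awk '{ n[$4]++ } END { for (k in n) if (n[k] > 1) print n[k], k }' "$1" | sort -k2,2
}
```

Usage: count_keys test.bed should report the same two totals as wc -l and the perl command, and dup_keys test.bed should list the repeated keys along with how often each occurs.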

# 2  
11-12-2015
1. Sorted your file on the 4th column and saved it: sort -k4,4 test.bed >test.bed.sorted

2. Ran my solution without the wc -l and saved that: sort -u -k4,4 test.bed >test.bed.uniq

3. Compared the two files; here are the diffs:
Code:
4748d4747
< chr11	47270217	47270425	chr11:47270217-47270425	unknown-1062|gc=64.9
4970d4968
< chr11	5248271	5248449	chr11:5248271-5248449	HBB-283|gc=55.1
24883d24880
< chr19	13010118	13010237	chr19:13010118-13010237	SYCE2-864|gc=47.9
33027d33023
< chr22	38153605	38154160	chr22:38153605-38154160	TRIOBP-610|gc=68.6
54957d54952
< chrX	33357316	33359011	chrX:33357316-33359011	DMD-581|gc=33.7
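The sort/diff steps above can also be collapsed into a single awk pass: print each line whose fourth field has already appeared on an earlier line. This is a sketch with a hypothetical function name (extra_lines); because it keeps the first occurrence in file order, the member of each pair it reports may differ from the one sort -u dropped.

```shell
# Hypothetical helper: print every line whose column-4 value already
# appeared on an earlier line, i.e. the "extra" copies of each key.
# seen[$4]++ is 0 (false) the first time a key is met, non-zero afterwards.
extra_lines() {
  awk 'seen[$4]++' "$1"
}
```

Usage: extra_lines test.bed | wc -l should equal the gap between the wc -l total and the unique-key count.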

---------- Post updated at 02:16 PM ---------- Previous update was at 02:13 PM ----------

Adding this just for clarity: take 'chr11:47270217-47270425' and fgrep for that string in the original test.bed file.
Code:
$ fgrep 'chr11:47270217-47270425' test.bed
chr11	47270217	47270425	chr11:47270217-47270425	ACP2-1062|gc=64.9
chr11	47270217	47270425	chr11:47270217-47270425	unknown-1062|gc=64.9

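Do the same with the other four values and you'll see that they are not unique either. Each of the five keys occurs on two lines, so the file has 56,548 lines but only 56,548 - 5 = 56,543 distinct fourth-field values.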
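Rather than running fgrep once per value, a two-pass awk sketch can print every line (both copies) whose fourth field occurs more than once. all_dup_lines is a hypothetical name; the file is read twice, first to count keys and then to print the matching lines in file order.

```shell
# Hypothetical helper: print all lines whose column-4 value occurs more
# than once. Pass 1 (NR == FNR, first copy of the file) counts keys;
# pass 2 prints lines whose key count is greater than 1.
all_dup_lines() {
  awk 'NR == FNR { n[$4]++; next } n[$4] > 1' "$1" "$1"
}
```

Usage: all_dup_lines test.bed should show each of the five repeated keys on two lines, like the fgrep output for chr11:47270217-47270425.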
This User Gave Thanks to cjcox For This Post:
# 3  
11-12-2015
Thank you!

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Output counts of all matching strings less than a number using awk

The awk below is supposed to count all the matching $5 strings and count how many $7 values are less than 20. I don't think I need the portion in bold, as I do not need any decimal point or format, but I cannot seem to get the correct counts. Thank you :). file chr5 77316500 77316628 ... (6 Replies)
Discussion started by: cmccabe
6 Replies

2. Shell Programming and Scripting

New file should store all the 7 existing filenames and their record counts and ftp th

Hi, I need help with the concern below. There is a script that has 7 existing files (in a path, say, usr/appl/temp/file1.txt), and I need to create one new blank file, say “file_count.txt”, in the same script itself. Then the new file <file_count.txt> should store all the 7 filenames and... (1 Reply)
Discussion started by: pr293
1 Replies

3. Shell Programming and Scripting

word counts for a single line xml file

I have an XML output file (file name TABLE.xml) where the data is loaded in A SINGLE LINE. I need help writing a ksh shell script which gives me the word count of the word <TABLE-ROW>. This is my input file. <?xml version="1.0" encoding="UTF-8"?><!--Generated by Ascential Software... (4 Replies)
Discussion started by: pred55
4 Replies

4. UNIX for Dummies Questions & Answers

Hardcoding & Record counts in a file

Hi, I have a huge comma-delimited file, and I have to add the following four lines at the start of the file through a shell script. FILE NAME = TEST_LOAD DATETIME = CURRENT DATE TIME LOAD DATE = CURRENT DATE RECORD COUNT = TOTAL RECORDS IN FILE Source data 1,2,3,4,5,6,7... (7 Replies)
Discussion started by: shruthidwh
7 Replies

5. UNIX for Dummies Questions & Answers

how to get distinct counts in a column of a file

If I have a file sample.txt with more than 10 columns, and the 11th column has the following data, would it be possible to get the distinct counts of values in a single shot? Thank you. Y Y N N N P P o Expected Result: Value count Y 2 N 3 P 2 (2 Replies)
Discussion started by: Ariean
2 Replies

6. Shell Programming and Scripting

Counts a number of unique word contained in the file and print them in alphabetical order

What shell script counts the number of unique words contained in a file and prints them in alphabetical order, line by line? (7 Replies)
Discussion started by: proactiveaditya
7 Replies

7. Shell Programming and Scripting

Perl script that counts lines of a file

I am working on this script, but hit a bump. Looking for a little help figuring out the last part: open(MY_FILE, $ARGV) or die $COUNTER = 1; $LINE = <FILE>; while ($LINE, <FILE>) { # Adds leading zeros for numbers 1 digit long if ($COUNTER<10){ print "000"; } # Adds... (2 Replies)
Discussion started by: Breakology
2 Replies

8. UNIX for Dummies Questions & Answers

counts

To start I have a table that has ticketholders. Each ticket holder has a unique number and each ticket holder is associated to a so called household number. You can have multiple guests w/i a household. I would like to create 3 flags (form a, for a household that has 1-4 gst) form b 5-8 gsts... (3 Replies)
Discussion started by: sbr262
3 Replies

9. Solaris

file size counts??

Hello experts, I do - $ ls -lhtr logs2007* Is it possible to get the total size in MB/KB for ALL "logs2007*" files? Note: in the same directory I have "logs2006*" & "logs2007*" files. (4 Replies)
Discussion started by: thepurple
4 Replies

10. UNIX for Dummies Questions & Answers

counts

How can I do a simple record count in my shell script? I just want to count the number of records I receive from a specific file. (11 Replies)
Discussion started by: k@ssidy
11 Replies