UNIX for Dummies Questions & Answers
#1
Count the lines with the same values in a column and write the output to a file
Hey everyone! I have a tab-delimited data set, and I want to create an output file that contains the number of lines sharing the same values in the 2nd and 3rd columns. My input file looks like this:
Code:
ID1 1 10M AAATTTCCGG
ID2 5 4M ACGT
ID3 5 8M ACCTTGGA
ID4 5 8M ACCTTGGA
ID5 5 8M ACCTTGGA
ID6 20 3M TCG
ID7 20 3M TCG
ID8 20 12M AACCTTGGCCTT
ID9 20 12M AACCTTGGCCTT
ID10 20 12M AACCTTGGCCTT
I want my output to look like this:
Code:
1 10M 1 AAATTTCCGG
5 4M 1 ACGT
5 8M 3 ACCTTGGA
20 3M 2 TCG
20 12M 3 AACCTTGGCCTT
Thanks in advance!
#2
Try the below:
Code:
cut -f2- FileName | sort | uniq -c
The first column will give you the count of the occurrences.
#3
Let us know if the order matters (or just add | sort -nk1 -k2 after the script).
Code:
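awk 'END {
    # after the last line: print each key, its count and the saved field
    for (k in c)
        print k, c[k], d[k]
}
{
    k = $2 OFS $3          # key: 2nd and 3rd columns
    c[k]++; d[k] = $NF     # count lines per key; remember the last column
}' infile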
#4
Thanks Athix. I tried this code, but it didn't work. It just gives the value 1 for every line in the first column, which is not correct. It also keeps repeating the lines that have the same values in the 2nd, 3rd and 4th columns, which I don't want.
---------- Post updated at 03:49 PM ---------- Previous update was at 03:46 PM ----------
Thanks radoulov. I'm a real newbie and I need more explanation. I tried to copy and paste what you posted into the terminal, but I couldn't work out how to give the path to the input file. Please let me know how, although I know it is a silly question! BTW, I already sorted my file using this command:
Code:
sort -n -k1 -k2 <filename>
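One possible explanation for the all-1 counts, offered only as a guess: if the file is not actually tab-separated, cut -f finds no delimiter and passes each line through unchanged, so the unique ID column is still there when uniq -c runs. A small sketch with made-up file names (spaces.txt, tabs.txt) shows the difference:

```shell
# Hypothetical sample whose fields are separated by spaces, not tabs.
printf 'ID2 5 8M ACCTTGGA\nID3 5 8M ACCTTGGA\n' > spaces.txt
# cut finds no tab, so it passes both lines through whole: two lines, count 1 each.
cut -f2- spaces.txt | sort | uniq -c > spaces_counts.txt

# The same records with real tabs: the ID column is dropped and the lines collapse.
printf 'ID2\t5\t8M\tACCTTGGA\nID3\t5\t8M\tACCTTGGA\n' > tabs.txt
cut -f2- tabs.txt | sort | uniq -c > tabs_counts.txt

cat spaces_counts.txt tabs_counts.txt
```

Running od -c on the first few lines of the real file shows whether the separators are tabs (printed as \t) or spaces.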
#5
Code:
awk 'END {
    for (k in c)
        print k, c[k], d[k]
}
{
    k = $2 OFS $3
    c[k]++; d[k] = $NF
}' <filename>
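On the question of where the input path goes, here is a minimal sketch. It assumes the sample from post #1 is saved as input.txt in the current directory and that the result should land in counts.txt; both names are placeholders. The path is simply the last argument, after the closing quote of the awk program. The sort at the end is only there to make the order predictable, since awk's for (k in c) loop returns keys in no particular order.

```shell
# Recreate the sample from post #1 (input.txt is a placeholder name).
# Spaces are used here; awk's default field splitting accepts tabs and spaces alike.
cat > input.txt <<'EOF'
ID1 1 10M AAATTTCCGG
ID2 5 4M ACGT
ID3 5 8M ACCTTGGA
ID4 5 8M ACCTTGGA
ID5 5 8M ACCTTGGA
ID6 20 3M TCG
ID7 20 3M TCG
ID8 20 12M AACCTTGGCCTT
ID9 20 12M AACCTTGGCCTT
ID10 20 12M AACCTTGGCCTT
EOF

awk '
{
    k = $2 OFS $3      # key built from the 2nd and 3rd columns
    c[k]++             # number of lines seen with this key
    d[k] = $NF         # last column of the most recent line with this key
}
END {
    for (k in c)
        print k, c[k], d[k]
}' input.txt | sort -k1,1n -k2,2n > counts.txt   # numeric sort on both key columns

cat counts.txt
```

To use a file elsewhere, replace input.txt with its full path, for example /path/to/your/file.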
#6
Code:
$ awk '{print $3,$2,$4}' input.txt | sort | uniq -c | awk '{print $3,$2,$1,$4}' | sort -n
1 10M 1 AAATTTCCGG
5 4M 1 ACGT
5 8M 3 ACCTTGGA
20 12M 3 AACCTTGGCCTT
20 3M 2 TCG
#7
Thanks radoulov! It works perfectly!
The only issue is that it prints the last column of my data set as the last column of the output. In my real data I want the 10th column as the last column of the output. Thanks a lot!
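For the 10th-column variant, a likely adjustment is to store $10 instead of $NF. The sketch below is not from the thread: sample.tsv, counts.txt and the filler "x" columns are invented for illustration, and -F'\t' assumes the real file is strictly tab-delimited.

```shell
# Hypothetical 10-column, tab-delimited sample; only columns 2, 3 and 10 matter.
printf 'ID1\t1\t10M\tx\tx\tx\tx\tx\tx\tAAATTTCCGG\n'  > sample.tsv
printf 'ID2\t5\t8M\tx\tx\tx\tx\tx\tx\tACCTTGGA\n'    >> sample.tsv
printf 'ID3\t5\t8M\tx\tx\tx\tx\tx\tx\tACCTTGGA\n'    >> sample.tsv
printf 'ID4\t20\t3M\tx\tx\tx\tx\tx\tx\tTCG\n'        >> sample.tsv

awk -F'\t' '
{
    k = $2 OFS $3      # key built from the 2nd and 3rd columns
    c[k]++             # number of lines seen with this key
    d[k] = $10         # keep the 10th column rather than the last one
}
END {
    for (k in c)
        print k, c[k], d[k]
}' sample.tsv | sort -k1,1n -k2,2n > counts.txt

cat counts.txt
```

The output fields are separated by single spaces because OFS is left at its default; adding -v OFS='\t' to the awk call would keep the output tab-delimited.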