How to count specific columns and merge with unique ones?


 
# 1  
Old 08-07-2012

Hi. I am not sure the title gives an optimal description of what I want to do.

I have several text files that contain data in many columns. All the files are organized the same way, but the data in the columns might differ. I want to count the number of times the same data occur in specific columns, sort the output and make a new file. However, I want to check several files for the occurrence of the same data.

Code:
File 1:
xx xx xx aab rrt xx
xx xx xx ccd bbt xx
xx xx xx ggt iir xx
File 2:
xx xx xx ggt iir xx
File 3:
xx xx xx aab rrt xx
xx xx xx ggt iir xx

First I modified each file individually (is there a better way?) so that the file name appears in the first column:
Code:
sed 's/^/File1\t/' file1.temp > 1.txt
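
One possible way to avoid editing each file separately is awk's built-in FILENAME variable, which holds the name of the file currently being read. The following is only a sketch: the sample files and the workdir and tagged variables are made up for illustration, and the tag is the full file name (file1.temp) rather than a hand-written label such as File1.

```shell
# Hypothetical sample data standing in for the real files.
workdir=$(mktemp -d)
printf 'xx xx xx aab rrt xx\nxx xx xx ccd bbt xx\nxx xx xx ggt iir xx\n' > "$workdir/file1.temp"
printf 'xx xx xx ggt iir xx\n' > "$workdir/file2.temp"

# FILENAME is set by awk to the name of the file being read, so every
# line of every file can be tagged in one pass, without a sed call per file.
tagged=$(cd "$workdir" && awk '{print FILENAME "\t" $0}' file1.temp file2.temp)
printf '%s\n' "$tagged"
```

Because the tag becomes the first column, the original columns 4 and 5 are still found as $5 and $6 in the next step.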

Then I extracted the columns of interest and sorted them and made a new file:

Code:
awk '{print $1,$5,$6}' *.txt |sort -k2 > output.txt

The output.txt file could look like this:

Code:
File1 aab rrt
File3 aab rrt
File1 ccd bbt
File2 ggt iir
File3 ggt iir
File1 ggt iir

Now I want to count how many lines have identical values in column 2 and column 3, and keep the first-column information (the file names) in the output file, separated by commas or similar. I want the result to be like this:

Code:
1 ccd bbt File1
2 aab rrt File1,File3
3 ggt iir File1,File2,File3

It would be good (but not a requirement) to have the file names in the last column of the final file sorted: File1, File2, File3 etc. The file names can also be put in separate columns if that is easier.

So far I have tried to use:

Code:
awk '{print $1,$5,$6}' *.txt |sort -k2|uniq -f1 -c|sort -g > final_output.txt

However, I am not able to get the first-column data merged in the final output file. How should I go about doing that?

-James

Last edited by JamesT; 08-07-2012 at 08:52 AM.. Reason: Made a mistake in the first code
 