How to count specific columns and merge with unique ones?
Hi. I am not sure the title gives an optimal description of what I want to do.
I have several text files that contain data in many columns. All the files are organized the same way, but the data in the columns might differ. I want to count the number of times the same data occur in specific columns, sort the output, and write it to a new file. However, I want to check several files for occurrences of the same data.
First I modified each file individually (is there a better way?) so that the file name appears in the first column:
Then I extracted the columns of interest and sorted them and made a new file:
The output.txt file could look like this:
Now, I want to count the number of lines in which column 2 and column 3 are identical, and keep the first-column information in the output file, separated by commas or similar. I want the result to look like this:
It would be good (but not a requirement) to have the last column of the final file sorted: lane1, lane2, lane3, etc. The lane* values can also be put in separate columns if that is easier.
So far I have tried to use:
However, I am not able to get the column data merged in the final output file. How should I go about doing that?
-James
Last edited by JamesT; 08-07-2012 at 08:52 AM..
Reason: Made a mistake in the first code
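One possible way to do the counting and merging is sketched below with sort and awk. It assumes output.txt is whitespace-separated, with the file/lane name in column 1 and the values to compare in columns 2 and 3; that layout is a guess, since the sample data is not shown here. The function name count_and_merge is hypothetical and not part of the original commands.

```shell
# Hypothetical helper: count lines that share columns 2 and 3, and collect
# the column 1 values (e.g. lane1, lane2) into a comma-separated list.
# Assumes whitespace-separated input with at least three columns.
count_and_merge() {
    # Sort by column 2, then column 3, then column 1, so identical
    # pairs are adjacent and the lane names come out in sorted order.
    sort -k2,2 -k3,3 -k1,1 "$@" |
    awk '
        {
            key = $2 OFS $3
            if (key != prev) {
                # A new pair starts: print the finished group first.
                if (NR > 1) print prev, count, lanes
                prev = key; count = 0; lanes = ""
            }
            count++
            lanes = (lanes == "" ? $1 : lanes "," $1)
        }
        END { if (NR > 0) print prev, count, lanes }
    '
}
```

It could be called as `count_and_merge output.txt > final.txt` (final.txt is a made-up name). Each output line holds column 2, column 3, the count, and the merged lane list. Note that the lane names are sorted as plain text, so lane10 would sort before lane2, and a lane is listed once per matching line, so duplicates within one file are repeated.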
Dear community, I am facing a problem and I kindly ask your help:
I have 4 different data sets consisting of 3 different types of array.
In each file, column 1 is the chromosome position, column 2 is the SNP id, etc. Let's say I have the following (bim) datasets:
x2014:
1 rs3094315... (4 Replies)
Hello,
I have two tab delimited text files. Both files have the same number of rows but not necessarily the same number of columns. The column headers look like,
File 1:
f0order CVorder Name f0 RI_9 E99 E199 E299 E399 E499 E599 E699 E799 E899 E999
File 2:... (9 Replies)
Hi, I have tab-delimited data similar to the following:
dot is-big 2
dot is-round 3
dot is-gray 4
cat is-big 3
hot in-summer 5
I want to count the frequency of each individual "unique" value in the 1st column. Thus, the desired output would be as follows:
dot 3
cat 1
hot 1
is... (5 Replies)
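For the part of this question that is visible above, a minimal awk sketch could look like the following. It assumes the input really is tab-delimited and that the values should be printed in order of first appearance, as in the desired output shown; the function name count_first_column is hypothetical.

```shell
# Hypothetical helper: count how often each value occurs in column 1
# of tab-delimited input, printing values in order of first appearance.
count_first_column() {
    awk -F'\t' '
        # Remember each new value the first time it is seen.
        !($1 in count) { order[++n] = $1 }
        { count[$1]++ }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    ' "$@"
}
```

With the five sample lines above as input, this prints dot 3, cat 1, and hot 1, one per line.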
Hi everyone,
I have a file result.txt with records as follows, and another file mirna.txt with a list of miRNAs, e.g. miR22, miR123, miR13, etc.
Gene Transcript miRNA
Gar Nm_111233 miR22
Gar Nm_123440 miR22
Gar Nm_129939 miR22
Hel Nm_233900 miR13
Hel ... (6 Replies)
Hi, this is about sorting a very large file (about 10 GB) to keep lines with unique entries across SOME of the columns.
The line originally looked like this:
sort -u -k2,2 -k3,3n -k4,4n -k5,5n -k6,6n file_unsorted > file_sorted
please note the -u flag.
The problem is that this single... (4 Replies)
Hi,
I have a requirement to remove certain spaces from a table of information, but I'm unsure where to start.
A typical table will be like this:
ABCDE 1 Elton John 25 12 15 9 3
ABCDE 2 Oasis 29 13 4 6 9
ABCDE 3 The Rolling Stones 55 19 3 8 6
The goal is to remove only the spaces between... (11 Replies)
Hi all,
I'm a Linux newbie, please help!
I have a file -
box
--------
Fox-2
--------
UF29
zip42
--------
zf-CW
SNF2_N
Heli_Z
--------
Fox
--------
Kel_1
box (3 Replies)