Post 302682903 by JamesT on Tuesday 7th of August 2012 04:00:00 AM
How to count specific columns and merge with unique ones?

Hi. I am not sure the title gives an optimal description of what I want to do.

I have several text files that contain data in many columns. All the files are organized the same way, but the data in the columns may differ. I want to count how many times the same data occur in specific columns, sort the output, and write it to a new file. In addition, I want to check several files for occurrences of the same data.

Code:
File 1:
xx xx xx aab rrt xx
xx xx xx ccd bbt xx
xx xx xx ggt iir xx
File 2:
xx xx xx ggt iir xx
File 3:
xx xx xx aab rrt xx
xx xx xx ggt iir xx

First, I modified each file individually (is there a better way?) so that the file name appears in the first column:
Code:
sed 's/^/File1\t/' file1.temp > 1.txt
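
One possible answer to the "any better way?" question, offered as a sketch rather than a definitive solution: awk exposes the name of the file it is currently reading in its built-in FILENAME variable, so the label can be added while the columns are extracted, without a separate sed pass per file. The sample files below are hypothetical stand-ins, and the sketch assumes the label should be the file name with its .temp suffix removed. Because no extra column is prepended first, the columns of interest are $4 and $5 here rather than $5 and $6.

```shell
# Hypothetical sample data standing in for the real files.
tmpdir=$(mktemp -d)
printf 'xx xx xx aab rrt xx\nxx xx xx ccd bbt xx\n' > "$tmpdir/file1.temp"
printf 'xx xx xx ggt iir xx\n' > "$tmpdir/file2.temp"

# FILENAME is the file awk is currently reading; strip the .temp suffix
# and print it together with the two columns of interest.
labelled=$(cd "$tmpdir" &&
    awk '{ name = FILENAME; sub(/\.temp$/, "", name); print name, $4, $5 }' \
        file1.temp file2.temp | LC_ALL=C sort -k2)

printf '%s\n' "$labelled"
rm -rf "$tmpdir"
```

With real data, the two explicit file names would be replaced by a glob such as *.temp.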

Then I extracted the columns of interest, sorted them, and wrote the result to a new file:

Code:
awk '{print $1,$5,$6}' *.txt |sort -k2 > output.txt

The output.txt file could look like this:

Code:
File1 aab rrt
File3 aab rrt
File1 ccd bbt
File2 ggt iir
File3 ggt iir
File1 ggt iir

Now I want to count how many lines share the same values in columns 2 and 3, and keep the column 1 values of those lines in the output file, separated by commas or similar. I want the result to look like this:

Code:
1 ccd bbt File1
2 aab rrt File1,File3
3 ggt iir File1,File2,File3
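
The grouping described above could be sketched with awk associative arrays: one array counts the lines per column 2/column 3 pair, and a second collects the column 1 values as a comma-separated list. This is a minimal sketch under assumptions, not the only way to do it: the input lines are inlined with printf in place of output.txt, and the array names count and files are made up for illustration. Sorting the input on columns 2-3 and then column 1 beforehand means the file names arrive in order within each group.

```shell
grouped=$(printf '%s\n' \
    'File1 aab rrt' 'File3 aab rrt' 'File1 ccd bbt' \
    'File2 ggt iir' 'File3 ggt iir' 'File1 ggt iir' |
LC_ALL=C sort -k2,3 -k1,1 |
awk '
{
    key = $2 " " $3                       # group on columns 2 and 3
    if (key in count)
        files[key] = files[key] "," $1    # append this file name to the list
    else
        files[key] = $1                   # first file name seen for this group
    count[key]++
}
END {
    # for-in order is unspecified, so the final sort below orders the lines
    for (key in count)
        print count[key], key, files[key]
}' | LC_ALL=C sort -k1,1n)

printf '%s\n' "$grouped"
```

The file names are ordered as plain strings, so File10 would land before File2; names with zero-padded numbers avoid that.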

It would be good (but not a requirement) for the last column of the final file to be sorted: lane1, lane2, lane3, etc. The lane* entries can also be split into separate columns if that is easier.

So far I have tried to use:

Code:
awk '{print $1,$5,$6}' *.txt |sort -k2|uniq -f1 -c|sort -g > final_output.txt

However, I cannot get the first-column data merged into the final output file. How should I go about doing that?

-James

Last edited by JamesT; 08-07-2012 at 08:52 AM.. Reason: Made a mistake in the first code
 
