How to count specific columns and merge with unique ones? Post: 302682903

Post 302682903 by JamesT on Tuesday 7th of August 2012 04:00:00 AM

Hi. I am not sure the title gives an optimal description of what I want to do.

I have several text files that contain data in many columns. All the files are organized the same way, but the data in the columns might differ. I want to count the number of times data occur in specific columns, sort the output and make a new file. However, I want to check several files for the occurrence of the same data.

Code:

File 1:
xx xx xx aab rrt xx
xx xx xx ccd bbt xx
xx xx xx ggt iir xx
File 2:
xx xx xx ggt iir xx
File 3:
xx xx xx aab rrt xx
xx xx xx ggt iir xx

First I modified each file individually (is there a better way?) so that the file name appears in the first column:

Code:

sed 's/^/File1\t/' file1.temp > 1.txt
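
On the question of a better way to tag each line with its file name: one possible shortcut is to let awk supply the name through its built-in FILENAME variable, which avoids a separate sed pass per file. The sketch below is only an illustration. It assumes the inputs are called file1.temp, file2.temp and file3.temp (only file1.temp appears in the post; the other two names are assumptions) and recreates the sample data in a scratch directory.

```shell
# Recreate the sample data in a scratch directory (file names are assumed).
workdir=$(mktemp -d)
printf '%s\n' 'xx xx xx aab rrt xx' 'xx xx xx ccd bbt xx' 'xx xx xx ggt iir xx' > "$workdir/file1.temp"
printf '%s\n' 'xx xx xx ggt iir xx' > "$workdir/file2.temp"
printf '%s\n' 'xx xx xx aab rrt xx' 'xx xx xx ggt iir xx' > "$workdir/file3.temp"

# FILENAME is the name of the file awk is currently reading. No extra
# first column has been added, so the columns of interest are $4 and $5.
# The result is sorted by the two data columns, then by file name.
tagged=$(cd "$workdir" &&
  awk '{ print FILENAME, $4, $5 }' file1.temp file2.temp file3.temp |
  LC_ALL=C sort -k2,3 -k1,1)
printf '%s\n' "$tagged"
```

The first column then holds the full file name (file1.temp rather than File1); if a shorter label is wanted it could be trimmed inside awk, for example with sub().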

Then I extracted the columns of interest, sorted them, and wrote the result to a new file:

Code:

awk '{print $1,$5,$6}' *.txt |sort -k2 > output.txt

The output.txt file could look like this:

Code:

File1 aab rrt
File3 aab rrt
File1 ccd bbt
File2 ggt iir
File3 ggt iir
File1 ggt iir

Now I want to count how many lines share the same values in columns 2 and 3, and keep the first-column information for each of those lines in the output file, separated by commas or similar. I want the result to look like this:

Code:

1 ccd bbt File1
2 aab rrt File1,File3
3 ggt iir File1,File2,File3

It would be good (but not a requirement) to have the last column in the final file sorted: lane1, lane2, lane3 etc. The lane* entries can also be put in separate columns if that is easier.

So far I have tried to use:

Code:

awk '{print $1,$5,$6}' *.txt |sort -k2|uniq -f1 -c|sort -g > final_output.txt

However, I am not able to get the first-column data merged in the final output file. How should I go about doing that?

-James

Last edited by JamesT; 08-07-2012 at 08:52 AM.. Reason: Made a mistake in the first code

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

merge columns into one line after a specific pattern

Hi all, im a linux newbie, plz help! I have a file - box -------- Fox-2 -------- UF29 zip42 -------- zf-CW SNF2_N Heli_Z -------- Fox -------- Kel_1 box

2. Shell Programming and Scripting

Merge 2 columns/remove specific spaces

Hi, I have a requirement to remove certain spaces from a table of information, but I'm unsure where to start. A typical table will be like this: ABCDE 1 Elton John 25 12 15 9 3 ABCDE 2 Oasis 29 13 4 6 9 ABCDE 3 The Rolling Stones 55 19 3 8 6 The goal is to remove only the spaces between...

3. Shell Programming and Scripting

sort split merge -u unique

Hi, this is about sorting a very large file (like 10 gb) to keep lines with unique entries across SOME of the columns. The line originally looked like this: sort -u -k2,2 -k3,3n -k4,4n -k5,5n -k6,6n file_unsorted > file_sorted please note the -u flag. The problem is that this single...

4. Shell Programming and Scripting

How to merge columns into lines, using unique keys?

I would really appreciate a solution for this: invoice# client# 5929 231 4358 231 2185 231 6234 231 1166 464 1264 464 3432 464 1720 464 9747 464 1133 791 4930 791 5496 791 6291 791 8681 989 3023 989

5. Shell Programming and Scripting

count the unique records based on certain columns

Hi everyone, I have a file result.txt with records as following and another file mirna.txt with a list of miRNAs e.g. miR22, miR123, miR13 etc. Gene Transcript miRNA Gar Nm_111233 miR22 Gar Nm_123440 miR22 Gar Nm_129939 miR22 Hel Nm_233900 miR13 Hel ...

6. Shell Programming and Scripting

Count frequency of unique values in specific column

Hi, I have tab-deliminated data similar to the following: dot is-big 2 dot is-round 3 dot is-gray 4 cat is-big 3 hot in-summer 5 I want to count the frequency of each individual "unique" value in the 1st column. Thus, the desired output would be as follows: dot 3 cat 1 hot 1 is...

7. Shell Programming and Scripting

Merge specific columns of two files

Hello, I have two tab delimited text files. Both files have the same number of rows but not necessarily the same number of columns. The column headers look like, File 1: f0order CVorder Name f0 RI_9 E99 E199 E299 E399 E499 E599 E699 E799 E899 E999 File 2:...

8. Shell Programming and Scripting

How to merge two files with unique values matching.?

I have one script as below: #!/bin/ksh Outputfile1="/home/OutputFile1.xls" Outputfile2="/home/OutputFile2.xls" InputFile1="/home/InputFile1.sql" InputFile2="/home/InputFile2.sql" echo "Select hobby, class, subject, sports, rollNumber from Student_Table" >> InputFile1 echo "Select rollNumber...

9. UNIX for Beginners Questions & Answers

Merge 4 bim files by keeping only the overlapping variants (unique rs values )

Dear community, I am facing a problem and I kindly ask your help: I have 4 different data sets consisted from 3 different types of array. On each file, column 1 is chromosome position, column 2 is SNP id etc... Lets say I have the following (bim) datasets: x2014: 1 rs3094315...

LEARN ABOUT PLAN9

join

JOIN(1) 						      General Commands Manual							   JOIN(1)

NAME

       join - relational database operator

SYNOPSIS

       join [ options ] file1 file2

DESCRIPTION

       Join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2.  If one of the file names is
       -, the standard input is used.

       File1 and file2 must be sorted in increasing ASCII collating sequence on the fields on which they are to be joined, normally the  first	in
       each line.

       There is one line in the output for each pair of lines in file1 and file2 that have identical join fields.  The output line normally
       consists of the common field, then the rest of the line from file1, then the rest of the line from file2.

       Input fields are normally separated by spaces or tabs; output fields by a space.  In this case, multiple separators count as one, and
       leading separators are discarded.
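
The default behaviour described above can be illustrated with a small sketch. It assumes a POSIX-style join as found on most Unix-like systems; the file names (ages, cities) and their contents are hypothetical. Both files are already sorted on their first field, which is the default join field.

```shell
# Two small relations, each sorted on the first field (hypothetical data).
tmp=$(mktemp -d)
printf '%s\n' 'alice 30' 'bob 25'         > "$tmp/ages"
printf '%s\n' 'alice paris' 'carol rome'  > "$tmp/cities"

# Only lines whose first fields match in both files are paired.
joined=$(join "$tmp/ages" "$tmp/cities")
printf '%s\n' "$joined"
```

Only alice appears in both files, so the single output line is the common field followed by the rest of each input line: alice 30 paris.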

       The following options are recognized, with POSIX syntax.

       -a n   In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2.

       -v n   Like -a, omitting output for paired lines.

       -e s   Replace empty output fields by string s.

       -1 m
       -2 m   Join on the mth field of file1 or file2.

       -jn m  Archaic equivalent for -n m.

       -ofields
	      Each  output  line  comprises the designated fields.  The comma-separated field designators are either 0, meaning the join field, or
	      have the form n.m, where n is a file number and m is a field number.  Archaic usage allows separate arguments for field designators.

       -tc    Use character c as the only separator (tab character) on input and output.  Every appearance of c in a line is significant.
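
As a sketch of how several of these options combine, the following assumes a POSIX-style join and uses hypothetical colon-separated files (ages, cities). It keeps unpairable lines from both files with -a, fills the missing fields with a placeholder via -e, and fixes the output layout with -o.

```shell
# Colon-separated relations, each sorted on the first field (hypothetical data).
tmp=$(mktemp -d)
printf '%s\n' 'alice:30' 'bob:25'          > "$tmp/ages"
printf '%s\n' 'alice:paris' 'carol:rome'   > "$tmp/cities"

# -t:            use ':' as the only field separator on input and output
# -a 1 -a 2      also emit unpairable lines from file1 and from file2
# -o 0,1.2,2.2   print the join field, then field 2 of each file
# -e NONE        replace empty output fields with the string NONE
outer=$(join -t: -a 1 -a 2 -e NONE -o 0,1.2,2.2 "$tmp/ages" "$tmp/cities")
printf '%s\n' "$outer"
```

The -e replacement only takes effect for fields named in -o, which is why the two options are used together here.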

EXAMPLES

       sort /adm/users | join -t: -a 1 -e "" - bdays
	      Add birthdays to password information, leaving unknown birthdays empty.  The layout of /adm/users is given in users(6); bdays contains sorted
	      lines like

       tr : ' ' </adm/users | sort -k 3,3 >temp
       join -1 3 -2 3 -o 1.1,2.1 temp temp | awk '$1 < $2'
	      Print all pairs of users with identical userids.

SOURCE

       /sys/src/cmd/join.c

SEE ALSO

       sort(1), comm(1), awk(1)

BUGS

       With default field separation, the collating sequence is that of sort -b -ky,y; with -t, the sequence is that of sort -tx -ky,y.
       One of the files must be randomly accessible.

																	   JOIN(1)
