Sponsored Content
Top Forums UNIX for Dummies Questions & Answers How to count specific columns and merge with unique ones? Post 302682903 by JamesT on Tuesday 7th of August 2012 04:00:00 AM
Old 08-07-2012
How to count specific columns and merge with unique ones?

Hi. I am not sure the title gives an optimal description of what I want to do.

I have several text files that contain data in many columns. All the files are organized the same way, but the data in the columns might differ. I want to count the number of times data occur in specific columns, sort the output and make a new file. However, I want check several files for the occurrence of the same data.

Code:
File 1:
xx xx xx aab rrt xx
xx xx xx ccd bbt xx
xx xx xx ggt iir xx
File 2:
xx xx xx ggt iir xx
File 3:
xx xx xx aab rrt xx
xx xx xx ggt iir xx

First I made a modification to the files, individually (any better way?) to make the file name occur in the first column:
Code:
sed 's/^/File1\t/' file1.temp > 1.txt

Then I extracted the columns of interest and sorted them and made a new file:

Code:
awk '{print $1,$5,$6}' *.txt |sort -k2 > output.txt

The output.txt file could look like this:

Code:
File1 aab rrt
File3 aab rrt
File1 ccd bbt
File2 ggt iir
File3 ggt iir
File1 ggt iir

Now, I want to count the number of times column 2 and column 3 are identical for every line and keep the first column information in the output file, separated by comma or similar. I want to result to be like this:

Code:
1 ccd bbt File1
2 aab rrt File1,File3
3 ggt iir File1, File2, File3

It would be good (but not a requirement) to have the last column in the final file to be sorted, lane1, lane2, lane3 etc. The lane* can also be separated by columns if that is easier.

So far I have tried to use:

Code:
awk '{print $1,$5,$6}' *.txt |sort -k2|uniq -f1 -c|sort -g > final_output.txt

However, I am not able to get the column data merged in the final output file. How should I go about to do that?

-James

Last edited by JamesT; 08-07-2012 at 08:52 AM.. Reason: Made a mistake in the first code
 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

merge columns into one line after a specific pattern

Hi all, im a linux newbie, plz help! I have a file - box -------- Fox-2 -------- UF29 zip42 -------- zf-CW SNF2_N Heli_Z -------- Fox -------- Kel_1 box (3 Replies)
Discussion started by: sam_2921
3 Replies

2. Shell Programming and Scripting

Merge 2 columns/remove specific spaces

Hi, I have a requirement to remove certain spaces from a table of information, but I'm unsure where to start. A typical table will be like this: ABCDE 1 Elton John 25 12 15 9 3 ABCDE 2 Oasis 29 13 4 6 9 ABCDE 3 The Rolling Stones 55 19 3 8 6The goal is to remove only the spaces between... (11 Replies)
Discussion started by: danhodges99
11 Replies

3. Shell Programming and Scripting

sort split merge -u unique

Hi, this is about sorting a very large file (like 10 gb) to keep lines with unique entries across SOME of the columns. The line originally looked like this: sort -u -k2,2 -k3,3n -k4,4n -k5,5n -k6,6n file_unsorted > file_sorted please note the -u flag. The problem is that this single... (4 Replies)
Discussion started by: jbr950
4 Replies

4. Shell Programming and Scripting

How to merge columns into lines, using unique keys?

I would really appreciate a sulution for this : invoice# client# 5929 231 4358 231 2185 231 6234 231 1166 464 1264 464 3432 464 1720 464 9747 464 1133 791 4930 791 5496 791 6291 791 8681 989 3023 989 (2 Replies)
Discussion started by: hemo21
2 Replies

5. Shell Programming and Scripting

count the unique records based on certain columns

Hi everyone, I have a file result.txt with records as following and another file mirna.txt with a list of miRNAs e.g. miR22, miR123, miR13 etc. Gene Transcript miRNA Gar Nm_111233 miR22 Gar Nm_123440 miR22 Gar Nm_129939 miR22 Hel Nm_233900 miR13 Hel ... (6 Replies)
Discussion started by: miclow
6 Replies

6. Shell Programming and Scripting

Count frequency of unique values in specific column

Hi, I have tab-deliminated data similar to the following: dot is-big 2 dot is-round 3 dot is-gray 4 cat is-big 3 hot in-summer 5 I want to count the frequency of each individual "unique" value in the 1st column. Thus, the desired output would be as follows: dot 3 cat 1 hot 1 is... (5 Replies)
Discussion started by: owwow14
5 Replies

7. Shell Programming and Scripting

Merge specific columns of two files

Hello, I have two tab delimited text files. Both files have the same number of rows but not necessarily the same number of columns. The column headers look like, File 1: f0order CVorder Name f0 RI_9 E99 E199 E299 E399 E499 E599 E699 E799 E899 E999 File 2:... (9 Replies)
Discussion started by: LMHmedchem
9 Replies

8. Shell Programming and Scripting

How to merge two files with unique values matching.?

I have one script as below: #!/bin/ksh Outputfile1="/home/OutputFile1.xls" Outputfile2="/home/OutputFile2.xls" InputFile1="/home/InputFile1.sql" InputFile2="/home/InputFile2.sql" echo "Select hobby, class, subject, sports, rollNumber from Student_Table" >> InputFile1 echo "Select rollNumber... (3 Replies)
Discussion started by: Sharma331
3 Replies

9. UNIX for Beginners Questions & Answers

Merge 4 bim files by keeping only the overlapping variants (unique rs values )

Dear community, I am facing a problem and I kindly ask your help: I have 4 different data sets consisted from 3 different types of array. On each file, column 1 is chromosome position, column 2 is SNP id etc... Lets say I have the following (bim) datasets: x2014: 1 rs3094315... (4 Replies)
Discussion started by: fondan
4 Replies
JOIN(1) 						      General Commands Manual							   JOIN(1)

NAME
join - relational database operator SYNOPSIS
join [ options ] file1 file2 DESCRIPTION
Join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If one of the file names is the standard input is used. File1 and file2 must be sorted in increasing ASCII collating sequence on the fields on which they are to be joined, normally the first in each line. There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally con- sists of the common field, then the rest of the line from file1, then the rest of the line from file2. Input fields are normally separated spaces or tabs; output fields by space. In this case, multiple separators count as one, and leading separators are discarded. The following options are recognized, with POSIX syntax. -a n In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2. -v n Like -a, omitting output for paired lines. -e s Replace empty output fields by string s. -1 m -2 m Join on the mth field of file1 or file2. -jn m Archaic equivalent for -n m. -ofields Each output line comprises the designated fields. The comma-separated field designators are either 0, meaning the join field, or have the form n.m, where n is a file number and m is a field number. Archaic usage allows separate arguments for field designators. -tc Use character c as the only separator (tab character) on input and output. Every appearance of c in a line is significant. EXAMPLES
sort /adm/users | join -t: -a 1 -e "" - bdays Add birthdays to password information, leaving unknown birthdays empty. The layout of is given in users(6); bdays contains sorted lines like tr : ' ' </adm/users | sort -k 3 3 >temp join -1 3 -2 3 -o 1.1,2.1 temp temp | awk '$1 < $2' Print all pairs of users with identical userids. SOURCE
/sys/src/cmd/join.c SEE ALSO
sort(1), comm(1), awk(1) BUGS
With default field separation, the collating sequence is that of sort -b -ky,y; with -t, the sequence is that of sort -tx -ky,y. One of the files must be randomly accessible. JOIN(1)
All times are GMT -4. The time now is 12:45 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy