Advanced: Sort, count data in column, append file name
Hi. I am not sure the title gives an optimal description of what I want to do. I also tried posting this in "UNIX for Dummies Questions & Answers", but it seems no one was able to help out.
I have several text files that contain data in many columns. All the files are organized the same way, but the data in the columns may differ. I want to count the number of times a value occurs in specific columns, sort the output, and write a new file. Specifically, I want to check several files for occurrences of the same data, count how many times each occurs, append the file name to each occurrence, and produce a new file sorted by the number of occurrences.
File 1:
Code:
xx xx xx aab rrt xx
xx xx xx ccd bbt xx
xx xx xx ggt iir xx
File 2:
Code:
xx xx xx ggt iir xx
xx xx xx ccd bbt xx
File 3:
Code:
xx xx xx aab rrt xx
xx xx xx ggt iir xx
First, I modified each file individually (is there a better way?) so that the file name appears in the first column:
Code:
sed 's/^/File1\t/' file1.temp > 1.txt
This gives files that look like this:
File1:
Code:
File1 xx xx xx aab rrt xx
File1 xx xx xx ccd bbt xx
File1 xx xx xx ggt iir xx
File2:
Code:
File2 xx xx xx ggt iir xx
File2 xx xx xx ccd bbt xx
File3:
Code:
File3 xx xx xx aab rrt xx
File3 xx xx xx ggt iir xx
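The per-file sed step above can also be done in one loop. A minimal sketch, assuming input files named file1.temp and file2.temp as in the example (the tab is generated with printf for portability, since not every sed understands \t in the replacement):

```shell
# Demo inputs in the same layout as the files above (shortened)
printf 'xx aab rrt\nxx ccd bbt\n' > file1.temp
printf 'xx ggt iir\n' > file2.temp

tab=$(printf '\t')                    # a literal tab, portable across seds
for f in file1.temp file2.temp; do
    name=$(basename "$f" .temp)       # file1, file2, ...
    sed "s/^/${name}${tab}/" "$f" > "${name}.txt"
done

cat file1.txt file2.txt
```

An even shorter route is awk, which exposes the current input file's name as FILENAME: `awk -v OFS='\t' '{print FILENAME, $0}' file*.temp`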
Then I extracted the columns of interest, sorted them, and wrote them to a new file.
Now I want to count the number of times columns 2 and 3 are identical across lines, keeping the first-column information in the output file, separated by a comma or similar. I want the result to look like this:
It would be good (but not a requirement) for the last column of the final file to be sorted: lane1, lane2, lane3, etc. The lane* entries could also be separated into columns if that is easier.
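The counting part can be sketched with sort and uniq -c. This assumes the two key columns are fields 5 and 6 of a merged file (an assumption; adjust the column numbers to your data):

```shell
# Demo input in the merged layout shown above
printf 'File1 xx xx xx aab rrt xx\nFile1 xx xx xx ggt iir xx\nFile2 xx xx xx ggt iir xx\n' > all.txt

# Extract the key columns, count identical pairs, highest count first
awk '{print $5, $6}' all.txt | sort | uniq -c | sort -rn > counts.txt
cat counts.txt
```

Note that uniq -c does not carry the file-name column along; the perl and awk approaches below handle that.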
$ cat file[123]
File1 xx xx xx aab rrt xx
File1 xx xx xx ccd bbt xx
File1 xx xx xx ggt iir xx
File2 xx xx xx ggt iir xx
File2 xx xx xx ccd bbt xx
File3 xx xx xx aab rrt xx
File3 xx xx xx ggt iir xx
$
$ perl -lane '$x{"$F[4] $F[5]"} .= "$F[0],"; END{for(keys %x){$x{$_}=~s/,$//;print "$_ $x{$_}"}}' file1 file2 file3
ggt iir File1,File2,File3
ccd bbt File1,File2
aab rrt File1,File3
$
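For comparison, here is a rough awk equivalent of the perl one-liner above (a sketch, not a drop-in replacement: it groups on fields 5 and 6, collects the first-column file labels, and additionally prints a per-group count so the output can be sorted by number of occurrences):

```shell
# Demo input files in the layout used above
printf 'File1 xx xx xx aab rrt xx\nFile1 xx xx xx ggt iir xx\n' > f1
printf 'File2 xx xx xx ggt iir xx\n' > f2

# Group on columns 5-6; accumulate file labels and a per-group count
awk '{k = $5 " " $6; n[k]++; s[k] = (s[k] ? s[k] "," $1 : $1)}
     END {for (k in n) print k, n[k], s[k]}' f1 f2 |
    sort -k3,3nr > grouped.txt
cat grouped.txt
```

The sort -k3,3nr at the end orders the groups by the count column, most frequent first.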
But the perl code is more impressive. From this exercise, as a biologist trying to do some simple bioinformatics, I really want to learn more Unix/script/shell programming. Wow, so powerful.