The first thing that occurs to me is that the input is a columns-should-be-rows flavor, so turn it into proper tuples by making a row/line/tuple for each related. Then you can deal with it like a SQL RDBMS table. You can then cut, sort and uniq -c or while read the various columns to get ranks and statistics. For instance, assuming there are always 2 related and they are peers, here is a partial solution (forgot VALUE, discarded TOTFrequencyrelations f2 -- see if you can fix it):
This cuts out fields for related1 and then for related2 into a common stream and feeds it to a sort, creating a sorted tuple file on the stream. The while read loop counts the user and user+related counts for sort ordering, and the +1 p, -1 n and 0 z, spitting out counts when the key changes. A dummy input trailer makes the loop spit out the last set of values. Once sorted numerically, the counts for sorting are discarded.
Last edited by DGPickett; 06-19-2013 at 05:23 PM..
I have been trying to find a good solution for this seemingly simple task for 2 days, and I'm giving up and posting a thread. I hope someone can help me out!
I'm on HPUX, using sqlplus, mailx, awk, have some other tools available, but can't install stuff that isn't already in place (without a... (6 Replies)
Hi all,
Is there a way to convert full data matrix to linearised left data matrix?
e.g full data matrix
Bh1 Bh2 Bh3 Bh4 Bh5 Bh6 Bh7
Bh1 0 0.241058 0.236129 0.244397 0.237479 0.240767 0.245245
Bh2 0.241058 0 0.240594 0.241931 0.241975 ... (8 Replies)
Hi everyone
I am very new at awk but think that that might be the best strategy for this. I have a matrix very similar to a correlation matrix and in practical terms I need to convert it into a list containing the values from the matrix (one value per line) with the first field of the line (row... (5 Replies)
Hi Fellows,
I have been struggling to fix an issue in csv records to compose sql statements and have been really losing sleep over it. Here is the problem:
I have csv files in the following pipe-delimited format:
Column1|Column2|Column3|Column4|NEWLINE
Address Type|some descriptive... (4 Replies)
Hi
I have two csv files, with the following formats:
FileA.log:
Application, This occured blah
Application, That occured blah
Application, Also this
AnotherLog, Bob did this
AnotherLog, Dave did that
FileB.log:
Uk, London, Application, datetime, LaterDateTime, Today it had'nt... (8 Replies)
Greetings, salutations.
I have a 3 column csv file with ~13 million rows and I would like to generate a correlation matrix. Interestingly, you all previously provided a solution to the inverse of this problem. Thread title: "awk? adjacency matrix to adjacency list / correlation matrix to list"... (6 Replies)
Hi,
I have a file of csv data, which looks like this:
file1:
1AA,LGV_PONCEY_LES_ATHEE,1,\N,1,00020460E1,0,\N,\N,\N,\N,2,00.22335321,0.00466628
2BB,LES_POUGES_ASF,\N,200,200,00006298G1,0,\N,\N,\N,\N,1,00.30887539,0.00050312... (10 Replies)
Discussion started by: djoseph
10 Replies
LEARN ABOUT MINIX
sort
SORT(1) General Commands Manual SORT(1)NAME
sort - sort a file of ASCII lines
SYNOPSIS
sort [-bcdfimnru] [-tc] [-o name] [+pos1] [-pos2] file ...
OPTIONS -b Skip leading blanks when making comparisons
-c Check to see if a file is sorted
-d Dictionary order: ignore punctuation
-f Fold upper case onto lower case
-i Ignore nonASCII characters
-m Merge presorted files
-n Numeric sort order
-o Next argument is output file
-r Reverse the sort order
-t Following character is field separator
-u Unique mode (delete duplicate lines)
EXAMPLES
sort -nr file # Sort keys numerically, reversed
sort +2 -4 file # Sort using fields 2 and 3 as key
sort +2 -t: -o out # Field separator is :
sort +.3 -.6 # Characters 3 through 5 form the key
DESCRIPTION
Sort sorts one or more files. If no files are specified, stdin is sorted. Output is written on standard output, unless -o is specified.
The options +pos1 -pos2 use only fields pos1 up to but not including pos2 as the sort key, where a field is a string of characters delim-
ited by spaces and tabs, unless a different field delimiter is specified with -t. Both pos1 and pos2 have the form m.n where m tells the
number of fields and n tells the number of characters. Either m or n may be omitted.
SEE ALSO comm(1), grep(1), uniq(1).
SORT(1)