Dear All,
I have the following tab-delimited file:
What I would like to do is create 100 new files based on the content of the second column.
The first file should contain all the lines with a distance between 0 and 1000,
the second those between 1000 and 2000, and so on until 99000 and 100000.
Finally, for each new file I would like to calculate the median of the third column (score).
Is there a fast way to do this? I tried with a Perl script but it seems really slow.
thanks for your help,
Paolo
Last edited by paolo.kunder; 11-19-2013 at 09:03 AM..
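One way to avoid a slow multi-pass approach is to read the input once, write each line to the file for its distance bin, and collect the scores per bin for the median at the end. The following is only a minimal sketch in Python, not a tested solution for the actual data, since the file itself is not shown. It assumes tab-separated fields with the distance in column 2 and the score in column 3, and it treats bins as half-open (a distance of exactly 1000 goes into the second file), which the post does not specify. The function name `split_and_median` and the `bin_NNN.txt` output names are made up for illustration.

```python
import statistics
from collections import defaultdict
from pathlib import Path

BIN_WIDTH = 1000   # width of each distance bin
N_BINS = 100       # bins cover 0 up to (but not including) 100000


def split_and_median(in_path, out_dir, prefix="bin_"):
    """Write one file per distance bin and return {bin index: median score}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}                 # bin index -> open output file
    scores = defaultdict(list)   # bin index -> scores seen in that bin
    try:
        with open(in_path) as src:
            for line in src:
                fields = line.rstrip("\n").split("\t")
                try:
                    distance = float(fields[1])
                    score = float(fields[2])
                    idx = int(distance // BIN_WIDTH)
                except (IndexError, ValueError, OverflowError):
                    continue  # skip header lines and malformed rows
                if not 0 <= idx < N_BINS:
                    continue  # distance outside the 0-100000 range
                if idx not in handles:
                    # Assumed naming scheme: bin_000.txt, bin_001.txt, ...
                    handles[idx] = open(out_dir / f"{prefix}{idx:03d}.txt", "w")
                handles[idx].write(line)
                scores[idx].append(score)
    finally:
        for fh in handles.values():
            fh.close()
    return {idx: statistics.median(vals) for idx, vals in sorted(scores.items())}
```

Calling `split_and_median("input.tsv", "bins")` (both names are placeholders) would create the bin files under `bins/` and return the medians keyed by bin index. Only bins that actually contain lines get a file, and at most 100 output files are open at once.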
Hello, I am using awk to split a file into multiple files using command:
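nawk '
# On each <process ...> line, take the output file name from the name="..." attribute.
$1 == "<process" {
    n = split($2, arr, "\"")
    file = arr[2]    # text between the first pair of double quotes
}
# Write every line to the current file; skip anything before the first <process line.
file != "" { print > file }
' processes.xml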
<process name="Process1.process">
... (3 Replies)
Arun kumar something somehting Enterting in to the line
.
.
.
.
Some text text Finshing the sentence
Some other text
.
.
.
.
Again something somehting Enterting in to the line
.
.
.
.
.
.
Again text text Finshing the sentence (6 Replies)
Hi,
I have one requirement: I have to split one comma-delimited file into multiple files based on the values in one of the columns.
How can I achieve this in Unix?
Here is the sample data. In this case I have to split the file based on the date column (c4).
Input file
c1,c2,c3,c4,c5... (1 Reply)
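Since the sample data is cut off here, the following is only a rough sketch of splitting a comma-delimited file on one column in a single pass. It assumes the first line is a header that should be repeated in every output file and that the split column is the fourth one (c4); the function name `split_by_column` and the `<value>.csv` output names are hypothetical.

```python
import csv
from pathlib import Path


def split_by_column(in_path, out_dir, col_index=3, has_header=True):
    """Split a comma-delimited file into one file per distinct value of a column.

    col_index is 0-based, so 3 selects the fourth column (c4 in the post).
    Returns the sorted list of distinct keys that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}  # key -> (file handle, csv writer)
    try:
        with open(in_path, newline="") as src:
            reader = csv.reader(src)
            header = next(reader, None) if has_header else None
            for row in reader:
                if len(row) <= col_index:
                    continue  # skip blank or short rows
                # Replace path separators so a value such as 15/7/2012
                # can be used as a file name.
                key = row[col_index].strip().replace("/", "-")
                if key not in outputs:
                    fh = open(out_dir / f"{key}.csv", "w", newline="")
                    writer = csv.writer(fh, lineterminator="\n")
                    if header is not None:
                        writer.writerow(header)
                    outputs[key] = (fh, writer)
                outputs[key][1].writerow(row)
    finally:
        for fh, _ in outputs.values():
            fh.close()
    return sorted(outputs)
```

This keeps one output file open per distinct value, which is fine for a modest number of dates but could hit the open-file limit if the column has thousands of distinct values.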
Good day all
I need some help.
Say that I have data like the below, with each field separated by a tab:
DATE NAME ADDRESS
15/7/2012 LX a.b.c
15/7/2012 LX1 a.b.c
16/7/2012 AB a.b.c
16/7/2012 AB2 a.b.c
15/7/2012 LX2 a.b.c... (2 Replies)
Hi,
I have a huge 7 GB file with around 1 million records. I want to split it into 4 files of around 250k messages each.
Please help; the split command cannot be used here because it might cut through tags.
Format of the file is as below
<!--###### ###### START-->... (6 Replies)
Hi All,
I have the sales_data.csv file in the directory as below.
SDDCCR; SOM ; MD6546474777 ;05-JAN-16
ABC ; KIRAN ; CB789 ;04-JAN-16
ABC ; RAMANA; KS566767477747 ;06-JAN-16
ABC ; KAMESH; A33535335 ;04-JAN-16
SDDCCR; DINESH; GD6674474747 ;08-JAN-16... (4 Replies)
Hi,
I have two pipe separated files as below:
head -3 file1.txt
"HD"|"Nov 11 2016 4:08AM"|"0000000018"
"DT"|"240350264"|"56432"
"DT"|"240350264"|"56432"
head -3 file2.txt
"HD"|"Nov 15 2016 2:18AM"|"0000000019"
"DT"|"240350264"|"56432"
"DT"|"240350264"|"56432"
I want to list the... (6 Replies)
I need to split the file contents into multiple rows based on patterns
Sample:
Input:
ABC101testXYZ102UKMNO1092testing
ABC999testKMNValid
Output:
ABC101test
XYZ102U
KMN1092testing
ABC999test
KMNValid
Here ABC, XYZ and KMN are the patterns.
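A regular expression can start a new row at each occurrence of one of the patterns. The sketch below is one possible reading of the request, with a hypothetical function name `split_on_patterns`. Note that it keeps every character of the input, so for the first sample line it yields `KMNO1092testing`, whereas the sample output in the post shows `KMN1092testing`; the post does not explain that difference. Any text before the first pattern on a line is dropped.

```python
import re


def split_on_patterns(line, patterns=("ABC", "XYZ", "KMN")):
    """Return the pieces of `line`, each one starting at one of `patterns`."""
    alt = "|".join(re.escape(p) for p in patterns)
    # A piece is a pattern followed by as little text as possible,
    # up to the next pattern or the end of the line.
    piece = re.compile(rf"(?:{alt}).*?(?={alt}|$)")
    return piece.findall(line)


for row in split_on_patterns("ABC999testKMNValid"):
    print(row)
```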
Discussion started by: Jairaj
6 Replies