Shell Programming and Scripting: Post 302875317 by paolo.kunder on Tuesday 19th of November 2013 07:58:10 AM
Split a file in more files based on score content

Dear all,
I have the following tab-delimited file:
Code:
ID	distanceTSS	score
8434	571269	10
10122	393912	9
7652	6	10
4863	1451	9
8419	39	2
9363	564	21
9333	7714	22
9638	8334	9
1638	1231	11
10701	918	1000
6587	32056	111

What I would like to do is create 100 new files based on the content of the second column (distanceTSS).
The first file should contain all the lines with a distance between 0 and 1000,
the second those between 1000 and 2000, and so on, up to the last file with distances between 99000 and 100000.

Finally, for each new file I would like to calculate the median of the third column (score).

Is there a fast way to do this? I tried with a Perl script, but it seems really slow.

Thanks for your help,
Paolo

Last edited by paolo.kunder; 11-19-2013 at 09:03 AM..
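
The request above (bin on the second column, then take the median of the third) might be handled in a single pass along the lines of the following Python sketch. It is only an illustration under stated assumptions, not a tested solution for the real data: bins are taken as half-open, so a distance of exactly 1000 goes to the second file; rows with a distance of 100000 or more (such as 571269 in the sample) are skipped; only non-empty bins get a file; and the function name `split_by_distance` and the `bin_NNN.tsv` output names are made up for this example.

```python
import csv
import statistics
from collections import defaultdict

BIN_WIDTH = 1000  # width of each distance bin, as described in the post
N_BINS = 100      # bins cover distances 0 .. 99999 (assumption: half-open bins)


def split_by_distance(in_path, out_prefix="bin_"):
    """Write one file per non-empty distance bin; return {bin index: median score}."""
    rows_by_bin = defaultdict(list)

    with open(in_path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)  # ID, distanceTSS, score
        for row in reader:
            if len(row) < 3:
                continue  # ignore blank or malformed lines
            bin_index = int(row[1]) // BIN_WIDTH
            # Assumption: rows outside 0 .. 99999 are dropped, not written anywhere.
            if 0 <= bin_index < N_BINS:
                rows_by_bin[bin_index].append(row)

    medians = {}
    for bin_index, rows in sorted(rows_by_bin.items()):
        # Hypothetical naming scheme: bin_000.tsv, bin_001.tsv, ...
        out_path = f"{out_prefix}{bin_index:03d}.tsv"
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out, delimiter="\t", lineterminator="\n")
            writer.writerow(header)
            writer.writerows(rows)
        medians[bin_index] = statistics.median([float(r[2]) for r in rows])

    return medians
```

Calling `split_by_distance("input.tsv")` (the file name is a placeholder) reads the input once, keeps the rows in memory grouped by bin, and returns the medians keyed by bin index. For inputs too large to hold in memory, the same binning rule could be applied while appending to per-bin files, with the medians computed in a second pass.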
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.