Post 302875317 by paolo.kunder on Tuesday 19th of November 2013 07:58:10 AM
Split a file in more files based on score content

Dear All,
I have the following tab-delimited file:
Code:
ID	distanceTSS	score
8434	571269	10
10122	393912	9
7652	6	10
4863	1451	9
8419	39	2
9363	564	21
9333	7714	22
9638	8334	9
1638	1231	11
10701	918	1000
6587	32056	111

What I would like to do is the following: create 100 new files based on the content of the second column (distanceTSS).
The first file should contain all the lines with a distance between 0 and 1000,
the second between 1000 and 2000, and so on, up to the last file between 99000 and 100000.

Finally, for each new file I would like to calculate the median of the third column (score).

Is there a fast way to do this? I tried with a Perl script, but it seems really slow.

thanks for your help,
Paolo

Last edited by paolo.kunder; 11-19-2013 at 09:03 AM..
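
One possible approach, sketched in Python rather than Perl, is to read the file once, write each line to the file for its distance bin, and collect the scores per bin so the medians can be computed at the end. This is only a sketch under several assumptions that the post does not settle: the function name `split_by_distance` and the output names `bin_<low>_<high>.txt` are made up for illustration, bins are treated as half-open (0 to 999 in the first file, 1000 to 1999 in the second, and so on), lines with a distance of 100000 or more are skipped, the header is copied into every output file, and a file is only created for bins that actually contain lines.

```python
import os
import statistics
from collections import defaultdict


def split_by_distance(in_path, out_dir, bin_width=1000, n_bins=100):
    """Split a tab-delimited file into per-bin files by column 2
    and return {bin index: median of column 3}."""
    os.makedirs(out_dir, exist_ok=True)
    scores = defaultdict(list)  # bin index -> scores seen in that bin
    handles = {}                # bin index -> open output file
    try:
        with open(in_path) as src:
            header = next(src, None)
            if header is None:
                return {}  # empty input file
            for line in src:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 3:
                    continue  # skip blank or malformed lines
                distance = int(fields[1])  # assumes integer distances
                b = distance // bin_width
                if distance < 0 or b >= n_bins:
                    continue  # assumption: ignore out-of-range distances
                if b not in handles:
                    # Hypothetical naming scheme, e.g. bin_0_1000.txt
                    name = f"bin_{b * bin_width}_{(b + 1) * bin_width}.txt"
                    handles[b] = open(os.path.join(out_dir, name), "w")
                    handles[b].write(header)
                handles[b].write(line)
                scores[b].append(float(fields[2]))
    finally:
        for fh in handles.values():
            fh.close()
    # One median per non-empty bin, ordered by bin index
    return {b: statistics.median(v) for b, v in sorted(scores.items())}
```

Calling `split_by_distance("input.tsv", "bins")` (both names are placeholders) would write the per-bin files into `bins/` and return the medians keyed by bin index. Because the input is read only once and at most 100 output files are open at a time, the running time should be dominated by plain file I/O.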
 
