What I would like to do is the following: create 100 new files based on the content of the second column.
The first file should contain all the lines with a distance between 0 and 1000,
the second between 1000 and 2000, and so on up to the last one, between 99000 and 100000.
Finally, for each new file I would like to calculate the median of the third column (score).
Is there a fast way to do so? I tried with a Perl script but it seems really slow.
Thanks for your help,
Paolo
Last edited by paolo.kunder; 11-19-2013 at 09:03 AM..
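One possible single-pass approach, sketched in Python rather than Perl. It assumes whitespace-separated columns with the distance in column 2 and the score in column 3, and it assumes half-open bins (a distance of exactly 1000 goes to the second file). The function name split_and_median and the bin_NNN.txt file names are made up for illustration; they are not from the original post.

```python
import os
import statistics
from collections import defaultdict


def split_and_median(in_path, out_dir, bin_width=1000, n_bins=100):
    """Write each line to a per-bin file and return {bin index: median score}."""
    scores = defaultdict(list)  # bin index -> scores seen in that bin
    handles = {}                # bin index -> open output file
    os.makedirs(out_dir, exist_ok=True)
    try:
        with open(in_path) as src:
            for line in src:
                fields = line.split()
                if len(fields) < 3:
                    continue  # skip blank or short lines
                try:
                    distance = float(fields[1])
                    score = float(fields[2])
                    idx = int(distance // bin_width)
                except (ValueError, OverflowError):
                    continue  # skip a header or a malformed row
                if not 0 <= idx < n_bins:
                    continue  # distance outside the binned range
                if idx not in handles:
                    # Assumed naming scheme: bin_000.txt ... bin_099.txt
                    name = "bin_%03d.txt" % idx
                    handles[idx] = open(os.path.join(out_dir, name), "w")
                handles[idx].write(line)
                scores[idx].append(score)
    finally:
        for fh in handles.values():
            fh.close()
    return {idx: statistics.median(vals) for idx, vals in sorted(scores.items())}
```

The input is read only once and each output file is opened only once, which is usually where a slow script loses its time (for example by reopening files per line or rescanning the input per bin). The scores are held in memory so the median can be computed at the end; if that is too much data, the medians could instead be computed per output file afterwards.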
Hello, I am using awk to split a file into multiple files using the command:
nawk '{
if ( $1 == "<process" )
{
n=split($2, arr, "\"");
file=arr[2]
}
print > file }' processes.xml
<process name="Process1.process">
... (3 Replies)
Arun kumar something somehting Enterting in to the line
.
.
.
.
Some text text Finshing the sentence
Some other text
.
.
.
.
Again something somehting Enterting in to the line
.
.
.
.
.
.
Again text text Finshing the sentence (6 Replies)
Hi,
I have one requirement: I have to split one comma-delimited file into multiple files based on the values in one of its columns.
How can I achieve this in Unix?
Here is the sample data. In this case I have to split the file based on the date column (c4).
Input file
c1,c2,c3,c4,c5... (1 Reply)
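A minimal sketch of one way to do this kind of split, in Python. It assumes the first line is a header, that the key is the fourth column (c4, index 3), and that each output file is named after the column value. The function name split_csv_by_column and the naming scheme are illustrative assumptions, not something taken from the thread.

```python
import csv
import os


def split_csv_by_column(in_path, out_dir, col=3, has_header=True):
    """Split a comma-delimited file into one file per distinct value of column `col`."""
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # file-safe column value -> (file handle, csv writer)
    try:
        with open(in_path, newline="") as src:
            reader = csv.reader(src)
            header = next(reader, None) if has_header else None
            for row in reader:
                if len(row) <= col:
                    continue  # skip rows too short to carry the key column
                # Assumed naming: slashes (e.g. in dates) are replaced so the
                # value can be used as a file name.
                safe = row[col].replace("/", "-")
                if safe not in handles:
                    fh = open(os.path.join(out_dir, safe + ".csv"), "w", newline="")
                    writer = csv.writer(fh, lineterminator="\n")
                    if header is not None:
                        writer.writerow(header)  # repeat the header in every file
                    handles[safe] = (fh, writer)
                handles[safe][1].writerow(row)
    finally:
        for fh, _ in handles.values():
            fh.close()
    return sorted(handles)
```

Each output file is opened once and kept open, so the input is read in a single pass. With a very large number of distinct values the open-file limit could be reached; sorting the input by the key column first would avoid that.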
Good day all,
I need some help.
Say that I have data like below, each field separated by a tab:
DATE NAME ADDRESS
15/7/2012 LX a.b.c
15/7/2012 LX1 a.b.c
16/7/2012 AB a.b.c
16/7/2012 AB2 a.b.c
15/7/2012 LX2 a.b.c... (2 Replies)
Hi,
I have a huge 7 GB file with around 1 million records. I want to split this file into 4 files containing around 250k messages each.
Please help me, as the split command cannot work here because it might cut through tags.
Format of the file is as below
<!--###### ###### START-->... (6 Replies)
Hi All,
I have the sales_data.csv file in the directory as below.
SDDCCR; SOM ; MD6546474777 ;05-JAN-16
ABC ; KIRAN ; CB789 ;04-JAN-16
ABC ; RAMANA; KS566767477747 ;06-JAN-16
ABC ; KAMESH; A33535335 ;04-JAN-16
SDDCCR; DINESH; GD6674474747 ;08-JAN-16... (4 Replies)
Hi,
I have two pipe separated files as below:
head -3 file1.txt
"HD"|"Nov 11 2016 4:08AM"|"0000000018"
"DT"|"240350264"|"56432"
"DT"|"240350264"|"56432"
head -3 file2.txt
"HD"|"Nov 15 2016 2:18AM"|"0000000019"
"DT"|"240350264"|"56432"
"DT"|"240350264"|"56432"
I want to list the... (6 Replies)
I need to split the file contents with multiple rows based on patterns
Sample:
Input:
ABC101testXYZ102UKMNO1092testing
ABC999testKMNValid
Output:
ABC101test
XYZ102U
KMN1092testing
ABC999test
KMNValid
In this ABC , XYZ and KMN are patterns (6 Replies)
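A rough sketch of this kind of split in Python, assuming a new row should start wherever one of the patterns ABC, XYZ or KMN begins. The function name split_on_patterns is made up for illustration. Note that the sketch keeps every character of the input, so the third piece of the first sample line comes out as KMNO1092testing, whereas the sample output above shows KMN1092testing.

```python
import re


def split_on_patterns(line, patterns=("ABC", "XYZ", "KMN")):
    """Break `line` into pieces that each start at one of `patterns`."""
    alternation = "|".join(re.escape(p) for p in patterns)
    # Positions where a pattern begins (non-overlapping, left to right).
    starts = [m.start() for m in re.finditer(alternation, line)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first pattern
    ends = starts[1:] + [len(line)]
    return [line[s:e] for s, e in zip(starts, ends) if s < e]


if __name__ == "__main__":
    for row in ("ABC101testXYZ102UKMNO1092testing", "ABC999testKMNValid"):
        for piece in split_on_patterns(row):
            print(piece)
```

Applying the function to every input line and printing each piece on its own row gives the multi-row layout asked for.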
Discussion started by: Jairaj
6 Replies