I have a huge 450 GB file. Its tab-delimited format is as follows:
Now I want to extract a subset of this file in which column 1 is x10 and column 2 is between 600000 and 30000000. I wrote the following Perl script, but it doesn't work:
I guess the input and output files are both so big that my script can't handle them.
Does anyone know a good way to do this? Perl or shell scripts are preferred.
All your help will be appreciated!
Last edited by Franklin52; 03-13-2010 at 01:47 PM.
Reason: Please indent your code and use code tags!!
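A minimal sketch of the kind of single-pass filter this calls for, written in Python rather than the requested Perl or shell. It assumes that "from 600000 to 30000000" is inclusive at both ends, that column 2 holds plain integers, and that lines which cannot be parsed (such as a header) should be skipped. The script and function names are hypothetical. Because it reads and writes one line at a time, memory use stays constant regardless of the 450 GB input size.

```python
import sys
from typing import Iterable, Iterator, TextIO

# Assumed criteria: column 1 must equal KEY; column 2 must lie in [LOW, HIGH] inclusive.
KEY = "x10"
LOW, HIGH = 600000, 30000000


def in_range(line: str, key: str = KEY, low: int = LOW, high: int = HIGH) -> bool:
    """Return True if column 1 equals key and column 2 is an integer in [low, high]."""
    # Split off only the first two tab-delimited fields; the rest stays in one piece.
    fields = line.rstrip("\n").split("\t", 2)
    if len(fields) < 2 or fields[0] != key:
        return False
    try:
        pos = int(fields[1])
    except ValueError:  # e.g. a header row or a malformed line
        return False
    return low <= pos <= high


def filter_stream(lines: Iterable[str]) -> Iterator[str]:
    """Yield matching lines unchanged, one at a time, so memory use stays flat."""
    for line in lines:
        if in_range(line):
            yield line


def main(inp: TextIO = sys.stdin, out: TextIO = sys.stdout) -> None:
    out.writelines(filter_stream(inp))


if __name__ == "__main__":
    main()
```

Run it as `python3 filter_x10.py < input.txt > subset.txt` (file names hypothetical). An awk program testing `$1` and `$2` with a tab field separator would make the same single sequential pass; with either tool, the key point is to stream the file instead of loading it into memory.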
Hi
May I know if there is a way to read or copy a mainframe (IBM OS/390) dataset (sequential file) into a UNIX directory?
Thank you for your time.
IcyGuava (4 Replies)
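One common route, sketched here under stated assumptions: transfer the dataset to the UNIX host in binary mode, then convert the EBCDIC records to ASCII text lines (an ASCII-mode FTP transfer typically performs this conversion itself). The sketch assumes fixed-length records (RECFM=FB) with a record length of 80 and code page 037, which Python provides as the `cp037` codec. The real record length and code page must be taken from the dataset's attributes, and packed-decimal or binary fields would be corrupted by a plain character conversion. The helper name `ebcdic_records` is hypothetical.

```python
import sys
from typing import BinaryIO, Iterator

LRECL = 80  # assumed record length; use the dataset's actual LRECL


def ebcdic_records(stream: BinaryIO, lrecl: int = LRECL,
                   codepage: str = "cp037") -> Iterator[str]:
    """Read fixed-length EBCDIC records from a binary stream and decode each one."""
    while True:
        record = stream.read(lrecl)
        if not record:
            return
        if len(record) != lrecl:
            raise ValueError(f"truncated final record: {len(record)} of {lrecl} bytes")
        # Decode one record and drop the trailing blank padding.
        yield record.decode(codepage).rstrip(" ")


if __name__ == "__main__":
    # Reads the binary-transferred dataset on stdin, writes one text line per record.
    for rec in ebcdic_records(sys.stdin.buffer):
        print(rec)
```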
Hi,
I have a huge file of bibliographic records in some standard format. I need a script to do some repeatable tasks, as follows:
1. Create folders named after the strings starting with "item_*" in the input file.
2. Create a file "contents" in each folder having "license.txt(tab... (5 Replies)
Hello All,
I need some help extracting a piece of information from a huge file.
The file looks like this:
database information
ccccccccccccccccc
ccccccccccccccccc
ccccccccccccccccc
ccccccccccccccccc
os information
cccccccccccccccccc
cccccccccccccccccc... (2 Replies)
Hello everyone,
I have to normalize this dataset (with 20,000 rows):
2,4,4,3,2,7,8,2,9,11,7,7,1,8,5,6
4,7,5,5,5,5,9,6,4,8,7,9,2,9,7,10
7,10,8,7,4,8,8,5,10,11,2,8,2,5,5,10
4,9,5,7,4,7,7,13,1,7,6,8,3,8,0,8,8
6,7,8,5,4,7,6,3,7,10,7,9,3,8,3,7,8
in this form:... (1 Reply)
I am looking for an open-source dataset library for C, something equivalent to ADO.Net.
Specifically, I am looking for the following features:
1. Create a Dataset from a file (XML or CSV).
2. Create a Dataset from a select query using an ODBC connection.
3. Load a created Dataset into a... (1 Reply)
Hi All,
I want to write a script to create flar images on multiple servers. On non-ZFS file systems I use the -X option to point to a file that lists the mounts to exclude on each server.
But on ZFS the -X option does not work. I want multiple mounts to be ignored on ZFS-based systems during flarcreate.
I... (0 Replies)
Hello. I was wondering if anyone could help. I have a file containing a large table in the format:
marker1 marker2 marker3 marker4
position1 position2 position3 position4
genotype1 genotype2 genotype3 genotype4
with marker being a name, position a numeric... (2 Replies)
I have a huge list of files (about 300,000) that follow a pattern like this:
.I 1
.U
87049087
.S
Am J Emerg
.M
Allied Health Personnel/*; Electric Countershock/*;
.T
Refibrillation managed by EMT-Ds:
.P
ARTICLE.
.W
Some patients converted from ventricular fibrillation to organized... (1 Reply)
Hi Guys,
Is there a way to export a SAS file (i.e. a .sas7bdat file) to a .csv file with header and data using Unix? I don't want to use a SAS program; is it possible using Unix tools or Unix scripting instead? (25 Replies)
Discussion started by: Master_Mind
25 Replies
svm-subset(1)    User Manuals    svm-subset(1)
NAME
svm-subset - a subset selection tool for LIBSVM
SYNOPSIS
svm-subset [ -s method ] dataset number [ output1 ] [ output2 ]
DESCRIPTION
Training on large data is time-consuming. Sometimes one should work on a smaller subset first. The Python script subset.py randomly selects a
specified number of samples. For classification data, we provide a stratified selection to ensure the same class distribution in the subset.
OPTIONS
-s method
0 -- stratified selection (classification only) (default)
1 -- random selection
output1
The subset. If output1 is omitted, the subset will be printed on the screen.
output2
The rest of the data.
FILES
See svm-train(1) for the format of the dataset.
EXAMPLES
svm-subset heart_scale 100 file1 file2
100 samples are randomly selected from heart_scale and stored in file1. All remaining instances are stored in file2.
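The stratified selection (-s 0) can be sketched as: group the rows of a LIBSVM-format file by their label (the first whitespace-separated token), then draw from each class in proportion to its share of the data. This is a hypothetical reimplementation for illustration, not the code of subset.py; in particular, the largest-remainder rounding used to hit the exact requested number is an assumption, and every line is assumed to be non-empty.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def stratified_subset(lines: List[str], number: int,
                      seed: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Split lines into (subset, rest) so the subset keeps the class proportions."""
    if not 0 <= number <= len(lines):
        raise ValueError("number must be between 0 and the number of samples")
    rng = random.Random(seed)

    # Group row indices by label: the first token of each LIBSVM-format line.
    by_label: Dict[str, List[int]] = defaultdict(list)
    for i, line in enumerate(lines):
        by_label[line.split(None, 1)[0]].append(i)

    # Integer quota per class, then hand the leftover slots to the classes
    # with the largest remainders so the quotas sum exactly to `number`.
    total = len(lines)
    quota = {lab: number * len(idx) // total for lab, idx in by_label.items()}
    remainder = {lab: number * len(idx) % total for lab, idx in by_label.items()}
    leftover = number - sum(quota.values())
    for lab in sorted(by_label, key=lambda l: remainder[l], reverse=True)[:leftover]:
        quota[lab] += 1

    # Sample each class independently without replacement.
    chosen = set()
    for lab, idx in by_label.items():
        chosen.update(rng.sample(idx, quota[lab]))

    # Preserve the original order in both outputs.
    subset = [line for i, line in enumerate(lines) if i in chosen]
    rest = [line for i, line in enumerate(lines) if i not in chosen]
    return subset, rest
```

Called as `stratified_subset(lines, 100)`, the first list plays the role of output1 and the second of output2. Random selection (-s 1) would instead sample `number` indices from the whole file, ignoring labels.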
BUGS
Please report bugs to the Debian BTS.
AUTHOR
Chih-Chung Chang, Chih-Jen Lin <cjlin@csie.ntu.edu.tw>, Chen-Tse Tsai <ctse.tsai@gmail.com> (packaging)
SEE ALSO
svm-train(1), svm-predict(1)
Linux    DEC 2009    svm-subset(1)