10-22-2008
The OP wants all the 1's in a single file and all the 2's in a single file, possibly with all the 3's in that same file as well.
The problem is that before you attempt a split you have to know the split count, the complete key list, the count of unique keys, and how to group them. I would create a list of unique key fields, divide the count by 3, and let any extras fall into the last split.
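To make that grouping concrete, here is a minimal Python sketch of the idea. The function names, the whitespace-delimited first field as the key, and the in-memory handling are my assumptions for illustration, not anything taken from the OP's data or script.

```python
from collections import defaultdict


def assign_keys_to_splits(keys, n_splits):
    """Map each distinct key to a split index in 0..n_splits-1.

    The unique keys are sorted, the count is divided by n_splits, and any
    extras are pushed into the last split.
    """
    unique = sorted(set(keys))
    # max(1, ...) keeps the division usable when n_splits > number of keys;
    # in that case the trailing splits simply stay empty.
    per_split = max(1, len(unique) // n_splits)
    return {k: min(i // per_split, n_splits - 1) for i, k in enumerate(unique)}


def split_records(lines, n_splits, key_of=lambda line: line.split()[0]):
    """Group lines by split index; key_of (hypothetical) extracts the key."""
    lines = list(lines)  # two passes: one to learn the keys, one to route lines
    mapping = assign_keys_to_splits((key_of(line) for line in lines), n_splits)
    groups = defaultdict(list)
    for line in lines:
        groups[mapping[key_of(line)]].append(line)
    return dict(groups)


if __name__ == "__main__":
    sample = ["1 a", "2 b", "3 c", "1 d"]
    # Two splits over three keys: 1's alone, 2's and 3's together.
    print(split_records(sample, 2))
```

Note that the data has to be read twice (or held in memory), because the key list must be complete before any line can be routed.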
The problem with this is that you can get splits of enormously different sizes, depending on how skewed the distribution of keys is in the data file. IMO it defeats the purpose of splitting altogether. And what happens when you ask for more splits than there are keys?
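A rough Python sketch of both concerns, using made-up key counts (the numbers and the `split_sizes` name are assumptions for illustration only). It groups sorted unique keys evenly, extras into the last split, and reports how many lines land in each split.

```python
from collections import Counter


def split_sizes(keys, n_splits):
    """Return the number of lines per split when unique keys are divided
    evenly across n_splits, with extra keys going to the last split."""
    counts = Counter(keys)
    unique = sorted(counts)
    per_split = max(1, len(unique) // n_splits)
    sizes = [0] * n_splits
    for i, key in enumerate(unique):
        sizes[min(i // per_split, n_splits - 1)] += counts[key]
    return sizes


if __name__ == "__main__":
    # Heavily skewed keys: one key per split, wildly different sizes.
    skewed = ["1"] * 9000 + ["2"] * 10 + ["3"] * 5
    print(split_sizes(skewed, 3))      # [9000, 10, 5]
    # More splits than keys: the surplus split is simply empty.
    print(split_sizes(["1", "2"], 3))  # [1, 1, 0]
```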
The only thing that makes sense to me is a one-to-one split (one distinct key per file), or else leave everything in one big file.
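For what it's worth, the one-to-one case needs no advance knowledge of the keys at all. A minimal Python sketch, assuming the key is the first whitespace-delimited field and is safe to use in a file name; the function name and the `split.` prefix are hypothetical:

```python
import os


def split_one_file_per_key(lines, out_dir, prefix="split.",
                           key_of=lambda line: line.split()[0]):
    """Write each line to out_dir/<prefix><key>, one file per distinct key.

    Single pass: a file is opened the first time its key is seen.
    Assumption: the number of distinct keys stays below the per-process
    open-file limit; otherwise reopen in append mode per line or sort first.
    """
    handles = {}
    try:
        for line in lines:
            key = key_of(line)
            if key not in handles:
                handles[key] = open(os.path.join(out_dir, prefix + key), "w")
            handles[key].write(line if line.endswith("\n") else line + "\n")
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(handles)  # the distinct keys that got a file
```

Calling `split_one_file_per_key(open("datafile"), "out")` (both names are placeholders) would leave files such as `out/split.1`, `out/split.2`, and so on, one per key.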