Sponsored Content
Top Forums UNIX for Dummies Questions & Answers split a file with unique sets Post 302250095 by jim mcnamara on Wednesday 22nd of October 2008 05:23:38 PM
Old 10-22-2008
The OP wants all the 1's in a single file, 2's in a single file possibly with all 3's in the same file as well.

The problem is you have to know the split count as well as the complete key list and count of unique keys and how to group them before you attempt a split. I would create a list of unique key fields, divide the count by 3 and let any extras fall into the last split.

The problem with this is that you can get splits of enormously different sizes depending on how skewed the distribution of keys is in the data file. It defeats splitting altogether - IMO. And what happens when you ask for more splits than there are keys?

The only thing that that makes sense to me is a one-to-one split - one distinct key per file or leave everything in one big file.
 

9 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

FILE SETS in unix

Hi all, Pls. let me know whether there is any concept called "FILE SETS" in unix? Because, I am using ETL tool DataStage which creates FILE SETS. While I am able to view the data of such a file set in the tool, the "cat" command on this FILESET lists only the Metadata and not the data content... (2 Replies)
Discussion started by: Aparna_A
2 Replies

2. AIX

IP Security file sets

hello, we are implementing ip security on several of our aix 5.2-09 boxes and i am unable to locate the prerequisite file sets. does anyone know where i can find these? i have the original 5.2 cd's but these file sets are not on any of the cd's. Any thoughts or suggestions? (3 Replies)
Discussion started by: zuessh
3 Replies

3. Virtualization and Cloud Computing

Clouds (Partially Order Sets) - Streams (Linearly Ordered Sets) - Part 2

timbass Sat, 28 Jul 2007 10:07:53 +0000 Originally posted in Yahoo! CEP-Interest Here is my follow-up note on posets (partially ordered sets) and tosets (totally or linearly ordered sets) as background set theory for event processing, and in particular CEP and ESP. In my last note, we... (0 Replies)
Discussion started by: Linux Bot
0 Replies

4. Shell Programming and Scripting

get part of file with unique & non-unique string

I have an archive file that holds a batch of statements. I would like to be able to extract a certain statement based on the unique customer # (ie. 123456). The end for each statement is noted by "ENDSTM". I can find the line number for the beginning of the statement section with sed. ... (5 Replies)
Discussion started by: andrewsc
5 Replies

5. Shell Programming and Scripting

sort split merge -u unique

Hi, this is about sorting a very large file (like 10 gb) to keep lines with unique entries across SOME of the columns. The line originally looked like this: sort -u -k2,2 -k3,3n -k4,4n -k5,5n -k6,6n file_unsorted > file_sorted please note the -u flag. The problem is that this single... (4 Replies)
Discussion started by: jbr950
4 Replies

6. Shell Programming and Scripting

Change unique file names into new unique filenames

I have 84 files with the following names splitseqs.1, spliseqs.2 etc. and I want to change the .number to a unique filename. E.g. change splitseqs.1 into splitseqs.7114_1#24 and change spliseqs.2 into splitseqs.7067_2#4 So all the current file names are unique, so are the new file names.... (1 Reply)
Discussion started by: avonm
1 Replies

7. Shell Programming and Scripting

Identifying dupes within a database and creating unique sub-sets

Hello, I have a database of name variants with the following structure: variant=variant=variant The number of variants can be as many as thirty to forty. Since the database is quite large (at present around 60,000 lines) duplicate sets of variants creep in. Thus John=Johann=Jon and... (2 Replies)
Discussion started by: gimley
2 Replies

8. UNIX for Beginners Questions & Answers

sed awk: split a large file to unique file names

Dear Users, Appreciate your help if you could help me with splitting a large file > 1 million lines with sed or awk. below is the text in the file input file.txt scaffold1 928 929 C/T + scaffold1 942 943 G/C + scaffold1 959 960 C/T +... (6 Replies)
Discussion started by: kapr0001
6 Replies

9. UNIX for Beginners Questions & Answers

Split into multiple files by using Unique columns in a UNIX file

I have requirement to split below file (sample.csv) into multiple files by using the unique columns (first 3 are unique columns) sample.csv 123|22|56789|ABCDEF|12AB34|2019-07-10|2019-07-10|443.3400|1|1 123|12|5679|BCDEFG|34CD56|2019-07-10|2019-07-10|896.7200|1|2... (3 Replies)
Discussion started by: RVSP
3 Replies
MPB(1)							    MIT Photonic-Bands Package							    MPB(1)

NAME
mpb-split - compute eigenmodes with MPB using multiple processes SYNOPSIS
mpb-split NUM-SPLIT [DEFINITION]... [CTLFILE]... DESCRIPTION
mpb-split is a parallelizing front-end to MIT Photonic Bands (MPB). For a computation with several k points, it splits the list of k points over multiple processes. Of course, this will only benefit you on a system where different processes will run on different proces- sors, such as an SMP or a cluster with automatic process migration (e.g. MOSIX). mpb-split is actually a trivial shell script, though, so you can easily modify it if you need to use a special command to launch processes on other processors/machines. MIT Photonic Bands (MPB) is a free program to compute the band structures (dispersion relations) and electromagnetic modes of periodic dielectric structures, and is applicable both to photonic crystals (photonic band-gap materials) and a wide range of other optical prob- lems. More information on MPB, including a detailed manual, can be found online at the MPB home page: http://ab-initio.mit.edu/mpb/ A typical invocation of mpb-split looks like: mpb-split num-split foo.ctl >& foo.out This causes mpb-split to process the control file foo.ctl, divide the k points into num-split equal chunks, run each list in a separate process with MPB, and redirect the output (in order) to foo.out. (One typically redirects output to a file, as the output is verbose and contains a number of comma-delimited datasets that one can extract by grepping.) Overall, the behavior and arguments are the same as for mpb except that the first argument must be the integer num-split. What mpb-split technically does is to set the MPB variable k-split-num to num-split and k-split-index to the index (starting with 0) of the chunk for each process. If you want, you can use these variables to divide the problem in some other way and then reset them to 1 and 0, respectively. BUGS
Send bug reports to S. G. Johnson, stevenj@alum.mit.edu. AUTHORS
Written by Steven G. Johnson. Copyright (c) 1999, 2000, 2001, 2002 by the Massachusetts Institute of Technology. SEE ALSO
mpb(1), mpb-data(1) MPB
March 13, 2002 MPB(1)
All times are GMT -4. The time now is 07:16 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy