10-22-2008
split a file with unique sets
This may sound like a trivial problem, but I still need some help:
I have a file of ids and I want to split it 'n' ways (n could be any number) into separate files:
1
1
1
2
2
3
3
4
5
5
Let's assume 'n' is 3, and the same id must never appear in two different partitions. The partitions might then look like (1,1,1), (2,2,3,3), (4,5,5).
Thanks guys,
- CB
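One possible approach, offered as a rough sketch rather than a definitive answer: read the file twice with awk, first to count the lines, then to write them out, moving to the next output file only at a boundary between two different ids once the running line count has reached that part's share. This assumes equal ids sit on adjacent lines (sort the file first if they do not). The names ids.txt, part_ and the variables inside the awk program are made up for the illustration.

```shell
#!/bin/sh
# Sketch: split a file of ids into n parts without splitting any id across parts.
# Assumption: equal ids are on adjacent lines (sort the input first otherwise).
# ids.txt and the part_ prefix are hypothetical names used only for this example.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question.
printf '%s\n' 1 1 1 2 2 3 3 4 5 5 > ids.txt

n=3
awk -v n="$n" -v prefix="part_" '
    # First pass over the file: only count the lines.
    NR == FNR { total++; next }

    # Start of the second pass: work out the per-part share.
    FNR == 1 { target = int(total / n); part = 1 }

    # Move to the next part only where the id changes, once enough
    # lines have been written, and never beyond part n.
    FNR > 1 && $1 != prev && written >= part * target && part < n {
        close(out)
        part++
    }

    {
        out = prefix part
        print > out
        written++
        prev = $1 ""    # keep the id as a string so ids compare as text
    }
' ids.txt ids.txt

# Show each part on one line.
for f in part_*; do
    printf '%s: %s\n' "$f" "$(paste -sd, "$f")"
done
```

With the sample ids and n=3 this should give part_1 holding 1,1,1, part_2 holding 2,2,3,3 and part_3 holding 4,5,5. The parts are only roughly equal in size, since a large group of one id always stays together, and fewer than n files are written when there are fewer than n distinct ids.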
split(1) User Commands split(1)
NAME
split - split a file into pieces
SYNOPSIS
split [-linecount | -l linecount] [-a suffixlength] [ file [name]]
split [ -b n | nk | nm] [-a suffixlength] [ file [name]]
DESCRIPTION
The split utility reads file and writes it in linecount-line pieces into a set of output-files. The name of the first output-file is name
with aa appended, and so on lexicographically, up to zz (a maximum of 676 files). The maximum length of name is 2 characters less than the
maximum filename length allowed by the filesystem. See statvfs(2). If no output name is given, x is used as the default (output-files will
be called xaa, xab, and so forth).
OPTIONS
The following options are supported:
-linecount | -l linecount Number of lines in each piece. Defaults to 1000 lines.
-a suffixlength Uses suffixlength letters to form the suffix portion of the filenames of the split file. If -a is not specified,
the default suffix length is 2. If the sum of the name operand and the suffixlength option-argument would create a
filename exceeding NAME_MAX bytes, an error will result; split will exit with a diagnostic message and no files
will be created.
-b n Splits a file into pieces n bytes in size.
-b nk Splits a file into pieces n*1024 bytes in size.
-b nm Splits a file into pieces n*1048576 bytes in size.
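A short illustration of the options above, as a sketch only. The input file ids.txt and the prefixes chunk_ and bytes_ are made-up names, and the sample data is the id list from the question at the top of this thread.

```shell
#!/bin/sh
# Sketch: the -l, -a and -b options of split on a small sample file.
# ids.txt, chunk_ and bytes_ are hypothetical names used only for this example.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 1 1 1 2 2 3 3 4 5 5 > ids.txt

# Pieces of 4 lines each, with 3-letter suffixes instead of the default 2.
split -l 4 -a 3 ids.txt chunk_

# Show each piece on one line.
for f in chunk_*; do
    printf '%s: %s\n' "$f" "$(paste -sd, "$f")"
done

# Pieces of 8 bytes each; ids.txt is 20 bytes, so the last piece is shorter.
split -b 8 ids.txt bytes_
wc -c bytes_*
```

The line-based run should produce chunk_aaa, chunk_aab and chunk_aac. Because split cuts purely by line count, id 2 ends up in both chunk_aaa and chunk_aab, so split on its own does not keep equal ids together.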
OPERANDS
The following operands are supported:
file The path name of the ordinary file to be split. If no input file is given or file is -, the standard input will be used.
name The prefix to be used for each of the files resulting from the split operation. If no name argument is given, x will be used as
the prefix of the output files. The combined length of the basename of prefix and suffixlength cannot exceed NAME_MAX bytes. See
OPTIONS.
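The two operands can be illustrated as follows. This is a sketch only: the prefix half_ is a made-up name, and the input is fed through standard input by giving - as the file operand.

```shell
#!/bin/sh
# Sketch: the file and name operands of split.
# The half_ prefix is a hypothetical name used only for this example.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# file is -, so split reads standard input; with no name operand
# the pieces get the default prefix x (xaa, xab).
printf '%s\n' 1 1 1 2 2 3 3 4 5 5 | split -l 5 -
ls x*

# Same input, but with an explicit name operand (half_aa, half_ab).
printf '%s\n' 1 1 1 2 2 3 3 4 5 5 | split -l 5 - half_
ls half_*
```

Note that a name operand can only be given when a file operand (here -) precedes it.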
USAGE
See largefile(5) for the description of the behavior of split when encountering files greater than or equal to 2 Gbyte (2**31 bytes).
ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of split: LANG, LC_ALL, LC_CTYPE,
LC_MESSAGES, and NLSPATH.
EXIT STATUS
The following exit values are returned:
0 Successful completion.
>0 An error occurred.
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
+-----------------------------+-----------------------------+
|       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
+-----------------------------+-----------------------------+
|Availability                 |SUNWesu                      |
+-----------------------------+-----------------------------+
|CSI                          |enabled                      |
+-----------------------------+-----------------------------+
|Interface Stability          |Standard                     |
+-----------------------------+-----------------------------+
SEE ALSO
csplit(1), statvfs(2), attributes(5), environ(5), largefile(5), standards(5)
SunOS 5.10 16 Apr 1999 split(1)