UNIX for Dummies Questions & Answers
split a file with unique sets
This may sound like a trivial problem, but I still need some help:
I have a file of ids and I want to split it 'n' ways (n could be any number) into separate files:
1
1
1
2
2
3
3
4
5
5
Let's assume 'n' is 3, and the same id cannot appear in two different partitions. So the partitions might look like (1,1,1), (2,2,3,3), (4,5,5).
Thanks guys,
- CB
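A minimal Python sketch of one way to do this, assuming the file is sorted so that identical ids sit on adjacent lines (as in the example). It groups the ids with itertools.groupby and places each whole group in the partition where the group's last line would fall under a plain proportional line split, so no id is ever divided between files. The function name split_unique is hypothetical, and this placement rule is just one reasonable choice, not the only valid one.

```python
from itertools import groupby


def split_unique(ids, n):
    """Split ids into n contiguous partitions so that no id lands in two partitions.

    Assumes identical ids are adjacent (e.g. the input file is sorted).
    """
    total = len(ids)
    parts = [[] for _ in range(n)]
    end = 0  # number of lines consumed so far
    for _, group in groupby(ids):
        group = list(group)
        end += len(group)
        # Index of the partition that the group's last line would fall into
        # under a plain proportional split of `total` lines into n pieces.
        idx = (end - 1) * n // total
        parts[idx].extend(group)
    return parts


# Example: split_unique([1, 1, 1, 2, 2, 3, 3, 4, 5, 5], 3)
# -> [[1, 1, 1], [2, 2, 3, 3], [4, 5, 5]]
```

With the example data and n=3 this reproduces the partitions from the question. A single very large id group can leave some partitions empty, and writing each list out to its own file is omitted to keep the sketch short.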
You're aware there's a limited number of combinations for a given fixed number of ids, right?
So, if you have 1 1 1 2 2 3 3 4 5 5 as in the example you gave, and this happens more than once, the first partitioning could be (1,1,1), (2,2,3,3), (4,5,5), and a second occurrence of those ids could generate a partitioning like (1,1), (1,2,2,3,3), (4,5,5), right?
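The "limited number of combinations" can be made precise. If each partition must be contiguous, non-empty, and keep every id whole, then with k distinct ids the only freedom is where to cut among the k-1 gaps between id groups, which gives C(k-1, n-1) possible splits. For the example (k=5, n=3) that is C(4,2) = 6, and a split like (1,1), (1,2,2,3,3), (4,5,5) is not among them because it divides the 1's. The Python sketch below enumerates them; the function name all_partitions is hypothetical and sorted input is assumed.

```python
from itertools import combinations, groupby
from math import comb


def all_partitions(ids, n):
    """Yield every way to cut ids into n contiguous, non-empty partitions
    without splitting any id across partitions (n >= 1).

    Assumes identical ids are adjacent (sorted input).
    """
    groups = [list(g) for _, g in groupby(ids)]
    k = len(groups)
    # Choose n-1 cut points among the k-1 gaps between id groups.
    for cuts in combinations(range(1, k), n - 1):
        bounds = (0,) + cuts + (k,)
        yield [sum(groups[a:b], []) for a, b in zip(bounds, bounds[1:])]


ids = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]
splits = list(all_partitions(ids, 3))
# The count matches the closed form C(k-1, n-1) with k=5, n=3.
assert len(splits) == comb(5 - 1, 3 - 1) == 6
```

If n is larger than the number of distinct ids, the generator yields nothing, since no valid split exists under these rules.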
I was hoping for a better solution, but here is a crude approach that I thought of:
1. Split the file 'n' ways (n=3 for this example):
part 1  part 2  part 3
1       2       3
1       2       4
1       3       5
2. If (size of orig file) % n = 10 % 3 > 0, append the remaining ids to the last partition:
part 3: 3 4 5 5
3. Compare part 1 with part 2 and see if any ids match. If they do, move the matching rows from part 2 to part 1. Then move on to the next pair of parts and do the same:
part 1: 1 1 1
part 2: 2 2 3 3
part 3: 4 5 5
Hopefully someone will present a sleeker solution with some syntax.
Thanks,
- CB
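CB's three steps translate fairly directly into code. This Python sketch assumes sorted input; it cuts the list into n chunks of len//n lines, lets the remainder fall into the last chunk, and then, walking left to right, pulls rows from the front of each chunk back into the previous chunk while they repeat that chunk's last id. The name crude_split is hypothetical.

```python
def crude_split(ids, n):
    """CB's crude approach: fixed-size chunks, then move repeated ids back.

    Assumes identical ids are adjacent (sorted input).
    """
    size = len(ids) // n
    # Step 1: n-1 chunks of `size` lines each ...
    parts = [ids[i * size:(i + 1) * size] for i in range(n - 1)]
    # Step 2: ... and the last chunk takes whatever remains (len(ids) % n extra lines).
    parts.append(ids[(n - 1) * size:])
    # Step 3: walk left to right, moving rows that repeat the previous
    # chunk's last id back into that chunk.
    for i in range(1, n):
        j = i - 1
        while j > 0 and not parts[j]:  # skip chunks emptied by earlier moves
            j -= 1
        while parts[i] and parts[j] and parts[i][0] == parts[j][-1]:
            parts[j].append(parts[i].pop(0))  # pop(0) is O(len); fine for a sketch
    return parts
```

One small addition to the description: if a chunk is emptied entirely by step 3, the comparison skips back to the nearest earlier non-empty chunk, so an id group that spans three chunks still ends up in one place (leaving an empty chunk behind).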
The OP wants all the 1's in a single file and all the 2's in a single file, possibly with all the 3's in that same file as well.
The problem is that before you attempt a split, you have to know the split count, the complete key list, the count of unique keys, and how to group them. I would create a list of the unique key fields, divide that count by 3, and let any extras fall into the last split. The problem with this is that you can get splits of enormously different sizes, depending on how skewed the distribution of keys is in the data file. IMO, that defeats the purpose of splitting altogether. And what happens when you ask for more splits than there are keys? The only thing that makes sense to me is a one-to-one split (one distinct key per file), or leaving everything in one big file.
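The key-count approach from this reply can be sketched the same way. This Python version (hypothetical name split_by_key_count, sorted input assumed) builds the list of distinct keys, gives each split k // n keys, and lets the extra keys fall into the last split. When n is at least the number of keys it falls back to one key per file; that fallback is an assumption based on the poster's suggestion, not something the thread specifies in code.

```python
from itertools import groupby


def split_by_key_count(ids, n):
    """Split on the list of distinct keys: k // n keys per split, extras to the last.

    Assumes identical ids are adjacent (sorted input). When n >= number of keys,
    falls back to one distinct key per file (an assumption, following the reply).
    """
    groups = [list(g) for _, g in groupby(ids)]  # one group per distinct key
    k = len(groups)
    if n >= k:
        return groups
    per = k // n
    parts = []
    for i in range(n):
        start = i * per
        stop = k if i == n - 1 else (i + 1) * per  # extras fall into the last split
        parts.append(sum(groups[start:stop], []))
    return parts
```

On the example with n=3 it returns (1,1,1), (2,2), (3,3,4,5,5), with sizes 3, 2 and 5, which shows the unevenness the poster warns about even on a ten-line file.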