Split a file into parts only if the first field is different


 
# 8  
Old 08-14-2014
Is the volume a problem, or could you use this logic:
  • Get all the unique values of the first column - one IO
  • Get counts of each unique value - one IO per unique value
  • Group the unique values somehow to spread the volume between ten output files
  • Extract each group of labels to the output files - ten IOs
I would be concerned about the multiple IO passes, but it might make the processing clearer.
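As a rough illustration of the steps above, here is a minimal Python sketch. It is a variation rather than the exact scheme listed: it collapses the counting into one pass and the extraction into a second pass, which may ease the worry about repeated IO. It assumes the first field is whitespace-delimited (the thread does not say), and the names `first_field`, `split_by_first_field` and `out_prefix` are made up for the example. The grouping used here is the simplest one available, contiguous blocks of unique values in the order they were first seen.

```python
from collections import Counter


def first_field(line):
    """Return the first whitespace-delimited field, or "" for a blank line."""
    parts = line.split(None, 1)
    return parts[0] if parts else ""


def split_by_first_field(path, n_parts, out_prefix):
    # Pass 1: count records per first-field value (Counter keeps first-seen order).
    with open(path) as src:
        counts = Counter(first_field(line) for line in src)

    # Group the unique values: contiguous blocks in first-seen order.
    keys = list(counts)
    group_of = {key: i * n_parts // len(keys) for i, key in enumerate(keys)}

    # Pass 2: route every record to the file that owns its first field,
    # so records sharing a first field never end up in different files.
    outputs = [open(f"{out_prefix}{n}", "w") for n in range(n_parts)]
    try:
        with open(path) as src:
            for line in src:
                outputs[group_of[first_field(line)]].write(line)
    finally:
        for out in outputs:
            out.close()
    return counts, group_of
```

Calling `split_by_first_field("input.txt", 10, "part")` would write `part0` to `part9`; the returned counts can be used to check how evenly the records were spread.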

How would you propose to assign the various unique values to each output file?
  • By sequence (i.e. first 10% of unique values)
  • By record counts (might be very tricky)
  • By order first found in the original file
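To make the three options concrete, here is a hedged Python sketch of each one as a function that maps every unique value to an output file number. It assumes the record counts are already in a dict keyed by first-field value, in the order the values were first found. All function names are hypothetical. "By order first found" is read here as contiguous blocks in first-seen order, which is only one possible interpretation. The record-count version uses a simple greedy heuristic (largest value first, into the lightest file so far); it does not guarantee a perfectly even split.

```python
import heapq


def chunk(keys, n_parts):
    """Assign keys to n_parts contiguous blocks of near-equal key count."""
    return {key: i * n_parts // len(keys) for i, key in enumerate(keys)}


def by_sequence(counts, n_parts):
    # Sorted order: with ten parts, roughly the first 10% of unique values
    # share part 0, the next 10% share part 1, and so on.
    return chunk(sorted(counts), n_parts)


def by_first_found(counts, n_parts):
    # Relies on counts preserving the order keys were first seen (dicts do).
    return chunk(list(counts), n_parts)


def by_record_counts(counts, n_parts):
    # Greedy heuristic: largest key first, always into the lightest part so far.
    heap = [(0, part) for part in range(n_parts)]
    assignment = {}
    for key, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        load, part = heapq.heappop(heap)
        assignment[key] = part
        heapq.heappush(heap, (load + count, part))
    return assignment
```

The first two only balance the number of unique values per file, so one very frequent value can still make a file much larger than the rest. Balancing on record counts addresses that, at the cost of needing the counts before any output is written.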



Sorry it's just more questions,

Robin
 