How to split a large file with the first 100 lines of each condition?


 
# 1  
Old 02-23-2016

I have a huge file with the following input:

Code:
Case1	Specific_Info	Specific_Info
Case1	Specific_Info	Specific_Info
Case3	Specific_Info	Specific_Info
Case4	Specific_Info	Specific_Info
Case1	Specific_Info	Specific_Info
Case2	Specific_Info	Specific_Info
Case2	Specific_Info	Specific_Info
Case1	Specific_Info	Specific_Info
Case3	Specific_Info	Specific_Info
…	…	…
Casen	Specific_Info	Specific_Info

I need to split this file into several files, where each output file has at most 1000 lines per "Casen". I have been doing this in separate steps: first splitting the file by Case, then splitting each of those files into 1000-line pieces, and then concatenating the pieces back together, but this process takes too long.

Last edited by Scrutinizer; 02-23-2016 at 02:49 PM.. Reason: code tags
# 2  
Old 02-23-2016
Using bash, cd to the directory containing that file:

Step 1.

Code:
nn=$(cut -f1 infile | sort -u | wc -l)   # number of distinct cases
echo $nn

If nn is less than the open-file limit (shown by ulimit -n) minus 3 (stdin, stdout and stderr are already open), use:
Code:
awk '{print $0 > $1}' infile

Otherwise nn is too big; use:
Code:
while IFS= read -r rec
do
   f=${rec%%$'\t'*}   # first tab-delimited field, e.g. Case1
   echo "$rec" >> "$f"
done < infile

Now you have a bunch of files with many lines that all start with the same string.

Step 2.
Use the split -l (ell) command to make smaller files with a limit of 1000 lines per file. Note: the last piece for each case may have fewer than 1000 lines.

Code:
ls Case* >tmpfile
while read f
do
  split -l 1000 "$f" "$f"
  rm "$f" # remove litter
done < tmpfile
rm tmpfile

You are going to have loads of small files with names like Case343aa; the aa is the unique suffix that split appends.
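For example, assuming one of the per-case files from Step 1 is named Case1 and has 2500 lines, Step 2 on that one file looks like this (suffix letters are chosen by split):

```shell
# A Case1 file with 2500 placeholder lines.
seq 1 2500 | sed 's/^/Case1\tx\t/' > Case1

# Split into 1000-line pieces named Case1aa, Case1ab, Case1ac,
# then remove the original as in the loop above.
split -l 1000 Case1 Case1
rm Case1

wc -l Case1a*
```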

Last edited by jim mcnamara; 02-23-2016 at 05:30 PM..
# 3  
Old 02-24-2016
How about
Code:
sort file | awk 'C[$1]++ < 1000'

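This keeps the first 1000 lines of each case but writes them all to stdout as one stream. To get separate per-case files directly, the same idea can redirect inside awk (the .part suffix and the n=2 demo limit here are just illustrative; with very many cases you would also need close() as in post #2):

```shell
# Sample input; in practice this is the huge file and n would be 1000.
printf 'Case2\tx\nCase1\ty\nCase1\tz\nCase1\tw\n' > infile

# First n lines of each case go to CaseX.part.
sort infile | awk -v n=2 'c[$1]++ < n { print > ($1 ".part") }'
```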