Splitting the file


 
# 1  
Old 11-18-2013
Splitting the file

I have a file with around 10 million records.

Please find the sample data below

Code:
123456|ASDF|WORD|MIND|456890|40050|RTS
123456|9UIL|WORD|BLINK|15G26|43215|GTS
123456|9UIL|WORD|BLINK|15G26|43215|BTS
125828|9UIH|WIRD|BLANK|15G26|45215|NTS
125828|9UIH|WIRD|BLANK|15G26|47215|PTS
145679|8UIH|BIRD|BLINK|15T26|90807|ZTS

My requirement is to split the file based on the first column.
All records that share the same value in the first column should go to the same file.
So, in the above data,
the first three records will go to file1:

Code:
123456|ASDF|WORD|MIND|456890|40050|RTS
123456|9UIL|WORD|BLINK|15G26|43215|GTS
123456|9UIL|WORD|BLINK|15G26|43215|BTS

These will go to file2:

Code:
125828|9UIH|WIRD|BLANK|15G26|45215|NTS
125828|9UIH|WIRD|BLANK|15G26|47215|PTS

This will go to file3:

Code:
145679|8UIH|BIRD|BLINK|15T26|90807|ZTS

The problem is that the number of records with the same value in the first column can vary.
For example, in the above sample data I show three records with the same value.
It can be 3, 4, 100, or any other number. The same applies to the other sets of records.
# 2  
Old 11-18-2013
Simple approach:
Code:
awk -F'|' '{print > ($1 ".txt")}' file

Does the number in column #1 always come in groups, or can a value such as 123456 appear again further down in the file after other data?
If there are many distinct values, the output files should be closed as you go; otherwise awk can run out of open file descriptors.

EDIT:
This should close the file when field #1 changes
Code:
awk -F'|' 'f!=$1 {close(f ".txt"); f=$1} {print > (f ".txt")} END {close(f ".txt")}' file
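
Since the close-on-change version relies on each key arriving in one contiguous run, it may be worth checking that first. The following is only a sketch, run here against a small hypothetical sample (the file names sample_check.txt and reappearing_keys.txt are made up for illustration): it reports any key that shows up again after a different key has been seen.

```shell
# Hypothetical sample: key 123456 reappears after 125828
cat > sample_check.txt <<'EOF'
123456|ASDF|WORD|MIND|456890|40050|RTS
123456|9UIL|WORD|BLINK|15G26|43215|GTS
125828|9UIH|WIRD|BLANK|15G26|45215|NTS
123456|9UIL|WORD|BLINK|15G26|43215|BTS
EOF

# Whenever the key changes, report it if it was already seen in an earlier run
awk -F'|' '$1 != prev {
    if ($1 in seen) print "reappears at line " NR ": " $1
    seen[$1] = 1
    prev = $1
}' sample_check.txt > reappearing_keys.txt

cat reappearing_keys.txt
```

If this prints nothing for the real file, the keys are contiguous. If it prints anything, sorting the file on the first field beforehand (or appending with >> instead of >) would probably be needed, because reopening a closed file with > truncates it.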


Last edited by Jotne; 11-18-2013 at 03:42 AM..
# 3  
Old 11-18-2013
Thanks, it's working fine. The records will always come in groups.

But there is another issue. If we have around 77k distinct values in the first column, it will create 77k files. I don't want to create that many files. I would like to combine them into three or four files at most, but records with the same first-column value must not be split across two files.
# 4  
Old 11-18-2013
We can use only part of the first field to create larger groups. If you show an example of how you want to group them, we can show you how it can be done, e.g. by the first two digits.
Here is an example using the first two digits:
Code:
awk -F'|' '{k=substr($1,1,2)} f!=k {close(f ".txt"); f=k} {print > (f ".txt")} END {close(f ".txt")}' file
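
Another way to cap the number of output files, sketched here under the assumption that the first column is always a plain integer (as in the sample data), is to take the key modulo the number of files wanted. Every record with the same key lands in the same bucket, so a set is never split. The variable n and the bucket_ and sample_buckets.txt names are made up for this illustration.

```shell
# Hypothetical sample input, taken from the data in post #1
cat > sample_buckets.txt <<'EOF'
123456|ASDF|WORD|MIND|456890|40050|RTS
123456|9UIL|WORD|BLINK|15G26|43215|GTS
123456|9UIL|WORD|BLINK|15G26|43215|BTS
125828|9UIH|WIRD|BLANK|15G26|45215|NTS
125828|9UIH|WIRD|BLANK|15G26|47215|PTS
145679|8UIH|BIRD|BLINK|15T26|90807|ZTS
EOF

# n = number of output files wanted (assumption: $1 is numeric).
# At most n files are ever open, so no close() is needed.
awk -F'|' -v n=3 '{ out = "bucket_" ($1 % n) ".txt"; print > out }' sample_buckets.txt
```

The buckets will not be the same size, and this does not enforce any record limit per file; it only guarantees a fixed maximum number of files with each key kept together.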


Last edited by Jotne; 11-18-2013 at 03:54 AM..
# 5  
Old 11-18-2013
For example:

Code:
123456|ASDF|WORD|MIND|456890|40050|RTS
123456|9UIL|WORD|BLINK|15G26|43215|GTS
123456|9UIL|WORD|BLINK|15G26|43215|BTS
125828|9UIH|WIRD|BLANK|15G26|45215|NTS
125828|9UIH|WIRD|BLANK|15G26|47215|PTS
145679|8UIH|BIRD|BLINK|15T26|90807|ZTS
123456|ASDF|WORD|MIND|456890|40050|RTS
123456|9UIL|WORD|BLINK|15G26|43215|GTS
123456|9UIL|WORD|BLINK|15G26|43215|BTS

All of the first and second sets can be combined into one file. We can keep combining records into one file until the file reaches 500000 records.

But we must take care of one thing: records from the same set must not be split across two files.

For example:

Code:
123456|ASDF|WORD|MIND|456890|40050|RTS
123456|9UIL|WORD|BLINK|15G26|43215|GTS
123456|9UIL|WORD|BLINK|15G26|43215|BTS

In the above sample data, putting the first two records into one file and the third into a different file must not happen anywhere.
# 6  
Old 11-18-2013
The code below first extracts the unique values of the first column and saves them to a file. It then reads those values back and, for each one, stores all matching records from the input file in a file named after the value.

Code:
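# List each distinct first-column value once (sort -u also covers non-adjacent repeats)
cut -d'|' -f1 input_file | sort -u > final
 
while IFS= read -r line
do
    # Anchor the pattern to the first field so the value is not matched elsewhere in a record
    grep "^${line}|" input_file > "${line}.txt"
done < final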

Once the above code is executed, it results in 3 files (for the example in the question):

123456.txt
125828.txt
145679.txt

The contents of the files are as below.

Code:
more 123456.txt
123456|ASDF|WORD|MIND|456890|40050|RTS
123456|9UIL|WORD|BLINK|15G26|43215|GTS
123456|9UIL|WORD|BLINK|15G26|43215|BTS
 
more 125828.txt
125828|9UIH|WIRD|BLANK|15G26|45215|NTS
125828|9UIH|WIRD|BLANK|15G26|47215|PTS
 
more 145679.txt
145679|8UIH|BIRD|BLINK|15T26|90807|ZTS

# 7  
Old 11-18-2013
I'm not sure exactly what you want, but this splits the file into files of 500000 records each (purely by line count, so a set of records can still straddle two files):
Code:
awk 'NR%500000==1 {if (out) close(out); out=sprintf("%06d.txt", ++a)} {print > out}' file
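
If sets with the same first column have to stay together, one possible variant is sketched below. It assumes each key arrives in one contiguous run (as stated in post #3) and that a single set fits in memory. It buffers the current set and, when the key changes, starts a new output file only if adding the set would push the current file past the limit. The names emit_group, max, chunk_ and sample_chunks.txt are made up for this sketch, and max is set to 4 here only so the small sample shows the behaviour; for the real file it would be 500000.

```shell
# Hypothetical sample input, taken from the data in post #1
cat > sample_chunks.txt <<'EOF'
123456|ASDF|WORD|MIND|456890|40050|RTS
123456|9UIL|WORD|BLINK|15G26|43215|GTS
123456|9UIL|WORD|BLINK|15G26|43215|BTS
125828|9UIH|WIRD|BLANK|15G26|45215|NTS
125828|9UIH|WIRD|BLANK|15G26|47215|PTS
145679|8UIH|BIRD|BLINK|15T26|90807|ZTS
EOF

awk -F'|' -v max=4 '
# Write the buffered set to the current file, or to a new one if it would not fit
function emit_group(   i) {
    if (n == 0) return
    if (out == "" || written + n > max) {
        if (out != "") close(out)
        out = sprintf("chunk_%03d.txt", ++part)
        written = 0
    }
    for (i = 1; i <= n; i++) print buf[i] > out
    written += n
    n = 0
}
$1 != key { emit_group(); key = $1 }   # key changed: flush the previous set
{ buf[++n] = $0 }                      # collect the current set
END { emit_group() }                   # flush the last set
' sample_chunks.txt
```

With the sample above, chunk_001.txt gets the three 123456 records, and chunk_002.txt gets the 125828 and 145679 records. A single set larger than max still goes into one file of its own, so that file can exceed the limit.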
