Split a file into parts only if the first field is different


 
# 1  
Old 08-12-2014

Hi, I have a file like this:

Code:
aaa 123
aaa 223
aaa 225
bbb 332
bbb 423
bbb 6755
bbb 324
ccc 112
ccc 234
ccc 897

I need to split it into several files, something like split -l 3, but in such a way that lines with the same name always end up in the same file:

Code:
File1

aaa 123
aaa 223
aaa 225

File2

bbb 332
bbb 423
bbb 6755
bbb 324

File3

ccc 112
ccc 234
ccc 897

Could you please help me with that?
# 2  
Old 08-12-2014
How many different files are we talking about? This may work, but with more than 20 or so outputs, awk may fail because it has too many files open at once:

Code:
awk '!($1 in A) { A[$1]="File" ++X } ; { print > A[$1] }' inputfile

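You can get around that by closing each file right after writing to it, but that's not terribly efficient, and not every awk has close() (use nawk on Solaris). Note the >> instead of >: once a file has been closed, reopening it with > would truncate it, so each write has to append. That also means any File1, File2, ... left over from an earlier run should be removed first.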

Code:
awk '!($1 in A) { A[$1]="File" ++X } ; { print >> A[$1] ; close(A[$1]) }' inputfile
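
If the input is already grouped by the first field, as the sample in the first post is, a middle ground may be to close a file only when the key changes. The following is a minimal sketch under that assumption, not a tested drop-in: the variable names prev, out and n are made up for illustration, and the printf line only recreates the sample data so the snippet can be run on its own.

```shell
# Recreate the sample input from the first post (already grouped by field 1).
printf '%s\n' 'aaa 123' 'aaa 223' 'aaa 225' \
    'bbb 332' 'bbb 423' 'bbb 6755' 'bbb 324' \
    'ccc 112' 'ccc 234' 'ccc 897' > inputfile

# Start a new output file whenever field 1 changes, closing the previous
# one first, so only a single output file is open at any time.
# ASSUMPTION: equal keys are on adjacent lines; otherwise a key that
# reappears later would get a second file.
awk '
    NR == 1 || $1 != prev {
        if (out != "") close(out)
        out = "File" ++n
        prev = $1
    }
    { print > out }
' inputfile
```

Because every output name is opened exactly once, plain > is enough here and no file is reopened after being closed. If the input is not grouped, sorting it on the first field beforehand would restore the assumption.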

# 3  
Old 08-13-2014
I might have worded it wrong, sorry for the confusion. I don't want a separate file for every distinct value of the first column. I need to split the file into, let's say, 10 files, while making sure that lines with the same first-column value never end up in different files. In my example, the output could be this:

Code:
File1
aaa 123
aaa 223
aaa 225
bbb 332
bbb 423
bbb 6755
bbb 324

File2
ccc 112
ccc 234
ccc 897

# 4  
Old 08-13-2014
Let's say you want to split a file into 10 parts, and all the lines have the same first field.
What is the expected output in that case?
# 5  
Old 08-13-2014
That is not the case here, but if it were, everything would go into the first (and only) file.
# 6  
Old 08-13-2014
Do 'extra' rows have to be in the first file, or can they be in the last one instead?

e.g.
Code:
File1
aaa 123
aaa 223
aaa 225

File2
bbb 332
bbb 423
bbb 6755
bbb 324
ccc 112
ccc 234
ccc 897

# 7  
Old 08-14-2014
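This one starts a new file only once at least L lines have been written and column 1 changes, so the extra lines stay in the current file (i.e. they go to the first file in your example).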
Code:
awk 'BEGIN {out="x"0} (n++>=L && $1!=p1) {n=1; out="x"++x} {print > out; p1=$1}' L=3 file
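
Since the request in post #3 was for a number of files rather than a number of lines, one way to get there may be to derive L from the line count first. This is only a sketch under stated assumptions: parts and total are made-up shell variable names, parts=2 is chosen to match the two-file example from post #3, the input is assumed to be grouped by column 1, and the printf line just recreates the sample data.

```shell
# Recreate the sample input from the first post.
printf '%s\n' 'aaa 123' 'aaa 223' 'aaa 225' \
    'bbb 332' 'bbb 423' 'bbb 6755' 'bbb 324' \
    'ccc 112' 'ccc 234' 'ccc 897' > file

parts=2                                  # hypothetical target number of output files
total=$(awk 'END { print NR }' file)     # number of input lines
L=$(( (total + parts - 1) / parts ))     # lines per part, rounded up

# Switch to the next file only when the current one already holds at
# least L lines AND column 1 changes, so a group is never split.
awk -v L="$L" '
    BEGIN { out = "x" 0 }
    n >= L && $1 != p1 { close(out); out = "x" ++x; n = 0 }
    { print > out; n++; p1 = $1 }
' file
```

With the sample data this gives L=5, so x0 receives the aaa and bbb lines and x1 receives the ccc lines. Because every file except the last holds at least L lines, the result is at most parts files, but it can be fewer when the groups are large.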

 