[Solved] intelligent splitting?


 
# 1  
Old 09-10-2012

Hi,

I want to split a file into multiple files, one new file for every line in the old file. Typically it is in this format:

Code:
0.25 20 35.7143
0.5 31 55.3571
0.85 3 5.35714
1.3 2 3.57143

I can make the new files with split or a simple awk command. But sometimes the file looks like this:
Code:
0.25 20 35.7143
0.5 31 55.3571
1.3 2 3.57143

or
Code:
0.25 20 35.7143

Even in these cases I want placeholder files to be created. So even for just one row, I want four files, three of which will be empty.

How do I do this? Thanks a lot.
# 2  
Old 09-10-2012
How many lines can you have?

And what about a file with 2 lines? Should 2 empty files be created as well?
# 3  
Old 09-10-2012
The script needs to know how many empty files it should create. That can be given by some kind of maximum value, or by empty lines in the original file (if there are any).

Here is an example with 5 given manually as the maximum number of files:
Code:
$ cat infile
0.25 20 35.7143
0.5 31 55.3571
1.3 2 3.57143
$ awk '{print $0 > "file_"NR} END{c=NR; while(c <= max){print "" > "file_"c; c++}}' max=5 infile
$ ls -la
total 36
drwxr-xrwx  3 root root 4096 10. Sep 10:41 .
drwxr-xr-x 11 root root 4096 30. Aug 10:39 ..
-rw-r--r--  1 root root   16 10. Sep 10:41 file_1
-rw-r--r--  1 root root   15 10. Sep 10:41 file_2
-rw-r--r--  1 root root   15 10. Sep 10:41 file_3
-rw-r--r--  1 root root    1 10. Sep 10:41 file_4
-rw-r--r--  1 root root    1 10. Sep 10:41 file_5
-rw-r--r--  1 root root   45 10. Sep 10:30 infile
-rwx------  1 root root    0  6. Sep 15:19 mach.ksh
drwxr-xr-x  2 root root 4096  6. Aug 09:21 tmp
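
A note on the command above: the loop starts at c=NR, so the first "placeholder" line is appended to the last data file (file_3 is 15 bytes in the listing, one more than its 14-byte input line), and each placeholder holds a newline rather than being empty. A minimal sketch of a variant that starts at NR+1 and creates truly empty files follows. The scratch directory and the sample data are assumptions for the demo, and it relies on awk creating the file when `printf ""` opens the redirection.

```shell
# Hypothetical scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input with three of the possible lines present.
printf '%s\n' '0.25 20 35.7143' '0.5 31 55.3571' '1.3 2 3.57143' > infile

# Data lines go to file_1..file_NR; placeholders start at NR+1.
# printf "" opens (and so creates) the file without writing anything.
awk '{ out = "file_" NR; print > out; close(out) }
     END {
         for (c = NR + 1; c <= max; c++) {
             out = "file_" c; printf "" > out; close(out)
         }
     }' max=5 infile

# Show the byte count of every generated file.
wc -c file_*
```

With this variant file_4 and file_5 should be zero bytes long, and file_3 should hold only its own data line.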

# 4  
Old 09-10-2012
Hi,
Thanks so much for the response. To make things simpler: my base file will have one of these 4 values in the first column: 0.25, 0.5, 0.85 and 1.3. If any of these exist in the file, then write just that line to a new file. Whenever an entry does not exist, just write the value (one of those 4) and leave the 2nd column blank.

---------- Post updated at 11:16 AM ---------- Previous update was at 11:15 AM ----------

@vbe I can have a maximum of 4 lines. If there are only 2, I still want to create 4 files, with placeholders in 2 of them.
# 5  
Old 09-10-2012
Like this?
Code:
awk 'BEGIN{written["0.25"]=written["0.5"]=written["0.85"]=written["1.3"]="N"}
{print >> ("file" $1);close("file" $1);if(written[$1]=="N") written[$1]="Y"}
END{for(i in written) if(written[i]=="N"){print i > ("file" i);close("file" i)}}' file

And yes, I am doing the open-close juggling so that only 1 file remains open at a time.
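
As a rough illustration of what the command above produces when some values are missing, the following sketch runs it on a two-line input. The scratch directory, the input name infile, and the sample data are assumptions for the demo. The awk program is the one above, with the output names parenthesized.

```shell
# Hypothetical scratch directory; a fresh one matters because >> appends.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Only two of the four expected first-column values are present.
printf '%s\n' '0.25 20 35.7143' '1.3 2 3.57143' > infile

awk 'BEGIN{written["0.25"]=written["0.5"]=written["0.85"]=written["1.3"]="N"}
{print >> ("file" $1);close("file" $1);if(written[$1]=="N") written[$1]="Y"}
END{for(i in written) if(written[i]=="N"){print i > ("file" i);close("file" i)}}' infile

# Print each result file next to its content.
for f in file0.25 file0.5 file0.85 file1.3; do
    printf '%s: %s\n' "$f" "$(cat "$f")"
done
# file0.25: 0.25 20 35.7143
# file0.5: 0.5
# file0.85: 0.85
# file1.3: 1.3 2 3.57143
```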

Last edited by elixir_sinari; 09-10-2012 at 07:02 AM..
# 6  
Old 09-10-2012
Wow, thanks so much. It works as I want. A few questions on this:
Why do you specifically close the file every time? I have never done so with awk.

Also, I am using this as part of a bigger script. The segment will end up looking something like this:

Code:
for k in 1 2 3 4 5 6 7 8 9 10
cat time$k.txt | awk 'BEGIN{written["0.25"]=written["0.5"]=written["0.85"]=written["1.3"]="N"}
{print >> "file"$k$1;close("file"$k$1);if(written[$1]=="N") written[$1]="Y"}
END{for(i in written) if(written[i]=="N"){print i > "file"$k$1;close("file"$k$1)}}'
done

The file number is all over the place. Could you help me a bit with this?

Thanks a lot! You've been great!
# 7  
Old 09-10-2012
That is because awk has some limitations on the number of files that can be open at one time. But that is implementation (and sometimes system) specific.

In ksh93, I could write that code snippet as:
Code:
awk 'function fill(  i){for(i in written){if(written[i]=="N"){print i > ("file" k i);close("file" k i)}
  written[i]="N"}}
BEGIN{written["0.25"]=written["0.5"]=written["0.85"]=written["1.3"]="N"}
FNR==1{if(NR>1) fill();k=FILENAME;gsub(/^.{4}|\..*/,"",k)}
{print >> ("file" k $1);close("file" k $1);if($1 in written) written[$1]="Y"}
END{fill()}' time[1-9]?(0).txt

If you are using bash, you could turn extglob on (shopt -s extglob) before running this.
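
On the loop from post #6: inside the single quotes, `$k` is not the shell variable. awk reads it as field number k, and since k is unset there, that is `$0`, the whole line, which is why the file names came out scrambled. Also, awk runs END only once, after the last input file, so a single call over many files needs extra per-file bookkeeping. A simpler sketch, closer to the original loop, runs awk once per file and passes k in with -v. The scratch directory and the two sample inputs are assumptions for the demo.

```shell
# Hypothetical scratch directory and sample inputs named like time$k.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '0.25 20 35.7143' '1.3 2 3.57143' > time1.txt
printf '%s\n' '0.5 31 55.3571' > time2.txt

for k in 1 2; do
    # -v k="$k" hands the shell value to awk; inside the program k is an awk variable.
    awk -v k="$k" '
        BEGIN { written["0.25"] = written["0.5"] = written["0.85"] = written["1.3"] = "N" }
        { out = "file" k $1; print >> out; close(out); if ($1 in written) written[$1] = "Y" }
        END {
            # One placeholder per expected value that this input did not contain.
            for (i in written)
                if (written[i] == "N") { out = "file" k i; print i > out; close(out) }
        }' "time$k.txt"
done

ls
```

Each pass starts with a fresh `written` array, so the placeholders are worked out per input file. For the real script the loop list would be 1 through 10, as in post #6.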