Split file by data group


 
# 1  
Old 11-17-2009

Hi all,

I'm having a little trouble solving a file split I need to get done.

I have the following data:

Code:
1. Light
1A. Light Soft
texture: it's soft
color: the color value is that of something light
vital statistics: srm: 23 og: 1.035 sp: 1.065
comment: this is nice if you like light colored soft things

1B. Medium
texture: it's soft-ish
color: the color is similar to light soft but because of the difference in texture it appears slightly darker to the naked eye
vital statistics: srm: 30 og: 1.020 sp: 1.070
comment: this is nice if you like light colored soft-ish things

2. German Stlye
2A. Blue Jeans
texture: jeany
color: blue stoopid
vital statistics: srm: 22 og: 1.045 sp: 1.211
comment: german blue jeans are really different that other blue jeans.

3. Last Example Category
3A. This is stoopid
texture: huh?
color: umm... white and black?
vital statistics: srm: 11 og: 1.222 sp: 1.222
comment: ugg

Desired Output

Code:
Light.dat

category|style|texture|color|srm|og|sp|comment
Light|Light Soft|it's soft|the color value is that of something light|23|1.035|1.065|this is nice if you like light colored soft things
Light|Medium|it's soft-ish|the color is similar to light soft but because of the difference in texture it appears slightly darker to the naked eye|30|1.020|1.070|this is nice if you like light colored soft-ish things

German Style.dat

category|style|texture|color|srm|og|sp|comment
German Style|Blue Jeans|jeany|blue stoopid|22|1.045|1.211|german blue jeans are really different that other blue jeans.

Last Example Category.dat

category|style|texture|color|srm|og|sp|comment
Last Example Category|This is stoopid|huh?|umm... white and black?|11|1.222|1.222|ugg

I realize this is a tall order, and I really appreciate any help you can offer. Maybe it's just because it's late at night, but I've got nothing on this one.

I'd prefer a script in bash or awk.


THANKS!
# 2  
Old 11-17-2009
Not finished, but this may give you some help.

Code:
$ sed 's/^[0-9][A-Z]\. \|texture: \|^color: \|^vital statistics: srm: \| og: \| sp: \|^comment: /|/g' urfile
1. Light
|Light Soft
|it's soft
|the color value is that of something light
|23|1.035|1.065
|this is nice if you like light colored soft things

|Medium
|it's soft-ish
|the color is similar to light soft but because of the difference in texture it appears slightly darker to the naked eye
|30|1.020|1.070
|this is nice if you like light colored soft-ish things

2. German Stlye
|Blue Jeans
|jeany
|blue stoopid
|22|1.045|1.211
|german blue jeans are really different that other blue jeans.

3. Last Example Category
|This is stoopid
|huh?
|umm... white and black?
|11|1.222|1.222
|ugg
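
One possible way to finish this off, as a rough sketch only: feed the sed output into awk and glue the "|"-prefixed fragments onto the category name, one row per style. The helper name join_rows is made up for illustration, and the header line and the split into one file per category are still left to do.

```shell
# Sketch: glue the "|"-prefixed fragments from the sed step into one row per style.
# join_rows is a hypothetical helper name, not something from the thread.
join_rows() {
    awk '
        /^[0-9]+\. / { sub(/^[0-9]+\. /, ""); cat = $0; next }   # "1. Light" -> category name
        /^\|/        { row = row $0; next }                      # append a marked fragment
        /^$/         { if (row != "") print cat row; row = "" }  # blank line ends a style block
        END          { if (row != "") print cat row }            # flush the last block
    '
}

# Demo on a fragment of the sed output shown above.
join_rows <<'EOF'
1. Light
|Light Soft
|it's soft
|the color value is that of something light
|23|1.035|1.065
|this is nice if you like light colored soft things

|Medium
|it's soft-ish
EOF
```

In real use the sed command would be piped straight into join_rows instead of the here-document.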

# 3  
Old 11-17-2009

OP, based on your data sample and required output:
Code:
# cat awk.script
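BEGIN {
        x = "|"                                               # output field separator
        y = ".dat"                                            # output file suffix
        z = "category|style|texture|color|srm|og|sp|comment"  # header line
}
int($1) "." == $1 {             # category line, e.g. "1. Light"
        if (r) {                # flush the rows of the previous category
                print r > (n y)
                r = ""
        }
        sub($1 FS, "")          # strip the leading "1. "
        n = $0                  # category name, also the output file name
}
$1 ~ /[A-Z]\./ {                # style line, e.g. "1A. Light Soft"
        sub($1 FS, "")
        r = r ? r ORS n x $0 : z ORS n x $0   # first row of a file is preceded by the header
}
/vital/ {                       # reduce "vital statistics: srm: 23 ..." to "23|1.035|1.065"
        gsub(/[a-z]| /, "")
        gsub("::", "")
        gsub(":", "|")
        r = r x $0
}
/^[a-z]/ {                      # texture, color, comment: keep the text after ": "
        r = r x substr($0, match($0, ":") + 2)
}
END {
        if (r) print r > (n y)  # flush the last category
}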
# awk -f awk.script sample_file
# cat Light.dat
category|style|texture|color|srm|og|sp|comment
Light|Light Soft|it's soft|the color value is that of something light|23|1.035|1.065|this is nice if you like light colored soft things
Light|Medium|it's soft-ish|the color is similar to light soft but because of the difference in texture it appears slightly darker to the naked eye|30|1.020|1.070|this is nice if you like light colored soft-ish things
# cat German\ Stlye.dat
category|style|texture|color|srm|og|sp|comment
German Stlye|Blue Jeans|jeany|blue stoopid|22|1.045|1.211|german blue jeans are really different that other blue jeans.
# cat Last\ Example\ Category.dat
category|style|texture|color|srm|og|sp|comment
Last Example Category|This is stoopid|huh?|umm... white and black?|11|1.222|1.222|ugg

That was too much to fit in one line.

PS. Use gawk, nawk or /usr/xpg4/bin/awk on Solaris.
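
For anyone puzzling over the first pattern in the script: int($1)"."==$1 is true only when the first field is a bare number followed by a dot, so "1." passes and "1A." does not. The sketch below just labels each input line with the rule that would pick it up; classify is a name invented for this illustration and the function writes no files.

```shell
# classify is a hypothetical helper; it only labels lines, it writes no files.
classify() {
    awk '{
        if (int($1) "." == $1)    kind = "category"    # "1."  -> int gives 1, "1." == "1."
        else if ($1 ~ /[A-Z]\./)  kind = "style"       # "1A." -> int gives 1, "1." != "1A."
        else                      kind = "attribute"   # "texture:", "color:", ...
        print kind ": " $0
    }'
}

printf '%s\n' '1. Light' '1A. Light Soft' 'texture: jeany' | classify
```

Note that this relies on category names starting with an uppercase letter, as they do in the sample data.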
# 4  
Old 11-17-2009
Quote:
Originally Posted by danmero
That was too much to fit in one line.
Definitely, thank you very much for your help with this, much appreciated!

Thanks!
# 5  
Old 11-17-2009
Quote:
Originally Posted by scottn
I lived to see danmero use whitespace -- I'm gonna have a T-shirt made!
I always like to answer with one-liners, but for this problem the one-liner became too long for my screen, so I took the opportunity to add the much-wanted whitespace.

Quote:
Originally Posted by mkastin
Incredible help with a complex problem, not just some little one-liner.
Can you please confirm that it works as expected on the real data file, not just on the sample data?
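
A quick sanity check for the real data, as a sketch: every row in the generated files should have exactly 8 pipe-separated fields, so any row with a different count points at a line the script did not parse as intended. check_rows is a made-up helper name, and the demo file uses shortened, invented values.

```shell
# check_rows is a hypothetical helper: it reports rows whose field count is not 8.
check_rows() {
    awk -F'|' 'NF != 8 { printf "%s:%d: %d fields\n", FILENAME, FNR, NF }' "$@"
}

# Demo with a throwaway file: one good row and one row with a missing "|".
tmp=$(mktemp -d)
cat > "$tmp/Light.dat" <<'EOF'
category|style|texture|color|srm|og|sp|comment
Light|Light Soft|soft|pale|23|1.035|1.065|nice
Light|Medium|soft-ishpale|30|1.020|1.070|nice
EOF
check_rows "$tmp/Light.dat"
```

On the real output it could be run as check_rows *.dat. A literal "|" inside a comment or color value would also change the count, so a reported row is something to look at, not necessarily a bug.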


PS. Thank you both for the credit and appreciation.
# 6  
Old 11-17-2009
Quote:
Originally Posted by danmero
PS. Thank you both for the credit and appreciation.
I will be able to confirm this evening; I have to make a few modifications to the script and won't have access to my servers until later today.
# 7  
Old 11-17-2009
Pfeww, that was fun. Not as nice as a single script, but I liked trying another approach.
Code:
sed -r     '/vital/{s/\w+ \w+://;s/(\w+: \w+)/\n\1/g};' infile                  |
awk -F': ' 'function prl() { if(s!=""){print s>(f".dat")}; sub(/[^ ]* /,"")     }
            BEGIN          { h="category|style|texture|color|srm|og|sp|comment" }
            /^[0-9]+\./    { prl(); f=$1; print h>(f".dat"); s=""               }
            /^[0-9]+[A-Z]/ { prl(); s=f"|"$1                                    }
            /:/            { sub(/ +$/,"",$2); s=s"|"$2                         }
            END            { prl()                                              }'

-or-
Code:
sed        '/vital/{s/\w\+ \w\+://;s/\(\w\+: \w\+\)/\n\1/g};' infile            |

if your sed does not support -r
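
If neither sed variant works for you (as far as I know, \w and \n in the replacement are GNU extensions), the vital statistics line can also be picked apart in awk alone. This is only a sketch of that one step, not a drop-in replacement for the pipeline; vitals is a made-up helper name.

```shell
# vitals is a hypothetical helper: print the values that follow srm:, og: and sp:.
vitals() {
    awk '/^vital/ {
        out = ""
        for (i = 1; i < NF; i++)
            if ($i ~ /^(srm|og|sp):$/)                     # label field, value is the next field
                out = out (out == "" ? "" : "|") $(i + 1)
        print out
    }'
}

echo 'vital statistics: srm: 23 og: 1.035 sp: 1.065' | vitals
```

Because it looks for the labels rather than fixed positions, it does not care how "statistics" is spelled on the line.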

---------- Post updated at 23:21 ---------- Previous update was at 22:34 ----------

They will ask me where I got all those bits and who my accomplice is, you know.

Last edited by Scrutinizer; 11-17-2009 at 05:40 PM..