Shrinking a file


 
# 1  
Old 11-07-2003
Shrinking a file

Hi All

I have a somewhat complex requirement.
I have a file containing about 1 million records.
The records in the file are of fixed length.
Every record begins with 03, 04, 05 or 06.

A 03 record is a parent record.
04, 05 and 06 are child records.
Every 03 record can have zero, one or more child records.

Positions 12:16 of the 03 record indicate the TYPE. Every record has a TYPE. There are about 10 unique TYPEs in the whole file.


Now my requirement is this:
If the number of 03 records for a particular TYPE is less than 100, then ALL the 03 records for that TYPE, along with their CHILD records, should be written to a new file.

If the number of 03 records for a particular TYPE is more than 100, then ONLY 2% of the 03 records (selected randomly) for that TYPE, along with their corresponding CHILD records, should be written to a new file.


Please note that all the records need to go to a single file.

Is it somehow possible to achieve this using shell scripts?

Thanks in advance
Ashwin N.



# 2  
Old 11-07-2003
If I had to do this in a shell script I would use the awk command.
# 3  
Old 11-10-2003
Re: Shrinking a file

Yes, but how do I pick lines from the file at random?
I have worked on a few simple shell scripts
but have never really used sed or awk.

Thanks
Ashwin N.
# 4  
Old 11-10-2003
The awk man page on HP-UX 11.00 documents random-number functions that can be used within an awk script.

Run the command 'man awk' to get a description of what awk can do.
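
As a rough illustration of those functions, here is a minimal sketch that keeps each input line with a given percentage chance. The function name sample_lines and its arguments are made up for this example; only srand() and rand() come from awk itself. Note that srand() with no argument seeds from the clock, so two runs within the same second may pick the same lines.

```shell
# sample_lines PCT FILE: print each line of FILE with probability PCT/100.
# (Hypothetical helper name; not part of awk or any standard tool.)
sample_lines() {
    # srand() seeds the generator; rand() returns a number in [0, 1).
    awk -v pct="$1" 'BEGIN { srand() } rand() * 100 < pct' "$2"
}

# Example: sample_lines 2 data_file   (keeps roughly 2% of the lines)
```

This picks lines independently, so the result is about 2% of the input rather than exactly 2%.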
# 5  
Old 11-10-2003
I think that I would make two passes over the data file. The first finds all types that occur more than 100 times...
Code:
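awk '
  # Count the 03 records per type (columns 12-16).
  /^03/ { a[substr($0,12,5)]++ }
  # Print every type that has more than 100 parent records.
  END { for(i in a) if(a[i]>100) print i }
  ' data_file > tmp_file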

And then a second pass to filter out data records...
Code:
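awk '
  # Load the frequent types found by the first pass.
  BEGIN { srand(); while ((getline t < "tmp_file") > 0) big[t] = 1 }
  # Decide once per 03 record: keep it if its type is not frequent,
  # or with a 2% chance if it is. Child records inherit the decision.
  /^03/ { keep = !(substr($0,12,5) in big) || rand()*100 < 2 }
  keep
  ' data_file > new_data_file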

Hope that this will get you programming in awk.
# 6  
Old 11-10-2003
This can be done in most languages. I would go with C, mostly because of the file size. If not C, then I would use ksh. The choice of language really is just a personal choice.

This task is tough enough that it will require a real programmer. You say that you have worked on a few simple shell scripts. Unless you have several years of programming experience in some procedural language it's unlikely that you will succeed.

To answer your question, you need to make a pass through the file, creating a list of the 03 records for each type. Then you need to examine the lists. If a list has fewer than 100 items, all items on the list will be marked ok. If the list has over 100 items, only some will be marked ok.

Since you counted the elements, you can compute 2% of that number. Now you know how many to mark ok.

To mark one ok, generate a random integer between 1 and n, where n is the number of non-ok elements in the list. Now scan the list from element 1 to the end and find the selected element. Mark it ok. Subtract one from your number of non-ok elements on the list. Generate a new random integer between 1 and the current number of non-ok elements. Find that one. And so on until you have marked the required number ok.

After you have done this with each list, go back to the main file. For each 03 record, see if it's marked ok on the lists. If so, write it and its child records to the final output file.
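
For what it's worth, the same result can be sketched in awk without building the lists. This is not the mark-and-scan procedure described above: it swaps in selection sampling (Knuth's Algorithm S), which walks the 03 records in order and keeps each one with probability (still needed) / (still left), so exactly the required number is picked per type. The function name sample_by_type is made up, and the sketch assumes that a type with exactly 100 parents is kept whole and that 2% is rounded down.

```shell
# sample_by_type FILE: write the reduced record set to standard output.
# Assumptions (not from the thread): a type with exactly 100 parent
# records is kept whole, and 2% is rounded down, computed as n / 50.
sample_by_type() {
    awk '
        BEGIN { srand() }
        # Pass 1 (first copy of the file): count 03 records per type.
        NR == FNR { if (/^03/) total[substr($0, 12, 5)]++; next }
        # Pass 2: at each 03 record, decide whether the whole group is kept.
        /^03/ {
            t = substr($0, 12, 5)
            if (!(t in left)) {       # first 03 record of this type in pass 2
                left[t] = total[t]    # parents of this type not yet examined
                need[t] = (total[t] > 100) ? int(total[t] / 50) : total[t]
            }
            # Selection sampling: keep with probability need/left.
            keep = (rand() * left[t] < need[t])
            if (keep) need[t]--
            left[t]--
        }
        keep    # print the 03 record and the child records that follow it
    ' "$1" "$1"
}

# Example: sample_by_type data_file > new_data_file
```

The file name is passed twice so that awk reads it once to count and once to select. For small types need equals left from the start, so every record passes the test and the whole type is written out.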