Shell Programming and Scripting: Post 303027062 by LMHmedchem on Friday 7th of December 2018 11:03:58 PM
Splitting a text file into smaller files with awk, how to create a different name for each new file

Hello,

I have some large text files that look like this:
Code:
putrescine
  Mrv1583 01041713302D          

  6  5  0  0  0  0            999 V2000
    2.0928   -0.2063    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    5.6650    0.2063    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.5217   -0.2063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2361    0.2063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8072    0.2063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.9504   -0.2063    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  5  1  0  0  0  0
  2  6  1  0  0  0  0
  3  4  1  0  0  0  0
  3  5  1  0  0  0  0
  4  6  1  0  0  0  0
M  END
> <num>
1

> <name>
putrescine

$$$$
bis(hexamethylene)triamine.mol
  Mrv1583 01041713302D          

 15 14  0  0  0  0            999 V2000
    6.4898    1.0450    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    7.2042    1.4575    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.9187    1.0450    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.6332    1.4575    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.3477    1.0450    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.0621    1.4575    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.7766    1.0450    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.4911    1.4575    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   12.2055    1.0450    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.9200    1.4575    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   13.6345    1.0450    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   14.3490    1.4575    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.0634    1.0450    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.7779    1.4575    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   16.4924    1.0450    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  4  5  1  0  0  0  0
  5  6  1  0  0  0  0
  6  7  1  0  0  0  0
  7  8  1  0  0  0  0
  8  9  1  0  0  0  0
  9 10  1  0  0  0  0
 10 11  1  0  0  0  0
 11 12  1  0  0  0  0
 12 13  1  0  0  0  0
 13 14  1  0  0  0  0
 14 15  1  0  0  0  0
M  END
> <num>
2

> <name>
bis(hexamethylene)triamine

$$$$

There can be thousands of records, and records have no fixed length: the number of lines and the number of tag fields between the "M  END" line and the $$$$ terminator both vary. Each record ends with the $$$$ terminator. I am trying to divide large files into a number of smaller files, each with the same number of records.

The script below attempts to do this:
Code:
#! /bin/sh

# input file name
input_file=${1:-input.txt}
# output file name
output_file=${2:-output.txt}
# number of compounds per sdf file
split_number=${3:-6}

# "split" is an awk built-in function, so the limit is passed in as "n";
# MOLS and n are awk variables and take no leading "$"
awk -v n="$split_number" ' { OUT[++CNT] = $0 }
             $0 == "$$$$" { ++MOLS }
                MOLS == n { for (i = 1; i <= CNT; i++) print OUT[i]; delete OUT; CNT = 0; MOLS = 0 }
                      END { for (i = 1; i <= CNT; i++) print OUT[i] }
                          ' "$input_file" > "$output_file"

It stores rows in OUT[] until a counter reaches the desired number of records per subfile, then prints the rows in order, clears the array, and resets the counters. The END block prints whatever is still held if EOF is reached before the counter reaches the set number.

The obvious problem is that there is no way to change the output file name for each subsequent write, so everything ends up in a single output file. I thought about changing the value of $output_file from the awk code, but awk runs as a separate process from the shell, so I don't think that will work.
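
One possible direction, as a rough sketch rather than a tested solution for these files: awk can redirect print to a file whose name is held in an awk variable, so the name can be rebuilt inside awk each time the record counter fills up, and the shell variable never has to change. The function name split_records, the prefix argument, and the _001.sdf style of numbering are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: split_records, the prefix argument and the _NNN.sdf naming
# scheme are illustrative assumptions, not part of the original script.
split_records() {
    input_file=$1        # file of records, each ending in a $$$$ line
    prefix=$2            # output names become <prefix>_001.sdf, <prefix>_002.sdf, ...
    per_file=${3:-6}     # records per output file

    awk -v n="$per_file" -v prefix="$prefix" '
        # pick the next output name only when a line is about to be written,
        # so no empty trailing file is created
        out == ""    { out = sprintf("%s_%03d.sdf", prefix, ++file_no) }
                     { print > out }
        # after n complete records, close the current file and start a new one
        $0 == "$$$$" { if (++mols == n + 0) { close(out); out = ""; mols = 0 } }
    ' "$input_file"
}
```

Called as split_records input.txt output 6, this should write output_001.sdf, output_002.sdf, and so on, with any leftover records going to the last file. Closing each file before moving on keeps the number of open files at one, which matters when there are thousands of records.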

If I could run the awk only on specific lines of the file, I think I could call awk from a bash loop and make that work, but I am guessing there is an easier way. I am running this in 32-bit Cygwin, so I have everything available from that kit.
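
For comparison, the loop idea could look roughly like the sketch below: one awk call prints only a given range of records, and a shell loop calls it once per output file with a new name each time. The names extract_records and split_with_loop and the _001.sdf numbering are hypothetical, and the sketch assumes every record is terminated by a $$$$ line (anything after the last $$$$ would be dropped).

```shell
#!/bin/sh
# Sketch only: extract_records, split_with_loop and the _NNN.sdf naming
# scheme are illustrative assumptions, not part of the original script.

# print records number first..last (1-based) of a $$$$-terminated file
extract_records() {
    awk -v first="$2" -v last="$3" '
        BEGIN            { rec = 1 }
        rec > last + 0   { exit }
        rec >= first + 0 { print }
        $0 == "$$$$"     { rec++ }
    ' "$1"
}

split_with_loop() {
    input_file=$1; prefix=$2; per_file=${3:-6}
    # count the complete records so the loop knows when to stop
    total=$(awk '$0 == "$$$$" { n++ } END { print n + 0 }' "$input_file")
    first=1; file_no=1
    while [ "$first" -le "$total" ]; do
        last=$((first + per_file - 1))
        extract_records "$input_file" "$first" "$last" \
            > "$(printf '%s_%03d.sdf' "$prefix" "$file_no")"
        first=$((last + 1))
        file_no=$((file_no + 1))
    done
}
```

This re-reads the input once per output file, so on a file with thousands of records a single awk pass that chooses its own output names is likely to be much cheaper.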

Suggestions would be appreciated.

LMHmedchem
