Splitting text file to several other files using sed.


 
# 1  
Old 03-13-2008

I'm trying to figure out how to do this efficiently, with as little execution time as possible, and I'm pretty sure sed is the best tool for it. However, I'm new to sed, and none of the reading and examples I've found show a similar exercise:

I have a long text file (I'll call it all_files.txt) listing all the files on the system, each line showing the checksum, permissions, date, and file name with full path. For example:

683706D9 104775 Sep 27 12:00:04 1999 /bin/Audio
4C799E06 100775 Nov 14 17:33:11 1997 /bin/Blkfsys
C851669A 104775 Oct 04 14:08:38 1996 /bin/Dev16
CA4B42E7 100775 Nov 21 11:58:06 1996 /bin/Dev16.ansi
FF4396D0 100775 Oct 04 14:06:03 1996 /bin/Dev16.par

Some of these files are assigned to categories. Each category has its own text file listing the files that belong to it. For example, the file catA.dat might contain the following:

/bin/Dev16.par
/bin/some_other_file
/home/stuff/another_file

Similar lists would exist for catB.dat and catC.dat.

What should happen is that every line in the original file that belongs to a category is deleted from the original file and copied to a new file for that category (catA_list, catB_list, etc.). In the end, only the files not assigned to any category are left in all_files.txt.

Is there an easy way to do this? I've figured out how to use sed to delete lines, but to output them to 3 different files based on matches from reference text files is confusing me. Any ideas would be greatly appreciated!!
# 2  
Old 03-13-2008
The following example demonstrates how to write results out to 3 different files:

Code:
#!/usr/bin/ksh

# Temporary input file, made unique with the shell's PID
tmp=file.$$

cat <<EOT >"$tmp"
683706D9 104775 Sep 27 12:00:04 1999 /bin/Audio
4C799E06 100775 Nov 14 17:33:11 1997 /bin/Blkfsys
C851669A 104775 Oct 04 14:08:38 1996 /bin/Dev16
CA4B42E7 100775 Nov 21 11:58:06 1996 /bin/Dev16.ansi
FF4396D0 100775 Oct 04 14:06:03 1996 /bin/Dev16.par
EOT

# -n suppresses the default output; each w command writes the lines
# matching its address to a separate file
sed -n -e '/^68/w ./out1' -e '/^C8/w ./out2' -e '/^CA/w ./out3' "$tmp"

rm "$tmp"
exit 0

# 3  
Old 03-13-2008
Thanks! That helps, although I'm seeing other complications here. For one, I won't know in advance what I'm searching for, since the patterns will come in as input from other files (catA.dat, catB.dat, catC.dat). I was initially thinking I could use a loop to read each file name from a category list, working on one category at a time:

while read AFILE
do
sed -n -e '\|"$AFILE"$| {
w /catA_list
d
}' <all_files.txt>tmpfile
done<catA.dat


However, 2 more loops would be needed for the Category B and Category C files, so sed would essentially be reading through the whole file many times. I'm not even sure this would work correctly. I'm thinking there has to be a more efficient way to do this.
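
One way to avoid the repeated passes might be to let awk (swapped in here for sed) load all the category lists first and then route every line of all_files.txt in a single pass. The following is only a sketch under stated assumptions: paths contain no whitespace, so the path is the last field of each line; the contents of catB.dat and catC.dat are made up for illustration; and remaining.txt is a hypothetical scratch file name.

```shell
#!/bin/sh
# Sketch: split all_files.txt by category in one awk pass.
# Assumption: no path contains whitespace, so the path is the last field ($NF).

# Work in a scratch directory with sample data so the sketch is self-contained.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat <<'EOT' > all_files.txt
683706D9 104775 Sep 27 12:00:04 1999 /bin/Audio
4C799E06 100775 Nov 14 17:33:11 1997 /bin/Blkfsys
C851669A 104775 Oct 04 14:08:38 1996 /bin/Dev16
CA4B42E7 100775 Nov 21 11:58:06 1996 /bin/Dev16.ansi
FF4396D0 100775 Oct 04 14:06:03 1996 /bin/Dev16.par
EOT
printf '%s\n' /bin/Dev16.par /bin/some_other_file > catA.dat
printf '%s\n' /bin/Audio /bin/Dev16 > catB.dat       # illustrative contents
printf '%s\n' /bin/Blkfsys > catC.dat                # illustrative contents

: > remaining.txt    # make sure the file exists even if every line is categorized

awk -v main="all_files.txt" '
    # Category lists come first: remember which output file owns each path.
    FILENAME != main {
        out = FILENAME
        sub(/\.dat$/, "_list", out)     # catA.dat -> catA_list
        owner[$0] = out
        next
    }
    # Main file: a categorized line goes to its category file ...
    ($NF in owner) { print > (owner[$NF]); next }
    # ... and everything else is kept.
    { print > "remaining.txt" }
' catA.dat catB.dat catC.dat all_files.txt

# Replace the original so only uncategorized lines are left in it.
mv remaining.txt all_files.txt
```

Because the lookup is an exact string comparison on the path, /bin/Dev16 does not accidentally match /bin/Dev16.ansi, and no regular-expression escaping is needed.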
# 4  
Old 03-14-2008
OK, I've figured out a solution that mostly works. However, I can't get it to work when passing a variable into the sed regular expression. This is basically what I have for extracting the lines for every file in category A:

while read AFILE
do
sed -e '\|'"$AFILE"'$|{
w /tmp/catA_list
d
}' </tmp/all_files>/tmp/non_AFiles

done<catA.dat


The regular expression works fine if I substitute a specific file name. But when the name is passed in through the variable like this, only one of the files in the catA list is found and not the others. Is there a better way to pass the variable here?

Also, the file name includes the full path, which is why '/' is not being used as the delimiter for the expression.
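
A likely explanation for the symptom above: every pass of the loop starts a fresh sed on the unmodified /tmp/all_files, and sed truncates the w target while the shell truncates the > target on each run, so only the result for the last path read from catA.dat survives. A minimal sed-only sketch of one way around this is to turn the category lists into sed scripts and run sed once. It assumes the .dat files contain no blank lines; write.sed, delete.sed, and remaining.txt are hypothetical names, and the contents of catB.dat and catC.dat are invented for illustration.

```shell
#!/bin/sh
# Sketch: generate sed scripts from the category lists, then run sed once.

# Work in a scratch directory with sample data so the sketch is self-contained.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat <<'EOT' > all_files.txt
683706D9 104775 Sep 27 12:00:04 1999 /bin/Audio
4C799E06 100775 Nov 14 17:33:11 1997 /bin/Blkfsys
C851669A 104775 Oct 04 14:08:38 1996 /bin/Dev16
CA4B42E7 100775 Nov 21 11:58:06 1996 /bin/Dev16.ansi
FF4396D0 100775 Oct 04 14:06:03 1996 /bin/Dev16.par
EOT
printf '%s\n' /bin/Dev16.par /bin/some_other_file > catA.dat
printf '%s\n' /bin/Audio /bin/Dev16 > catB.dat       # illustrative contents
printf '%s\n' /bin/Blkfsys > catC.dat                # illustrative contents

# Step 1: for every path, emit "/ <escaped path>$/w catX_list".
# The first expression backslash-escapes regex metacharacters and '/',
# the second wraps the result in an address plus a w command.
# The leading space and trailing $ anchor the match to the whole path.
for c in catA catB catC; do
    sed -e 's,[][\.*^$/],\\&,g' -e "s,.*,/ &\$/w ${c}_list," "$c.dat"
done > write.sed

# Step 2: the same addresses again, this time with a d command.
sed -e 's,[][\.*^$/],\\&,g' -e 's,.*,/ &$/d,' \
    catA.dat catB.dat catC.dat > delete.sed

# Step 3: one pass over the data. Matching lines are written to their
# category file and then deleted; all other lines fall through to stdout.
sed -f write.sed -f delete.sed all_files.txt > remaining.txt
mv remaining.txt all_files.txt
```

Each input line is still tested against every generated address, so with very long category lists this approach may be slow even though the data file is read only once.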