Extract text and store in separate files


 
# 1  
Old 09-25-2011

Hi,

I have a file which looks like this:

Code:
.I 1
some text
.A
this is the first line
.I 2
some text again
.B 
this is the second line
.I 3
again some text
.C 
this is the third line

I want to have my output like this in separate files:

1.txt
Code:
.I 1
some text
.A
this is the first line

2.txt
Code:
.I 2
some text again
.B 
this is the second line

3.txt
Code:
.I 3
again some text
.C 
this is the third line

I have tried the command below. It works in some cases, but in some output files the result is not as expected. The idea is to start at the first .I and keep writing to the same file until the next .I appears. Please note that my real file does not look exactly like the one shown above, but the pattern is the same: records begin with .I 1, .I 2, and so on up to the last .I N.

Code:
awk '/.I/{c++}{print > c".txt"}' test_file.tst

I am using this in Linux

---------- Post updated at 05:17 PM ---------- Previous update was at 04:39 PM ----------

Great, got it myself.

The right way of doing this is:

Code:
awk '/.I /{c++}{print > c".txt"}' test_file.tst

I had missed the space after .I

---------- Post updated at 07:58 PM ---------- Previous update was at 05:17 PM ----------

This is odd. I tried it on a small set of files and it worked, but when I run it on the main file it does not. The results go wrong after .I 149. I manually checked the main file at .I 149 and .I 150, and there is nothing erroneous there.

Can anybody please help me? What happens is that record 149 is split in two in the middle.
# 2  
Old 09-25-2011
There may be too many open files. Try:
Code:
awk '/.I /{close(c++".txt")}{print > c".txt"}' test_file.tst
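
For readers who want to try the close() idea in isolation, here is a minimal sketch. It is not taken from the thread: the synthetic input, the temporary directory and the variable names (workdir, out, c) are assumptions made for illustration, and the record header is detected by comparing the first field instead of using a regular expression. The point is only that each output file is closed before the next one is opened, so the per-process limit on open files is never approached.

```shell
# Scratch directory for the demonstration (hypothetical location).
workdir=$(mktemp -d)

# Generate 2000 small records in the ".I N" layout (synthetic test data).
awk 'BEGIN { for (i = 1; i <= 2000; i++) printf ".I %d\nsome text %d\n", i, i }' \
    > "$workdir/big.tst"

# On every record header, close the previous output file before starting
# the next one, so at most one output file is open at any time.
awk -v dir="$workdir" '
    $1 == ".I" { if (out != "") close(out); out = dir "/" (++c) ".txt" }
    out != ""  { print > out }
' "$workdir/big.tst"

# Count the files that were produced.
ls "$workdir" | grep -c '\.txt$'
```

Some awk implementations manage many open files on their own, so the close() call may make no visible difference there; on others it is what keeps a large split from failing.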

# 3  
Old 09-25-2011
Sorry to say it did not work; the same thing happens with this too. I cannot make out what the problem is. Parsing breaks at 148.txt. When I checked 147.txt, its format is the same as that of 148.txt; only the text contents are different. But 147.txt is completely intact with its full contents, whereas record 148 has some of its contents in 148.txt and the rest in 149.txt.

---------- Post updated at 10:35 PM ---------- Previous update was at 08:34 PM ----------

OK, I see what the problem is.

Let me paste the section of the file where the problem actually occurred:
Code:
.I 148
.U
870
.S
Br 
.M
abc
.T
tiger.
.P
JOURNAL ARTICLE.
.W
We succeeded in r.
.A
GoY.

The problem seems to occur when a line begins with the same character as the one that follows the dot in the preceding tag line. In the example above, the line after .W begins with the letter W [We succeeded in r.], and this is where my code breaks. This happens consistently with any line that begins with the same character as the preceding .(character) tag.
For example:
The same thing could happen if the line after .M begins with M, or if the line after .S begins with S.
# 4  
Old 09-25-2011
Oh, I see, it should be:
Code:
awk '/\.I /.....

or better yet:
Code:
awk '/^\.I /.....

if .I is always at the start of the line.
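
To see why the escape and the anchor matter: in an awk regular expression an unescaped dot matches any single character, so /.I / matches every line that contains a capital I followed by a space with at least one character before it. The sketch below counts matches for the three pattern variants on a few made-up lines; the sample text and variable names are assumptions for illustration, not data from the original file.

```shell
# Three made-up lines: a real record header, a body line containing "II ",
# and a body line that mentions ".I " in the middle of the text.
sample=$(printf '%s\n' '.I 148' 'Group II patients' 'as noted in .I above')

# Count matching lines for each pattern variant.
loose=$(printf '%s\n' "$sample"    | awk '/.I /   { n++ } END { print n + 0 }')
escaped=$(printf '%s\n' "$sample"  | awk '/\.I /  { n++ } END { print n + 0 }')
anchored=$(printf '%s\n' "$sample" | awk '/^\.I / { n++ } END { print n + 0 }')

echo "loose=$loose escaped=$escaped anchored=$anchored"
# prints: loose=3 escaped=2 anchored=1
```

Only the anchored pattern counts the header line alone, which is what a record splitter needs.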

Last edited by Scrutinizer; 09-25-2011 at 12:28 PM..
# 5  
Old 09-25-2011
I am not able to replicate it. I tried a small set with what you observed, and I don't think what you described can cause the issue. It should be the number of open files.

Let me try with a bigger set. Can you upload the file? (It will take some time to create one, you see.)

--ahamed
# 6  
Old 09-25-2011
Hi.

This is a problem category addressed by the standard utility csplit:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate splitting based on context, csplit.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C csplit

FILE=${1-data1}

# Remove debris from previous runs.
rm -f xx*

pl " Input data file $FILE:"
cat "$FILE"

pl " Results:"
csplit -k -z -q "$FILE" '/^\.I/' '{*}'
ls -lgG xx*

SAMPLE=xx02
pl " Sample of output file $SAMPLE:"
cat $SAMPLE

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
GNU bash 3.2.39
csplit (GNU coreutils) 6.10

-----
 Input data file data1:
.I 1
some text
.A
this is the first line
.I 2
some text again
.B 
this is the second line
.I 3
again some text
.C 
this is the third line
.I 148
.U
870
.S
Br 
.M
abc
.T
tiger.
.P
JOURNAL ARTICLE.
.W
We succeeded in r.
.A
GoY.

-----
 Results:
-rw-r--r-- 1 41 Sep 25 17:45 xx00
-rw-r--r-- 1 49 Sep 25 17:45 xx01
-rw-r--r-- 1 48 Sep 25 17:45 xx02
-rw-r--r-- 1 88 Sep 25 17:45 xx03

-----
 Sample of output file xx02:
.I 3
again some text
.C 
this is the third line

Your version of csplit may differ slightly; see man csplit.
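
If the default xxNN names are inconvenient, GNU csplit can also name the pieces directly. The following is a sketch under the assumption that GNU coreutils csplit is installed (the -z and -b options are GNU extensions); the rec_ prefix, the scratch directory and the two-record input are made up for illustration. Note that the numbering starts at 0, so the first record lands in rec_000.txt rather than in a file numbered 1.

```shell
# Scratch directory and a two-record sample file (illustrative data).
workdir=$(mktemp -d)
printf '%s\n' '.I 1' 'some text' '.A' 'this is the first line' \
              '.I 2' 'some text again' '.B' 'this is the second line' \
              > "$workdir/data1"

(
  cd "$workdir" || exit 1
  # -s: do not print piece sizes
  # -z: drop the empty piece that precedes the first header
  # -f / -b: name the pieces rec_000.txt, rec_001.txt, ... instead of xx00, xx01, ...
  csplit -s -z -f rec_ -b '%03d.txt' data1 '/^\.I /' '{*}'
)

ls "$workdir"
```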

Best wishes ... cheers, drl
# 7  
Old 09-25-2011
Hi All,

Thanks for your help. I tried with this:

Code:
awk '/^\.I /.....

and it worked.
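
The thread never shows the final command in full, so the following is only one plausible way to put the suggestions together, not necessarily what was actually run: the anchored pattern, close() on the previous file, and a parenthesized file name after the redirection (an unparenthesized concatenation there is not handled the same way by every awk). The sample data, including a body line containing "II ", and the scratch directory are assumptions for illustration.

```shell
# Scratch directory and a small sample with a body line that would have
# confused the unanchored pattern (illustrative data).
workdir=$(mktemp -d)
printf '%s\n' '.I 1' 'some text' '.W' 'We saw group II here' \
              '.I 2' 'some text again' '.B' 'this is the second line' \
              > "$workdir/test_file.tst"

(
  cd "$workdir" || exit 1
  awk '
      # New record: close the previous output file, then advance the counter.
      /^\.I / { if (c) close(c ".txt"); c++ }
      # Write the line to the current record file; skip anything before the first header.
      c       { print > (c ".txt") }
  ' test_file.tst
)

ls "$workdir"
```

With this input the body line "We saw group II here" stays in 1.txt instead of starting a new file.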