Want to extract certain lines from big file

01-22-2016

Registered User

54, 1

Join Date: Nov 2015

Last Activity: 30 January 2019, 7:26 AM EST

Posts: 54

Thanks Given: 27

Thanked 1 Time in 1 Post

Want to extract certain lines from big file

Hi All,

I am trying to get some lines from a file i did it with while-do-loop. since the files are huge it is taking much time. now i want to make it faster.

The requirement is the file will be having 1 million lines.
The format is like below.

Code:

##transaction, , , ,blah, blah
%%blah~trannum~blah~blah~blah
0000content01
0001content02
.
.
0010contentnn
0000EOT
##transaction, , , ,blah, blah
%%blah~trannum~blah~blah~blah
0000content01
0001content02
.
.
0010contentnn
0000EOT
##transaction, , , ,blah, blah
%%blah~transnum~blah~blah~blah
0000content01
0001content02
.
.
0010contentnn
0000EOT

What i know from the file is transnum in a set. I want to copy the ##transaction to till the next EOT for the particular transnum.
Also my requirement is from that file i want to copy only one set because my process will know only one transnum only.
So my output file will have only 10 to 15 lines (Only 1 transaction)
So please help me thanks.

Last edited by vbe; 01-22-2016 at 11:58 AM.. Reason: code tags please!

mad man

View Public Profile for mad man

Find all posts by mad man

01-22-2016

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

Please use code tags as required by forum rules, and please show your attempts so far as well. Is above the exact file structure? No empty lines? Is "EOT" exactly this string? Or a token/control char?

Last edited by RudiC; 01-23-2016 at 06:46 AM.. Reason: typo

This User Gave Thanks to RudiC For This Post:

RudiC

View Public Profile for RudiC

Find all posts by RudiC

01-22-2016

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

In addition to what RudiC already said, if every transaction contains the literal line:

Code:

%%blah~transnum~blah~blah~blah

how do you know which transnum set you want? We might guess that blah isn't literal and we might guess that transnum isn't literal and that transnum is different in each set, but you haven't given us enough information to make a reasonable guess at a BRE that will match the transum set you want.

Please give us:

some more realistic sample data,
a description of any file(s) that your script is expected to read,
a description of any file(s) that your script is expected to write,
a description of any arguments you intend to pass to your script,
the operating system and shell you're using, and
the exact output you want your script to produce with the sample data provided in #1 above and sample arguments you provided in $4 above.

(And, don't forget to use CODE tags when showing us your sample input, sample output, and your attempts at writing a script to perform these tasks.)

This User Gave Thanks to Don Cragun For This Post:

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

01-23-2016

Registered User

54, 1

Join Date: Nov 2015

Last Activity: 30 January 2019, 7:26 AM EST

Posts: 54

Thanks Given: 27

Thanked 1 Time in 1 Post

@RudiC
1. The above sample is the exact structure of my input file these set of lines from ##transaction .. to 000EOT will be repeated. There are no empty lines in between and 0000EOT is the exact string. There are no other token/control characters in the file.
Thanks.

---------- Post updated at 06:59 PM ---------- Previous update was at 06:40 PM ----------

@Don:
The transnum will be read from other file. The file extraction part, is a part of my script. This script is a long script. After extraction from the file we will be processing the transaction. so the transaction extraction part is making the performance issue. The transnum will be in a variable. After reading the file line by line i will cut the "transnum" with tilde delimiter and then i will use if condition to check if they are matching. If they match i will copy the current line(earlier line will be copied in to a new variable) and subsequent lines until next EOT into a new file.

mad man

View Public Profile for mad man

Find all posts by mad man

01-23-2016

Moderator

12,296, 3,792

Join Date: Nov 2008

Last Activity: 1 January 2021, 1:47 AM EST

Location: Amsterdam

Posts: 12,296

Thanks Given: 679

Thanked 3,792 Times in 3,282 Posts

Try something like:

Code:

awk '{p=p $0 RS} /EOT/{if(p~s)printf "%s",p; p=x}' s='~trannum~' file

This User Gave Thanks to Scrutinizer For This Post:

Scrutinizer

View Public Profile for Scrutinizer

Find all posts by Scrutinizer

01-23-2016

Registered User

54, 1

Join Date: Nov 2015

Last Activity: 30 January 2019, 7:26 AM EST

Posts: 54

Thanks Given: 27

Thanked 1 Time in 1 Post

Just thought to say this added info
I am using AIX version of unix

---------- Post updated at 09:21 PM ---------- Previous update was at 08:42 PM ----------

@Don
Please give us:
some more realistic sample data,

Code:

##PAYMNT, , , ,blah, blah
%%YEDTRN~trannum~blah~blah~blah
0000content01
0001content02
.
.
0010contentnn
0000EOT

In my above input sample, the tags "##PAYMNT", "%%YEDTRN", & "0000EOT" are the constant values, all the other values are varying with transactions.
a description of any file(s) that your script is expected to read

Code:

Input file is a transaction details file- flat file

a description of any file(s) that your script is expected to write,

Code:

Out put file should have 1 of the desired transaction record

a description of any arguments you intend to pass to your script,

Code:

The argument to this part of the script is a transaction number value which will be in script

the operating system and shell you're using

Code:

AIX 6 OS and korn shell

So kindly give me your different suggestions.
Thanks

---------- Post updated at 09:34 PM ---------- Previous update was at 09:21 PM ----------

@Scrutinizer - Thanks for your reply i request you to go thru my above explanation to Don. So kindly give my more possible commands. So i will try them when i reach office tomorrow.

Thanks

mad man

View Public Profile for mad man

Find all posts by mad man

01-23-2016

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

It looks like Scrutinzer's suggestion should work just fine as long as:

trannum does not contain any characters that are special in an ERE, and
the number of bytes in a single transaction (from ##PAYMNT through 0000EOT is not more than 2047 bytes.

So:

What is the format of trannum? Is it all alphanumeric characters? (If it isn't all alphanumeric characters, what characters can be included in a trannum?) How many characters are in a trannum? (Is it always the same number of characters or does it vary? If it varies, what are the minimum and maximum number of characters in a trannum?)
What is the maximum number of bytes (not characters; bytes) in a transaction? If that number is larger than 2047, what is the maximum number of bytes in a single line in a transaction? (As long as the number of byte in a line (including the terminating <newline> character is no larger than 2048 bytes, we can easily do that. If it is more than 2048 bytes, it takes more work to get what you want on AIX.)

I would do it slightly differently (to quit after the desired transaction is found):

Code:

awk '{p=p $0 RS} /EOT/{if(p~s){printf "%s",p;exit}else p=x}' s="~$trannum~" file

which should cut the time awk spends reading your large file about in half, on average.

But, the way to make big gains here would be to search for and extract multiple transactions in a single pass through your large file. If you could, for example, extract 10 transactions at a time, you would only have to read the large file once instead of 10 times and you would only have to invoke awk once instead of 10 times; both of which would be big wins for performance.

Note that extracting 10 transactions at a time does not mean that the extracted transactions would all be saved in a single file; each transaction could easily be extracted into a separate file. And, 10 is just an example; an awk script could easily extract thousands of transactions into separate files in a single pass through your large transaction file increasing your script's processing speed immensely if your script is being used to process thousands of transactions. Note also that this is why we want details about what you are doing instead of vague statements about a tiny piece of the script you are writing. The more we know, the better chance we have of making a suggestion that will significantly improve your script.

This User Gave Thanks to Don Cragun For This Post:

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

Shell Programming and Scripting

Want to extract certain lines from big file

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract Big and continuous regions

Discussion started by: amrutha_sastry

2. UNIX for Beginners Questions & Answers

How to copy only some lines from very big file?

Discussion started by: phamnu

3. Shell Programming and Scripting

Extract certain columns from big data

Discussion started by: happypoker

4. Shell Programming and Scripting

Extract certain entries from big file:Request to check

Discussion started by: manigrover

5. UNIX for Advanced & Expert Users

Delete first 100 lines from a BIG File

Discussion started by: unohu

6. Shell Programming and Scripting

Extract some lines from one file and add those lines to current file

Discussion started by: snreddy_gopu

7. Shell Programming and Scripting

Re: Deleting lines from big file.

Discussion started by: dipeshvshah

8. Shell Programming and Scripting

Print #of lines after search string in a big file

Discussion started by: prash184u

9. UNIX for Dummies Questions & Answers

How big is too big a config.log file?

Discussion started by: NeedLotsofHelp

10. UNIX for Dummies Questions & Answers

How to view a big file(143M big)

Discussion started by: chenhao_no1