Want to extract certain lines from big file


 
# 36  
Old 01-30-2016
Quote:
Originally Posted by Don Cragun
I sincerely apologize. In each case, the output file you got had a filename derived from the 2nd field (i.e., the data between the 1st and 2nd tildes, which seems to be a constant for the transactions you selected to print) in a line that contained a transaction number you wanted to print. And the contents of that file were the transactions from the one after the next-to-last transaction number you requested through the last transaction number you requested from the big input file.

It comes from me not getting nearly enough sleep, you not providing sample data that matched the actual format of your data, and from me not getting nearly enough sleep. (There were three problems and I'm blaming two of them on not getting enough sleep.) Now that I have cleaned up my test data to match what I believe is your current data format, the following seems to work. Please try this replacement:
Code:
#!/bin/ksh
big_file='/tmp/remedixz.20160120_085021_41222370_1'
trannum='/tmp/transnum'

awk -F '~' '
FNR == NR {
	# Gather transaction numbers...
	t[$1]
	tc = FNR
	next
}
{	# Gather transaction lines.
	l[++lc] = $0
}
$1 == "%%YEDTRN" && $3 in t {
	# We have found a transaction number for a transaction that is to be
	# extracted.  Save the transaction number and remove this transaction
	# from the transaction list.
	delete t[transnum = $3]
	file = FILENAME "_" transnum
	tc--
}
/^0000EOT/ {
	# If we have a transaction that is to be printed, print it.
	if(transnum) {
		# Print the transaction.
		for(i = 1; i <= lc; i++)
			print l[i] > file
		close(file)
		printf("Transaction #%s extracted to file %s\n", transnum, file)
		# Did we just print the last transaction requested?
		if(!tc)	{
			# Yes.  We are done.
			exit
		}
		# No.  Clear found transaction number.
		transnum = ""
	}
	# Reset for next transaction.
	lc = 0
}' "$trannum" "$big_file"

Hopefully, this will do what you want.

As stated before, if someone wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
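For anyone following along, here is a rough sketch of the input format the script above assumes, pieced together from the -F '~' separator, the %%YEDTRN and 0000EOT patterns, and the 19-character transaction numbers mentioned later in this thread (the transaction numbers and filler fields here are made up):
Code:
$ cat /tmp/transnum
AB123456789012345C1
AB123456789012345C3

$ cat /tmp/remedixz.20160120_085021_41222370_1
%%YEDTRN~CONSTANT~AB123456789012345C1~...
...detail lines for the 1st transaction...
0000EOT
%%YEDTRN~CONSTANT~AB123456789012345C2~...
...detail lines for the 2nd transaction...
0000EOT
%%YEDTRN~CONSTANT~AB123456789012345C3~...
...detail lines for the 3rd transaction...
0000EOT

With input like that, the script writes one output file per requested transaction and reports each one on standard output:
Code:
Transaction #AB123456789012345C1 extracted to file /tmp/remedixz.20160120_085021_41222370_1_AB123456789012345C1
Transaction #AB123456789012345C3 extracted to file /tmp/remedixz.20160120_085021_41222370_1_AB123456789012345C3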
Hi Don,

Thanks, this worked as expected: it wrote all three transactions to separate files. Now I want to change the code so that it writes all three transactions into a single file. Could you please help me?

Thanks.
# 37  
Old 01-30-2016
I would be happy to help you!

So, exactly what pathname should this single output file have?

What good is this file going to be given that the script that will be reading this file can only handle a single transaction?

Looking at the awk script I provided, what do you think should be changed to produce a single output file instead of one output file per transaction?

My guess would be that one line needs to be removed and one line needs to be changed. And, it might make sense (as a minor optimization) to move that changed line from its current location into a BEGIN clause or an FNR == 1 clause, depending on whether the desired output file pathname is a constant or is derived from the second input file's pathname.
# 38  
Old 01-30-2016
Quote:
Originally Posted by Don Cragun
I would be happy to help you!

So, exactly what pathname should this single output file have?

What good is this file going to be given that the script that will be reading this file can only handle a single transaction?

Looking at the awk script I provided, what do you think should be changed to produce a single output file instead of one output file per transaction?

My guess would be that one line needs to be removed and one line needs to be changed. And, it might make sense (as a minor optimization) to move that changed line from its current location into a BEGIN clause or an FNR == 1 clause, depending on whether the desired output file pathname is a constant or is derived from the second input file's pathname.
Hi Don,

I just changed:
Code:
delete t[transnum = $3]

to:
Code:
delete t[transnum = 123456]

and:
Code:
print l[i] > file

to:
Code:
print l[i] >> file

Now it writes all of the transactions into a single output file:

/tmp/remedixz.20160120_085021_41222370_1_123456
# 39  
Old 01-30-2016
Quote:
Originally Posted by mad man
Hi Don,

I just changed:
Code:
delete t[transnum = $3]

to:
Code:
delete t[transnum = 123456]

and:
Code:
print l[i] > file

to:
Code:
print l[i] >> file

Now it writes all of the transactions into a single output file:

/tmp/remedixz.20160120_085021_41222370_1_123456
I will make the (hopefully not too wild) guess from this that the pathname of the output file you want is the pathname of the input file with the string _123456 appended.

The variable transnum in that awk script is intended to be the transaction number of the transaction that is being copied from the input file to the output file. And, since your transaction numbers are 19-character alphanumeric strings (not six-digit decimal strings), setting transnum = 123456 is NOT appropriate.

Changing the:
Code:
print l[i] > file

to:
Code:
print l[i] >> file

means that instead of creating a new output file each time you run this script, it will append all of the transactions requested on the latest run to the output produced on any earlier runs. This would not seem to be a desirable side effect.
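For what it's worth, the difference is easy to demonstrate with a throwaway file (a minimal demo, not from this thread; out.trunc and out.append are made-up names):
Code:
# In awk, ">" truncates its file the first time it is opened in a run and
# then appends within that same run; ">>" never truncates, so output
# accumulates across runs.
printf 'a\nb\n' | awk '{ print > "out.trunc"; print >> "out.append" }'
printf 'a\nb\n' | awk '{ print > "out.trunc"; print >> "out.append" }'
wc -l out.trunc out.append
# out.trunc has 2 lines (the second run overwrote the first);
# out.append has 4 lines (the second run appended to the first).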

Please undo the changes you made and make the following changes instead:
First, change the line:
Code:
	file = FILENAME "_" transnum

to:
Code:
	file = FILENAME "_123456"

and, second, delete the line:
Code:
		close(file)

With these changes, the transaction number printed when a transaction is copied to the output file will again be printed correctly and a single output file will be produced each time the script is run (and will contain only the transactions extracted on that execution of the script). Later executions of the script will replace the contents of that file (if it still exists from an earlier run) or create that file (if it had been removed).
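In other words, a sketch of the full script from post #36 with just those two edits applied (the _123456 suffix is carried over from your example; substitute whatever constant suffix you actually want):
Code:
#!/bin/ksh
big_file='/tmp/remedixz.20160120_085021_41222370_1'
trannum='/tmp/transnum'

awk -F '~' '
FNR == NR {
	# Gather transaction numbers...
	t[$1]
	tc = FNR
	next
}
{	# Gather transaction lines.
	l[++lc] = $0
}
$1 == "%%YEDTRN" && $3 in t {
	# We have found a transaction number for a transaction that is to be
	# extracted.  Save the transaction number and remove this transaction
	# from the transaction list.
	delete t[transnum = $3]
	# One constant output file per run instead of one file per transaction.
	file = FILENAME "_123456"
	tc--
}
/^0000EOT/ {
	# If we have a transaction that is to be printed, print it.
	if(transnum) {
		# Print the transaction.  With no close(file), ">" truncates the
		# output file on the first write of this run and appends thereafter.
		for(i = 1; i <= lc; i++)
			print l[i] > file
		printf("Transaction #%s extracted to file %s\n", transnum, file)
		# Did we just print the last transaction requested?
		if(!tc)	{
			# Yes.  We are done.
			exit
		}
		# No.  Clear found transaction number.
		transnum = ""
	}
	# Reset for next transaction.
	lc = 0
}' "$trannum" "$big_file"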