Sponsored Content
Top Forums Shell Programming and Scripting Want to extract certain lines from big file Post 302965156 by Don Cragun on Sunday 24th of January 2016 09:06:32 AM
Old 01-24-2016
Quote:
Originally Posted by mad man
Hi Don,

Sorry, Since i am new to use blogging websites i am afraid of giving a banks transaction input structure in a public website. I am afraid since it might end up me in trouble and also apologize for a faulty input. I am learning 1 by 1 towards perfection.

I am going to try out your new suggestions will update you in 15 mins.

Thanks.

---------- Post updated at 05:40 PM ---------- Previous update was at 05:01 PM ----------

Hi Don

I am getting the below error after doing the changes what ever you have suggested.
Code:
awk: Cannot divide by zero.

 The input line number is 32042. The file is /tmp/remedixz.20160120_085021_41222370_1.
 The source line number is 18.

The line 32042 is the EOT line of the particular transaction reference number. Please find the code below
Code:
big_file='/tmp/remedixz.20160120_085021_41222370_1'
trannum="/tmp/transnum"

/tmp> cat /tmp/transnum
ABC160120XYZ0983921 

##In the above you can see the transnum given

awk -F '~' '
    FNR == NR {
      t[$1]
      tc = FNR
      next
      } 
      {
      l[++lc] = $0
      }
    $1 == "%%YEDTRN" && $3 in t {
        remove t[transnum = $2]
        tc--
    }

    /^0000EOT/ {
        if(transnum) {
            for(i = 1; i <= lc; i++)
                print l[i] > (/tmp/remedixz.20160120_085021_41222370_1_new "_" transnum)
            close(/tmp/remedixz.20160120_085021_41222370_1_new "_" transnum)
            printf("Transaction #%s extracted to file /tmp/remedixz.20160120_085021_41222370_1_new "_" transnum:%s\n", transnum,
                transnum)
        }
        if(tc) {
            lc = 0
            transnum = ""
        } else {
            exit
        }
    }' $trannum $file

This time i just directly gave the output file name rather than a variable.
Kindly let me know where i am missing something.

Thanks.
Realize that I have been up all night trying to help you (and it is now almost 6AM where I am), so I may not be thinking clearly. But, could you please explain why you chose to change the code I suggested:
Code:
			print l[i] > (FILENAME "_" transnum)

to:
Code:
                print l[i] > (/tmp/remedixz.20160120_085021_41222370_1_new "_" transom)

FILENAME is an awk variable holding the name of the current input file. But, /tmp/remedixz.20160120_085021_41222370_1_new is an attempt to divide nothing by the contents of the variable tmp divided by contents of the variable remedixz followed by a syntax error. And since neither tmp nor remedixz have been defined in this awk script, both are treated as a division by zero.

Would you PLEASE just try the following script without changing it:
Code:
#!/bin/ksh
big_file='/tmp/remedixz.20160120_085021_41222370_1'
transnums='/tmp/transnum'

awk -F '~' '
FNR == NR {
	# Gather transaction numbers...
	t[$1]
	tc = FNR
	next
}
{	# Gather transaction lines.
	l[++lc] = $0
}
$1 == "%%YEDTRN" && $3 in t {
	# We have found a transaction number for a transaction that is to be
	# extracted.  Save the transaction number and remove this transaction
	# from the transaction list.
	delete t[transnum = $2]
	file = FILENAME "_" transnum
	tc--
}
/^0000EOT/ {
	# If we have a transaction that is to be printed, print it.
	if(transnum) {
		# Print the transaction.
		for(i = 1; i <= lc; i++)
			print l[i] > file
		close(file)
		printf("Transaction #%s extracted to file %s\n", transnum, file)
		# Was this the last remaining transaction to be extracted?
		if(tc) {# No.  Reset for next transaction.
			lc = 0
			transnum = ""
		} else {# Yes.  Exit.
			exit
		}
	}
}' "$transnums" "$big_file"

Note that this has a few changes to match your latest description of your transaction format, has a typo fixed, and has some minor performance improvements. It also now includes your filenames (which had not been provided before).

If /tmp/transnum contains the single line:
Code:
ABC160120XYZ0983921

and there is a transaction in your big transaction file with that transaction number, it should produce a file named /tmp/remedixz.20160120_085021_41222370_1_ABC160120XYZ0983921 containing that transaction. And, as stated before, if /tmp/transnum contains multiple transaction numbers on separate lines, one invocation of this script will produce an output file for each transaction given.

If this all works, you could also add an END clause to print a list of any transaction numbers that were specified in your transaction numbers file that were not found in your big transactions file.
These 3 Users Gave Thanks to Don Cragun For This Post:
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

How to view a big file(143M big)

1 . Thanks everyone who read the post first. 2 . I have a log file which size is 143M , I can not use vi open it .I can not use xedit open it too. How to view it ? If I want to view 200-300 ,how can I implement it 3 . Thanks (3 Replies)
Discussion started by: chenhao_no1
3 Replies

2. UNIX for Dummies Questions & Answers

How big is too big a config.log file?

I have a 5000 line config.log file with several "maybe" errors. Any reccomendations on finding solvable problems? (2 Replies)
Discussion started by: NeedLotsofHelp
2 Replies

3. Shell Programming and Scripting

Print #of lines after search string in a big file

I have a command which prints #lines after and before the search string in the huge file nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r;print;c=a}b{r=$0}' b=0 a=10 s="STRING1" FILE The file is 5 gig big. It works great and prints 10 lines after the lines which contains search string in... (8 Replies)
Discussion started by: prash184u
8 Replies

4. Shell Programming and Scripting

Re: Deleting lines from big file.

Hi, I have a big (2.7 GB) text file. Each lines has '|' saperator to saperate each columns. I want to delete those lines which has text like '|0|0|0|0|0' I tried: sed '/|0|0|0|0|0/d' test.txt Unfortunately, it scans the file but does nothing. file content sample:... (4 Replies)
Discussion started by: dipeshvshah
4 Replies

5. Shell Programming and Scripting

Extract some lines from one file and add those lines to current file

hi, i have two files. file1.sh echo "unix" echo "linux" file2.sh echo "unix linux forums" now the output i need is $./file2.sh unix linux forums (3 Replies)
Discussion started by: snreddy_gopu
3 Replies

6. UNIX for Advanced & Expert Users

Delete first 100 lines from a BIG File

Hi, I need a unix command to delete first n (say 100) lines from a log file. I need to delete some lines from the file without using any temporary file. I found sed -i is an useful command for this but its not supported in my environment( AIX 6.1 ). File size is approx 100MB. Thanks in... (18 Replies)
Discussion started by: unohu
18 Replies

7. Shell Programming and Scripting

Extract certain entries from big file:Request to check

Hi all I have a big file which I have attached here. And, I have to fetch certain entries and arrange in 5 columns Name Drug DAP ID disease approved or notIn the attached file data is arranged with tab separated columns in this way: and other data is... (2 Replies)
Discussion started by: manigrover
2 Replies

8. Shell Programming and Scripting

Extract certain columns from big data

The dataset I'm working on is about 450G, with about 7000 colums and 30,000,000 rows. I want to extract about 2000 columns from the original file to form a new file. I have the list of number of the columns I need, but don't know how to extract them. Thanks! (14 Replies)
Discussion started by: happypoker
14 Replies

9. UNIX for Beginners Questions & Answers

How to copy only some lines from very big file?

Dear all, I have stuck with this problem for some days. I have a very big file, this file can not open by vi command. There are 200 loops in this file, in each loop will have one line like this: GWA quasiparticle energy with Z factor (eV) And I need 98 lines next after this line. Is... (6 Replies)
Discussion started by: phamnu
6 Replies

10. Shell Programming and Scripting

Extract Big and continuous regions

Hi all, I have a file like this I want to extract only those regions which are big and continous chr1 3280000 3440000 chr1 3440000 3920000 chr1 3600000 3920000 # region coming within the 3440000 3920000. so i don't want it to be printed in output chr1 3920000 4800000 chr1 ... (2 Replies)
Discussion started by: amrutha_sastry
2 Replies
All times are GMT -4. The time now is 07:35 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy