Want to extract certain lines from big file


 
# 22  
Old 01-24-2016
Hi RudiC

Let me try this sed in your last post.

Thanks.
# 23  
Old 01-24-2016
Quote:
Originally Posted by mad man
Hi Don,

Sorry for the inconvenience.

The code you posted last is not working for me. Please find below how I used it.

Code:
 
big_file='/tmp/remedixz.20160120_085021_41222370_1'
trannum="/tmp/transnum"
file_new="${big_file}_23962395676"
awk -F '~' '
FNR == NR {
	t[$1]
	tc = FNR
	next
}
{
	l[++lc] = $0
}
$1 == "%%YEDTRN" && $2 in t {
	remove t[transnum = $2]
	tc--
}
$1 == "0000EOT" {

	if(transnum) {
		for(i = 1; i <= lc; i++)
			print l[i] > ("$file_new:" transnum)
		close("$file_new:" transnum)
		printf("Transaction #%s extracted to file $file_new:%s\n", transnum,
		    transnum)
	}
	if(tc) {
		lc = 0
		transnum = ""
	} else {
		exit
	}
}' $trannum $big_file

---------- Post updated at 03:25 PM ---------- Previous update was at 03:13 PM ----------

Hi Don,

The sed command you modified and posted did not throw any error like last time, but the output file is empty.

Thanks
You have already been told that shell variables are not expanded inside single quotes! This is true in any shell script. It doesn't matter whether the single quoted string is a sed script inside a shell script or an awk script inside a shell script.
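
A minimal sketch of that quoting rule, using a hypothetical path and a hypothetical awk variable name `out` (neither comes from the posted script): inside the single-quoted awk program `$file_new` is only literal text, while `-v` hands the shell value to awk as an awk variable.

```shell
file_new='/tmp/example_out'   # hypothetical path, for illustration only

# Inside single quotes the shell leaves $file_new alone, so awk sees the
# literal characters $file_new inside its string constant.
literal=$(awk 'BEGIN { print "$file_new:" "T1" }')

# -v copies the shell value into an awk variable before the program runs.
expanded=$(awk -v out="$file_new" 'BEGIN { print out ":" "T1" }')

printf '%s\n' "$literal"    # $file_new:T1
printf '%s\n' "$expanded"   # /tmp/example_out:T1
```

The same `-v name="$shell_variable"` form works for any value the awk program needs from the surrounding shell script.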

What is in the file named /tmp/transnum? As stated in my post describing this script, that file must contain a list of one or more transaction numbers to be extracted, with one transaction number per line. IF YOU DO NOT PUT THE TRANSACTION NUMBERS YOU WANT TO EXTRACT IN THAT FILE, MY SCRIPT CANNOT WORK! There is nothing shown in your script that puts any data in /tmp/transnum.

What am I missing? Why is 23962395676 important as the last part of your output filename before the transaction number? (We know this is not a transaction number because you have told us that transaction numbers are 19 characters, not 11.) And the code I provided already included the transaction number as the last part of the output file's pathname. If what you want is the 2nd input file's pathname followed by an underscore followed by the transaction number, just change every occurrence of:
Code:
"TX:" transnum

in the script I posted in post #14 in this thread to:
Code:
FILENAME "_" transnum

And since the end of a transaction is NOT 0000EOT as you repeatedly told us, it is no wonder that the scripts that have been provided to you do not work. Since the end of a transaction is a line like:
Code:
0000EOT<><>000000000000019000000000000000000000000000019<><><>

instead of the exact line:
Code:
0000EOT

that you described before, you also need to change the line in my script:
Code:
$1 == "0000EOT" {

to:
Code:
/^0000EOT/ {

And, if the transaction number you're trying to extract is ABC160120XYZ0983921, the transaction number has also changed positions from where you said it was (from following the 1st tilde to following the 2nd tilde). Since this transaction number does not appear in your latest sample input, there would be no output from that sample in any case. To match the new position, you also need to change the line in my script:
Code:
$1 == "%%YEDTRN" && $2 in t {

to:
Code:
$1 == "%%YEDTRN" && $3 in t {
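
Putting the three changes above together, a sketch of the adjusted script might look like the following. It assumes only what this post describes: fields separated by `~`, a `%%YEDTRN` line carrying the transaction number after the 2nd tilde, and a transaction that ends at a line starting with `0000EOT`. The wrapper name `extract_tx` is hypothetical, `delete` is the standard awk statement (the quoted code shows `remove`, which is not an awk keyword), and none of this has been run against the real data.

```shell
# extract_tx LISTFILE BIGFILE   (hypothetical wrapper, for illustration)
# LISTFILE holds one wanted transaction number per line.
extract_tx() {
	awk -F '~' '
	FNR == NR {                       # 1st file: remember wanted numbers
		t[$1]
		tc = FNR
		next
	}
	{
		l[++lc] = $0                  # 2nd file: buffer current transaction
	}
	$1 == "%%YEDTRN" && $3 in t {     # number now follows the 2nd tilde
		delete t[transnum = $3]
		tc--
	}
	/^0000EOT/ {                      # end of transaction, trailing text allowed
		if(transnum) {
			out = FILENAME "_" transnum
			for(i = 1; i <= lc; i++)
				print l[i] > out
			close(out)
			printf("Transaction #%s extracted to file %s\n", transnum, out)
		}
		if(tc) {
			lc = 0                    # more to find: start a fresh buffer
			transnum = ""
		} else {
			exit                      # all wanted transactions extracted
		}
	}' "$1" "$2"
}
```

It would be called as `extract_tx /tmp/transnum /tmp/remedixz.20160120_085021_41222370_1`, writing each match to the big file's pathname followed by `_` and the transaction number.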

This is a classic case of what computer scientists refer to as GIGO (Garbage In, Garbage Out). If the specification of the input data does not match the input data provided for processing, please don't blame the scripts that we suggested! You HAVE to give us representative samples of the data you are processing if you need our help in writing your code!

If you don't fully understand how awk is processing this script, you might also want to keep the comments I provided instead of throwing them away.
# 24  
Old 01-24-2016
@RudiC,

Thanks for your efforts trying to help me. But the sed command you provided does not work on my version of Unix (AIX).
Thanks.
# 25  
Old 01-24-2016
I'm sorry to hear that. Did you try any of the suggestions given on an (artificially) simplified data sample?
# 26  
Old 01-24-2016
Hi Don,

Sorry. Since I am new to using blogging websites, I am afraid of posting a bank's transaction input structure on a public website, as it might get me into trouble. I also apologize for the faulty input. I am learning one step at a time.

I am going to try out your new suggestions and will update you in 15 minutes.

Thanks.

---------- Post updated at 05:40 PM ---------- Previous update was at 05:01 PM ----------

Hi Don

I am getting the error below after making the changes you suggested.
Code:
awk: Cannot divide by zero.

 The input line number is 32042. The file is /tmp/remedixz.20160120_085021_41222370_1.
 The source line number is 18.

Line 32042 is the EOT line of that particular transaction reference number. Please find the code below:
Code:
big_file='/tmp/remedixz.20160120_085021_41222370_1'
trannum="/tmp/transnum"

/tmp> cat /tmp/transnum
ABC160120XYZ0983921 

##In the above you can see the transnum given

awk -F '~' '
    FNR == NR {
      t[$1]
      tc = FNR
      next
      } 
      {
      l[++lc] = $0
      }
    $1 == "%%YEDTRN" && $3 in t {
        remove t[transnum = $2]
        tc--
    }

    /^0000EOT/ {
        if(transnum) {
            for(i = 1; i <= lc; i++)
                print l[i] > (/tmp/remedixz.20160120_085021_41222370_1_new "_" transnum)
            close(/tmp/remedixz.20160120_085021_41222370_1_new "_" transnum)
            printf("Transaction #%s extracted to file /tmp/remedixz.20160120_085021_41222370_1_new "_" transnum:%s\n", transnum,
                transnum)
        }
        if(tc) {
            lc = 0
            transnum = ""
        } else {
            exit
        }
    }' $trannum $file

This time I gave the output file name directly rather than through a variable.
Kindly let me know what I am missing.

Thanks.
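
One likely reading of this error, offered as a sketch rather than a diagnosis of AIX awk: inside an awk program a pathname has to be a quoted string. Written bare, `/tmp/remedixz...` is not text; awk parses the slashes as regular-expression delimiters or division operators and the words between them as empty variables, which is consistent with a division-by-zero report at the `print` line. Note also that the posted code tests `$3` but still assigns `transnum = $2`, and passes `$file` where the shell variable defined above it is `big_file`. The scratch directory and the awk variable names `base` and `tn` below are hypothetical.

```shell
dir=$(mktemp -d)                      # hypothetical scratch directory
transnum='ABC160120XYZ0983921'

# The output name is built from strings: -v variables plus a quoted constant.
awk -v base="$dir/remedixz_new" -v tn="$transnum" 'BEGIN {
	out = base "_" tn                 # string concatenation, no bare slashes
	print "sample line" > out
	close(out)
	printf("wrote %s\n", out)
}'

cat "$dir/remedixz_new_$transnum"     # shows the line awk just wrote
```

Passing the base name with `-v` also covers the case where the output file name should come from a shell variable instead of being typed into the awk program.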
# 27  
Old 01-24-2016
Quote:
Originally Posted by mad man
@RudiC,

Thanks for your efforts trying to help me. But the sed command you provided does not work on my version of Unix (AIX).
Thanks.
Try this adaptation of RudiC's suggestion, with Don's adaptation for proper shell quoting, on AIX:
Code:
sed -n "
/~$transnum~/ {
H
g
}
/~$transnum~/,/EOT/p
h
" file

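For readers wondering how this works, here is a sketch on made-up sample data (the file contents, the temporary file, and the transaction numbers `T1` and `T2` are invented for illustration). The trailing `h` keeps a copy of every line in the hold space, so when a line containing `~$transnum~` arrives, `H` and `g` rebuild the pattern space as "previous line, newline, current line"; the range then prints from there through the next line matching `EOT`.

```shell
transnum='T2'                       # hypothetical transaction number
sample=$(mktemp)                    # made-up sample data, not the real format
printf '%s\n' \
	'##PAYMNT~A' '%%YEDTRN~X~T1~foo' 'data1' '0000EOT<><>' \
	'##PAYMNT~B' '%%YEDTRN~X~T2~foo' 'data2' '0000EOT<><>' > "$sample"

# Double quotes let the shell expand $transnum before sed sees the script.
result=$(sed -n "
/~$transnum~/ {
H
g
}
/~$transnum~/,/EOT/p
h
" "$sample")

printf '%s\n' "$result"             # the second transaction, header line included
rm -f "$sample"
```

On this sample the output is the four lines from `##PAYMNT~B` through the second `0000EOT` line; the line before the match is included because it was still sitting in the hold space.
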
---
Quote:
Originally Posted by Don Cragun
It looks like Scrutinizer's suggestion should work just fine as long as:
  1. [..]
  2. the number of bytes in a single transaction (from ##PAYMNT through 0000EOT) is not more than 2047 bytes.
The limit is not so much 2047 bytes: in most implementations it is much higher or unlimited, and in some there is a much lower limit, but one unrelated to LINE_MAX, as I think we worked out before here: Sequence extraction

Last edited by Scrutinizer; 01-24-2016 at 08:28 AM..
# 28  
Old 01-24-2016
Hi Don,

One more request: I want to give my output file name through a variable, not directly as a file name. Please suggest how to do that.

Thanks.

---------- Post updated at 05:51 PM ---------- Previous update was at 05:43 PM ----------

Dear Scrutinizer,

Thanks a lot, this time sed worked.

It gave me the desired output.

Thanks.

---------- Post updated at 05:54 PM ---------- Previous update was at 05:51 PM ----------

Hi,

This message is intended for everyone who replied to help me out.
Hats off to you for your efforts. I also request each of you to suggest links to materials you consider good for learning at least the basics of sed and awk.

Thanks.