Want to extract certain lines from big file Post: 302965121

Post 302965121 by Don Cragun on Sunday 24th of January 2016 04:08:06 AM (Top Forums, Shell Programming and Scripting)

01-24-2016

It sounds like I have wasted the last hour of my life trying to help you, but maybe this will help someone else. The following awk script uses only POSIX-specified awk features and should work on any system (although you would need to change awk to /usr/xpg4/bin/awk or nawk if, and only if, you want to run this on a Solaris/SunOS system).

It takes two files as input (which is what you said you had earlier). The first file (named trannums in this script) contains one or more lines, each containing one transaction number to be extracted from your big file. The second file (named bigfile in this script) is your big file of transactions. The script extracts each transaction listed in trannums into a separate output file whose name is the string TX: followed by the transaction number:

Code:

#!/bin/ksh
awk -F '~' '
FNR == NR {
	# Gather transaction numbers...
	t[$1]
	tc = FNR
	next
}
{	# Gather transaction lines.
	l[++lc] = $0
}
$1 == "%%YEDTRN" && $2 in t {
	# We have found a transaction number for a transaction that is to be
	# extracted.  Save the transaction number and remove this transaction
	# from the remaining transaction list.
	delete t[transnum = $2]
	tc--
}
$1 == "0000EOT" {
	# If we have a transaction that is to be printed, print it.
	if(transnum) {
		# Print the transaction.
		for(i = 1; i <= lc; i++)
			print l[i] > ("TX:" transnum)
		close("TX:" transnum)
		printf("Transaction #%s extracted to file TX:%s\n", transnum,
		    transnum)
	}
	# Was this the last remaining transaction to be extracted?
	if(tc) {# No.  Reset for next transaction.
		lc = 0
		transnum = ""
	} else {# Yes.  Exit.
		exit
	}
}' trannums bigfile
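
To try the script without real data, here is a small sketch with invented input. The record layout is an assumption based only on what the script tests for: fields separated by ~, a %%YEDTRN line whose second field is the transaction number, and a 0000EOT line ending each transaction. The transaction numbers, the DETAIL lines, and the temporary directory are made up for illustration; the awk program is a condensed copy of the logic above.

```shell
#!/bin/sh
# Hypothetical demo data; a real bigfile may be laid out differently.
demo=$(mktemp -d) && cd "$demo" || exit 1

# Transactions to extract, one number per line.
printf '%s\n' 1002 1003 > trannums

# Three made-up transactions: header line, one detail line, trailer line.
for n in 1001 1002 1003
do
	printf '%%%%YEDTRN~%s~header\n' "$n"
	printf 'DETAIL~item for %s\n' "$n"
	printf '0000EOT~\n'
done > bigfile

awk -F '~' '
FNR == NR { t[$1]; tc = FNR; next }	# remember the wanted numbers
{ l[++lc] = $0 }			# buffer the current transaction
$1 == "%%YEDTRN" && $2 in t { delete t[transnum = $2]; tc-- }
$1 == "0000EOT" {
	if(transnum) {
		for(i = 1; i <= lc; i++)
			print l[i] > ("TX:" transnum)
		close("TX:" transnum)
		printf("Transaction #%s extracted to file TX:%s\n", transnum,
		    transnum)
	}
	if(tc) { lc = 0; transnum = "" } else exit
}' trannums bigfile
```

With this input, transaction 1001 is skipped, and files TX:1002 and TX:1003 each receive the three lines of their transaction. Note that tc is set from the line count of trannums, so a duplicated or unmatched number in that file means the early exit never triggers and awk simply reads bigfile to the end.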

Don Cragun

LEARN ABOUT V7

diff

DIFF(1) 						      General Commands Manual							   DIFF(1)

NAME

       diff - differential file comparator

SYNOPSIS

       diff [ -efbh ] file1 file2

DESCRIPTION

       Diff tells what lines must be changed in two files to bring them into agreement.  If file1 (file2) is `-', the standard input is used.  If
       file1 (file2) is a directory, then a file in that directory whose file-name is the same as the file-name of file2 (file1) is used.  The
       normal output contains lines of these forms:

	    n1 a n3,n4
	    n1,n2 d n3
	    n1,n2 c n3,n4

       These lines resemble ed commands to convert file1 into file2.  The numbers after the letters pertain to file2.  In fact, by exchanging `a'
       for `d' and reading backward one may ascertain equally how to convert file2 into file1.  As in ed, identical pairs where n1 = n2 or n3 = n4
       are abbreviated as a single number.

       Following each of these lines come all the lines that are affected in the first file flagged by `<', then all the lines that are affected
       in the second file flagged by `>'.
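
       A short sketch of the normal output, using two invented three- and four-line files (the names file1 and file2 and their contents are
       made up for illustration).  Current diff implementations print the command letter without the surrounding spaces shown in the forms above.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' alpha beta gamma > file1
printf '%s\n' alpha BETA gamma delta > file2

# diff exits 1 when the files differ, so keep that from stopping a strict shell.
out=$(diff file1 file2) || true
printf '%s\n' "$out"
# Prints:
#   2c2
#   < beta
#   ---
#   > BETA
#   3a4
#   > delta
```

       Here 2c2 is a change with n1 = n2 and n3 = n4 both abbreviated to a single number, and 3a4 says that line 4 of file2 must be appended
       after line 3 of file1.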

       The -b option causes trailing blanks (spaces and tabs) to be ignored and other strings of blanks to compare equal.
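
       A minimal sketch of the effect of -b, using two invented one-line files that differ only in their blanks (the names spaced and plain
       are assumptions for the example):

```shell
cd "$(mktemp -d)" || exit 1
printf 'a  b \n' > spaced	# two blanks inside, one trailing blank
printf 'a b\n' > plain

# Record each exit status without letting a non-zero one abort a strict shell.
plain_rc=0
diff spaced plain > /dev/null || plain_rc=$?
b_rc=0
diff -b spaced plain > /dev/null || b_rc=$?

echo "without -b: $plain_rc, with -b: $b_rc"
# Prints: without -b: 1, with -b: 0
```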

       The -e option produces a script of a, c and d commands for the editor ed, which will recreate file2 from file1.  The -f option produces a
       similar script, not useful with ed, in the opposite order.  In connection with -e, the following shell program may help maintain multiple
       versions of a file.  Only an ancestral file ($1) and a chain of version-to-version ed scripts ($2,$3,...) made by diff need be on hand.  A
       `latest version' appears on the standard output.

	    (shift; cat $*; echo '1,$p') | ed - $1
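
       The one-liner can be tried with a sketch like the following.  The files v1, v2 and v3 and the script names are invented for the example.
       Two details are adaptations for current systems rather than part of the manual's text: ed -s is used as the POSIX spelling of the
       historical ed -, and a Q command is appended so that ed quits without complaining about the unsaved buffer.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' alpha beta gamma > v1
printf '%s\n' alpha BETA gamma > v2
printf '%s\n' alpha BETA gamma delta > v3

# One ed script per version step; diff exits 1 when files differ, hence "|| true".
diff -e v1 v2 > v1-v2.ed || true
diff -e v2 v3 > v2-v3.ed || true
cat v2-v3.ed		# an "a" command for line 3, the text "delta", then "."

# Ancestor plus the chain of scripts reproduces the latest version on stdout.
if command -v ed > /dev/null 2>&1
then
	latest=$( (cat v1-v2.ed v2-v3.ed; echo '1,$p'; echo Q) | ed -s v1 ) || true
	printf '%s\n' "$latest"
fi
```

       Only v1 and the two small scripts need to be stored; v2 and v3 can be deleted and rebuilt on demand.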

       Except in rare circumstances, diff finds a smallest sufficient set of file differences.

       Option  -h  does  a  fast,  half-hearted job.  It works only when changed stretches are short and well separated, but does work on files of
       unlimited length.  Options -e and -f are unavailable with -h.

FILES

       /tmp/d?????
       /usr/lib/diffh for -h

SEE ALSO

       cmp(1), comm(1), ed(1)

DIAGNOSTICS

       Exit status is 0 for no differences, 1 for some, 2 for trouble.
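
       A sketch of using the exit status from a script.  The file names and the helper function diff_status are hypothetical, chosen only for
       this example:

```shell
cd "$(mktemp -d)" || exit 1
printf 'same\n' > a
cp a b
printf 'other\n' > c

# Run diff quietly and print only its exit status.
diff_status() {
	rc=0
	diff "$@" > /dev/null 2>&1 || rc=$?
	echo "$rc"
}

identical=$(diff_status a b)		# no differences
different=$(diff_status a c)		# some differences
trouble=$(diff_status a no-such-file)	# second operand does not exist

echo "identical=$identical different=$different trouble=$trouble"
# Prints: identical=0 different=1 trouble=2
```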

BUGS

       Editing scripts produced under the -e or -f option are naive about creating lines consisting of a single `.'.

																	   DIFF(1)
