Want to extract certain lines from big file


 
# 15  
01-24-2016
Quote:
Originally Posted by mad man
Hi RudiC

I am getting the error "cannot be parsed" for this sed command.
Please find below how I used it.
transnum="ABC160120XYZ0983921"

Code:
sed -n '/"$transnum"/ {H;g}; /"$transnum"/,/EOT/p;h' $file > $file_new

Please suggest.
Shell variables are not expanded within single quotes. You also didn't copy the script RudiC provided correctly. And this code depends on a feature that is not supported by some (standards-conforming) versions of sed. (I'm not sure whether it will work on AIX without the semicolon I added, but it won't work on OS X without it.) Try:
Code:
sed -n "/~$transnum~/ {H;g;}; /~$transnum~/,/EOT/p;h" "$file" > "$file_new"
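The quoting point is easy to check directly. The sketch below runs the same regex three ways against one line: inside single quotes (as in the failing command), inside double quotes (as in the fix), and with the single quotes closed just around the variable. The sample line is a hypothetical, shortened %%YEDTRN line, and the sketch assumes the transnum contains only letters and digits; a value containing regex metacharacters such as `.` or `/` would need escaping first.

```shell
transnum="ABC160120XYZ0983921"
line='%%YEDTRN~0000004647~ABC160120XYZ0983921~20160120_085132'

# Single quotes: sed receives the literal characters "$transnum" (quotes
# included), so the regex never matches this line and nothing is printed.
single=$(printf '%s\n' "$line" | sed -n '/"$transnum"/p')

# Double quotes: the shell substitutes the value before sed sees the script.
double=$(printf '%s\n' "$line" | sed -n "/~$transnum~/p")

# Alternative: stay in single quotes, but close them around the variable.
mixed=$(printf '%s\n' "$line" | sed -n '/~'"$transnum"'~/p')

printf 'single: [%s]\ndouble: [%s]\nmixed:  [%s]\n' "$single" "$double" "$mixed"
```

Only the single-quoted version comes back empty; the other two print the line.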

# 16  
01-24-2016
Hi Don,

Sorry for the inconvenience.

The code you posted last is not working for me; here is how I used it.

Code:
 
big_file='/tmp/remedixz.20160120_085021_41222370_1'
trannum="/tmp/transnum"
file_new="${big_file}_23962395676"
awk -F '~' '
FNR == NR {
	t[$1]
	tc = FNR
	next
}
{
	l[++lc] = $0
}
$1 == "%%YEDTRN" && $2 in t {
	remove t[transnum = $2]
	tc--
}
$1 == "0000EOT" {

	if(transnum) {
		for(i = 1; i <= lc; i++)
			print l[i] > ("$file_new:" transnum)
		close("$file_new:" transnum)
		printf("Transaction #%s extracted to file $file_new:%s\n", transnum,
		    transnum)
	}
	if(tc) {
		lc = 0
		transnum = ""
	} else {
		exit
	}
}' $trannum $big_file
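For reference, here is a hedged sketch of how the awk script above might be adapted to the layout posted later in this thread (#18). It assumes the list file holds one transnum per line, that the transnum is the third `~`-separated field of the %%YEDTRN line, and that each record ends with a line starting with 0000EOT. It also uses `delete` (awk has no `remove` statement) and passes the output prefix in with `-v`, because `$file_new` inside the single-quoted awk program is never expanded by the shell. The two-record input is hypothetical and heavily shortened, and the awk variable `out` is an illustrative name.

```shell
# Hypothetical, heavily shortened two-record file in the layout from #18.
dir=$(mktemp -d)
big_file="$dir/big_file"
trannum="$dir/transnum"
file_new="$dir/extract"

cat > "$big_file" <<'EOF'
##PAYMNT,ABCDEFGH,first
%%YEDTRN~0000004646~ABC160120XYZ0983920~20160120_085131
0000ST<>820<>1104<>PmtGrpWire0b<>
0000EOT<><>000000000000019<><><>
##PAYMNT,ABCDEFGH,second
%%YEDTRN~0000004647~ABC160120XYZ0983921~20160120_085132
0000ST<>820<>1105<>PmtGrpWire0c<>
0000EOT<><>000000000000020<><><>
EOF

# List of wanted transnums, one per line.
printf '%s\n' ABC160120XYZ0983921 > "$trannum"

# -v hands the shell value to awk; "$file_new" inside the quoted program
# would be taken literally.
awk -F '~' -v out="$file_new" '
FNR == NR {                          # first file: the wanted transnums
	if ($1 != "" && !($1 in t)) { t[$1]; tc++ }
	next
}
{ l[++lc] = $0 }                     # buffer every line of the current record
$1 == "%%YEDTRN" && ($3 in t) {      # transnum is field 3 in the real data
	transnum = $3
	delete t[transnum]               # awk has delete, not remove
	tc--
}
/^0000EOT/ {                         # the EOT line has no ~, so test a prefix
	if (transnum != "") {
		f = out ":" transnum
		for (i = 1; i <= lc; i++)
			print l[i] > f
		close(f)
		printf("Transaction #%s extracted to file %s\n", transnum, f)
	}
	if (tc == 0)
		exit                         # every wanted transnum has been written
	lc = 0
	transnum = ""
}' "$trannum" "$big_file"
```

Each extracted record lands in `<prefix>:<transnum>`, and the script stops as soon as every listed transnum has been written, which avoids reading the rest of a large file.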

---------- Post updated at 03:25 PM ---------- Previous update was at 03:13 PM ----------

Hi Don,

The sed you modified and posted did not throw any error this time, but the output file is empty.

Thanks
# 17  
01-24-2016
The sed script as provided was tested and worked for me: as is on Linux, and with Don Cragun's added semicolon on FreeBSD.
So, how about
- supplying meaningful samples of input data (as requested several times in this thread)?
- trying to solve the issues yourself by experimenting with (modifying and testing) the solution offered?
# 18  
01-24-2016
Hi RudiC

I will post the actual input here in 5 minutes.

thanks

---------- Post updated at 03:50 PM ---------- Previous update was at 03:42 PM ----------

Hi RudiC,

Please find the actual input below. It is a single transaction; this block is repeated once per transaction, so a file with 1000 transactions contains 1000 such blocks. The value between the tildes (~ABC160120XYZ0983920~ here) is unique to each transaction.
Tags such as ##PAYMNT, %%YEDTRN and 0000EOT are the same in every transaction.
Code:
##PAYMNT,ABCDEFGH,        ,        ,TEST01                  ,0000004308,0000004216,1104      ,000000, ,00110,USD,   ,T,TESTST008                    ,2016-01-18T09:30:47                ,pain.001.001.03                    ,00000000000000001200,00000000000000018.00
%%YEDTRN~0000004646~ABC160120XYZ0983920~20160120_085131~20160120_085021~20160120_085021_41222370                              ~20160120_084728_15401168                     ~20160120_084728~0000004644~          ~TEST01                       ~pain.001.001.03                    ~U ~0.02               ~C~FWT       ~          ~SFTS                           ~99999999801                        ~WireCmpIDa                         ~021000018                          ~99999998799002                     ~20020101~Payee Name 1104                    ~TstTrceNbr1104                     ~PR ~USD~   ~   ~Y~US~PmtGrpWire0b                       ~OOXXMXM                            ~TestWirePay002                     ~Y~01~HARDCOPY  ~pain.001.001.03                    ~TESTST008                    ~2016-01-18T09:30:47                ~00000000000000000000000000000000001200~0000000000000018.00~N~                                                                                                    ~00~               ~ ~                                   ~                                   ~   ~               ~                                   ~ ~          ~                                                                                                    ~00~               ~                                                                                                    ~00~ ~               ~               ~                                   ~                                   ~   ~               ~ 
0000ISA00          00          ZZABCDETEST01   ZZABCDEFGHI      1601200849U005010000043080T 
0000GSRA   201601200849    000004216X 005010      
0000ST<>820<>1104<>PmtGrpWire0b<>
0010BPR<>U<>0000000000000000.02C<>FWT<><>01<>043000261<>DA<>99999999801<>WireCmpIDa<><>01<>021000018<>DA<>99999998799002<>20020101<><><><><>
0010TRN<>1<>TstTrceNbr1104<><>OOXXMXM<>
0010CUR<>PR<>USD<>00000000000<><><><>00000000        <>00000000        <>00000000        <>00000000        <>00000000        
0010REF<>TN<>TestPay002<><><><><><><><>
0020N1<>O2<>Test Initiating Party<><><><><>
0020N1<>O1<>Test Debtor Bank<>13<>043000261<><><>
0020N4<><><><>US<><><><>
0020N1<>PR<>Debtor Pyr Nm 0a<><><><><>
0020N3<>Payer Address 00a Line1<><>
0020N3<>Payer Address 00a Line2<><>
0020N4<>Payer City<>PA<>12345<>US<><><><>
0020N1<>BK<>Test Payee Bank 002<><><><><>
0020N1<>PE<>Payee Name 1104<><><><><>
0020N3<>Payee Address 002 Line1<><>
0020N3<>Payee Address 002 Line2<><>
0020N4<>Payee Town 002<>PA<>12345<>US<><><><>
0000SE<>00000000001104<>
0000GE<>000001000004216
0000IEA<>00001000004216
0000EOT<><>000000000000019000000000000000000000000000019<><><>

Thanks.
# 19  
01-24-2016
That sed works with your sample data as specified: it extracts exactly that record from a set of different records.
The data structure is NOT what you posted earlier. For example, the transnum is not the second ~-delimited field but the third, EOT is not the last element of the final line, ... That's why the other solutions offered may have failed.
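Both observations can be checked with awk on lines copied from the #18 sample (the %%YEDTRN line is truncated here to the fields that matter). With `-F '~'`, field 2 of the %%YEDTRN line is a sequence number and field 3 is the transnum, while the EOT line, which contains no `~` at all, stays a single field, so a test like `$1 == "0000EOT"` never succeeds on it.

```shell
# Two lines copied from the #18 sample (the %%YEDTRN line is truncated).
yedtrn='%%YEDTRN~0000004646~ABC160120XYZ0983920~20160120_085131'
eot='0000EOT<><>000000000000019000000000000000000000000000019<><><>'

# Field 2 is 0000004646; the transnum is field 3.
f2=$(printf '%s\n' "$yedtrn" | awk -F '~' '{ print $2 }')
f3=$(printf '%s\n' "$yedtrn" | awk -F '~' '{ print $3 }')

# The EOT line contains no "~", so the whole line is $1 and an exact
# comparison with "0000EOT" never succeeds; a prefix match does.
eq=$(printf '%s\n' "$eot" | awk -F '~' '$1 == "0000EOT" { print "match" }')
pre=$(printf '%s\n' "$eot" | awk -F '~' '/^0000EOT/ { print "match" }')

printf 'field 2: %s\nfield 3: %s\nexact: [%s]\nprefix: [%s]\n' "$f2" "$f3" "$eq" "$pre"
```

Run against a whole file, `awk -F '~' '$1 == "%%YEDTRN" { print $3 }' file` would list one transnum per record, which is a quick way to confirm the layout before extracting anything.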
# 20  
01-24-2016
Hi RudiC,

Can you please suggest a sed solution for the above input?
Meanwhile, I am also experimenting with the solutions offered.

Thanks.
# 21  
01-24-2016
I don't think I can do better than what I have already offered. With an input file of three similar records with different transnums, it extracts the desired one:
Code:
transnum=ABC160120XYZ0983920
sed -n "/~${transnum}~/{H;g;}; /~${transnum}~/,/EOT/p; h" file
##PAYMNT,ABCDEFGH,        ,        ,TEST01                  ,0000004308,0000004216,1104      ,000000, ,00110,USD,   ,T,TESTST008                    ,2016-01-18T09:30:47                ,pain.001.001.03                    ,00000000000000001200,00000000000000018.00
%%YEDTRN~0000004646~ABC160120XYZ0983920~20160120_085131~20160120_085021~20160120_085021_41222370                              ~20160120_084728_15401168                     ~20160120_084728~0000004644~          ~TEST01                       ~pain.001.001.03                    ~U ~0.02               ~C~FWT       ~          ~SFTS                           ~99999999801                        ~WireCmpIDa                         ~021000018                          ~99999998799002                     ~20020101~Payee Name 1104                    ~TstTrceNbr1104                     ~PR ~USD~   ~   ~Y~US~PmtGrpWire0b                       ~OOXXMXM                            ~TestWirePay002                     ~Y~01~HARDCOPY  ~pain.001.001.03                    ~TESTST008                    ~2016-01-18T09:30:47                ~00000000000000000000000000000000001200~0000000000000018.00~N~                                                                                                    ~00~               ~ ~                                   ~                                   ~   ~               ~                                   ~ ~          ~                                                                                                    ~00~               ~                                                                                                    ~00~ ~               ~               ~                                   ~                                   ~   ~               ~ 
0000ISA00          00          ZZABCDETEST01   ZZABCDEFGHI      1601200849U005010000043080T 
0000GSRA   201601200849    000004216X 005010      
0000ST<>820<>1104<>PmtGrpWire0b<>
0010BPR<>U<>0000000000000000.02C<>FWT<><>01<>043000261<>DA<>99999999801<>WireCmpIDa<><>01<>021000018<>DA<>99999998799002<>20020101<><><><><>
0010TRN<>1<>TstTrceNbr1104<><>OOXXMXM<>
0010CUR<>PR<>USD<>00000000000<><><><>00000000        <>00000000        <>00000000        <>00000000        <>00000000        
0010REF<>TN<>TestPay002<><><><><><><><>
0020N1<>O2<>Test Initiating Party<><><><><>
0020N1<>O1<>Test Debtor Bank<>13<>043000261<><><>
0020N4<><><><>US<><><><>
0020N1<>PR<>Debtor Pyr Nm 0a<><><><><>
0020N3<>Payer Address 00a Line1<><>
0020N3<>Payer Address 00a Line2<><>
0020N4<>Payer City<>PA<>12345<>US<><><><>
0020N1<>BK<>Test Payee Bank 002<><><><><>
0020N1<>PE<>Payee Name 1104<><><><><>
0020N3<>Payee Address 002 Line1<><>
0020N3<>Payee Address 002 Line2<><>
0020N4<>Payee Town 002<>PA<>12345<>US<><><><>
0000SE<>00000000001104<>
0000GE<>000001000004216
0000IEA<>00001000004216
0000EOT<><>000000000000019000000000000000000000000000019<><><>

What else can I do?
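To see why this sed also picks up the ##PAYMNT line that precedes the match, here is the same command annotated and run on a hypothetical, shortened three-record file in the #18 layout. The second command is an optional variant (an assumption, not something posted in the thread) that quits at the EOT line of the matching record, so sed does not keep reading the rest of a big file; it keeps the `;` before `}` that Don noted OS X sed needs.

```shell
# Hypothetical three-record sample in the posted layout, heavily shortened.
dir=$(mktemp -d)
file="$dir/file"
cat > "$file" <<'EOF'
##PAYMNT,ABCDEFGH,rec1
%%YEDTRN~0000004645~ABC160120XYZ0983919~20160120_085130
0000ST<>820<>1103<>
0000EOT<><>000000000000018<><><>
##PAYMNT,ABCDEFGH,rec2
%%YEDTRN~0000004646~ABC160120XYZ0983920~20160120_085131
0000ST<>820<>1104<>
0000EOT<><>000000000000019<><><>
##PAYMNT,ABCDEFGH,rec3
%%YEDTRN~0000004647~ABC160120XYZ0983921~20160120_085132
0000ST<>820<>1105<>
0000EOT<><>000000000000020<><><>
EOF
transnum=ABC160120XYZ0983920

# The script from this thread, step by step:
#   /~T~/{H;g;}   on the %%YEDTRN line, append it to the hold space (which
#                 still holds the previous ##PAYMNT line) and copy both
#                 lines back into the pattern space
#   /~T~/,/EOT/p  print from that line through the next EOT line
#   h             copy the pattern space to the hold space, so the next
#                 cycle can see it as the previous line
sed -n "/~${transnum}~/{H;g;}; /~${transnum}~/,/EOT/p; h" "$file" > "$dir/out_full"

# Variant: quit at the EOT that closes the matching record.
sed -n "/~${transnum}~/{H;g;}; /~${transnum}~/,/EOT/{p;/EOT/q;}; h" "$file" > "$dir/out_quit"

cat "$dir/out_full"
```

Both commands print the four lines of rec2, from its ##PAYMNT line through its 0000EOT line.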