Comparing two huge files Post: 302231724


Top Forums Shell Programming and Scripting Comparing two huge files Post 302231724 by kmkbuddy_1983 on Wednesday 3rd of September 2008 03:36:51 AM

09-03-2008


Comparing two huge files

Hi,

I have two files, File A and File B. File A is an error file and File B is the source file. In the error file, the first line of each pair is the actual error and the second line gives the record (client ID) that threw the error. I need to compare the first field of each line in File A that doesn't start with '//' against the fifth field of File B. If the field values in File A and File B match, I need to write the record to an output file as shown below.

File A
// 223 missing
223,Jan,ee,bla,bla

// data not found
254-11,Jan,ee,bla,bla

// data rejected
214-1,Jan,ee,bla,bla

File B
aaaa,bbbb,ccc,dddd,20054-11,fff,ggg...
aaaa,bbbb,ccc,dddd,254-11,fff,ggg...
aaaa,bbbb,ccc,dddd,2545456-1,fff,ggg...

output:
// data not found
254-11,Jan,ee,bla,bla

If the first field of File A and the fifth field of File B match (254-11 here), then I need to write the records from File A (the current line and the previous line) to an output file as above.

I can achieve this easily using awk and grep inside a loop. The problem is that the files are huge: both have nearly 1 million records, and the script runs for 3-4 hours. I would appreciate it if someone could suggest better logic or a better script that completes the task in a few minutes.

Note: File A and File B are in exactly the same format as shown. Watch out for the blanks in File A and for the client ID format, which can be 000, 000-0, or 000-00.
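
One possible approach, sketched here under some assumptions, is to read File B once into an awk array keyed on its fifth field, then make a single pass over File A. That replaces one grep per record with one in-memory lookup per record. The function name `match_errors` is hypothetical, and the sketch assumes that the "blanks" are empty lines or stray spaces around the ID, that every record line directly follows its `//` line, and that all the IDs from File B fit in memory.

```shell
# Hypothetical helper: match_errors FILE_A FILE_B
# Prints each "//" error line plus its record from FILE_A when the record's
# first field equals the fifth field of any line in FILE_B.
match_errors() {
    awk -F',' '
        # Strip leading/trailing spaces and tabs (assumed meaning of "blanks").
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

        # Pass 1 (FILE_B is read first): remember every fifth field.
        NR == FNR { if (NF >= 5) ids[trim($5)] = 1; next }

        # Pass 2 (FILE_A): skip the empty lines between error pairs.
        /^[ \t]*$/ { next }

        # Hold the most recent "//" error line until its record arrives.
        /^\/\// { err = $0; next }

        # Record line: print the pair if the client ID was seen in FILE_B.
        (trim($1) in ids) { print err; print $0; print "" }
    ' "$2" "$1"
}
```

It could be called as `match_errors fileA fileB > output`. Because the lookup compares whole field values, 254-11 does not match 20054-11, which a plain substring grep would report. If FILE_B is empty, the `NR == FNR` test cannot tell the two files apart, so that case would need a separate check.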

kmkbuddy_1983



