Full Discussion: File Comparison
Post 302155743 by drl on Saturday 5th of January 2008 01:47:14 PM
Hi.
Quote:
Originally Posted by net_shree
I have to compare two text files, very few of the lines in these files will have some difference in some column.
The files size is in GB.
By chance I am working with a text file of about this size. It is just over 1 GB and has 15 M (15,000,000) lines. Counting the lines with wc takes 15-20 seconds of real time (AMD-64/3000, SATA disk).

If this is correct, and you have 2 such files, then I think any method that reads a line from file1 and runs a program to look through file2 at each step will not end quickly, because there will be 15 M loads of that program involved, not to mention actually reading the file each time. For example, running grep on /dev/null 15,000 times takes about 10 seconds (10.2 actually) of real time. For 1,000 times that, I'd be looking at about 2.8 hours just to load grep from the disk and read an immediate EOF. A single grep of the file for a non-existent string takes about 18 seconds, so 15 M real searches at that rate would take years.
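
A rough sketch of how the launch-cost estimate above could be reproduced. The function names `launch_greps` and `extrapolate_hours` are made up for illustration, and the sample size of 200 is only there to keep the demonstration quick; the 10.2-second figure is the one quoted in the post, not a fresh measurement.

```shell
#!/bin/sh
# Launch grep n times; each run reads an immediate EOF from /dev/null,
# so nearly all of the time goes into starting the program.
launch_greps() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        grep -q x /dev/null || :   # grep exits 1 on "no match"; that is expected
        i=$((i + 1))
    done
    echo "$i"
}

# Scale a measured time (seconds for 'sample' launches) up to 'total'
# launches and print the result in hours.
extrapolate_hours() {
    LC_ALL=C awk -v secs="$1" -v sample="$2" -v total="$3" \
        'BEGIN { printf "%.2f\n", secs * (total / sample) / 3600 }'
}

launched=$(launch_greps 200)
hours=$(extrapolate_hours 10.2 15000 15000000)
echo "$launched launches done; 15 M launches at 10.2 s per 15,000 is about $hours hours"
```

To get a timing for your own machine, run the loop with a larger count under `time` and feed the measured seconds to `extrapolate_hours` in place of 10.2.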

I suggest that the files be sorted and diff be run once on the two files (post #8, rikxik). That will be about 2 passes across each file (one to sort it, one for diff), a decrease of close to 100% from 15 M passes over 1 file.
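
A minimal sketch of the sort-then-diff approach, shown on two tiny sample files. The function name `compare_sorted` and the sample data are invented for illustration. It assumes `mktemp -d` is available and that there is enough temporary disk space for a sorted copy of each file.

```shell
#!/bin/sh
# Sort each file once, then compare the sorted copies with a single diff run.
compare_sorted() {
    a=$1
    b=$2
    tmpdir=$(mktemp -d) || return 2
    # LC_ALL=C gives plain byte-order sorting, identical for both files.
    LC_ALL=C sort "$a" > "$tmpdir/a.sorted"
    LC_ALL=C sort "$b" > "$tmpdir/b.sorted"
    diff "$tmpdir/a.sorted" "$tmpdir/b.sorted"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Small demonstration: the two files differ only in the "b" record.
workdir=$(mktemp -d)
printf 'b 2\na 1\nc 3\n' > "$workdir/file1"
printf 'c 3\nb 9\na 1\n' > "$workdir/file2"
# diff exits 1 when the files differ, so do not treat that as a failure.
result=$(compare_sorted "$workdir/file1" "$workdir/file2") || true
echo "$result"
rm -rf "$workdir"
```

Because the lines are sorted first, the output shows which records differ but not where they sat in the original files. If the original position matters, some other approach is needed.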

If my facts are wrong, then tell me where I missed something of importance or made a mistake. Otherwise, perhaps we should take a step back and you can tell us what the higher purpose is, that is, what problem you are really trying to solve. Perhaps we can suggest some other approach.

cheers, drl
 
