01-05-2008
Hi.
Quote (originally posted by net_shree):
I have to compare two text files, very few of the lines in these files will have some difference in some column.
The files size is in GB.
By chance I am working with a text file of this size: just over 1 GB, with 15 M (15,000,000) lines. Counting the lines with wc takes 15-20 seconds of real time (AMD-64/3000, SATA disk).
If this is correct, and you have 2 such files, then I think any method that reads a line from file1 and runs a program to look through file2 for it at each step will not end quickly, because there will be 15 M loads of that program involved, not to mention actually reading the file. For example, running grep on /dev/null 15,000 times takes about 10 seconds (10.2 actually) of real time. For 1,000 times that, I'd be looking at about 2.8 hours just to load grep from the disk and read an immediate EOF. On top of that, a grep for a non-existent string through the file takes about 18 seconds for a single search.
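To put rough numbers on this, here is a back-of-envelope sketch in shell arithmetic using only the timings quoted above. The variable names are my own, and the second figure assumes the worst case where every one of the 15 M searches scans the whole file for 18 seconds, so treat it as an order-of-magnitude estimate rather than a measurement.

```shell
#!/bin/sh
# Back-of-envelope estimate built from the timings quoted in this post.
launches_timed=15000      # grep invocations in the /dev/null timing test
seconds_timed=10          # real time for those invocations (rounded from 10.2)
lines=15000000            # lines in the 1 GB file
seconds_per_scan=18       # one grep for a non-existent string

# Startup overhead alone: scale the measured time up to one launch per line.
overhead_seconds=$(( lines / launches_timed * seconds_timed ))
echo "startup overhead: ${overhead_seconds} s"

# Assumed worst case: every search reads the whole file before giving up.
scan_seconds=$(( lines * seconds_per_scan ))
scan_days=$(( scan_seconds / 86400 ))
echo "scanning cost: ${scan_seconds} s, about ${scan_days} days"
```

The startup overhead comes to 10,000 seconds, which is the figure of a little under 3 hours mentioned above. Under the worst-case assumption, the scanning itself dwarfs that.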
I suggest that the files be sorted and diff be run once on the two files (post #8, rikxik). That will be 2 passes across each file, a decrease of close to 100% from 15 M passes over 1 file.
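A minimal sketch of the sort-then-diff idea follows. The file names and the three-line sample data are hypothetical stand-ins for the real GB-sized files, and the sketch assumes that the original line order does not matter, since sorting discards it.

```shell
#!/bin/sh
# Sketch: sort both files once, then compare them in a single diff run.
# file1.txt and file2.txt are hypothetical stand-ins for the real files.
tmpdir=$(mktemp -d)
printf '%s\n' 'c 3' 'a 1' 'b 2' > "$tmpdir/file1.txt"
printf '%s\n' 'b 2' 'c 9' 'a 1' > "$tmpdir/file2.txt"

# LC_ALL=C gives a byte-wise ordering that is identical for both files.
LC_ALL=C sort "$tmpdir/file1.txt" > "$tmpdir/file1.sorted"
LC_ALL=C sort "$tmpdir/file2.txt" > "$tmpdir/file2.sorted"

# diff exits with status 1 when the files differ, so that status is ignored here.
changes=$(diff "$tmpdir/file1.sorted" "$tmpdir/file2.sorted" || true)
echo "$changes"

rm -rf "$tmpdir"
```

Each file is read once by sort and once by diff, which matches the 2 passes per file mentioned above. The diff output reports which lines differ between the sorted copies, not where those lines sat in the unsorted originals.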
If my facts are wrong, then tell me where I missed something of importance or made a mistake. Otherwise, perhaps we should take a step back and you tell us what the higher purpose of the problem is -- what problem you are really trying to solve. Perhaps we can suggest some other approach.
cheers, drl