Post 303025148 by kartikirans on Thursday 25th of October 2018 09:54:28 AM, in UNIX for Advanced & Expert Users
File comparisons for huge data files (around 60G): need the best optimized way to do this

I have two large .dat files of around 70 GB, with 12 columns, and the data is not sorted in either file. I need your input on the best optimized method or command to compare them and redirect the non-matching lines to a third file (diff.dat).


File 1 - 15 columns
File 2 - 15 columns

Data is not in sorted order.
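
One common way to approach this is to sort both files with the external merge sort built into sort(1), then let comm(1) pick out the lines that occur in only one of them. The following is a minimal sketch, not a tuned solution. It assumes whole lines are compared byte for byte, that line order does not matter, and that there is enough scratch space for a sorted copy of each input (sort also spills its own temporary files to $TMPDIR or /tmp). The function name unmatched_lines is made up for illustration.

```shell
#!/bin/sh
# Sketch: write lines that appear in only one of two unsorted files to a third file.
# Assumptions (not from the original post): whole-line, byte-for-byte comparison;
# line order is irrelevant; scratch space exists for a sorted copy of each input.

unmatched_lines() {
    file1=$1
    file2=$2
    out=$3
    work=$(mktemp -d) || return 1

    # The C locale gives sort and comm the same plain byte ordering and
    # avoids the cost of locale-aware collation.
    LC_ALL=C sort -o "$work/a.sorted" "$file1" || return 1
    LC_ALL=C sort -o "$work/b.sorted" "$file2" || return 1

    {
        LC_ALL=C comm -23 "$work/a.sorted" "$work/b.sorted"   # lines only in file1
        LC_ALL=C comm -13 "$work/a.sorted" "$work/b.sorted"   # lines only in file2
    } > "$out"

    rm -rf "$work"
}
```

It would be called as: unmatched_lines file1.dat file2.dat diff.dat. The output does not say which file a line came from; if that matters, the two comm calls can be redirected to separate files instead.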
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

search and grab data from a huge file

Folks, in my working directory there are multiple large files, each containing only one line. The line is too long to use "grep", so any help? For example, if I want to find out whether these files contain a string like "93849", what command should I use? Also, there is oder_id number... (1 Reply)
Discussion started by: ting123
1 Replies

2. Shell Programming and Scripting

How to extract data from a huge file?

Hi, I have a huge file of bibliographic records in some standard format. I need a script to do a repeatable task as follows: 1. Create folders for the strings that start with "item_*" in the input file 2. Create a file "contents" in each folder having "license.txt(tab... (5 Replies)
Discussion started by: srsahu75
5 Replies

3. Shell Programming and Scripting

insert a header in a huge data file without using an intermediate file

I have a file with extracted data and need to insert a header with a constant string, say: H|PayerDataExtract. If I use sed, I have to redirect the output to a separate file, like sed ' sed commands' ExtractDataFile.dat > ExtractDataFileWithHeader.dat. The same is true for awk and... (10 Replies)
Discussion started by: deepaktanna
10 Replies

4. Shell Programming and Scripting

Split a huge data into few different files?!

Input file data contents: >seq_1 MSNQSPPQSQRPGHSHSHSHSHAGLASSTSSHSNPSANASYNLNGPRTGGDQRYRASVDA >seq_2 AGAAGRGWGRDVTAAASPNPRNGGGRPASDLLSVGNAGGQASFASPETIDRWFEDLQHYE >seq_3 ATLEEMAAASLDANFKEELSAIEQWFRVLSEAERTAALYSLLQSSTQVQMRFFVTVLQQM ARADPITALLSPANPGQASMEAQMDAKLAAMGLKSPASPAVRQYARQSLSGDTYLSPHSA... (7 Replies)
Discussion started by: patrick87
7 Replies

5. Shell Programming and Scripting

Splitting the Huge file into several files...

Hi, I have to write a script to split a huge file into several pieces. The file's columns are pipe (|) delimited. A data sample: 6625060|1420215|07308806|N|20100120|5572477081|+0002.79|+0000.00|0004|0001|......... (3 Replies)
Discussion started by: lakteja
3 Replies

6. Shell Programming and Scripting

Problem running Perl Script with huge data files

Hello Everyone, I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this : foreach my $t (@text) { open TEXT, $t or die "Cannot open $t for reading: $!\n"; while(my $line=<TEXT>){ ... (4 Replies)
Discussion started by: ad23
4 Replies

7. Shell Programming and Scripting

Three Difference File Huge Data Comparison Problem.

I got three different file: Part of File 1 ARTPHDFGAA . . Part of File 2 ARTGHHYESA . . Part of File 3 ARTPOLYWEA . . (4 Replies)
Discussion started by: patrick87
4 Replies

8. Shell Programming and Scripting

Help- counting delimiter in a huge file and split data into 2 files

I’m new to Linux scripting and not sure how to filter out bad records from huge flat files (over 1.3GB each). The delimiter is a semicolon “;”. Here is a sample of 5 lines in the file: Name1;phone1;address1;city1;state1;zipcode1 Name2;phone2;address2;city2;state2;zipcode2;comment... (7 Replies)
Discussion started by: lv99
7 Replies

9. UNIX for Dummies Questions & Answers

File comparison of huge files

Hi all, I hope you are well. I am very happy to see your contribution. I am eager to become part of it. I have the following question. I have two huge files to compare (almost 3GB each). The files are simulation outputs. The format of the files is as below. For a clear picture, please see... (9 Replies)
Discussion started by: kaaliakahn
9 Replies

10. UNIX for Advanced & Expert Users

Need optimized shell/awk script to aggregate (sum) all the columns of a huge data file

Optimization shell/awk script to aggregate (sum) for all the columns of Huge data file File delimiter "|" Need to have Sum of all columns, with column number : aggregation (summation) for each column File not having the header Like below - Column 1 "Total Column 2 : "Total ... ...... (2 Replies)
Discussion started by: kartikirans
2 Replies
bup-midx(1)						      General Commands Manual						       bup-midx(1)

NAME
       bup-midx - create a multi-index (.midx) file from several .idx files

SYNOPSIS
       bup midx [-o outfile] <-a|-f|idxnames...>

DESCRIPTION
       bup midx creates a multi-index (.midx) file from one or more git pack
       index (.idx) files.

       Note: you should no longer need to run this command by hand. It gets
       run automatically by bup-save(1) and similar commands.

OPTIONS
       -o, --output=filename.midx
              use the given output filename for the .midx file. Default is
              auto-generated.

       -a, --auto
              automatically generate new .midx files for any .idx files where
              it would be appropriate.

       -f, --force
              force generation of a single new .midx file containing all your
              .idx files. This will result in the fastest backup performance,
              but may take a long time to run.

       --dir=packdir
              specify the directory containing the .idx/.midx files to work
              with. The default is $BUP_DIR/objects/pack and
              $BUP_DIR/indexcache/*.

       --max-files
              maximum number of .idx files to open at a time. You can use
              this if you have an especially small number of file descriptors
              available, so that midx can complete (though possibly
              non-optimally) even if it can't open all your .idx files at
              once. The default value of this option should be fine for most
              people.

       --check
              validate a .midx file by ensuring that all objects in its
              contained .idx files exist inside the .midx. May be useful for
              debugging.

EXAMPLE
       $ bup midx -a
       Merging 21 indexes (2278559 objects).
       Table size: 524288 (17 bits)
       Reading indexes: 100.00% (2278559/2278559), done.
       midx-b66d7c9afc4396187218f2936a87b865cf342672.midx

DISCUSSION
       By default, bup uses git-formatted pack files, which consist of a pack
       file (containing objects) and an idx file (containing a sorted list of
       object names and their offsets in the .pack file).

       Normal idx files are convenient because it means you can use git(1) to
       access your backup datasets. However, idx files can get slow when you
       have a lot of very large packs (which git typically doesn't have, but
       bup often does).

       bup .midx files consist of a single sorted list of all the objects
       contained in all the .pack files it references. This list can be
       binary searched in about log2(m) steps, where m is the total number of
       objects.

       To further speed up the search, midx files also have a variable-sized
       fanout table that reduces the first n steps of the binary search. With
       the help of this fanout table, bup can narrow down which page of the
       midx file a given object id would be in (if it exists) with a single
       lookup. Thus, typical searches will only need to swap in two pages:
       one for the fanout table, and one for the object id.

       midx files are most useful when creating new backups, since searching
       for a nonexistent object in the repository necessarily requires
       searching through all the index files to ensure that it does not
       exist. (Searching for objects that do exist can be optimized; for
       example, consecutive objects are often stored in the same pack, so we
       can search that one first using an MRU algorithm.)

SEE ALSO
       bup-save(1), bup-margin(1), bup-memtest(1)

BUP
       Part of the bup(1) suite.

AUTHORS
       Avery Pennarun <apenwarr@gmail.com>.

Bup unknown-                                                        bup-midx(1)
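
The search cost described under DISCUSSION can be put into rough numbers using the figures from the EXAMPLE output. This is a minimal sketch, assuming that "17 bits" means the fanout table is indexed by the first 17 bits of an object id, which the man page does not spell out. The function name midx_search_estimate is made up for illustration and is not part of bup.

```shell
#!/bin/sh
# Hypothetical helper, not part of bup: rough search-cost estimate for a .midx file.
# $1 = total number of objects (m)
# $2 = fanout bits (assumed to be the number of leading id bits that index the table)
midx_search_estimate() {
    awk -v m="$1" -v bits="$2" 'BEGIN {
        # A plain binary search over m sorted entries takes about log2(m) steps.
        printf "binary search steps without fanout: %.1f\n", log(m) / log(2)
        # With 2^bits fanout slots, each slot covers m / 2^bits entries on average.
        printf "average entries per fanout slot: %.1f\n", m / (2 ^ bits)
    }'
}

midx_search_estimate 2278559 17
```

Under that assumption, a single fanout lookup leaves only a small run of entries to search, which is consistent with the man page's remark that a typical search touches just two pages.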
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.