File comparisons for huge data files (around 60 GB): need the best optimized way to do this


 
# 1  
Old 10-25-2018

I have 2 large files (.dat), each around 70 GB with 12 columns, and the data is not sorted in either file. I need your input on the best optimized method/command to compare them and redirect the non-matching lines to a third file (diff.dat).


File 1 - 15 columns
File 2 - 15 columns

Data is not in sorted order.
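
A minimal sketch of one common approach to this kind of comparison, not a tuned solution: sort each file with an external merge sort, then let comm report the lines that appear in only one of them. The function name compare_unsorted and the file names are placeholders, and it assumes whole-line comparison and a TMPDIR with enough free space to hold sorted copies of both files.

```shell
#!/bin/sh
# Sketch only: write lines found in exactly one of two unsorted files to a third file.
# compare_unsorted is a hypothetical helper name; file names are placeholders.
# Usage: compare_unsorted file1.dat file2.dat diff.dat
compare_unsorted() {
    f1=$1
    f2=$2
    out=$3

    # Scratch directory for the sorted copies (mktemp honours TMPDIR).
    tmp=$(mktemp -d) || return 1

    # LC_ALL=C forces byte-wise ordering so sort and comm agree on the
    # collation; it is also usually faster than a locale-aware sort.
    LC_ALL=C sort -o "$tmp/a.sorted" "$f1" &&
    LC_ALL=C sort -o "$tmp/b.sorted" "$f2" &&
    {
        # Lines only in the first file, then lines only in the second file.
        LC_ALL=C comm -23 "$tmp/a.sorted" "$tmp/b.sorted"
        LC_ALL=C comm -13 "$tmp/a.sorted" "$tmp/b.sorted"
    } > "$out"
    status=$?

    rm -rf "$tmp"
    return $status
}
```

sort spills to temporary files when the input does not fit in memory, so the whole file never has to be loaded at once. comm compares the sorted files as multisets, so a line that occurs twice in one file and once in the other is reported once.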
 