Sponsored Content
Top Forums UNIX for Advanced & Expert Users Help optimizing sort of large files Post 302924823 by gandolf989 on Wednesday 12th of November 2014 01:58:50 PM
Old 11-12-2014
Since you have 300 GB of data, you are better off loading this into a database. Oracle would work well, especially if you used the Advanced Compression for OLTP with 11gR2 or possibly 12c. You can load it in a table with each row in the file as one row in the database and each character of the in the first half of the line as an individual CHAR(1). You could then create an index with the columns that you want to sort by and you would already have it sorted. You can also store the data as an index organised table. If this is not for profit you probably don't need an Oracle license. You just need to know how to install Oracle and create a database. You can also use mySQL to get nearly the same thing.
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Large files

I am trying to understand the webserver log file for an error which has occured on my live web site. The webserver access file is very big in size so it's not possible to open this file using vi editor. I know the approximate time the error occured, so i am interested in looking for the log file... (4 Replies)
Discussion started by: sehgalniraj
4 Replies

2. Shell Programming and Scripting

Large Text Files

Hi All I have approximately 10 files that are at least 100+ MB in size. I am importing them into a DB to output them to the web. What i need to do first is clean the files up so i dont have un necessary rows in the DB. Below is what the file looks like: Ignore the <TAB> annotations as that... (4 Replies)
Discussion started by: caddyjoe77
4 Replies

3. UNIX for Dummies Questions & Answers

large files?

How do we check 'large files' is enabled on a Unix box -- HP-UX B11.11 (2 Replies)
Discussion started by: ranj@chn
2 Replies

4. UNIX for Dummies Questions & Answers

Sort large file

I was wondering how sort works. Does file size and time to sort increase geometrically? I have a 5.3 billion line file I'd like to use with sort -u I'm wondering if that'll take forever because of a geometric expansion? If it takes 100 hours that's fine but not 100 days. Thanks so much. (2 Replies)
Discussion started by: dcfargo
2 Replies

5. Shell Programming and Scripting

a problem with large files

hello all, kindly i need your help, i made a script to print a specific lines from a huge file about 3 million line. the output of the script will be about 700,000 line...the problem is the script is too slow...it kept working for 5 days and the output was only 200,000 lines !!! the script is... (16 Replies)
Discussion started by: m_wassal
16 Replies

6. Shell Programming and Scripting

Divide large data files into smaller files

Hello everyone! I have 2 types of files in the following format: 1) *.fa >1234 ...some text... >2345 ...some text... >3456 ...some text... . . . . 2) *.info >1234 (7 Replies)
Discussion started by: ad23
7 Replies

7. UNIX for Dummies Questions & Answers

Speeding/Optimizing GREP search on CSV files

Hi all, I have problem with searching hundreds of CSV files, the problem is that search is lasting too long (over 5min). Csv files are "," delimited, and have 30 fields each line, but I always grep same 4 fields - so is there a way to grep just those 4 fields to speed-up search. Example:... (11 Replies)
Discussion started by: Whit3H0rse
11 Replies

8. Solaris

How to safely copy full filesystems with large files (10Gb files)

Hello everyone. Need some help copying a filesystem. The situation is this: I have an oracle DB mounted on /u01 and need to copy it to /u02. /u01 is 500 Gb and /u02 is 300 Gb. The size used on /u01 is 187 Gb. This is running on solaris 9 and both filesystems are UFS. I have tried to do it using:... (14 Replies)
Discussion started by: dragonov7
14 Replies

9. UNIX for Advanced & Expert Users

Script to sort the files and append the extension .sort to the sorted version of the file

Hello all - I am to this forum and fairly new in learning unix and finding some difficulty in preparing a small shell script. I am trying to make script to sort all the files given by user as input (either the exact full name of the file or say the files matching the criteria like all files... (3 Replies)
Discussion started by: pankaj80
3 Replies

10. Shell Programming and Scripting

Script to sort large file with frequency

Hello, I have a very large file of around 2 million records which has the following structure: I have used the standard awk program to sort: # wordfreq.awk --- print list of word frequencies { # remove punctuation #gsub(/_]/, "", $0) for (i = 1; i <= NF; i++) freq++ } END { for (word... (3 Replies)
Discussion started by: gimley
3 Replies
OCI_SET_PREFETCH(3)													       OCI_SET_PREFETCH(3)

oci_set_prefetch - Sets number of rows to be prefetched by queries

SYNOPSIS
bool oci_set_prefetch (resource $statement, int $rows) DESCRIPTION
Sets the number of rows to be buffered by the Oracle Client libraries after a successful query call to oci_execute(3) and for each subse- quent internal fetch request to the database. For queries returning a large number of rows, performance can be significantly improved by increasing the prefetch count above the default oci8.default_prefetch value. Prefetching is Oracle's efficient way of returning more than one data row from the database in each network request. This can result in better network and CPU utilization. The buffering of rows is internal to OCI8 and the behavior of OCI8 fetching functions is unchanged regardless of the prefetch count. For example, oci_fetch_row(3) will always return one row. The prefetch buffer is per-statement and is not used by re-executed statements or by other connections. Call oci_set_prefetch(3) before calling oci_execute(3). A tuning goal is to set the prefetch value to a reasonable size for the network and database to handle. For queries returning a very large number of rows, overall system efficiency might be better if rows are retrieved from the database in several chunks (i.e set the prefetch value smaller than the number of rows). This allows the database to handle other users' statements while the PHP script is processing the current set of rows. Query prefetching was introduced in Oracle 8 i. REF CURSOR prefetching was introduced in Oracle 11 gR2 and occurs when PHP is linked with Oracle 11 gR2 (or later) Client libraries. Nested cursor prefetching was introduced in Oracle 11 gR2 and requires both the Oracle Client libraries and the database to be version 11 gR2 or greater. Prefetching is not supported when queries contain LONG or LOB columns. The prefetch value is ignored and single-row fetches will be used in all the situations when prefetching is not supported. When using Oracle Database 12 c, the prefetch value set by PHP can be overridden by Oracle's client oraaccess.xml configuration file. Refer to Oracle documentation for more detail. PARAMETERS
o $statement -A valid OCI8 statement identifier created by oci_parse(3) and executed by oci_execute(3), or a REF CURSOR statement identifier. o $rows - The number of rows to be prefetched, >= 0 RETURN VALUES
Returns TRUE on success or FALSE on failure. CHANGELOG
+------------------------+---------------------------------------------------+ | Version | | | | | | | Description | | | | +------------------------+---------------------------------------------------+ | 5.3.2 (PECL OCI8 1.4) | | | | | | | Before this release, $rows must be >= 1. | | | | |5.3.0 (PECL OCI8 1.3.4) | | | | | | | Before this release, prefetching was limited to | | | the lesser of $rows rows and 1024 * $rows bytes. | | | The byte size restriction has now been removed. | | | | +------------------------+---------------------------------------------------+ EXAMPLES
Example #1 Changing the default prefetch value for a query <?php $conn = oci_connect('hr', 'welcome', 'localhost/XE'); $stid = oci_parse($conn, 'SELECT * FROM myverybigtable'); oci_set_prefetch($stid, 300); // Set before calling oci_execute() oci_execute($stid); echo "<table border='1'> "; while ($row = oci_fetch_array($stid, OCI_ASSOC+OCI_RETURN_NULLS)) { echo "<tr> "; foreach ($row as $item) { echo " <td>".($item !== null ? htmlentities($item, ENT_QUOTES) : "&nbsp;")."</td> "; } echo "</tr> "; } echo "</table> "; oci_free_statement($stid); oci_close($conn); ?> Example #2 Changing the default prefetch for a REF CURSOR fetch <?php /* Create the PL/SQL stored procedure as: CREATE OR REPLACE PROCEDURE myproc(p1 OUT SYS_REFCURSOR) AS BEGIN OPEN p1 FOR SELECT * FROM all_objects WHERE ROWNUM < 5000; END; */ $conn = oci_connect('hr', 'welcome', 'localhost/XE'); $stid = oci_parse($conn, 'BEGIN myproc(:rc); END;'); $refcur = oci_new_cursor($conn); oci_bind_by_name($stid, ':rc', $refcur, -1, OCI_B_CURSOR); oci_execute($stid); // Change the prefetch before executing the cursor. // REF CURSOR prefetching works when PHP is linked with Oracle 11gR2 or later Client libraries oci_set_prefetch($refcur, 200); oci_execute($refcur); echo "<table border='1'> "; while ($row = oci_fetch_array($refcur, OCI_ASSOC+OCI_RETURN_NULLS)) { echo "<tr> "; foreach ($row as $item) { echo " <td>".($item !== null ? htmlentities($item, ENT_QUOTES) : "&nbsp;")."</td> "; } echo "</tr> "; } echo "</table> "; oci_free_statement($refcur); oci_free_statement($stid); oci_close($conn); ?> If PHP OCI8 fetches from a REF CURSOR and then passes the REF CURSOR back to a second PL/SQL procedure for further processing, then set the REF CURSOR prefetch count to 0 to avoid rows being "lost" from the result set. The prefetch value is the number of extra rows fetched in each OCI8 internal request to the database, so setting it to 0 means only fetch one row at a time. Example #3 Setting the prefetch value when passing a REF CURSOR back to Oracle <?php $conn = oci_connect('hr', 'welcome', 'localhost/orcl'); // get the REF CURSOR $stid = oci_parse($conn, 'BEGIN myproc(:rc_out); END;'); $refcur = oci_new_cursor($conn); oci_bind_by_name($stid, ':rc_out', $refcur, -1, OCI_B_CURSOR); oci_execute($stid); // Display two rows, but don't prefetch any extra rows otherwise // those extra rows would not be passed back to myproc_use_rc(). // A prefetch value of 0 is allowed in PHP 5.3.2 and PECL OCI8 1.4 oci_set_prefetch($refcur, 0); oci_execute($refcur); $row = oci_fetch_array($refcur); var_dump($row); $row = oci_fetch_array($refcur); var_dump($row); // pass the REF CURSOR to myproc_use_rc() to do more data processing // with the result set $stid = oci_parse($conn, 'begin myproc_use_rc(:rc_in); end;'); oci_bind_by_name($stid, ':rc_in', $refcur, -1, OCI_B_CURSOR); oci_execute($stid); ?> SEE ALSO
oci8.default_prefetch ini option . PHP Documentation Group OCI_SET_PREFETCH(3)
All times are GMT -4. The time now is 11:14 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy