Sorting blocks by a section of the identifier

 
# 8  
Old 04-13-2017
Don
Thanks! Is there any way to sort the data directly using awk instead of "handing" the task to sort?
# 9  
Old 04-13-2017
Quote:
Originally Posted by Xterra
Don
Thanks! Is there any way to sort the data directly using awk instead of "handing" the task to sort?
Not with standard awk. GNU awk (sometimes invoked as gawk) includes two built-in functions, asort() and asorti(), that can be used to sort arrays. I'm guessing that asorti() could be used to sort an array whose indices are your records' 7th colon-separated field values, but these functions are not available in the awk on my system, so I have no way of testing it. And, of course, you could write your own function in awk to sort an array.

Any of these array-sorting approaches would require you to load your entire input file (other than record #1, which is skipped) into an array in memory, sort the array, and then write out the sorted array. Your current awk script never keeps more than one input record in memory. Note that the sort utility uses temporary files when the file(s) being sorted are too large to fit in memory. You have said your input can have up to 15 million records, but not how large it is in bytes, so I don't know whether memory would be a concern if you used awk instead of sort to sort your data on your system.
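For anyone curious what the "write your own function" route looks like, here is a minimal sketch in portable awk. The sample data, the helper name sort_by_field7, and the choice of an insertion sort are all assumptions made for illustration; the real input is not shown in this thread. Note that every record after the first is held in memory, which is exactly the cost described above.

```shell
# Hypothetical sample input: a header line followed by colon-separated records.
sample='id:a:b:c:d:e:key
r1:x:x:x:x:x:0030
r2:x:x:x:x:x:0010
r3:x:x:x:x:x:0020'

# Sort all records except the first on field 7, using only standard awk.
sort_by_field7() {
    awk -F: '
    NR == 1 { print; next }          # pass record #1 through unsorted
    { line[++n] = $0; key[n] = $7 }  # keep every other record in memory
    END {
        # Insertion sort: simple and stable, but O(n^2), so far too slow
        # for millions of records. Appending "" forces string comparison.
        for (i = 2; i <= n; i++) {
            k = key[i]; l = line[i]
            for (j = i - 1; j >= 1 && (key[j] "") > (k ""); j--) {
                key[j + 1] = key[j]; line[j + 1] = line[j]
            }
            key[j + 1] = k; line[j + 1] = l
        }
        for (i = 1; i <= n; i++) print line[i]
    }'
}

sorted=$(printf '%s\n' "$sample" | sort_by_field7)
printf '%s\n' "$sorted"
```

On a file with millions of records a quadratic sort like this would not finish in reasonable time, so a real attempt would need a better algorithm (or gawk's built-ins) as well as enough memory.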
# 10  
Old 04-13-2017
Don
My files are ~8,000,000 kB. Would that limit my ability to sort the records using awk instead of sort?
# 11  
Old 04-13-2017
Hi Xterra,
Do you have enough memory on your system for awk to build an array larger than 8 terabytes?
# 12  
Old 04-13-2017
I will give it a try on my cluster
# 13  
Old 04-13-2017
Quote:
Originally Posted by Xterra
I will give it a try on my cluster
It is, ahem, highly unlikely that anybody has access to a machine with 8 TB of RAM.

If your input file is relatively static (that is, lines only get appended and are rarely or never deleted), you might try to create a smaller file containing just your sort key and a line number. This file should be considerably smaller (maybe several hundred MB or a few GB), and it might be possible to sort it in memory.
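A rough sketch of what such a key file could look like, assuming the sort key is the 7th colon-separated field mentioned earlier in the thread and that the first line is a header. The sample records and the variable names (big, index, sorted_index) are made up for illustration.

```shell
# Hypothetical stand-in for the big file (first line is a header).
big=$(mktemp)
printf '%s\n' \
    'id:a:b:c:d:e:key' \
    'r1:x:x:x:x:x:0030' \
    'r2:x:x:x:x:x:0010' \
    'r3:x:x:x:x:x:0020' > "$big"

# Write "key<TAB>line-number" for every record except the header.
index=$(mktemp)
awk -F: 'NR > 1 { print $7 "\t" NR }' "$big" > "$index"

# The index is much smaller than the data, so sorting it is cheap.
sorted_index=$(mktemp)
sort -k1,1 "$index" > "$sorted_index"
cat "$sorted_index"
```

The second column of the sorted index then gives the order in which the original lines would have to be read.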

You would then have to go through this file and use the line numbers either to rewrite your big file or to filter out the subset you are interested in. That will probably take a long time again, but if the file does not change often (see above) you will only have to redo parts of it, so this might help anyway.
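As an illustration of the filtering step, the sketch below picks a subset by key from such an index and then pulls the matching lines out of the big file in a single pass. The data, the key range, and the variable names are hypothetical, and the sort key is again assumed to be the 7th colon-separated field.

```shell
# Hypothetical stand-in for the big file (first line is a header).
big=$(mktemp)
printf '%s\n' \
    'id:a:b:c:d:e:key' \
    'r1:x:x:x:x:x:0030' \
    'r2:x:x:x:x:x:0010' \
    'r3:x:x:x:x:x:0020' > "$big"

# Build and sort the "key<TAB>line-number" index.
index=$(mktemp)
awk -F: 'NR > 1 { print $7 "\t" NR }' "$big" | sort -k1,1 > "$index"

# Pick the line numbers of the records of interest; here, keys up to 0020.
# Comparing against a string constant makes awk compare as strings.
wanted=$(mktemp)
awk -F'\t' '$1 <= "0020" { print $2 }' "$index" > "$wanted"

# One pass over the big file: load the wanted line numbers first, then
# print every line whose number is in that set. Assumes "$wanted" is not
# empty (with an empty first file the NR == FNR test would misfire).
subset=$(awk 'NR == FNR { want[$1]; next } FNR in want' "$wanted" "$big")
printf '%s\n' "$subset"
```

Only the selected line numbers are kept in memory, not the records themselves, and the lines come out in file order rather than key order.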

Along the same lines: wouldn't a database with an indexed table be what you want? Databases have methods to deal with files that are bigger than the available main memory, so what you are doing here is perhaps old news for DB software.

I hope this helps.

bakunin
# 14  
Old 04-14-2017
Wouldn't 8.000.000 kB (8 * 10^6 * 10^3 bytes) be 8 GB? And thus (sort of) manageable? How come we're talking terabytes?
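The conversion can be checked quickly with awk. This assumes the figure of 8,000,000 kB from post #10 and shows both the decimal (1 kB = 1000 bytes) and binary (1 KiB = 1024 bytes) readings; the variable names are just for illustration.

```shell
kb=8000000

# Decimal units: 1 kB = 10^3 bytes, 1 GB = 10^9 bytes, 1 TB = 10^12 bytes.
gb=$(awk -v kb="$kb" 'BEGIN { printf "%.0f", kb * 1000 / 1e9 }')
tb=$(awk -v kb="$kb" 'BEGIN { printf "%.3f", kb * 1000 / 1e12 }')

# Binary units: 1 KiB = 1024 bytes, 1 GiB = 1024^3 bytes.
gib=$(awk -v kb="$kb" 'BEGIN { printf "%.2f", kb * 1024 / (1024 * 1024 * 1024) }')

echo "$gb GB, $tb TB, $gib GiB"
```

Either way the files are on the order of 8 gigabytes, not terabytes.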