Sorting blocks by a section of the identifier

# 15  
04-14-2017
Quote:
Originally Posted by RudiC
Wouldn't 8.000.000 kB (8 * 10^6 * 10^3) be 8 GB? And thus (sort of) manageable? How come we're talking terabytes?
Now that you mention it: I think you are right. I just read Don's "8TB" and didn't recalculate it myself. My bad.

I just counted one record of the posted sample: it has 260 characters. Since 15-25 million records were mentioned: 15 * 10^6 * 260 ~ 3.9GB, 25 * 10^6 * 260 ~ 6.5GB. This should indeed be feasible to sort in memory.
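Instead of counting one record by hand, you can measure the average record size over the whole file and project it to the 15 and 25 million record counts mentioned above. This is a minimal sketch. It assumes every record is exactly 4 lines, as the `$1`..`$4` in the pipeline quoted later in this thread implies, and that the data is plain ASCII, so characters equal bytes. `estimate_size` is a hypothetical helper name that does not come from the thread.

```shell
#!/bin/sh
# Hypothetical helper: estimate how much memory a sort of the whole file needs.
# Assumption: every record is exactly 4 lines (header, sequence, "+", quality).
estimate_size() {
    awk '
        { bytes += length($0) + 1 }     # +1 for the newline awk strips from each line
        END {
            recs = NR / 4               # 4 lines per record (assumed)
            if (recs == 0) { print "empty input"; exit }
            printf "%d records, %.0f bytes/record\n", recs, bytes / recs
            # Projections for the record counts mentioned in the thread.
            printf "15M records: %.1f GB\n", 15e6 * bytes / recs / 1e9
            printf "25M records: %.1f GB\n", 25e6 * bytes / recs / 1e9
        }' "$@"
}
```

Usage: `estimate_size test.txt`. With the 260-character records described above, the projections come out at about 3.9 GB and 6.5 GB.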

bakunin
# 16  
04-14-2017
I'm sorry for all of the confusion. I had originally intended to type 8GB, but hit the T key instead of the G key. Then, while reviewing it, I decided to spell it out and converted 8TB to 8 terabytes, compounding the error instead of correcting it.

With the BSD-based awk on macOS, I don't have the asorti() function, and only the 1st character of the value assigned to RS matters. So the following is completely untested, but if I understand the GNU awk page correctly, I think the pipeline:
Code:
awk -vRS="@M0" 'BEGIN{FS="\n"; OFS="\t"}NR>1{print RS$1, $2, $3, $4}' test.txt | sort -t: -k 7 | tr "\t" "\n"

should be replaceable by the following single invocation of awk:
Code:
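awk '
BEGIN {	FS = OFS = "\n"
	RS = "@M0"	# a multi-character RS needs GNU awk
}
NR > 1 {split($1, f, /:/)	# record 1 is the empty text before the first "@M0"; $1 is the header
	out[f[7]] = RS $0	# put back the "@M0" that RS stripped off
	order[f[7]]	# merely referencing the element creates the key
}
END {	n = asorti(order)	# GNU awk: order[1..n] now holds the keys in sorted order
	for(i = 1; i <= n; i++)
		printf("%s", out[order[i]])
}' test.txt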

as long as there are no duplicates in the 7th colon-separated field of any of the records in your input file. (If there are duplicates, I think all but the last record in each set of duplicates will be missing from the output produced by the above script.)
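If duplicate keys can occur, one way to keep every record in the gawk approach is to append each record to the array element instead of assigning it. This sketch is not the script above and has not been checked against the data from post #1. It still requires GNU awk for the multi-character RS and asorti(). It also calls asorti() on the output array directly, so no separate key array is needed. `sort_blocks` is a hypothetical function name.

```shell
#!/bin/sh
# Sketch (requires GNU awk): sort "@M0"-delimited records by the 7th
# colon-separated field of the header, keeping records with duplicate keys.
sort_blocks() {
    gawk '
        BEGIN { FS = "\n"; RS = "@M0" }     # multi-character RS: gawk extension
        NR > 1 {                            # record 1 is the text before the first "@M0"
            split($1, f, /:/)               # $1 is the header line minus its "@M0" prefix
            out[f[7]] = out[f[7]] RS $0     # append, so duplicate keys are all kept
        }
        END {
            n = asorti(out, keys)           # keys[1..n] = indices of out[], string-sorted
            for (i = 1; i <= n; i++)
                printf "%s", out[keys[i]]
        }' "$@"
}
```

Usage: `sort_blocks test.txt > sorted.txt`. Records that share a key are written in their original input order.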

I would appreciate it if someone with access to GNU awk could try this out with the sample data in post #1 of this thread and let me know whether I came close to getting it right.
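As noted above, the BSD awk on macOS lacks asorti() and uses only the first character of RS. For awks like that, here is a sketch that avoids both features. It assumes exactly 4 lines per record, the same assumption the `$1`..`$4` in the original pipeline makes, and that the data contains no tab characters. It joins each record onto one tab-separated line, sorts on the same `sort -t: -k 7` key as the original pipeline, and splits the records back into lines with `tr`. `sort_blocks_portable` is a hypothetical name.

```shell
#!/bin/sh
# Portable sketch (POSIX awk, sort, tr). Assumes 4 lines per record and no tabs
# in the data; an incomplete final record (fewer than 4 lines) is dropped.
sort_blocks_portable() {
    awk '
        NR % 4 == 1 { line = $0; next }     # header line starts a new record
        { line = line "\t" $0 }             # append lines 2-4, tab-joined
        NR % 4 == 0 { print line }          # 4th line completes the record
    ' "$@" |
    sort -t: -k 7 |                         # same key as the original pipeline
    tr '\t' '\n'                            # turn the joined record back into lines
}
```

Usage: `sort_blocks_portable test.txt > sorted.txt`. Because `sort` does the actual sorting, it can fall back to temporary files when the input does not fit in memory. An awk array always has to hold the whole file in memory.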
# 17  
04-18-2017
Sorry for the long delay! I will give it a try.