Full Discussion: Sorting
Post 302423661 by durden_tyler on Friday 21st of May 2010 03:27:55 PM (forum: Shell Programming and Scripting)
Quote:
Originally Posted by Ernst
... However, they do not work for my huge file.
With my huge file, I do not get an output file.
Well, how huge is your huge file? 1,000 lines? 10,000 lines? 100,000 lines? 1 million lines?

Also, my suggested script did not write any *output file*. If you ran the Perl one-liner exactly as I posted it, you won't find an output file either.

The Perl one-liner processes your input file ("input.dat" in my post) and writes its output to stdout, which is your terminal screen by default.

Quote:
... Whenever I cat the output, I do not get any data.
I did not get an error message either.
If you mean displaying the output file with the "cat" command, did you redirect the output to a file first?
If yes, can you post exactly what you typed on your terminal (i.e. copy/paste the session from your terminal screen)?
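
To illustrate the difference between output on the screen and output in a file, here is a minimal sketch. The function `print_groups` is a hypothetical stand-in for any command that writes to stdout, and the output path is only an example.

```shell
#!/bin/sh
# Hypothetical stand-in for a command that writes its result to stdout.
print_groups() {
  printf '230.3=146\n230.2=147\n230.1=148\n'
}

# Example location for the output file (an assumption, pick any path you like).
outfile="${TMPDIR:-/tmp}/output.txt"

print_groups                 # output goes to the terminal; no file is created
print_groups > "$outfile"    # ">" sends the same output into a file instead

# "-s" is true only if the file exists and is not empty.
if [ -s "$outfile" ]; then
  cat "$outfile"
else
  echo "no data in $outfile" >&2
fi
```

Without the `>` redirection there is nothing for "cat" to display, because no file was ever written.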

Quote:
...
Below is my input file:
Code:
1  .6=   
1  .5=   
1  .4=   
1  .3=12 
1  .2=348
1  .1=180
10 .6=   
10 .5=   
10 .4=   
10 .3=360
10 .2=192
10 .1=24 
100.6=   
100.5=   
100.4=   
100.3=364
100.2=196
100.1=28 
101.6=   
101.5=   
101.4=   
101.3=464
101.2=296
101.1=128
102.6=   
102.5=   
102.4=   
102.3=444
...
...

As others have pointed out, your input files are not in a consistent format. This is what you posted earlier:

Quote:
Originally Posted by Ernst
Okay, I have 6 groups. Now I want the script to go through these groups and look at the structure of each group. If the .1 (within a group) is greater than the .2 or .3 OR if .2 is greater than .3 within a group; thus output these groups. In our case the output would be groups.

input file
Code:
1.6=176
1.5=172
1.4=168
1.3=14
1.2=13
1.1=12
 
230.3=146
230.2=147
230.1=148
 
3.3=20
3.2=19
3.1=18
 
5.6=166
5.5=122
5.4=160
5.3=103
5.2=102
5.1=100
 
100.6=176
100.5=172
100.4=168
100.3=20
100.2=12
100.1=16
 
117.3=24
117.2=82
117.1=79

...
I hope it is clear enough.
Compare that with your new input file. The differences are listed below:

Difference # 1 : Your old input file did not have any spaces between "1" and ".", whereas your new file does.

Code:
# First line of old input file
1.6=176
 
# First line of new input file
1  .6=

Difference # 2 : Your old input file has a number to the right of every single "=" character. Your new input file does not have a number to the right of every single "=" character.

Code:
# First  5 lines of old input file

1.6=176
1.5=172
1.4=168
1.3=14
1.2=13
 
# First 5 lines of new input file
1  .6=   
1  .5=   
1  .4=   
1  .3=12 
1  .2=348

Difference # 3 : Your old input file has blank lines at the end of each "group". Your new input file does not have even a single blank line.

Code:
# First 10 lines of old input file; it has two "groups" with a blank line to separate them
1.6=176
1.5=172
1.4=168
1.3=14
1.2=13
1.1=12
 
230.3=146
230.2=147
230.1=148
 
# First 10 lines of new input file; it has no blank lines anywhere in the file
1  .6=   
1  .5=   
1  .4=   
1  .3=12 
1  .2=348
1  .1=180
10 .6=   
10 .5=   
10 .4=   
10 .3=360

Needless to say, you shouldn't expect consistent solutions to inconsistent problems !
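
If the new layout is what your real file looks like, one option is to convert it to the old layout first. The following is only a sketch, using awk rather than Perl; `normalize_groups` is a hypothetical name. It assumes that all lines of a group are adjacent and that the group number is everything before the first ".". It addresses Differences # 1 and # 3 only. Lines with nothing after the "=" (Difference # 2) are passed through unchanged, and how they should be compared is still up to you.

```shell
#!/bin/sh
# Hypothetical helper: convert the "new" layout into the "old" one.
normalize_groups() {
  awk '
    {
      gsub(/[ \t]+/, "")              # Difference 1: remove padding, "1  .3=12 " -> "1.3=12"
      if ($0 == "") next              # drop lines that are empty after trimming
      group = $0
      sub(/\..*/, "", group)          # group number = text before the first "."
      if (have_prev && group != prev) # Difference 3: blank line between groups
        print ""
      prev = group
      have_prev = 1
      print
    }
  ' "$@"
}

# Example: normalize_groups new_input.dat > input.dat
```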

Quote:
...
Try your scripts and let me know whether or not it works for you.
...
Sure thing. Since you did not mention how huge your input file is, I'll assume it has 2 million lines.

Here's what I did. I took this input file, "input.dat", and kept appending its content, over and over, to another file called "input.txt".

Code:
$
$ cat input.dat
1.6=176
1.5=172
1.4=168
1.3=14
1.2=13
1.1=12
 
230.3=146
230.2=147
230.1=148
 
3.3=20
3.2=19
3.1=18
 
5.6=166
5.5=122
5.4=160
5.3=103
5.2=102
5.1=100
 
100.6=176
100.5=172
100.4=168
100.3=20
100.2=12
100.1=16
 
117.3=24
117.2=82
117.1=79
$
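
A minimal sketch of that kind of repeated appending, assuming each copy is followed by one blank separator line. `repeat_file` and its arguments are hypothetical names, and the copy count in the usage comment is an inference from the line count, not something taken from the original session.

```shell
#!/bin/sh
# Hypothetical helper: write COPIES copies of SRC into DEST,
# each copy followed by one blank line so the groups stay separated.
repeat_file() {
  src=$1
  dest=$2
  copies=$3
  : > "$dest"                  # start with an empty destination file
  i=0
  while [ "$i" -lt "$copies" ]; do
    cat "$src" >> "$dest"      # append one more copy
    echo >> "$dest"            # blank line after the copy
    i=$((i + 1))
  done
}

# Example (assumption): a 32-line input.dat plus one blank line per copy,
# repeated 62500 times, gives 62500 x 33 = 2,062,500 lines.
# repeat_file input.dat input.txt 62500
```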

The final line count of "input.txt" is roughly 2 million lines.
Here's some information about "input.txt".

Code:
$
$ # the line, word and character counts of "input.txt"; note that it has 2,062,500 lines
$ wc input.txt
 2062500  1687500 14625000 input.txt
$
$ # the first 10 lines of "input.txt"
$ head input.txt
1.6=176
1.5=172
1.4=168
1.3=14
1.2=13
1.1=12
 
230.3=146
230.2=147
230.1=148
$
$ # the last 10 lines of "input.txt"
$ tail input.txt
100.5=172
100.4=168
100.3=20
100.2=12
100.1=16
 
117.3=24
117.2=82
117.1=79
$

And now, I run the Perl one-liner on the file "input.txt" and redirect the output to file "output.txt".

I also feed the entire one-liner to the "time" command.

Code:
$
$
$ time perl -lne 'chomp;
           if (/^\s*$/) {
             if ($x>$y or $x>$z or $y>$z) {print foreach (@a); print}
             @a=(); $x=$y=$z="";
           } else {
             push @a,$_;
             if (/^\d+\.1=(.*)$/) {$x = $1}
             elsif (/^\d+\.2=(.*)$/) {$y = $1}
             elsif (/^\d+\.3=(.*)$/) {$z = $1}
           }
           END {if ($x>$y or $x>$z or $y>$z) {print foreach (@a); print}}
          ' input.txt >output.txt
real    0m15.125s
user    0m0.015s
sys     0m0.031s
$
$
$ wc output.txt
 937500  750000 8250000 output.txt
$
$ head output.txt
230.3=146
230.2=147
230.1=148
 
100.6=176
100.5=172
100.4=168
100.3=20
100.2=12
100.1=16
$
$ tail output.txt
100.5=172
100.4=168
100.3=20
100.2=12
100.1=16
 
117.3=24
117.2=82
117.1=79
$
$

That's 15.125 seconds of elapsed ("real") time to process roughly 2 million lines.
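
For anyone who finds the one-liner hard to read, here is the same group test spelled out as a sketch in awk instead of Perl. It is not the command I timed above. `filter_groups` and `emit_group` are hypothetical names, and a missing or empty value is treated as 0.

```shell
#!/bin/sh
# Hypothetical helper: print every blank-line-separated group in which
# .1 > .2, or .1 > .3, or .2 > .3 (the condition used in the one-liner).
filter_groups() {
  awk '
    function emit_group(   i) {
      if (n > 0 && (x > y || x > z || y > z)) {
        for (i = 1; i <= n; i++) print line[i]
        print ""                       # blank line after each printed group
      }
      n = 0                            # reset the buffer and the three values
      x = y = z = 0
    }
    /^[ \t]*$/ { emit_group(); next }  # a blank line ends the current group
    {
      line[++n] = $0                   # buffer the line
      split($0, kv, "=")               # kv[1] = "230.1", kv[2] = "148"
      if (kv[1] ~ /\.1$/)      x = kv[2] + 0
      else if (kv[1] ~ /\.2$/) y = kv[2] + 0
      else if (kv[1] ~ /\.3$/) z = kv[2] + 0
    }
    END { emit_group() }               # the last group may lack a trailing blank line
  ' "$@"
}

# Example: filter_groups input.txt > output.txt
```

Buffering one group at a time keeps memory use small no matter how many lines the file has, which is why the size of the input should not stop a script like this from producing output.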

tyler_durden
 
