Need a quick and dirty solution


 
# 8  
Old 06-29-2012
The suggestions mentioned below are all incorrect for reasons stated in post #3.

Quote:
Originally Posted by elixir_sinari
Code:
awk -F"[ v]" 'a[$1]+0<$3+0{a[$1]=$3} END{for(i in a) print i" v"a[i]}' inputfile

Incorrect numeric comparison. Numerically, 1.10 is less than 1.9, but as a version number, 1.10 is greater than 1.9. To perform the comparison numerically and correctly, you'd have to transform the version number in some way. Perhaps something like a*1000 + b, where a is the first component of the version string and b is the second. Obviously, though, b can never be greater than or equal to the constant multiplier, or there would be some ambiguity.
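A minimal sketch of the a*1000 + b idea, not a tested drop-in. The sample lines are made up in the thread's "name vMAJOR.MINOR" layout, and the variable names (key, best, line, latest) are my own. It assumes exactly two components, a minor number below 1000, and names that contain no space, "v", or ".".

```shell
# Split on space, "v" and "." so that $3 is the major part and $4 the minor part.
# key = major*1000 + minor, which is only unambiguous while minor < 1000.
latest=$(printf '%s\n' 'a v1.9' 'a v1.10' 'b v2.21' 'b v3.0' |
  awk -F'[ v.]' '
    { key = $3 * 1000 + $4 }
    !($1 in best) || key > best[$1] { best[$1] = key; line[$1] = $0 }
    END { for (name in line) print line[name] }
  ' | sort)

printf '%s\n' "$latest"
```

With that sample, 1.10 maps to 1010 and 1.9 to 1009, so "a v1.10" wins over "a v1.9".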


Quote:
Originally Posted by expert
try below
Code:
for var in `awk -F " " '{print $1}' test_file_data | uniq`
do
  grep $var test_file_data | sort | tail -1
done

A lexicographical sort is inappropriate for a version string. Comparing as strings, 1.9 is greater than 1.10 since 9 follows 1 in the collation sequence of every locale of which I'm aware. Further, lexicographical sort results can vary with locale.

Why specify the field separator in awk -F " "? A single space is already the default value for FS, and awk treats that value specially: it ignores leading and trailing blank characters and splits on sequences of blanks (in the C/POSIX locale, spaces and tabs). If you intended to split on spaces only, you must use a regular expression containing a bracket expression: awk -F '[ ]'.
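To make the difference visible, here is a small illustration. The test line and the variable names are hypothetical, not taken from the original data. It counts the fields awk sees under each separator.

```shell
line='  a   v1.2'   # two leading spaces, three spaces between the fields

# Single space: leading blanks are ignored and runs of blanks collapse.
default_nf=$(printf '%s\n' "$line" | awk -F ' ' '{ print NF }')

# Bracket expression: every single space separates, so empty fields appear.
bracket_nf=$(printf '%s\n' "$line" | awk -F '[ ]' '{ print NF }')

echo "-F ' ': $default_nf fields; -F '[ ]': $bracket_nf fields"
```

The first count is 2 and the second is 6, because each of the five spaces delimits a (mostly empty) field.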

grep $var is vulnerable to regular expression metacharacters in the first column of the data. This may or may not be a concern, depending on what the real data in that first column looks like. Regardless, fixed string matching, -F, is the correct approach. You do not want to treat the contents of that first column as a regular expression that can match strings that are not exactly identical to itself. You want to treat that column literally. Consider it a bonus that fixed string matching is also simpler and faster.
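A quick illustration of the metacharacter problem, using made-up data in which one first-column value contains a dot. The last command is an extra variant of my own, not something proposed in this thread: since grep -F still matches the string anywhere in the line, awk can compare the first field for exact equality instead.

```shell
data=$(printf '%s\n' 'a.b v1.0' 'axb v2.0')

# As a regular expression, "." matches any character, so both lines match.
regex_hits=$(printf '%s\n' "$data" | grep -c 'a.b')

# As a fixed string, only the line that literally contains "a.b" matches.
fixed_hits=$(printf '%s\n' "$data" | grep -F -c 'a.b')

# Exact comparison against the first field only.
exact=$(printf '%s\n' "$data" | awk -v name='a.b' '$1 == name')

echo "regex: $regex_hits, fixed: $fixed_hits, exact: $exact"
```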


Quote:
Originally Posted by jayan_jay
Code:
$ sort -nr infile | nawk '!x[$1]++'
b v3.0
a v1.2
$

That is performing a lexicographical sort. See the previous example's critique for why a lexicographical sort is inappropriate.

Why? sort -nr will look for a numeric string at the very beginning of the line. If it doesn't find one, it will behave as if it read a zero. Since the first field in the sample data is alphabetic, all lines will numerically compare as equal to zero and to each other, which then requires sort to break that tie by performing a lexicographical sort on the entire line. Long story short, -n may as well not have been specified; given the sample data (and any data set where the first non-blank sequence is not a valid numeric string), sort -nr and sort -r give identical results.
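This is easy to check. The sketch below uses a few lines in the format of the sample data (the exact lines and the variable names are my choice) and pins the locale to C so that the comparison is reproducible.

```shell
data=$(printf '%s\n' 'a v1.0' 'a v1.2' 'b v2.1' 'b v3.0' 'c v3.10' 'c v3.9')

# No line starts with a number, so -n sees zero everywhere and sort falls
# back to comparing whole lines, just as plain -r does.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -nr)
plain=$(printf '%s\n' "$data" | LC_ALL=C sort -r)

[ "$numeric" = "$plain" ] && echo "identical ordering"
printf '%s\n' "$numeric"
```

Both orderings put "c v3.9" ahead of "c v3.10", which is the lexicographical result and not the version order.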

Regards,
Alister

Last edited by alister; 06-29-2012 at 08:54 AM..
# 9  
Old 06-29-2012
Is there any problem in this solution, alister?

Code:
awk -F"[ v.]" '$3+0==vleft[$1]+0 && $4"">vright[$1]"" {vright[$1]=$4} $3+0>vleft[$1]+0 {vleft[$1]=$3;vright[$1]=$4} END {for(i in vleft) print i" v"vleft[i]"."vright[i]}' inputfile


Last edited by elixir_sinari; 06-29-2012 at 09:46 AM..
# 10  
Old 06-29-2012
I hadn't noticed that tricky difference between -F " " and -F "[ ]".

Thanks for pointing this out!
# 11  
Old 06-29-2012
Quote:
Originally Posted by elixir_sinari
Is there any problem in this solution, alister?

Code:
awk -F"[ v.]" '$3+0==vleft[$1]+0 && $4"">vright[$1]"" {vright[$1]=$4}
$3+0>vleft[$1]+0 {vleft[$1]=$3;vright[$1]=$4} END {for(i in vleft) print i" v"vleft[i]"."vright[i]}' inputfile

Hi, elixir:

Is it my imagination, or did you change +0 to "" in the highlighted expression? (I had looked at it earlier and thought it looked good, but was sidetracked and did not post.)

That string comparison is incorrect. The +0 variant, forcing numeric comparison of $4, should work correctly.
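A stripped-down illustration of the two comparison styles, with 9 and 10 standing in for the second version component. The input and the variable names are made up for the demonstration.

```shell
# Appending "" forces string comparison: "9" sorts after "10".
string_cmp=$(echo '9 10' | awk '{ print ($1 "" > $2 "") }')

# Adding 0 forces numeric comparison: 9 is not greater than 10.
numeric_cmp=$(echo '9 10' | awk '{ print ($1 + 0 > $2 + 0) }')

echo "string: $string_cmp, numeric: $numeric_cmp"
```

The string form prints 1 (true) and the numeric form prints 0 (false), which is why the "" variant can keep 3.9 over 3.10.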

Regards,
Alister
# 12  
Old 06-29-2012
Hi, elixir_sinari.
Quote:
Originally Posted by elixir_sinari
Is there any problem in this solution, alister?

Code:
awk -F"[ v.]" '$3+0==vleft[$1]+0 && $4"">vright[$1]"" {vright[$1]=$4} $3+0>vleft[$1]+0 {vleft[$1]=$3;vright[$1]=$4} END {for(i in vleft) print i" v"vleft[i]"."vright[i]}' inputfile

Well, let us try it:
Code:
#!/usr/bin/env bash

# @(#) user2	Demonstrate proposed awk solution for "version" strings.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C awk

FILE=${1-data1}
pl " Input data $FILE:"
cat $FILE

pl " Results:"
# awk -F"[ v]" 'a[$1]+0<$3+0{a[$1]=$3} END{for(i in a) print i" v"a[i]}' $FILE
awk -F"[ v.]" '$3+0==vleft[$1]+0 && $4"">vright[$1]"" {vright[$1]=$4} $3+0>vleft[$1]+0 {vleft[$1]=$3;vright[$1]=$4} END {for(i in vleft) print i" v"vleft[i]"."vright[i]}' $FILE

exit 0

producing:
Code:
% ./user2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5

-----
 Input data data1:
d v4.0.9
d v4.0.10
a v1.0
a v1.1
a v1.2
b v2.1
b v2.2
b v2.21
b v3.0
c v3.10
c v3.9

-----
 Results:
a v1.2
b v3.0
c v3.9
d v4.0

That does not look correct to me. Did I paste your code faithfully? Are your results different?

Best wishes ... cheers, drl

---------- Post updated at 09:15 ---------- Previous update was at 09:08 ----------

Hi.

With alister's correction:
Code:
#!/usr/bin/env bash

# @(#) user2	Demonstrate proposed awk solution for "version" strings.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C awk

FILE=${1-data1}
pl " Input data $FILE:"
cat $FILE

pl " Results:"
# awk -F"[ v]" 'a[$1]+0<$3+0{a[$1]=$3} END{for(i in a) print i" v"a[i]}' $FILE
# awk -F"[ v.]" '$3+0==vleft[$1]+0 && $4"">vright[$1]"" {vright[$1]=$4} $3+0>vleft[$1]+0 {vleft[$1]=$3;vright[$1]=$4} END {for(i in vleft) print i" v"vleft[i]"."vright[i]}' $FILE
awk -F"[ v.]" '$3+0==vleft[$1]+0 && $4+0>vright[$1]+0 {vright[$1]=$4} $3+0>vleft[$1]+0 {vleft[$1]=$3;vright[$1]=$4} END {for(i in vleft) print i" v"vleft[i]"."vright[i]}' $FILE

exit 0

producing:
Code:
% ./user2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5

-----
 Input data data1:
d v4.0.9
d v4.0.10
a v1.0
a v1.1
a v1.2
b v2.1
b v2.2
b v2.21
b v3.0
c v3.10
c v3.9

-----
 Results:
a v1.2
b v3.0
c v3.10
d v4.0

Seems OK for 2-part version strings ... cheers, drl
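For version strings with more than two parts (such as the "d" lines above), one possible generalization is to compare the components one at a time. This is only a sketch under assumptions: the function name newer and the array best are hypothetical, every component is taken to be a plain number, missing components count as 0, and the awk must support user-defined functions (nawk, gawk, and mawk do).

```shell
latest=$(printf '%s\n' 'd v4.0.9' 'd v4.0.10' 'c v3.10' 'c v3.9' |
  awk '
    # Return 1 if version a is newer than version b, comparing each
    # dot-separated component numerically, left to right.
    function newer(a, b,    x, y, n, m, i, max) {
      n = split(a, x, "."); m = split(b, y, ".")
      max = (n > m) ? n : m
      for (i = 1; i <= max; i++) {
        if (x[i] + 0 > y[i] + 0) return 1
        if (x[i] + 0 < y[i] + 0) return 0
      }
      return 0
    }
    {
      ver = substr($2, 2)    # drop the leading "v"
      if (!($1 in best) || newer(ver, best[$1])) best[$1] = ver
    }
    END { for (name in best) print name " v" best[name] }
  ' | sort)

printf '%s\n' "$latest"
```

On those four lines this keeps "c v3.10" and "d v4.0.10".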
# 13  
Old 06-29-2012
If your infile is already sorted, maybe you can give a try with:

Code:
tail -r infile | nawk '!x[$1]++' | sort

or
Code:
tac infile | nawk '!x[$1]++' | sort


Last edited by ctsgnb; 06-29-2012 at 11:57 AM..
# 14  
Old 06-29-2012
Quote:
Originally Posted by alister
Hi, elixir:

Is it my imagination, or did you change +0 to "" in the highlighted expression? (I had looked at it earlier and thought it looked good, but was sidetracked and did not post.)
Yes, I had used numeric comparison earlier but then happened to cook up some pretty unusual (and unrealistic) input data which made me change it to a string comparison...

The numeric comparisons should work in this case as you said...