[Solved] Data manipulation


 
# 1  
Old 03-10-2014

Hello team,

I need your help. I have a file with two columns. See the sample below:

Code:
105550	0.28
105550	0.24
125550	0.28
125550	0.24
215650	0.28
215650	0.24
315550	0.28
315550	0.24
335550	0.28
335550	0.24
40555	0.21
40555	0.17
415550	0.21
415550	0.17
43555	0.21
43555	0.17
45555	0.21
45555	0.17
46555	0.21
46555	0.17
47554	0.21
47554	0.17
100650	0.22
100650	0.18
102850	0.22
102850	0.18
120650	0.22
120650	0.18
122850	0.22
122850	0.18
130650	0.22

The file is named aaaa.csv; column 1 is the prefix and column 2 is the rate. As you can see above, many prefixes appear more than once with different rates. I would like to keep each prefix only once, together with the highest of its rates. See the before and after of the output I would like below.

Before:

Code:
105550	0.28
105550	0.24
125550	0.28
125550	0.24
215650	0.28
215650	0.24
315550	0.28
315550	0.24
335550	0.28
335550	0.24
40555	0.21
40555	0.17
415550	0.21
415550	0.17
43555	0.21
43555	0.17
45555	0.21
45555	0.17
46555	0.21
46555	0.17
47554	0.21
47554	0.17
100650	0.22
100650	0.18
102850	0.22
102850	0.18
120650	0.22
120650	0.18
122850	0.22
122850	0.18
130650	0.22

After:
Code:
105550	0.28
215650	0.28
315550	0.28
40555	0.21
415550	0.21
43555	0.21
46555	0.21
47554	0.21
100650	0.22
120650	0.22
122850	0.22
130650	0.22

Moderator's Comments: Please use regular code tags instead of inline code tags.

Last edited by Scrutinizer; 03-10-2014 at 09:45 AM.. Reason: [icode] => [code] tags
# 2  
Old 03-10-2014
Code:
awk '{a[$1]=$2>a[$1]?$2:a[$1]}END{for(b in a){print b,a[b]}}' aaaa.csv

# 3  
Old 03-10-2014
You want the highest field 2 value for each unique field 1? (If so, your output data appears to be missing a few rows, e.g. for 335550)

You could do something like:
Code:
awk '{ if ($2 > rates[$1]) { rates[$1]=$2 } } END { for (i in rates) { print i OFS rates[i] } }' aaaa.csv

Note: This does not preserve the ordering or the fixed-width formatting.
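If the original ordering matters, one option is to remember the order in which prefixes first appear. The following is only a sketch under assumptions: the function name `max_rate_per_prefix` is made up for illustration, the input is taken to be whitespace- or tab-separated as in the sample, and the output separator is fixed to a tab.

```shell
# Hypothetical helper (not from the thread): max_rate_per_prefix FILE
# Prints each prefix once with its highest rate, in order of first appearance.
max_rate_per_prefix() {
    awk '
        # First time this prefix is seen: record its position and its rate.
        !($1 in max) { order[++n] = $1; max[$1] = $2; next }
        # Seen before: keep the numerically larger rate (text is stored as-is).
        $2 + 0 > max[$1] + 0 { max[$1] = $2 }
        # Emit prefixes in the order they first appeared in the input.
        END { for (i = 1; i <= n; i++) print order[i] "\t" max[order[i]] }
    ' "$1"
}
```

It would be called as `max_rate_per_prefix aaaa.csv`. Comparing with `$2 + 0` forces a numeric comparison while the stored value keeps the rate exactly as written in the file.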
# 4  
Old 03-10-2014
Are some numbers missing from your output?

Code:
$ sort -u -r -n <input file>
415550 0.21
335550 0.28
315550 0.28
215650 0.28
130650 0.22
125550 0.28
122850 0.22
120650 0.22
105550 0.28
102850 0.22
100650 0.22
47554 0.21
46555 0.21
45555 0.21
43555 0.21
40555 0.21

OR

Code:
$ sort -u -n <input file>
40555 0.21
43555 0.21
45555 0.21
46555 0.21
47554 0.21
100650 0.22
102850 0.22
105550 0.28
120650 0.22
122850 0.22
125550 0.28
130650 0.22
215650 0.28
315550 0.28
335550 0.28
415550 0.21
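A caveat on the `sort -u -n` approach: the numeric comparison only looks at the leading number on each line, so all lines with the same prefix count as equal and one of them is kept. In the sample the higher rate happens to come first for every prefix, which is why the kept line is the one with the highest rate. If that ordering cannot be assumed, a sketch like the one below sorts on the rate explicitly before de-duplicating. The function name `dedupe_by_sort` is hypothetical, and the input is assumed to be whitespace- or tab-separated.

```shell
# Hypothetical helper (not from the thread): dedupe_by_sort FILE
dedupe_by_sort() {
    # Sort by prefix (numeric, ascending), then by rate (numeric, descending),
    # so the highest rate is the first line within each prefix group.
    # awk then prints only the first line it sees for each prefix.
    sort -k1,1n -k2,2nr "$1" | awk '!seen[$1]++'
}
```

It would be called as `dedupe_by_sort aaaa.csv`. The output is ordered by prefix, not by the original line order.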

# 5  
Old 03-10-2014
Hello Carlo and Lucas,

I did as you advised, but I still get duplicates. Have a look below:

[paxk@util1-pkl ~]$ grep awk '{ if ($2 > rates[$1]) { rates[$1]=$2 } } END { for (i in rates) { print i OFS rates[i] } }' aaaa.csv > bbb.csv

[paxk@util1-pkl ~]$ grep 105550 bbb.csv
105550,0.28
105550,0.24


---------- Post updated at 03:11 PM ---------- Previous update was at 03:07 PM ----------

ni2, thank you, it worked.
# 6  
Old 03-10-2014
Where did the commas come from?
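The commas in that output suggest the real file may be comma-separated rather than tab-separated as in the posted sample. With awk's default field splitting, a line such as `105550,0.28` is a single field, so the earlier one-liners would not group by prefix. Assuming the file really is comma-separated (that is a guess from the output above, not something the thread confirms), a sketch with an explicit separator might look like this. The function name `max_rate_csv` is hypothetical.

```shell
# Hypothetical helper (not from the thread): max_rate_csv FILE
# Assumes two comma-separated columns: prefix,rate
max_rate_csv() {
    awk -F',' -v OFS=',' '
        # Store the rate if the prefix is new or the rate is numerically higher.
        !($1 in max) || $2 + 0 > max[$1] + 0 { max[$1] = $2 }
        # Output order of "for (p in max)" is unspecified in awk.
        END { for (p in max) print p, max[p] }
    ' "$1"
}
```

It would be called as `max_rate_csv aaaa.csv > bbb.csv`; pipe the result through `sort -n` if a fixed order is needed.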