Using Awk for extracting data in specific format


 
# 1  
Old 07-21-2010

Please help me write an awk script to reformat this input:

Code:
001_r.pdb 0.0265185
001_r.pdb 0.0437049
001_r.pdb 0.0240642
001_r.pdb 0.0310264
001_r.pdb 0.0200482
001_r.pdb 0.0146746
001_r.pdb 0.0351344
001_r.pdb 0.0347856
001_r.pdb 0.036119
001_r.pdb 1.49
002_r.pdb 0.0281011
002_r.pdb 0.0319908
002_r.pdb 0.0516021
002_r.pdb 0.0440953
002_r.pdb 0.0357756
002_r.pdb 0.0289215
002_r.pdb 0.0335896
002_r.pdb 0.0503094
002_r.pdb 1.46839
007_r.pdb 0.0582815
007_r.pdb 0.0738922
007_r.pdb 0.0524815
007_r.pdb 0.0436297
007_r.pdb 0.0476785
007_r.pdb 0.0344794
007_r.pdb 0.0715756
007_r.pdb 1.47235
014_r.pdb 0.0238086
014_r.pdb 0.0410284
014_r.pdb 0.03811
014_r.pdb 0.0343461
014_r.pdb 0.0496776
014_r.pdb 0.0308409
014_r.pdb 1.47679
015_r.pdb 0.036504
015_r.pdb 0.039139
015_r.pdb 0.0505177
015_r.pdb 0.0601075
015_r.pdb 0.0290934
015_r.pdb 1.4956
018_r.pdb 0.00923608
018_r.pdb 0.0506758
018_r.pdb 0.0412613
018_r.pdb 0.0443338
018_r.pdb 1.50705
020_r.pdb 0.0447592
020_r.pdb 0.0346336
020_r.pdb 0.0444563
020_r.pdb 1.50034
027_r.pdb 0.0279227
027_r.pdb 0.0331829
027_r.pdb 1.47212
034_r.pdb 0.0468688
034_r.pdb 1.48727
046_r.pdb 1.49224


The output I want:
001_r.pdb 0.0265185 0.0437049 0.0240642 0.0310264 0.0200482 0.0146746 0.0351344 0.0347856 0.036119 1.49
002_r.pdb 0.0281011 0.0319908 0.0516021 0.0440953 0.0357756 0.0289215 0.0335896 0.0503094 1.46839
...
and so on.
# 2  
Old 07-21-2010
Code:
awk ' { a[$1] = a[$1] == "" ? $2 : a[$1] " " $2 } END { for ( i in a ) { print i " " a[i] } } ' file | sort

# 3  
Old 07-21-2010
It works brilliantly, but could you please explain the logic? I am a beginner in awk.
# 4  
Old 07-21-2010
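a[$1] == "" ? $2 : a[$1] " " $2
If the array element indexed by the first field is empty, assign the second field to it. Otherwise, append a space and the second field to whatever is already stored there.

Let's take this input and see how the code works.

Quote:
001_r.pdb 0.0265185
001_r.pdb 0.0437049
On the first line, a["001_r.pdb"] is empty, so
a["001_r.pdb"] = "0.0265185"

On the second line, a["001_r.pdb"] is no longer empty, so
a["001_r.pdb"] = a["001_r.pdb"] " " "0.0437049"
= "0.0265185" " " "0.0437049"
= "0.0265185 0.0437049"
(awk concatenates strings by writing them next to each other; it has no + operator for strings.)

for ( i in a ) { print i " " a[i] }
In the END block, loop through the array and print each index followed by the value stored for that index. awk does not guarantee the order in which for ( i in a ) visits the indexes, which is why the output is piped through sort.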
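
The same logic can also be written out as a longer, commented script. This is only a sketch, not the one-liner posted in this thread: the group_values function name and the order array are my own additions. The order array remembers when each file name was first seen, so the rows come out in input order without needing sort.

```shell
# Hypothetical helper: reads "name value" lines on stdin,
# prints one line per name with all of its values.
group_values() {
  awk '
    # Skip blank or incomplete lines.
    NF < 2 { next }
    # First time this name appears: record its position and store the value.
    !($1 in a) { order[++n] = $1; a[$1] = $2; next }
    # Name seen before: append the value after a single space.
    { a[$1] = a[$1] " " $2 }
    # Print the names in the order they first appeared.
    END { for (i = 1; i <= n; i++) print order[i], a[order[i]] }
  '
}

printf '%s\n' '001_r.pdb 0.0265185' '001_r.pdb 0.0437049' '002_r.pdb 0.0281011' | group_values
```

With a real file you would run it as group_values < file.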
# 5  
Old 07-21-2010
Code:
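#!/bin/bash
# Requires bash 4.0 or later (associative arrays)

declare -A dict
# Read the name and the value of each line into separate variables
while read -r key value
do
  # Append with a leading space so the values stay separated
  dict[$key]+=" $value"
done < "file"
for i in "${!dict[@]}"
do
  echo "$i${dict[$i]}"
done | sort -n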

# 6  
Old 07-22-2010
Code:
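# Note: sort also reorders the values within each file name (see the output below)
sort -n urfile | awk '!a[$1]++ { if (NR > 1) print ""; printf "%s", $1 } { printf " %s", $2 } END { if (NR) print "" }'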

001_r.pdb 0.0146746 0.0200482 0.0240642 0.0265185 0.0310264 0.0347856 0.0351344 0.036119 0.0437049 1.49
002_r.pdb 0.0281011 0.0289215 0.0319908 0.0335896 0.0357756 0.0440953 0.0503094 0.0516021 1.46839
007_r.pdb 0.0344794 0.0436297 0.0476785 0.0524815 0.0582815 0.0715756 0.0738922 1.47235
014_r.pdb 0.0238086 0.0308409 0.0343461 0.03811 0.0410284 0.0496776 1.47679
015_r.pdb 0.0290934 0.036504 0.039139 0.0505177 0.0601075 1.4956
018_r.pdb 0.00923608 0.0412613 0.0443338 0.0506758 1.50705
020_r.pdb 0.0346336 0.0444563 0.0447592 1.50034
027_r.pdb 0.0279227 0.0331829 1.47212
034_r.pdb 0.0468688 1.48727
046_r.pdb 1.49224
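
Note that in the output above the values come out sorted within each file name, because sort reorders the lines before awk sees them. If the original order of the values matters, and the input is already grouped by file name as in the sample from post #1, a sketch like the following may be enough. It assumes all lines for one name are adjacent; join_grouped is a made-up name.

```shell
# Hypothetical helper: assumes lines with the same name are adjacent.
join_grouped() {
  awk '
    # Skip blank or incomplete lines.
    NF < 2 { next }
    # Name changed: end the previous output line and start a new one.
    ($1 "") != (prev "") { if (seen) print ""; printf "%s", $1; prev = $1; seen = 1 }
    # Append the value to the current output line.
    { printf " %s", $2 }
    # Terminate the last output line.
    END { if (seen) print "" }
  '
}

printf '%s\n' '034_r.pdb 0.0468688' '034_r.pdb 1.48727' '046_r.pdb 1.49224' | join_grouped
```

This keeps only the current name in memory instead of the whole file, but it would split a name into several rows if its lines were not adjacent.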
