Changing format of file with awk


 
# 1  
Old 09-11-2014

Hi all,
I have a file that looks like this:

Code:
Closest words to: manifesto >>>>
[(0.99999999999999978, 'manifesto'), (0.72008211381623111, 'communiqu\xe9'), (0.6942217252661308, 'manifestos'), (0.68892580417319915, 'pamphlet'), (0.68146378689894338, 'communique'), (0.66477336566612566, 'newssheet'), (0.65802727088954649, 'workplan'), (0.65534176275799949, 'counter-proposal'), (0.65430633850582132, 'credo'), (0.65313506395462273, 'report*')]

Closest words to: passport >>>>
[(1.0000000000000004, 'passport'), (0.82035608388470505, 'passports'), (0.74795707589520077, 'photocard'), (0.7029703031026393, 'visa'), (0.66463194673185344, 'certificate'), (0.65157805812927172, 'railcard'), (0.64138220956663572, 'chequebook'), (0.64021573915462227, 'payslip'), (0.63595253934734819, 'cis5'), (0.63233458893012662, 'carnet')]

and I want to reformat this with awk, with the following desired result:

Code:
manifesto
0.99999999999999978, 'manifesto'
0.72008211381623111, 'communiqu\xe9'
0.6942217252661308, 'manifestos'
0.68892580417319915, 'pamphlet'
0.68146378689894338, 'communique'
0.66477336566612566, 'newssheet'
0.65802727088954649, 'workplan'
0.65534176275799949, 'counter-proposal'
0.65430633850582132, 'credo'
0.65313506395462273, 'report*'

passport
1.0000000000000004, 'passport'
0.82035608388470505, 'passports'
0.74795707589520077, 'photocard'
0.7029703031026393, 'visa'
0.66463194673185344, 'certificate'
0.65157805812927172, 'railcard'
0.64138220956663572, 'chequebook'
0.64021573915462227, 'payslip'
0.63595253934734819, 'cis5'
0.63233458893012662, 'carnet'

Can someone please let me know if this will be possible?
Thank you in advance.
# 2  
Old 09-11-2014
Dear owwow14,

I have a few questions to pose in response first:
  • What have you tried so far?
  • What output/errors do you get?
  • What OS and version are you using?
  • You've said awk but would you consider alternatives?
  • What logical process have you considered? (to help steer us to follow what you are trying to achieve)
Most importantly, what have you tried so far?

There are probably many ways to achieve most tasks, so giving us an idea of your style and thoughts will help us guide you to an answer most suitable to you so you can adjust it to suit your needs in future.


We're all here to learn and getting the relevant information will help us all.


Kind regards,
Robin
# 3  
Old 09-11-2014
Hi, sorry for being cryptic; I have been struggling with this for some hours now.
To answer your questions:
  • I have been using Python and awk. The first approach was challenging for me, as the file is already in what would be considered a dictionary format, so I do not know how to unnest the information within the brackets. There is also a header line that labels the bracketed information, and I could not find a way to handle it.
    For the second approach, awk, I have been trying to isolate each individual chunk of information (i.e. between blank lines) and then replace the punctuation with newlines to create the column format.
  • I am using OS X 10.7.5
  • I am open to using a UNIX-type command or Python

I hope this provides more information about my question.
Thank you again, in advance.
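
The chunk-per-blank-line idea described in this post can be sketched with awk's paragraph mode (an empty RS). This is only a sketch under assumptions: every block is exactly one header line followed by one list line, blocks are separated by blank lines, and reformat_blocks is a made-up function name.

```shell
# Hypothetical helper: reformat blank-line-separated blocks, one record per block.
reformat_blocks() {
  awk '
    BEGIN { RS = ""; FS = "\n" }         # paragraph mode: each block is one record, each line one field
    {
      if (NR > 1) print ""               # blank line between output blocks
      n = split($1, hdr, " ")            # $1 is the "Closest words to: WORD >>>>" header line
      print hdr[n - 1]                   # WORD is the second-to-last token
      body = $2                          # $2 is the bracketed list line
      gsub(/\), \(/, "\n", body)         # each "), (" boundary becomes a line break
      gsub(/^\[\(|\)\]$/, "", body)      # trim only the leading "[(" and the trailing ")]"
      print body
    }
  ' "$@"
}
```

It could be run as `reformat_blocks filename`, or fed from a pipe. Because the brackets are trimmed only at the two ends of the list line, a parenthesis that appears inside a word would be left alone.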
# 4  
Old 09-11-2014
Hello,

The following may help for the given input.

Code:
awk '/>>>>/ {print $(NF-1)} !/>>>>/ {gsub(/\), \(/,"\n",$0);gsub(/\[|\]|\(|\)/,"",$0);print $0}' filename

Output will be as follows.

Code:
manifesto
0.99999999999999978, 'manifesto'
0.72008211381623111, 'communiqu\xe9'
0.6942217252661308, 'manifestos'
0.68892580417319915, 'pamphlet'
0.68146378689894338, 'communique'
0.66477336566612566, 'newssheet'
0.65802727088954649, 'workplan'
0.65534176275799949, 'counter-proposal'
0.65430633850582132, 'credo'
0.65313506395462273, 'report*'
 
passport
1.0000000000000004, 'passport'
0.82035608388470505, 'passports'
0.74795707589520077, 'photocard'
0.7029703031026393, 'visa'
0.66463194673185344, 'certificate'
0.65157805812927172, 'railcard'
0.64138220956663572, 'chequebook'
0.64021573915462227, 'payslip'
0.63595253934734819, 'cis5'
0.63233458893012662, 'carnet'


Thanks,
R. Singh

# 5  
Old 09-11-2014
Hello RavinderSingh13,

This looks to be an excellent answer, but can you explain how it achieves the result so we can all learn?


Thanks, in advance,
Robin
# 6  
Old 09-11-2014
Hello Robin,

Here is an explanation of the solution I tried for the input given by the user.

Code:
awk '
/>>>>/  { print $(NF-1) }               # Header lines contain >>>>: print the second-to-last field (the word).
!/>>>>/ { gsub(/\), \(/, "\n", $0)      # All other lines: replace each "), (" separator with a newline.
          gsub(/\[|\]|\(|\)/, "", $0)   # Then delete the remaining [ ] ( ) characters, which are not wanted in the output.
          print $0 }                    # Print the modified line.
' filename


Thanks,
R. Singh
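
To watch the two substitutions happen one at a time, something like the following could be tried. It is a sketch only: show_gsub_steps is a hypothetical helper and the sample line is made up, not taken from the user's file. It relies on the fact that gsub returns the number of replacements it made.

```shell
# Hypothetical helper: shows each gsub step on one made-up sample line.
show_gsub_steps() {
  printf '%s\n' "[(1.0, 'visa'), (0.5, 'permit')]" |
  awk '{
    n = gsub(/\), \(/, "\n")             # step 1: split the tuples onto separate lines
    printf "step 1 made %d replacement(s):\n%s\n", n, $0
    n = gsub(/\[|\]|\(|\)/, "")          # step 2: delete the remaining [ ] ( ) characters
    printf "step 2 made %d replacement(s):\n%s\n", n, $0
  }'
}
```

Note that step 2 deletes every bracket or parenthesis on the line, so a word that itself contains one of those characters would lose it too.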
