Converting unstructured data to structured data (UNIX for Dummies Questions & Answers)
Post 302975060 by stomp on Tuesday 7th of June 2016 08:14:08 PM
A little bit of playing with Ruby...

Code:
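#!/usr/bin/env ruby

# call it with: ./thisfile.rb data.txt

word  = '[^"]*'        # contents of a double-quoted string
block = '\{[^}]+\}'    # one { ... } record, braces escaped

File.read(ARGV[0]).scan(/#{block}/m).each do |current_block|
  # match "key" : "value" pairs in one pass so the header and data rows stay aligned;
  # keys whose values are not double-quoted are skipped in both rows
  pairs = current_block.scan(/^\s*"(#{word})"\s*:\s*("#{word}")/)
  puts pairs.map(&:first).join(",")   # header row: keys without quotes
  puts pairs.map(&:last).join(",")    # data row: values, quotes kept
end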
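
To see what the script prints, here is a minimal self-contained sketch that runs the same block-and-pair scanning on an inline sample instead of a file. The sample data and the names SAMPLE and blocks_to_rows are hypothetical (the thread's actual data file isn't shown in this post), and the sketch assumes each value is a double-quoted string on the same line as its key.

```ruby
# Hypothetical sample input; the real data from the thread is not shown.
SAMPLE = <<~DATA
  {
    "name" : "alice",
    "id" : "123"
  }
  {
    "name" : "bob",
    "id" : "456"
  }
DATA

WORD  = '[^"]*'       # contents of a double-quoted string
BLOCK = '\{[^}]+\}'   # one { ... } record

# Turn each { ... } block into [header_line, values_line].
def blocks_to_rows(text)
  text.scan(/#{BLOCK}/m).map do |blk|
    # each pair is [key, "\"value\""]: key unquoted, value with its quotes
    pairs = blk.scan(/^\s*"(#{WORD})"\s*:\s*("#{WORD}")/)
    [pairs.map(&:first).join(","), pairs.map(&:last).join(",")]
  end
end

blocks_to_rows(SAMPLE).each { |header, values| puts header, values }
```

Each { ... } block becomes two lines, the keys as a header and the quoted values underneath, so the sample prints name,id followed by "alice","123", and the same pair of lines again for bob.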


Last edited by stomp; 06-07-2016 at 11:39 PM.
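
If the data happens to be valid JSON (an assumption, since the thread doesn't show it), a sturdier alternative to hand-written regexes is Ruby's json and csv libraries. This sketch swaps in that technique: it parses a hypothetical JSON array of flat objects and writes one CSV header followed by one row per record.

```ruby
require "json"
require "csv"

# Hypothetical input: a JSON array of flat objects.
input = '[{"name": "alice", "id": "123"}, {"name": "bob", "id": "456"}]'

records = JSON.parse(input)    # array of Hashes
headers = records.first.keys   # column order taken from the first record

csv_text = CSV.generate do |csv|
  csv << headers
  # values_at returns nil for a missing key, which CSV writes as an empty field
  records.each { |rec| csv << rec.values_at(*headers) }
end

puts csv_text
```

Unlike the regex version, this prints the header only once, and values are quoted only when CSV requires it (for example when they contain a comma).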
 

wdutil(1)                    General Commands Manual                    wdutil(1)

NAME
wdutil - manipulate Native Language I/O word dictionary

SYNOPSIS
wdutil [ -c | -i[kcap][,dcap] | -jjfile ] file
wdutil [ -pd[desig] | -pk[desig] ] file
wdutil [ -sd[[+|-]val] | -sk[[+|-]val] ] file
wdutil [ -ud | -uk | -ut ] file
wdutil [ -d | -l ] file

DESCRIPTION
wdutil is used to manipulate the word dictionary used by Native Language I/O for phrase and word conversion. The word dictionary consists of a key entries block and a data entries block. The key entries block holds the designations, and the data entries block holds the words corresponding to each designation. wdutil also functions as a filter for transforming a word dictionary to a text file, and vice versa. See the Text File section for the layout of a text file.

wdutil recognizes one of the options below. If no option is specified and the file is a valid word dictionary, the capacity of the key and data entries blocks in the file is displayed; otherwise, an error message is printed. The capacity of the key entries block determines the maximum number of designations. The capacity of the data entries block determines the maximum number of words.

Options
-c  Condense the data entries block in the file to obtain a larger contiguous free area. If the format version of the file is old, it is updated.
-i[kcap][,dcap]  Initialize the file as a word dictionary with a key entries block capacity of kcap and a data entries block capacity of dcap. If the file does not exist, it is created. The default values are 499 for kcap and 650 for dcap.
-jjfile  Join the dictionary jfile into the file. The capacity of the resulting file is the sum of the capacities of the original file and jfile.
-pk[desig]  Display the designations in the order of their code values. If desig ends with *, the designations starting with desig are printed. If desig is * or omitted, all designations in the file are printed.
-pd[desig]  Display the designations with their corresponding words and parts of speech. desig has the same format as in -pk.
-sd[[+|-]val]  Change the capacity of the data entries block in the file. If + or - precedes val, the current value is incremented or decremented by val; otherwise, the capacity is changed to val. The default value for val is 650.
-sk[[+|-]val]  Change the capacity of the key entries block in the file. val has the same format as in the -sd option. The default value for val is 499.
-ud  Display the capacity and usage of the data entries block, and the size of the contiguous free area.
-uk  Display the capacity and usage of the key entries block.
-ut  Display the capacity and usage of both the key and data entries blocks, and the size of the contiguous free area of the data entries block.
-d  Read a word dictionary, transform it into text form, and dump it to the standard output. If a word includes a character whose code is undefined in the $LANG code set, its internal code is dumped in hexadecimal notation.
-l  Load entry lines in text form from the standard input into the specified word dictionary. If the specified dictionary exists, wdutil overwrites it with the loaded entry lines; otherwise, wdutil creates a new one containing them. If an entry line is invalid, it is rejected and an error message is displayed on the standard error.

Text File
Each entry line in the text file consists of the following fields, terminated by a newline. White space can be used as the field separator. The 3rd field is effective only if LANG=japanese, japanese.euc, ja_JP.SJIS, or ja_JP.eucJP.

designation word hinshi(part of speech)

designation  Consists of up to sixteen characters, excluding special characters. However, after being transformed by the -d option, all characters in designation are 2-byte characters in the text file.
word  The word corresponding to designation; it consists of up to 50 bytes of multibyte characters. The word may use hexadecimal notation instead of multibyte characters. For example, the hexadecimal notation 'x7e7e' is recognized as a character whose internal code is 0x7e7e.
hinshi  Specifies a part of speech, which is one of noun, sa-hen verb, surname, personal name, and address. Filling conventions are FUTSUUMEISHI (or simply MEISHI), SAHENDOUSHI (or simply SAHEN), SEI, MEI and CHIMEI in kanji characters. If nothing is specified, wdutil sets it to FUTSUUMEISHI automatically.

EXTERNAL INFLUENCES
International Code Set Support
Single-byte and multibyte character code sets are supported.

WARNINGS
The smallest prime number not smaller than the given value is used as the capacity of a key entries block. However, if the given value is smaller than 5, 5 is used.

A voiced plosive or non-voiced plosive in a designation is counted as 1 character in a text file.

User dictionaries with the old format version are supported on HP-UX 10.0, but they will not be supported in the future. To update them, use the -c option:

$ wdutil -c file

AUTHOR
wdutil was developed by HP.

wdutil(1)
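
The capacity rules above (the prime rounding in WARNINGS and the +/- syntax of -sd and -sk) can be modeled in a few lines of Ruby. This is a hypothetical sketch of the documented arithmetic, not wdutil's own code: the helper names are made up, and applying the prime rule to a value set with -sk is an assumption.

```ruby
# Hypothetical model of the capacity arithmetic described in wdutil(1);
# not taken from wdutil itself.

# Trial division is enough for capacities in the hundreds or thousands.
def prime?(n)
  return false if n < 2
  (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
end

# WARNINGS rule: smallest prime not smaller than the requested value, minimum 5.
def key_block_capacity(requested)
  n = [requested, 5].max
  n += 1 until prime?(n)
  n
end

# -sd/-sk argument: "+val" increments, "-val" decrements, plain "val" replaces.
def adjusted_capacity(current, arg)
  case arg
  when /\A\+(\d+)\z/ then current + Regexp.last_match(1).to_i
  when /\A-(\d+)\z/  then current - Regexp.last_match(1).to_i
  when /\A\d+\z/     then arg.to_i
  else raise ArgumentError, "invalid capacity argument: #{arg}"
  end
end

p key_block_capacity(500)                            # => 503
p key_block_capacity(3)                              # => 5
p key_block_capacity(adjusted_capacity(499, "+10"))  # => 509
```

Assuming the rule applies to kcap, the default of 499 is already prime and is used unchanged; the rounding only matters for values such as 500.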
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.