Shell Programming and Scripting, Post 302952698 by ajayram on Friday 21st of August 2015 02:57:11 AM
Need to Preprocess a text file and convert into csv

Hello,

I am working on machine learning and would like to apply my regression algorithms to binary classification datasets.

So I came across the adult dataset on the LIBSVM Data: Classification (Binary Class) page.

It is a binary dataset: the features take only the values 1 and 0.

I wanted to download and use it. However, it is not in CSV format.
It is in this format:

Code:
-1 5:1 7:1 14:1 19:1 39:1 40:1 51:1 63:1 67:1 73:1 74:1 76:1 78:1 83:1 
-1 3:1 6:1 17:1 22:1 36:1 41:1 53:1 64:1 67:1 73:1 74:1 76:1 80:1 83:1 
-1 5:1 6:1 17:1 21:1 35:1 40:1 53:1 63:1 71:1 73:1 74:1 76:1 80:1 83:1 
-1 2:1 6:1 18:1 19:1 39:1 40:1 52:1 61:1 71:1 72:1 74:1 76:1 80:1 95:1 
-1 3:1 6:1 18:1 29:1 39:1 40:1 51:1 61:1 67:1 72:1 74:1 76:1 80:1 83:1 
-1 4:1 6:1 16:1 26:1 35:1 45:1 49:1 64:1 71:1 72:1 74:1 76:1 78:1 101:1 
+1 5:1 7:1 17:1 22:1 36:1 40:1 51:1 63:1 67:1 73:1 74:1 76:1 81:1 83:1 
+1 2:1 6:1 14:1 29:1 39:1 42:1 52:1 64:1 67:1 72:1 75:1 76:1 82:1 83:1 
+1 4:1 6:1 16:1 19:1 39:1 40:1 51:1 63:1 67:1 73:1 75:1 76:1 80:1 83:1 
+1 3:1 6:1 18:1 20:1 37:1 40:1 51:1 63:1 71:1 73:1 74:1 76:1 82:1 83:1 
+1 2:1 11:1 15:1 19:1 39:1 40:1 52:1 63:1 68:1 73:1 74:1 76:1 80:1 90:1

So the first field on each line is the class variable, and the rest of the row
lists, as index:value pairs, which columns are 1.

How do I convert this to a CSV in which the columns that are 0 also appear?
For example, for the input row -1 5:1 7:1 14:1, I should get this output row:
Code:
-1 0 0 0 0 1 0 1 0 0 0 0 0 0 1

Maybe a shell script with some awk programming would be needed.
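
A minimal sketch of what such an awk-based conversion could look like, not a tested solution for the full dataset. It assumes the total number of feature columns is known in advance and passed in as `ncols`; the function name `libsvm_to_csv` is made up for illustration. It writes comma-separated output, since CSV is the goal, whereas the example row above is shown space-separated.

```shell
#!/bin/sh
# Sketch: expand sparse rows ("label idx:val idx:val ...") into dense CSV rows.
# libsvm_to_csv is a hypothetical helper name. The first argument is the number
# of feature columns (assumed known); remaining arguments are input files.
# With no files given, input is read from stdin.
libsvm_to_csv() {
    ncols=$1
    shift
    awk -v ncols="$ncols" '
    NF == 0 { next }                      # skip empty lines
    {
        # Start every row with all feature columns set to 0.
        for (i = 1; i <= ncols; i++) row[i] = 0

        # Fields 2..NF look like "index:value"; store each value at its index.
        for (f = 2; f <= NF; f++) {
            split($f, kv, ":")
            row[kv[1]] = kv[2]
        }

        # Print the class label followed by the dense feature columns.
        line = $1
        for (i = 1; i <= ncols; i++) line = line "," row[i]
        print line
    }' "$@"
}
```

For the sample row, `printf '%s\n' '-1 5:1 7:1 14:1' | libsvm_to_csv 14` prints `-1,0,0,0,0,1,0,1,0,0,0,0,0,0,1`. For a whole file, `ncols` has to be at least the largest index that occurs anywhere in it, otherwise columns beyond `ncols` are silently dropped. To get space-separated output instead, replace the `","` in the script with `" "`.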

Can someone help me out?

Last edited by Don Cragun; 08-21-2015 at 04:17 AM.. Reason: Add CODE and ICODE tags.
 
