Need to Preprocess a text file and convert into csv
Post 302952698 by ajayram on Friday 21st of August 2015 02:57:11 AM

Hello,

I am working with machine learning and would like to apply my regression algorithms to binary classification datasets.

I came across the adult dataset on the LIBSVM Data: Classification (Binary Class) page.

It is a binary dataset: every feature takes only the values 0 and 1.

I wanted to download and use it, but it is not in CSV format.
It is in this format:

Code:
-1 5:1 7:1 14:1 19:1 39:1 40:1 51:1 63:1 67:1 73:1 74:1 76:1 78:1 83:1 
-1 3:1 6:1 17:1 22:1 36:1 41:1 53:1 64:1 67:1 73:1 74:1 76:1 80:1 83:1 
-1 5:1 6:1 17:1 21:1 35:1 40:1 53:1 63:1 71:1 73:1 74:1 76:1 80:1 83:1 
-1 2:1 6:1 18:1 19:1 39:1 40:1 52:1 61:1 71:1 72:1 74:1 76:1 80:1 95:1 
-1 3:1 6:1 18:1 29:1 39:1 40:1 51:1 61:1 67:1 72:1 74:1 76:1 80:1 83:1 
-1 4:1 6:1 16:1 26:1 35:1 45:1 49:1 64:1 71:1 72:1 74:1 76:1 78:1 101:1 
+1 5:1 7:1 17:1 22:1 36:1 40:1 51:1 63:1 67:1 73:1 74:1 76:1 81:1 83:1 
+1 2:1 6:1 14:1 29:1 39:1 42:1 52:1 64:1 67:1 72:1 75:1 76:1 82:1 83:1 
+1 4:1 6:1 16:1 19:1 39:1 40:1 51:1 63:1 67:1 73:1 75:1 76:1 80:1 83:1 
+1 3:1 6:1 18:1 20:1 37:1 40:1 51:1 63:1 71:1 73:1 74:1 76:1 82:1 83:1 
+1 2:1 11:1 15:1 19:1 39:1 40:1 52:1 63:1 68:1 73:1 74:1 76:1 80:1 90:1

The first field on each line is the class label (-1 or +1). The remaining index:value pairs indicate which columns are 1, and every column that is not listed is 0.

How do I convert this to a CSV in which the columns that are 0 also appear?
For example, for the input row -1 5:1 7:1 14:1, I should get this output row:
Code:
-1 0 0 0 0 1 0 1 0 0 0 0 0 0 1

Maybe a shell script with some awk programming would be needed.

Can someone help me out?

Last edited by Don Cragun; 08-21-2015 at 04:17 AM. Reason: Add CODE and ICODE tags.
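The poster suggested awk. Below is a minimal sketch of the same idea in Python. It assumes that every line has the form `label index:value ...` with 1-based indices, as in the sample above.

A CSV needs the same number of columns on every row. Unless `n_features` is given, the sketch therefore uses the largest index found anywhere in the file, which means it holds the parsed file in memory. The script name `sparse2csv.py` and the helper names are hypothetical.

```python
import csv
import sys
from typing import Dict, Iterable, List, Optional, Tuple


def parse_line(line: str) -> Tuple[str, Dict[int, str]]:
    """Split one LIBSVM line into its label and an {index: value} map."""
    fields = line.split()
    label = fields[0]  # copied as-is, e.g. "-1" or "+1"
    features: Dict[int, str] = {}
    for pair in fields[1:]:
        index, value = pair.split(":", 1)
        features[int(index)] = value
    return label, features


def sparse_to_dense_rows(lines: Iterable[str],
                         n_features: Optional[int] = None) -> List[List[str]]:
    """Expand sparse lines into dense rows: label, then columns 1..n_features."""
    parsed = [parse_line(line) for line in lines if line.strip()]
    if n_features is None:
        # Use the highest index seen anywhere so every row has the same width.
        n_features = max((max(f) for _, f in parsed if f), default=0)
    rows = []
    for label, features in parsed:
        if features and max(features) > n_features:
            raise ValueError(f"index {max(features)} exceeds n_features={n_features}")
        # Indices are 1-based; any index not listed on the line becomes "0".
        rows.append([label] + [features.get(i, "0") for i in range(1, n_features + 1)])
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python sparse2csv.py input.txt [n_features] > output.csv
    width = int(sys.argv[2]) if len(sys.argv) > 2 else None
    with open(sys.argv[1]) as src:
        rows = sparse_to_dense_rows(src, width)
    csv.writer(sys.stdout, lineterminator="\n").writerows(rows)
```

Take the sample row from the question. `sparse_to_dense_rows(["-1 5:1 7:1 14:1"])` yields the label followed by 14 columns, which the CSV writer emits as `-1,0,0,0,0,1,0,1,0,0,0,0,0,0,1`.

If you convert a training file and a test file separately, pass the same `n_features` to both. Otherwise, a file whose rows happen not to use the highest index will come out narrower.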
 

svm-predict(1)							   User Manuals 						    svm-predict(1)

NAME
svm-predict - make predictions based on a trained SVM model file and test data

SYNOPSIS
svm-predict [ -b probability_estimates ] [ -q ] test_data model_file [ output_file ]

DESCRIPTION
svm-predict uses a Support Vector Machine specified by a given input model_file to make predictions for each of the samples in test_data. The format of this file is identical to the training_data file used in svm-train(1) and is just a sparse vector as follows:

<label> <index1>:<value1> <index2>:<value2> ...
.
.
.

There is one sample per line. Each sample consists of a target value (label or regression target) followed by a sparse representation of the input vector. All unmentioned coordinates are assumed to be 0. For classification, <label> is an integer indicating the class label (multi-class is supported). For regression, <label> is the target value, which can be any real number. For one-class SVM, it is not used, so it can be any number. Except when using precomputed kernels (explained in another section), <index>:<value> gives a feature (attribute) value. <index> is an integer starting from 1 and <value> is a real number. Indices must be in ASCENDING order.

If you have label data available for testing, you can enter these values in the test_data file. If they are not available, you can enter 0 as the label. You will then not know the real accuracy of the SVM directly, but you can still get its prediction for each data point.

If output_file is given, it will be used to specify the filename to store the predicted results, one per line, in the same order as the test_data file.

OPTIONS
-b probability_estimates
probability_estimates is a binary value indicating whether to predict probability estimates. Values are 0 or 1; the default is 0 for speed.
-q
quiet mode; suppress messages to stdout.

FILES
training_set_file must be prepared in the following simple sparse training vector format:

<label> <index1>:<value1> <index2>:<value2> ...
.
.
.

There is one sample per line. Each sample consists of a target value (label or regression target) followed by a sparse representation of the input vector. All unmentioned coordinates are assumed to be 0. For classification, <label> is an integer indicating the class label (multi-class is supported). For regression, <label> is the target value, which can be any real number. For one-class SVM, it is not used, so it can be any number. Except when using precomputed kernels (explained in another section), <index>:<value> gives a feature (attribute) value. <index> is an integer starting from 1 and <value> is a real number. Indices must be in ASCENDING order.

ENVIRONMENT
No environment variables.

DIAGNOSTICS
None documented; see Vapnik et al.

BUGS
Please report bugs to the Debian BTS.

AUTHOR
Chih-Chung Chang, Chih-Jen Lin <cjlin@csie.ntu.edu.tw>, Chen-Tse Tsai <ctse.tsai@gmail.com> (packaging)

SEE ALSO
svm-train(1), svm-scale(1)

Linux    MAY 2006    svm-predict(1)
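The DESCRIPTION section of the svm-predict page above sets out the format rules: label first, 1-based integer indices, real-valued values, and indices in ascending order. These are easy to check before handing a file to svm-predict.

Below is a minimal Python sketch for the ordinary (non-precomputed-kernel) case. `check_sparse_line` is a hypothetical helper, not part of LIBSVM. Treating a repeated index as out of order is this sketch's own assumption.

```python
from typing import Dict, Tuple


def check_sparse_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one '<label> <index>:<value> ...' line, enforcing the documented rules.

    Raises ValueError if a pair lacks ':', an index is not an integer >= 1,
    a value is not a real number, or indices are not strictly ascending.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line: expected one sample per line")
    label = float(fields[0])  # class label, regression target, or any number (one-class)
    features: Dict[int, float] = {}
    previous = 0
    for pair in fields[1:]:
        index_text, sep, value_text = pair.partition(":")
        if not sep:
            raise ValueError(f"expected <index>:<value>, got {pair!r}")
        index = int(index_text)  # raises ValueError if not an integer
        if index < 1:
            raise ValueError(f"index {index} must start from 1")
        if index <= previous:  # assumption: duplicates count as out of order
            raise ValueError(f"index {index} is not in ascending order after {previous}")
        features[index] = float(value_text)
        previous = index
    return label, features
```

For instance, `check_sparse_line("+1 2:1 11:1 15:1")` returns `(1.0, {2: 1.0, 11: 1.0, 15: 1.0})`. By contrast, `"-1 7:1 5:1"` raises `ValueError` because 5 follows 7.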