Visit Our UNIX and Linux User Community

CSV File parse help in Perl

Thread Tools Search this Thread
Top Forums Shell Programming and Scripting CSV File parse help in Perl
# 1  
Old 03-29-2008
CSV File parse help in Perl


I have a bit of an issue trying to obtain some data from a csv file using PERL. I can sort the file and remove any duplicates leaving only 4 or 5 rows containing data. My problem is that the data contained in the original file contains a lot more columns and when I try ro run this script it finds that all the data is unique.

I have the following fields within the orignal file:

The data which I need to obtain and sort is within the op,probecard and tester_id fields.

How can I go about this?

The code that I use after manually deleting the fields that i do not require is as follows:

#!/usr/bin/perl -w
use strict;
my $csvfile = 'probecards.csv';
my $newfile = 'sorted.csv';
my $fieldnames = 1;
open (IN, "<$csvfile")  or die "Couldn't open input CSV file: $!";
open (OUT, ">$newfile") or die "Couldn't open output file: $!";
my $header;
$header = <IN> if $fieldnames;
my @data = sort <IN>;
print OUT $header;
my $n = 0;
my $lastline = '';
foreach my $currentline (@data) {
  next if $currentline eq $lastline;
  print OUT $currentline;
  $lastline = $currentline;
close IN; close OUT;
print "Processing complete. In = " . scalar @data . " records, Out = $n records\n";

Sample CSV File:
Original File:

12630M196	139	2660	S25E3N36	88BCRA	16/03/2008 12:05	IN01
12630M196	1	2660	S25E3N36	88BLBHD	16/03/2008 13:04	IN04
12630M196	508	2660	S25E3N36	88BCRA	16/03/2008 13:41	IN01
12630M196	437	2660	S25E3N36	88CLNHC	16/03/2008 14:18	IN01
12630M196	465	2660	S25E3N36	88BCRA	16/03/2008 15:34	IN02
12630M196	27	2660	S25E3N36	88BCRA	16/03/2008 18:00	IN01
12630M196	18	2660	S25E3N36	88BCRA	16/03/2008 19:03	IN03


2660	S25E3N21	IN04
2660	S25E3N27	IN02
2660	S25E3N36	IN01
2660	S25E3N39	IN03
2660	S25E3N40	IN04

Any pointers on how I could go about this would be greatly appreciated..


# 2  
Old 03-29-2008
That's not a CSV file, it means literally comma-separated values. Traditionally CVS also uses optional double quotes around the values, and then, depending on which wicked version of Excel it's supposed to be incompatible with, more or less insane conventions for values with double quotes in them. Newlines inside a field are also sometimes permitted in CSV.

Is the data you want always in fields 3, 4, and 7?

cut -f3,4,7 infile

If you require this in Perl, how about

perl -naF"\t" -e 'print (join ("\t", map { $F[$_] } (2, 3, 6)), "\n")'

# 3  
Old 03-29-2008
I guess the actual problem is that you are comparing the lines before throwing out the fields you don't want to include in the comparison. Before you print, copy the print value to a hash, and if the hash already contains the value, don't print. (Or take the Perl script I gave you and pipe to sort -u -- that's what I would do.)
# 4  
Old 03-29-2008
Where would I put the line in the code?
# 5  
Old 03-29-2008
cut -f3,4,7 infile | sort -u

That's the whole program.
# 6  
Old 03-29-2008
sorry to be annoying but how does the following perl code work?
perl -naF"\t" -e 'print (join ("\t", map { $F[$_] } (2, 3, 6)), "\n")'

If i inputed into my code would the { $F[$_] } part need to be made { $F[$csvfile] } ?

I am new to perl so its a learning experience for me Smilie


# 7  
Old 03-29-2008
That's a somewhat idiosyncratic script, I'm not sure which parts are suitable for a person of your age.

(2, 3, 6) is a list. Map takes the function you give it and applies it to each item on the list in turn, and returns the result as a new list. Within map, $_ refers to the current item. So the map returns $F[2] followed by $F[3] followed by $F[6]. The rest is just formatting.

Oh, the final crucial point is the option -a (mnemonic: more like awk) which says to split the input into @F and, -F "\t" which says to do that splitting on the tab character.

Last edited by era; 03-29-2008 at 03:19 PM..

Previous Thread | Next Thread
Test Your Knowledge in Computers #343
Difficulty: Medium
Scott McNealy, Andy Bechtolsheim, and Vinod Khosla, all Caltech graduate students, founded Sun Microsystems.
True or False?

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to parse this file using awk and output in CSV format?

My source file looks like this: Cust-Number = "101" Cust-Name="Joe" Cust-Town="London" Cust-hobby="tennis" Cust-purchase="200" Cust-Number = "102" Cust-Name="Mary" Cust-Town="Newyork" Cust-hobby="reading" Cust-purchase="125" Now I want to parse this file (leaving out hobby) and... (10 Replies)
Discussion started by: Balav
10 Replies

2. Shell Programming and Scripting

Consolidate several lines of a CSV file with firewall rules, in order to parse them easier?

Consolidate several lines of a CSV file with firewall rules Hi guys. I have a CSV file, which I created using an HTML export from a Check Point firewall policy. Each rule is represented as several lines, in some cases. That occurs when a rule has several address sources, destinations or... (4 Replies)
Discussion started by: starriol
4 Replies

3. UNIX for Dummies Questions & Answers

Help to parse csv file with shell script

Hello ! I am very aware that this is not the first time this question is asked here, because I have already read a lot of previous answers, but none of them worked, so... As said in the title, I want to read a csv file with a bash script. Here is a sample of the file: ... (4 Replies)
Discussion started by: Grhyll
4 Replies

4. Shell Programming and Scripting

Korn shell program to parse CSV text file and insert values into Oracle database

Enclosed is comma separated text file. I need to write a korn shell program that will parse the text file and insert the values into Oracle database. I need to write the korn shell program on Red Hat Enterprise Linux server. Oracle database is 10g. (15 Replies)
Discussion started by: shellguy
15 Replies

5. Shell Programming and Scripting

how to parse this file and obtain a .csv or .xls

Hello Expert, I have a file in the following format: SYNTAX_VERSION 5 MONITOR "NAME_TEMPLATES" DESCRIPTION "Monitors for contents of error " INTERVAL "1m" MONPROG " NAME_TEMPLATES" MAXTHRESHOLD GEN_BELOW_RESET SEVERITY Major ... (17 Replies)
Discussion started by: Ant-one
17 Replies

6. Shell Programming and Scripting

How to read and parse the content of csv file containing # as delimeter into fields using Bash?

#!/bin/bash i=0 cat 1.csv | while read fileline do echo "$fileline" IFS="#" flds=( $fileline ) nrofflds=${#flds} echo "noof fields$nrofflds" fld=0 while do echo "noof counter$fld" echo "$nrofflds" #fld1="${flds}" trying to store the content of line to fields but i... (4 Replies)
Discussion started by: barani75
4 Replies

7. Shell Programming and Scripting

Parse csv file

Hi, Our requirement is to parse the input file(.csv format). The each column in the file is delimited with comma. We need to take each column and apply some business validation rule. If data itself contains comma, then those fields are enclosed with double quotes ("). We can see this double... (7 Replies)
Discussion started by: vfrg
7 Replies

8. Shell Programming and Scripting

Parse XML file into CSV with shell?

Hi, It's been a few years since college when I did stuff like this all the time. Can someone help me figure out how to best tackle this problem? I need to parse a file full of entries that look like this: <eq action="A" sectyType="0" symbol="PGR" exch="CA" curr="VEF" sess="NORM"... (7 Replies)
Discussion started by: Pcushing
7 Replies

9. Shell Programming and Scripting

parse csv file, sha1 hash and output

I have a file, not really a csv, but containing delineated data just the same. Lets call that file "raw_data.txt". It contains data in the format of company name:fein number like this: first company name:123456789 second company name:987654321 what i need to do is read this file, apply... (11 Replies)
Discussion started by: FreddyG
11 Replies

10. UNIX for Advanced & Expert Users

How to Parse a CSV file into a Different Format

Hi I have a CSV file with me in this format Currency, USD, EUR, USD, 1.00, 1.32, EUR, 0.66, 1.00, How do I transpose the file to get to the format below. currency, currency, rate USD, USD, 1.00 USD, EUR, 1.32 EUR, USD, 0.66 EUR, EUR, 1.00 Thanks for your help We are using... (2 Replies)
Discussion started by: cdesiks
2 Replies

Featured Tech Videos