Need help with error from certain ASCII characters in a CSV file

Hello all,
I am using a script to parse a CSV file with approximately 65,000
records. Some of these records contain characters such as , , etc.
I can read and write lines containing these characters via a file
handle, however when I try and parse the line using the module
Class::CSV, it fails and returns the error:
Failed to parse line: <line it failed on>
My question, rather broadly, is how can I successfully handle these
characters? If I manually edit the file (via vim) and replace all the
'' characters with 'e', and the like, it works fine. However, I
would prefer the script actually handle the characters, or at least
have an automated way to strip them out, without having to enumerate
each character (i.e. =~ s//e/g;).
Does anyone have a suggestion for how I can handle this, or even where
I can look to solve this issue? Is there another possibility for
where the error is occurring that I am not seeing?
Any advice you may lend would be greatly appreciated.
Regards,
Roman
Code follows:
#!\perl\perl.exe
use warnings;
use strict;
use Class::CSV;
# Main program area
my $UPath = "U=EXTERNAL,DC=PRIV,DC=RED,DC=BG,DC=CRD,DC=RG,DC=VP ";
my $error_file = "csv_errors.csv";
if (-e $error_file)
{
open(ERRR, ">$error_file") or die "Cannot open error log file: $!\n";
close(ERRR);
}
my $csv_filename = $ARGV[0] or die "USAGE: csv_perl.pl FILENAME\n";
open(CSV, $csv_filename) or die "Cannot open $csv_filename: $!\n";
my $debug = &promptUserForInformation($UPath);
my $program_time = time();
chomp(my $header = <CSV>);
$header = &processHeader($header);
{ # Print header
my $output = "objectClass," .
"DN," .
"displayName," .
"sn," .
"givenName," .
"initials," .
"title," .
"company," .
"department," .
"physicalDName," .
"telephoneNumber," .
"mailNickname," .
"mail," .
"targetAddress," .
"proxyAddresses," .
"mAPIRecipient";
print($output,"\n");
}
my @header = split(/,/,$header);
close(CSV);
# I believe the error happens here
my $csv_file = Class::CSV->parse(
filename => $csv_filename,
fields => [@header]
);
{# Lexical block for @lines
my @lines = @{$csv_file->lines()};
shift @lines;# Strips off the header line before processing
$csv_file->lines(\@lines);
}
<snip>
No.1
Jul 6, 2006, at 10:41, Roman Daszczyszak wrote:
Does anyone have a suggestion for how I can handle this, or even where
I can look to solve this issue? Is there another possibility for
where the error is occurring that I am not seeing?
I would create the filehandle specifying the character encoding in
the open call (as documented in perldoc -f open). I didn't test this,
just an idea.
-- fxn
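A minimal sketch of that idea, assuming the file is ISO-8859-1 (the thread does not confirm the actual encoding). The sample data and temporary file are hypothetical and exist only to make the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: one CSV line containing byte 0xE9
# (e with acute accent in ISO-8859-1).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);                        # write raw bytes, no translation
print {$tmp_fh} "Ren\xE9,Dupont\n";
close($tmp_fh) or die "Cannot close $tmp_name: $!\n";

# Three-argument open with an explicit encoding layer.
# The encoding name is an assumption; use the one your file really has.
open(my $in, '<:encoding(iso-8859-1)', $tmp_name)
    or die "Cannot open $tmp_name: $!\n";
chomp(my $line = <$in>);
close($in);

# The line now holds decoded characters rather than raw bytes.
my @fields = split /,/, $line;
printf "first field has %d characters\n", length $fields[0];
```

The decoding layer only affects how the line is read; whether the CSV parser then accepts the decoded characters is a separate question.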
No.2
Roman Daszczyszak wrote:
Hello all,
I am using a script to parse a CSV file with approximately 65,000
records. Some of these records contain characters such as , , etc.
I can read and write lines containing these characters via a file
handle, however when I try and parse the line using the module
Class::CSV, it fails and returns the error:
Failed to parse line: <line it failed on>
[]
It might help to specify the encoding at the top of your program with
"use encoding 'iso-8859-1';"
perldoc encoding
No.3
Roman Daszczyszak wrote:
Hello all,
Hello,
I am using a script to parse a CSV file with approximately 65,000
records. Some of these records contain characters such as , , etc.
I can read and write lines containing these characters via a file
handle, however when I try and parse the line using the module
Class::CSV, it fails and returns the error:
Failed to parse line: <line it failed on>
The characters you describe are not ASCII which is why the module is having
problems with them.
According to the documentation for Class::CSV "Text::CSV_XS is used for
parsing and creating CSV file lines, so any limitations in Text::CSV_XS will
of course be inherant in this module.", and according to the documentation for
Text::CSV_XS:
<quote>
new(\%attr)
[snip]
binary
If this attribute is TRUE, you may use binary characters in quoted fields,
including line feeds, carriage returns and NUL bytes. (The latter must be
escaped as "0.) By default this feature is off.
</quote>
So try setting the binary attribute:
my $csv_file = Class::CSV->new( binary => 1 );
if that doesn't work just use the Text::CSV_XS module.
John
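For the fallback of using Text::CSV_XS directly, a rough sketch might look like the following. The sample line is hypothetical, and this only shows the `binary` attribute on a single in-memory line, not the poster's full script:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample line with a Latin-1 byte (0xE9) in the first field.
my $line = "Ren\xE9,Dupont,Paris";

# binary => 1 is the attribute quoted from the documentation above.
my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag();

my @fields;
if ($csv->parse($line)) {
    @fields = $csv->fields();
    print scalar(@fields), " fields parsed\n";
}
else {
    # Report the offending input, much like the error in the original post.
    die "Failed to parse line: " . $csv->error_input() . "\n";
}
```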
No.4
"Roman Daszczyszak" schreef:
#!\perl\perl.exe
Does that do anything useful? Maybe change to just
#!perl
my @lines = @{$csv_file->lines()};
shift @lines; # Strips off the header line before processing
Variant:
my (undef, @lines) = @{$csv_file->lines()};
No.5
Dr.Ruud wrote:
"Roman Daszczyszak" schreef:
#!\perl\perl.exe
Does that do anything useful? Maybe change to just
#!perl
How about:
#!/perl/bin/perl
On MS Windows, perl is usually 'C:\PERL\BIN\PERL.EXE'. You would not
need the '.exe' and I'm quite sure you need a 'bin' somewhere in there.
Normally, you would not need to add this as MS Windows associates *.PL
files with perl. But some programs, like web servers, ignore the
Registry and use the shebang to find the interpreter.
No.6
"Mumia W." schreef:
Roman Daszczyszak:
>I am using a script to parse a CSV file with approximately 65,000
>records. Some of these records contain characters such as , , etc.
>I can read and write lines containing these characters via a file
>handle, however when I try and parse the line using the module
>Class::CSV, it fails and returns the error:
>Failed to parse line: <line it failed on>
>[]
>
It might help to specify the encoding at the top of your program with
"use encoding 'iso-8859-1';"
That changes the encoding of the script, and of STDIN and STDOUT, but
not of any open()s, so it might not be enough.
No.7
Dr.Ruud wrote:
That changes the encoding of the script, and of STDIN and STDOUT, but
not of any open()s, so it might not be enough.
You can open with encoding with the three-argument form. For example:
open(FH, "<:utf8", "file")
See `perldoc -f open` for details. Also, after an open, you can change
the file handle's mode with binmode. See `perldoc -f binmode` for details.
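As a sketch of the binmode route, the example below opens a handle first and sets the layer afterwards. It then shows one possible way to do what the original post asked for, stripping accents without enumerating each character, using the core Unicode::Normalize module. The ISO-8859-1 encoding and the sample data are assumptions, not details from the thread:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Unicode::Normalize qw(NFD);

# Hypothetical sample file: Latin-1 bytes 0xE9 (e-acute) and 0xFC (u-umlaut).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "Ren\xE9,M\xFCller\n";
close($tmp_fh) or die "Cannot close $tmp_name: $!\n";

# Open first, then change the handle's layer with binmode.
open(my $in, '<', $tmp_name) or die "Cannot open $tmp_name: $!\n";
binmode($in, ':encoding(iso-8859-1)') or die "Cannot set layer: $!\n";
chomp(my $line = <$in>);
close($in);

# Decompose each accented letter into base letter + combining mark,
# then delete the combining marks (Unicode general category Mn).
(my $ascii = NFD($line)) =~ s/\p{Mn}//g;
print "$ascii\n";    # prints "Rene,Muller"
```

Letters that have no decomposition into a base letter plus a combining mark (for example "ß" or "ø") pass through unchanged, so this is not a complete ASCII conversion.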