Perl

  • Need help with error from certain ASCII characters in a CSV file

    7 answers - 2490 bytes

    Hello all,
    I am using a script to parse a CSV file with approximately 65,000
    records. Some of these records contain characters such as , , etc.
    I can read and write lines containing these characters via a file
    handle, however when I try and parse the line using the module
    Class::CSV, it fails and returns the error:
    Failed to parse line: <line it failed on>
    My question, rather broadly, is how can I successfully handle these
    characters? If I manually edit the file (via vim) and replace all the
    '' characters with 'e', and the like, it works fine. However, I
    would prefer the script actually handle the characters, or at least
    have a automated way to strip them out, without having to enumerate
    each character (i.e. =~ s//e/g;).
    Does anyone have a suggestion for how I can handle this, or even where
    I can look to solve this issue? Is there another possibility for
    where the error is occurring that I am not seeing?
    Any advice you may lend would be greatly appreciated.
    Regards,
    Roman
    Code follows:
    #!\perl\perl.exe
    use warnings;
    use strict;
    use Class::CSV;
    # Main program area
    my $UPath = "U=EXTERNAL,DC=PRIV,DC=RED,DC=BG,DC=CRD,DC=RG,DC=VP ";
    my $error_file = "csv_errors.csv";
    if (-e $error_file)
    {
    open(ERRR, ">$error_file") or die "Cannot open error log file: $!\n";
    close(ERRR);
    }
    my $csv_filename = $ARGV[0] or die "USAGE: csv_perl.pl FILENAME\n";
    open(CSV, $csv_filename) or die "Cannot open $csv_filename: $!\n";
    my $debug = &promptUserForInformation($UPath);
    my $program_time = time();
    chomp(my $header = <CSV>);
    $header = &processHeader($header);
    { # Print header
    my $output = "objectClass," .
    "DN," .
    "displayName," .
    "sn," .
    "givenName," .
    "initials," .
    "title," .
    "company," .
    "department," .
    "physicalDName," .
    "telephoneNumber," .
    "mailNickname," .
    "mail," .
    "targetAddress," .
    "proxyAddresses," .
    "mAPIRecipient";
    print($output,"\n");
    }
    my @header = split(/,/,$header);
    close(CSV);
    # I believe the error happens here
    my $csv_file = Class::CSV->parse(
    filename => $csv_filename,
    fields => [@header]
    );
    {# Lexical block for @lines
    my @lines = @{$csv_file->lines()};
    shift @lines;# Strips off the header line before processing
    $csv_file->lines(\@lines);
    }
    <snip>
  • No.1 | | 427 bytes | |

    Jul 6, 2006, at 10:41, Roman Daszczyszak wrote:

    Does anyone have a suggestion for how I can handle this, or even where
    I can look to solve this issue? Is there another possibility for
    where the error is occurring that I am not seeing?

    I would create the filehandle specifying the character encoding in
    the open call (as documented in perldoc -f open). I didn't test this,
    just an idea.
    -- fxn
  • No.2 | | 563 bytes | |

    Roman Daszczyszak wrote:
    Hello all,

    I am using a script to parse a CSV file with approximately 65,000
    records. Some of these records contain characters such as , , etc.
    I can read and write lines containing these characters via a file
    handle, however when I try and parse the line using the module
    Class::CSV, it fails and returns the error:
    Failed to parse line: <line it failed on>
    [...]

    It might help to specify the encoding at the top of your program with
    "use encoding 'iso-8859-1';"

    perldoc encoding
  • No.3 | | 1266 bytes | |

    Roman Daszczyszak wrote:
    Hello all,

    Hello,

    I am using a script to parse a CSV file with approximately 65,000
    records. Some of these records contain characters such as , , etc.
    I can read and write lines containing these characters via a file
    handle, however when I try and parse the line using the module
    Class::CSV, it fails and returns the error:
    Failed to parse line: <line it failed on>

    The characters you describe are not ASCII which is why the module is having
    problems with them.

    According to the documentation for Class::CSV "Text::CSV_XS is used for
    parsing and creating CSV file lines, so any limitations in Text::CSV_XS will
    of course be inherant in this module.", and according to the documentation for
    Text::CSV_XS:

    <quote>
    new(\%attr)

    [snip]

    binary

    If this attribute is TRUE, you may use binary characters in quoted fields,
    including line feeds, carriage returns and NUL bytes. (The latter must be
    escaped as "0.) By default this feature is off.
    </quote>

    So try setting the binary attribute:

    my $csv_file = Class::CSV->new( binary => 1 );

    If that doesn't work, just use the Text::CSV_XS module directly.

    John
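
    A minimal sketch of the fallback John mentions, using Text::CSV_XS directly with the binary attribute turned on. The sample line is hypothetical, and the sketch assumes the data is in a single-byte encoding such as ISO-8859-1; how Class::CSV itself forwards options to Text::CSV_XS should be checked in its own documentation.

```perl
use strict;
use warnings;
use Text::CSV_XS;    # CPAN module, not part of core Perl

# binary => 1 allows bytes outside printable ASCII inside fields.
my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag() . "\n";

# Hypothetical sample line containing a Latin-1 e-acute (byte 0xE9).
my $line = "Ren\xE9,Dupont,Paris";

# parse() returns false on failure; error_input() returns the offending text.
$csv->parse($line)
    or die "Failed to parse line: " . $csv->error_input() . "\n";

my @fields = $csv->fields();
print scalar(@fields), " fields parsed\n";
```

    Without the binary attribute, older versions of Text::CSV_XS reject a line like this one, which would match the "Failed to parse line" error in the original post.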
  • No.4 | | 301 bytes | |

    "Roman Daszczyszak" schreef:

    #!\perl\perl.exe

    Does that do anything useful? Maybe change to just
    #!perl

    my @lines = @{$csv_file->lines()};
    shift @lines; # Strips off the header line before processing

    Variant:

    (undef, my @lines) = @{$csv_file->lines()};
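
    The list-assignment variant can be tried on its own. This is only a sketch: $all_lines is a hypothetical stand-in for the array reference that $csv_file->lines() returns in the original script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the array reference returned by $csv_file->lines().
my $all_lines = [ 'header', 'row 1', 'row 2' ];

# Assigning the first element to undef discards it; the rest go into @lines.
(undef, my @lines) = @{$all_lines};

print scalar(@lines), " data lines\n";
```

    Unlike shift, this leaves the original array untouched and copies only the data rows.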
  • No.5 | | 569 bytes | |

    Dr.Ruud wrote:
    "Roman Daszczyszak" schreef:

    #!\perl\perl.exe

    Does that do anything useful? Maybe change to just
    #!perl

    How about:
    #!/perl/bin/perl

    On MS Windows, perl is usually 'C:\PERL\BIN\PERL.EXE'. You would not
    need the '.exe', and I'm quite sure you need a 'bin' somewhere in there.

    Normally, you would not need to add this as MS Windows associates *.PL
    files with perl. But some programs, like web servers, ignore the
    Registry and use the shebang to find the interpreter.
  • No.6 | | 726 bytes | |

    "Mumia W." schreef:
    Roman Daszczyszak:

    >I am using a script to parse a CSV file with approximately 65,000
    >records. Some of these records contain characters such as , , etc.
    >I can read and write lines containing these characters via a file
    >handle, however when I try and parse the line using the module
    >Class::CSV, it fails and returns the error:
    >Failed to parse line: <line it failed on>
    >[...]
    >

    It might help to specify the encoding at the top of your program with
    "use encoding 'iso-8859-1';"

    That changes the encoding of the script, and of STDIN and STDOUT, but
    not of any open()s, so it might not be enough.
  • No.7 | | 412 bytes | |

    Dr.Ruud wrote:
    That changes the encoding of the script, and of STDIN and STDOUT, but
    not of any open()s, so it might not be enough.

    You can open with encoding with the three-argument form. For example:

    open(FH, "<:utf8", "file")

    See `perldoc -f open` for details. Also, after an open, you can change
    the file handle's mode with binmode. See `perldoc -f binmode` for details.
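
    Both approaches can be sketched as follows. The file name is hypothetical, and the sketch assumes the CSV data is ISO-8859-1 (Latin-1) rather than UTF-8; substitute whichever encoding the file really uses.

```perl
use strict;
use warnings;

# Hypothetical file name; the encoding (ISO-8859-1) is an assumption.
my $file = 'latin1_sample.csv';

# Write a sample line containing e-acute (U+00E9) through an encoding layer.
open(my $out, '>:encoding(iso-8859-1)', $file)
    or die "Cannot write $file: $!\n";
print {$out} "caf\x{e9},1\n";
close($out);

# Three-argument open: the layer decodes bytes into characters on read.
open(my $in, '<:encoding(iso-8859-1)', $file)
    or die "Cannot open $file: $!\n";
chomp(my $decoded = <$in>);
close($in);

# Alternative: open without a layer, then set it afterwards with binmode.
open(my $raw, '<', $file) or die "Cannot open $file: $!\n";
binmode($raw, ':encoding(iso-8859-1)');
chomp(my $via_binmode = <$raw>);
close($raw);

unlink $file;    # remove the sample file
```

    Either way the handle yields decoded characters, so the choice mostly depends on whether the encoding is known at the time of the open call.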
