Need some help filtering thru results

Hello,
We need to grab some data from a webpage fetched via the LWP module. The code
is below, followed by a sample of the $resultdata it returns. We need to regex
out various pieces of data, indicated by the [ ] brackets; see below for
further explanation.
My regex skills are not very strong, and I need some help figuring out the
best way to do this.
#!/usr/bin/perl
BEGIN { open (STDERR, ">./mandy_error.log"); }
use CGI::Carp qw(fatalsToBrowser);
use CGI qw(:standard);
use HTTP::Request;
use LWP::UserAgent;
use strict;
my $agent = "Thunder Rain Scraper";
my $adminemail = 'mickalo (AT) frontiernet (DOT) net';
my $urltofetch =
'';
my $resultdata = fetch_results($urltofetch);
print header();
if(defined($resultdata))
{
# process resulting data returned
$resultdata =~ s/&amp;/&/ig;
$resultdata =~ s/&nbsp;/ /ig;
LOOP:
for my $lines ( split(/\n/,$resultdata) )
{
if($lines =~ /<tr class=\"main\"/i) # THIS IS NOT WORKING.
{
# DO STUFF HERE -
}
}
}
else
{
print qq~\nNo Result Data Returned\r\n~;
}
print qq~\nProcess Completed\n~;
exit();
sub fetch_results {
my $url = shift();
# MAIN
my $ua = new LWP::UserAgent; # create a new LWP agent
$ua->from($adminemail); # set HTTP From
$ua->agent($agent); # set Agent-Name
# retrieve the file from $url
my $request = new HTTP::Request GET => $url;
my $response = $ua->request($request);
# return content
if ($response->is_success()) { return $response->content(); }
else { return undef; }
}
__END__
Now, from the data returned, we need to filter out everything except the part
from the <!-- START GRABBING RESULT HERE marker to the
<!-- END RESULT GRABBING HERE marker. Within that part I need to grab the data
inside the [ ] brackets. I inserted the [ ] brackets for clarification; they're
not normally there. The code should go through each
<tr class="main">(.*?)</tr> table row up to the closing </table>.
# FILTER TO RESULTS
A BUNCH OF HEADER STUFF HERE
# START TABLE HERE
<table border="0" width="100%" cellpadding="5" cellspacing="0">
<tr class="dbluetoppedbox" bgcolor="#E6EFF8"><td valign="TOP">
<span
class="main">Vacancy</span>
</td><td valign="TOP"><span class="main">Employer</span>
</td><td valign="TOP" nowrap><span
class="main">
Where (Ad posted)</span></td>
<td valign="TOP"><span class="main">Duration</span></td>
<td valign="TOP" nowrap><span class="main">Pay</span></td>
</tr>
<!-- START GRABBING RESULT HERE
<tr class="main"><td valign="TOP"><a href="[jobs3.cfm?v=18327933]">
[Camera / Video Editor]</a></td><td valign="TOP">[BigbreakNy]</td>
<td valign="TOP">[Manhattan and Union ]([30 Aug ])</td>
<td valign="TOP">[ASAP / A few days of shooting]</td><td
valign="TOP">[Lo/no]</td>
</tr>
# NEXT ROW CELL
<tr class="main"><td valign="TOP"><a href="[jobs3.cfm?v=18326674]">
[Video Sub]</a></td><td valign="TOP">[Blue Man Group]</td><td valign="TOP">[New
York (30 Aug)]
</td><td valign="TOP">[ASAP / open ended]</td><td
valign="TOP">[Paid]</td></tr>
# NEXT ROW CELL
<!-- END RESULT GRABBING HERE
</table>
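To make the goal concrete, here is a minimal sketch of the kind of filtering
described above, using only core Perl regexes. It is one possible approach, not
a tested solution for the real page. The sub name extract_rows, the hash keys
(href, title, employer, where, duration, pay) and the $sample string are
hypothetical names made up for this illustration, and the sketch assumes every
result row has exactly the five cells shown in the sample, with the link in the
first cell. It matches against the whole string instead of splitting on \n,
because a single row spans several lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns one hash ref
# per <tr class="main"> row found between the START and END markers.
sub extract_rows {
    my ($html) = @_;

    # Keep only the text between the two marker comments. The closing "-->"
    # is optional, so the pattern works whether or not the markers have it.
    my ($block) = $html =~ m{
        <!--\s*START\s+GRABBING\s+RESULT\s+HERE\s*(?:-->)?
        (.*?)
        <!--\s*END\s+RESULT\s+GRABBING\s+HERE
    }six;
    return () unless defined $block;

    my @rows;
    # /s lets "." match newlines, since one row is spread over several lines.
    while ($block =~ m{<tr\s+class="main"[^>]*>(.*?)</tr>}sig) {
        my $row = $1;

        # Each <td ...> ... </td> pair is one cell.
        my @cells = $row =~ m{<td\b[^>]*>(.*?)</td>}sig;
        next unless @cells >= 5;    # assumption: five cells per result row

        # First cell holds the link: href is the job URL, link text the title.
        my ($href, $title) =
            $cells[0] =~ m{<a\s+href="([^"]*)"[^>]*>(.*?)</a>}si;

        my @fields = ($href, $title, @cells[1 .. 4]);
        for (@fields) {
            next unless defined;
            s/<[^>]+>//g;       # drop any leftover tags
            s/\s+/ /g;          # collapse newlines and runs of whitespace
            s/^\s+|\s+$//g;     # trim both ends
        }

        my %job;
        @job{qw(href title employer where duration pay)} = @fields;
        push @rows, \%job;
    }
    return @rows;
}

# Made-up sample shaped like the data above, without the [ ] brackets.
my $sample = <<'HTML';
<table>
<!-- START GRABBING RESULT HERE
<tr class="main"><td valign="TOP"><a href="jobs3.cfm?v=18326674">
Video Sub</a></td><td valign="TOP">Blue Man Group</td><td valign="TOP">New
York (30 Aug)
</td><td valign="TOP">ASAP / open ended</td><td
valign="TOP">Paid</td></tr>
<!-- END RESULT GRABBING HERE
</table>
HTML

for my $job (extract_rows($sample)) {
    my @out = map { defined $_ ? $_ : '' }
              @{$job}{qw(href title employer where duration pay)};
    print join(' | ', @out), "\n";
}
```

In the script above, the same loop could be run over $resultdata in place of
$sample. If the real page is less regular than the sample (nested tables,
missing cells), a proper HTML parser would likely be more robust than regexes.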
Mike(mickalo)Blezien
Thunder Rain Internet Publishing
Providing Internet Solution that Work
http://www.thunder-rain.com