Python

  • parsing


    Hi!
    Have a look at BeautifulSoup.
    BeautifulSoup is a Python module designed for parsing HTML.
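
    To give an idea of what such a parser does, here is a rough sketch that uses the standard library's html.parser instead of BeautifulSoup itself, so nothing has to be installed. The names TextCollector, make_key and collect_text are made up for illustration. The key scheme ("id_<id>", "<tag>_<class>", otherwise the tag name) is only my reading of the result asked for below, and it returns a single dictionary rather than a list of one-item dictionaries.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TextCollector(HTMLParser):
    """Group the text of each element under a key derived from its tag."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of (tag, key) for currently open tags
        self.result = {}         # key -> list of unique text fragments

    @staticmethod
    def make_key(tag, attrs):
        # Assumed naming scheme: an id wins, then a class, then the bare tag.
        attrs = dict(attrs)
        if attrs.get("id"):
            return "id_" + attrs["id"]
        if attrs.get("class"):
            return tag + "_" + attrs["class"]
        return tag

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_elements.append((tag, self.make_key(tag, attrs)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[i][0] == tag:
                del self.open_elements[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_elements:
            values = self.result.setdefault(self.open_elements[-1][1], [])
            if text not in values:  # keep each fragment only once
                values.append(text)


def collect_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.result
```

    Calling collect_text(page) on a string holding the page returns something like {'title': [...], 'body': [...], 'id_one': [...]}. Text is filed under the innermost open element, so loose text directly inside <body> ends up under 'body'. BeautifulSoup copes with badly broken markup much better than this sketch does.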
    Carlo
    what is ITER? www.iter.org

    >>
    >>First, please excuse my English. It is not my native language, but
    >>I hope that I will be able to describe my problem.
    >>
    >>I am new to Python for the web, but I want to do the following:
    >>
    >>Suppose I have an HTML page like this:
    >>"""
    >><title>TITLE</title>
    >><body>
    >>body_1
    >><h1>1_1</h1>
    >><h2>2_1</h2>
    >><div id=one>div_one_1</div>
    >><p>p_1</p>
    >><p>p_2</p>
    >><div id=one>div_one_2</div>
    >><span class=sp_1>
    >>sp_text
    >><div id=one>div_one_2</div>
    >><div id=one>div_one_3</div>
    >></span>
    >><h3>3_1</h3>
    >><h2>2_2</h2>
    >><p>p_3</p>
    >>body_2
    >><h1>END</h1>
    >><table>
    >><tr><td>td_1</td>
    >><td class=sp_2>td_2</td>
    >><td>td_3</td>
    >><td>td_4</td></tr>
    >>
    >></body>
    >>
    >>"""
    >>
    >>I want to get all the info from this HTML into a dictionary that
    >>looks like this:
    >>
    >>rezult = [{'title':['TITLE']},
    >>{'body':['body_1', 'body_2']},
    >>{'h1':['1_1', 'END']},
    >>{'h2':['2_1', '2_2']},
    >>{'h3':['3_1']},
    >>{'p':['p_1', 'p_2']},
    >>{'id_one':['div_one_1', 'div_one_2', 'div_one_3']},
    >>{'span_sp_1':['sp_text']},
    >>{'td':['td_1', 'td_3', 'td_4']},
    >>{'td_sp_2':['td_2']},
    >>]
    >>
    >>I hope you understand what I need.
    >>Can you advise me on what approaches exist for solving tasks of this
    >>type, and maybe show some practical examples?
    >>Thanks in advance for any help.
    >>
    >>
    >>
    >>Try ElementTree or Amara.
    >>
    >>
    >>
    >>If you only care about the contents, BeautifulSoup is the answer.
    >>
    >>Ismael
    >>
    >>Tutor maillist - Tutor (AT) python (DOT) org
    >>
    >>
