Databases

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • DISTINCT to get distinct *substrings*?

    5 answers - 712 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Hello to the list,
    here's an SQL question, I hope it's not off topic. From a list of
    URLs I want to get only the distinct values of the *web sites* these
    URLs belong to, that is everything before and including the 3rd
    slash, and I think this should be possible within the DB. I would
    like to say something like
    SELECT substring(attribute from '^http://[^/]*/') from pg_atp where
    attribute like 'http://%'
    (which works) but get only the distinct values. SELECT DISTINCT N
    substring doesn't work. Probably I haven't understood the semantics
    of the DISTINCT keyword. Can anybody help?
    thanks in advance
    Christoph
  • No.1 | | 1042 bytes | |

    SELECT DISTINCT substring(attribute from '^http://[^/]*/') from pg_atp
    where attribute like 'http://%';

    w/o DISTINCT there should be duplicates (if any)

    don't use "DISTINCT N" at all, it's evil :-) (why?
    #A13)

    8/8/06, Christoph Pingel <ch.pingel (AT) web (DOT) dewrote:

    Hello to the list,

    here's an SQL question, I hope it's not off topic. From a list of URLs I
    want to get only the distinct values of the *web sites* these URLs belong
    to, that is everything before and including the 3rd slash, and I think this
    should be possible within the DB. I would like to say something like

    SELECT substring(attribute from '^http://[^/]*/') from pg_atp where
    attribute like 'http://%'

    (which works) but get only the distinct values. SELECT DISTINCT N
    substring doesn't work. Probably I haven't understood the semantics of the
    DISTINCT keyword. Can anybody help?

    thanks in advance
    Christoph
    --
  • No.2 | | 1437 bytes | |

    DISTINCT N is extremely useful when you know what you're doing. It's
    postgres' version of oracle's first_value analytical function, and when
    you need it, nothing else really suffices.

    Tue, 8 Aug 2006, Nikolay Samokhvalov wrote:

    SELECT DISTINCT substring(attribute from '^http://[^/]*/') from pg_atp
    where attribute like 'http://%';

    w/o DISTINCT there should be duplicates (if any)

    don't use "DISTINCT N" at all, it's evil :-) (why?
    #A13)

    8/8/06, Christoph Pingel <ch.pingel (AT) web (DOT) dewrote:
    >
    >Hello to the list,
    >
    >here's an SQL question, I hope it's not off topic. From a list of URLs I
    >want to get only the distinct values of the *web sites* these URLs belong
    >to, that is everything before and including the 3rd slash, and I think this
    >should be possible within the DB. I would like to say something like
    >
    >SELECT substring(attribute from '^http://[^/]*/') from pg_atp where
    >attribute like 'http://%'
    >
    >(which works) but get only the distinct values. SELECT DISTINCT N
    >substring doesn't work. Probably I haven't understood the semantics of
    >the
    >DISTINCT keyword. Can anybody help?
    >
    >thanks in advance
    >Christoph
    >
    >
    >
    >
  • No.3 | | 819 bytes | |

    Am 08.08.2006 um 19:49 schrieb Nikolay Samokhvalov:

    SELECT DISTINCT substring(attribute from '^http://[^/]*/') from pg_atp
    where attribute like 'http://%';

    w/o DISTINCT there should be duplicates (if any)

    don't use "DISTINCT N" at all, it's evil :-) (why?
    #A13)

    Thanks for the good advice! From reading this, it seems to be a
    *really* bad thing. And I didn't get it from the official
    documentation. :-)

    ok, SELECT DISTINCT works, and it seems that the results are ordered
    (by the substring) - is this the default behaviour or just by chance
    (and probably version dependent)?

    best regards,
    Christoph

    (end of broadcast)
    TIP 4: Have you searched our list archives?

    http://archives.postgresql.org
  • No.4 | | 2732 bytes | |

    Thanks for the input, I think I get this now. In my case, the query

    SELECT DISTINCT N (substring(attribute from '^http://[^/]*/'))
    attribute from pg_atp where attribute like 'http://%'

    doesn't get me just the root of the URL, but the whole URL - but only
    for the first row for each individual root. While

    SELECT DISTINCT substring(attribute from '^http://[^/]*/') from
    pg_atp where attribute like 'http://%'

    does what I first intended - get a list of all (distinct) root URLs.
    Wieder was gelernt. (Learnt something again. :-)

    best regards,
    Christoph

    Am 08.08.2006 um 20:36 schrieb Ben:

    DISTINCT N is extremely useful when you know what you're doing.
    It's postgres' version of oracle's first_value analytical function,
    and when you need it, nothing else really suffices.

    Tue, 8 Aug 2006, Nikolay Samokhvalov wrote:
    >
    >SELECT DISTINCT substring(attribute from '^http://[^/]*/') from
    >pg_atp
    >where attribute like 'http://%';
    >>

    >w/o DISTINCT there should be duplicates (if any)
    >>

    >don't use "DISTINCT N" at all, it's evil :-) (why?
    >#A13)
    >>

    >8/8/06, Christoph Pingel <ch.pingel (AT) web (DOT) dewrote:

    Hello to the list,
    here's an SQL question, I hope it's not off topic. From a list of
    URLs I
    want to get only the distinct values of the *web sites* these
    URLs belong
    to, that is everything before and including the 3rd slash, and I
    think this
    should be possible within the DB. I would like to say something like
    SELECT substring(attribute from '^http://[^/]*/') from pg_atp where
    attribute like 'http://%'
    (which works) but get only the distinct values. SELECT DISTINCT N
    substring doesn't work. Probably I haven't understood the
    semantics of the
    DISTINCT keyword. Can anybody help?
    thanks in advance
    Christoph
    >>
    >>

    >--
    >Best regards,
    >Nikolay
    >>

    >(end of
    >broadcast)
    >TIP 1: if posting/reading through Usenet, please send an appropriate
    >subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that
    >your
    >message can get through to the mailing list cleanly
    >>

    >

    (end of
    broadcast)
    TIP 2: Don't 'kill -9' the postmaster
  • No.5 | | 935 bytes | |

    Christoph Pingel <ch.pingel (AT) web (DOT) dewrites:
    Am 08.08.2006 um 19:49 schrieb Nikolay Samokhvalov:
    >don't use "DISTINCT N" at all, it's evil :-) (why?
    >#A13


    Thanks for the good advice! From reading this, it seems to be a
    *really* bad thing. And I didn't get it from the official
    documentation. :-)

    That page is complaining about DISTINCT N as it was defined in 1999.
    It's a lot harder to shoot yourself in the foot now:

    regression=# select distinct on (ten) hundred from tenk1 order by unique2;
    ERRR: SELECT DISTINCT N expressions must match initial RDER BY expressions

    I don't deny that it's nonstandard and pretty ugly, but sometimes it's
    just really hard to solve a problem any other way.

    regards, tom lane

    (end of broadcast)
    TIP 3: Have you checked our extensive FAQ?

Re: DISTINCT to get distinct *substrings*?


max 4000 letters.
Your nickname that display:
In order to stop the spam: 8 + 7 =
QUESTION ON "Databases"

EMSDN.COM