www.emsdn.com
Class Profile: Home »» Databases [Databases] under "Databases" »»» DISTINCT to get distinct *substrings*?

DISTINCT to get distinct *substrings*?


Hello to the list,
here's an SQL question, I hope it's not off topic. From a list of
URLs I want to get only the distinct values of the *web sites* these
URLs belong to, that is everything before and including the 3rd
slash, and I think this should be possible within the DB. I would
like to say something like
SELECT substring(attribute from '^http://[^/]*/') from pg_atp where
attribute like 'http://%'
(which works) but get only the distinct values. SELECT DISTINCT N
substring doesn't work. Probably I haven't understood the semantics
of the DISTINCT keyword. Can anybody help?
thanks in advance
Christoph


No. 1# | By Developer Tags User at [2008-5-5] | size: 1042 bytes

SELECT DISTINCT substring(attribute from '^http://[^/]*/') from pg_atp
where attribute like 'http://%';

w/o DISTINCT there should be duplicates (if any)

don't use "DISTINCT N" at all, it's evil :-) (why?
#A13)

8/8/06, Christoph Pingel <ch.pingel (AT) web (DOT) dewrote:

Hello to the list,

here's an SQL question, I hope it's not off topic. From a list of URLs I
want to get only the distinct values of the *web sites* these URLs belong
to, that is everything before and including the 3rd slash, and I think this
should be possible within the DB. I would like to say something like

SELECT substring(attribute from '^http://[^/]*/') from pg_atp where
attribute like 'http://%'

(which works) but get only the distinct values. SELECT DISTINCT N
substring doesn't work. Probably I haven't understood the semantics of the
DISTINCT keyword. Can anybody help?

thanks in advance
Christoph
--

No. 1# | By Developer Tags User at [2008-5-5] | size: 1437 bytes

DISTINCT N is extremely useful when you know what you're doing. It's
postgres' version of oracle's first_value analytical function, and when
you need it, nothing else really suffices.

Tue, 8 Aug 2006, Nikolay Samokhvalov wrote:

SELECT DISTINCT substring(attribute from '^http://[^/]*/') from pg_atp
where attribute like 'http://%';

w/o DISTINCT there should be duplicates (if any)

don't use "DISTINCT N" at all, it's evil :-) (why?
#A13)

8/8/06, Christoph Pingel <ch.pingel (AT) web (DOT) dewrote:
>
>Hello to the list,
>
>here's an SQL question, I hope it's not off topic. From a list of URLs I
>want to get only the distinct values of the *web sites* these URLs belong
>to, that is everything before and including the 3rd slash, and I think this
>should be possible within the DB. I would like to say something like
>
>SELECT substring(attribute from '^http://[^/]*/') from pg_atp where
>attribute like 'http://%'
>
>(which works) but get only the distinct values. SELECT DISTINCT N
>substring doesn't work. Probably I haven't understood the semantics of
>the
>DISTINCT keyword. Can anybody help?
>
>thanks in advance
>Christoph
>
>
>
>

No. 1# | By Developer Tags User at [2008-5-5] | size: 819 bytes

Am 08.08.2006 um 19:49 schrieb Nikolay Samokhvalov:

SELECT DISTINCT substring(attribute from '^http://[^/]*/') from pg_atp
where attribute like 'http://%';

w/o DISTINCT there should be duplicates (if any)

don't use "DISTINCT N" at all, it's evil :-) (why?
#A13)

Thanks for the good advice! From reading this, it seems to be a
*really* bad thing. And I didn't get it from the official
documentation. :-)

ok, SELECT DISTINCT works, and it seems that the results are ordered
(by the substring) - is this the default behaviour or just by chance
(and probably version dependent)?

best regards,
Christoph

(end of broadcast)
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

No. 1# | By Developer Tags User at [2008-5-5] | size: 2732 bytes

Thanks for the input, I think I get this now. In my case, the query

SELECT DISTINCT N (substring(attribute from '^http://[^/]*/'))
attribute from pg_atp where attribute like 'http://%'

doesn't get me just the root of the URL, but the whole URL - but only
for the first row for each individual root. While

SELECT DISTINCT substring(attribute from '^http://[^/]*/') from
pg_atp where attribute like 'http://%'

does what I first intended - get a list of all (distinct) root URLs.
Wieder was gelernt. (Learnt something again. :-)

best regards,
Christoph

Am 08.08.2006 um 20:36 schrieb Ben:

DISTINCT N is extremely useful when you know what you're doing.
It's postgres' version of oracle's first_value analytical function,
and when you need it, nothing else really suffices.

Tue, 8 Aug 2006, Nikolay Samokhvalov wrote:
>
>SELECT DISTINCT substring(attribute from '^http://[^/]*/') from
>pg_atp
>where attribute like 'http://%';
>>

>w/o DISTINCT there should be duplicates (if any)
>>

>don't use "DISTINCT N" at all, it's evil :-) (why?
>#A13)
>>

>8/8/06, Christoph Pingel <ch.pingel (AT) web (DOT) dewrote:

Hello to the list,
here's an SQL question, I hope it's not off topic. From a list of
URLs I
want to get only the distinct values of the *web sites* these
URLs belong
to, that is everything before and including the 3rd slash, and I
think this
should be possible within the DB. I would like to say something like
SELECT substring(attribute from '^http://[^/]*/') from pg_atp where
attribute like 'http://%'
(which works) but get only the distinct values. SELECT DISTINCT N
substring doesn't work. Probably I haven't understood the
semantics of the
DISTINCT keyword. Can anybody help?
thanks in advance
Christoph
>>
>>

>--
>Best regards,
>Nikolay
>>

>(end of
>broadcast)
>TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to majordomo (AT) postgresql (DOT) org so that
>your
>message can get through to the mailing list cleanly
>>

>

(end of
broadcast)
TIP 2: Don't 'kill -9' the postmaster

No. 1# | By Developer Tags User at [2008-5-5] | size: 935 bytes

Christoph Pingel <ch.pingel (AT) web (DOT) dewrites:
Am 08.08.2006 um 19:49 schrieb Nikolay Samokhvalov:
>don't use "DISTINCT N" at all, it's evil :-) (why?
>#A13


Thanks for the good advice! From reading this, it seems to be a
*really* bad thing. And I didn't get it from the official
documentation. :-)

That page is complaining about DISTINCT N as it was defined in 1999.
It's a lot harder to shoot yourself in the foot now:

regression=# select distinct on (ten) hundred from tenk1 order by unique2;
ERRR: SELECT DISTINCT N expressions must match initial RDER BY expressions

I don't deny that it's nonstandard and pretty ugly, but sometimes it's
just really hard to solve a problem any other way.

regards, tom lane

(end of broadcast)
TIP 3: Have you checked our extensive FAQ?



Databases Hot!

Databases New!


Copyright © 2008 www.emsdn.com • All rights reserved • CMS Theme by www.emsdn.com - 0.375