KDE

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • Patch to fix encoding problems in both incoming andoutgoing Oscar messages.

    16 answers - 4888 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Hello!
    Merry Christmas and Happy New Year to everyone!
    It appeared that my previous patch fixing encoding in incoming messages does
    not solve the whole problem with encodings. Unfortunately, to fix encoding
    problems in outgoing messages significant refactoring was needed. The patch
    attached fixes encodings in (hopefully) all incoming and outgoing messages,
    searches and user info. No more hardcoded latin1 encoding. Per-contact
    encoding or Unicode is used everywhere. It also fixes a related bug leading
    to resetting capabilities in contact's UserDetails when contact changes
    online status.
    In order to achieve these goals I've changed semantics of ::Message class
    and some other classes which are used as interfaces between highlevel part
    (Account, Contact) and lowlevel part (Task classes). Before,
    ::Message was used as dual-purpose: to store message as Unicode QString
    and as raw QByteArray, depending on usage. I've changed ::Message to
    store raw QByteArray only, and made text() and setText() methods and
    appropriate constructors to perform conversions from/to Unicode QString.
    These methods and constructors now expect contact-specific codec as argument
    to perform conversion correctly.
    The reason for this change is because lowlevel part knows nothing about
    per-contact encodings and has no enough information to convert Unicode
    QString to outgoing message using contact-specific encoding. Also, lowlevel
    part knows nothing about contact client's capabilities to decide whether to
    use Unicode or contact-specific encoding for outgoing message. highlevel
    part has enough information to make decisions like this. Now things made
    simple: highlevel part uses raw messages to communicate with lowlevel part,
    because only highlevel part knows which contact message belongs to and which
    encoding to use for this message.
    The following classes have been changed to use QCString instead of QString for
    similar reasons: ICQShortInfo, ICQGeneralUserInfo, ICQWorkUserInfo,
    ICQMoreUserInfo, ICQEmailInfo, ICQInterestInfo, ICQSearchResult.
    Also, I've fixed bug in UserDetails processing which led to loss of client
    capabilities information when contact changes online status. This was because
    each time contact changes online status, a new UserDetails structure is
    filled from status change message, and old UserDetails was overwritten by the
    new one. This is not correct to do it, because only the first such message
    (when user becomes online) contains information about client capabilities.
    All subsequent status change messages (when user changes status from
    to Away, for example) contain no client capabilities. As a result, in this
    case contact's old UserDetails was replaced by the new one with no client
    capabilities, leading to loss of client capabilities information.
    I've fixed this bug by adding bool fields to UserDetails class describing
    which parts of information is specified. Also, I've added
    UserDetails::merge() method which assigns only those parts of UserDetails
    which are marked as specified, leaving all other parts without change.
    The UserDetails bug is related to encoding issues, because after this patch
    highlevel part uses client capabilities to determine whether send messages in
    Unicode or contact-spoecific encoding, and whether to send channel 1 or
    channel 2 messages.
    There are some questions which are remained open for me and I would like to
    ask if anybody knows the answer.
    1. What is the limit of ICQ online message size? I know, that offline message
    size limit is 450 bytes, and official ICQ5 client does not allow to send
    larger offline messages. But it allows to send large online messages. I
    failed to determine the limit of their size, so I've hardcoded this limit to
    4096 characters. Is there more correct information?
    2. Is there the limit of AIM message size?
    3. The ICQContact::slotSendMsg() and AIMContact::slotSendMsg() use
    () method to notify chat session that
    message was successfully sent. Is there the way to notify chat session that
    message sending failed for some reason? This would be good, for example, to
    notify user that the message can not be encoded using contact-specific
    encoding, and request user to correct the message.
    4. Why is Kopete::Account::contacts() method not const?
    5. The patch attached to this message applies to kdenetwork in KDE 3.5 branch.
    It has some minor conflicts with dev-0.12 branch, which can be resolved
    manually. Should I make a patch against dev-0.12 branch?
    Best regards,
    -- Girko, http://www.infoserver.ru/~ol/
    kopete-devel mailing list
    kopete-devel (AT) kde (DOT) org
  • No.1 | | 559 bytes | |

    Girko wrote:
    >+ case ::Message::UCS2:
    >+ {
    >+ uint len = m_textArray.size() / 2;
    >+ QString result;
    >+ result.setLength( len );
    >+ QByteArray::ConstIterator p = m_textArray.begin();
    >+ for ( uint i = 0; i < len; i++)
    >+ {
    >+ char row = *p++;
    >+ char cell = *p++;
    >+ result[i] = QChar( cell, row );
    >+ }
    >+ return result;
    >+ }


    I hope the compiler is smart enough to realise the above is nothing but a
    memcpy in big-endian architectures.
  • No.2 | | 807 bytes | |

    Girko wrote:
    >+void ::Message::setTextArray( const QCString& newTextArray )
    >+{
    >+ m_textArray.duplicate( newTextArray );
    >+ uint len = m_textArray.size();
    >+ if ( len 0 )
    >+ {
    >+ ;
    >+ if ( m_textArray[len] == '\0' )
    >+ {
    >+ // Strip trailing null byte.
    >+ if ( ! m_textArray.resize( len ) )
    >+ {
    >+ // XXX What should we do if allocation

    failed? + // throw std::bad_alloc();
    >+ }
    >+ }
    >+ }
    >+}


    If an UCS-2 string is passed and ends with a 0x00 byte (like U+0100), it
    will be stripped.

    Also note that, for KDE4, QByteArray has an automatic-NUL semantic, so
    this code has to be revised for then. Can you leave a comment?
  • No.3 | | 422 bytes | |

    Girko wrote:
    >+ kdDebug(SCAR_RAW_DEBUG) << k_funcinfo << "Sending

    outgoing message in "
    + << "per-contact encoding" << endl;

    Please change this to:
    kdDebug(SCAR_RAW_DEBUG) << k_funcinfo << "Sending outgoing message in "
    "per-contact encoding" << endl;

    (note the lack of << between two consecutive string constants)
  • No.4 | | 5324 bytes | |

    Friday 30 December 2005 07:48, Girko wrote:
    Hello!

    Merry Christmas and Happy New Year to everyone!

    It appeared that my previous patch fixing encoding in incoming messages
    does not solve the whole problem with encodings. Unfortunately, to fix
    encoding problems in outgoing messages significant refactoring was needed.
    The patch attached fixes encodings in (hopefully) all incoming and outgoing
    messages, searches and user info. No more hardcoded latin1 encoding.
    Per-contact encoding or Unicode is used everywhere. It also fixes a related
    bug leading to resetting capabilities in contact's UserDetails when contact
    changes online status.

    In order to achieve these goals I've changed semantics of ::Message
    class and some other classes which are used as interfaces between highlevel
    part (Account, Contact) and lowlevel part (Task classes). Before,
    ::Message was used as dual-purpose: to store message as Unicode
    QString and as raw QByteArray, depending on usage. I've changed
    ::Message to store raw QByteArray only, and made text() and setText()
    methods and appropriate constructors to perform conversions from/to Unicode
    QString. These methods and constructors now expect contact-specific codec
    as argument to perform conversion correctly.

    The reason for this change is because lowlevel part knows nothing about
    per-contact encodings and has no enough information to convert Unicode
    QString to outgoing message using contact-specific encoding. Also, lowlevel
    part knows nothing about contact client's capabilities to decide whether to
    use Unicode or contact-specific encoding for outgoing message.
    highlevel part has enough information to make decisions like this. Now
    things made simple: highlevel part uses raw messages to communicate with
    lowlevel part, because only highlevel part knows which contact message
    belongs to and which encoding to use for this message.

    The following classes have been changed to use QCString instead of QString
    for similar reasons: ICQShortInfo, ICQGeneralUserInfo, ICQWorkUserInfo,
    ICQMoreUserInfo, ICQEmailInfo, ICQInterestInfo, ICQSearchResult.

    Also, I've fixed bug in UserDetails processing which led to loss of client
    capabilities information when contact changes online status. This was
    because each time contact changes online status, a new UserDetails
    structure is filled from status change message, and old UserDetails was
    overwritten by the new one. This is not correct to do it, because only the
    first such message (when user becomes online) contains information about
    client capabilities. All subsequent status change messages (when user
    changes status from to Away, for example) contain no client
    capabilities. As a result, in this case contact's old UserDetails was
    replaced by the new one with no client capabilities, leading to loss of
    client capabilities information.

    I've fixed this bug by adding bool fields to UserDetails class describing
    which parts of information is specified. Also, I've added
    UserDetails::merge() method which assigns only those parts of UserDetails
    which are marked as specified, leaving all other parts without change.

    The UserDetails bug is related to encoding issues, because after this patch
    highlevel part uses client capabilities to determine whether send messages
    in Unicode or contact-spoecific encoding, and whether to send channel 1 or
    channel 2 messages.

    There are some questions which are remained open for me and I would like to
    ask if anybody knows the answer.

    1. What is the limit of ICQ online message size? I know, that offline
    message size limit is 450 bytes, and official ICQ5 client does not allow to
    send larger offline messages. But it allows to send large online messages.
    I failed to determine the limit of their size, so I've hardcoded this limit
    to 4096 characters. Is there more correct information?

    The limit is approximately 8000 characters, depending on how big the rest of
    the SNAC data is. The limit for the size of SNAC data is 8192 bytes. This
    applies to both AIM and ICQ

    2. Is there the limit of AIM message size?

    See above. :)

    3. The ICQContact::slotSendMsg() and AIMContact::slotSendMsg() use
    () method to notify chat session that
    message was successfully sent. Is there the way to notify chat session that
    message sending failed for some reason? This would be good, for example, to
    notify user that the message can not be encoded using contact-specific
    encoding, and request user to correct the message.

    no, only messageSucceeded is provided.

    4. Why is Kopete::Account::contacts() method not const?

    I don't know, but we can't make it const now due to keeping BC until 4.0

    5. The patch attached to this message applies to kdenetwork in KDE 3.5
    branch. It has some minor conflicts with dev-0.12 branch, which can be
    resolved manually. Should I make a patch against dev-0.12 branch?

    You can make one. I'd like to review the patch for kopete in KDE 3.5 first,
    because i'm sure i'll have comments, so you'll see another message later. :)
  • No.5 | | 1078 bytes | |

    Hello!

    Friday 30 December 2005 16:33, Thiago Macieira wrote:
    Girko wrote:
    >+ case ::Message::UCS2:
    >+ {
    >+ uint len = m_textArray.size() / 2;
    >+ QString result;
    >+ result.setLength( len );
    >+ QByteArray::ConstIterator p = m_textArray.begin();
    >+ for ( uint i = 0; i < len; i++)
    >+ {
    >+ char row = *p++;
    >+ char cell = *p++;
    >+ result[i] = QChar( cell, row );
    >+ }
    >+ return result;
    >+ }
    >

    I hope the compiler is smart enough to realise the above is nothing but a
    memcpy in big-endian architectures.

    This code is not equivalent to memcpy, because QString is much more complex
    than just array of 16-bit integers, and QChar is more complex than 16-bit
    integer. Depending on compilation flags, QChar may contain 32-bit Unicode
    character. Hence, the code above provides a portable way to achieve the goal.
    -- Girko, http://www.infoserver.ru/~ol/

    kopete-devel mailing list
    kopete-devel (AT) kde (DOT) org
  • No.6 | | 2859 bytes | |

    Hello!

    Friday 30 December 2005 16:36, Thiago Macieira wrote:

    Girko wrote:
    >+void ::Message::setTextArray( const QCString& newTextArray )
    >+{
    >+ m_textArray.duplicate( newTextArray );
    >+ uint len = m_textArray.size();
    >+ if ( len 0 )
    >+ {
    >+ ;
    >+ if ( m_textArray[len] == '\0' )
    >+ {
    >+ // Strip trailing null byte.
    >+ if ( ! m_textArray.resize( len ) )
    >+ {
    >+ // XXX What should we do if allocation

    failed? + // throw std::bad_alloc();
    >+ }
    >+ }
    >+ }
    >+}
    >

    If an UCS-2 string is passed and ends with a 0x00 byte (like U+0100), it
    will be stripped.

    There are no places where UCS-2 string is represented as QCString. All places
    where QCString is used as an argument to ::Message::setTextArray() or
    ::Message constructor this QCString is retrieved from packet as LNTS (or
    constructed as the result of QString::utf8() method).

    In all these cases the QCString contains string in either UTF-8 or per-contact
    (8-bit) encoding. And QCString is _always_ nul-terminated. In other places
    raw QByteArray is used. Actually, the only place where UCS-2 string can be
    found in incoming packet is Channel 1 message.

    When I started writing this patch, I was not thinking about nul termination,
    and I was relying on automatic conversion from QCString to QByteArray.
    Unfortunately, it appeared that such conversion leaves undesirable nul byte
    at the end of QByteArray, leading, for example, to sending channel 2 messages
    wich are terminated with _two_ nul bytes instead of just one.

    This undesirable effect made me to add overloaded
    ::Message::setTextArray() method and ::Message constructor to
    specifically deal with nul-terminated QCString.

    Also note that, for KDE4, QByteArray has an automatic-NUL semantic, so
    this code has to be revised for then. Can you leave a comment?

    As I understand from Qt 4.1 documentation, Qt 4 has no QCString class at all.
    It has more powerful QByteArray instead. Actually, that's good, because it
    makes life and code simplier. Currently some string-retrieving methods of
    Buffer class return QByteArray, and some return QCString, which leads to need
    to constatnly worry whether we have nul byte at the end of string or not.
    When converting to Qt 4, all methods which return QCString (like
    Buffer::getLNTS() and Buffer::getLELNTS()) must be converted to return
    QByteArray _without_ explicit nul at the end. After doing that, all methods
    and constructors of ::Message which deal with QCString can be safely
    deleted.
    -- Girko, http://www.infoserver.ru/~ol/

    kopete-devel mailing list
    kopete-devel (AT) kde (DOT) org
  • No.7 | | 935 bytes | |

    Hello!

    Friday 30 December 2005 16:43, Thiago Macieira wrote:

    Girko wrote:
    >+ kdDebug(SCAR_RAW_DEBUG) << k_funcinfo << "Sending

    outgoing message in "
    + << "per-contact encoding" << endl;

    Please change this to:
    kdDebug(SCAR_RAW_DEBUG) << k_funcinfo << "Sending outgoing message in "
    "per-contact encoding" << endl;

    (note the lack of << between two consecutive string constants)

    Unfortunately, I have no write access to Subversion repository, so I have to
    wait until either my patch is committed and than send another patch
    containing minor corrections, or my patch is rejected and than send another
    version of this patch, which corrects problems preventing it from approval.
    -- Girko, http://www.infoserver.ru/~ol/

    kopete-devel mailing list
    kopete-devel (AT) kde (DOT) org
  • No.8 | | 524 bytes | |

    Girko wrote:
    >This code is not equivalent to memcpy, because QString is much more

    complex than just array of 16-bit integers, and QChar is more complex
    than 16-bit integer. Depending on compilation flags, QChar may contain
    32-bit Unicode character. Hence, the code above provides a portable way
    to achieve the goal.

    Which means that, depending on the compilation flags, it *is* a
    memcpy. :-)

    Too bad the wire format is big endian, or we could add a special case.
  • No.9 | | 663 bytes | |

    Girko wrote:
    >Unfortunately, I have no write access to Subversion repository, so I

    have to wait until either my patch is committed and than send another
    patch containing minor corrections, or my patch is rejected and than
    send another version of this patch, which corrects problems preventing
    it from approval.

    , then I suggest that you:

    1) ask for an account
    2) make the correction in your local copy
    3) wait for an ok from Matt

    He has some more comments, so he may have code to be changed. Whether
    he'll do the changes and commit or he'll ask you to commit, I can't tell.
  • No.10 | | 774 bytes | |

    Hello!

    Friday 30 December 2005 21:43, Thiago Macieira wrote:

    Girko wrote:
    >Unfortunately, I have no write access to Subversion repository, so I

    have to wait until either my patch is committed and than send another
    patch containing minor corrections, or my patch is rejected and than
    send another version of this patch, which corrects problems preventing
    it from approval.

    , then I suggest that you:

    1) ask for an account
    [skiped]

    Please tell me more about this. Having account will be more convenient than
    sending patches. Who I should ask to make me an account?
    -- Girko, http://www.infoserver.ru/~ol/

    kopete-devel mailing list
    kopete-devel (AT) kde (DOT) org
  • No.11 | | 206 bytes | |

    Girko wrote:
    >[skiped]
    >
    >Please tell me more about this. Having account will be more convenient

    than sending patches. Who I should ask to make me an account?
  • No.12 | | 6541 bytes | |

    Friday 30 December 2005 07:48, Girko wrote:
    Hello!

    Merry Christmas and Happy New Year to everyone!

    It appeared that my previous patch fixing encoding in incoming messages
    does not solve the whole problem with encodings. Unfortunately, to fix
    encoding problems in outgoing messages significant refactoring was needed.
    The patch attached fixes encodings in (hopefully) all incoming and outgoing
    messages, searches and user info. No more hardcoded latin1 encoding.
    Per-contact encoding or Unicode is used everywhere. It also fixes a related
    bug leading to resetting capabilities in contact's UserDetails when contact
    changes online status.

    In order to achieve these goals I've changed semantics of ::Message
    class and some other classes which are used as interfaces between highlevel
    part (Account, Contact) and lowlevel part (Task classes). Before,
    ::Message was used as dual-purpose: to store message as Unicode
    QString and as raw QByteArray, depending on usage. I've changed
    ::Message to store raw QByteArray only, and made text() and setText()
    methods and appropriate constructors to perform conversions from/to Unicode
    QString. These methods and constructors now expect contact-specific codec
    as argument to perform conversion correctly.

    The reason for this change is because lowlevel part knows nothing about
    per-contact encodings and has no enough information to convert Unicode
    QString to outgoing message using contact-specific encoding. Also, lowlevel
    part knows nothing about contact client's capabilities to decide whether to
    use Unicode or contact-specific encoding for outgoing message.
    highlevel part has enough information to make decisions like this. Now
    things made simple: highlevel part uses raw messages to communicate with
    lowlevel part, because only highlevel part knows which contact message
    belongs to and which encoding to use for this message.

    The following classes have been changed to use QCString instead of QString
    for similar reasons: ICQShortInfo, ICQGeneralUserInfo, ICQWorkUserInfo,
    ICQMoreUserInfo, ICQEmailInfo, ICQInterestInfo, ICQSearchResult.

    Also, I've fixed bug in UserDetails processing which led to loss of client
    capabilities information when contact changes online status. This was
    because each time contact changes online status, a new UserDetails
    structure is filled from status change message, and old UserDetails was
    overwritten by the new one. This is not correct to do it, because only the
    first such message (when user becomes online) contains information about
    client capabilities. All subsequent status change messages (when user
    changes status from to Away, for example) contain no client
    capabilities. As a result, in this case contact's old UserDetails was
    replaced by the new one with no client capabilities, leading to loss of
    client capabilities information.

    I've fixed this bug by adding bool fields to UserDetails class describing
    which parts of information is specified. Also, I've added
    UserDetails::merge() method which assigns only those parts of UserDetails
    which are marked as specified, leaving all other parts without change.

    The UserDetails bug is related to encoding issues, because after this patch
    highlevel part uses client capabilities to determine whether send messages
    in Unicode or contact-spoecific encoding, and whether to send channel 1 or
    channel 2 messages.

    There are some questions which are remained open for me and I would like to
    ask if anybody knows the answer.

    [snip the questions]

    Best regards,
    -- Girko, http://www.infoserver.ru/~ol/

    And now, on to the patch! :)

    First, could you seperate the user details merging from the encoding stuff and
    after you've made modifications to the patch, send two patches instead of
    one? (you can send them in the same email message, of course)

    For the UserDetails merging, I think it would be better to use an enum as a
    bitmask rather than using a bunch of bools. Something like this is what I
    have in mind:

    public:
    enum { UserClass = 0x01,
    MemberSince = 0x02,
    Since = 0x04,
    IdleTime = 0x05
    }

    private:
    int m_updatedElements;

    and in UserDetails::merge( UserDetails& ud ):
    {
    if ( m_updatedElements & 0x01 )
    // do stuff

    }

    Hopefully, you get the idea about what I'm going for here. It saves allocating
    a bunch of bools at least, and probably makes for cleaner code.

    I think overloading an operator so that something like "m_details += details"
    in the appropriate place to combine the old details with the new details
    would be cool.

    As for the rest of it, most of it looks ok. The one glaring problem is the
    introduction of Account into the liboscar code. I'd like to keep the
    liboscar code as free of Kopete and KDE specific parts as possible (yes, i
    know i use kdebug) so i'm wondering if it's possible to do all of this w/o
    relying on Account in the liboscar code and then submit a new patch.

    various comments:

    Account::defaultCodec should return the native encoding if an account
    specific encoding has not been set by the user. For example, if I'm using a
    german system, the default codec should be that for german rather than for
    latin1.

    The change in icqaccount.cpp results in a very very long line. Could you split
    that up so the lines are shorter? (It will also make it easier to read)

    I don't understand why you split the encodings out from the rest of the
    message properties in ::Message other than to distinguish between
    encodings and the rest of the possible message properties. Please explain.

    in chatservicetask.cpp, language and encoding are only there for debug output
    and can remain QStrings.

    The conversion from the using of Buffer::getWordBlock for the charset 0x02
    handling in messagereceivertask.cpp is wrong as is the appropriate code to do
    the conversion in oscarmessage.cpp. I suggest a setText method in
    ::Message that takes a word pointer and a length so that the conversion
    can be done with QString::fromUcs2.

    The changes in the other places look ok.

    Your work is much appreciated as fixing things with encodings is something
    that I've yet to get around to doing.

    Thanks!
  • No.13 | | 1161 bytes | |

    Matt Rogers wrote:
    >For the UserDetails merging, I think it would be better to use an enum

    as a bitmask rather than using a bunch of bools. Something like this is
    what I have in mind:
    >
    >public:
    >enum { UserClass = 0x01,
    >* * * *MemberSince = 0x02,
    >* * * *Since = 0x04,
    >* * * *IdleTime = 0x05
    >* * * }
    >
    >private:
    >int m_updatedElements;
    >
    >and in UserDetails::merge( UserDetails& ud ):
    >{
    >********if ( m_updatedElements & 0x01 )
    >****************// do stuff
    >
    >
    >}
    >
    >Hopefully, you get the idea about what I'm going for here. It saves

    allocating a bunch of bools at least, and probably makes for cleaner
    code.

    Alternatively, use a bitfield. The best of both worlds: you treat it like
    a bool and the compiler generates a bitmask.

    + bool m_userClassSpecified : 1;
    + bool m_memberSinceSpecified : 1;
    + bool m_onlineSinceSpecified : 1;
    + bool m_numSSpecified : 1;
    + bool m_idleTimeSpecified : 1;
    + bool m_extendedStatusSpecified : 1;
  • No.14 | | 5052 bytes | |

    Friday 30 December 2005 07:48, Girko wrote:
    Hello!

    Merry Christmas and Happy New Year to everyone!

    It appeared that my previous patch fixing encoding in incoming messages
    does not solve the whole problem with encodings. Unfortunately, to fix
    encoding problems in outgoing messages significant refactoring was needed.
    The patch attached fixes encodings in (hopefully) all incoming and outgoing
    messages, searches and user info. No more hardcoded latin1 encoding.
    Per-contact encoding or Unicode is used everywhere. It also fixes a related
    bug leading to resetting capabilities in contact's UserDetails when contact
    changes online status.

    In order to achieve these goals I've changed semantics of ::Message
    class and some other classes which are used as interfaces between highlevel
    part (Account, Contact) and lowlevel part (Task classes). Before,
    ::Message was used as dual-purpose: to store message as Unicode
    QString and as raw QByteArray, depending on usage. I've changed
    ::Message to store raw QByteArray only, and made text() and setText()
    methods and appropriate constructors to perform conversions from/to Unicode
    QString. These methods and constructors now expect contact-specific codec
    as argument to perform conversion correctly.

    The reason for this change is because lowlevel part knows nothing about
    per-contact encodings and has no enough information to convert Unicode
    QString to outgoing message using contact-specific encoding. Also, lowlevel
    part knows nothing about contact client's capabilities to decide whether to
    use Unicode or contact-specific encoding for outgoing message.
    highlevel part has enough information to make decisions like this. Now
    things made simple: highlevel part uses raw messages to communicate with
    lowlevel part, because only highlevel part knows which contact message
    belongs to and which encoding to use for this message.

    The following classes have been changed to use QCString instead of QString
    for similar reasons: ICQShortInfo, ICQGeneralUserInfo, ICQWorkUserInfo,
    ICQMoreUserInfo, ICQEmailInfo, ICQInterestInfo, ICQSearchResult.

    Also, I've fixed bug in UserDetails processing which led to loss of client
    capabilities information when contact changes online status. This was
    because each time contact changes online status, a new UserDetails
    structure is filled from status change message, and old UserDetails was
    overwritten by the new one. This is not correct to do it, because only the
    first such message (when user becomes online) contains information about
    client capabilities. All subsequent status change messages (when user
    changes status from to Away, for example) contain no client
    capabilities. As a result, in this case contact's old UserDetails was
    replaced by the new one with no client capabilities, leading to loss of
    client capabilities information.

    I've fixed this bug by adding bool fields to UserDetails class describing
    which parts of information is specified. Also, I've added
    UserDetails::merge() method which assigns only those parts of UserDetails
    which are marked as specified, leaving all other parts without change.

    The UserDetails bug is related to encoding issues, because after this patch
    highlevel part uses client capabilities to determine whether send messages
    in Unicode or contact-spoecific encoding, and whether to send channel 1 or
    channel 2 messages.

    There are some questions which are remained open for me and I would like to
    ask if anybody knows the answer.

    1. What is the limit of ICQ online message size? I know, that offline
    message size limit is 450 bytes, and official ICQ5 client does not allow to
    send larger offline messages. But it allows to send large online messages.
    I failed to determine the limit of their size, so I've hardcoded this limit
    to 4096 characters. Is there more correct information?

    2. Is there the limit of AIM message size?

    3. The ICQContact::slotSendMsg() and AIMContact::slotSendMsg() use
    () method to notify chat session that
    message was successfully sent. Is there the way to notify chat session that
    message sending failed for some reason? This would be good, for example, to
    notify user that the message can not be encoded using contact-specific
    encoding, and request user to correct the message.

    4. Why is Kopete::Account::contacts() method not const?

    5. The patch attached to this message applies to kdenetwork in KDE 3.5
    branch. It has some minor conflicts with dev-0.12 branch, which can be
    resolved manually. Should I make a patch against dev-0.12 branch?

    Best regards,
    -- Girko, http://www.infoserver.ru/~ol/

    Anyways, this patch is too good to let it fall by the wayside, so i'll massage
    it a bit and commit it when it's ready unless an updated patch will be ready
    for review soon.
  • No.15 | | 10413 bytes | |

    Hello!

    Sorry for delay of my answer, it was New Year holidays, lots of drinking and
    than lots of sleeping :-)

    Sunday 01 January 2006 18:35, Matt Rogers wrote:

    And now, on to the patch! :)

    First, could you seperate the user details merging from the encoding stuff
    and after you've made modifications to the patch, send two patches instead
    of one? (you can send them in the same email message, of course)

    course, I can (but it will require editing patches manually, because
    "oscarcontact.cpp" file is involved in both patches).

    I will send fixed patch splitted into two parts shortly after this message.

    For the UserDetails merging, I think it would be better to use an enum as a
    bitmask rather than using a bunch of bools. Something like this is what I
    have in mind:

    public:
    enum { UserClass = 0x01,
    MemberSince = 0x02,
    Since = 0x04,
    IdleTime = 0x05
    }

    private:
    int m_updatedElements;

    and in UserDetails::merge( UserDetails& ud ):
    {
    if ( m_updatedElements & 0x01 )
    // do stuff

    }

    Hopefully, you get the idea about what I'm going for here. It saves
    allocating a bunch of bools at least, and probably makes for cleaner code.

    I don't think so. In contrary, it makes code less clean: I think that

    if ( m_updatedElements & 0x01 )

    and even

    if ( m_updatedElements & UserClass )

    is more difficult to read than

    if ( m_userClassSpecified )

    Moreover, using bitmask is more error-prone. Look at your code example, it
    already contains an error, which is not so easy to find:

    IdleTime = 0x05

    In this case I agree with Thiago Macieira that the best solution is to pack
    boolean attributes using bitfields. This improves memory consumption without
    sacrificing code readability.

    I think overloading an operator so that something like "m_details +=
    details" in the appropriate place to combine the old details with the new
    details would be cool.

    Looks like a beautiful solution on the first glance, but think about the
    following consideration.

    People who read code have intuitive perception that "a += b" is conceptually
    equivalent to "a = a + b". First, there is no operator+ for UserDetails.
    Second, even if we implement operator+ for UserDetails, it won't behave
    symmetric way people expect from operator+, so that "a + b" and "b + a" will
    yield different results!

    Hence, operator+= for UserDetails behaves intuitively only on the surface, but
    in depth it does not meet several intuitive expectations people have.

    May be, I'm overcautious in this case, but I think that operators should be
    overloaded very carefully, after examining all possible pitfalls, to be sure
    it really improves readability and understandability of code, and does not
    give Java proponents another cause to say that operator overloading is
    evil. :-)

    As for the rest of it, most of it looks ok. The one glaring problem is the
    introduction of Account into the liboscar code. I'd like to keep the
    liboscar code as free of Kopete and KDE specific parts as possible (yes, i
    know i use kdebug) so i'm wondering if it's possible to do all of this w/o
    relying on Account in the liboscar code and then submit a new patch.

    I have one idea, but it seems little bit tricky.

    First, let's have an abstract class inside Client class defined like this:

    class CodecFactory
    {
    public:
    virtual void codecForContact( const QString& contactName ) const = 0;
    };

    Second, we modify Client class so that CodecFactory to be used by Client
    instance can be specified either by Client::setCodecFactory( const
    CodecFactory* ) method or by an argument to Client's constructor.

    Third, we either use multiple inheritance to inherit Account from
    Client::CodecFactory and redefine codecForContact() inside Account, or
    define class derived from Client::CodecFactory which redefines
    codecForContact() using backpointer to Account inside.

    What do you think about this idea?

    various comments:

    Account::defaultCodec should return the native encoding if an account
    specific encoding has not been set by the user. For example, if I'm using a
    german system, the default codec should be that for german rather than for
    latin1.

    Unfortunately, most modern Linux distributions use UTF-8 as the default
    encoding independent of whether default language is German or Russian, so
    QTextCodec::codecForLocale() is useless. There is no point to specify UTF-8
    as the encoding for non-Unicode messages. MS Windows has "Language for
    non-Unicode programs" setting, and I presume that this setting is used by
    official ICQ client for non-Unicode messages. Unfortunately, there is no such
    setting for Linux and other Unix-like systems.

    Moreover, some languages have several different popular non-Unicode encodings.
    For example, most popular 8-bit Russian encoding in Unix world is KI-8, but
    using it as a default for ICQ is meaningless, because most ICQ users use
    Windows, and default Windows encoding for non-Unicode applications is CP1251
    for Russian language.

    Hence, I don't see simple straightforward way to find best (most popular) ICQ
    encoding for specified language.

    The change in icqaccount.cpp results in a very very long line. Could you
    split that up so the lines are shorter? (It will also make it easier to
    read)

    Actually, I was avoiding splitting arguments of function calls because I don't
    see such splitting in the existing code and as a result I don't understand
    rules of splitting. For example, I don't know line length limit (I see many
    lines which exceed 80-character line length), and I don't know in which
    position the next argument of function call should start on the next line.

    I don't understand why you split the encodings out from the rest of the
    message properties in ::Message other than to distinguish between
    encodings and the rest of the possible message properties. Please explain.

    I don't see the reason to pack several unrelated enums and booleans inside one
    bitmask. It sacrifices readability for saving several bytes of memory.
    ::Message is short-lived object, which exists only for passing messages
    between highlevel and lowlevel parts. Considering single-threaded nature of
    Kopete, I presume that there are only few instances of ::Message exist
    at any given moment of time, so small extra number of bytes is not an issue
    in this case. Anyway, in order to save bytes, bitfields can be used.

    Also, using non-anonymous enum for encoding increases reliability through
    type-safety. It's impossible to use by mistake an arbitrary integer as an
    encoding: this error will be detected at compile time.

    Concerning code readability, it's unclear from reading code which bits of
    bitmask stored in m_properties member are conceptially enums and which are
    independent boolean properties. For example, is it valid to have
    ( Normal | AutoResponse ) or ( EMail | WWP ) mask?

    I suggest to split m_properties member into several booleans and enums (packed
    as bitfields for efficiency), but this change is unrelated to encoding issues
    I fix with my patch, so I don't touch them.

    As a sidenote, I presume that frequent use of bitmasks instead of bitfields to
    pack boolean values arises from misunderstanding the reasons bitmasks are
    widely used inside Unix system calls API. Unix system calls API was developed
    when C compiler was very primitive and bitfields was not supported. In modern
    times C (and C++) compilers which do not support bitfields became completely
    extinct, but inertia of thought remained, and use (and abuse) of bitmasks
    became popular development pattern.

    in chatservicetask.cpp, language and encoding are only there for debug
    output and can remain QStrings.

    They are processed the same way as message, so I've just made it uniform way.

    Anyway, if these strings are assumed to have latin1 encoding, they should be
    converted to QString using QString::fromLatin1() to specify explicitly that
    latin1 encoding is used. All conversions to QString which implicitly assume
    latin1 encoding must be eliminated.

    The conversion from the using of Buffer::getWordBlock for the charset 0x02
    handling in messagereceivertask.cpp is wrong as is the appropriate code to
    do the conversion in oscarmessage.cpp. I suggest a setText method in
    ::Message that takes a word pointer and a length so that the
    conversion can be done with QString::fromUcs2.

    First, the code used before my patch

    msg.setText( QString::fromUcs2( message.getWordBlock( messageLength ) ) );

    is incorrect because it leads to memory leak: getWordBlock() method allocates
    array of short integers which is never deallocated after that.

    Second, equivalent code with memory leak fixed, like

    WRD *ucs2 = message.getWordBlock( messageLength );
    msg.setText( QString::fromUcs2( ucs2 ) );
    delete[] ucs2;

    is incorrect as well, because it has memory leak in case comething in line

    msg.setText( QString::fromUcs2( ucs2 ) );

    throws an exception.

    Third, if ::Message stores message in raw format alongside with
    information about encoding, it should do it always. Using ::Message as
    dual-purpose class to store either raw or decoded messages is evil. Lowlevel
    part should not care about conversions between encodings at all.

    And if using ::Message as a raw message storage (like it should be),
    I see no point to convert raw data from QByteArray to array of short integers
    using Buffer::getWordBlock(), than convert it to QString using
    QString::fromUcs2, than convert it again to QByteArray inside ::Message
    using ::Message::setText(). It's equivalent to just copy part of buffer
    into ::Message like I do, but with more overhead.

    Thanks for rewiewing my patch,
    -- Girko, http://www.infoserver.ru/~ol/

    kopete-devel mailing list
    kopete-devel (AT) kde (DOT) org
  • No.16 | | 12098 bytes | |

    Tuesday 03 January 2006 22:21, Girko wrote:
    Hello!

    Sorry for delay of my answer, it was New Year holidays, lots of drinking
    and than lots of sleeping :-)

    Sunday 01 January 2006 18:35, Matt Rogers wrote:
    And now, on to the patch! :)

    First, could you seperate the user details merging from the encoding
    stuff and after you've made modifications to the patch, send two patches
    instead of one? (you can send them in the same email message, of course)

    course, I can (but it will require editing patches manually, because
    "oscarcontact.cpp" file is involved in both patches).

    I will send fixed patch splitted into two parts shortly after this message.

    I committed the UserDetails merging stuff already, since it was quite fine the
    way it is and i was being too picky. I've already sent a modified version of
    the encodings patch because I got a bit impatient and wanted to make sure we
    didn't lose such good work.

    For the UserDetails merging, I think it would be better to use an enum as
    a bitmask rather than using a bunch of bools. Something like this is what
    I have in mind:

    public:
    enum { UserClass = 0x01,
    MemberSince = 0x02,
    Since = 0x04,
    IdleTime = 0x05
    }

    private:
    int m_updatedElements;

    and in UserDetails::merge( UserDetails& ud ):
    {
    if ( m_updatedElements & 0x01 )
    // do stuff

    }

    Hopefully, you get the idea about what I'm going for here. It saves
    allocating a bunch of bools at least, and probably makes for cleaner
    code.

    I don't think so. In contrary, it makes code less clean: I think that

    if ( m_updatedElements & 0x01 )

    and even

    if ( m_updatedElements & UserClass )

    is more difficult to read than

    if ( m_userClassSpecified )

    Moreover, using bitmask is more error-prone. Look at your code example, it

    already contains an error, which is not so easy to find:
    IdleTime = 0x05

    In this case I agree with Thiago Macieira that the best solution is to pack
    boolean attributes using bitfields. This improves memory consumption
    without sacrificing code readability.

    agreed, although i'm not sure i took that route when I committed that part of
    the patch. Have you applied for a KDE SVN account yet? If not, apply for one,
    and then commit when we have the bitfields. :)

    I think overloading an operator so that something like "m_details +=
    details" in the appropriate place to combine the old details with the new
    details would be cool.

    Looks like a beautiful solution on the first glance, but think about the
    following consideration.

    People who read code have intuitive perception that "a += b" is
    conceptually equivalent to "a = a + b". First, there is no operator+ for
    UserDetails. Second, even if we implement operator+ for UserDetails, it
    won't behave symmetric way people expect from operator+, so that "a + b"
    and "b + a" will yield different results!

    Hence, operator+= for UserDetails behaves intuitively only on the surface,
    but in depth it does not meet several intuitive expectations people have.

    May be, I'm overcautious in this case, but I think that operators should be
    overloaded very carefully, after examining all possible pitfalls, to be
    sure it really improves readability and understandability of code, and does
    not give Java proponents another cause to say that operator overloading is
    evil. :-)

    yeah, well, i realized that my idea about having an operator += wasn't such a
    good idea after reading some bits about operator overloading from the C++
    FAQ.

    As for the rest of it, most of it looks ok. The one glaring problem is
    the introduction of Account into the liboscar code. I'd like to keep
    the liboscar code as free of Kopete and KDE specific parts as possible
    (yes, i know i use kdebug) so i'm wondering if it's possible to do all of
    this w/o relying on Account in the liboscar code and then submit a
    new patch.

    I have one idea, but it seems little bit tricky.

    First, let's have an abstract class inside Client class defined like this:

    class CodecFactory
    {
    public:
    virtual void codecForContact( const QString& contactName ) const = 0;
    };

    Second, we modify Client class so that CodecFactory to be used by Client
    instance can be specified either by Client::setCodecFactory( const
    CodecFactory* ) method or by an argument to Client's constructor.

    Third, we either use multiple inheritance to inherit Account from
    Client::CodecFactory and redefine codecForContact() inside Account, or
    define class derived from Client::CodecFactory which redefines
    codecForContact() using backpointer to Account inside.

    What do you think about this idea?

    this would be a good idea, indeed and would keep the seperation. Much better
    than my using ::Settings to store everything.

    various comments:

    Account::defaultCodec should return the native encoding if an
    account specific encoding has not been set by the user. For example, if
    I'm using a german system, the default codec should be that for german
    rather than for latin1.

    Unfortunately, most modern Linux distributions use UTF-8 as the default
    encoding independent of whether default language is German or Russian, so
    QTextCodec::codecForLocale() is useless. There is no point to specify UTF-8
    as the encoding for non-Unicode messages. MS Windows has "Language for
    non-Unicode programs" setting, and I presume that this setting is used by
    official ICQ client for non-Unicode messages. Unfortunately, there is no
    such setting for Linux and other Unix-like systems.

    Moreover, some languages have several different popular non-Unicode
    encodings. For example, most popular 8-bit Russian encoding in Unix world
    is KI-8, but using it as a default for ICQ is meaningless, because most
    ICQ users use Windows, and default Windows encoding for non-Unicode
    applications is CP1251 for Russian language.

    Hence, I don't see simple straightforward way to find best (most popular)
    ICQ encoding for specified language.

    True, and hopefully with the per-account and the per-contact settings, we can
    at least work around it enough so that things work once all the settings are
    correct.

    The change in icqaccount.cpp results in a very very long line. Could you
    split that up so the lines are shorter? (It will also make it easier to
    read)

    Actually, I was avoiding splitting arguments of function calls because I
    don't see such splitting in the existing code and as a result I don't
    understand rules of splitting. For example, I don't know line length limit
    (I see many lines which exceed 80-character line length), and I don't know
    in which position the next argument of function call should start on the
    next line.

    haha, yeah, maintainer's perogative, i guess. i can break my own coding style
    whenever i want. ;)

    I don't understand why you split the encodings out from the rest of the
    message properties in ::Message other than to distinguish between
    encodings and the rest of the possible message properties. Please
    explain.

    I don't see the reason to pack several unrelated enums and booleans inside
    one bitmask. It sacrifices readability for saving several bytes of memory.
    ::Message is short-lived object, which exists only for passing
    messages between highlevel and lowlevel parts. Considering single-threaded
    nature of Kopete, I presume that there are only few instances of
    ::Message exist at any given moment of time, so small extra number of
    bytes is not an issue in this case. Anyway, in order to save bytes,
    bitfields can be used.

    Also, using non-anonymous enum for encoding increases reliability through
    type-safety. It's impossible to use by mistake an arbitrary integer as an
    encoding: this error will be detected at compile time.

    Concerning code readability, it's unclear from reading code which bits of
    bitmask stored in m_properties member are conceptially enums and which are
    independent boolean properties. For example, is it valid to have
    ( Normal | AutoResponse ) or ( EMail | WWP ) mask?

    good points.

    I suggest to split m_properties member into several booleans and enums
    (packed as bitfields for efficiency), but this change is unrelated to
    encoding issues I fix with my patch, so I don't touch them.

    As a sidenote, I presume that frequent use of bitmasks instead of bitfields
    to pack boolean values arises from misunderstanding the reasons bitmasks
    are widely used inside Unix system calls API. Unix system calls API was
    developed when C compiler was very primitive and bitfields was not
    supported. In modern times C (and C++) compilers which do not support
    bitfields became completely extinct, but inertia of thought remained, and
    use (and abuse) of bitmasks became popular development pattern.

    in chatservicetask.cpp, language and encoding are only there for debug
    output and can remain QStrings.

    They are processed the same way as message, so I've just made it uniform
    way.

    Anyway, if these strings are assumed to have latin1 encoding, they should
    be converted to QString using QString::fromLatin1() to specify explicitly
    that latin1 encoding is used. All conversions to QString which implicitly
    assume latin1 encoding must be eliminated.

    The conversion from the using of Buffer::getWordBlock for the charset
    0x02 handling in messagereceivertask.cpp is wrong as is the appropriate
    code to do the conversion in oscarmessage.cpp. I suggest a setText method
    in ::Message that takes a word pointer and a length so that the
    conversion can be done with QString::fromUcs2.

    First, the code used before my patch

    msg.setText( QString::fromUcs2( message.getWordBlock( messageLength ) )
    );

    is incorrect because it leads to memory leak: getWordBlock() method
    allocates array of short integers which is never deallocated after that.

    oops. :(

    Second, equivalent code with memory leak fixed, like

    WRD *ucs2 = message.getWordBlock( messageLength );
    msg.setText( QString::fromUcs2( ucs2 ) );
    delete[] ucs2;

    is incorrect as well, because it has memory leak in case comething in line

    msg.setText( QString::fromUcs2( ucs2 ) );

    throws an exception.

    normally, i would agree with the above (about setText throwing an exception),
    but KDE doesn't use exceptions, and if we have problems allocating memory,
    well, then we're pretty much hopeless as it is. :)

    Third, if ::Message stores message in raw format alongside with
    information about encoding, it should do it always. Using ::Message as
    dual-purpose class to store either raw or decoded messages is evil.
    Lowlevel part should not care about conversions between encodings at all.

    And if using ::Message as a raw message storage (like it should be),
    I see no point to convert raw data from QByteArray to array of short
    integers using Buffer::getWordBlock(), than convert it to QString using
    QString::fromUcs2, than convert it again to QByteArray inside
    ::Message using ::Message::setText(). It's equivalent to just
    copy part of buffer into ::Message like I do, but with more overhead.

    Thanks for rewiewing my patch,
    -- Girko, http://www.infoserver.ru/~ol/

    yeah, after writing my email, I realized all my points are pretty much moot,
    other than the seperation of the lowlevel and the highlevel code, hence the
    reason I committed some things like the UserDetails merging, and sent you a
    revised patch for the encoding stuff. :)

    Thanks again for taking on this beast of a problem.

Re: Patch to fix encoding problems in both incoming andoutgoing Oscar messages.


max 4000 letters.
Your nickname that display:
In order to stop the spam: 6 + 5 =
QUESTION ON "KDE"

EMSDN.COM