org.apache.poi.util.StringUtil


Nick Burch
Hi All

It has been suggested (in Bugzilla) that my PowerPoint code's
util.TextMunger class is largely a duplicate of util.StringUtil.

However, I'm really struggling to figure out exactly what that class does.
Comments like "write compressed unicode" don't really explain much...

Could someone perhaps tell me if there are any methods to do the
following?

* Take little endian unicode bytes, and return a string
* Take a string, and return little endian unicode bytes
* Take a string, and return the closest approximation in US-ASCII bytes
* Take a string, try to convert it to US-ASCII bytes, and either return the
   bytes or indicate (exception, null return etc) that it couldn't be
   done?

I'll happily patch the javadocs for the methods I end up using, once
I know what they do!

Thanks
Nick


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
Mailing List:    http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta POI Project: http://jakarta.apache.org/poi/


Re: org.apache.poi.util.StringUtil

Avik Sengupta
On Fri, 2005-05-13 at 16:35 +0100, Nick Burch wrote:
> Hi All
>
> It has been suggested (in Bugzilla) that my PowerPoint code's
> util.TextMunger class is largely a duplicate of util.StringUtil.
>
Since I did the suggesting, I suppose it behoves me to reply :). But let
me say that I haven't looked very closely at what you require, it just
looked similar.

> However, I'm really struggling to figure out exactly what that class does.
> Comments like "write compressed unicode" don't really explain much...
>
> Could someone perhaps tell me if there are any methods to do the
> following?
>
> * Take little endian unicode bytes, and return a string
public static String getFromUnicodeLE(
                final byte[] string,
                final int offset,
                final int len)

The javadoc is completely off! Also, I am not sure if the method that
takes only the byte array is correct... I think we mostly use the above
method.
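
For comparison, the same conversion can be sketched with the plain JDK. This is only an illustration, not POI's implementation: the class and method names are hypothetical, and treating `len` as a character count (two bytes each) is an assumption that may not match StringUtil.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI.
class UnicodeLEDecodeSketch {

    // Decodes len characters of little-endian UTF-16, starting at offset.
    // Assumption: len counts characters, so the byte span is len * 2.
    static String fromUnicodeLE(byte[] data, int offset, int len) {
        return new String(data, offset, len * 2, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x48, 0x00, 0x69, 0x00}; // 'H', 'i' in UTF-16LE
        System.out.println(fromUnicodeLE(bytes, 0, 2));
    }
}
```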


> * Take a string, and return little endian unicode bytes
public static void putUnicodeLE(
                final String input,
                final byte[] output,
                final int offset)
the output is not returned, but put into the byte array.
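
The "write into the caller's array" style described above can be sketched with the plain JDK as follows. The names are hypothetical and this is not POI's code; it assumes the caller has already sized `output` to hold two bytes per character from `offset` onwards.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI.
class UnicodeLEEncodeSketch {

    // Encodes input as little-endian UTF-16 (no byte-order mark) and copies
    // the bytes into output, starting at offset. Nothing is returned.
    static void putUnicodeLE(String input, byte[] output, int offset) {
        byte[] encoded = input.getBytes(StandardCharsets.UTF_16LE);
        System.arraycopy(encoded, 0, output, offset, encoded.length);
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[6];
        putUnicodeLE("Hi", buffer, 2); // leaves the first two bytes untouched
        System.out.println(java.util.Arrays.toString(buffer));
    }
}
```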

> * Take a string, and return the closest approximation in US-ASCII bytes
?? What's closest? Taking only the low bytes? I don't think there's
anything that does that (there were, but they were bugfixed out :)

> * Take a string, try to convert it to US-ASCII bytes, and either return the
>    bytes or indicate (exception, null return etc) that it couldn't be
>    done?

public static boolean isUnicodeString(final String value) does the
checking, and returns true or false.

public static void putCompressedUnicode(
                final String input,
                final byte[] output,
                final int offset)

converts to a US-ASCII byte array, or throws a java.lang.InternalError.
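
The check-then-convert pattern can be sketched with the plain JDK as below. The names are hypothetical and this is not how StringUtil is implemented. Note that the check here answers "does it fit in US-ASCII?", which may be the opposite sense to isUnicodeString. Returning null on failure is one of the options from the original question, chosen here only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI.
class AsciiStrictSketch {

    // True only if every character of value can be encoded as US-ASCII.
    static boolean fitsInAscii(String value) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(value);
    }

    // Returns the US-ASCII bytes, or null if any character cannot be encoded.
    static byte[] toAsciiOrNull(String value) {
        if (!fitsInAscii(value)) {
            return null;
        }
        return value.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(fitsInAscii("Nick"));             // plain ASCII
        System.out.println(toAsciiOrNull("caf\u00e9") == null); // contains e-acute
    }
}
```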

> I'll happily patch the javadocs for the methods I end up using, once
> I know what they do!
Thanks! The term Compressed/Uncompressed unicode is an unfortunate
Excel'ism that's got into our code.


Hope that helps. I'm pretty sure the above is correct, but...

Shout if you need anything else.

Regards
-
Avik




Re: org.apache.poi.util.StringUtil

Nick Burch
On Fri, 13 May 2005, Avik Sengupta wrote:
> On Fri, 2005-05-13 at 16:35 +0100, Nick Burch wrote:
> > * Take a string, and return the closest approximation in US-ASCII bytes
>
> ?? What's closest? Taking only the low bytes? I don't think there's
> anything that does that (there were, but they were bugfixed out :)

It's in the CurrentUserAtom. For compatibility (it appears), it stores a
best-effort 8 bit version of the last edit user's name, then stores the
Unicode version a little bit later.
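
One way to sketch a "best attempt" conversion with the plain JDK is to let the encoder substitute a replacement byte for anything it cannot represent. This is an assumption about what "closest" could mean, not a description of what PowerPoint or POI does, and the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI.
class AsciiLossySketch {

    // String.getBytes(Charset) never throws on unmappable characters;
    // for US-ASCII it writes '?' in their place.
    static byte[] toAsciiLossy(String value) {
        return value.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] approx = toAsciiLossy("caf\u00e9");
        System.out.println(new String(approx, StandardCharsets.US_ASCII));
    }
}
```

The Unicode copy stored alongside would then carry the exact name, so the lossy bytes only matter to readers that understand the 8 bit field alone.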

> > I'll happily patch the javadocs for the methods I end up using, once
> > I know what they do!
>
> Thanks! the term Compressed/Uncompressed unicode is an unfortunate
> Excel'ism that's got into our code.

Hopefully the patch I added yesterday will cover it.

Nick

