Similar Threads:
1. HTML/XML character encoding getting changed
I have a software application I've written called PowerBlog (PowerBlog.net)
that takes the editing capability of the Internet Explorer WebBrowser
control (essentially a DHTMLTextBox), extracts the user-typed HTML, assigns
it as an XML node's InnerText property (using C#: System.Xml.XmlDocument
obj; obj.InnerText = myHTML). Then I later get the InnerText as a string and
write to disk.
When this text is displayed in a web browser, special characters that are
beyond the standard ASCII charset are not rendered correctly. Frequently, I
have copied text from a web site, pasted in the DHTMLTextbox, saved, and
published it, and my published output has corrupt characters. However, prior
to publishing, when previewing my document it looks fine -- it is only when
it is published (extracted, written to disk, uploaded to the server via FTP,
downloaded via HTTP) that the corruption occurs.
There are several places where this problem could be occurring, and I don't
know how to figure it out.
- A "design feature" in the XmlNode's InnerText property that converts the
&###; encoding into an actual character.
- An encoding flaw when written to disk (currently I'm using the default,
UTF-8 I guess).
- A flaw in the FTP client class where the file is being corrupted during
upload (I think I'm using binary upload format but perhaps I should
double-check).
- A flaw in IIS (no known strange settings exist)
I still need to do some homework on this but I was wondering if anyone has
any bright ideas before I continue searching this out?
Thanks,
Jon
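One way to narrow this down is to capture the bytes at each stage (on disk, after the FTP upload, as served over HTTP) and check how they decode. The Python sketch below is only an illustration and not PowerBlog's code: the sample text and the describe helper are hypothetical. It shows that UTF-8 bytes read back as Latin-1 give this kind of corruption even when the file itself is intact, which would point at a missing or wrong charset declaration rather than at the write or the upload.

```python
def describe(stage: str, data: bytes) -> str:
    """Report how the bytes captured at one pipeline stage decode."""
    try:
        text = data.decode("utf-8")
        return f"{stage}: valid UTF-8 -> {text!r}"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails.
        return f"{stage}: not UTF-8, as Latin-1 -> {data.decode('latin-1')!r}"


original = "caf\u00e9"                 # hypothetical sample with one non-ASCII letter
on_disk = original.encode("utf-8")     # what a UTF-8 writer puts in the file
misread = on_disk.decode("latin-1")    # what a reader assuming Latin-1 displays

print(describe("disk", on_disk))
print(describe("legacy file", b"caf\xe9"))
print("misread as Latin-1:", misread)
```

If the bytes are identical at every stage and still display wrongly, the browser is most likely guessing the encoding, so declaring the charset in the page or in the HTTP Content-Type header would be the next thing to try.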
2. Now getting errors when parsing XML doc-invalid characters
I am using a VBA application that uses MSXML 4.0 Service Pack 2,
but all of a sudden, as of yesterday, I am getting errors when the
parser finds an invalid character such as a TM or copyright symbol.
These characters always existed and were being read fine by the parser. I
assume it is one of the recent updates to IE or XP, but I am not sure which
one, maybe KB922760. But I removed this one and a few recent ones and I
still get the same error:
"an invalid character was found in text content"
Is this a bug? If not, how can I work around it in the VBA code without
having to redo all the XML files, which I cannot do?
Here is my info:
The XML file has this at the top:
<?xml version="1.0" encoding="iso-8859-1"?>
Using this code in MS Access:
------------------------------
Set test = New DOMDocument
test.async = False
test.validateOnParse = False
theurl = "the url to the xml file"   ' placeholder for the real URL
Dim oXMLHTTP As MSXML2.ServerXMLHTTP40
Set oXMLHTTP = New MSXML2.ServerXMLHTTP40
With oXMLHTTP
    .setTimeouts 30000, 30000, 120000, 300000
    .Open "POST", theurl, , "username", "password"
    .setRequestHeader "CONTENT-TYPE", "application/x-www-form-urlencoded"
    .send
    ' The response is taken as already-decoded text
    strresult = .responseText
End With
strfile = strresult
' Parse the text held in the string
test.loadXML strfile
--------------------
Thanks
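One cause worth ruling out is that the response bytes are decoded to text under a different encoding than the one named in the XML declaration, before the parser ever sees them. The Python sketch below illustrates the general idea with a hypothetical sample document. It does not use MSXML, and whether the same thing happens with responseText in this setup is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body: ISO-8859-1 bytes, with 0xA9 as the copyright sign.
raw = b'<?xml version="1.0" encoding="iso-8859-1"?><note>\xa9 2006</note>'


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Decoding the bytes as UTF-8 first fails: 0xA9 alone is not valid UTF-8.
print("valid UTF-8:", decodes_as_utf8(raw))

# Handing the raw bytes to the parser lets it honour the declared encoding.
root = ET.fromstring(raw)
print("parsed text:", root.text)
```

The design point is to let the XML parser do the decoding from bytes, since only the parser reads the encoding declaration.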
3. Latin characters within XMLA
Hi, we are using SSAS 2005 SP 1 through msmdpump, but when text values contain
Latin characters (such as: ) the XML contains double question marks "??".
We tried connecting from Excel through OLEDB for Analysis Services, and it
works fine.
Thanks
4. VB strings, MSXML, Latin 9, and the Euro
Hi,
I have an Excel application which is receiving XML via MSXML4, encoded
with charset=ISO-8859-15. The XML text may contain the Euro symbol. I
want to place the text into a VB string, and sometime later, a cell.
If I decode the XML text into a byte array, I can see that the Euro
has been correctly decoded using the Windows charset CP1252 to the
value 128 (0x80).
However, if I create an empty string and append the VB Euro symbol:
Dim str As String
Dim b() As Byte
str = Chr(128)   ' Euro when the ANSI code page is Windows-1252
b = str          ' copies the string's UTF-16 bytes, low byte first
MsgBox "low byte=" & b(0) & ", high byte=" & b(1)
I find that the Euro should be encoded with low and high byte values of 172
& 32 (U+20AC). So when I create a VB string from the XML text, the Euro
character is completely wrong.
I'd really appreciate any insight into what's happening here, and how
to correctly specify the charset when I create the VB string, or
alternatively, the correct character set to use on the XML encoding
side.
Many thanks,
Rod
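The three representations involved here can be checked side by side. The Python sketch below is an illustration only, not VB or MSXML code, and the variable names are made up for the example. It shows that byte 0x80 is the Euro only under Windows-1252, that the same character in a UTF-16 string is U+20AC, and that ISO-8859-15 (Latin 9) puts the Euro at 0xA4 instead.

```python
# Byte 128 (0x80) interpreted as Windows-1252 is the Euro sign, U+20AC.
euro = b"\x80".decode("cp1252")

# In a UTF-16 little-endian string the same character is two bytes.
utf16 = euro.encode("utf-16-le")
low_byte, high_byte = utf16[0], utf16[1]

# In ISO-8859-15 (Latin 9) the Euro is a different single byte.
latin9 = euro.encode("iso-8859-15")

print(f"code point: U+{ord(euro):04X}")
print(f"UTF-16LE bytes: low={low_byte}, high={high_byte}")
print(f"ISO-8859-15 byte: 0x{latin9[0]:02X}")
```

So a value of 128 in the byte array and U+20AC in the string are the same character seen through two encodings; a mismatch would only arise if the bytes were declared as ISO-8859-15 but actually written as Windows-1252, or the other way round.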
5. Parsing XML with numeric entities outside Latin 1
Hi,
I have an XML file with this text in it:
<price>10 &#8364;</price>
(just an example)
My input and output encodings are Latin 1 (ISO-8859-1).
When PHP parses it, the characterdata function outputs
10 ?
I would like it to output
10 &#8364;
i.e. just leave the entity as it is, as the character can't be
represented directly in Latin 1.
Is this possible?
Thank you.
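One general technique for this is to let the parser resolve the entity to a real character and then, when encoding the output to Latin 1, write any character that does not fit as a numeric character reference again. The sketch below shows that idea in Python rather than PHP; the function name is hypothetical, and it relies on Python's built-in xmlcharrefreplace error handler, not on anything in PHP's XML extension.

```python
import xml.etree.ElementTree as ET


def to_latin1_with_entities(text: str) -> bytes:
    """Encode to ISO-8859-1, writing unrepresentable characters as &#NNNN;."""
    return text.encode("latin-1", errors="xmlcharrefreplace")


# The parser resolves the numeric reference to the actual Euro character.
element = ET.fromstring("<price>10 &#8364;</price>")
parsed = element.text

# Re-encoding for Latin 1 output turns it back into a numeric reference.
output = to_latin1_with_entities(parsed)
print(output.decode("latin-1"))
```

Characters that Latin 1 can represent, such as é, pass through as single bytes; only the ones outside the charset are written as references.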
6. VB strings, MSXML, Latin 9, and the Euro
7. polish font - ISO 8859-2 (Latin 2) - from external xml file
8. Problem getting nested xml nodes with C#, .NET 2.0 (XmlDocument)