I thought I'd start with a few documents that test the XML layer of a parser. The following feeds all convey identical XML information, serialized in slightly different ways. If a parser considers them to be different, then it is buggy.
If there is such a bug, it probably lies in the XML parser rather than in the RDF or RSS parser. However, it is possible that the XML parser is designed to inform you of the differences between these feeds, in which case the bug is in the RSS parser, which is using that information rather than the "plain" information that is relevant.
The correct handling varies according to your use case. An RSS reader should either receive identical information from the XML parser or, as a workaround, correct for the differences automatically. An RSS validator may give warnings for some of these cases (except feed 1, which is pretty much "canonical") but should not consider any of them to be in error.
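To illustrate the second kind of bug, here is a minimal sketch using Python's standard `xml.dom.minidom`, which exposes both the prefixed name as written and the namespace-qualified name. The two tiny feeds, the `x` prefix and the function names are made up for the illustration; they are not taken from the test feeds.

```python
from xml.dom import minidom

DC = "http://purl.org/dc/elements/1.1/"

# Two hypothetical documents that differ only in the prefix bound to DC.
FEED_A = '<item xmlns:dc="%s"><dc:creator>Jo</dc:creator></item>' % DC
FEED_B = '<item xmlns:x="%s"><x:creator>Jo</x:creator></item>' % DC


def creators_by_qname(xml_text):
    """Buggy lookup: matches on the prefixed name exactly as written."""
    doc = minidom.parseString(xml_text)
    return [n.firstChild.data for n in doc.getElementsByTagName("dc:creator")]


def creators_by_namespace(xml_text):
    """Prefix-independent lookup: matches on namespace URI and local name."""
    doc = minidom.parseString(xml_text)
    return [n.firstChild.data for n in doc.getElementsByTagNameNS(DC, "creator")]
```

The XML parser is behaving correctly in both cases. Only the first lookup treats the two documents differently, because it relies on the prefix rather than on the namespace the prefix is bound to.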
You can download the feed documents individually here, or as a single .zip file.
| Feed | Description |
|---|---|
| First Feed | Pretty standard RSS feed. UTF-8 encoded, uses same namespace prefixes as examples in the Spec. |
| Second Feed | As is common, uses decimal character references for characters outside the ASCII range. |
| Third Feed | UTF-16 encoded (little-endian) |
| Fourth Feed | UTF-16 encoded (big-endian) |
| Fifth Feed | Namespace prefixes differing from those used in the examples in the Spec. |
| Sixth Feed | Has some of the text in a CDATA block, and begins with a UTF-8 BOM (a three-byte byte-order mark that is unnecessary with UTF-8 but which parsers should be prepared for; the bytes are EF, BB and BF, the UTF-8 encoding of U+FEFF). |
| Seventh Feed | Like the second feed, this uses character references, but in the hexadecimal rather than the decimal form. |
| Eighth Feed | Uses various styles of character reference for characters within the ASCII range. |
| Ninth Feed | Has namespace declarations on elements other than the root, and some redeclaration of namespaces. |
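A rough way to check a parser against this kind of variation is to reduce each parsed document to a comparable form and confirm that every variant gives the same result. The sketch below uses Python's standard `xml.etree.ElementTree` on a tiny stand-in document rather than the actual feeds. The sample text, the `x` prefix and the helper names `build` and `normalize` are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DECL16 = '<?xml version="1.0" encoding="UTF-16"?>'
BODY = "Zo\u00eb &amp; Co"  # parsed text should be "Zoë & Co"


def build(body=BODY, prefix="dc", declare_on="item"):
    """Build a small stand-in document (hypothetical, not one of the feeds)."""
    decl = ' xmlns:%s="%s"' % (prefix, DC)
    return "<item%s><%s:creator%s>%s</%s:creator></item>" % (
        decl if declare_on == "item" else "",
        prefix,
        decl if declare_on == "creator" else "",
        body,
        prefix,
    )


def normalize(elem):
    """Reduce an element to its namespace-expanded tag, attributes, text, children."""
    return (
        elem.tag,  # ElementTree stores tags as {namespace-uri}localname
        tuple(sorted(elem.attrib.items())),
        elem.text or "",
        tuple(normalize(child) for child in elem),
    )


VARIANTS = {
    "utf-8": build().encode("utf-8"),
    "decimal references": build("Zo&#235; &amp; Co").encode("ascii"),
    "hex references": build("Zo&#xEB; &amp; Co").encode("ascii"),
    "utf-16 little-endian": b"\xff\xfe" + (DECL16 + build()).encode("utf-16-le"),
    "utf-16 big-endian": b"\xfe\xff" + (DECL16 + build()).encode("utf-16-be"),
    "different prefix": build(prefix="x").encode("utf-8"),
    "cdata with utf-8 bom": b"\xef\xbb\xbf"
    + build("<![CDATA[Zo\u00eb & Co]]>").encode("utf-8"),
    "ascii references": build("&#90;&#x6F;\u00eb &#38; Co").encode("utf-8"),
    "declaration on child": build(declare_on="creator").encode("utf-8"),
}

# Parse every variant from raw bytes, as a feed reader would receive them.
RESULTS = {name: normalize(ET.fromstring(data)) for name, data in VARIANTS.items()}


def differing_variants(reference="utf-8"):
    """Return the names of variants whose parsed form differs from the reference."""
    return [name for name, tree in RESULTS.items() if tree != RESULTS[reference]]
```

If `differing_variants()` returns an empty list, the parser treats all of these serializations as the same information. Note that `normalize` ignores tail text and comments, so it is a simplification rather than a full comparison of the XML information.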