Removing the BOM from Outgoing BizTalk Files
August 6, 2013 3 Comments
Today we had an issue with sending a file to an external SFTP client – they did not like the Byte Order Mark (BOM) that was automatically pre-pended to the file by the BizTalk file adapter.
A BOM is a short set of invisible characters that is added to the beginning of a UTF-16 or UTF-8 file to indicate the endianess (or byte order) of a text stream. Among other uses, it helps consuming systems determine which encoding (UTF-8 or UTF-16) has been used in creating the stream. The BOM is optional in most cases, and you typically will not see it unless you use a special text editor that can display ISO-8859-1 or CP1252 characters (WinDiff appears to work well for this).
Here is some text that starts with a UTF-8 byte order mark (BOM); notice the three initial odd-looking characters? Represents the sequence 0xEF 0xBB 0xB in hexadecimal, or 239 187 191 in CP-1252 decimal. You won’t see these non-printable characters in most text editors.
Most modern systems can cope with the presence or non-presence of the BOM (especially in UTF-8 where the BOM is really superfluous), but this client clearly had issues with it. So it was off to Dr. Google to find the quickest solution that hopefully involved no custom code (since we are nearly in the Acceptance Testing phase).
Microsoft’s documentation partially came to the rescue, although some claims in this article do not present all the facts (at least not for BizTalk 2010). For example, the statement “If you use a PassThruReceive pipeline or a PassThruTransmit pipeline, a byte order mark is not appended to a message” is questionable, since we were in fact using the PassThruTransmit pipeline and the BOM was definitely present in the output. In addition, the article exclusively suggests creating a custom pipeline as the solution – but this is only necessary if you are not outputting XML.
Our solution? Simply swap the PassThruTransmit pipeline for another OOTB option, the XmlTransmit pipeline. Doing this gives you access to a pipeline PreserveBOM instance property that will strip the BOM from the message for you if you change it from its default setting of “True” to “False”:
That was it! No code changes, no recompilation, not even a restart – just a simple binding update and the troublesome BOM is now history. Note the difference between the before & after output messages compared here in WinDiff:
BOM’s away!
That is rather interesting about the PassThru adding a BOM. Everything I have seen so far said “nothing is done to the message on a PassThru” pipeline. I wonder if it’s only for XML content?
That’s what I thought too, Matt. In fairness, I have no proof that the mapping in the orchestration that created & sent the message didn’t actually add the BOM at that point – which would exonerate the pipeline of this charge. I’ve toned down my comment appropriately. 🙂
Nice to know that the XML Transmit pipeline allows you to strip the BOM. Although I doubt that the pass through pipeline is guilty as charged in this case. 🙂 Some of the HIPAA EDI schema provided with BizTalk lack a BOM and they are UTF-16. You try to open them with the XML editor in VS and it fails!