To convert a UTF-8 encoded byte array to a string in C#, call Encoding.UTF8.GetString(). Encoding.UTF8 is a static property of the Encoding class in the System.Text namespace. Below are detailed steps and examples:
Step-by-Step Explanation
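- Use Encoding.UTF8.GetString(): Convert the byte array directly to a string using the UTF-8 decoder.
- Handle Edge Cases: Check for null before decoding, because GetString() throws ArgumentNullException for a null array. An empty array is safe and decodes to an empty string.
- Optional BOM (Byte Order Mark): UTF-8 data may begin with a BOM (0xEF, 0xBB, 0xBF), which is optional. GetString() does not strip it; the BOM is decoded as the character U+FEFF at the start of the string, so skip it explicitly if it is unwanted.
- Error Handling: Handle invalid UTF-8 sequences (optional).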
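The steps above can be combined into one small helper. This is a minimal sketch, not a framework API: the name DecodeUtf8 and its strict parameter are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of .NET): null-safe, BOM-skipping UTF-8 decode.
static string DecodeUtf8(byte[]? bytes, bool strict = false)
{
    // Edge cases: null or empty input yields an empty string.
    if (bytes is null || bytes.Length == 0)
        return string.Empty;

    // Optional BOM: skip the three bytes 0xEF, 0xBB, 0xBF if they lead the array.
    bool hasBom = bytes.Length >= 3 &&
                  bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    int offset = hasBom ? 3 : 0;

    // Error handling: strict mode throws DecoderFallbackException on invalid bytes;
    // the default Encoding.UTF8 substitutes U+FFFD instead.
    Encoding encoding = strict
        ? new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true)
        : Encoding.UTF8;

    // Decode everything after the optional BOM.
    return encoding.GetString(bytes, offset, bytes.Length - offset);
}

byte[] sample = { 0xEF, 0xBB, 0xBF, 72, 105 }; // BOM + "Hi"
Console.WriteLine(DecodeUtf8(sample));         // Output: Hi
Console.WriteLine(DecodeUtf8(null).Length);    // Output: 0
```

Whether a helper like this should silently drop the BOM or treat null as empty depends on the caller; some code bases prefer to let the ArgumentNullException surface.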
Examples
1. Basic Conversion
using System;
using System.Text;
byte[] utf8Bytes = { 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100 }; // "Hello World"
string result = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(result); // Output: Hello World
2. Handle Null/Empty Arrays
byte[] utf8Bytes = null;
// Guard against null; an empty array already decodes to an empty string
string result = utf8Bytes != null ? Encoding.UTF8.GetString(utf8Bytes) : string.Empty;
Console.WriteLine(result);
3. With BOM (Byte Order Mark)
// Byte array with BOM: 0xEF, 0xBB, 0xBF
byte[] utf8BytesWithBOM = { 0xEF, 0xBB, 0xBF, 72, 101, 108, 108, 111 }; // "Hello"
// GetString() does not strip the BOM; it is decoded as U+FEFF at the start of the string
string result = Encoding.UTF8.GetString(utf8BytesWithBOM);
Console.WriteLine(result.Length); // Output: 6 (U+FEFF followed by "Hello")
// To explicitly skip the BOM (if present):
if (utf8BytesWithBOM.Length >= 3 &&
    utf8BytesWithBOM[0] == 0xEF &&
    utf8BytesWithBOM[1] == 0xBB &&
    utf8BytesWithBOM[2] == 0xBF)
{
    result = Encoding.UTF8.GetString(utf8BytesWithBOM, 3, utf8BytesWithBOM.Length - 3);
}
Console.WriteLine(result); // Output: Hello
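Two other ways to drop a leading BOM are sketched here as alternatives to the manual byte check: trimming U+FEFF after decoding, or reading through a StreamReader, which skips a matching BOM on its own. The variable names are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;

byte[] bytesWithBom = { 0xEF, 0xBB, 0xBF, 72, 101, 108, 108, 111 }; // BOM + "Hello"

// GetString() keeps the BOM as a leading U+FEFF character.
string raw = Encoding.UTF8.GetString(bytesWithBom);
Console.WriteLine(raw.Length); // Output: 6

// Option A: trim the decoded BOM character from the start of the string.
string trimmed = raw.TrimStart('\uFEFF');
Console.WriteLine(trimmed.Length); // Output: 5

// Option B: StreamReader detects and skips a leading UTF-8 BOM while reading.
using var reader = new StreamReader(new MemoryStream(bytesWithBom), Encoding.UTF8);
string viaReader = reader.ReadToEnd();
Console.WriteLine(viaReader); // Output: Hello
```

Option A removes every leading U+FEFF, not just the first, which is usually acceptable but is a slightly broader operation than skipping exactly three bytes.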
4. Handle Invalid UTF-8 Sequences
By default, invalid bytes are replaced with the Unicode replacement character (�, U+FFFD). To throw an exception instead, create an encoding that uses exception fallbacks:
// Create an encoding that throws on invalid bytes
var strictUtf8 = Encoding.GetEncoding(
    "UTF-8",
    new EncoderExceptionFallback(),
    new DecoderExceptionFallback()
);
byte[] invalidBytes = { 72, 101, 0xFF, 108, 111 }; // Invalid byte 0xFF
try
{
    string result = strictUtf8.GetString(invalidBytes);
    Console.WriteLine(result); // Not reached for this input
}
catch (DecoderFallbackException ex)
{
    Console.WriteLine("Invalid UTF-8 sequence: " + ex.Message);
}
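The same strict behavior is available through the UTF8Encoding constructor, which takes a throwOnInvalidBytes flag. The sketch below wraps it in a try-style helper; TryDecodeUtf8 is a hypothetical name, not a .NET method.

```csharp
using System;
using System.Text;

// throwOnInvalidBytes: true makes decoding throw instead of substituting U+FFFD.
var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

byte[] valid = { 72, 101, 108, 108, 111 };    // "Hello"
byte[] invalid = { 72, 101, 0xFF, 108, 111 }; // 0xFF never appears in valid UTF-8

// Hypothetical helper: reports failure with a bool instead of an exception.
static bool TryDecodeUtf8(Encoding encoding, byte[] bytes, out string text)
{
    try
    {
        text = encoding.GetString(bytes);
        return true;
    }
    catch (DecoderFallbackException)
    {
        text = string.Empty;
        return false;
    }
}

Console.WriteLine(TryDecodeUtf8(strict, valid, out string decoded)); // Output: True
Console.WriteLine(decoded);                                          // Output: Hello
Console.WriteLine(TryDecodeUtf8(strict, invalid, out _));            // Output: False
```

A strict encoding instance can be created once and reused, since it holds no per-call state.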
Key Considerations
- Namespace: Always include System.Text.
- BOM Handling: The BOM is optional in UTF-8, and many modern systems omit it. GetString() leaves a leading BOM in the result as U+FEFF, so strip it yourself when it matters.
- Performance: Encoding.UTF8.GetString() decodes the whole array in one call and is efficient for large byte arrays.
- Error Behavior: By default, invalid bytes are replaced with U+FFFD. Use a DecoderFallback to customize this behavior.
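As an illustration of customizing the fallback, the sketch below swaps the default U+FFFD for a plain question mark using DecoderReplacementFallback. The replacement string "?" and the variable names are arbitrary choices for the example.

```csharp
using System;
using System.Text;

// A UTF-8 encoding whose decoder substitutes "?" for each invalid byte sequence.
Encoding lenient = Encoding.GetEncoding(
    "utf-8",
    EncoderFallback.ReplacementFallback,
    new DecoderReplacementFallback("?"));

byte[] invalidBytes = { 72, 101, 0xFF, 108, 111 }; // 0xFF is invalid in UTF-8

// Default behavior: the invalid byte becomes U+FFFD.
Console.WriteLine(Encoding.UTF8.GetString(invalidBytes) == "He\uFFFDlo"); // Output: True

// Custom fallback: the invalid byte becomes "?".
Console.WriteLine(lenient.GetString(invalidBytes)); // Output: He?lo
```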
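Alternative: Using ReadOnlySpan<byte> (C# 7.2+)
The GetString(ReadOnlySpan<byte>) overload requires .NET Core 2.1 or later (or .NET Standard 2.1). It is useful when the data is already a span or a slice of a larger buffer, because no intermediate array copy is needed: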
byte[] utf8Bytes = { 72, 101, 108, 108, 111 };
ReadOnlySpan<byte> byteSpan = utf8Bytes;
string result = Encoding.UTF8.GetString(byteSpan);
Console.WriteLine(result); // Output: Hello
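The benefit shows up when only part of a buffer holds the text. A minimal sketch, assuming a made-up buffer layout in which bytes 2 through 6 are the UTF-8 payload:

```csharp
using System;
using System.Text;

// Illustrative buffer: two header bytes, five payload bytes, one trailing byte.
byte[] buffer = { 0x00, 0x05, 72, 101, 108, 108, 111, 0x00 };

// Slice the payload without copying it into a new array.
ReadOnlySpan<byte> payload = buffer.AsSpan(2, 5);
string text = Encoding.UTF8.GetString(payload);
Console.WriteLine(text); // Output: Hello
```

For a plain array, GetString(buffer, 2, 5) gives the same result; the span overload matters mostly when the data arrives as a span in the first place.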
Summary
- Use Encoding.UTF8.GetString(byteArray) for basic conversions.
- Check for null before decoding; empty arrays are safe.
- Strip the BOM explicitly if the data may contain one.
- Customize error handling with DecoderFallback.
This approach ensures proper handling of UTF-8 byte arrays in C#.