How to convert a UTF-8 byte[] to a string in C#?

To convert a UTF-8 encoded byte array to a string in C#, use the Encoding.UTF8 property of the Encoding class in the System.Text namespace. The steps and examples below cover the common cases:

Step-by-Step Explanation

Use Encoding.UTF8.GetString()
Convert the byte array directly to a string using the UTF-8 decoder.
Handle Edge Cases
Check for null before decoding: a null array throws ArgumentNullException, while an empty array simply yields an empty string.
Optional BOM (Byte Order Mark)
UTF-8 data may start with an optional BOM (0xEF, 0xBB, 0xBF). GetString() does not remove it: the decoded string then begins with U+FEFF unless you skip those three bytes yourself.
Error Handling
Decide whether invalid UTF-8 sequences should be replaced (the default) or should throw an exception.

Examples

1. Basic Conversion

using System;      // Console
using System.Text; // Encoding

byte[] utf8Bytes = { 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100 }; // "Hello World"
string result = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(result); // Output: Hello World

2. Handle Null/Empty Arrays

byte[] utf8Bytes = null;

// GetString throws ArgumentNullException for null; an empty array already returns ""
string result = utf8Bytes != null ? Encoding.UTF8.GetString(utf8Bytes) : string.Empty;
Console.WriteLine(result);

3. With BOM (Byte Order Mark)

// Byte array with BOM: 0xEF, 0xBB, 0xBF
byte[] utf8BytesWithBOM = { 0xEF, 0xBB, 0xBF, 72, 101, 108, 108, 111 }; // BOM + "Hello"

// GetString does NOT strip the BOM: the result starts with the invisible character U+FEFF
string result = Encoding.UTF8.GetString(utf8BytesWithBOM);
Console.WriteLine(result.Length); // Output: 6 (U+FEFF plus the 5 letters of "Hello")

// To get a clean string, skip the BOM bytes if they are present:
if (utf8BytesWithBOM.Length >= 3 && 
    utf8BytesWithBOM[0] == 0xEF && 
    utf8BytesWithBOM[1] == 0xBB && 
    utf8BytesWithBOM[2] == 0xBF)
{
    result = Encoding.UTF8.GetString(utf8BytesWithBOM, 3, utf8BytesWithBOM.Length - 3);
}
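
The manual check above can be wrapped in a small helper. The sketch below is one possible shape, not a framework API: the names Utf8Bom and DecodeSkippingBom are made up for illustration, and it compares against Encoding.UTF8.GetPreamble() instead of hard-coding the three BOM bytes.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of .NET.
public static class Utf8Bom
{
    // Decodes UTF-8 bytes and skips a leading BOM (EF BB BF) when one is present.
    public static string DecodeSkippingBom(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));

        byte[] bom = Encoding.UTF8.GetPreamble(); // the UTF-8 BOM: EF BB BF

        // The input has a BOM only if it is long enough and every BOM byte matches.
        bool hasBom = bytes.Length >= bom.Length;
        for (int i = 0; hasBom && i < bom.Length; i++)
        {
            hasBom = bytes[i] == bom[i];
        }

        int offset = hasBom ? bom.Length : 0;
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }
}
```

With the array from the example above, Utf8Bom.DecodeSkippingBom(utf8BytesWithBOM) returns the 5-character string "Hello"; input without a BOM is decoded unchanged.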

4. Handle Invalid UTF-8 Sequences

By default, invalid bytes are replaced with the Unicode replacement character (U+FFFD, shown as �). To throw an exception instead, create an Encoding instance with exception fallbacks:

// Create a UTF-8 Encoding that throws on invalid bytes instead of replacing them
var encoder = Encoding.GetEncoding(
    "UTF-8",
    new EncoderExceptionFallback(),
    new DecoderExceptionFallback()
);

byte[] invalidBytes = { 72, 101, 0xFF, 108, 111 }; // Invalid byte 0xFF

try
{
    string result = encoder.GetString(invalidBytes);
}
catch (DecoderFallbackException ex)
{
    Console.WriteLine("Invalid UTF-8 sequence: " + ex.Message);
}
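
If callers prefer a true/false result over a try/catch at every call site, the strict decoding can be wrapped in a Try-style method. This is a minimal sketch: StrictUtf8 and TryDecode are hypothetical names, and it builds the strict encoding with the UTF8Encoding(bool, bool) constructor, which is an alternative to the Encoding.GetEncoding call shown above.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of .NET.
public static class StrictUtf8
{
    // First argument: do not emit a BOM when encoding.
    // Second argument: throw on invalid bytes instead of substituting U+FFFD.
    private static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Returns true and the decoded text if the bytes are valid UTF-8, otherwise false.
    public static bool TryDecode(byte[] bytes, out string text)
    {
        try
        {
            text = Strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = string.Empty;
            return false;
        }
    }
}
```

For the invalidBytes array above, StrictUtf8.TryDecode returns false; for the bytes { 72, 105 } it returns true with the text "Hi". A null array still throws ArgumentNullException, so the null check from example 2 remains the caller's job.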

Key Considerations

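Namespace: Encoding lives in System.Text, so add using System.Text; to the file.
BOM Handling: The BOM is optional in UTF-8 and many systems omit it, but some tools still write it. GetString() keeps it as U+FEFF, so strip it yourself when it matters.
Performance: Encoding.UTF8.GetString() decodes the whole array in one call and allocates a single string, which works well for data that already fits in memory.
Error Behavior: By default, invalid bytes are replaced with U+FFFD. Supply a DecoderFallback (for example DecoderExceptionFallback) to change this behavior.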
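
When large data arrives in pieces (for example from a read loop), calling GetString() on each piece can corrupt a character whose bytes straddle two pieces. A stateful Decoder obtained from Encoding.UTF8.GetDecoder() carries the incomplete bytes over to the next call. The sketch below illustrates the idea; ChunkedUtf8 and DecodeChunks are hypothetical names, and in this simplified version any incomplete bytes left after the last chunk are silently dropped.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of .NET.
public static class ChunkedUtf8
{
    // Decodes consecutive chunks of one UTF-8 byte stream into a single string.
    public static string DecodeChunks(params byte[][] chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder(); // remembers partial characters between calls
        var sb = new StringBuilder();

        foreach (byte[] chunk in chunks)
        {
            // Upper bound on the number of chars this chunk can produce.
            char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunk.Length)];
            int written = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            sb.Append(chars, 0, written);
        }

        return sb.ToString();
    }
}
```

As an example, "café" is encoded as 63 61 66 C3 A9. Splitting it after C3 and calling ChunkedUtf8.DecodeChunks(new byte[] { 0x63, 0x61, 0x66, 0xC3 }, new byte[] { 0xA9 }) still returns "café", whereas decoding each chunk separately with GetString() would produce replacement characters in place of the é.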

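Alternative: Using `ReadOnlySpan<byte>` (.NET Core 2.1+)

The ReadOnlySpan<byte> overload of GetString() lets you decode a slice of a larger buffer, or bytes that are not in an array, without copying them first. It is available on .NET Core 2.1 and later (.NET Standard 2.1), not on .NET Framework: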

byte[] utf8Bytes = { 72, 101, 108, 108, 111 };
ReadOnlySpan<byte> byteSpan = utf8Bytes;
string result = Encoding.UTF8.GetString(byteSpan);
Console.WriteLine(result); // Output: Hello
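
One place where the span overload is useful is decoding only part of a larger buffer. The sketch below is illustrative: SpanDecode and DecodeSlice are hypothetical names, and it assumes a target framework that provides Encoding.GetString(ReadOnlySpan<byte>), that is .NET Core 2.1 or later.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of .NET.
public static class SpanDecode
{
    // Decodes `count` bytes starting at `offset` without copying them to a new array.
    public static string DecodeSlice(byte[] buffer, int offset, int count)
    {
        // AsSpan creates a view over the existing array; no bytes are copied.
        ReadOnlySpan<byte> slice = buffer.AsSpan(offset, count);
        return Encoding.UTF8.GetString(slice);
    }
}
```

For a buffer such as { 0x01, 0x02, 72, 101, 108, 108, 111, 0x00 }, SpanDecode.DecodeSlice(buffer, 2, 5) returns "Hello". For plain arrays this matches Encoding.UTF8.GetString(buffer, offset, count), which also works on older frameworks; the span form additionally accepts memory that is not backed by an array.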

Summary

Use Encoding.UTF8.GetString(byteArray) for basic conversions.
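Check for null before decoding; empty arrays are safe and return an empty string.
Strip the BOM yourself if the data may contain one, because GetString() keeps it as U+FEFF.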
Customize error handling with DecoderFallback.

This approach ensures proper handling of UTF-8 byte arrays in C#.