In this example the device is exposed to the system as /dev/dri/renderD128; adjust the path if your render node differs. The profile string, the output pixel format of the filter chain and the H.264 level must match exactly what the VAAPI context (the device pipeline) supports. The example was run on an Intel Core i5-6200U Mobile with the following capabilities (VLD entries are not implemented on this CPU and are left as stubs):

vainfo: VA-API version: 1.20 (libva 2.12.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 24.1.0 ()
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            :	VAEntrypointVLD
      VAProfileMPEG2Main              :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointEncSliceLP
      VAProfileH264High               :	VAEntrypointVLD
      VAProfileH264High               :	VAEntrypointEncSliceLP
      VAProfileJPEGBaseline           :	VAEntrypointVLD
      VAProfileJPEGBaseline           :	VAEntrypointEncPicture
      VAProfileH264ConstrainedBaseline:	VAEntrypointVLD
      VAProfileH264ConstrainedBaseline:	VAEntrypointEncSliceLP
      VAProfileVP8Version0_3          :	VAEntrypointVLD
      VAProfileHEVCMain               :	VAEntrypointVLD

ffmpeg -init_hw_device vaapi=vadev:/dev/dri/renderD128 \
    -hwaccel vaapi \
    -hwaccel_device vadev \
    -i <input_file> \
    -filter_hw_device vadev \
    -vf "format=nv12,hwupload" \
    -c:v h264_vaapi \
    -profile:v main \
    -level:v 4.1 \
    -q 4 \
    <output_file>
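
Because the value passed to -profile:v has to be one the driver can actually encode, it can help to check the vainfo listing before running ffmpeg. The following is a minimal sketch, not part of the original setup: has_vaapi_encoder is a hypothetical helper name, and it assumes the listing uses the "profile : entrypoint" layout shown above, with encode entrypoints named VAEntrypointEncSlice or VAEntrypointEncSliceLP.

```shell
#!/bin/sh
# Hypothetical helper (name and approach are assumptions for illustration).
# Reads a vainfo listing on stdin and succeeds (exit status 0) only if the
# given VA profile has an encode entrypoint, i.e. VAEntrypointEncSlice or
# VAEntrypointEncSliceLP. Decode-only (VLD) entries do not count.
has_vaapi_encoder() {
    profile=$1
    # Match lines such as:
    #   "      VAProfileH264Main               :	VAEntrypointEncSliceLP"
    # The profile name must be followed only by whitespace and a colon, so
    # VAProfileH264Main does not match a longer profile name.
    grep -Eq "^[[:space:]]*${profile}[[:space:]]*:[[:space:]]*VAEntrypointEncSlice(LP)?[[:space:]]*$"
}

# Example use against the render node from this example (requires vainfo):
#   if vainfo --display drm --device /dev/dri/renderD128 2>/dev/null \
#          | has_vaapi_encoder VAProfileH264Main; then
#       echo "H.264 Main encoding is advertised"
#   fi
```

With the listing above, the check would pass for VAProfileH264Main and fail for VAProfileHEVCMain, which only advertises VAEntrypointVLD.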