You can use the a11y-media-player web component to play audio. Here are two examples. The first is a basic player, and the second adds an English subtitles track:

<div data-cutom-element="a11y-media-player" 
    media-title="Example Video" 
    accent-color="yellow" 
    dark 
    linkable 
    crossorigin="anonymous">
    <audio>
        <source src="//www.personal.psu.edu/lnm105/bueller.mp3" type="audio/mpeg" />
    </audio>
</div>

<div data-cutom-element="a11y-media-player"
    media-title="Example Video" 
    accent-color="indigo" 
    style="font-size: 12px;" 
    linkable 
    crossorigin="anonymous">
    <audio>
        <source src="//www.personal.psu.edu/lnm105/bueller.mp3" type="audio/mpeg" />
        <track src="//sites.psu.edu/webcomponents/files/2020/07/bueller.vtt" 
            label="English" 
            kind="subtitles" 
            srclang="en" 
            default/>
    </audio>
</div>

The player needs a local copy of the captions file for its caption and transcript features. If you have a captions VTT file, upload it to your Sites media library so that it is served from your own site, which avoids CORS (cross-origin resource sharing) restrictions.
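
As a minimal sketch, a player that uses a captions file uploaded to your own site might look like the markup below. The site name `yoursite`, the upload path, and the file names `my-audio.mp3` and `my-captions.vtt` are hypothetical placeholders; copy the real URLs from your Sites media library instead.

<div data-cutom-element="a11y-media-player"
    media-title="Example Audio" 
    accent-color="indigo" 
    linkable 
    crossorigin="anonymous">
    <audio>
        <!-- Hypothetical URL: replace with your own audio file -->
        <source src="//sites.psu.edu/yoursite/files/2020/07/my-audio.mp3" type="audio/mpeg" />
        <!-- Hypothetical URL: the VTT file uploaded to the same site's media library -->
        <track src="//sites.psu.edu/yoursite/files/2020/07/my-captions.vtt" 
            label="English" 
            kind="subtitles" 
            srclang="en" 
            default/>
    </audio>
</div>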

Learn more about how to customize it at https://webcomponents.psu.edu.