Listen to arXiv PDFs, newsletters, and longform you don't have time to read.

FWD turns your favorite newsletters & papers into podcast-quality episodes.

Add custom content with our 1-week free trial.

Listen to arXiv papers
Apple Podcasts, Pocket Casts, Overcast

+ more

Finish your reading. Every time.

Listen to papers on the go, in your spare time.

Read + Listen
Estimated length: 19 mins

Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder.

$$lrate = d_{\text{model}}^{-0.5} \cdot \min\left(step\_num^{-0.5},\; step\_num \cdot warmup\_steps^{-1.5}\right)$$
arXiv paper graphics
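The learning-rate formula above is easier to read as code. Below is a minimal sketch, assuming the base-model values from the paper (`d_model = 512`, `warmup_steps = 4000`); the function name `transformer_lrate` is our own illustration, not part of FWD or any library. The rate grows linearly for the first `warmup_steps` steps, peaks when the two arguments of `min` are equal (at `step_num = warmup_steps`), and then decays in proportion to the inverse square root of the step number.

```python
def transformer_lrate(step_num: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Warmup-then-decay learning rate from the formula above.

    Defaults are assumed base-model values (d_model=512, warmup_steps=4000).
    """
    if step_num < 1:
        # step_num ** -0.5 is undefined at step 0, so steps are counted from 1.
        raise ValueError("step_num must be >= 1")
    # First term dominates after warmup (inverse-sqrt decay);
    # second term dominates during warmup (linear increase).
    return d_model ** -0.5 * min(step_num ** -0.5, step_num * warmup_steps ** -1.5)


if __name__ == "__main__":
    for step in (1, 1000, 4000, 16000, 100000):
        print(step, f"{transformer_lrate(step):.3e}")
```

Doubling the step count during warmup doubles the rate, while quadrupling it after warmup halves the rate; the peak value is `(d_model * warmup_steps) ** -0.5`.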

Beautiful voices + formatting

FWD uses the best voices, period. No more robots.

Follow any topic or blog
Track everything in your field.

Add any RSS or Atom feed.

Get the latest papers, news, or blog posts. FWD beams the audio + text to your podcast app.

Your Podcast App
A custom feed for you!
Attention Is All You Need - Vaswani et al. 2017
Generating discrete data - Hinton et al. 2021
Scaling Forward Gradient With Local Losses - Kornblith
Testing GLOM's ability to infer wholes from ambiguous parts
Hyperparameter-Free Approach for Bayes Risk Decoding
Evolution of urban areas and land surface temperature
A Primer on Temporal Graph Learning
DNA Structure - Watson & Crick

Listen in your favorite podcast app

FWD works with any app that can add custom feeds (nearly all but Spotify). Read along in the show notes or on our website.

The RSS Listener.

  • Listen on most podcast apps

    FWD works with nearly every podcast app except Spotify. (Our favorite is Pocket Casts.)

  • Integrate dozens of feeds

    Never miss a banger blog post.

  • Get the audio right when authors hit publish

    FWD synthesizes your text within seconds.

Try FWD risk-free

Get unlimited listening for 7 days.

$11.99/mo or $89/yr after that. Cancel anytime.

(No-strings-attached refund during the first trial week.)