Regex substitution with unicode in Haskell
While remaking the site I noticed my automatic embedding of bare youtube links sometimes didn’t work. The culprit was unicode in the document which the regex library couldn’t handle.
Apparently while this would be supported by default by almost all modern languages it’s not the case with Haskell. I tried several regex libraries which either didn’t support substitutions or couldn’t handle unicode properly.
As usual others have had this before me. This is the most elegant working solution I could find:
-- Find and replace bare youtube links separated by <p></p>.
RH.gsub rx replace txt
where
rx = RL.compile "<p>\\s*https?://www\\.youtube\\.com/watch\\?v=([A-Za-z0-9_-]+)\\s*</p>"
[RL.utf8]
replace = \[g] -> "<div class=\"video-wrapper\">\
\<div
Which is to say it’s not very elegant but it does get the job done.