Check Image URLs Using Python-Markdown
On a website I'm creating I'm using Python-Markdown to format news posts. To avoid issues with dead links and HTTP-content-on-HTTPS-page problems I'm requiring editors to upload al
Solution 1:
One approach would be to intercept the <img> node at a lower level, just after Markdown parses and constructs it:
import re
from markdown import Markdown
from markdown.inlinepatterns import ImagePattern, IMAGE_LINK_RE

RE_REMOTEIMG = re.compile('^(http|https):.+')

class CheckImagePattern(ImagePattern):
    def handleMatch(self, m):
        node = ImagePattern.handleMatch(self, m)
        # check 'src' to ensure it is local
        src = node.attrib.get('src')
        if src and RE_REMOTEIMG.match(src):
            print 'ILLEGAL:', m.group(9)
            # or alternately you could raise an error immediately
            # raise ValueError("illegal remote url: %s" % m.group(9))
        return node

DATA = '''
![Alt text](/path/to/img.jpg)
![Alt text](http://remote.com/path/to/img.jpg)
'''
mk = Markdown()
# patch in the customized image pattern matcher with url checking
mk.inlinePatterns['image_link'] = CheckImagePattern(IMAGE_LINK_RE, mk)
result = mk.convert(DATA)
print result
Output:
ILLEGAL: http://remote.com/path/to/img.jpg
<p><img alt="Alt text" src="/path/to/img.jpg" />
<img alt="Alt text" src="http://remote.com/path/to/img.jpg" /></p>
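One thing to watch with the RE_REMOTEIMG check: it is case-sensitive and only looks for an explicit http: or https: prefix, so a protocol-relative URL such as //remote.com/img.jpg would slip through. Below is a minimal sketch of a stricter test built on the standard library's urlsplit. The helper name is_local_path is hypothetical (it is not part of Python-Markdown), and it assumes that anything with a scheme or a host should be rejected; it could stand in for the regex test inside handleMatch.

```python
import re
from urllib.parse import urlsplit

# The pattern used in the answer, repeated here for comparison.
RE_REMOTEIMG = re.compile('^(http|https):.+')

def is_local_path(src):
    """Hypothetical helper: True if src has neither a scheme nor a host."""
    parts = urlsplit(src.strip())
    # A scheme (http:, https:, data:, ...) or a network location (//host/...)
    # means the image is not a plain site-local path.
    return not (parts.scheme or parts.netloc)

for src in ("/path/to/img.jpg",
            "http://remote.com/path/to/img.jpg",
            "//remote.com/path/to/img.jpg",
            "HTTPS://remote.com/img.jpg"):
    print(src, "regex flags it:", bool(RE_REMOTEIMG.match(src)),
          "is_local_path:", is_local_path(src))
```

With this helper, the condition in handleMatch would become "if src and not is_local_path(src)". Whether data: URIs or other schemes should also be rejected depends on the site's policy; this sketch rejects them.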
Solution 2:
Updated for Python 3 and Python-Markdown 3:
import re
from markdown import Markdown
from markdown.inlinepatterns import ImageInlineProcessor, IMAGE_LINK_RE

RE_REMOTEIMG = re.compile('^(http|https):.+')

class CheckImagePattern(ImageInlineProcessor):
    def handleMatch(self, m, data):
        # Python-Markdown 3 inline processors return (element, start, end)
        node, start, end = super().handleMatch(m, data)
        # node is None when the text turned out not to be a valid image link
        if node is not None:
            # check 'src' to ensure it is local
            src = node.attrib.get('src')
            if src and RE_REMOTEIMG.match(src):
                print('ILLEGAL:', src)
                # or alternately you could raise an error immediately
                # raise ValueError("illegal remote url: %s" % src)
        return node, start, end

DATA = '''
![Alt text](/path/to/img.jpg)
![Alt text](http://remote.com/path/to/img.jpg)
'''

mk = Markdown()
# patch in the customized image pattern matcher with url checking;
# registering under the existing name 'image_link' replaces the built-in
# processor (150 is the priority the built-in image processor uses)
mk.inlinePatterns.register(CheckImagePattern(IMAGE_LINK_RE, mk), 'image_link', 150)
result = mk.convert(DATA)
print(result)
Hope it's useful!