First and foremost, Python-Markdown is intended to be a Python library module used by various projects to convert Markdown syntax into HTML.
To use markdown as a module:
import markdown
html = markdown.markdown(your_text_string)
Python-Markdown provides two public functions (markdown.markdown and markdown.markdownFromFile), both of which wrap the public class markdown.Markdown. If you’re processing one document at a time, these functions will serve your needs. However, if you need to process multiple documents, it may be advantageous to create a single instance of the markdown.Markdown class and pass multiple documents through it. If you do use a single instance though, make sure to call the reset method appropriately (see below).
markdown.markdown (text [, **kwargs])
The following options are available on the markdown.markdown function:
text (required): The source Unicode string.
Important
Python-Markdown expects Unicode as input (although some simple ASCII strings may work) and returns output as Unicode. Do not pass encoded strings to it! If your input is encoded (e.g. as UTF-8), it is your responsibility to decode it. For example:
import codecs
input_file = codecs.open("some_file.txt", mode="r", encoding="utf-8")
text = input_file.read()
input_file.close()
html = markdown.markdown(text)
If you want to write the output to disk, you must encode it yourself:
output_file = codecs.open("some_file.html", "w",
                          encoding="utf-8",
                          errors="xmlcharrefreplace"
)
output_file.write(html)
output_file.close()
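To show what the xmlcharrefreplace error handler does, here is a small sketch that uses only the standard library. The hard-coded string stands in for the output of markdown.markdown, and the ASCII target encoding is an assumption chosen purely to make the effect visible: characters the target encoding cannot represent are written as numeric character references instead of raising an error.

```python
# Illustrative only: this string stands in for the HTML returned by markdown.markdown().
html = u"<p>Caf\u00e9 costs 3\u20ac</p>"

# ASCII cannot represent the accented letter or the euro sign, so the
# 'xmlcharrefreplace' handler substitutes numeric character references.
ascii_bytes = html.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)

# UTF-8 can represent every character, so nothing is replaced and the
# bytes decode back to the original text.
utf8_bytes = html.encode("utf-8", errors="xmlcharrefreplace")
print(utf8_bytes.decode("utf-8") == html)
```

With a UTF-8 target the handler rarely has anything to do; it matters mostly when the output encoding is narrower than the text.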
extensions: A list of extensions.
Python-Markdown provides an API for third parties to write extensions to the parser adding their own additions or changes to the syntax. A few commonly used extensions are shipped with the markdown library. See the extension documentation for a list of available extensions.
The list of extensions may contain instances of extensions and/or strings of extension names.
extensions=[MyExtension(), 'path.to.my.ext', 'extra']
When passing in extension instances, each instance must belong to a subclass of markdown.extensions.Extension, and any configuration options should be defined when initializing the class instance rather than using the extension_configs keyword. For example:
import markdown
from markdown.extensions import Extension
class MyExtension(Extension):
    # define your extension here...
    pass
markdown.markdown(text, extensions=[MyExtension(configs={'option': 'value'})])
If an extension name is provided as a string, the extension must be importable as a Python module on your PYTHONPATH. Python’s dot notation is supported. Therefore, to import the ‘extra’ extension, one could do extensions=['markdown.extensions.extra']. However, if no dots are provided in the string (extensions=['extra']), Markdown will first look for the module markdown.extensions.extra (the built-in extension), then for a module named mdx_extra (‘mdx_’ is prepended to the string) at the root of your PYTHONPATH.
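The lookup order described above can be sketched as a small function. The helper name candidate_modules is hypothetical and is not part of Python-Markdown; it only lists the module names in the order the text says they are tried, without importing anything.

```python
def candidate_modules(ext_name):
    """Return module names in the order the text above says they are tried.

    Hypothetical helper for illustration; not part of Python-Markdown.
    """
    if "." in ext_name:
        # A dotted name is treated as a full import path.
        return [ext_name]
    # A bare name: the built-in extension first, then an 'mdx_'-prefixed module.
    return ["markdown.extensions." + ext_name, "mdx_" + ext_name]


print(candidate_modules("extra"))
print(candidate_modules("path.to.my.ext"))
```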
When loading an extension by name (as a string), you may pass configuration settings to the extension either by using the extension_configs keyword or by appending the settings to the name in the following format:
extensions=['name(option1=value,option2=value)']
Note that there are no quotes or whitespace in the above format, which severely limits how it can be used. For more complex settings, it is suggested that the extension_configs keyword be used or an instance of a class be passed in.
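To make the limitation concrete, the following sketch splits a string in that format into a name and a list of option pairs. The function split_extension_string and the extension name myext are hypothetical, and this is not Python-Markdown's own parser; it only mirrors the format described above. Note that every value arrives as a plain string, which is one reason the format is unsuitable for complex settings.

```python
def split_extension_string(spec):
    """Split 'name(opt1=val1,opt2=val2)' into (name, [(opt, val), ...]).

    Hypothetical helper for illustration; not Python-Markdown's own parser.
    """
    name, paren, rest = spec.partition("(")
    if not paren:
        # No settings were appended to the name.
        return name, []
    if not rest.endswith(")"):
        raise ValueError("missing closing parenthesis in %r" % spec)
    args = rest[:-1]
    pairs = []
    for item in (args.split(",") if args else []):
        option, _, value = item.partition("=")
        pairs.append((option, value))
    return name, pairs


# 'True' comes back as the string 'True', not the boolean.
print(split_extension_string("myext(option1=value,flag=True)"))
print(split_extension_string("extra"))
```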
See Also
See the documentation of the Extension API for assistance in creating extensions.
extension_configs: A dictionary of configuration settings for extensions.
Any configuration settings will only be passed to extensions loaded by name (as a string). When loading extensions as class instances, pass the configuration settings directly to the class when initializing it.
The dictionary of configuration settings must be in the following format:
extension_configs = {
    'extension_name_1': [
        ('option_1', 'value_1'),
        ('option_2', 'value_2')
    ],
    'extension_name_2': [
        ('option_1', 'value_1')
    ]
}
See the documentation specific to the extension you are using for help in specifying configuration settings for that extension.
output_format: Format of output.
Supported formats are:
"xhtml1": Outputs XHTML 1.x. Default.
"xhtml5": Outputs XHTML style tags of HTML 5.
"xhtml": Outputs latest supported version of XHTML (currently XHTML 1.1).
"html4": Outputs HTML 4.
"html5": Outputs HTML style tags of HTML 5.
"html": Outputs latest supported version of HTML (currently HTML 4).
The values can be in either lowercase or uppercase.
Warning
It is suggested that the more specific formats (“xhtml1”, “html5”, & “html4”) be used as the more general formats (“xhtml” or “html”) may change in the future if it makes sense at that time.
safe_mode: Disallow raw HTML.
If you are using Markdown on a web system which will transform text provided by untrusted users, you may want to use the “safe_mode” option which ensures that the user’s HTML tags are either replaced, removed or escaped. (They can still create links using Markdown syntax.)
The following values are accepted:
False (Default): Raw HTML is passed through unaltered.
replace: Replace all HTML blocks with the text assigned to html_replacement_text. To maintain backward compatibility, setting safe_mode=True will have the same effect as safe_mode='replace'.
To replace raw HTML with something other than the default, do:
md = markdown.Markdown(safe_mode='replace',
                       html_replacement_text='--RAW HTML NOT ALLOWED--')
remove: All raw HTML will be completely stripped from the text with no warning to the author.
escape: All raw HTML will be escaped and included in the document.
For example, the following source:
Foo <b>bar</b>.
Will result in the following HTML:
<p>Foo &lt;b&gt;bar&lt;/b&gt;.</p>
Note
“safe_mode” also alters the default value for the enable_attributes option.
See Also
HTML sanitizers (like Bleach) may provide a better solution for dealing with markdown text submitted by untrusted users. That way, both the HTML generated by Markdown and user-submitted raw HTML are fully sanitized.
import markdown
import bleach
html = bleach.clean(markdown.markdown(evil_text))
html_replacement_text: Text used when safe_mode is set to replace. Defaults to [HTML_REMOVED].
tab_length: Length of tabs in the source. Default: 4
enable_attributes: Enable the conversion of attributes. Defaults to True, unless safe_mode is enabled, in which case the default is False.
Note
safe_mode only overrides the default. If enable_attributes is explicitly set, the explicit value is used regardless of safe_mode. However, this could potentially allow an untrusted user to inject JavaScript into your documents.
smart_emphasis: Treat _connected_words_ intelligently. Default: True
lazy_ol: Ignore number of first item of ordered lists. Default: True
Given the following list:
4. Apples
5. Oranges
6. Pears
By default markdown will ignore the fact that the first line started with item number “4” and the HTML list will start with a number “1”. If lazy_ol is set to False, then markdown will output the following HTML:
<ol start="4">
<li>Apples</li>
<li>Oranges</li>
<li>Pears</li>
</ol>
markdown.markdownFromFile (**kwargs)
With a few exceptions, markdown.markdownFromFile accepts the same options as markdown.markdown. It does not accept a text (or Unicode) string. Instead, it accepts the following required options:
input (required): The source text file.
input may be set to one of three options:
a string which contains a path to a readable file on the file system,
a readable file-like object,
or None (default) which will read from stdin.
output: The target which output is written to.
output may be set to one of three options:
a string which contains a path to a writable file on the file system,
a writable file-like object,
or None (default) which will write to stdout.
encoding: The encoding of the source text file. Defaults to “utf-8”. The same encoding will always be used for input and output. The ‘xmlcharrefreplace’ error handler is used when encoding the output.
Note
This is the only place that decoding and encoding of unicode takes place in Python-Markdown. If this rather naive solution does not meet your specific needs, it is suggested that you write your own code to handle your encoding/decoding needs.
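As a minimal sketch of the options above, the following converts a UTF-8 file on disk into an HTML file. The temporary directory and file names are arbitrary choices for the example, and it assumes the markdown package is installed.

```python
import io
import os
import tempfile

import markdown

# Work in a throwaway directory; the file names are arbitrary.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "some_file.txt")
dst_path = os.path.join(workdir, "some_file.html")

# Write a UTF-8 encoded Markdown source file.
with io.open(src_path, "w", encoding="utf-8") as f:
    f.write(u"# Caf\u00e9\n\nSome *emphasised* text.\n")

# Pass paths for both input and output; the same encoding is used for each.
markdown.markdownFromFile(input=src_path, output=dst_path, encoding="utf-8")

# Read the result back, decoding it with the same encoding.
with io.open(dst_path, "r", encoding="utf-8") as f:
    html = f.read()
print(html)
```

If the source and target need different encodings, this function is not a fit; decode and encode the text yourself and call markdown.markdown instead.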
markdown.Markdown ([**kwargs])
The same options are available when initializing the markdown.Markdown class as on the markdown.markdown function, except that the class does not accept a source text string on initialization. Rather, the source text string must be passed to one of two instance methods:
Markdown.convert(source)
The source text must meet the same requirements as the text argument of the markdown.markdown function.
You should also use this method if you want to process multiple strings without creating a new instance of the class for each string.
md = markdown.Markdown()
html1 = md.convert(text1)
html2 = md.convert(text2)
Depending on which options and/or extensions are being used, the parser may need its state reset between each call to convert, otherwise performance can degrade drastically:
html1 = md.convert(text1)
md.reset()
html2 = md.convert(text2)
To make this easier, reset returns the instance, so you can chain it with a call to convert:
html3 = md.reset().convert(text3)
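Putting the pieces together, here is a short sketch of converting several documents with one instance, resetting before each conversion. The sample strings and variable names are made up for the example, and it assumes the markdown package is installed.

```python
import markdown

# Sample inputs; in practice these would be your own Unicode strings.
documents = [u"First *document*.", u"Second **document**."]

md = markdown.Markdown()
results = []
for source in documents:
    # reset() returns the instance, so it can be chained with convert().
    results.append(md.reset().convert(source))

for html in results:
    print(html)
```

Resetting before every conversion, rather than after, keeps the loop body to a single line and ensures the first document also starts from a clean state.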
Markdown.convertFile(**kwargs)
The arguments of this method are identical to the arguments of the same name on the markdown.markdownFromFile function (input, output, and encoding). As with the convert method, this method should be used to process multiple files without creating a new instance of the class for each document. State may need to be reset between each call to convertFile, as is the case with convert.