Importing Old Blog Posts with Python
I believe that this world deserves to see the literary treasure trove of posts that disappeared when my old sandfly.net.nz blog fell offline due to lack of internal and external interest.
I kept a backup of the database, and thanks to a quick piece of Python I now have the text of those posts in a somewhat tractable form. Much of what I wrote is either out-of-date or just rubbish, but I plan on moving at least a few representative posts over to this new system.
Ripping out the articles from my WordPress database backup turned out to be easier than I had anticipated. Python has some good libraries for accessing MySQL and the WordPress schema is pretty straightforward.
Incidentally, this post seems like as good a place as any to try out the code-formatting markdown syntax, so here is the Python script I whipped up to do the job.

    # Export posts from a WordPress database to a directory structure of
    # gensite-style formatted files.
    import argparse

    import pymysql

    import files  # local gensite module that provides create_new_article()

    NOTE = (" ->[This post was automatically imported from my old sandfly.net.nz blog. "
            "It may look a little weird since it was not originally written for this format.]")


    def export(host, port, user, password, database, output_dir, author):
        connection = pymysql.connect(host=host, port=port, user=user,
                                     password=password, database=database)
        try:
            with connection.cursor() as c:
                c.execute("select post_title, post_content, post_modified_gmt "
                          "from wp_posts where post_type='post'")
                while True:
                    row = c.fetchone()
                    if row is None:
                        break  # no more posts
                    # Columns arrive in the order they were selected.
                    title, content, modified = row
                    if content is None:
                        continue  # nothing to import for this post
                    content = content.replace("\r\n", "\n")
                    # Insert the import note just before the first full stop,
                    # or at the end if the post has no full stop at all.
                    pos = content.find(".")
                    if pos == -1:
                        pos = len(content)
                    content = content[:pos] + NOTE + content[pos:]
                    t = modified.timetuple()
                    p = files.create_new_article(output_dir, title, author, t,
                                                 initial_contents=content)
                    print(p, "created")
        finally:
            connection.close()


    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("host", help="DB Server Host")
        parser.add_argument("port", type=int, help="DB Service Port")
        parser.add_argument("user", help="User name")
        parser.add_argument("password", help="Password")
        parser.add_argument("database", help="Database")
        parser.add_argument("dest", help="Destination directory")
        parser.add_argument("author", help="Post author")
        args = parser.parse_args()
        export(args.host, args.port, args.user, args.password,
               args.database, args.dest, args.author)

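The only fiddly part of the script is where the import note goes: line endings are normalised and the note is slipped in just before the first full stop of the post. Here is a minimal sketch of that step pulled out into its own function so it can be tried without a database. The name `add_import_note`, the shortened note text, and the fallback of appending the note when a post has no full stop are my assumptions for illustration, not part of the script.

```python
# Hypothetical helper: isolates the note-insertion step from the export script.
IMPORT_NOTE = " ->[This post was automatically imported from my old blog.]"


def add_import_note(content, note=IMPORT_NOTE):
    """Normalise line endings and insert `note` before the first full stop."""
    content = content.replace("\r\n", "\n")
    pos = content.find(".")
    if pos == -1:
        # Assumption: with no full stop to anchor on, append the note instead.
        return content + note
    return content[:pos] + note + content[pos:]


if __name__ == "__main__":
    print(add_import_note("First sentence. Second sentence.\r\nA new line."))
```

Keeping this step separate from the database loop makes it easy to check the awkward cases (no full stop, Windows line endings) before touching any real posts.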
That did the bulk of the work. There is still some manual fiddling with each file before gensite will produce good output. I have all the images in a separate folder, but they need to be copied across by hand, and some of the formatting looks weird with the new stylesheet, so some judicious on-the-fly editing is sometimes required.
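The image copying could probably be scripted as well. The following is only a rough sketch under several assumptions: that the old posts reference their images with HTML `<img src="...">` tags, that the backed-up images sit in one flat folder under their original file names, and that copying them next to the imported post is what gensite wants. The function names `referenced_images` and `copy_images` are made up for this example.

```python
import re
import shutil
from pathlib import Path
from urllib.parse import urlparse

# Assumption: images appear in the post body as HTML <img> tags.
IMG_SRC = re.compile(r'<img[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)


def referenced_images(content):
    """Return the file names of images referenced by <img> tags, in order."""
    names = []
    for src in IMG_SRC.findall(content):
        # Keep only the file name, whether src is a full URL or a relative path.
        name = Path(urlparse(src).path).name
        if name and name not in names:
            names.append(name)
    return names


def copy_images(content, image_dir, dest_dir):
    """Copy referenced images from image_dir to dest_dir.

    Returns the names that were referenced but not found in image_dir.
    """
    image_dir, dest_dir = Path(image_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in referenced_images(content):
        source = image_dir / name
        if source.is_file():
            shutil.copy2(source, dest_dir / name)
        else:
            missing.append(name)
    return missing
```

The list of missing names is returned rather than raised as an error, since a few dead image links in old posts seem likely and are easier to fix by hand afterwards.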
That said, it is still a quick job to import one of my old articles and I plan on bringing most of the archive back online in dribs and drabs over the next month or so.