PostgreSQL and Jupyter notebook

Franck Pachot
4 min readApr 28, 2019

Here is a little test of Jupyter Notebook to access a PostgreSQL database with very simple installation, thanks to Anaconda. I did it on Windows 10 but the same simplicity is on Linux and Mac.

Anaconda

I’ll use Anaconda to install the required components

Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies.

https://www.anaconda.com/distribution/#download-section

Installation

The install is very easy — just go to https://www.anaconda.com/ and download for your environment. Then, manage everything from the Anaconda Navigator. I choose the Python3 64-bit version for Windows.

I used all defaults, the installation directory being in the %USERPROFILE% home directory (like C:\Users\Franck\Anaconda3).

This installs a few shortcuts in the Start Menu, such as the Anaconda Prompt (the command line where you can run conda, with all environment set) in CMD and PowerShell version, or the Jupyter notebook.

You can run everything from the Anaconda Navigator, for example, get the command line with all the environment set:

Or you can simply run the Anaconda Prompt in the Start Menu which is a shortcut for:

%windir%\System32\cmd.exe "/K" C:\Users\Franck\Anaconda3\Scripts\activate.bat C:\Users\Franck\Anaconda3

Now, from this prompt, I’ll install a few additional packages

IPython SQL

ipython-sql introduces a %sql (or %%sql) magic to your notebook allowing you to connect to a database, using SQLAlchemy connect strings, then issue SQL commands within IPython or IPython Notebook.

Here is how to install it from the Anaconda Prompt:

conda install -y -c conda-forge ipython-sql

PostgreSQL

I’ll run the postgres server directly in this environment:

conda install -y -c conda-forge postgresql

The link between ipython-sql and the postgresql API is done by psycopg2:

Psycopg is the most popular PostgreSQL adapter for the Python programming language. Its main features are the complete implementation of the Python DB API 2.0 specification and the thread safety.

conda install -y -c anaconda psycopg2

I also install Pgspecial to run the ‘backslash’ commands like in psql

conda install -y -c conda-forge pgspecial

Create a database

From the Anaconda Prompt, I create my PostgreSQL database in C:\Anaconda\pgdata:

set PGDATA=C:\Anaconda\pgdata
mkdir %PGDATA%
pg_ctl initdb
pg_ctl start
psql -c "create database DEMO;" postgres

I’m now ready to run some SQL from the Jupyter notebook and do not need the Anaconda Prompt anymore (I can run shell commands from a notebook).

Jupyter Notebook

I can start Jupyter from the Anaconda Navigator, but it defaults to the %USERPROFILE% directory.

I prefer to change the Jupyter Notebook shortcut (right-click-more-open file location-properties) to replace USERPROFILE with my C:\Anaconda directory which will be where I’ll create notebooks:

C:\Users\Franck\Anaconda3\python.exe C:\Users\Franck\Anaconda3\cwp.py C:\Users\Franck\Anaconda3 C:\Users\Franck\Anaconda3\python.exe C:\Users\Franck\Anaconda3\Scripts\jupyter-notebook-script.py "C:\Anaconda/"

Or simply run it from the Anaconda Prompt:

jupyter.exe notebook --notebook-dir=C:\Anaconda

This runs Jupyter and opens it in my browser. I create a new notebook with New-Python3:

I load the iPython SQL extension:

%load_ext sql

connect to the DEMO database

%sql postgresql://localhost/demo

and I can run some SQL statements, like:

%sql select version()

But I’ll not put more commands in this blog post, because that’s the main advantage of a Jupyter Notebook: show the commands, the output, and some comments.

GitHub Gist and Medium

I’ve uploaded my notebook on GitHub Gist for easy sharing: Blog20190428-postgresql-and-jupyter-notebook-on-windows.ipynb

GitHub display it correctly, and you can download it to test on your environment. And Medium seems to embed it in a very way:

https://gist.github.com/FranckPachot/31fd2450506a60ae8698b066e0876de9

🤔 Now thinking about it, most of my blog posts contain some code, some output, and titles and comments around them… And I try to build the examples with scripts that can be re-run easily. That’s exactly the goal of the notebook, without the risk of copy/paste error and easy to re-run on a newer version. I’ll try this idea but don’t worry, I’ll link from Medium so continue to follow here and comment on Twitter.

--

--

Franck Pachot

Developer Advocate for YugabyteDB (Open-Source, PostgreSQL-compatible Distributed SQL Database. Oracle Certified Master and AWS Data Hero.