New In

Connecting to databases using JDBC

Connecting Stata with databases has gotten even easier. jdbc allows us to exchange data with some of the most popular database vendors such as Oracle, MySQL, Amazon Redshift, Snowflake, Microsoft SQL Server, and much more. What’s great about jdbc is that it’s a cross-platform solution, so our JDBC setup works the same way for Windows, Mac, and Unix systems. Once you install a JDBC driver, that driver and your Stata code are all you need to switch from, say, your Mac laptop to your company’s Windows cloud systems.

Highlights

  • Access data from many databases, including Oracle, MySQL, Amazon Redshift, Snowflake, Microsoft SQL Server, and more
  • Completely cross-platform compatible
  • Load an entire database table into Stata. or use a SQL SELECT to just load specific columns from a table into Stata
  • Insert all your variables into a database table, or insert just a subset of your data
  • Execute SQL statements from Stata
  • Store connection settings as a data source name (DSN)

Let’s see it work

We have email data stored on Amazon Web Services in a Redshift cluster, and we need to load these data into Stata. We first log in to AWS and go to the Amazon Redshift configuration page to download the correct JDBC driver and get the correct connection information. We then place the downloaded JDBC JAR file along our Stata adopath. Now in the Stata Do-file Editor, we store our connection information by typing

. local jar "redshift-jdbc42-2.0.0.0.jar"
. local driverc "com.amazon.redshift.jdbc42.Driver"
. local url "jdbc:redshift://redshift-cluster-1.cziajbxjzi3e.us-west-2.redshift.amazonaws.com:5439/emails"
. local user "admin"
. local pass "secret"

. jdbc connect,  jar("`jar'") driverclass("`driverc'") url("`url'")
        user("`user'") password("`pass'")

If these database settings need to be used by others or you just want to make remembering them easier, we can store them by typing

. local jar "redshift-jdbc42-2.0.0.0.jar"
. local driverc "com.amazon.redshift.jdbc42.Driver"
. local url "jdbc:redshift://redshift-cluster-1.cziajbxjzi3e.us-west-2.redshift.amazonaws.com:5439/emails"
. local user "admin"
. local pass "secret"

. jdbc add MyRed,  jar("`jar'") driverclass("`driverc'") url("`url'")
        user("`user'") password("`pass'")

We can now add the above commands to profile.do to save these connection settings in between Stata sessions, and we now can connect to our Redshift database by typing

. jdbc connect MyRed

To see what tables are availiable to load from our connection, we type

. jdbc showtables

Database: emails
Tables
category
response_info
employees

We can describe a table by typing

. jdbc describe response_info

Table: response_info
Column name Column type
id BIGINT UNSIGNED filename VARCHAR category_id BIGINT UNSIGNED employee_id BIGINT UNSIGNED datein TIMESTAMP dateout DATE screendate TIMESTAMP rid TEXT keywords TEXT assigntime TIMESTAMP resptime TIMESTAMP timeadded TIMESTAMP sversion DOUBLE correct BIT timetouched TIMESTAMP timemailed TIMESTAMP

To load the data, we type

. jdbc load, table("response_info") clear
(128 observations loaded)

Now we have a Stata dataset and can perform our analysis!