SqlDBClient API Reference

Module contents

Sql DB Client is a Python interface for interacting with a database.

Its main goal is to provide a Python-based alternative to basic database client software applications (e.g. DBeaver), especially in terms of executing SQL queries. This package mostly aims at SQL scripts executing since other types of database related activities (e.g. database navigation) can be done more conveniently with specifically designed graphical UI.

Based on powerful Python packages such as sqlalchemy, pandas and sqlparse, it provides easy-to-use interface for executing SQL code along with other additional functionalities:

keeping track of all executed queries, their execution information and results
parsing SQL queries (e.g. automatically adding LIMIT clause to prevent memory overflow)
performing transaction by simply using with operator

sqldbclient is especially helpful for data analysts and engineers who are used to work with Python and its packages inside Jupyter Notebook environment, since it’s meant for an interactive use with the goal of analyzing, visualizing and interpreting data. Note that a SQL query result will be shown and saved as a pandas DataFrame object.

The module is compatible with Python 3.6+ and released under the terms of the MIT License.

Visit the project page at https://github.com/YuriyKozhev/SqlDbClient for further information about this project.

Subpackages

sqldbclient.db_inspector

Note

This module is under development, and currently provides minimal functionality.

Provides a wrapper around sqlalchemy inspect function.

Apart from standard sqlalchemy.engine.reflection.Inspector methods, a DBInspector instance has the following functionalities:

creates text representation of table columns

provides get_views method to get consistent result throughout different sqlalchemy versions

sqldbclient.db_inspector.db_inspector.inspect(*args, **kwargs) → DBInspector[source]: Wrapper around sqlalchemy inspect function, that adds custom methods.

class sqldbclient.db_inspector.db_inspector.DBInspector(bind)[source]

Bases: Inspector

get_columns_repr(table: str, schema: str | None = None) → str[source]

Constructs string with structured information about table columns and their data types.

Parameters:

table – table name
schema – schema name

get_views(schema: str | None = None) → List[str][source]

Wrapping around sqlalchemy Inspector get_view_names function, and since version 2.0.0 get_materialized_view_names_function.

Parameters:: schema – schema in database
Returns:: list of view names

print_columns(table: str, schema: str | None = None) → None[source]

Prints structured information about table columns and their data types.

Parameters:

table – table name
schema – schema name

sqldbclient.dialects.postgresql

SqlViewFactory collects all available data about a regular or materialized view and all its dependencies into a Python View object. SqlViewMaterializer applies changes made to a View instance to a corresponding database object and all its dependencies. Together, SqlViewFactory and SqlViewMaterializer provide a tool which helps redefine a view in a database without the need of manually dropping it and its dependencies and then recreating them all. It also takes care of all the permissions that recreated objects had, that is the permissions will be automatically restored along with the view and its dependencies. Note that all the necessary steps will be executed in a separate transaction, which ensures that the whole operation either will be completed fully or will not be done at all.

Warning

‘INSTEAD OF’ view triggers are not supported yet and will not be automatically restored during view recreation.

class sqldbclient.dialects.postgresql.SqlViewFactory(view_name: str, view_schema: str, sql_executor: SqlExecutor)[source]

Bases: object

Factory to create View objects, which store all information obout them to be able to fully restore them in database, if necessary.

create() → View[source]

Creates View object with all necessary information.

Returns:: View object

class sqldbclient.dialects.postgresql.SqlViewMaterializerUtils(view: View, sql_executor: SqlExecutor)[source]

Bases: object

Class that performs standard Postgres database actions, such as setting owner, granting privileges, dropping and creating objects and indices, and refreshing materialized views.

copy_privileges_to(obj: View)[source]: “Sets privileges, that is granted to one object, to another

create() → None[source]: “Creates database object

create_indexes() → None[source]: Creates indexes

drop() → None[source]: Drops database object

drop_indexes() → None[source]: Drops indexes

refresh() → None[source]: Refreshes materialized view

restore() → None[source]: Fully restores object in database

set_descriptions() → None[source]: Sets description

set_owner() → None[source]: Sets owner

set_privileges() → None[source]: Grants privileges

class sqldbclient.dialects.postgresql.SqlViewMaterializer(view: View, sql_executor: SqlExecutor)[source]

Bases: object

Class that is used to materialize in database changes that were made to corresponding View object.

materialize() → None[source]: Materializes in database changes that were made to corresponding View object.

class sqldbclient.dialects.postgresql.grant_access(object_name: str, object_schema: str, user_name: str, sql_executor: SqlExecutor, privilege: str = 'SELECT')[source]

Bases:

Grants privilege on object to user

Parameters:

object_name – object name in database
object_schema – object schema in database
user_name – user_name in database
sql_executor – instance of SqlExecutor
privilege – one of { SELECT | INSERT | UPDATE | DELETE | TRUNCATE | REFERENCES | TRIGGER }

sqldbclient.sql_asyncio

Note

The following tools are available only with sqlalchemy version >= 1.4 installed, since the support for asynchronous engines was added in that release.

SqlAsyncExecutor is a simplified version of SqlExecutor, which provides a single method to execute queries asynchronously. It may be useful for the case when one needs to execute queries in parallel or to schedule an execution without blocking the main program.

SqlQueryPreparator is a wrapper around SqlAsyncExecutor with builtin tasks queue, which is used to store and obtain results of asynchronous executions. All queries are immediately scheduled for execution once they are added to the queue.

class sqldbclient.sql_asyncio.SqlAsyncExecutor(engine: AsyncEngine)[source]

Bases: object

Simplified asynchronous version of SqlExecutor. It provides a handy async execute method.

async execute(**kwargs)

class sqldbclient.sql_asyncio.SqlAsyncPlanner(engine: AsyncEngine)[source]

Bases: SqlAsyncExecutor

Wrapper around SqlAsyncExecutor with builtin tasks queue, which is used to store their results. All queries in queue are immediately scheduled for execution.

async get() → Any[source]

Tries to get result of query execution from queue.

Returns:: (optional) If query selects any rows then a pandas DataFrame will be returned.

put(query: str) → None[source]

Schedules query for execution and corresponding task to queue

Parameters:: query – query text

sqldbclient.sql_engine_factory

One can use sql_engine_factory to create sqlalchemy engines and avoid resource leakage by keeping only one engine per a unique set of parameters:

from sqldbclient import sql_engine_factory

# pass arguments and keyword arguments as to sqlalchemy create_engine function

sqlite_engine = sql_engine_factory.get_or_create('sqlite:///my_sqlite.db')

class sqldbclient.sql_engine_factory.SqlEngineFactory[source]

Bases: object

Class that provides factory for creating engines, and makes only one engine for a unique combination of arguments.

get_or_create(*args, **kwargs) → Engine[source]: Wrapping around sqlalchemy create_engine function. Only one engine will be created for each unique combination of arguments.

sqldbclient.sql_executor

Main class, SqlExecutor, inherits all functionalities from SqlHistoryManager, SqlQueryPreparator and SqlTransactionManager:

SqlHistoryManager
- stores information about query executions and their results in local SQLite database
- provides easy access to saved data via UUID
- performs database cleaning to keep its size limited
SqlQueryPreparator
- validates that there is exactly 1 statement in a query which is being executed
- determines query type
- formats query
- automatically adds LIMIT clause to query
SqlTransactionManager
- provides context manager for performing transactions

Moreover, SqlExecutor keeps configuration (sqlalchemy engine parameters, default LIMIT clause value, file name for history database) and provides single method for executing SQL queries.

class sqldbclient.sql_executor.SqlExecutor(engine: Engine, max_rows_read: int, history_db_name: str)[source]

Bases: SqlTransactionManager, SqlQueryPreparator, SqlHistoryManager

Main class for executing SQL queries, inherits all functionalities from SqlTransactionManager, SqlQueryPreparator, SqlHistoryManager.

Including

query preparation and parsing, with auto adding LIMIT clause to SELECT queries

transaction management using context manager:

with sql_executor:
    sql_executor.execute('DROP TABLE IF EXISTS foo')
    sql_executor.execute('CREATE TABLE foo AS SELECT 1 AS a')
    sql_executor.execute('SELECT * FROM foo')
    sql_executor.commit()

storing queries results and accessing them anywhere from local file-based SQLite database via UUID:
```
pg_executor['ce19362a9ac54e06b3be66d5cf858932']
```

builder = <sqldbclient.sql_executor.sql_executor_builder.sql_executor_builder.SqlExecutorBuilder object>

execute(query: TextClause | str, use_raw_query: bool = False, add_limit: bool = True, max_rows_read: int | None = None, outside_transaction: bool = False, force_result_fetching: bool = False, dump_execution_info: bool = True, dump_result: bool = True) → DataFrame | None[source]

Executes a SQL statement, and when applicable, saves result to local database and returns it in form of pandas DataFrame.

Parameters:

query – query text to execute in format of str or sqlalchemy TextClause.
use_raw_query – If True, no preparation or checking will be applied to query (including limit adding), that is query will be executed as is. May come in handy when query is parsed incorrectly, for some reason. By default, it is recommended to leave it set to False.
add_limit – If True, tries to add limit to query statement if it doesn’t exist, or decrease the limit value to ‘max_rows_read’ in case of exceeding.
max_rows_read – Number of rows used to limit SELECT query. If not specified, the default value from SqlExecutor instance will be used.
outside_transaction – If True, sqlalchemy will not create separate transaction to execute query. It may come in handy while executing stored procedures (e. g. in PostgreSQL), which commit results themselves. Otherwise, InvalidTransactionTermination may be raised.
force_result_fetching – If True, will try to fetch rows from cursor result that is obtained after executing query, even when the type of query does not imply returning any rows.
dump_execution_info – If True, query execution info will be dumped to history database. If False, query execution info will be logged but will not be accessible via UUID from history database.
dump_result – If True, query result will be dumped to history database (when query selects any rows). If False, query result will be returned but will not be accessible via UUID from history database.

Returns:

(optional) If query selects any rows then a pandas DataFrame will be returned.

execute_query(query: TextClause | str, outside_transaction: bool = False) → DataFrame | None[source]: Deprecated method, use execute

read_query(query: TextClause | str) → DataFrame | None[source]: Deprecated method, use execute

class sqldbclient.sql_executor.SqlExecutorConf(engine: Engine | None = None, max_rows_read: int | None = 10000, history_db_name: str | None = 'sql_executor_history_v1')[source]

Bases: object

Class that stores parameters used to build SqlExecutor

set(parameter: str, *args, **kwargs) → SqlExecutorConf[source]

Sets value for parameter. For parameter == ‘engine_options’, instance of sqlalchemy Engine will be created. Available parameters at the moment:

engine_options: args and kwargs to pass to sqlalchemy create_engine function
max_rows_read: default value for LIMIT clause
history_db_name: a file name for SQLLite database

class sqldbclient.sql_executor.SqlExecutorBuilder[source]

Bases: object

Class that defines builder for SqlExecutor class, creates only one instance per unique set of arguments given SqlExecutorConf

config(config: SqlExecutorConf) → SqlExecutorBuilder[source]: Reads parameter values from config

engine

get_or_create() → SqlExecutor[source]: Creates SqlExecutor instance from SqlExecutorConf parameters. Only one instance will be created per unique set of arguments.

history_db_name

max_rows_read

sqldbclient.sql_history_manager

SqlHistoryManager

stores information about query executions and their results in local SQLite database
provides easy access to saved data via UUID
performs database cleaning to keep its size limited

class sqldbclient.sql_history_manager.SqlHistoryManager(history_db_name: str)[source]

Bases: object

Class that stores queries execution information and results. SQLLite file-based database is used to save data. All interactions with history database is managed here. Execution results can be saved via dump() method. Methods get_exec_info(), get_result(), history() are responsible for reading data from history database. Disk storage used by database can be freed up by using delete_results().

delete_results(up_to_start_time: datetime | str | None = None, over_estimated_size: int | None = None, with_uuids: List[str] | None = None)[source]

Frees disk memory that is used by history database. Parameters to consider are query execution date and time, result size, and specified UUIDS. They can be set together, but either one of them should be specified. Otherwise, ValueError is raised.

Parameters:

up_to_start_time – Datetime, before which results should be deleted.
over_estimated_size – Minimum size to consider for removal.
with_uuids – List of concrete UUIDS, which results should be no longer stored.

dump(executed_query: ExecutedSqlQuery, df: DataFrame | None = None) → None[source]

Saves query execution information and result to disk.

Parameters:

executed_query – ExecutedSqlQuery item
df – (optional) result of execution in form of pandas DataFrame

get_exec_info(uuid: str) → ExecutedSqlQuery[source]

Loads execution information for specified query run via UUID.

Parameters:: uuid – UUID of executed query
Returns:: ExecutedSqlQuery item

get_execution_info(uuid: str) → ExecutedSqlQuery[source]

Loads execution information for specified query run via UUID. If UUID is not found, ValueError is raised.

Parameters:: uuid – UUID of executed query
Returns:: ExecutedSqlQuery item

get_result(uuid: str, reload: bool = False) → DataFrame[source]

Gets result from specified query run via UUID. Also performs caching looked up result in memory for easy access. If UUID is not found, ValueError is raised.

Parameters:

uuid – UUID of executed query
reload – If True, cache will not be used and result will be loaded from disk.

Returns:

pandas DataFrame

property history: DataFrame: Returns all ExecutedSqlQuery items, that is execution info for each executed query

sqldbclient.sql_query_preparator

SqlQueryPreparator

validates that there is exactly 1 statement in a query which is being executed
determines query type
formats query
automatically adds LIMIT clause to query

class sqldbclient.sql_query_preparator.SqlQueryPreparator(limit_nrows: int)[source]

Bases: object

Class that is used to prepare query text. It leverages sqlparse package for extracting separate statements and their types, and also for formatting. Another main feature is automatically adding LIMIT clause with specified value to SELECT queries.

LIMIT_REGEX = 'LIMIT\\s*(\\d*)$'

prepare(query_text: str, add_limit: bool = True, limit_nrows: int | None = None) → PreparedSqlQuery[source]

Main method for query preparation, which includes formatting and query type extraction. If query has more or less than exactly 1 statement, IncorrectSqlQueryException will be raised.

Parameters:

query_text – Query text in form of string.
add_limit – If True, LIMIT clause will be added to query with value equals to limit_nrows or self._limit_nrows when limit_nrows is not set. If query already has LIMIT clause, its value will be decreased in case of exceeding.
limit_nrows – value that will be used in LIMIT clause.

Returns:

PreparedSqlQuery instance

sqldbclient.sql_transaction_manager

SqlTransactionManager

provides context manager for performing transactions

class sqldbclient.sql_transaction_manager.SqlTransactionManager(engine: Engine)[source]

Bases: object

Class that is responsible for transaction management. Provides handy context manager, which used as follows (sql_executor is an instance of SqlTransactionManager):

with sql_executor:
    sql_executor.execute('DROP TABLE IF EXISTS foo')
    sql_executor.execute('CREATE TABLE foo AS SELECT 1 AS a')
    sql_executor.execute('SELECT * FROM foo')
    sql_executor.commit()

commit()[source]: Commits transaction

commit_transaction()[source]: Deprecated, use commit

rollback()[source]: Rolls transaction back

rollback_transaction()[source]: Deprecated, use rollback

sqldbclient.utils

sqldbclient.utils.deprecated.deprecated(foo: Callable)[source]: This is a decorator which can be used to mark functions as deprecated. It will result in a warning being emitted when the function is used.

sqldbclient.utils.log_decorators.time_logifier(method, class_)[source]: Counts duration of method execution

sqldbclient.utils.log_decorators.class_logifier(methods)[source]: Counts duration of class methods execution

sqldbclient.utils.pandas.cursor_result_to_df.cursor_result_to_df(cursor_result: CursorResult, force_result_fetching: bool = False) → DataFrame | None[source]

Fetches rows from cursor_result when it returns them, and creates pandas DataFrame.

Parameters:

cursor_result – CursorResult that is obtained from calling sqlalchemy execute method
force_result_fetching – If True, will try to fetch rows from cursor result that is obtained after executing query, even when the type of query does not imply returning any rows.

Returns:

(optional) If query selects any rows then a pandas DataFrame will be returned.

sqldbclient.utils.pandas.set_full_display.set_full_display(max_rows: int | None = 1000, max_columns: int | None = 1000, max_colwidth: int | None = 500, display_all_columns_by_default: bool | None = True, display_all_rows_by_default: bool | None = True, display_whole_colwidth_by_default: bool | None = False) → None[source]

Sets method full_display to a pandas.DataFrame object with configured parameters

Parameters:

max_rows – Maximum number of rows to display fully. If a pandas.DataFrame has more rows than max_rows, a TooBigToDisplayException is raised.
max_columns – Maximum number of columns to display fully. If a pandas.DataFrame has more columns than max_columns, a TooBigToDisplayException is raised.
max_colwidth – Maximum column width to display. If a pandas.DataFrame column is wider, the content will be suppressed to max_colwidth characters.
display_all_columns_by_default – If True, full_display method will try to display all columns by default.
display_all_rows_by_default – If True, full_display method will try to display all rows by default.
display_whole_colwidth_by_default – If True, full_display method will try to display whole column width by default.

sqldbclient.utils.pandas.parse_dates.parse_dates(df: DataFrame) → DataFrame[source]: Tries to convert columns of pandas DataFrame to datetime64[ns]