SqlDBClient API Reference
Module contents
Sql DB Client is a Python interface for interacting with a database.
Its main goal is to provide a Python-based alternative to basic database client software applications (e.g. DBeaver), especially in terms of executing SQL queries. This package mostly aims at SQL scripts executing since other types of database related activities (e.g. database navigation) can be done more conveniently with specifically designed graphical UI.
Based on powerful Python packages such as sqlalchemy, pandas and sqlparse, it provides easy-to-use interface for executing SQL code along with other additional functionalities:
keeping track of all executed queries, their execution information and results
parsing SQL queries (e.g. automatically adding LIMIT clause to prevent memory overflow)
performing transaction by simply using
withoperator
sqldbclient is especially helpful for data analysts and engineers
who are used to work with Python and its packages
inside Jupyter Notebook environment, since it’s meant for an interactive use
with the goal of analyzing, visualizing and interpreting data.
Note that a SQL query result will be shown and saved as a pandas DataFrame object.
The module is compatible with Python 3.6+ and released under the terms of the MIT License.
Visit the project page at https://github.com/YuriyKozhev/SqlDbClient for further information about this project.
Subpackages
sqldbclient.db_inspector
Note
This module is under development, and currently provides minimal functionality.
Provides a wrapper around sqlalchemy inspect function.
Apart from standard sqlalchemy.engine.reflection.Inspector methods,
a DBInspector instance has the following functionalities:
creates text representation of table columns
provides get_views method to get consistent result throughout different sqlalchemy versions
- sqldbclient.db_inspector.db_inspector.inspect(*args, **kwargs) DBInspector[source]
Wrapper around sqlalchemy inspect function, that adds custom methods.
- class sqldbclient.db_inspector.db_inspector.DBInspector(bind)[source]
Bases:
Inspector- get_columns_repr(table: str, schema: str | None = None) str[source]
Constructs string with structured information about table columns and their data types.
- Parameters:
table – table name
schema – schema name
sqldbclient.dialects.postgresql
SqlViewFactory collects all available data about a regular
or materialized view and all its dependencies into a Python View object.
SqlViewMaterializer applies changes made to a View instance to a corresponding database
object and all its dependencies.
Together, SqlViewFactory and SqlViewMaterializer provide
a tool which helps redefine a view in a database without
the need of manually dropping it and its dependencies and then recreating them all.
It also takes care of all the permissions that recreated objects had,
that is the permissions will be automatically restored along with the view
and its dependencies.
Note that all the necessary steps will be executed in a separate transaction,
which ensures that the whole operation either will be completed fully
or will not be done at all.
Warning
‘INSTEAD OF’ view triggers are not supported yet and will not be automatically restored during view recreation.
- class sqldbclient.dialects.postgresql.SqlViewFactory(view_name: str, view_schema: str, sql_executor: SqlExecutor)[source]
Bases:
objectFactory to create View objects, which store all information obout them to be able to fully restore them in database, if necessary.
- class sqldbclient.dialects.postgresql.SqlViewMaterializerUtils(view: View, sql_executor: SqlExecutor)[source]
Bases:
objectClass that performs standard Postgres database actions, such as setting owner, granting privileges, dropping and creating objects and indices, and refreshing materialized views.
- class sqldbclient.dialects.postgresql.SqlViewMaterializer(view: View, sql_executor: SqlExecutor)[source]
Bases:
objectClass that is used to materialize in database changes that were made to corresponding View object.
- class sqldbclient.dialects.postgresql.grant_access(object_name: str, object_schema: str, user_name: str, sql_executor: SqlExecutor, privilege: str = 'SELECT')[source]
Bases:
Grants privilege on object to user
- Parameters:
object_name – object name in database
object_schema – object schema in database
user_name – user_name in database
sql_executor – instance of SqlExecutor
privilege – one of { SELECT | INSERT | UPDATE | DELETE | TRUNCATE | REFERENCES | TRIGGER }
sqldbclient.sql_asyncio
Note
The following tools are available only with sqlalchemy version >= 1.4 installed, since the support for asynchronous engines was added in that release.
SqlAsyncExecutor is a simplified version of SqlExecutor,
which provides a single method to execute queries asynchronously.
It may be useful for the case when one needs to execute queries in parallel or
to schedule an execution without blocking the main program.
SqlQueryPreparator is a wrapper around SqlAsyncExecutor
with builtin tasks queue, which is used to store and obtain results of
asynchronous executions. All queries are immediately scheduled for execution
once they are added to the queue.
- class sqldbclient.sql_asyncio.SqlAsyncExecutor(engine: AsyncEngine)[source]
Bases:
objectSimplified asynchronous version of SqlExecutor. It provides a handy async execute method.
- async execute(**kwargs)
- class sqldbclient.sql_asyncio.SqlAsyncPlanner(engine: AsyncEngine)[source]
Bases:
SqlAsyncExecutorWrapper around SqlAsyncExecutor with builtin tasks queue, which is used to store their results. All queries in queue are immediately scheduled for execution.
sqldbclient.sql_engine_factory
One can use sql_engine_factory to create sqlalchemy engines and avoid resource leakage
by keeping only one engine per a unique set of parameters:
from sqldbclient import sql_engine_factory # pass arguments and keyword arguments as to sqlalchemy create_engine function sqlite_engine = sql_engine_factory.get_or_create('sqlite:///my_sqlite.db')
sqldbclient.sql_executor
Main class, SqlExecutor, inherits all functionalities from SqlHistoryManager,
SqlQueryPreparator and SqlTransactionManager:
SqlHistoryManagerstores information about query executions and their results in local SQLite database
provides easy access to saved data via UUID
performs database cleaning to keep its size limited
SqlQueryPreparatorvalidates that there is exactly 1 statement in a query which is being executed
determines query type
formats query
automatically adds LIMIT clause to query
SqlTransactionManagerprovides context manager for performing transactions
Moreover, SqlExecutor keeps configuration
(sqlalchemy engine parameters, default LIMIT clause value, file name for history database)
and provides single method for executing SQL queries.
- class sqldbclient.sql_executor.SqlExecutor(engine: Engine, max_rows_read: int, history_db_name: str)[source]
Bases:
SqlTransactionManager,SqlQueryPreparator,SqlHistoryManagerMain class for executing SQL queries, inherits all functionalities from
SqlTransactionManager,SqlQueryPreparator,SqlHistoryManager.Including
query preparation and parsing, with auto adding LIMIT clause to SELECT queries
transaction management using context manager:
with sql_executor: sql_executor.execute('DROP TABLE IF EXISTS foo') sql_executor.execute('CREATE TABLE foo AS SELECT 1 AS a') sql_executor.execute('SELECT * FROM foo') sql_executor.commit()
storing queries results and accessing them anywhere from local file-based SQLite database via UUID:
pg_executor['ce19362a9ac54e06b3be66d5cf858932']
- builder = <sqldbclient.sql_executor.sql_executor_builder.sql_executor_builder.SqlExecutorBuilder object>
- execute(query: TextClause | str, use_raw_query: bool = False, add_limit: bool = True, max_rows_read: int | None = None, outside_transaction: bool = False, force_result_fetching: bool = False, dump_execution_info: bool = True, dump_result: bool = True) DataFrame | None[source]
Executes a SQL statement, and when applicable, saves result to local database and returns it in form of pandas DataFrame.
- Parameters:
query – query text to execute in format of str or sqlalchemy TextClause.
use_raw_query – If
True, no preparation or checking will be applied to query (including limit adding), that is query will be executed as is. May come in handy when query is parsed incorrectly, for some reason. By default, it is recommended to leave it set toFalse.add_limit – If
True, tries to add limit to query statement if it doesn’t exist, or decrease the limit value to ‘max_rows_read’ in case of exceeding.max_rows_read – Number of rows used to limit SELECT query. If not specified, the default value from SqlExecutor instance will be used.
outside_transaction – If
True, sqlalchemy will not create separate transaction to execute query. It may come in handy while executing stored procedures (e. g. in PostgreSQL), which commit results themselves. Otherwise, InvalidTransactionTermination may be raised.force_result_fetching – If
True, will try to fetch rows from cursor result that is obtained after executing query, even when the type of query does not imply returning any rows.dump_execution_info – If
True, query execution info will be dumped to history database. IfFalse, query execution info will be logged but will not be accessible via UUID from history database.dump_result – If
True, query result will be dumped to history database (when query selects any rows). IfFalse, query result will be returned but will not be accessible via UUID from history database.
- Returns:
(optional) If query selects any rows then a pandas DataFrame will be returned.
- class sqldbclient.sql_executor.SqlExecutorConf(engine: Engine | None = None, max_rows_read: int | None = 10000, history_db_name: str | None = 'sql_executor_history_v1')[source]
Bases:
objectClass that stores parameters used to build SqlExecutor
- set(parameter: str, *args, **kwargs) SqlExecutorConf[source]
Sets value for parameter. For parameter == ‘engine_options’, instance of sqlalchemy Engine will be created. Available parameters at the moment:
engine_options: args and kwargs to pass to sqlalchemy create_engine function
max_rows_read: default value for LIMIT clause
history_db_name: a file name for SQLLite database
- class sqldbclient.sql_executor.SqlExecutorBuilder[source]
Bases:
objectClass that defines builder for SqlExecutor class, creates only one instance per unique set of arguments given SqlExecutorConf
- config(config: SqlExecutorConf) SqlExecutorBuilder[source]
Reads parameter values from config
- engine
- get_or_create() SqlExecutor[source]
Creates SqlExecutor instance from SqlExecutorConf parameters. Only one instance will be created per unique set of arguments.
- history_db_name
- max_rows_read
sqldbclient.sql_history_manager
SqlHistoryManagerstores information about query executions and their results in local SQLite database
provides easy access to saved data via UUID
performs database cleaning to keep its size limited
- class sqldbclient.sql_history_manager.SqlHistoryManager(history_db_name: str)[source]
Bases:
objectClass that stores queries execution information and results. SQLLite file-based database is used to save data. All interactions with history database is managed here. Execution results can be saved via
dump()method. Methodsget_exec_info(),get_result(),history()are responsible for reading data from history database. Disk storage used by database can be freed up by usingdelete_results().- delete_results(up_to_start_time: datetime | str | None = None, over_estimated_size: int | None = None, with_uuids: List[str] | None = None)[source]
Frees disk memory that is used by history database. Parameters to consider are query execution date and time, result size, and specified UUIDS. They can be set together, but either one of them should be specified. Otherwise, ValueError is raised.
- Parameters:
up_to_start_time – Datetime, before which results should be deleted.
over_estimated_size – Minimum size to consider for removal.
with_uuids – List of concrete UUIDS, which results should be no longer stored.
- dump(executed_query: ExecutedSqlQuery, df: DataFrame | None = None) None[source]
Saves query execution information and result to disk.
- Parameters:
executed_query – ExecutedSqlQuery item
df – (optional) result of execution in form of pandas DataFrame
- get_exec_info(uuid: str) ExecutedSqlQuery[source]
Loads execution information for specified query run via UUID.
- Parameters:
uuid – UUID of executed query
- Returns:
ExecutedSqlQuery item
- get_execution_info(uuid: str) ExecutedSqlQuery[source]
Loads execution information for specified query run via UUID. If UUID is not found, ValueError is raised.
- Parameters:
uuid – UUID of executed query
- Returns:
ExecutedSqlQuery item
- get_result(uuid: str, reload: bool = False) DataFrame[source]
Gets result from specified query run via UUID. Also performs caching looked up result in memory for easy access. If UUID is not found, ValueError is raised.
- Parameters:
uuid – UUID of executed query
reload – If
True, cache will not be used and result will be loaded from disk.
- Returns:
pandas DataFrame
- property history: DataFrame
Returns all ExecutedSqlQuery items, that is execution info for each executed query
sqldbclient.sql_query_preparator
SqlQueryPreparatorvalidates that there is exactly 1 statement in a query which is being executed
determines query type
formats query
automatically adds LIMIT clause to query
- class sqldbclient.sql_query_preparator.SqlQueryPreparator(limit_nrows: int)[source]
Bases:
objectClass that is used to prepare query text. It leverages sqlparse package for extracting separate statements and their types, and also for formatting. Another main feature is automatically adding LIMIT clause with specified value to SELECT queries.
- LIMIT_REGEX = 'LIMIT\\s*(\\d*)$'
- prepare(query_text: str, add_limit: bool = True, limit_nrows: int | None = None) PreparedSqlQuery[source]
Main method for query preparation, which includes formatting and query type extraction. If query has more or less than exactly 1 statement, IncorrectSqlQueryException will be raised.
- Parameters:
query_text – Query text in form of string.
add_limit – If
True, LIMIT clause will be added to query with value equals to limit_nrows or self._limit_nrows when limit_nrows is not set. If query already has LIMIT clause, its value will be decreased in case of exceeding.limit_nrows – value that will be used in LIMIT clause.
- Returns:
PreparedSqlQuery instance
sqldbclient.sql_transaction_manager
SqlTransactionManagerprovides context manager for performing transactions
- class sqldbclient.sql_transaction_manager.SqlTransactionManager(engine: Engine)[source]
Bases:
objectClass that is responsible for transaction management. Provides handy context manager, which used as follows (sql_executor is an instance of SqlTransactionManager):
with sql_executor: sql_executor.execute('DROP TABLE IF EXISTS foo') sql_executor.execute('CREATE TABLE foo AS SELECT 1 AS a') sql_executor.execute('SELECT * FROM foo') sql_executor.commit()
sqldbclient.utils
- sqldbclient.utils.deprecated.deprecated(foo: Callable)[source]
This is a decorator which can be used to mark functions as deprecated. It will result in a warning being emitted when the function is used.
- sqldbclient.utils.log_decorators.time_logifier(method, class_)[source]
Counts duration of method execution
- sqldbclient.utils.log_decorators.class_logifier(methods)[source]
Counts duration of class methods execution
- sqldbclient.utils.pandas.cursor_result_to_df.cursor_result_to_df(cursor_result: CursorResult, force_result_fetching: bool = False) DataFrame | None[source]
Fetches rows from cursor_result when it returns them, and creates pandas DataFrame.
- Parameters:
cursor_result – CursorResult that is obtained from calling sqlalchemy execute method
force_result_fetching – If
True, will try to fetch rows from cursor result that is obtained after executing query, even when the type of query does not imply returning any rows.
- Returns:
(optional) If query selects any rows then a pandas DataFrame will be returned.
- sqldbclient.utils.pandas.set_full_display.set_full_display(max_rows: int | None = 1000, max_columns: int | None = 1000, max_colwidth: int | None = 500, display_all_columns_by_default: bool | None = True, display_all_rows_by_default: bool | None = True, display_whole_colwidth_by_default: bool | None = False) None[source]
Sets method
full_displayto a pandas.DataFrame object with configured parameters- Parameters:
max_rows – Maximum number of rows to display fully. If a pandas.DataFrame has more rows than max_rows, a
TooBigToDisplayExceptionis raised.max_columns – Maximum number of columns to display fully. If a pandas.DataFrame has more columns than max_columns, a
TooBigToDisplayExceptionis raised.max_colwidth – Maximum column width to display. If a pandas.DataFrame column is wider, the content will be suppressed to max_colwidth characters.
display_all_columns_by_default – If
True, full_display method will try to display all columns by default.display_all_rows_by_default – If
True, full_display method will try to display all rows by default.display_whole_colwidth_by_default – If
True,full_displaymethod will try to display whole column width by default.