Saturday, 19 September 2015

SAP HANA: Architecture Overview & Row vs Column Datastructure

2.2 Advance Information on SAP HANA Architecture

As explained in the previous topic SAP HANA consists of Index server and XS.
Apart from this SAP HANA also contains Name Server, Preprocessor Server and Statistics Server.
Let’s have a close look into this.


Index Server:
    • Index server is the main SAP HANA database component
    • It contains the actual data stores and the engines for processing the data.
    • The index server processes incoming SQL or MDX statements in the context of authenticated sessions and transactions.
Persistence Layer:
The database persistence layer is responsible for durability and atomicity of transactions. It ensures that the database can be restored to the most recent committed state after a restart and that transactions are either completely executed or completely undone.

Preprocessor Server:
The index server uses the preprocessor server for analyzing text data and extracting the information on which the text search capabilities are based.

Name Server:
The name server owns the information about the topology of SAP HANA system. In a distributed system, the name server knows where the components are running and which data is located on which server.

Statistic Server:
The statistics server collects information about status, performance and resource consumption from the other servers in the system. The statistics server also provides a history of measurement data for further analysis.

Session and Transaction Manager:
The Transaction manager coordinates database transactions, and keeps track of running and closed transactions. When a transaction is committed or rolled back, the transaction manager informs the involved storage engines about this event so they can execute necessary actions.

XS Engine:
XS Engine is an optional component. Using XS Engine clients can connect to SAP HANA database to fetch data via HTTP.


2.3 Row Store Vs Column Store

Overview of Row Data Storage and Column Data Storage:

Relational databases typically use row-based data storage. However Column-based storage is more suitable for many business applications. SAP HANA supports both row-based and column-based storage, and is particularly optimized for column-based storage.

As shown in the figure below, a database table is conceptually a two-dimensional structure composed of cells arranged in rows and columns.

Because computer memory is structured linearly, there are two options for the sequences of cell values stored in contiguous memory locations:

Row Storage – It stores table records in a sequence of rows.
Column Storage – It stores table records in a sequence of columns i.e. the entries of a column is stored in contiguous memory locations.


Traditional databases store data simply in rows. The HANA in-memory database stores data in both rows and columns. It is this combination of both storage approaches that produces the speed, flexibility and performance of the HANA database.

Advantages of column-based tables:
Faster Data Access:
Only affected columns have to be read during the selection process of a query. Any of the columns can serve as an index.

Better Compression:
Columnar data storage allows highly efficient compression because the majority of the columns contain only few distinct values (compared to number of rows).

Better parallel Processing:
In a column store, data is already vertically partitioned. This means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core.

Advantages and disadvantages of row-based tables:
Row based tables have advantages in the following circumstances:
    • The application needs to only process a single record at one time (many selects and/or updates of single records).
    • The application typically needs to access a complete record (or row).
    • Neither aggregations nor fast searching are required.
    • The table has a small number of rows (e. g. configuration tables, system tables).
Row based tables have disadvantages in case of analytic applications where aggregation are used and faster search & processing are required. In row based tables all data in a row has to be read even though the requirement may be there to access data from a few columns.

Which type of tables should be preferred – Row-based or Column-based?
In case of analytic applications, where aggregations are used and faster search & processing are required, row-based storage are not good. In row based tables all data stored in a row has to be read even though the requirement may be there to access data from a few columns. Hence, these queries on huge amounts of data would take lots of times.

In columnar tables, this information is stored physically next to each other, that significantly increases the speed of certain data queries.

The following example shows the difference between the usage of column and row storage, and positions them relative to row and column queries. Column storage is most useful for OLAP queries (queries using any SQL aggregate functions). Because, these queries get just a few attributes from every data entry. But for traditional OLTP queries (queries not using any SQL aggregate functions), it is more advantageous to store all attributes side-by-side in row tables. HANA combines the benefits of both row- and column-storage tables.


Conclusion:
To enable fast on-the-fly aggregations, ad-hoc reporting, and to benefit from compression mechanisms it is recommended that transaction data is stored in a column-based table.

The SAP HANA data-base allows joining row-based tables with column-based tables. However, it is more efficient to join tables that are located in the same row or column store. For example, master data that is frequently joined with transaction data should also be stored in column-based tables.

No comments:

Post a Comment