Josef Kunhart

Diploma thesis, Ing.

2 Architecture

2.1 Chapter Overview

The second chapter, Architecture, deals with the global concepts, design, and architecture of a content management system. In this work, a global concept means an architectural aspect or feature that has a major impact on the application's design and architecture (in contrast to extending modules, which affect only minor areas). The initial section of this chapter states the current project requirements. The following sections describe the application design, internal mechanics, crud library, and security of Urchin CMS and web applications. The design part describes the system architecture, core objects, and database layer. The sections about internal mechanics describe working with pages, modules, and website content. The last part describes both the general security of web applications and the permission system of Urchin CMS.

2.2 Requirements

This short section introduces the further topics of this chapter and summarizes the fundamental requirements for the Urchin project. Project requirements reflect contemporary expectations for the system as well as its historical development. All requirements apply to Urchin CMS up to version 2.2. This iteration is planned for the year 2013, as noted before. Only fundamental requirements are listed; a complete list would be too complex and confusing.

2.2.1 Functional Requirements

Functional requirements describe functions, processes and behaviour of the application. In other words, these requirements represent what the system is required to do. All requirements are sorted by their priority. The first list shows application-wide functional requirements:

  1. website management - presentations with tree-structured pages
  2. content management - system of components, multiple content on each page
  3. multi-language system - each presentation might use a different language or locale
  4. multi-user access - multiple users can use the administration at the same moment
  5. permission management - user groups and accounts, system of access rights
  6. crud library - tool for generation of basic user interface
  7. WYSIWYG editing - editing of content without advanced knowledge
  8. online preview - user can preview edited content before its publishing
  9. caching - reduction of database queries and load
  10. content versioning - content revisions on the element level

The second part shows module functional requirements:

  1. basic modules - basic content, news, articles, simple form
  2. full-text search - search facility and indexing of content
  3. advanced modules - dynamic form, gallery, products, linked modules
  4. client section - front-end registration, login, and differentiated access

2.2.2 Non-functional Requirements

Non-functional requirements specify attributes and constraints of the application. In other words, these requirements describe the expected qualities of the project, mostly from a technical point of view. Unlike the previous list, the requirements in this list are sorted alphabetically, not by priority. Most of these requirements are of similar importance.

  • flexibility - adding custom modules or diverse extensions is simple
  • layered architecture - separated programming code, database queries and templates
  • modern architecture - object-oriented approach, use of software engineering methods
  • performance - programming code is effective, queries are optimized and cached
  • security - secure application according to current requirements
  • simplicity - simple, minimalistic design with structured and documented code
  • transactions - all database operations use transactions
  • usability - well-designed, intuitive and comprehensive user interface

2.2.3 Software Requirements

Software requirements define the software technology required to run the Urchin CMS application. In most cases, a LAMP stack is used for production web applications and a WAMP stack is used for development.

  1. Unix/Linux/BSD - operating system for web server
  2. Apache 2.2+ - web server
  3. PHP 5.3+ - server-side programming language
  4. MySQL 5.1+ with InnoDB [21] engine or PostgreSQL 8.4+ - database server
  5. jQuery - client-side JavaScript library
  6. XHTML 1.0 Strict, CSS 3.0 - templates and layout
  7. modern browser, resolution 1280x1024 or more - administration access

2.3 Architecture Design

2.3.1 Model-View-Controller Architectural Pattern

Model-view-controller is one of the most influential and commonly used software architectural patterns. This pattern was originally proposed in 1979 by Trygve M. H. Reenskaug [45], a Norwegian computer scientist. The MVC pattern was proposed as a solution for complex software systems working with large data sets and user interaction. This architectural pattern affects the entire application, unlike design patterns, which provide solutions for more concrete problems of a software system. Design patterns used in this project will be mentioned at the end of this chapter.

The main purpose of the model-view-controller architectural pattern is the separation of user interaction and data representation into three parts. The model part encapsulates application data and business rules. The controller part converts user actions to updates of the model and changes in the view. The view part displays output based upon information from the model. Multiple views might exist for each controller, e.g. a web template, an XML [51] file, or a PDF [34] document. Figure 2.1 presents a modified version of this pattern. This variant is also called model-view-presenter [26]. This modification is typically used in web applications: the model and the view do not access each other directly, and all communication between them goes through the controller. The router part in the diagram serves as a designated controller for routing requests to other controllers.

Figure 2.1: Schema of modified MVC pattern with router and user interaction


2.3.2 Hierarchical Model-View-Controller Architectural Pattern

Hierarchical model-view-controller is an extended version of the basic MVC pattern, inspired by the older presentation-abstraction-control architectural pattern [53]. For detailed information about all described architectural patterns, see [55]. As seen in figure 2.2, this pattern is composed of multiple MVC triplets arranged in a tree hierarchy. Each triplet is relatively independent and composed of all three parts: model, view, and controller. In addition to handling user actions, the controller part is responsible for communication between the nodes. First, an action is redirected by the router to a controller in the root-level layer. After processing in this layer, controllers in the child layer are addressed.
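
The delegation from the root-level layer to the child layer can be sketched as follows. This is a minimal illustration written in Python, not the PHP code of Urchin CMS; the names Node, handle, and route, as well as the rendered output format, are hypothetical.

```python
class Node:
    """One MVC triplet in the tree (hypothetical names, illustration only)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def model(self, request):
        # Model part: supplies the data for this node.
        return {"title": f"{self.name}:{request['page']}"}

    def view(self, data, child_output):
        # View part: renders own data and embeds the output of child nodes.
        inner = "".join(child_output)
        return f"<{self.name}>{data['title']}{inner}</{self.name}>"

    def handle(self, request):
        # Controller part: process this layer first, then address the children.
        data = self.model(request)
        child_output = [child.handle(request) for child in self.children]
        return self.view(data, child_output)


def route(root, request):
    """Router: redirects the action to the controller in the root-level layer."""
    return root.handle(request)


# A three-level hierarchy: layout -> modules -> form control.
page = Node("layout", [Node("news"), Node("form", [Node("input")])])
```

Calling route(page, {"page": "home"}) walks the tree from the root downwards, so each node only needs to know its direct children.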

Figure 2.2: Schema of hierarchical MVC pattern


2.3.3 Front Controller Design Pattern

Front controller [54] is a software design pattern used especially in web applications. The pattern is described by Martin Fowler, a well-known British software engineer. The main idea of this pattern is to provide a single point that receives and handles user requests for the whole web application. When used together with HMVC, the front controller also serves as the router part at the root level of the hierarchy. Usually, it is also responsible for security, permission checks, and localization.
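
A single entry point of this kind might look like the sketch below. It is written in Python for illustration only and makes several assumptions: the class FrontController, the role-based permission check, and the locale parameter are hypothetical and are not taken from Urchin CMS.

```python
class FrontController:
    """Single entry point that checks permissions and routes requests."""

    def __init__(self, allowed_roles):
        self.allowed_roles = set(allowed_roles)
        self.routes = {}

    def register(self, path, handler):
        # Each handler stands for a controller on the next level of the hierarchy.
        self.routes[path] = handler

    def dispatch(self, path, role, locale="en"):
        # Security and permission checks happen once, before any routing.
        if role not in self.allowed_roles:
            return (403, "forbidden")
        handler = self.routes.get(path)
        if handler is None:
            return (404, "not found")
        # Localization is resolved centrally and passed down to the controller.
        return (200, handler(locale))


def news_controller(locale):
    titles = {"en": "News", "cs": "Novinky"}
    return titles[locale]
```

Because every request passes through dispatch, cross-cutting concerns are implemented in one place instead of in each controller.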

2.3.4 Use of Hierarchical Model-View-Controller in Urchin CMS

As stated in the introductory chapter, Urchin CMS is a web-based application. The application is divided into two major sub-applications, the front-end and the administration. Both parts are based upon the HMVC architectural pattern and use a two- or three-level hierarchy of nodes. The database layer represents the model on all levels; the controller and view parts differ. Details of this use are briefly described in the following text. Several features (e.g. database layer, modules, or crud library) presented in this text will be described in detail later in this or the following chapter.

In the front-end, each page is assigned a single template and is composed of multiple components. The root level of the hierarchy applies to the whole front-end application and is composed of the web front controller, the database layer, and a page layout. On the lower level, each node consists of a module controller and a module template. More complex modules, such as dynamic forms, use a third level that handles concrete form controls, e.g. an input or radio control. See the left half of figure 2.3 for further details.

The architecture of the administration is similar to the front-end. The top level of the structure is again composed of the database layer, the administration front controller, and the administration layout. There are also two designated front controllers for handling cron and AJAX requests. On the middle level, each node is represented by a crud instance or a custom module. Every crud instance or custom module uses its own controller and templates. Crud instances might be nested and usually contain another level of hierarchy, represented by crud elements such as components and filters. See the right half of figure 2.3 for details.

Figure 2.3: Hierarchical MVC pattern applied to Urchin CMS


2.4 Packages

Following the architecture design, the Urchin application is divided into multiple packages and sub-packages. Each package is directly mapped to a PHP namespace and consists of a number of classes stored in a separate directory. Urchin CMS is a completely object-oriented application and includes more than 300 classes of different types.

The structure of the application is shown in figure 2.4. The most important package is core, which contains fundamental classes of the WCFS framework. These classes will be described in detail in section Context and Core Objects. Other important packages are controllers, models, and views, which represent the three parts of the MVC triad. The package controllers contains front controllers and two sub-packages with common controllers for the front-end and the administration. The package models corresponds to the database layer and is composed of classes working with database tables. The package views contains layouts and templates for the whole application.

Figure 2.4: Urchin CMS structure at the package level


Other essential packages are crud, enums, helpers, and converters. The package crud contains a complete library, also called crud, for the generation of basic user interfaces. The whole section Crud Library later in this chapter is dedicated to this invaluable library. Enums are similar to model classes; they encapsulate application-wide sets of values, e.g. page states. Helpers provide various additional functionality, such as working with files, pagination, or sending e-mails. Converters serve a single purpose: to convert values to their string representation and back again. The idea of converters has been adapted from the JSF [22] technology.

The last three packages are exceptions, ext, and forms. The package exceptions contains all exceptions, and the package ext contains external libraries. At the moment, there is only a single external library, PHPMailer [37]. The package forms contains additional classes for the dynamic form module. This module is much more complex than standard modules and has therefore been assigned an independent package. The dynamic form module will also be discussed in section Dynamic Forms of chapter Extending Modules.

2.5 Context and Core Objects

2.5.1 Context

Context is a static class that is accessible from any part of the application. This class is relatively small; its main responsibility is to provide access to all other core objects, which are all accessed via this class. It also provides facilities for localization, logging, and basic system messages. This concept was inspired by JSP technology and its interface ServletContext [23], although the realization in Urchin CMS differs slightly. Context was introduced in Urchin 2.0, three years after most core objects.
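
The idea of a small static access point can be sketched as follows. The sketch is in Python rather than PHP, and the method names register, get, add_message, and messages are assumptions made for illustration; the actual interface of the Urchin Context class is not shown in this text.

```python
class Context:
    """Static registry giving access to core objects (hypothetical interface)."""

    _objects = {}
    _messages = []

    @classmethod
    def register(cls, name, obj):
        # Called once during start-up for each core object (session, pool, ...).
        cls._objects[name] = obj

    @classmethod
    def get(cls, name):
        # Any part of the application reaches core objects through this call.
        return cls._objects[name]

    @classmethod
    def add_message(cls, text):
        # Basic system messages collected during request processing.
        cls._messages.append(text)

    @classmethod
    def messages(cls):
        # Return a copy so callers cannot modify the internal list.
        return list(cls._messages)
```

No instance of Context is ever created; all state lives on the class itself, which is what makes it reachable from any part of the application.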

2.5.2 Session

The session object encapsulates the session and provides its basic security. A session is a common feature that keeps application settings across the otherwise stateless HTTP [18] protocol.

2.5.3 Request and Response

Request is an object that provides access to the headers and attributes of an HTTP request. In comparison with standard PHP mechanics, the request object also significantly simplifies working with default values or complex arrays in the request. Request enables getting post, get, and cookie values as well as request values, and saving uploaded files. Response is a simple object that allows sending headers or redirections back to the client. Again, the design of both classes was strongly influenced by JSP technology.
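
How default values and nested arrays could be handled is shown in the following sketch. It is a Python illustration, not the Urchin PHP class; the dotted path syntax, the method names, and the lookup order used by value (post, then get, then cookie) are assumptions.

```python
_MISSING = object()  # sentinel distinguishing "not found" from a stored None


class Request:
    """Read access to request data with defaults (hypothetical interface)."""

    def __init__(self, get=None, post=None, cookies=None):
        self._sources = {
            "get": get or {},
            "post": post or {},
            "cookie": cookies or {},
        }

    def _lookup(self, source, path, default):
        # Walk a dotted path such as "filter.author.name" through nested dicts.
        current = self._sources[source]
        for key in path.split("."):
            if not isinstance(current, dict) or key not in current:
                return default
            current = current[key]
        return current

    def get(self, path, default=None):
        return self._lookup("get", path, default)

    def post(self, path, default=None):
        return self._lookup("post", path, default)

    def cookie(self, path, default=None):
        return self._lookup("cookie", path, default)

    def value(self, path, default=None):
        # Generic request value: the first source that contains the path wins.
        for source in ("post", "get", "cookie"):
            found = self._lookup(source, path, _MISSING)
            if found is not _MISSING:
                return found
        return default
```

With such an object, a controller can write request.get("limit", 20) instead of checking for the key's existence by hand.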

2.5.4 Pool

The pool object serves a single purpose: it supplies data for the view. Using the pool is the only way to pass data from the controller to the view. All variables (including nested arrays and objects) are automatically escaped for security reasons. The design of this object is influenced by the registry design pattern from [54].
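
Automatic escaping of nested data can be sketched as below. This Python example only illustrates the principle; the names Pool, set, get, and escape_deep are hypothetical, and the sketch covers strings, dictionaries, and lists but not arbitrary objects.

```python
import html


def escape_deep(value):
    """Recursively HTML-escape every string inside nested containers."""
    if isinstance(value, str):
        return html.escape(value)
    if isinstance(value, dict):
        return {key: escape_deep(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [escape_deep(item) for item in value]
    return value  # numbers, booleans, and None pass through unchanged


class Pool:
    """Registry of view variables; values are escaped when they are stored."""

    def __init__(self):
        self._data = {}

    def set(self, name, value):
        self._data[name] = escape_deep(value)

    def get(self, name):
        return self._data[name]
```

Escaping at the moment of storing means a template cannot receive raw user input by accident, because no other channel from the controller to the view exists.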

2.5.5 Cache

Cache is a specialized object for caching database queries and other data. Internally, this object uses a hierarchical tree structure with indices for storing data. This tree structure is saved to the file system. Data are manipulated using tree nodes with labels, e.g. web → pages → cs. Tags for random access are also implemented but not used. Currently, caching is used only for basic queries. More extensive use of caching is planned for future versions of Urchin CMS.
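
A label-addressed tree of this kind might be organized as in the following sketch. It is a simplified in-memory illustration in Python: persistence to the file system, indices, and tags are omitted, and the names TreeCache, put, get, and invalidate are hypothetical.

```python
class TreeCache:
    """Hierarchical cache addressed by a path of labels, e.g. web/pages/cs."""

    def __init__(self):
        self._root = {"value": None, "children": {}}

    def _node(self, path, create):
        # Descend label by label; optionally create missing nodes on the way.
        node = self._root
        for label in path:
            children = node["children"]
            if label not in children:
                if not create:
                    return None
                children[label] = {"value": None, "children": {}}
            node = children[label]
        return node

    def put(self, path, value):
        self._node(path, create=True)["value"] = value

    def get(self, path, default=None):
        node = self._node(path, create=False)
        if node is None or node["value"] is None:
            return default
        return node["value"]

    def invalidate(self, path):
        # Remove a node together with its whole subtree (path must be non-empty).
        parent = self._node(path[:-1], create=False)
        if parent is not None:
            parent["children"].pop(path[-1], None)
```

The benefit of the tree layout is visible in invalidate: dropping the node web → pages discards the cached data of all languages below it in one step.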

2.5.6 Link

The link object is responsible for global handling of links (URLs), both in the front-end and in the administration. These links can be static or dynamic and allow adding and removing parameters. Dynamic links automatically keep previous parameters unless explicitly removed; static links start without any parameters. In the front-end, both rewritten and non-rewritten links are supported; the current setting depends on the system configuration.
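
The difference between dynamic and static links can be illustrated with a short sketch. It is written in Python and covers only non-rewritten links; the class Link and its methods add, remove, and url are hypothetical names, and parameters are sorted only to make the output predictable.

```python
from urllib.parse import urlencode


class Link:
    """Builds URLs; dynamic links inherit the parameters of the current request."""

    def __init__(self, base, current_params=None, dynamic=True):
        self.base = base
        # A static link ignores the current parameters and starts empty.
        self.params = dict(current_params or {}) if dynamic else {}

    def add(self, name, value):
        self.params[name] = value
        return self  # returning self allows chaining of calls

    def remove(self, name):
        self.params.pop(name, None)
        return self

    def url(self):
        if not self.params:
            return self.base
        return self.base + "?" + urlencode(sorted(self.params.items()))
```

For a current request with the parameters page=2 and lang=cs, a dynamic link keeps lang after page is removed, while a static link contains only what was added explicitly.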

2.6 Database Design

A well-designed and mature database architecture is the backbone of the Urchin CMS application. The whole database sub-system is fully transactional according to OLTP standards and normalized to both third and Boyce-Codd normal forms. The database layer is always bound to the chosen relational database management system (currently MySQL). On the one hand, each database server requires its own database layer; on the other hand, the concrete database layer is tailored to the selected database for better performance. In contrast to the object-oriented application design, plain old arrays are used for data transfer. Unlike in Java, this is a natural approach in the PHP programming language.

2.6.1 Database Layer

The database layer serves as the model part in the model-view-controller architecture and is based upon the table data gateway design pattern. Each class in the model is mapped to a single main table and possibly other related tables. This design allows effective separation of application logic and queries. Another advantage is the relatively simple caching of database queries directly in model classes. A typical model class is stateless (has no properties) and provides all methods necessary to work with the underlying table. All models inherit from the BaseModel class, which enables basic data-manipulation operations (e.g. fetch, update, insert, delete) and transactional operations (e.g. begin, rollback, commit).
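
A table data gateway with the operations named above can be sketched as follows. The example is in Python with the built-in sqlite3 module standing in for MySQL, so it is an illustration of the pattern and not the Urchin BaseModel itself; the method signatures, the PageModel class, and the title column are assumptions. Rows are returned as plain dictionaries, mirroring the use of plain arrays for data transfer.

```python
import sqlite3


class BaseModel:
    """Gateway to one table; holds no data besides the connection."""

    table = ""  # set by each concrete model

    def __init__(self, conn):
        self.conn = conn

    # Table and column names come from code, never from user input;
    # all values are bound as parameters.
    def fetch(self, pk):
        cur = self.conn.execute(f"SELECT * FROM {self.table} WHERE id = ?", (pk,))
        row = cur.fetchone()
        if row is None:
            return None
        names = [col[0] for col in cur.description]
        return dict(zip(names, row))

    def insert(self, data):
        cols = ", ".join(data)
        marks = ", ".join("?" for _ in data)
        sql = f"INSERT INTO {self.table} ({cols}) VALUES ({marks})"
        return self.conn.execute(sql, tuple(data.values())).lastrowid

    def update(self, pk, data):
        sets = ", ".join(f"{col} = ?" for col in data)
        sql = f"UPDATE {self.table} SET {sets} WHERE id = ?"
        self.conn.execute(sql, (*data.values(), pk))

    def delete(self, pk):
        self.conn.execute(f"DELETE FROM {self.table} WHERE id = ?", (pk,))

    def begin(self):
        self.conn.execute("BEGIN")

    def commit(self):
        self.conn.execute("COMMIT")

    def rollback(self):
        self.conn.execute("ROLLBACK")


class PageModel(BaseModel):
    table = "a_page"
```

The sketch expects a connection opened with isolation_level=None, so that transactions are controlled only by the explicit begin, commit, and rollback calls.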

Queries used in the crud library are mostly generated while respecting foreign keys and indices. Both static and generated queries strictly use prepared (parameterized) statements for better performance and prevention of SQL injection and similar exploits. Basic caching of queries is already provided; advanced caching with dependencies is proposed for future implementation.

2.6.2 Database Structure

The database model of the Urchin CMS core is composed of about 30 main and 15 additional tables (for lookups and translations). There are three major and several minor database parts. The major areas are the page axis, the component axis, and the permission sub-system. Minor areas include search, locale, and logging. Figure 2.5 displays a conceptual database model with the basic entities and relations of both axes. The prefix a_ is used for core tables (m_* for module tables and v_* for views). The left side of the diagram shows the fundamental tables of the page axis. These tables are a_presentation, a_page, a_template, and a_position. The right side of the diagram shows the component axis with the tables a_module, a_component, and a_element. Details about all three major parts will be discussed in the following chapter Core Features and Modules together with the application design. A complete conceptual model with all core entities is available in the appendix Database Model.

Figure 2.5: Database model with core tables and their relations


2.6.3 Alternative Approaches

There are generally three common approaches to working with a database in a web application. The first one, a separated database layer, was already described. The second approach is to simply embed database queries into the programming code where necessary. This is a common but definitely not recommended approach, as it mixes database queries and application logic. The last usual option is to use object-relational mapping (ORM). This is especially common in the Java EE world, where products such as Hibernate [20] are used. Further alternatives for storing data are NoSQL database management systems, such as file-based databases, document-oriented databases, XML-oriented databases, or key-value stores.

2.7 Crud Library

Crud is an important built-in library in Urchin CMS. The library serves as an efficient tool for generating user interfaces and is used for managing most database data in the administration. Controllers inherited from crud library classes are essentially declared rather than programmed; no advanced programming is necessary except for the implementation of custom features. As the library name suggests, crud basically handles four typical operations with records: create, read, update, and delete. Beyond these elementary functions, the crud library allows nesting of crud instances, filtering of records, validation of input, and adding custom functions, e.g. sending a mail. All operations cooperate very closely with the Urchin permission system. For example, the controller for managing presentations allows only the view, detail, and edit actions. Members of all user groups can view presentations, but only administrators are allowed to edit these records.

Crud works effectively with any database schema that defines a single integer primary key for each standard table and properly uses foreign keys. Database relations up to M:N and M:N:X are supported. Queries in crud-based controllers are generated according to the definition in the crud controller class. The main advantages of the crud approach are rapid development, simple maintenance, reduction of duplicated code, and enough flexibility for adding custom elements or individual functions. There are also no generated files or forms, just simple and straightforward classes with definitions.

The crud library is composed of several main parts and sub-packages with various elements. See figure 2.6 for the detailed structure of this important Urchin package. The main parts are common crud, element crud, cross crud, and matrix crud. Crud elements are entries, filters, components, decorators, and validators. The central part is common crud, which handles a single main table and several linked tables. Only a common crud instance might contain nested instances of any other type.

Figure 2.6: Packages and classes of the crud library


Element crud works exactly like the common crud; in addition, it is adapted to manage elements. Elements are special database records used in content modules. Cross crud is used to manage M:N database relations, matrix crud to handle M:N:X relations. Nesting of instances precisely follows the composite design pattern. Here, a common crud instance serves as the composite component, while a cross crud instance, a matrix crud instance, or a custom implementation forms the leaf part. All mentioned parts will be discussed in the following sections.
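
The composite structure of nested instances can be sketched as follows. This Python example only shows the shape of the pattern; the class names follow the terms used in the text, but the methods nest and describe are hypothetical and do not reflect the real crud classes.

```python
class CrudInstance:
    """Common interface of every node in the nesting hierarchy."""

    def __init__(self, name):
        self.name = name

    def describe(self, depth=0):
        return ["  " * depth + self.name]


class CrossCrud(CrudInstance):
    """Leaf: manages an M:N relation and cannot contain other instances."""


class MatrixCrud(CrudInstance):
    """Leaf: manages an M:N:X relation and cannot contain other instances."""


class CommonCrud(CrudInstance):
    """Composite: the only type that may contain nested instances."""

    def __init__(self, name):
        super().__init__(name)
        self.nested = []

    def nest(self, instance):
        self.nested.append(instance)
        return self

    def describe(self, depth=0):
        # Treat children uniformly, whether they are leaves or composites.
        lines = super().describe(depth)
        for child in self.nested:
            lines.extend(child.describe(depth + 1))
        return lines
```

Because only CommonCrud defines nest, the rule that leaves cannot hold nested instances is enforced by the class structure itself.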

2.7.1 Common and Element Crud

Common crud is the most important and most widely used part of the crud library. A controller built upon a common crud instance manages records in a single main database table (or its row-based partition) along with data from related tables. The common crud employs actions to work with records. Four displaying actions and four modifying actions are present. Displaying actions only show records; they do not modify anything. These actions are view, detail, edit, and add. Modifying actions modify records. These actions are delete, edit, add, and copy.

All modifying actions are accessed from views and provide transactionally safe before and after action callbacks for adding custom functionality. Both types of actions are implemented using the command behavioural design pattern. See the diagram in figure 2.7 for details of the interaction between actions in the common crud (the selection of an action after update/insert is omitted to keep the diagram simple). The element crud works exactly like the common crud. In addition, it provides facilities for working with elements, such as built-in components, workflow support, or extended modifying actions that work with views instead of tables.
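
A modifying action implemented as a command with callbacks inside one transaction might look like the sketch below. It uses Python and sqlite3 instead of PHP and MySQL; the class InsertAction, its constructor parameters, and the callback signatures are assumptions for illustration only.

```python
import sqlite3


class InsertAction:
    """Command object: one modifying action wrapped in a single transaction."""

    def __init__(self, conn, table, before=None, after=None):
        self.conn = conn      # connection opened with isolation_level=None
        self.table = table    # table name comes from code, not from user input
        self.before = before  # callback(data), may modify the data
        self.after = after    # callback(primary_key, data)

    def execute(self, data):
        self.conn.execute("BEGIN")
        try:
            if self.before:
                self.before(data)
            cols = ", ".join(data)
            marks = ", ".join("?" for _ in data)
            sql = f"INSERT INTO {self.table} ({cols}) VALUES ({marks})"
            pk = self.conn.execute(sql, tuple(data.values())).lastrowid
            if self.after:
                self.after(pk, data)
            self.conn.execute("COMMIT")
            return pk
        except Exception:
            # A failure in the action or in either callback undoes everything.
            self.conn.execute("ROLLBACK")
            raise
```

Since the callbacks run between BEGIN and COMMIT, custom functionality either succeeds together with the action or leaves the database unchanged.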

Figure 2.7: State diagram for common crud with transitions between actions


The view action is the default one and its purpose is to display basic information about multiple records at once. The view action uses pagination, components to display data, and filters to narrow results. The detail action displays extended information about a selected record. It also contains any nested instances that are linked to this record. The edit action shows a form with the existing record and allows updating this record.

The add action shows an empty form for adding a new record. Both edit and add actions use components to display data and validators to check user input. In addition to components, all four displaying actions utilize decorators and converters when necessary. The builder design pattern is applied for building common crud instances from entries, components, and other elements.

2.7.1.1 Entries

An entry is the basic unit used in a crud instance. Each entry serves as a container for a single component and validators. If assigned, a component tells the entry how to display its value. A database entry works with a single database column, while a simple entry does not require a database field at all. Entries are also responsible for pre-loading component data if necessary and for marking components as required based upon the assigned validators.

2.7.1.2 Components

Components present database values to the user in an appropriate format and also assemble these values back before updating or inserting a record. There are about 40 component types in the crud library. Some components only display values in the desired format, e.g. simple text, date/time, URL link, or a value from a joined table. Other components are used in forms for editing records, such as input field, text area, radio buttons, or selection box. Specific components enable selection of files and images, working with elements, AJAX-based editing, or synchronization between fields. The action component enables calling custom actions implemented in the controller. This option is used e.g. for generating a new password and sending it to the user in the administration. The hidden component allows adding default values for database fields.

2.7.1.3 Decorators

Decorators are simple objects for enhancing displayed data independently of the assigned component. Several decorators exist in the crud library: to display text before or after component, to show a custom tool-tip, and to show bold text. Decorators strictly follow the decorator design pattern.

2.7.1.4 Validators

Validators are used to check user input in the edit and add actions. Each entry might include one or more validators. Validators check user-provided data according to rules defined in the crud instance. The crud library includes many useful validators, e.g. for checking non-empty fields, comparison with a custom value, regular-expression-based checks, or ensuring value uniqueness against the database.
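
The way several validators attached to one entry could be evaluated is sketched below. The example is in Python; the classes Required, Matches, and Unique and the function validate are hypothetical, and the uniqueness check uses an in-memory set where the real library would query the database.

```python
import re


class Required:
    message = "value is required"

    def check(self, value):
        return value is not None and str(value).strip() != ""


class Matches:
    message = "value has an invalid format"

    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def check(self, value):
        # The whole value must match, not just a part of it.
        return self.regex.fullmatch(str(value)) is not None


class Unique:
    message = "value must be unique"

    def __init__(self, existing):
        # Stand-in for a database lookup of already stored values.
        self.existing = set(existing)

    def check(self, value):
        return value not in self.existing


def validate(value, validators):
    """Run all validators of an entry and collect the failure messages."""
    return [v.message for v in validators if not v.check(value)]
```

An empty result from validate means the value passed every rule; otherwise the returned messages can be shown next to the form field.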

2.7.1.5 Converters

Converters modify values both before passing them from the database to components and before putting values from components inside a form back into the database. There is an important difference between converters and components: components only format data for rendering and do not in fact modify anything. The converter package is located outside the crud library because converters are widely used in the whole application. Converters are used mostly to encode and decode strings, transform date formats, or specifically treat database null values if required.
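
A two-way conversion of dates with special handling of null values can be sketched as follows. The class DateConverter, its method names, and the default format are assumptions made for this Python illustration; they are not taken from the Urchin converters package.

```python
from datetime import date, datetime


class DateConverter:
    """Converts between a database date value and its string representation."""

    def __init__(self, fmt="%d.%m.%Y"):
        self.fmt = fmt

    def to_string(self, value):
        # Database null is shown as an empty string in the form.
        if value is None:
            return ""
        return value.strftime(self.fmt)

    def to_value(self, text):
        # An empty form field is stored back as database null.
        text = text.strip()
        if text == "":
            return None
        return datetime.strptime(text, self.fmt).date()
```

Both directions use the same format string, so a value that is converted to a string and back again is unchanged.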

2.7.1.6 Filters

Filters are simple objects that serve a single purpose: filtering records in the view action. The crud library provides filters for searching by text, comparison with a value or date, selection from a custom list, or selection from another database table.

2.7.2 Cross and Matrix Crud

The cross crud is the third important part of the crud library. It is used to manage M:N database relations (also called 'cross' tables, hence the name). M:N tables contain nothing more than two foreign keys to other database tables. A cross crud instance cannot exist on its own; it must be nested inside a common crud instance. The reason is that the value of the first column is taken from the parent instance and the user chooses values for the second column. The cross crud is much simpler than the previous parts; it utilizes only two actions: the view action for displaying data and the update action for saving changes.
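
The two actions of a cross crud can be sketched as follows. The example uses Python and sqlite3 for illustration; the class interface, the table user_group, and its columns are hypothetical. The update action stores only the difference between the current and the selected values.

```python
import sqlite3


class CrossCrud:
    """Manages one side of an M:N table for a fixed parent record."""

    def __init__(self, conn, table, parent_col, child_col, parent_id):
        self.conn = conn  # connection opened with isolation_level=None
        self.table = table
        self.parent_col = parent_col
        self.child_col = child_col
        self.parent_id = parent_id  # value taken from the parent instance

    def view(self):
        sql = (f"SELECT {self.child_col} FROM {self.table} "
               f"WHERE {self.parent_col} = ? ORDER BY {self.child_col}")
        return [row[0] for row in self.conn.execute(sql, (self.parent_id,))]

    def update(self, selected):
        current, wanted = set(self.view()), set(selected)
        self.conn.execute("BEGIN")
        try:
            for child in current - wanted:  # relations the user deselected
                self.conn.execute(
                    f"DELETE FROM {self.table} "
                    f"WHERE {self.parent_col} = ? AND {self.child_col} = ?",
                    (self.parent_id, child))
            for child in wanted - current:  # relations the user newly selected
                self.conn.execute(
                    f"INSERT INTO {self.table} ({self.parent_col}, {self.child_col}) "
                    f"VALUES (?, ?)",
                    (self.parent_id, child))
            self.conn.execute("COMMIT")
        except Exception:
            self.conn.execute("ROLLBACK")
            raise
```

Rows belonging to other parent records are never touched, because every statement is restricted to the fixed parent value.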

The matrix crud works similarly to the cross crud. It manages three-dimensional M:N:X relations. There is only one table of this type in Urchin CMS. This table is composed of exactly three foreign keys and is used in the permission sub-system. Again, the value of one column is fixed while the user is allowed to work with the two remaining columns. The user interface of the matrix crud therefore looks like a two-dimensional grid. The matrix crud also uses two actions, the view action and the update action.

2.7.3 Flexibility and Extensibility

The crud library is highly flexible and extensible. There are several ways to extend the library. The easiest way is to create custom components and other elements. Implementation of components is very simple; these classes are usually short and straightforward. The situation is similar with other elements, validators being the simplest. Many new elements have emerged this way during the development of Urchin CMS. The second possibility to extend the crud library is to use built-in callbacks. If defined, these methods are called before and after a delete, insert, or update. All callbacks are provided with the primary key value of the record and a corresponding model with data. Modifying the model is also possible. This option is used very often, e.g. for assigning the default page group to a newly added page or for working with positions in records.

Using the action component and implementing the action in an inherited controller is the third way to extend the system. For example, sending a newly generated password from the administration to an existing user uses this feature. Advanced programmers might also override existing methods with custom functionality. The element crud was developed this way; it is a significant extension of the original common crud with rewritten and enhanced modifying actions. In fact, an element crud instance works with a view instead of a table. This view is defined on exactly two database tables: a table with elements and a module-specific table that differs according to the managed module.
