- 1 Overview
- 1.1 Content Adaptation Platform (CAP)
- 1.2 CXE (Content Exchange Engine)
The following two sections provide a detailed breakdown of GlobalSight's architecture from a functional viewpoint.
GlobalSight comprises two main subsets of components: the CAP (Content Adaptation Platform), and the CXE (Content Exchange Engine).
Content Adaptation Platform (CAP)
GlobalSight's Content Adaptation Platform (CAP) contains components related to the main functions of the GMS including:
- user interface
- database persistence
- user management and security
- translation memory
- on-line editor and off-line editor handling
The persistence service is an RMI component that services requests from other RMI services, EJBs and servlets. The persistence layer is based on the Java Database Connectivity (JDBC) API. The code that makes the database connection is completely transparent to the calling code, and it is possible to externalize the database URL, username and password.
The persistence layer uses prepared statements to avoid the overhead of repeated parsing and it caches prepared statements to avoid the overhead of frequent object creation. The code makes use of bulk inserts and batch updates to avoid the overhead of frequent round trips between the database server and application server.
There are several levels to the caching available through Hibernate. In the highest level, the objects are permanently stored in the application server cache. In the next level, objects are removed as soon as a size limit has been reached. In the last level, there is no caching at all. Typically, objects that are few in number are placed at the highest level (e.g. Locales, Localization Profiles, Workflow Templates, Projects, Translation Memory Profiles and Locale Pairs). Objects that fall in the middle category typically include Jobs, Workflows and Tasks. Segments and template parts are objects that are not cached at all.
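The three caching levels can be illustrated with a minimal sketch. This is not GlobalSight or Hibernate code: the class name, the Level enum and the loader callback are hypothetical, and the least-recently-used eviction in the middle level is an assumption, since the text only says that objects are removed once a size limit is reached.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the three caching levels; not GlobalSight or Hibernate code. */
class TieredCacheSketch {
    enum Level { PERMANENT, SIZE_LIMITED, NONE }

    // Highest level: entries stay cached for the life of the application.
    private final Map<String, Object> permanent = new HashMap<>();
    // Middle level: bounded map that evicts once the size limit is exceeded.
    private final Map<String, Object> bounded;
    // Counts how often the loader (standing in for a database read) was called.
    private int loads = 0;

    TieredCacheSketch(final int maxBoundedEntries) {
        // Access-ordered LinkedHashMap; evicting the least recently used
        // entry is an assumption made for this sketch.
        this.bounded = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > maxBoundedEntries;
            }
        };
    }

    Object get(Level level, String key, Function<String, Object> loader) {
        switch (level) {
            case PERMANENT:
                return permanent.computeIfAbsent(key, k -> load(k, loader));
            case SIZE_LIMITED:
                Object cached = bounded.get(key);
                if (cached == null) {
                    cached = load(key, loader);
                    bounded.put(key, cached);
                }
                return cached;
            default:
                // Lowest level: no caching, every request goes to the loader.
                return load(key, loader);
        }
    }

    private Object load(String key, Function<String, Object> loader) {
        loads++;
        return loader.apply(key);
    }

    int loadCount() {
        return loads;
    }
}
```

Under this sketch, a Locale would be requested with Level.PERMANENT, a Job with Level.SIZE_LIMITED, and a segment with Level.NONE.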
The WorkflowManager is a service component responsible for managing all workflow instance-related activities in GlobalSight. It communicates with the third-party workflow engine (jBPM) through the WorkflowServer component. The WorkflowServer wraps the jBPM model API and is the main proxy for communication with the workflow engine.
The initial step is workflow creation, which happens during content import to GlobalSight. Each workflow represents a locale pair process instance within a job and is based on a workflow template (the workflow templates are associated with the localization profile used for the import process). As soon as the dispatch process of a particular workflow is requested through the WorkflowManager, the first activity is activated and assigned to the role members associated with that activity.
WorkflowManager is also responsible for dynamic modification of a workflow. Modification could include a structural edit of a workflow (adding/removing activities), modification of the attributes of the existing activities, or reassigning the active task to a different user/role. The owner of the workflow is the only person who can perform a structural modification. Workflow ownership is determined during the workflow template design by selecting a Project Manager (the main owner) and Workflow Manager (an assistant to the Project Manager).
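The dispatch and ownership rules described above can be sketched as a small model. All class and method names here are hypothetical and do not reflect the WorkflowManager or jBPM APIs. Treating both the Project Manager and the Workflow Manager as owners is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a workflow instance; not GlobalSight's or jBPM's API. */
class WorkflowSketch {
    static final class Activity {
        final String name;
        final String role;   // role whose members receive the task
        boolean active;

        Activity(String name, String role) {
            this.name = name;
            this.role = role;
        }
    }

    private final String projectManager;
    private final String workflowManager;
    private final List<Activity> activities = new ArrayList<>();

    WorkflowSketch(String projectManager, String workflowManager,
                   List<Activity> templateActivities) {
        this.projectManager = projectManager;
        this.workflowManager = workflowManager;
        // The instance starts as a copy of the activities in its template.
        this.activities.addAll(templateActivities);
    }

    /** Dispatching activates the first activity and returns it for assignment. */
    Activity dispatch() {
        if (activities.isEmpty()) {
            throw new IllegalStateException("workflow has no activities");
        }
        Activity first = activities.get(0);
        first.active = true;
        return first;
    }

    /** Assumption: both users selected in the template count as owners. */
    boolean isOwner(String user) {
        return user.equals(projectManager) || user.equals(workflowManager);
    }

    /** Structural edit: only an owner may add an activity. */
    void addActivity(String user, Activity activity) {
        if (!isOwner(user)) {
            throw new SecurityException(user + " does not own this workflow");
        }
        activities.add(activity);
    }

    int activityCount() {
        return activities.size();
    }
}
```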
Each activity can have an optional “Create Secondary Target File” system action associated with it. The purpose of this system action is to generate an un-extracted version of the target pages (that were updated in the previous activity) and associate it with the newly activated task. This allows customers to import a desktop publishing format file (for example QuarkXPress™), have it extracted for use with GlobalSight’s linguistic technologies and, after translation, have it returned to the native Quark format for the review, formatting and graphics stages within the workflow.
Import / Export
The import process within CAP starts after CAP receives GXML from the Content Exchange Engine (CXE). CXE sends a JMS message to the RequestHandler object, which prepares the localization request within CAP and calls the PageManager and PageImporter services. These services are responsible for parsing the GXML into separate translatable and localizable segments (TUVs) for persistence in the database, and for creating SourcePage and TargetPage objects, Workflow objects and Job objects. These objects, and the applicable TUs and TUVs, are modified as pages are worked on through GlobalSight's workflow. The PageManager is also responsible for initiating TM leveraging and Terminology leveraging, and the subsequent workflow (and/or job) creation.
User Management and Security
GlobalSight's User Management component provides:
- user authentication along with management
- storage of user profiles.
User profiles include basic user information such as:
- phone number
- access groups --- provide access control for system resources
- the roles --- used by jBPM for GlobalSight's workflow system
The UserManager component, along with the SecurityManager component, is accessed by other GlobalSight components through RMI.
Translation Memory Management
Translation Memory (TM) is a software module that enables the reuse of previously translated text. GlobalSight TM has several unique strengths inherent in its design.
- True Multi-lingual TM Functionality
- Page TM and Segment TM
- TMX Compliance
- Leverage Options
- TM Population
- TM Leverage
- TM Import and TM Export
- TM Maintenance
The terminology subsystem provides a concept-oriented terminology management system (TMS). The subsystem is only loosely coupled with GlobalSight, sharing its user management, persistence and UI control layers. It can also be used as a stand-alone Web-based TMS that can be accessed anonymously, if desired.
The online editor displays content to the user in a number of different page view modes including:
- Preview mode – an HTML preview of the page itself (applicable only to HTML content).
- Text mode – the full text (markup) of the page. Note that only the localizable or translatable elements (human language segments or certain properties like font and color) are editable, not the markup itself.
- List mode – a list of all the editable segments (localizable and translatable).
- Dynamic preview – for applicable formats (database content, TeamSite content), this allows content to be sent back to the end data repository from where it is previewed.
Offline Editing Environment
The OfflineEditManager is responsible for processing requests for target files to be translated offline, and for returning those files to the system after they have been translated.
The OfflineEditManager has two main sub-components:
- the Downloader
- the Uploader.
Unextracted files are imported into the system in their native format and are not segmented. These files are simply passed through the OfflineEditManager in their native (binary) form. Unextracted files include Macromedia files, images, and any other file for which the project manager does not wish to use GlobalSight's linguistic technologies.
Extracted files are segmented during import and the segments are separated from the native file code and formatting. These extracted segments are thus normalized and can be used to build a number of common file formats that are suitable for offline translation using any modern word processor that supports RTF or Unicode plain text. Thus the word processor that was used to author the files is not required.
Such content files are not downloaded in their original format to the user. The special RTF format of the download file only presents the human language segments that need localization to the user. The user does not have access to change the original markup or presentation of the document. Of course, if the project manager wishes, users can work with the native format. In this case, the files can either be imported as Unextracted files, or the SecondaryTargetFile action can be used to create the native format file for the user to modify using the appropriate desktop application (for example QuarkXPress).
CXE (Content Exchange Engine)
GlobalSight's CXE (Content Exchange Engine) contains components related to reading and writing customer content from various data repositories, and converting that customer content from various formats to (and from) a common Unicode pivot format called GXML (GlobalSight XML).
GXML contains word-counted, linguistically segmented content in a form that separates the presentation structure or format from the actual human language text.
The use of asynchronous messages allows CXE to handle content from different sources (data repositories) in different formats. Some formats might require more processing or interaction with more adapters for conversion to other formats, and therefore take longer to process.
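The effect of asynchronous handling can be sketched with plain java.util.concurrent classes standing in for JMS. Everything in this sketch is hypothetical: the class name, the "quark" and "html" labels, and the latch that stands in for the extra conversion work a desktop publishing format might need. It only shows that a slow conversion does not hold up a fast one submitted after it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; a thread pool stands in for JMS-driven adapters. */
class AsyncPipelineSketch {

    /** Submits a slow conversion, then a fast one, and returns completion order. */
    static List<String> run() {
        ExecutorService adapters = Executors.newFixedThreadPool(2);
        List<String> completed = new CopyOnWriteArrayList<>();
        CountDownLatch fastDone = new CountDownLatch(1);
        CountDownLatch allDone = new CountDownLatch(2);
        try {
            // Slow format: the wait simulates additional conversion steps.
            adapters.submit(() -> {
                awaitQuietly(fastDone, 5);
                completed.add("quark");
                allDone.countDown();
            });
            // Fast format: submitted second, but free to finish first.
            adapters.submit(() -> {
                completed.add("html");
                fastDone.countDown();
                allDone.countDown();
            });
            awaitQuietly(allDone, 10);
        } finally {
            adapters.shutdown();
        }
        return completed;
    }

    private static void awaitQuietly(CountDownLatch latch, long seconds) {
        try {
            latch.await(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling AsyncPipelineSketch.run() returns the labels in completion order, with the fast conversion first even though it was submitted second.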
Content Repository Adapters
GlobalSight takes an adapter-based approach to integration with different content repositories or external systems. A specific adapter is developed against each external system’s API. That adapter is responsible for reading and writing document content (files, etc.) from the external system, and saving meta-data about that document in an internal format called EventFlowXml. The EventFlowXml contains all the meta-data needed to write the document back to the appropriate system, as well as the information needed for processing the document through CXE, including format and extraction information.
Desktop Publishing Format Adapters
Because desktop publishing formats (Quark, FrameMaker, Word, PDF, etc.) are generally binary formats, GlobalSight uses separate converter servers, built with the FrameMaker, Quark, and Microsoft APIs, to handle the round-trip conversion of these formats to XML. The resulting XML is sent via JMS to the Web Format Adapters for conversion to GXML.
Web Format Adapters
The GlobalSight Web Format Adapters are also known as the Extractors.
- The Extractors parse the Web format and separate the human language content from the markup of the Web format file.
- Additionally the language content is segmented into sentences or paragraphs (depending on how the system is set up).
- GlobalSight performs segmentation for all source languages.
- The segments are also word-counted.
- The Extractors also identify (according to rules defined in an XML rule file if appropriate) what is translatable, what is localizable, and what is skeleton. This information comprises the GXML.
- Translatable items --- items in one human language that need to be translated into another human language.
- Localizable items --- generally property settings such as fonts, colors, images or URLs. These values are not translated directly; instead, the user enters an appropriate value.
- Skeleton items --- anything that makes up the presentation and markup of the Web format document.
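The segmentation and word-counting steps in the list above can be approximated with the JDK's java.text.BreakIterator. This is an illustrative sketch only: the SegmenterSketch class is hypothetical, and the text does not say how the Extractors implement segmentation or counting, so the boundary rules here are the JDK's rather than GlobalSight's.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of sentence segmentation and word counting. */
class SegmenterSketch {

    /** Splits already-extracted human language text into sentence segments. */
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> segments = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end).trim();
            if (!segment.isEmpty()) {
                segments.add(segment);
            }
        }
        return segments;
    }

    /** Counts word tokens, skipping whitespace and punctuation tokens. */
    static int wordCount(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // A token counts as a word if it begins with a letter or digit.
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                count++;
            }
        }
        return count;
    }
}
```

For instance, SegmenterSketch.sentences("Welcome to our site. Click the link below.", Locale.ENGLISH) yields two segments, each of which could then be passed to wordCount.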
Example HTML File
Example GXML Markup
GXML is the normalized, Unicode (UTF-8) pivot format used by GlobalSight for all content that is processed by GlobalSight's linguistic technologies (TM, Terminology). GXML is also the format in which TUVs are stored in GlobalSight's TM tables (in-process, PageTM, and SegmentTM).
Unextracted (binary) files do not use GXML at all, and are sent directly by the content repository adapters into CAP.
Additionally, CXE provides adapters that communicate with CAP for:
- secondary target file creation
- dynamic preview handling.
These adapters initiate the import process within CAP, and initiate the export process within CXE after receiving JMS messages from CAP.
CXE Pre- and Post-Processors
Each adapter within CXE has the capability to run plug-in processors for modifying the content or EventFlowXml within CXE. The processors are called either pre-processors or post-processors.
In general, an adapter invokes a pre-processor before doing its intended function, and invokes a post-processor after doing its intended function. For example, the Extractors invoke a pre-processor before extracting a Web format to create GXML. They invoke a post-processor after creating the GXML, but before sending the GXML out in a JMS message.
A processor cannot abort the operation of an adapter, but it can modify the content and information with which the adapter works.
Each adapter within CXE is associated with its own property file where the processor plug-ins can be registered. To create a plug-in, write a Java class that implements the CxeProcessor interface (shown above in Figure 13). The interface has a single method, process(), which gives the processor a chance to manipulate the content, the EventFlowXml, or additional meta-data carried in the CxeMessage object. (JavaDoc for the CxeMessage class and other applicable classes is available separately.)
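The pre- and post-processor pattern can be sketched as follows. The types below are hypothetical stand-ins: the actual signatures of CxeProcessor.process() and CxeMessage are not given in this document, so MessageSketch, ProcessorSketch and AdapterSketch only mirror the behaviour described in the text. Registration is done in code here, whereas the text describes registration through a property file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for CxeMessage: content, EventFlowXml and meta-data. */
class MessageSketch {
    String content;
    String eventFlowXml;
    final Map<String, String> metadata = new HashMap<>();

    MessageSketch(String content, String eventFlowXml) {
        this.content = content;
        this.eventFlowXml = eventFlowXml;
    }
}

/** Hypothetical stand-in for CxeProcessor; the real signature may differ. */
interface ProcessorSketch {
    MessageSketch process(MessageSketch message);
}

/** Hypothetical adapter that runs processors around its own work. */
class AdapterSketch {
    private final List<ProcessorSketch> preProcessors = new ArrayList<>();
    private final List<ProcessorSketch> postProcessors = new ArrayList<>();

    void registerPre(ProcessorSketch processor) {
        preProcessors.add(processor);
    }

    void registerPost(ProcessorSketch processor) {
        postProcessors.add(processor);
    }

    MessageSketch handle(MessageSketch message) {
        // Pre-processors run before the adapter's intended function.
        for (ProcessorSketch p : preProcessors) {
            message = p.process(message);
        }
        message = convert(message);
        // Post-processors run afterwards, before the message is sent on.
        for (ProcessorSketch p : postProcessors) {
            message = p.process(message);
        }
        return message;
    }

    /** Placeholder for the adapter's real work, such as extraction. */
    private MessageSketch convert(MessageSketch message) {
        message.content = "converted(" + message.content + ")";
        return message;
    }
}
```

A pre-processor in this sketch might normalize whitespace in the content, while a post-processor might record in the meta-data that a notification was sent.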
Processors written in the past for specific client environments have performed email generation, content deployment, content manipulation, and integration with external systems.