Hadoop & NoSQL Database
Project
Spring 2016
MARKLOGIC DATABASE
Aashi Rastogi (0997297), Sanket Patel (0999383)
Introduction:
NoSQL ("non-SQL" or "non-relational") databases provide mechanisms for storing and retrieving data
other than the tabular relations used in relational databases. NoSQL databases are widely used today
because of their simple design, their ease of scaling out, and their finer control over availability.
Types of NoSQL Databases:
- Key-value based
- Column oriented
- Graph oriented
- Document based
- Multi-model
A multi-model database is designed to support multiple data models within a single, integrated
backend. MarkLogic is one of the NoSQL databases that use a multi-model design.
What is MarkLogic?
MarkLogic describes itself as an Enterprise NoSQL database. It is optimized for both structured and
unstructured data and lets you store, manage, query, and search across JSON, XML, RDF (triples),
geospatial data, text, and large binaries. MarkLogic can handle data in a schema-agnostic fashion,
and its built-in application server shortens the time to results.
It provides enterprise capabilities such as ACID transactions, high availability and disaster recovery,
and security. MarkLogic is designed to work with Hadoop, helping organizations get more out of their
Hadoop technology. It can also be deployed in the cloud, which removes the need to maintain hardware
and provides the benefits of elasticity. It also has built-in application services and full-text search,
and it can act as a triplestore with inference capabilities, allowing new facts to be discovered from
existing ones.
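The triplestore-with-inference idea can be illustrated with a toy in-memory model. This is a minimal Python sketch, not MarkLogic's API (MarkLogic itself works with triples through SPARQL and inference rulesets): facts are (subject, predicate, object) tuples, and a single hypothetical rule treats the predicate locatedIn as transitive, so new facts are derived from the stored ones.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infer_transitive(triples: Set[Triple], predicate: str) -> Set[Triple]:
    """Return the input triples plus every triple implied by treating
    `predicate` as transitive: (a p b) and (b p c) => (a p c)."""
    closure = set(triples)
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        edges = [(s, o) for (s, p, o) in closure if p == predicate]
        for a, b in edges:
            for b2, c in edges:
                if b == b2 and (a, predicate, c) not in closure:
                    closure.add((a, predicate, c))
                    changed = True
    return closure


# Hypothetical example data.
facts = {
    ("SanJose", "locatedIn", "California"),
    ("California", "locatedIn", "USA"),
}
print(infer_transitive(facts, "locatedIn") - facts)  # {('SanJose', 'locatedIn', 'USA')}
```

A real triplestore would index triples by subject, predicate, and object instead of rescanning them; the loop here only shows what "inference" means.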
How does it work?
MarkLogic uses XML documents as its core data model and stores the documents in a transactional
repository. It indexes the words and values in each loaded document, as well as the document
structure. Because of its Universal Index, MarkLogic requires neither advance knowledge of the
document structure nor complete adherence to a particular schema.
MarkLogic Server clusters on commodity hardware using a shared-nothing architecture and
differentiates itself in the market by supporting massive scale with high performance: customer
deployments have scaled to hundreds of terabytes of source data while maintaining sub-second
query response times.
In addition to XML, MarkLogic can store JSON, text, and binary documents. In releases before
MarkLogic 8, JSON documents were internally transformed to XML for indexing; MarkLogic 8 stores
and indexes JSON natively. Text documents are indexed as if each were an XML text node without a
parent. Binary documents are unindexed by default, with the option to index their metadata and
extracted contents.
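To make "indexing words, values, and document structure" concrete, here is a toy inverted index in Python. It is an illustrative sketch only, not how MarkLogic's Universal Index is implemented: it maps each word to the documents containing it, and also maps (element name, word) pairs to documents, so a search can be restricted to a structural context such as a title element. The document URIs and contents are made up.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Optional, Set


class ToyUniversalIndex:
    """Toy inverted index over words, element names, and (element, word) pairs."""

    def __init__(self) -> None:
        self.words = defaultdict(set)          # word -> URIs
        self.elements = defaultdict(set)       # element name -> URIs
        self.element_words = defaultdict(set)  # (element, word) -> URIs

    def load(self, uri: str, xml_text: str) -> None:
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            self.elements[elem.tag].add(uri)
            # Only the element's own leading text is indexed (tails are ignored).
            for word in re.findall(r"\w+", (elem.text or "").lower()):
                self.words[word].add(uri)
                self.element_words[(elem.tag, word)].add(uri)

    def search(self, word: str, element: Optional[str] = None) -> Set[str]:
        word = word.lower()
        if element is None:
            return set(self.words.get(word, set()))
        return set(self.element_words.get((element, word), set()))


idx = ToyUniversalIndex()
idx.load("/a.xml", "<book><title>NoSQL basics</title><body>MarkLogic indexes</body></book>")
idx.load("/b.xml", "<book><title>MarkLogic guide</title></book>")
print(sorted(idx.search("marklogic")))           # ['/a.xml', '/b.xml']
print(sorted(idx.search("marklogic", "title")))  # ['/b.xml']
```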
Motivations: Data has changed over time, and so has the technology for handling it. In the first,
hierarchical era, data was tied to its application. Then came the relational era, in which data is
stored independently of the application but must fit into tables. But what if the data does not fit?
Then one has to enlarge the tables or chop up the data to make it fit, and that cannot be done for
all data, such as unstructured data. This led to the growth of NoSQL databases.
Characteristics: The following characteristics are discussed:
1. Flexible Data Model
2. Search and Query
3. Semantics
4. Scalability and Elasticity
5. ACID Transactions
6. High Availability and Disaster Recovery
7. Hadoop Integration
8. Bitemporal
1. Flexible Data Model – MarkLogic can natively store and rapidly query JSON, XML, RDF, and
more, providing a single platform for all data. Its document-centric data model is
schema-agnostic, which gives flexibility in modelling data.
Fig: How MarkLogic reads documents (schema-agnostic).
2. Search and Query – MarkLogic indexes data as it is loaded and makes it immediately searchable.
Its Universal Index works like a search engine's index: it keeps track of the words, phrases, and
values in documents, and it also indexes document structure, which provides context for search.
Because data is indexed the way a search engine indexes it, queries are fast, and the indexes make
it possible to run complex queries across multiple data types.
3. Semantics – Semantics is an approach to modelling data that focuses on relationships and
context. It links two entities according to the relationship between them, forming a triple
(subject, predicate, object). Triples combine to form a graph with no hierarchy.
4. Scalability and Elasticity – MarkLogic uses a shared-nothing architecture, meaning each node
has its own memory and disk. It scales horizontally in clusters on commodity hardware to hundreds
of nodes, petabytes of data, and billions of documents while still processing tens of thousands
of transactions per second.
5. ACID Transactions – ACID stands for atomicity, consistency, isolation, and durability.
MarkLogic implements ACID transactions using MVCC (multi-version concurrency control). In
an MVCC system, changes are tracked with a timestamp on each document, and the database
uses these timestamps to ensure that all users see consistent data.
6. High Availability and Disaster Recovery – MarkLogic achieves HA/DR using a shared-nothing
architecture that provides redundancy for failover and high-performance scaling, with no
single point of failure. It can quickly back up selected components or the entire database,
securely over SSL out of the box. It also supports incremental backups, which back up only
the changes since the previous incremental or full backup.
7. Hadoop Integration – Hadoop is popular because it is designed to store large amounts of data
cheaply in the Hadoop Distributed File System (HDFS) and to run large-scale MapReduce jobs for
batch analysis. MarkLogic positions itself as a database for Hadoop because it can run seamlessly
alongside the Hadoop ecosystem, acting as the database that powers real-time, transactional
applications.
8. Bitemporal – Bitemporal data management tracks two time axes for each record: valid time
(when a fact was true in the real world) and system time (when the fact was recorded in the
database). This gives a full and accurate picture of your data at every point in time, which is
particularly useful in regulated industries, and lets you get answers from yesterday's, today's,
and tomorrow's data. You can go back in time to explore data, manage historical data across
systems, ensure data integrity, and perform complex bitemporal analysis.
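The MVCC idea from item 5 (every change gets a timestamp, and a reader sees the data as of one timestamp) can be sketched in a few lines of Python. This is a simplified model for illustration, not MarkLogic's internal design: each write appends a new version stamped by a monotonically increasing clock, a delete appends a tombstone, and a read "as of" timestamp T returns the newest version with a stamp no later than T, so a reader holding an older timestamp keeps a consistent view while writers continue.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple


class ToyMVCCStore:
    """Minimal multi-version store: versions are never overwritten in place."""

    def __init__(self) -> None:
        self._clock = 0
        # uri -> list of (commit timestamp, content); None content = deleted
        self._versions: Dict[str, List[Tuple[int, Optional[str]]]] = {}

    def current_timestamp(self) -> int:
        return self._clock

    def write(self, uri: str, content: Optional[str]) -> int:
        self._clock += 1  # each commit gets a new, larger timestamp
        self._versions.setdefault(uri, []).append((self._clock, content))
        return self._clock

    def delete(self, uri: str) -> int:
        return self.write(uri, None)  # tombstone version

    def read(self, uri: str, as_of: Optional[int] = None) -> Optional[str]:
        ts = self._clock if as_of is None else as_of
        versions = self._versions.get(uri, [])
        stamps = [t for t, _ in versions]
        i = bisect_right(stamps, ts)  # versions with timestamp <= ts
        return versions[i - 1][1] if i else None


store = ToyMVCCStore()
store.write("/doc.json", "v1")
snapshot = store.current_timestamp()
store.write("/doc.json", "v2")
print(store.read("/doc.json", as_of=snapshot))  # v1
print(store.read("/doc.json"))                  # v2
```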
Brief Manual –
Supported Platforms: MarkLogic Server is supported on the following platforms:
- Microsoft Windows Server 2012 (x64), Microsoft Windows Server 2008 (x64),
  Windows 7 and 8 64-bit (x64)*
- Sun Solaris 10 (x64)
- Red Hat Enterprise Linux 7 (x64)** ***
- Red Hat Enterprise Linux 6 (x64)** *** ****
- SUSE Linux Enterprise Server 11 (x64) SP3** ***
- CentOS 6 (x64)** ***
- Amazon Linux 2013.03 (x64)** ***
- Mac OS X 10.8 or 10.9*****
* Microsoft Windows 7 and Windows 8 are supported for development only. If MarkLogic Server
fails to start on Windows with the error 'the application failed to initialize properly
(0xc0150002)', a dependency is missing from your environment, and you need to download and
install the following DLL for 64-bit versions of Windows:
http://www.microsoft.com/downloads/details.aspx?FamilyID=eb4ebe2d-33c0-4a47-9dd4-b9a6d7bd44da&DisplayLang=en.
Additionally, if you get an error on startup saying you need MSVCR100.dll, install the Microsoft
Visual C++ 2010 SP1 Redistributable Package (x64): http://www.microsoft.com/en-us/download/details.aspx?id=13523.
** The deadline I/O scheduler is required on Red Hat Linux platforms. The deadline scheduler is
optimized to ensure efficient disk I/O for multi-threaded processes, and MarkLogic Server can
have many simultaneous threads. For information on the deadline scheduler, see the Red Hat
documentation (for example, http://www.redhat.com/magazine/008jun05/features/schedulers/).
***The redhat-lsb, glibc, and gdb packages are required on Red Hat Linux. Additionally, on 64-bit
Red Hat Linux, both the 32-bit and the 64-bit glibc packages are required.
****Red Hat Linux 6 (x64) is also supported in a VMware ESXi 5.0 (installed on bare metal)
environment.
*****Mac OS X is supported for development only. Conversion (Office and PDF) and entity
enrichment are not available on Mac OS X. Mac OS X 10.8 or 10.9 (Mountain Lion or Mavericks)
on a 64-bit capable processor is required (http://support.apple.com/kb/HT3696).
Installing MarkLogic Server: This section describes the procedure for installing MarkLogic Server on each platform. Perform the
procedure corresponding to the platform on which you are installing.
Windows x64:
1. Shut down and uninstall the previous release of MarkLogic Server (if you are upgrading from
7.0, 6.0, or 5.0, see Upgrading from Release 7.0, 6.0, Or 5.0; if you are upgrading from 8.0-1 or
later, see Removing MarkLogic Server).
2. Download the MarkLogic Server installation package to your desktop. The latest
installation packages are available from http://developer.marklogic.com.
3. Double-click the MarkLogic-8.0-1-amd64.msi icon to start the installer.
If you are installing a release other than 8.0-1, double-click the appropriately
named installer icon.
4. The Welcome page displays. Click Next.
5. Select Typical.
6. Click Install.
7. Click Finish.
Red Hat Linux x64:
1. Shut down and uninstall the previous release of MarkLogic Server (if you are upgrading from
7.0, 6.0, or 5.0, see Upgrading from Release 7.0, 6.0, Or 5.0; if you are upgrading from 8.0-1 or
later, see Removing MarkLogic Server).
2. Download the package to /tmp or another location using your web browser. The latest
installation packages are available from http://developer.marklogic.com.
If you are using Firefox or another browser that is configured to associate rpm files, the browser
will prompt you for the root password (if you are not already running as root), and you can follow
the prompts to complete the installation. When the installation is complete, you can skip the next
step. Otherwise, continue to the next step.
3. As the root user, install the package with the following command:
rpm -i /tmp/MarkLogic-8.0-1.x86_64.rpm
If you are installing a release other than 8.0-1, replace the characters 8.0-1 in the
command above with the appropriate release number.
4. If you are using HDFS, make sure the server is configured to use HDFS with a
Hadoop HDFS client and any needed environment variables set in
the /etc/sysconfig/MarkLogic file. For details, see HDFS Storage in the Query
Performance and Tuning Guide.
Sun Solaris x64:
1. Shut down and uninstall the previous release of MarkLogic Server (see Removing
MarkLogic Server).
2. Download the package to /var/spool/pkg using your web browser. The latest
installation packages are available from http://developer.marklogic.com.
3. Unpack the compressed tar file in /var/spool/pkg with the following shell commands:
% cd /var/spool/pkg
% uncompress MARKlogic-8.0-1-amd64.tar.Z
% tar xf MARKlogic-8.0-1-amd64.tar
% rm MARKlogic-8.0-1-amd64.tar
If you are installing a release other than 8.0-1, replace the characters 8.0-1 in the
commands above with the appropriate release number.
4. As the root user, install the package with the following command:
# pkgadd Marklogic
Mac OS X:
1. Download the MarkLogic Server installation package to your desktop. The latest
installation packages are available from http://developer.marklogic.com.
2. Double-click the MarkLogic-8.0-1-x86_64.dmg icon to open the folder that
contains the MarkLogic-8.0-1-x86_64.pkg installer. Double-click the installer
to start it.
3. The Welcome page displays. Click Continue.
4. In the Select a Destination window, select a destination in which to install MarkLogic
Server, or click Continue to select the default destination.
5. In the Installation Type window, click Install. An Installation window appears
that displays the progress of the installation.
6. When the installation Summary window appears, click Close.
7. A MarkLogic control window appears from which you can start/stop MarkLogic
Server, open the Admin Interface, and view the Error Log.
The following table shows the installation directory (<marklogic-dir>) and the default data
directory for each platform:
Platform        Installation Directory          Default Data Directory (for configuration and log files)
Windows         c:\Program Files\MarkLogic\     c:\Program Files\MarkLogic\Data
Red Hat Linux   /opt/MarkLogic                  /var/opt/MarkLogic
Sun Solaris     /opt/MarkLogic                  /var/opt/MarkLogic
Mac OS X        ~/Library/MarkLogic             ~/Library/Application Support/MarkLogic/Data
The default forest directory is the same as the default data directory if the optional data directory
is not specified during forest creation. On UNIX platforms, if you want MarkLogic Server to use
another location for its default data directory, make your data directory (/var/opt/MarkLogic on
both Linux and Solaris) a soft link to the alternate location.
Starting MarkLogic Server: MarkLogic Server starts automatically when the computer reboots. To start MarkLogic Server
without rebooting, perform the following action for the platform on which you are running:
Windows: Select Start > Programs > MarkLogic Server > Start MarkLogic Server.
When you start MarkLogic Server from the Start menu, the Windows service
configuration for MarkLogic Server is set to start automatically. Also, if you are
using Windows Vista or Windows 7, to start the service you must right-click the
Start MarkLogic Server link in the Start menu and choose Run as Administrator,
then choose to allow the action.
Red Hat Linux: As the root user, enter the following command:
/etc/init.d/MarkLogic start
Sun Solaris: As the root user, enter the following command:
/etc/init.d/MarkLogic start
Mac OS X: Select System Preferences > MarkLogic to open the MarkLogic control window.
Click Start MarkLogic Server.
This starts all of the App Servers that are configured on your MarkLogic Server.
Configuring the First and Subsequent Hosts: The following configuration procedures differ depending on whether you run MarkLogic Server in a
cluster configuration or on a single host. The procedures are as follows:
- Configuring a Single Host or the First Host in a Cluster
- Configuring an Additional Host in a Cluster
- Leaving a Cluster and Becoming a Single Host
If you are configuring MarkLogic Server as a standalone host, or if this is the first host in a cluster
configuration, follow the instructions in Configuring a Single Host or the First Host in a Cluster.
Otherwise, follow the instructions in Configuring an Additional Host in a Cluster.
If you are upgrading a cluster to a new release, see Upgrading a Cluster to a New Maintenance
Release of MarkLogic Server in the Scalability, Availability, and Failover Guide. The security
database and the schemas database must be on the same host, and that host should be the first host
you upgrade when upgrading a cluster.
Configuring a Single Host or the First Host in a Cluster
To configure this installation as a single host, or as the first host in a cluster, perform the following
steps:
1. Install and start MarkLogic as described in Installing MarkLogic Server and Starting
MarkLogic Server.
2. Log in to the Admin Interface in a browser. It is on port 8001 of the host on which MarkLogic is
running (for example, on the localhost, http://localhost:8001). The Server Install page appears.
3. Click OK to continue.
4. Wait for the server to restart.
5. After the server restarts, you will be prompted to join a cluster.
6. Click Skip.
7. You will be prompted to create an admin user. Enter the login name and password for the admin
user.
8. Click OK.
9. You will be prompted to log in with your admin username and password.
You will now see the Admin Interface.
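The same first-host setup can be scripted. The sketch below assumes the MarkLogic 8 REST Management API calls POST /admin/v1/init (initializes the server, which then restarts as in step 4) and POST /admin/v1/instance-admin (creates the admin user from step 7), both on port 8001. The functions only build urllib.request.Request objects; the host name, credentials, empty init body, and the realm value "public" are illustrative assumptions, so check the endpoints and payloads against the Management API reference for your release before sending anything.

```python
import json
import urllib.request


def init_request(host: str, port: int = 8001) -> urllib.request.Request:
    # Equivalent of clicking OK on the Server Install page (steps 2 to 4).
    return urllib.request.Request(
        f"http://{host}:{port}/admin/v1/init",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def instance_admin_request(host: str, user: str, password: str,
                           realm: str = "public", port: int = 8001) -> urllib.request.Request:
    # Equivalent of skipping the cluster join and creating the admin user (steps 6 to 8).
    body = {"admin-username": user, "admin-password": password, "realm": realm}
    return urllib.request.Request(
        f"http://{host}:{port}/admin/v1/instance-admin",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Usage (against a freshly installed server): send `urllib.request.urlopen(init_request("localhost"))`, wait for the restart, then send `instance_admin_request("localhost", "admin", "<password>")`.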
Configuring an Additional Host in a Cluster
All hosts in a cluster must be on the same platform. To configure this installation as an additional
host in a cluster of the same platform, perform the following steps:
1. On the node you want to add to an existing cluster, install and start MarkLogic as
described in Installing MarkLogic Server and Starting MarkLogic Server.
2. Log in to the Admin Interface in a browser. It is on port 8001 of the host on which MarkLogic is
running (for example, on the localhost, http://localhost:8001). The Server Install page appears.
3. Click OK to continue.
4. Wait for the server to restart.
5. After the server restarts, you will be prompted to join a cluster.
6. Enter the DNS name or the IP address of one of the machines in the cluster. For instance, if this is
the second host you are installing, you can enter the DNS name of the first host you installed.
7. Click OK.
8. You will be prompted for an admin username and password. You can use the admin username and
password you created when installing the first host. Click OK.
9. Select a Group to which to assign this host. Click OK.
10. Click OK to confirm that you are joining the cluster.
11. You have now joined the cluster.
12. Click OK to transfer the cluster configuration information.
You have completed the process to join a cluster and will now see the Admin Interface.
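Joining a cluster can also be scripted. Based on our understanding of the MarkLogic 8 REST Management API (treat every endpoint and field name here as an assumption to verify), after initializing the new host the flow is: GET /admin/v1/server-config from the new host, POST that XML together with a group name to /admin/v1/cluster-config on an existing cluster host (with admin credentials, digest authentication), then POST the returned ZIP file to /admin/v1/cluster-config on the new host. The sketch only builds those three requests; the host names and the "Default" group are illustrative.

```python
import urllib.parse
import urllib.request


def server_config_request(joiner: str, port: int = 8001) -> urllib.request.Request:
    # Fetch the new host's server configuration (XML).
    return urllib.request.Request(f"http://{joiner}:{port}/admin/v1/server-config", method="GET")


def cluster_config_request(bootstrap: str, server_config_xml: str,
                           group: str = "Default", port: int = 8001) -> urllib.request.Request:
    # Ask an existing cluster host to accept the new host into `group` (steps 6 to 10).
    form = urllib.parse.urlencode({"group": group, "server-config": server_config_xml})
    return urllib.request.Request(
        f"http://{bootstrap}:{port}/admin/v1/cluster-config",
        data=form.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def apply_cluster_config_request(joiner: str, zip_bytes: bytes,
                                 port: int = 8001) -> urllib.request.Request:
    # Hand the cluster configuration ZIP to the new host (step 12).
    return urllib.request.Request(
        f"http://{joiner}:{port}/admin/v1/cluster-config",
        data=zip_bytes,
        headers={"Content-Type": "application/zip"},
        method="POST",
    )
```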
Leaving a Cluster and Becoming a Single Host
If your host is currently in a cluster of multiple hosts, and you would like to leave the cluster and
switch to a single-host environment, follow the steps in this section.
A host cannot leave a cluster if there are still forests assigned to it or if it has any foreign clusters
associated with it; you must delete all forests assigned to the host and decouple any clusters
associated with the host before it can leave the cluster. However, you can delete only the
configuration for a forest while the forest data remains on the filesystem, allowing you to add the
forest back to the host after changing the configuration. For instructions on adding a forest to a
host, see the Administrator's Guide.
Perform the following steps to leave the cluster to which a host is connected.
1. Run the Admin Interface from the host you want to remove from the cluster.
2. Click the Hosts icon in the left menu tree. The Host Summary page appears.
3. Click the name of the host you want to remove from the cluster, either from the left menu tree or
from the Host Summary page. The Host Configuration page appears.
The Leave button only appears if the Admin Interface is running from this host.
4. Click the Leave button.
5. Click OK to confirm leaving the cluster.
6. The host restarts to load the new configuration.
7. Follow the instructions in sections 'Configuring a Single Host or the First Host in a Cluster' or
'Configuring an Additional Host in a Cluster' as appropriate.
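The preconditions above and the Leave action itself can be sketched in Python. The blocker check follows the rule stated in this section (no assigned forests, no coupled foreign clusters); its inputs are hypothetical lists you would gather yourself. The request assumes the MarkLogic 8 Management API call DELETE /admin/v1/host-config on port 8001 of the leaving host, which should be confirmed in the API reference before use.

```python
import urllib.request
from typing import List


def leave_blockers(assigned_forests: List[str], foreign_clusters: List[str]) -> List[str]:
    """Reasons the host cannot leave its cluster yet; an empty list means it can."""
    reasons = [f"forest still assigned: {name}" for name in assigned_forests]
    reasons += [f"foreign cluster still coupled: {name}" for name in foreign_clusters]
    return reasons


def leave_cluster_request(host: str, port: int = 8001) -> urllib.request.Request:
    # Addressed to the leaving host itself, mirroring the rule that the
    # Leave button only appears when the Admin Interface runs on that host.
    return urllib.request.Request(f"http://{host}:{port}/admin/v1/host-config", method="DELETE")
```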
Entering a License Key
MarkLogic will run without a license key, but after installing MarkLogic you should enter a valid key
for what you are licensed for. At any time, you can change the license key for a host from the Host
Status page. You might need to change the license key if your license key expires, if you need to
use features that are not covered by your existing license key, if you upgrade your hardware
with more CPUs and/or more cores, if you need a license that covers a larger database, if you
require different languages, or for various other reasons. Changing the license key sometimes
results in an automatic restart of MarkLogic (for example, if your new license enables a new
language).
To change the license key for a host, perform the following steps using the Admin Interface:
1. Click the Hosts icon on the left tree menu.
2. Click the name of the host whose license key you want to change, either on the tree
menu or the summary page. The Host Configuration page appears.
3. Click the Status tab. The Host Status page appears.
4. Click the License Key button. The License Key Entry page appears.
5. Enter your new license key information. For information about licensing of MarkLogic Server,
contact your MarkLogic sales representative.
6. After entering valid information in the Licensee and License Key fields, click OK. If necessary,
MarkLogic will automatically restart, and the new license key will take effect.
Checking for the Correct Software Version
After you log in with your admin username and password, the Admin Interface appears. In the left
corner of the Admin Interface, the version number and product edition are displayed.
To view more details about the release of MarkLogic Server that is installed and licensed, complete
the following steps:
1. Click the Hosts icon on the left tree menu.
2. Select the name of the host you just installed, either from the left menu tree or from the Host
Summary page.
3. Click the Status tab. The Host Status page appears.
4. Check that <version> is correct.
To begin using MarkLogic Server, see the following document:
Getting Started With MarkLogic Server
Otherwise, you are finished with the Admin Interface for now. You have successfully installed
MarkLogic on your system.
Configuring MarkLogic Server on UNIX Systems to Run as a Non-daemon User
On UNIX-based systems (Linux and Solaris), MarkLogic runs as the UNIX user
named daemon. This section describes how to change the configuration to run as a
different named UNIX user. This procedure must be run by the root user.
Additionally, the root user is still required for installing and uninstalling MarkLogic
and for starting and stopping MarkLogic from the startup scripts.
To modify an installation to run as a user other than daemon, perform the following
steps:
1. In a command window on the machine on which you installed MarkLogic, log in as
the root user.
2. Make sure MarkLogic is stopped. If it is still running, stop it as follows:
Red Hat Linux: As the root user, enter the following command:
/etc/init.d/MarkLogic stop
Sun Solaris: As the root user, enter the following command:
/etc/init.d/MarkLogic stop
3. Edit the configuration file for your platform using a text editor such as vi:
Red Hat Linux: /etc/sysconfig/MarkLogic
Sun Solaris: /etc/MarkLogic.conf
4. In the file, edit the MARKLOGIC_USER environment variable to name the user
as which you want MarkLogic Server to run. For example, if you want it to run as a
user named raymond, change the following line:
MARKLOGIC_USER=daemon
to the following:
MARKLOGIC_USER=raymond
5. Save the changes to the /etc/sysconfig/MarkLogic or /etc/MarkLogic.conf file.
6. If you have not yet started MarkLogic after performing a clean installation (that is,
after installing into a directory where MarkLogic has never been installed), then you
are done and you can skip the rest of the steps in this procedure. If you have an existing
installation (for example, if you are upgrading to a maintenance release), then
continue with the following steps.
7. For all of the MarkLogic files owned by daemon, you need to change the owner to
the new user. This includes all forest data and all of the configuration files. By
default, the forest data is in the following directories (the default data directory, which
holds the configuration and log files and is also the default forest directory):
Red Hat Linux: /var/opt/MarkLogic
Sun Solaris: /var/opt/MarkLogic
8. For example, on a Linux system, run a command similar to the following, which
changes the owner to the user specified earlier in the /etc/sysconfig/MarkLogic file:
chown -R raymond /var/opt/MarkLogic
9. Make sure to change the owner for all forests in the system; otherwise, forests will
fail to mount upon startup. Note that the above command only changes the owner for
forests installed in the default directory. You need to run a similar command on the data directory
of each forest for which a data directory is specified.
10. When you have completed all the file and directory ownership changes, start MarkLogic as
described in Starting MarkLogic Server.
Once you have performed this procedure, all new files created by MarkLogic are created with the
new user ownership; there is no need to change any ownership again.
The configuration changes you made to the startup scripts need to be merged in during any upgrade
of MarkLogic (because the installation installs a new version of the startup scripts). Under Linux,
the uninstallation process saves an old version of the scripts (for example,
/etc/sysconfig/MarkLogic.rpmsave), so you can use that version to merge in your
changes. If you perform a clean installation (not an upgrade installation), however, you will need
to run this entire procedure again.
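Step 4, and re-applying it when merging an upgraded startup file with the saved .rpmsave copy, is a one-line edit that can be automated. A minimal Python sketch: it rewrites an existing MARKLOGIC_USER=... assignment (optionally preceded by export, which is our assumption about possible file contents) or appends one if none exists, leaving commented-out lines alone. It works on the file text only; reading and writing /etc/sysconfig/MarkLogic as root, and the chown of step 8, are left to the caller.

```python
import re

# An uncommented MARKLOGIC_USER assignment, with or without a leading "export".
_ASSIGNMENT = re.compile(r"^([ \t]*(?:export[ \t]+)?MARKLOGIC_USER=).*$", re.MULTILINE)


def set_marklogic_user(config_text: str, user: str) -> str:
    """Return config_text with MARKLOGIC_USER set to `user`."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_.-]*", user):
        raise ValueError(f"unexpected user name: {user!r}")
    new_text, count = _ASSIGNMENT.subn(lambda m: m.group(1) + user, config_text)
    if count == 0:  # no assignment found: append one on its own line
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f"MARKLOGIC_USER={user}\n"
    return new_text


print(set_marklogic_user("MARKLOGIC_USER=daemon\n", "raymond"), end="")  # MARKLOGIC_USER=raymond
```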
Removing MarkLogic Server
To remove MarkLogic from your system, complete the following steps:
1. Stop MarkLogic by performing the following action for the platform on which you are
running:
Windows: Select Start > Programs > MarkLogic Server > Stop MarkLogic Server.
If you are using Windows Vista or Windows 7, to stop the service you must
right-click the Stop MarkLogic Server link in the Start menu and choose Run as
Administrator, then choose to allow the action.
Red Hat Linux: As the root user, enter the following command:
/etc/init.d/MarkLogic stop
Sun Solaris: As the root user, enter the following command:
/etc/init.d/MarkLogic stop
Mac OS X: Select System Preferences > MarkLogic to open the MarkLogic control window.
Click Stop MarkLogic Server.
2. Once the server is stopped, you can uninstall the MarkLogic package by performing the following
action for the platform on which you are running:
Windows: Use the Add/Remove Programs Control Panel to uninstall MarkLogic.
Red Hat Linux: As the root user, enter the following command:
rpm -e MarkLogic
Sun Solaris: As the root user, enter the following command:
pkgrm Marklogic
Mac OS X: No action is necessary when upgrading.
If you want to remove the user data and do a fresh install, then remove the
following directory:
~/Library/Application Support/MarkLogic/Data
To remove MarkLogic entirely, remove the following directories:
~/Library/MarkLogic
~/Library/Application Support/MarkLogic
~/Library/StartupItems/MarkLogic
~/Library/PreferencePanes/MarkLogic.prefPane
To make Mac OS X completely forget it ever had a MarkLogic installation, run
the following command from a terminal window:
sudo pkgutil --forget com.marklogic.server
3. Using this procedure to remove MarkLogic from your system will not remove user data
(configuration information, XQuery files used by HTTP or XDBC servers, or forest content). This
data is left in place to simplify the software upgrade process. If you wish to remove the user data,
you must do so manually using standard operating system commands.
Database Definition and Manipulation:
Creating a New Database
Perform the following steps to create a new database.
1. Click the Databases icon in the left tree menu.
2. Click the Create tab at the top right. The Create Database page displays:
3. Enter the name of the database. This is the name the system will use to refer to this database.
4. Select a security database to be associated with this database. We recommend selecting Security
as the security database.
5. Select a schema database to be associated with this database.
6. You may leave the rest of the parameters unchanged or set them according to your needs.
7. Click OK. Your database is now created. You can now attach forests to the database. Creating
a database is a “hot” admin task.
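Creating a database can also be scripted against the Management REST API. This sketch assumes MarkLogic 8's POST /manage/v2/databases on port 8002, with a JSON body whose database-name, security-database, and schema-database properties mirror steps 3 to 5; the database name, host, and credentials are illustrative. The Management API expects digest authentication, which the helper sets up using only the Python standard library.

```python
import json
import urllib.request


def create_database_request(host: str, name: str, security_db: str = "Security",
                            schema_db: str = "Schemas", port: int = 8002) -> urllib.request.Request:
    body = {"database-name": name, "security-database": security_db, "schema-database": schema_db}
    return urllib.request.Request(
        f"http://{host}:{port}/manage/v2/databases",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def digest_opener(user: str, password: str, base_url: str) -> urllib.request.OpenerDirector:
    # Opener that answers the server's digest-authentication challenge.
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
```

Usage: `digest_opener("admin", "<password>", "http://localhost:8002/").open(create_database_request("localhost", "my-db"))`. Attaching forests is a separate step, as described next.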
Attaching and/or Detaching Forests to/from a Database
In order to query content in a forest, it must be attached to a database. Forests can be moved from
one database to another (detached from one database and attached to another). Detaching a forest
from a database does not delete the forest; the forest remains on the host on which it was created
with the data intact. However, before you attach the forest to another database, ensure that the
new database has the same configuration as the old database. If the configuration of the new
database is different and the reindexer enable setting is set to true on the new database, the forest
will begin reindexing to match the database configuration as soon as it is attached.
Perform the following steps using the Admin Interface to attach or detach one or more forests to a
database:
1. Click the database to which you want to attach forests.
2. Click the Forests icon for the database. The Database Forest Configuration Page appears.
3. Check the box corresponding to forest(s) you want to attach to the database. You can also
uncheck forests you want to detach from the database.
4. Click OK.
The forests you attached or detached are now reflected in the database configuration.
Attaching and detaching a forest to a database are “hot” admin tasks.
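The reindexing rule for moved forests can be expressed as a small check. A minimal sketch, assuming each database configuration is available as a dictionary of settings; the keys used here (database-name, reindexer-enable, word-positions) are illustrative. Attaching triggers reindexing only when the settings differ and the target database has reindexer enabled. The database name is ignored because it is not an index setting; as a simplification, every other difference is treated as index-relevant.

```python
from typing import Any, Dict

_NOT_INDEX_SETTINGS = {"database-name", "reindexer-enable"}


def will_reindex_on_attach(old_db: Dict[str, Any], new_db: Dict[str, Any]) -> bool:
    """True if a forest moved from old_db to new_db should start reindexing."""
    old_cfg = {k: v for k, v in old_db.items() if k not in _NOT_INDEX_SETTINGS}
    new_cfg = {k: v for k, v in new_db.items() if k not in _NOT_INDEX_SETTINGS}
    return bool(new_db.get("reindexer-enable", False)) and old_cfg != new_cfg


old = {"database-name": "A", "word-positions": False, "reindexer-enable": True}
new = {"database-name": "B", "word-positions": True, "reindexer-enable": True}
print(will_reindex_on_attach(old, new))  # True
```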
Viewing Database Settings
To view the settings for a particular database, perform the following steps:
1. Click the Databases icon on the left tree menu.
2. Locate the database for which you want to view settings, either in the tree menu or in
the Database Summary table.
3. Click the name of the database for which you want to view the settings.
4. View the settings.
5. Click Forests, Triggers, Content Processing, Fragment Roots, Fragment Parents, Element-Word-Query-Throughs,
Phrase-Throughs, Phrase-Arounds, Element Indexes, and Attribute Indexes to view settings specific
to those aspects of the database.
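The same settings can be read programmatically. This sketch assumes the MarkLogic 8 Management API endpoint GET /manage/v2/databases/{name}/properties?format=json on port 8002 (digest authentication required); the host and database name are illustrative. A small helper then lists which boolean settings are switched on in the returned JSON, assuming the properties come back as a flat dictionary of setting names to values.

```python
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def database_properties_request(host: str, database: str, port: int = 8002) -> urllib.request.Request:
    name = urllib.parse.quote(database, safe="")  # escape spaces and slashes in the name
    return urllib.request.Request(
        f"http://{host}:{port}/manage/v2/databases/{name}/properties?format=json",
        headers={"Accept": "application/json"},
        method="GET",
    )


def enabled_flags(properties: Dict[str, Any]) -> List[str]:
    """Names of settings whose value is exactly True, sorted."""
    return sorted(k for k, v in properties.items() if v is True)
```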
Loading Documents into a Database
You can use the Admin Interface to load documents into the database. The documents will
be loaded with the default permissions and added to the default collections of the user with
which you logged into the Admin Interface. To load a set of documents into a database,
perform the following steps:
1. Click the Databases icon on the left tree menu.
2. Click on the database into which you want to load the documents.
3. Click on the Load tab near the top right.
4. Enter the name of the directory in which the documents are located. This directory
must be accessible by the host from which the Admin Interface is currently running.
5. Enter a filter for the names of the documents to be loaded (for example, *.xml to load
all files with an xml extension). For an exact match, enter the full name of the document.
6. Click OK to proceed.
7. The load confirmation screen will list all documents in the specified directory
matching the specified filter. Click OK to complete the load.
The documents are loaded into the database. The URI of each document is the
same as its filesystem path.
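The load procedure (a directory plus a filename filter, with each document's URI equal to its filesystem path) can be sketched outside the Admin Interface. This assumes the MarkLogic REST Client API endpoint PUT /v1/documents, served on port 8000 by the App-Services server in MarkLogic 8, with uri and database request parameters; the host, database name, and content type are illustrative, and authentication (digest by default) is omitted. Converting Windows backslashes to forward slashes in URIs is our own assumption.

```python
import fnmatch
import os
import urllib.parse
import urllib.request
from typing import List


def matching_files(directory: str, pattern: str) -> List[str]:
    """Files directly inside `directory` whose names match `pattern`, e.g. '*.xml'."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if fnmatch.fnmatch(name, pattern) and os.path.isfile(os.path.join(directory, name))
    )


def path_to_uri(path: str) -> str:
    return path.replace("\\", "/")  # URI = filesystem path, with forward slashes


def put_document_request(host: str, uri: str, content: bytes, database: str,
                         content_type: str = "application/xml",
                         port: int = 8000) -> urllib.request.Request:
    query = urllib.parse.urlencode({"uri": uri, "database": database})
    return urllib.request.Request(
        f"http://{host}:{port}/v1/documents?{query}",
        data=content,
        headers={"Content-Type": content_type},
        method="PUT",
    )


def load_directory_requests(host: str, directory: str, pattern: str,
                            database: str) -> List[urllib.request.Request]:
    """One PUT request per matching file, mirroring steps 4 to 7."""
    reqs = []
    for path in matching_files(directory, pattern):
        with open(path, "rb") as f:
            uri = path_to_uri(os.path.abspath(path))
            reqs.append(put_document_request(host, uri, f.read(), database))
    return reqs
```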
Merging a Database
You can merge all of the forest data in the database using the Admin Interface. The
Merge button allows you to explicitly merge the forest data for this database. To
explicitly merge the database, complete the following procedure:
1. Click the Databases icon on the left tree menu.
2. Decide which database you want to merge.
3. Click the database name, either on the tree menu or the summary page. The Database
Configuration page displays.
4. Click the Merge button on the Database Configuration page. A confirmation
message displays.
5. Confirm that you want to merge the forest data in this database and click OK. Merging
data in a database is a “hot” admin task; the changes take effect immediately.
Reindexing a Database
You can reindex all of the document data in the database using the Admin Interface. The
reindex operation sets the reindexer timestamp to the current system timestamp, which
causes a reindex and refragment operation on all fragments in the database whose
timestamp is equal to or less than that timestamp (assuming reindexer enable is set to true).
The Reindex button forces a complete reindex/refragment operation on the database.
To reindex the database, complete the following procedure:
1. Click the Databases icon on the left tree menu.
2. Decide which database you want to reindex.
3. Click the database name, either on the tree menu or the summary page. The Database
Configuration page displays.
4. Click the Reindex button on the Database Configuration page. A confirmation message
displays.
5. Confirm that you want to reindex this database and click OK.
Reindexing data in a database is a “hot” admin task; the changes take effect immediately.
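The timestamp rule above can be modelled in a few lines. This is a conceptual sketch, not MarkLogic's internal code. Fragment is a hypothetical stand-in with an integer timestamp. Clicking Reindex is modelled as moving the reindexer timestamp to the current system timestamp, which selects every fragment stored up to that moment.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical stand-in for a stored fragment and its timestamp."""
    uri: str
    timestamp: int


def fragments_to_reindex(fragments, reindexer_timestamp, reindexer_enable=True):
    """Select fragments whose timestamp is <= the reindexer timestamp.

    Nothing is selected when the reindexer is disabled.
    """
    if not reindexer_enable:
        return []
    return [f for f in fragments if f.timestamp <= reindexer_timestamp]


def click_reindex(fragments, current_system_timestamp, reindexer_enable=True):
    """Model the Reindex button: the reindexer timestamp becomes 'now',
    so every fragment stored up to this moment is selected."""
    return fragments_to_reindex(fragments, current_system_timestamp, reindexer_enable)
```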
Clearing a Database
You can clear all of the forest content from the database using the Admin Interface.
Clearing a database deletes all of the content from all of the forests in the database, but
leaves the database configuration intact.
To clear all data from a database, complete the following procedure:
1. Click the Databases icon on the left tree menu.
2. Decide which database you want to clear.
3. Click the database name, either on the tree menu or the summary page. The Database
Configuration page displays.
4. Click the Clear button on the Database Configuration page. A confirmation message
displays.
5. Confirm that you want to clear the forest data from this database and click OK.
Clearing a database is a “hot” admin task; the changes take effect immediately.
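Merge, reindex and clear can also be triggered without the Admin Interface. The sketch below builds, but does not send, a request to the Management REST API. It assumes the API's default port 8002 and the POST /manage/v2/databases/{name} endpoint, with a JSON body that names the operation. The operation names merge-database, reindex-database and clear-database are believed to match that API, but check them against the Management API reference for your MarkLogic version. The helper name and the short action keys are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Short action keys (hypothetical) mapped to Management API operation names.
OPERATIONS = {
    "merge": "merge-database",
    "reindex": "reindex-database",
    "clear": "clear-database",
}


def build_database_operation(database, action, host="localhost", port=8002):
    """Build (but do not send) a POST that runs one hot admin operation."""
    if action not in OPERATIONS:
        raise ValueError(f"unsupported action: {action!r}")
    body = json.dumps({"operation": OPERATIONS[action]}).encode("utf-8")
    return Request(
        f"http://{host}:{port}/manage/v2/databases/{quote(database)}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

To send one of these requests, open it with an opener created by urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(...)), using credentials for a user with the admin role. Clear is destructive, so a real script should ask for confirmation first, as the Admin Interface does.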
Deleting a Database
A database cannot be deleted if there are any HTTP, WebDAV, or XDBC servers that
refer to the database. Deleting a database detaches the forests that are attached to it, but
does not delete them. The forests remain on the hosts on which they were created with the
data intact. Perform the following steps to delete a database:
1. Click the Databases icon on the left tree menu.
2. Locate the database you want to delete, either in the tree menu or in the Database
Summary table.
3. Click the name of the database which you want to delete.
4. Click on the Delete button near the top right.
Note: Clicking the Clear button clears all of the forests attached to this database, removing
all of the data from the forests. Clicking the Delete button removes the database
configuration, but does not delete the data stored in the forests.
5. Assuming that there are not any HTTP, WebDAV, or XDBC servers referring to the
database, a delete confirmation screen appears. Click OK.
The database is now permanently deleted. Deleting a database is a “hot” admin task.
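The difference between Clear and Delete described in the note above can be captured in a small model. Forest, Database and AppServer are hypothetical classes, not MarkLogic objects. Clearing empties every attached forest. Deleting refuses to proceed while an HTTP, WebDAV or XDBC server still refers to the database; otherwise it returns the detached forests with their documents intact.

```python
from dataclasses import dataclass, field


@dataclass
class Forest:
    name: str
    documents: list = field(default_factory=list)


@dataclass
class Database:
    name: str
    forests: list = field(default_factory=list)


@dataclass
class AppServer:
    name: str
    kind: str      # "HTTP", "WebDAV" or "XDBC"
    database: str  # name of the database this server refers to


def clear_database(db):
    """Clear: empty every attached forest but keep the database configuration."""
    for forest in db.forests:
        forest.documents.clear()


def delete_database(db, app_servers):
    """Delete: refuse while any App Server refers to the database; otherwise
    drop the configuration and return the detached forests, data intact."""
    blockers = sorted(s.name for s in app_servers if s.database == db.name)
    if blockers:
        raise RuntimeError(f"{db.name} is still used by: {', '.join(blockers)}")
    detached, db.forests = db.forests, []
    return detached
```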
Application: Oscar Search Application using Application Builder and its Snapshots
Overview of Application Builder
Using Application Builder requires no coding on your part. Its user interface is easy to use, and the
applications it generates can include many high-end search features, such as a search box with
Google-style search grammar, search suggestions, faceted navigation, and results visualization
widgets. The generated applications are designed to scale to very large databases while remaining fast.
The generated application uses the Search API and can be used as is or customized with your
own code. You can define many aspects of an application, such as:
Facets
Details appearing on the search result page
Content display control via item rendering
Visualization widgets for search results
Typically, building an application is an iterative process. To begin, you must have a
representative content set loaded in a database with any needed indexes already set up. If your
content is not complete or not completely indexed, you can still generate an application and
modify it as you modify your content.
Setting Up and Starting Application Services
Application Builder is bundled with MarkLogic Server Application Services. On a fresh
installation of MarkLogic 8, Application Builder is preconfigured and ready to use. For an
upgrade installation, your existing application data remains intact although some renaming of
your Application database and App Server may occur during the installation process. This
section describes the following scenarios:
Clean Installation
Starting Application Services
Clean Installation
When you install MarkLogic Server for the first time, the installation process does the following:
Creates an HTTP App Server named App-Services on port 8000 for Application Services
Creates a database named App-Services to store the Application Builder application documents
Starting Application Services
To start Application Services, open a browser and go to your server's port 8000. For example, if
your browser runs on the same machine as MarkLogic Server, open the following URL:
http://localhost:8000/appservices
When MarkLogic Server prompts you for a username and password, enter them for a user with
either the admin or app-builder role.
Building the Oscars Application
Application Builder includes a template to build a sample application based on Oscar awards
data from Wikipedia. To build this application, go through the Application Builder wizard as
follows:
1. Start Application Builder by going to the following URL (if MarkLogic Server is installed on a
different host or your App Server uses a different port, substitute those values):
http://localhost:8000/appservices
2. On the Application Builder Applications screen, click New Example Application.
3. The New Example Application screen appears.
a. Enter a name for the application, in this case Oscars.
b. Select New Database and enter a database name, in this case Oscars.
c. Click Create Application.
4. Application Builder creates an Oscars App Server, forest, and database and then displays the
Search tab page.
5. On the Search page, you can accept the default search constraints and facets or change the
settings.
6. Click the next button at the upper right to go to the Assemble tab page.
7. On the Assemble page, you can accept the default widgets and layout or change the settings.
8. Click the next button at the upper right to go to the Results tab page.
9. On the Results page, you can accept the defaults for the contents of an individual search result or
change the settings.
10. Click the next button to move to the Sort tab page.
11. On the Sort page, you can accept the default search results ordering(s) or change the settings.
12. Click the next button to go to the Content tab page. On the Content page, you can control how
the application renders content as XHTML for web browsers.
13. Click the next button to move to the Appearance tab page. On the Appearance page, you can
specify your application's title and overall look and feel.
14. Click the next button to go to the Deploy tab page.
15. On the Deploy page, select New App Server. (You can only select Existing App Server if an App
Server is already configured for this application.) Accept the default values or provide the App
Server with a name and port number.
16. Click the Deploy button and confirm.
a. Application Builder creates and configures the new App Server and opens a new window where
it launches the new application. This may take a short while. When Application Builder prompts
you to log in, enter a username and password and click OK.
b. You can test the Oscars application by entering search terms or clicking on the browse links to
narrow the displayed results.
Loading the Complete Set of Oscars Data
Initially, Application Builder loads only a few sample data files for use by the Oscars application.
To load the full 20 MB content set, use the following steps:
1. In the Information Studio Flows section of the Application Services page, click New Flow.
The Flow Editor appears.
2. Click Change Collector at the bottom of the Collect section.
3. In the Select A Collector window, select Oscars Example Data Loader.
4. In the Configure Settings window, select Done. Do not make any changes in this window.
5. The Collect section of the Flow now shows that the Oscars Example Data Loader is configured
and the URL from which it will download the data.
6. In the Load section, select oscars as the Destination Database and click Document Settings.
7. In the Document Settings window, change the URI Structure Configuration to:
/oscars/{$filename}{$dot-ext}
Click Done.
8. Click Start Loading to load the content into the database.
The data downloads automatically over your Internet connection, and a spinning icon appears
until the load is complete. When the load finishes, the application shows updated counts and
additional facet values.
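The URI Structure Configuration in step 7 is a template. The sketch below assumes that {$filename} expands to the source file name without its extension and that {$dot-ext} expands to the extension including the dot. That is an interpretation of the placeholders, not documented behaviour. The function name and the file names in the example are hypothetical.

```python
import os


def expand_uri_template(template, source_path):
    """Fill {$filename} and {$dot-ext} from the source file name.

    Assumption: {$filename} is the base name without its extension and
    {$dot-ext} is the extension including its leading dot ("" if none).
    """
    stem, dot_ext = os.path.splitext(os.path.basename(source_path))
    return template.replace("{$filename}", stem).replace("{$dot-ext}", dot_ext)
```

Under this reading, a downloaded file named best-picture-1975.xml would be stored at /oscars/best-picture-1975.xml.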
Using the Oscars Sample Application
The Oscars sample application enables you to search, browse, and display articles about Oscar
award winners from the last nine decades. It uses the Search API's standard features, including
query text parsing, faceted navigation, snippeting, and more.
You can learn about the application by experimenting with it; this section highlights some of its
main features:
Keyword Searching, Search Suggestions, and Parsing
Browsing with Facets
Search Result Page
Displaying Content Details
Keyword Searching, Search Suggestions, and Parsing
You can enter keywords into the search box and press return to search the database. For example,
a search for raymond shows snippets for the first 10 of 37 results.
You can also search using constraints. For example, the following query text finds everything
about the actor Dustin Hoffman:
actor:"Dustin Hoffman"
This is not a standard full-text search; it is a constraint that returns all documents in which a
particular element in the source XML, <actor>, has the value Dustin Hoffman. You can
combine the constraint with other terms to further narrow the results:
buck actor:"Dustin Hoffman"
When you click any link in the user interface, the search box updates to show the query text
for the current query.
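The query text grammar can be illustrated with a toy parser. This sketch only separates plain terms and quoted phrases from name:value constraints such as actor:"Dustin Hoffman". The real Search API grammar also supports operators such as AND and OR, which this sketch ignores. The function name and regular expression are assumptions made for illustration.

```python
import re

# Alternatives, tried in order: name:"quoted value", name:value,
# "quoted phrase", plain term.
TOKEN = re.compile(r'(\w+):"([^"]*)"|(\w+):(\S+)|"([^"]*)"|(\S+)')


def parse_query_text(text):
    """Split query text into (terms, constraints), where each constraint
    is a (name, value) pair."""
    terms, constraints = [], []
    for m in TOKEN.finditer(text):
        if m.group(1):
            constraints.append((m.group(1), m.group(2)))
        elif m.group(3):
            constraints.append((m.group(3), m.group(4)))
        elif m.group(5) is not None:
            terms.append(m.group(5))
        else:
            terms.append(m.group(6))
    return terms, constraints
```

For example, buck actor:"Dustin Hoffman" parses into the term buck plus the constraint ("actor", "Dustin Hoffman"). The application combines the two to narrow the results.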
Browsing with Facets
The left side of the application shows facets used to browse through the content. When you click
on a facet, it narrows the overall search results to those results in the facet's category, while
keeping the existing categories or search terms active.
For example, if you first click on the Award:Best Director browse link, then on the
Decade:1970s link, and then on the Winners:True link, your results are all of the 1970s winners
of the Best Director award.
Each facet value shows a count of how many results match your current query.
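The facet behaviour above can be sketched over in-memory records: each click narrows the results, earlier selections stay active, and every facet value carries a count. The records and field names (award, decade, winner) are hypothetical and stand in for the indexed Oscars documents. The real application computes its facets through the Search API rather than by scanning records like this.

```python
from collections import Counter


def apply_facets(docs, selected):
    """Keep the documents that match every selected facet value."""
    return [d for d in docs if all(d.get(name) == value for name, value in selected.items())]


def facet_counts(docs, name):
    """Count how many of `docs` fall under each value of facet `name`."""
    return Counter(d[name] for d in docs if name in d)


# Hypothetical records, not the real Oscars data.
SAMPLE_DOCS = [
    {"award": "Best Director", "decade": "1970s", "winner": True},
    {"award": "Best Director", "decade": "1970s", "winner": False},
    {"award": "Best Director", "decade": "1980s", "winner": True},
    {"award": "Best Actor", "decade": "1970s", "winner": True},
]

# Each click adds one more facet; earlier selections remain active.
selected = {}
for name, value in [("award", "Best Director"), ("decade", "1970s"), ("winner", True)]:
    selected[name] = value
    results = apply_facets(SAMPLE_DOCS, selected)
```

After the three clicks, only the 1970s Best Director winner remains. This mirrors the Award, Decade and Winners sequence described above.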
Search Result Page
The search result page shows a link with a text summary of the content, highlighted snippets of
the content matching your search, and other information about the search match. Clicking the
result link takes you to the content details.
Displaying Content Details
The content details page includes the complete content for the search result. The rendering is
based on the configuration specified on Application Builder's Content Display page. The page's
style is based on the skin you chose and on any custom CSS entered on the Appearance page.
Using Application Builder to Modify the Oscars Sample Application
This section describes how to add a year facet to the Oscars application. With the year facet, you
can first drill down on results with the decade facet, then drill down further using the year facet.
The year facet uses the same index as the decade facet.
To create a year facet, do the following:
1. Start Application Builder (for example, open http://localhost:8000/appservices in a browser).
2. On the Application Builder Applications page, click the application name that you used for your
Oscars sample application (in this case, Oscars).
3. Click the Search tab. At the bottom of the Search page, click Add New.
4. In the New Constraint dialog, click Range.
a. In the New Constraint dialog, enter year for the Name and select year for the Source Index.
b. Click Create Range Constraint. Application Builder creates the constraint.
5. From the Oscars application name pull-down menu, click Deploy Now.
6. Application Builder compiles and deploys the new application code to your App Server's
modules database. During deployment, a status page appears in a new window.
When Application Builder is done, the newly modified application replaces the status page,
including the new year facet. Test the facet by doing a search, selecting a decade, and then
selecting a year to find the results for a single year from that decade.
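Because the year facet reuses the index behind the decade facet, both can be thought of as two views of the same year values. The decade facet buckets those values, and the year facet lists them individually. The sketch below assumes exactly that relationship, with hypothetical year values standing in for the index contents. It shows how to drill down from a decade to its years.

```python
from collections import Counter


def decade_of(year):
    """Bucket a year into its decade label, e.g. 1975 -> "1970s"."""
    return f"{year // 10 * 10}s"


def decade_facet(years):
    """Decade facet: counts per decade bucket over the year values."""
    return Counter(decade_of(y) for y in years)


def year_facet(years, decade=None):
    """Year facet over the same values, optionally limited to one decade."""
    return Counter(y for y in years if decade is None or decade_of(y) == decade)
```

With the hypothetical values [1972, 1975, 1975, 1981], selecting the 1970s decade leaves the years 1972 and 1975 to choose from.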
Conclusion: MarkLogic is designed to handle the volume, variety, and velocity of Big Data like other
NoSQL solutions, while also providing the enterprise features that made last-generation relational
databases so reliable. MarkLogic can also manage hierarchical content, distributed graph data,
XML, and RDF in the same database.
MarkLogic has been used to accelerate the deployment of products and services while reducing
the costs of content loading and design. In research and healthcare settings, this can translate into
faster research cycles and clinical diagnoses, because a new generation of solutions helps
professionals find exactly the information they need, when they need it most.