Sponsored By

A guide on how to install the free version of the data warehousing system / tool "Pentaho Community Edition". Data warehouses are for making your (e.g. game) data useful, persistent, and consolidated as well as merge data from multiple sources.

Jan Pontzen, Blogger

October 13, 2015

27 Min Read

My job at my current employer is to set up and maintain a data warehouse, which is a very exciting task and challenge for me. What I found rather unpleasant was the installation of the data processing tool we are using, which is Pentaho Community Edition (v5.4).

"But wait this has nothing to do with games!?" Actually, it does. Games are software and any software can gather and process data on how it's being used. If you can track data, why shouldn't you? You actually get quite a lot of interesting insights on your users and, with putting some work into it, can acquire pretty interesting insights on what is awesome about your product and what sucks! :)

Installing Pentaho Community Edition (v5.4) is pretty painful. There is an official guide, however that one is for the Pentaho Enterprise Edition, which costs a few thousand $/€ per month. Community Edition is free, but there's no good documentation on it - Pentaho obviously has in interest in you as a customer not bothering with the free edition and license the paid-for version.

Ok, so here's the installation guide from my experiences:

  • For Mac or Linux installations, create the user "pentaho" to increase Pentaho performance.

  • Prepare your environment - download and install

    • Java v7.0 (Pentaho is Java-based),

    • PostgreSQL (it's generally considered the better SQL database for data warehouse related data tasks) - make sure you set your collation to en_US if you are not from the United States - and

    • a SQL GUI / query tool (personally, I use HeidiSQL as it supports both mySQL and PostgreSQL databases.

  • Download and install unpack

  • Set the environment variable (WindowsLinux) PENTAHO_JAVA_HOME to your Java installation directory.

  • Initialize the PostgreSQL BA Repository Databases via running the followingn SQL scripts in your SQL GUI tool:

  1. Create the hibernate database

--
-- note: this script assumes pg_hba.conf is configured correctly
--

-- \connect postgres postgres

drop database if exists hibernate;
drop user if exists hibuser;

CREATE USER hibuser PASSWORD 'password';

CREATE DATABASE hibernate WITH OWNER = hibuser ENCODING = 'UTF8' TABLESPACE = pg_default TEMPLATE template0 LC_COLLATE 'English_United States.1252' LC_CTYPE 'English_United States.1252';

GRANT ALL PRIVILEGES ON DATABASE hibernate to hibuser;

2) Create the quartz database

-- note: this script assumes pg_hba.conf is configured correctly
--

-- \connect postgres postgres

drop database if exists quartz;
drop user if exists pentaho_user;

CREATE USER pentaho_user PASSWORD 'password';

CREATE DATABASE quartz WITH OWNER = pentaho_user ENCODING = 'UTF8' TABLESPACE = pg_default TEMPLATE template0 LC_COLLATE 'English_United States.1252' LC_CTYPE 'English_United States.1252';

GRANT ALL ON DATABASE quartz to pentaho_user;

3) Switch to user "pentaho_user" (e.g. in your database GUI tool)

4) Create the quartz database #2

drop table if exists qrtz5_job_listeners;
drop table if exists qrtz5_trigger_listeners;
drop table if exists qrtz5_fired_triggers;
drop table if exists qrtz5_paused_trigger_grps;
drop table if exists qrtz5_scheduler_state;
drop table if exists qrtz5_locks;
drop table if exists qrtz5_simple_triggers;
drop table if exists qrtz5_cron_triggers;
drop table if exists qrtz5_blob_triggers;
drop table if exists qrtz5_triggers;
drop table if exists qrtz5_job_details;
drop table if exists qrtz5_calendars;

CREATE TABLE qrtz5_job_details
(
JOB_NAME VARCHAR(200) NOT NULL,
JOB_GROUP VARCHAR(200) NOT NULL,
DESCRIPTION VARCHAR(250) NULL,
JOB_CLASS_NAME VARCHAR(250) NOT NULL, 
IS_DURABLE BOOL NOT NULL,
IS_VOLATILE BOOL NOT NULL,
IS_STATEFUL BOOL NOT NULL,
REQUESTS_RECOVERY BOOL NOT NULL,
JOB_DATA BYTEA NULL,
PRIMARY KEY (JOB_NAME,JOB_GROUP)
);

CREATE TABLE qrtz5_job_listeners
(
JOB_NAME VARCHAR(200) NOT NULL, 
JOB_GROUP VARCHAR(200) NOT NULL,
JOB_LISTENER VARCHAR(200) NOT NULL,
PRIMARY KEY (JOB_NAME,JOB_GROUP,JOB_LISTENER),
FOREIGN KEY (JOB_NAME,JOB_GROUP) 
REFERENCES qrtz5_JOB_DETAILS(JOB_NAME,JOB_GROUP) 
);

CREATE TABLE qrtz5_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
JOB_NAME VARCHAR(200) NOT NULL, 
JOB_GROUP VARCHAR(200) NOT NULL,
IS_VOLATILE BOOL NOT NULL,
DESCRIPTION VARCHAR(250) NULL,
NEXT_FIRE_TIME BIGINT NULL,
PREV_FIRE_TIME BIGINT NULL,
PRIORITY INTEGER NULL,
TRIGGER_STATE VARCHAR(16) NOT NULL,
TRIGGER_TYPE VARCHAR(8) NOT NULL,
START_TIME BIGINT NOT NULL,
END_TIME BIGINT NULL,
CALENDAR_NAME VARCHAR(200) NULL,
MISFIRE_INSTR SMALLINT NULL,
JOB_DATA BYTEA NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (JOB_NAME,JOB_GROUP) 
REFERENCES qrtz5_JOB_DETAILS(JOB_NAME,JOB_GROUP) 
);

CREATE TABLE qrtz5_simple_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
REPEAT_COUNT BIGINT NOT NULL,
REPEAT_INTERVAL BIGINT NOT NULL,
TIMES_TRIGGERED BIGINT NOT NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP) 
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE qrtz5_cron_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
CRON_EXPRESSION VARCHAR(120) NOT NULL,
TIME_ZONE_ID VARCHAR(80),
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP) 
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE qrtz5_blob_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
BLOB_DATA BYTEA NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP) 
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE qrtz5_trigger_listeners
(
TRIGGER_NAME VARCHAR(200) NOT NULL, 
TRIGGER_GROUP VARCHAR(200) NOT NULL,
TRIGGER_LISTENER VARCHAR(200) NOT NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP,TRIGGER_LISTENER),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP) 
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);


CREATE TABLE qrtz5_calendars
(
CALENDAR_NAME VARCHAR(200) NOT NULL, 
CALENDAR BYTEA NOT NULL,
PRIMARY KEY (CALENDAR_NAME)
);


CREATE TABLE qrtz5_paused_trigger_grps
(
TRIGGER_GROUP VARCHAR(200) NOT NULL, 
PRIMARY KEY (TRIGGER_GROUP)
);

CREATE TABLE qrtz5_fired_triggers 
(
ENTRY_ID VARCHAR(95) NOT NULL,
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
IS_VOLATILE BOOL NOT NULL,
INSTANCE_NAME VARCHAR(200) NOT NULL,
FIRED_TIME BIGINT NOT NULL,
PRIORITY INTEGER NOT NULL,
STATE VARCHAR(16) NOT NULL,
JOB_NAME VARCHAR(200) NULL,
JOB_GROUP VARCHAR(200) NULL,
IS_STATEFUL BOOL NULL,
REQUESTS_RECOVERY BOOL NULL,
PRIMARY KEY (ENTRY_ID)
);

CREATE TABLE qrtz5_scheduler_state 
(
INSTANCE_NAME VARCHAR(200) NOT NULL,
LAST_CHECKIN_TIME BIGINT NOT NULL,
CHECKIN_INTERVAL BIGINT NOT NULL,
PRIMARY KEY (INSTANCE_NAME)
);

CREATE TABLE qrtz5_locks
(
LOCK_NAME VARCHAR(40) NOT NULL, 
PRIMARY KEY (LOCK_NAME)
);

INSERT INTO qrtz5_locks values('TRIGGER_ACCESS');
INSERT INTO qrtz5_locks values('JOB_ACCESS');
INSERT INTO qrtz5_locks values('CALENDAR_ACCESS');
INSERT INTO qrtz5_locks values('STATE_ACCESS');
INSERT INTO qrtz5_locks values('MISFIRE_ACCESS');

ALTER TABLE qrtz5_job_listeners OWNER TO pentaho_user;
ALTER TABLE qrtz5_trigger_listeners OWNER TO pentaho_user;
ALTER TABLE qrtz5_fired_triggers OWNER TO pentaho_user;
ALTER TABLE qrtz5_paused_trigger_grps OWNER TO pentaho_user;
ALTER TABLE qrtz5_scheduler_state OWNER TO pentaho_user;
ALTER TABLE qrtz5_locks OWNER TO pentaho_user;
ALTER TABLE qrtz5_simple_triggers OWNER TO pentaho_user;
ALTER TABLE qrtz5_cron_triggers OWNER TO pentaho_user;
ALTER TABLE qrtz5_blob_triggers OWNER TO pentaho_user;
ALTER TABLE qrtz5_triggers OWNER TO pentaho_user;
ALTER TABLE qrtz5_job_details OWNER TO pentaho_user;
ALTER TABLE qrtz5_calendars OWNER TO pentaho_user;

CREATE TABLE "QRTZ"
(
NAME VARCHAR(200) NOT NULL,
PRIMARY KEY (NAME)
);

5) Create the jackrabbit database

--
-- note: this script assumes pg_hba.conf is configured correctly
--

-- \connect postgres postgres

drop database if exists jackrabbit;
drop user if exists jcr_user;

CREATE USER jcr_user PASSWORD 'password';

CREATE DATABASE jackrabbit WITH OWNER = jcr_user ENCODING = 'UTF8' TABLESPACE = pg_default TEMPLATE template0 LC_COLLATE 'English_United States.1252' LC_CTYPE 'English_United States.1252';

GRANT ALL PRIVILEGES ON DATABASE jackrabbit to jcr_user;

  • Next, it's time to modify quite a few property / settings files. Follow the instructions outlined on the official Pentaho documentation, skipping the Pentaho Operations Mart stuff (Enterprise Edition only) and the JBoss stuff (Pentaho CE comes pre-installed with a TomCat application server when you unpack the BA server file).

  • Make sure you've got the most recent JDBC driver for PostgreSQL is in the following folders:

    • pentaho/server/biserver-ce/tomcat/lib

    • pentaho/design-tools/data-integration/lib

  • In your biserver-ce/tomcat/webapps/pentaho/META-INF/context.xml file, comment in / paste

<ResourcevalidationQuery="select 1"url="jdbc:postgresql://localhost:5432/hibernate"driverClassName="org.postgresql.Driver"password="password"username="hibuser"maxWait="10000"maxIdle="5"maxActive="20"factory="org.apache.commons.dbcp.BasicDataSourceFactory"type="javax.sql.DataSource"auth="Container"name="jdbc/Hibernate"/>

<ResourcevalidationQuery="select 1"url="jdbc:postgresql://localhost:5432/hibernate"driverClassName="org.postgresql.Driver"password="password"username="hibuser"maxWait="10000"maxIdle="5"maxActive="20"factory="org.apache.commons.dbcp.BasicDataSourceFactory"type="javax.sql.DataSource"auth="Container"name="jdbc/Audit"/>

<Resource validationQuery="select 1" url="jdbc:postgresql://localhost:5432/quartz"driverClassName="org.postgresql.Driver" password="password" username="pentaho_user"maxWait="10000" maxIdle="5" maxActive="20"factory="org.apache.commons.dbcp.BasicDataSourceFactory" type="javax.sql.DataSource"auth="Container" name="jdbc/Quartz"/>

  • Comment out the remainder.

  • Start the server via running the start-pentaho.bat (Windows) or start-pentaho.sh (Linux) file.

  • Open http://localhost:8080/pentaho in your browser and log in.

    •  if the login fails / page is not loaded, run stop-pentaho.bat / stop-pentaho.sh

    • next, clear your Pentaho CE cache 

    • delete the folders  

      • ...\pentaho-solutions\system\jackrabbit\repository

      • ...\tomcat\work\Catalina

    •  delete the file 

      • \tomcat\conf\Catalina\{host}\pentaho.xml

    • delete all files in the folders 

      • ...\pentaho-solutions\system\osgi\cache

      • ...\tomcat\temp

    • run start-pentaho.bat / start-pentaho.sh 

    • if the web interface of the Pentaho BA server looks strange, clear your browser's cache / history, then reload

That should get things running for you. I hope this helps, happy data warehousing! :)

Read more about:

Blogs

About the Author(s)

Daily news, dev blogs, and stories from Game Developer straight to your inbox

You May Also Like