Java, Python, Zope and Indexing
Having Your Cake and Eating It
Chris Withers
[email protected]
http://www.zope.org/Members/chrisw
http://zope.nipltd.com/
Overview
• Java and Python Integration
• Indexing
– ZCatalog
– Lucene
New Information Paradigms
(NIP)
• In Business 12 years
• Specialise in Knowledge & Content Management
• Customers include:
– Most large Pharmaceutical companies
– London Stock Exchange
– Reader's Digest
NIP’s Technologies
• Wide range of skills including:
– Zope Consulting & Hosting
– J2EE and Oracle
– Lotus Notes
• Operating Systems:
– Windows
– Solaris
– Linux
Contacting NIP
• http://www.nipltd.com
– For an overview
• http://zope.nipltd.com
– For Zope specific stuff
• [email protected]
– To contact by email
Java and Python
• Why use Java?
– It’s overly verbose
– Not very dynamic
– Painful Exception Handling
– “Too” object oriented
• But…
Java and Python
• Why use Java?
– Quicker Execution
– Very Popular Language
• More libraries
• More robust
– more testers
• Better documentation
– more authors around
– Politically acceptable
But I want to use Python!
…so find a way to use Java and Python in the same
environment.
• So what are the options?
– Jython / JPython
– Web Services
…and other loose couplings
– Java Python Environment
Jython / JPython
• Python implemented in Java instead of C
+ Very politically acceptable
– Can’t use C extensions to Python
– Not the “main branch” of Python development
• Status?
Loose Couplings
• Web Services
• Shared Files
• Low-level socket protocols
+ No restrictions on versions of languages or extensions
to languages used.
+ Easy to distribute applications over several machines
– A lot more work for the developer
– Inefficient communication between virtual machines
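To make the trade-offs of a loose coupling concrete, here is a minimal sketch of a low-level socket protocol: newline-delimited JSON requests and replies. It is written entirely in Python for brevity, whereas in practice one end would be Java. The message format, the `upper` operation and the names `handle_request`, `serve`, `call` and `demo` are all hypothetical, invented for this illustration.

```python
import json
import socket
import threading


def handle_request(request):
    """Hypothetical service: upper-case the text it is sent."""
    if request.get("op") == "upper":
        return {"ok": True, "result": request["text"].upper()}
    return {"ok": False, "error": "unknown op"}


def serve(conn):
    """Answer newline-delimited JSON requests until the peer closes."""
    with conn, \
            conn.makefile("r", encoding="utf-8") as reader, \
            conn.makefile("w", encoding="utf-8") as writer:
        for line in reader:
            reply = handle_request(json.loads(line))
            writer.write(json.dumps(reply) + "\n")
            writer.flush()


def call(reader, writer, request):
    """Send one request and block until its reply line arrives."""
    writer.write(json.dumps(request) + "\n")
    writer.flush()
    return json.loads(reader.readline())


def demo():
    # socketpair() stands in for a real TCP connection between two processes.
    client_sock, server_sock = socket.socketpair()
    worker = threading.Thread(target=serve, args=(server_sock,))
    worker.start()
    with client_sock, \
            client_sock.makefile("r", encoding="utf-8") as reader, \
            client_sock.makefile("w", encoding="utf-8") as writer:
        good = call(reader, writer, {"op": "upper", "text": "hello"})
        bad = call(reader, writer, {"op": "reverse", "text": "abc"})
    # Leaving the with-block closes the client end, so the server sees EOF.
    worker.join()
    return good, bad
```

Even this toy shows both minuses from the list: every call has to be serialised, sent and parsed by hand, and each one costs a round trip between the two sides.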
Java Python Environment (JPE)
• Low-level bridge between a Java virtual machine
and a Python virtual machine
+ Use almost any Java library from Python
+ Use almost any Python library from Java
+ Very Transparent
– Difficult to Build, Install and find out about
How does JPE work?
[Diagram: Java Virtual Machine, JPE, Python Virtual Machine]
• Bridge written mainly in Python and Java
• C extension to Python (wrapped in Python package)
• C extension to Java (wrapped in Java package)
So let’s see it in action…
• Using Java from Python
• Using Python from Java
Using Java from Python
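# Start the embedded Java VM once, via JPE's "java" package
import java
if not java.isInitialized():
    java.initialize()
# Fetch java.lang.System.out and call it like a Python object
out = java.importClass('java.lang.System').out
out.println('Hello Python World from Java')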
• How about a demo?
Using Python from Java
import python.PyModule;
import python.PyObject;

class HelloWorld
{
    public static void main(String[] args)
    {
        // Import Python's sys module and look up sys.stdout
        PyModule sys = new PyModule("sys");
        PyObject stdout = (PyObject) sys.getattr("stdout");
        // Call stdout.write() with a string converted to a Python object
        stdout.callmethod("write", new PyObject[]
            { PyObject.asPython("Hello Java world from Python\n") });
    }
}
• How about a demo?
What are the problems?
• Needs the environment correctly set up
– Python & Java versions important
– PATH, CLASSPATH & PYTHONPATH important
• Difficult to build
– See How-To
– DON’T use nmake install!
• Performance
– But only in recent versions!
Questions ?
Indexing
• What do we mean by indexing?
– Numbers
– Dates
– Text
– Sorting in Relevance Ranking
• It’s a HARD problem!
– Don’t let Google fool you…
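As a rough illustration of text indexing with relevance ranking, here is a minimal sketch of an inverted index scored with a simple TF-IDF formula. It is not how ZCatalog or Lucene rank results; the scoring formula, the function names and the three sample documents are assumptions made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][doc_id] = count
    return index


def search(index, n_docs, query):
    """Return doc ids ordered by a simple TF-IDF score, best first."""
    scores = defaultdict(float)
    for word in tokenize(query):
        postings = index.get(word, {})
        if not postings:
            continue
        # Words found in fewer documents get a larger weight.
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up sample documents, for illustration only.
docs = {
    "a": "Zope stores objects in the ZODB",
    "b": "Lucene is a text search engine written in Java",
    "c": "Python and Java can share one text index; text is hard",
}
index = build_index(docs)
print(search(index, len(docs), "java text"))
```

Here document "c" ranks above "b" because it mentions "text" twice. Real engines also account for document length, stemming, stop words and more, which is part of why the problem is hard.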
What are the options?
• Commercial Solutions
– Verity
– Google boxes
– $$$ 
• ZCatalog
• Lucene
ZCatalog
• Solves generic indexing problem for Zope
• Stores information in ZODB
– Participates in transaction framework 
– Stores all old revisions 
• TextIndex has very limited functionality
Lucene
• Written originally by Doug Cutting
– Xerox's Palo Alto Research Center (PARC)
– Apple
– Excite@Home
• Now part of the Apache Jakarta project
• Only tackles text indexing
• High Performance
• Fully Featured
– Phrase matching
• Written in Java
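Phrase matching needs more than a word-to-document map: the index also has to remember where each word occurs. The following is a minimal sketch of that idea using a positional index. It is a simplified illustration, not Lucene's implementation, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """Map each word to {doc_id: [positions at which the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents containing the phrase's words in a row."""
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return []
    # Only documents containing every word can match.
    candidates = set(index[words[0]])
    for word in words[1:]:
        candidates &= set(index[word])
    hits = []
    for doc_id in sorted(candidates):
        later = [set(index[w][doc_id]) for w in words[1:]]
        # A match starts at p if word i of the phrase sits at position p + i.
        if any(all(start + i + 1 in later[i] for i in range(len(later)))
               for start in index[words[0]][doc_id]):
            hits.append(doc_id)
    return hits


# Made-up sample documents, for illustration only.
docs = {
    "d1": "have your cake and eat it",
    "d2": "eat your cake and have it",
    "d3": "cake: have it and eat it",
}
index = build_positional_index(docs)
print(phrase_search(index, "cake and eat"))
```

All three documents contain the words "cake", "and" and "eat", but only "d1" has them next to each other in that order.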
Let’s see some code…
…written in Python!
• Indexing Files
• Searching Indexed Files
How does Lucene handle concurrency?
• File locks
• Never add to an existing index
[Diagram: a Reader and a Writer working on numbered index parts 1 to 4, with an Optimize step]
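The "never add to an existing index" rule can be sketched as follows: every batch of additions becomes a new, immutable part of the index, readers keep working on the parts that existed when they started, and an optimize step merges the parts into one. This is a toy in-memory model under those assumptions, not Lucene's file format or locking code; the `SegmentedIndex` class and its methods are hypothetical, and a `threading.Lock` stands in for the file locks.

```python
import threading


class SegmentedIndex:
    """Toy index: each batch of additions becomes a new immutable segment."""

    def __init__(self):
        self._segments = ()                   # tuple of {word: frozenset(ids)}
        self._write_lock = threading.Lock()   # stands in for a file lock

    def add_batch(self, docs):
        """Index docs ({doc_id: text}) into a brand-new segment."""
        segment = {}
        for doc_id, text in docs.items():
            for word in set(text.lower().split()):
                segment.setdefault(word, set()).add(doc_id)
        frozen = {word: frozenset(ids) for word, ids in segment.items()}
        with self._write_lock:
            # Existing segments are never touched; a new tuple replaces the old.
            self._segments = self._segments + (frozen,)

    def optimize(self):
        """Merge all segments into one so searches visit a single segment."""
        with self._write_lock:
            merged = {}
            for segment in self._segments:
                for word, ids in segment.items():
                    merged[word] = merged.get(word, frozenset()) | ids
            self._segments = (merged,)

    def reader(self):
        """Return a search function bound to the segments that exist now."""
        snapshot = self._segments

        def search(word):
            found = set()
            for segment in snapshot:
                found |= segment.get(word.lower(), frozenset())
            return sorted(found)

        return search

    def segment_count(self):
        return len(self._segments)


idx = SegmentedIndex()
idx.add_batch({"1": "zope catalog", "2": "lucene index"})
old_reader = idx.reader()
idx.add_batch({"3": "lucene through jpe"})
print(old_reader("lucene"), idx.reader()("lucene"))
```

Because a reader only ever sees a fixed snapshot, it needs no lock while searching; only writers and the optimize step have to exclude each other.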
LuceneIndex
• A PluggableIndex for Zope 2’s ZCatalog
• Really painful to implement 
– PluggableIndexes are Clunky
– Undocumented reliance on id attribute
– Really hoping it’ll be better in Zope 3…
• Let’s have a look…
A Comparison
• 1000 Documents, Average length 5781 Bytes
[Bar chart: Time (seconds), axis from 0 to 700, split into Indexing and Peripheral Stuff, for Lucene through Java, TextIndex, Lucene through JPE and LuceneIndex]
Was that fair?
• Performance for BIG numbers of long documents
• TextIndex doesn’t do phrase matching
– Something which did phrase matching took MUCH longer
• Lucene doesn’t support undo
– Do we care?
• LuceneIndex proved that the Lucene architecture
and ZCatalog’s architecture aren’t very compatible

Conclusions
You can have your cake and eat it…
…just slowly 
…for now 
Where from here?
• Optimise JPE?
• CORBA?
• Re-implement Lucene in Python?
• TextIndexNG?
Questions ?
Thank you!
(PS: Swishdot is still on the way ;-)
(PPS: It was Steve A’s birthday yesterday!)