zaparker / tikaondotnet

Use the Java Tika text extraction library on the .NET platform

Home Page: http://kevm.github.com/tikaondotnet

Developers Guide to Tika on .NET

This project is a simple wrapper around the robust Tika text extraction Java library.

##Building TikaOnDotNet##

This project uses rake for build automation.

  1. Install Ruby
  2. Install Rake: `gem install rake`
  3. Run `rake`

If successful, this builds the project and runs the Tika text extraction integration tests.

Bundler is used to ensure that all required gems are installed. It should be installed and set up automatically the first time you run rake on the project. NuGet dependencies are managed with a tool called Ripple, but you should not need to worry about it unless you are updating dependencies.
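The build steps above can be sketched as a small pre-flight check. This is only an illustration: `have` and `build_steps` are hypothetical helper names that are not part of this repo, and the function prints the remaining steps rather than running them.

```shell
# Hypothetical helper: succeed if the given command is available on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Print the build steps that still need to be done, in order.
# Nothing is installed or executed here; the output is a checklist.
build_steps() {
  have ruby || echo "install ruby"
  have rake || echo "gem install rake"
  echo "rake"
}
```

Calling `build_steps` on a machine that already has Ruby and Rake prints only the final `rake` step.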

##Building the Tika-App .NET Assembly##

You should only need to do this step to upgrade the version of Tika being used by this project.

At its core this project simply wraps the Java Tika library. To accomplish this, the tika-app-{version}.jar is compiled into a .NET assembly using the IKVM compiler.

    ikvmc.exe -target:library -assembly:tika-app tika-app-{version}.jar

The result of this process is a .NET assembly tika-app.dll which is stored in this repo's lib directory.

The tika-app .jar file can be downloaded from the Tika Download page.
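As a sketch of the upgrade step, the function below fills in the `{version}` placeholder and checks that the downloaded jar is present before printing the ikvmc command shown above. The function name `ikvmc_command` is hypothetical, and the command is only printed (a dry run), not executed.

```shell
# Hypothetical helper: print the ikvmc command for a given Tika version.
# Assumes the jar has already been downloaded into the current directory.
ikvmc_command() {
  version="$1"
  jar="tika-app-${version}.jar"
  if [ ! -f "$jar" ]; then
    echo "missing $jar; download it from the Tika download page first" >&2
    return 1
  fi
  # Same flags as in this README; only the version is substituted.
  echo "ikvmc.exe -target:library -assembly:tika-app $jar"
}
```

To run the compile rather than print it, copy the printed line into a shell where `ikvmc.exe` is on the PATH, then place the resulting tika-app.dll in the lib directory.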

##Updating the IKVM NuGet dependency##

Use Ripple to update the IKVM package to a new version:

    ripple update -n IKVM -p TikaOnDotNet -v {version}

##Releasing TikaOnDotNet##

The release.bat script creates a release build and packages the NuGet package. The resulting package is written to the artifacts directory.
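After a release run, a quick way to confirm that a package was produced is to list the `.nupkg` files in the artifacts directory. The sketch below assumes a POSIX shell (for example Git Bash on Windows); `list_packages` is a hypothetical helper, not part of release.bat.

```shell
# Hypothetical helper: list NuGet packages (*.nupkg) in a directory.
# Defaults to ./artifacts; returns non-zero if no package is found.
list_packages() {
  dir="${1:-artifacts}"
  found=1
  for pkg in "$dir"/*.nupkg; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$pkg" ] || continue
    echo "$pkg"
    found=0
  done
  return "$found"
}
```

Running `list_packages` from the repo root prints each package path, or exits with a non-zero status if the release did not produce one.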

License: Apache License 2.0