Current development on JAMWiki is primarily focused on maintenance rather than new features due to a lack of developer availability. If you are interested in working on JAMWiki please join the jamwiki-devel mailing list.

Tech:Database storage of binary files

This page (and all pages in the Tech: namespace) is a developer discussion about a feature that is either proposed for inclusion in JAMWiki or one that has already been implemented. This page is NOT documentation of JAMWiki functionality - for a list of documentation, see Category:JAMWiki.
Status of this feature: NOT IMPLEMENTED.

Description

Binary files could be stored using SQL database BLOB support. BLOBs are supported by all major SQL engines, and the implementation could be quite simple:

A simple table with four columns:

id (integer), resume (varchar), mime/type (varchar), body (blob).

With such a simple structure, we could even consider indexing files of compatible MIME types, for example via Lucene.

Henri 24-Jan-2007 11:06 CET
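A minimal JDBC sketch of the four-column table proposed above. The table name jam_binary, the class BinaryTable, the VARCHAR sizes and the column name mime_type (a column cannot be called mime/type without quoting) are assumptions for illustration only; the BLOB type name in the DDL is generic and may need adjusting per database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper for the proposed table. Table name, column sizes and
// the "mime_type" column name are assumptions, not part of JAMWiki.
class BinaryTable {

    // Generic DDL; the binary type name (BLOB, BYTEA, ...) varies by database.
    static final String CREATE_SQL =
        "CREATE TABLE jam_binary ("
        + " id INTEGER NOT NULL PRIMARY KEY,"
        + " resume VARCHAR(200),"
        + " mime_type VARCHAR(100),"
        + " body BLOB)";

    static final String INSERT_SQL =
        "INSERT INTO jam_binary (id, resume, mime_type, body) VALUES (?, ?, ?, ?)";

    static final String SELECT_SQL =
        "SELECT body FROM jam_binary WHERE id = ?";

    // Store one file; setBytes leaves the binary encoding to the JDBC driver.
    static void insert(Connection conn, int id, String resume, String mimeType, byte[] body)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            stmt.setInt(1, id);
            stmt.setString(2, resume);
            stmt.setString(3, mimeType);
            stmt.setBytes(4, body);
            stmt.executeUpdate();
        }
    }

    // Read the file body back, or return null if no row has this id.
    static byte[] readBody(Connection conn, int id) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(SELECT_SQL)) {
            stmt.setInt(1, id);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getBytes("body") : null;
            }
        }
    }
}
```

For large uploads, PreparedStatement.setBinaryStream could be used instead of setBytes so the caller does not need the whole file in a byte array.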

Author(s)

  • None

Status

Comments

One feature I would like to see in JAMWiki is the ability to store file uploads in the database. On our system, the database is the only thing that gets backed up regularly, so file uploads would be lost in the event of a server failure. I could write this feature if it would help. Alexander Boyd 21-Jan-2007 21:31 PST

Very good idea, since the database would then be the only storage to worry about. Henri 23-Jan-2007 14:43 CET
I didn't see this message when it was originally posted - sorry! Storing files in the database would be great as an optional feature if it could be done without complicating the existing code. I don't want database storage to be the default, since putting binary data in the database can have performance and other implications. Some issues to consider with this feature:
  • Supporting different databases (Oracle, Postgres, HSQL, etc) may be an issue; I'm not sure all databases allow storage of binary content, and many handle it differently.
  • Effects on the existing jam_file and jam_file_version schemas. We would need an implementation that works with both database and file-based storage.
If either of you has ideas on how this might be implemented, please start a new article such as Tech:Database storage of binary files and we can hopefully work out the details. -- Ryan 23-Jan-2007 22:05 PST
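To illustrate the portability concern: application code can use the same standard JDBC calls (setBytes/getBytes) for binary data regardless of driver, so much of the visible difference is in the DDL. The sketch below keeps a per-database column type. The class name, the database keys and the type mapping are assumptions for illustration and should be checked against each database's documentation before use.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical per-database binary column types (assumed mapping; verify
// against each database's documentation).
class BinaryColumnTypes {

    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("postgres", "BYTEA");
        TYPES.put("oracle", "BLOB");
        TYPES.put("mysql", "LONGBLOB");
        TYPES.put("hsql", "LONGVARBINARY");
    }

    // Look up the binary column type, failing loudly for unknown databases.
    static String typeFor(String database) {
        String type = TYPES.get(database.toLowerCase(Locale.ROOT));
        if (type == null) {
            throw new IllegalArgumentException("No binary type known for: " + database);
        }
        return type;
    }

    // Build a minimal table definition using the database-specific type.
    static String createTableSql(String database) {
        return "CREATE TABLE jam_binary (id INTEGER NOT NULL PRIMARY KEY, body "
            + typeFor(database) + ")";
    }
}
```

Failing on an unknown database, rather than guessing a type, keeps a missing mapping from turning into a confusing SQL error at setup time.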
Commented Henri 24-Jan-2007 11:07 CET

A couple of additional comments:

  • It's been a while since I've dealt with binary data in a database, but at the time (2000 - 2004) each database had its own quirks when storing and retrieving binary data. I seem to recall that Oracle in particular required some weird initialization parameters depending on what size binary data was being stored. Have these issues gone away, or is there a reference somewhere that collects things to look out for?
  • We would need to modify the jam_file and jam_file_version tables to have foreign keys that point to the new binary data table. Also, since jam_file already stores the MIME type, it probably wouldn't be needed in the new table.
  • What is the resume column for?
  • After retrieving content from the database, I assume it would be stored immediately on the filesystem? Ideally, the only code that changed as a result of this feature would be the code to store and retrieve file data in AnsiDataHandler.

If someone is interested in putting together some code for this it could definitely be done in a branch of the JAMWiki Subversion repository, and if it can be done in a way that doesn't have much impact on the rest of the JAMWiki code (readability, complexity, performance) I'd have no objection to merging it into the official release. However, if this was something that had a big impact on existing code I'd want to find ways to simplify things before merging. -- Ryan 25-Jan-2007 22:22 PST
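If retrieved content is written to the filesystem as suggested in the last bullet above, the change could stay fairly contained with a read-through cache along these lines. The FileCache class, the loader function and naming cache files by file version id are all hypothetical; the loader stands in for whatever AnsiDataHandler method would actually read the binary data.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.IntFunction;

// Hypothetical read-through cache: uploads live in the database, and a copy is
// written to disk the first time one is requested so later reads skip the database.
class FileCache {

    private final Path cacheDir;
    // Stand-in for the data handler call that reads the binary data (assumption).
    private final IntFunction<byte[]> loader;

    FileCache(Path cacheDir, IntFunction<byte[]> loader) {
        this.cacheDir = cacheDir;
        this.loader = loader;
    }

    // Return the on-disk copy of a file version, loading it from the database if needed.
    Path get(int fileVersionId) throws IOException {
        Path cached = cacheDir.resolve(Integer.toString(fileVersionId));
        if (Files.exists(cached)) {
            return cached;
        }
        byte[] data = loader.apply(fileVersionId);
        if (data == null) {
            throw new IOException("No stored data for file version " + fileVersionId);
        }
        Files.createDirectories(cacheDir);
        // Write to a temporary file first, then move it into place, to reduce
        // the chance of serving a partially written file.
        Path tmp = Files.createTempFile(cacheDir, "upload", ".tmp");
        Files.write(tmp, data);
        Files.move(tmp, cached, StandardCopyOption.REPLACE_EXISTING);
        return cached;
    }
}
```

Since the rest of the code would keep working with files on disk, the database would mostly act as the durable copy that gets backed up.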

In reply to your point about binary data storage not being consistent across databases, I haven't had much luck with BLOBs either. My website stores binary data using two tables. The first, largeobjects, has two columns: id (int) and name (varchar). The second, largeobjectsections, has three columns: id (int, references largeobjects.id), index (int), and data (varchar). To store a large object, one row with a unique id and the object's name is added to largeobjects. The object's data is then split into 4096-byte chunks, and for each chunk a row is inserted into largeobjectsections containing the object's id (the one inserted into largeobjects), an index (0 for the first chunk, 1 for the second, and so on), and the Base64-encoded contents of that 4096-byte chunk. This could be a possibility, but it isn't very fast. However, if the data were cached on the filesystem, it might be acceptable. Alexander Boyd 26-Jan-2007 13:59 PST
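The chunking scheme described above can be sketched without the database part: toSections produces the values for the data column in index order, and fromSections reassembles them. The class and method names are hypothetical, and the code assumes java.util.Base64 (Java 8 or later).

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Sketch of the two-table chunking scheme, minus the SQL. Each list element
// corresponds to one largeobjectsections row; its list position is the index.
class ChunkedEncoding {

    static final int CHUNK_SIZE = 4096; // raw bytes per section row

    // Split data into 4096-byte chunks and Base64-encode each for a VARCHAR column.
    static List<String> toSections(byte[] data) {
        List<String> sections = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += CHUNK_SIZE) {
            int end = Math.min(offset + CHUNK_SIZE, data.length);
            byte[] chunk = Arrays.copyOfRange(data, offset, end);
            sections.add(Base64.getEncoder().encodeToString(chunk));
        }
        return sections;
    }

    // Decode sections (already sorted by index) and concatenate them.
    static byte[] fromSections(List<String> sections) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String section : sections) {
            byte[] chunk = Base64.getDecoder().decode(section);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

Each full 4096-byte chunk encodes to 5,464 Base64 characters, so the data column would need to hold at least that many. Also note that index is a reserved word in some databases (MySQL among them) and may need quoting or renaming.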