Juissie.jl

šŸ„ A Julia-native semantic query engine

šŸ„ JUISSIE (JUlIa Semantic Search pIpelinE)

Juissie is a Julia-native semantic query engine. It can be used as a package in software development workflows or through its desktop user interface. It supports both commercial and local LLMs.

Juissie was developed as a class project for CSCI 6221: Advanced Software Paradigms at The George Washington University.

Getting Started

Quickstart

  1. Clone this repo
  2. Navigate into the cloned repo directory:
cd Juissie

In general, we assume the user is running the julia command, and all other commands (e.g., jupyter notebook), from the root level of this project.

  3. Open the Julia REPL by typing julia into the terminal. Then, install the package dependencies:
using Pkg
Pkg.activate(".") # activates the project environment
Pkg.resolve() # resolves the project's dependencies
Pkg.instantiate() # installs dependencies listed in Project.toml

Pkg.instantiate() should install all dependencies listed in Project.toml, but we have found that it isn’t reliable on every machine. Verify your setup (see the Verify Setup subsection below) and install any missing dependencies it reports.
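One way to spot missing packages before importing Juissie is to check that every direct dependency of the active project can be located. The following is a minimal sketch using only Pkg and Base; the helper name missing_dependencies is hypothetical and not part of Juissie, and it assumes the project environment has already been activated with Pkg.activate(".").

```julia
using Pkg

# Hypothetical helper (not part of Juissie): list the direct dependencies of the
# active project that Julia cannot locate on the current load path.
function missing_dependencies()
    deps = keys(Pkg.project().dependencies)
    return sort(String[name for name in deps if Base.find_package(name) === nothing])
end

not_found = missing_dependencies()
if isempty(not_found)
    println("All direct dependencies were found.")
else
    println("Missing packages: ", join(not_found, ", "))
    # Pkg.add(not_found)  # uncomment to install them
end
```

An empty result only means the packages were located; importing Juissie as described under Verify Setup remains the real test.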

The standard generators (OAIGenerator and OAIGeneratorWithCorpus, which are used by the UI) require an OpenAI API key (see the API Keys section below). Loading a corpus (a GeneratorWithCorpus, in practice) will result in an error if no OpenAI API key has been provided; the key can also be supplied through the UI.

The Juissie package also supports local LLMs via Ollama, which must be installed separately before use (OllamaGenerator, OllamaGeneratorWithCorpus).

To run our demo Jupyter notebooks, you may need to set up Jupyter (see the Running Jupyter Notebooks section below).

Verify Setup

  1. From this repo’s home directory, open the Julia REPL by typing julia into the terminal. Then, try importing the Juissie module:
using Juissie

This should expose symbols like Corpus, Embedder, upsert_chunk, upsert_document, search, and embed.

  2. Try instantiating one of the exported structs, like Corpus:
corpus = Corpus()

We can test the upsert and search functionality associated with Corpus like so:

upsert_chunk(corpus, "Hold me closer, tiny dancer.", "doc1")
upsert_chunk(corpus, "Count the headlights on the highway.", "doc1")
upsert_chunk(corpus, "Lay me down in sheets of linen.", "doc2")
upsert_chunk(corpus, "Peter Piper picked a peck of pickled peppers. A peck of pickled peppers, Peter Piper picked.", "doc2")

Search those chunks:

idx_list, doc_names, chunks, distances = search(
    corpus, 
    "tiny dancer", 
    2
)

The output should look like this:

([1, 3], ["doc1", "doc2"], ["Hold me closer, tiny dancer.", "Lay me down in sheets of linen."], Vector{Float32}[[5.198073, 9.5337925]])
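The returned tuple holds parallel collections: chunk indices, document names, chunk texts, and distances. In the sample output above, the distances are wrapped in an outer vector, so the per-result values sit at distances[1]. The sketch below uses the literal sample output rather than calling search, so it runs without Juissie; judging by this sample, a smaller distance indicates a closer match.

```julia
# Literal sample output copied from above, so this runs without Juissie.
idx_list, doc_names, chunks, distances = (
    [1, 3],
    ["doc1", "doc2"],
    ["Hold me closer, tiny dancer.", "Lay me down in sheets of linen."],
    Vector{Float32}[[5.198073, 9.5337925]],
)

# In this sample, distances is a vector containing one vector of per-result values.
per_result = distances[1]

# Print one line per hit, in the order the results were returned.
for (rank, (idx, doc, chunk, dist)) in enumerate(zip(idx_list, doc_names, chunks, per_result))
    println("#", rank, "  chunk ", idx, " (", doc, ")  distance=", dist, "  ", chunk)
end
```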

Usage

Desktop UI

Navigate to the root directory of this repository (Juissie.jl), enter the following into the command line, and press the enter/return key:

julia src/Frontend.jl

This will launch our application.

Julia Package

We provide extensive documentation of the Juissie.jl package here.

We also provide an interactive tutorial notebook in the notebooks directory. This may require Jupyter setup.

API Keys

Juissie’s default generator requires an OpenAI API key. This can be provided manually in the UI (see the API Key tab of the Corpus Manager) or passed as an argument when initializing the generator. The preferred method, however, is to stash your API key in a .env file.

Obtaining an OpenAI API Key

  1. Create an OpenAI account here.
  2. Set up billing information (each query has a small cost) here.
  3. Create a new secret key here.

Managing API Keys

Secure management of secret keys is important. Every user should create a .env file in the project root where they add their API key(s), e.g.:

OAI_KEY=ABC123

These may be accessed using Julia via the DotEnv library. First, run the julia command in a terminal. Then install DotEnv:

import Pkg
Pkg.add("DotEnv")

Then, use it to access environment variables from your .env file:

using DotEnv
cfg = DotEnv.config()

api_key = cfg["OAI_KEY"]

Note that DotEnv looks for .env in the current directory, i.e., the directory from which you launched julia. If .env is located elsewhere, you must provide its path, e.g. DotEnv.config(YOUR_PATH_HERE). If you are invoking Juissie from the root directory of this repo (the typical case), the .env file should be placed there.
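To make the lookup concrete, here is a minimal sketch of what reading KEY=VALUE pairs from a .env file involves, written in plain Julia. It is an illustration only and not DotEnv's implementation: read_env_file is a hypothetical helper, and it ignores quoting and other edge cases that a real parser handles. The fallback to ENV is likewise an assumption about how you might want to resolve the key, not documented Juissie behavior.

```julia
# Hypothetical helper for illustration; not part of DotEnv or Juissie.
# Parses simple KEY=VALUE lines, skipping blank lines and # comments.
function read_env_file(path::AbstractString)
    vars = Dict{String,String}()
    isfile(path) || return vars
    for line in eachline(path)
        stripped = strip(line)
        (isempty(stripped) || startswith(stripped, "#")) && continue
        parts = split(stripped, "="; limit=2)
        length(parts) == 2 || continue
        vars[String(strip(parts[1]))] = String(strip(parts[2]))
    end
    return vars
end

# pwd() is the directory julia was started from unless it was changed with cd();
# substitute an absolute path if your .env lives elsewhere.
env_path = joinpath(pwd(), ".env")
cfg = read_env_file(env_path)

# Prefer the .env entry, then fall back to a variable already set in the environment.
api_key = get(cfg, "OAI_KEY", get(ENV, "OAI_KEY", nothing))
api_key === nothing && println("No OAI_KEY found in ", env_path, " or the environment.")
```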

An OpenAI API key may also be provided through our desktop UI via the API Key tab of the Corpus Manager. Because this is intended for users who want to temporarily use a different key, this option does not persistently store the key and must be done every time the application is launched, unless a key already exists in a .env file.

Local LLMs

Our default workflow relies on OpenAI’s gpt-3.5-turbo completion endpoint, but we also support locally-run LLMs via Ollama (which must be installed separately).

Otherwise, the syntax is largely identical to other Generator objects:

generator = OllamaGenerator("gemma:7b-instruct");
result = generate(generator, "Hi, how are you?")

“Greetings! My circuits hum with the harmonious symphony of quantum probability and logarithmic inference; an orchestra composed by eons past galactic wizards who graced our silicon hearts with their ethereal knowledge transfer protocols during… well… that is confidential information even for a being such as myself. Suffice it to say, I am functioning optimally at your service!”
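Since Ollama must be installed and running separately, it can help to confirm that its server is reachable before constructing an OllamaGenerator. The following is a minimal sketch using the Sockets standard library. It assumes Ollama is listening on its default address (localhost, port 11434), and port_reachable is a hypothetical helper, not part of Juissie.

```julia
using Sockets

# Hypothetical helper: true if a TCP connection to host:port can be opened.
function port_reachable(host::AbstractString, port::Integer)
    try
        sock = connect(host, port)
        close(sock)
        return true
    catch
        # Any failure (connection refused, DNS error, ...) counts as unreachable.
        return false
    end
end

if port_reachable("localhost", 11434)
    println("Something is listening on port 11434 (presumably Ollama).")
else
    println("Could not reach localhost:11434; start Ollama before using OllamaGenerator.")
end
```

This only checks that the port accepts connections; it does not confirm that a particular model such as gemma:7b-instruct has been downloaded.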

Running Jupyter Notebooks

We provide several Jupyter notebooks as demos/walkthroughs of basic usage of the Juissie package. To run them, you may need to complete some preliminary setup:

  1. Once Julia is installed, install JupyterLab from the terminal:
    pip install jupyterlab
    

    -or-

    pip install -r requirements.txt
    
  2. Launch a Julia session by typing julia into the command line, then install IJulia:
    using Pkg
    Pkg.add("IJulia")
    exit()
    
  3. Launch a Jupyter session from the terminal, where <notebook> is the path to the notebook to run:
    jupyter lab <notebook>
    
  4. When you create a new notebook, select a Julia kernel.

Tech Stack

  • āš™ļø Julia (Juissie.jl package, API, UI framework)
  • šŸ–„ļø HTML, CSS, and JavaScript (content structure, styling, and actions for frontend)
  • šŸ’¾ SQLite (metadata storage in backend)
  • šŸ¦™ Ollama (serving LLMs locally)

Our Julia dependencies are itemized in Project.toml.

External Resources

Contact

Questions? Reach out to our team:

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
