# illm
Internet LLM (illm) lets you access your locally run LLM from any computer with a web browser and internet connection, even if you aren't on the same network.
It does this by running a server on a cloud instance that acts as a forwarder between your local machine and the internet. The server is responsible for authenticating clients and forwarding requests to your local machine, and forwarding responses back to the client.
This repository's reference implementation is designed to work with ollama as the provider and ivynya/aura as the user interface.
⚠️ This project should mostly be treated as a proof of concept. You may run into stability issues, especially if you share your LLM provider with many people at once.
## Architecture
- You host an `illm/server` instance on a cloud provider and expose it to the internet on a domain (e.g. `illm.example.com`).
- You run `illm/client` on your local machine and configure it to point at your server. The client connects to the server at `/aura/provider`, identifying itself as an LLM provider.
- You connect to `/aura/client` using an illm client like Aura and authenticate to the server. Requests are now pipelined from the client to the server to the provider and back.
- Requests from clients are sent as JSON with an `action` and other parameters (see `/internal/types.go`). The server tags each request with a unique ID (Tag) corresponding to that client connection, then forwards it to the provider. The provider processes the request and sends back a Request object with the same Tag, which the server uses to route the response to the matching client.
Because the server hosts websocket endpoints, connections can be made from anywhere without reverse proxying.
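Because the provider dials out to the server, no inbound ports need to be opened locally. A sketch of how the client's configuration values could be assembled into a dial target (the Basic-auth header format is an assumption; the real client may differ):

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// dialTarget builds the websocket URL and an Authorization header from the
// same values the client's docker compose file configures (ILLM_SCHEME,
// ILLM_HOST, ILLM_PATH, AUTH). Illustrative only.
func dialTarget(scheme, host, path, auth string) (string, http.Header) {
	u := url.URL{Scheme: scheme, Host: host, Path: path}
	h := http.Header{}
	h.Set("Authorization", "Basic "+auth)
	return u.String(), h
}

func main() {
	target, _ := dialTarget("wss", "illm.example.com", "/aura/provider", "YWRtaW46cGFzc3dvcmQ=")
	fmt.Println(target) // wss://illm.example.com/aura/provider
}
```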
## Usage
This repository contains a reference implementation of an illm provider (in `/client`). It requires ollama installed on your local machine and running at `localhost:11434`; the client makes API requests from inside the docker container out to that URL. It is designed to work with the reference implementation of the user client, Aura.
Example docker compose file for running the server on your cloud instance:
```yaml
version: "3.8"
services:
  illm:
    image: ghcr.io/ivynya/illm/server:latest
    ports:
      - 8080:3000
    restart: unless-stopped
    environment:
      - USERNAME=admin
      - PASSWORD=password
```
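The server authenticates connections against these `USERNAME` and `PASSWORD` values. A sketch of how such a check could work, decoding the base64 credential a client presents (the server's actual logic may differ; this is illustrative only):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// checkAuth decodes a base64 "username:password" credential and compares it
// against the configured USERNAME and PASSWORD. Illustrative only.
func checkAuth(encoded, username, password string) bool {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return false
	}
	user, pass, ok := strings.Cut(string(raw), ":")
	return ok && user == username && pass == password
}

func main() {
	// "YWRtaW46cGFzc3dvcmQ=" is base64 for "admin:password".
	fmt.Println(checkAuth("YWRtaW46cGFzc3dvcmQ=", "admin", "password")) // true
}
```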
Example docker compose file for running the client on your local machine:
```yaml
version: "3.8"
services:
  illm:
    image: ghcr.io/ivynya/illm/client:latest
    restart: unless-stopped
    environment:
      - AUTH=<a base64 encoded username:password>
      - IDENTIFIER=your-computer-name
      - ILLM_SCHEME=<ws|wss>
      - ILLM_HOST=illm.example.com
      - ILLM_PATH=/aura/provider
      - OLLAMA_URL=http://host.docker.internal:11434
```
Run the server first, then the client; the client should log that it is connected. Then, if you don't want to write your own user interface, set up Aura as described in its README. Make sure to pull models before using the user interface: the client will not auto-pull them for you, it will simply return an error.
## Development
The reference client uses a modified subset of langchaingo's ollama implementation. It was modified to return additional data during generation, since the original returns text only (without extra information such as token counts, duration, and context), and to accept chat context as a parameter.