
Show HN: Autotab – Programmable AI browser for turning web tasks into APIs

10h

by jonasnelle

Hey HN, we're Alexi and Jonas, the co-founders of Autotab (https://autotab.com). Autotab is a Chrome-based browser you can teach to do complex tasks, with a simple API for running them from your app or backend.

Here is a walkthrough of how it works: https://youtu.be/63co74JHy1k, and you can try it for free at https://autotab.com by downloading the app.

Why a dedicated editor?

The number one blocker we've found in building more flexible, agentic automations is performance quality, BY FAR (https://www.langchain.com/stateofaiagents#barriers-and-chall...). For all the talk of cost, latency, and safety, the fact is most people are still just struggling to get agents to work. The keys to solving reliability are better models, yes, but also intent specification. Even humans don't zero-shot these tasks from a prompt. They need to be shown how to perform them, and then they improve through question-asking and feedback over time. It is also quite difficult to formulate complete requirements on the spot from memory.

The editor makes it easy to build the specification up as you step through your workflow, while generating successful task trajectories for the model. This is the only way we've been able to get the reliability we need for production use cases.

But why build a browser?

Autotab started as a Chrome extension (with a Show HN post! https://news.ycombinator.com/item?id=37943931). As we iterated with users, we realized that we needed to focus on creating the control surface for intent specification, and that being stuck in a Chrome side panel wasn't going to work. We also knew that we needed a level of control for the model that we couldn't get without owning the browser. In Autotab, the browser becomes a canvas on which the user and the model take turns showing and explaining the task.

Key features:

1. Self-healing automations that don't break when sites change

2. Dedicated authoring tool that builds memory for the model while defining steps for the automation

3. Control flows and deep configurability to keep automations on track, even when navigating complex reasoning tasks

4. Works with any website (no site-specific APIs needed)

5. Runs securely in the cloud or locally

6. Simple REST API + client libraries for Python, Node

We'd love to get any early feedback from the HN community, ideas for where you'd like the product to go, or experiences in this space. We will be in the comments for the next few hours to respond!

thedays

50m

Is Autotab able to scrape data from multiple websites with different structures and combine this data into structured data in one CSV or JSON file? Example: scrape interest rates offered on savings accounts from multiple bank websites and extract the name of the bank, bank logo, product name and interest rate for each account and run this saved query on a regular schedule (daily, weekly etc)?

pugio

I love the idea - owning the browser definitely seems like the right approach.

I tried it out on a workflow I've been manually piecing together and it gave me a bunch of "Error encountered, contact support" messages when doing things like clicking on a form input field, or even a button.

The more complex "Instruction" block worked correctly instead (literally things like "click the 'Sign In' button"), but then I ran out of the 5 minutes of free run time when trying to go through the full flow. I expect this kind of thing will be fixed soon, as it grows.

In terms of ultimate utility, what I really want is something which can export scripts that run entirely locally, but fall back to the more dynamic AI-enhanced version when an error is encountered. I would want Autotab to generate the workflow, which I could then run on my own hardware in bulk.

Anyway, great work! This is definitely the best implementation I've seen of that glimpsed future of capable AI web browsing agents.

alexirobbins

sorry you encountered that issue! what website was the form on? we'll see if we can catch the error!

curious what you mean by generating the workflow that you run on your own hardware? Is this different than running Autotab locally?

pugio

Hah, looks like you guys found my account error via my profile email, nice! Thanks for fixing that bug. I'll try again tomorrow when the fix is pushed.

My other request is probably not in line with your business model. I get the sense that Autotab is always communicating with some server on your end, probably for the various bits of AI functionality. What I was asking for is the ability to export the actions/workflow as, say, a python script (like a Selenium script, or even better, a script which drives your browser) which performs the actions in the Autotab workflow.

I need AI understanding when creating the workflow, or healing in case of an error, but I don't always need it when just executing a prepared script. In those (non AI needed) cases, I don't really want to use up my runtime minutes just because I'm executing a previously generated workflow.
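To make the request concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical (`run_workflow`, `StepFailed`, the fake page and healer); it is not Autotab's export format or API, just the shape of "replay locally, only call the AI when a step breaks".

```python
from typing import Callable, Dict, List


class StepFailed(Exception):
    """Raised by the local driver when a recorded selector no longer matches."""


def run_workflow(
    steps: List[Dict[str, str]],
    click: Callable[[str], None],
    heal: Callable[[Dict[str, str]], str],
) -> int:
    """Replay steps locally; call the (metered) healer only after a failure.

    Returns how many steps needed healing.
    """
    healed = 0
    for step in steps:
        try:
            click(step["selector"])
        except StepFailed:
            # Only now pay for AI: ask for a repaired selector, then retry once.
            step["selector"] = heal(step)
            click(step["selector"])
            healed += 1
    return healed


# Fake page and healer so the sketch runs without a browser or network.
PAGE = {"#email", "#submit-v2"}


def fake_click(selector: str) -> None:
    if selector not in PAGE:
        raise StepFailed(selector)


def fake_heal(step: Dict[str, str]) -> str:
    return "#submit-v2"  # a real healer would inspect the page


workflow = [
    {"name": "focus email", "selector": "#email"},
    {"name": "submit", "selector": "#submit"},
]
healed_count = run_workflow(workflow, fake_click, fake_heal)
```

Because the repaired selector is written back into the step, a second run of the same workflow would not need the healer at all, which is the part that would save runtime minutes.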

adamkhakhar

This is awesome! What is your most common use case? Have you thought of competing with https://scribehow.com/ in the documentation space?

jonasnelle

Thanks! Our most common use cases are repetitive tasks people have at work, think updating Hubspot with analytics data from an internal tool or reconciling payments between an invoicing system, a payment system and a CRM.

Haven’t done a lot with Scribe-like documentation cases. Given the pace at which this technology is developing we’re focused on making Autotab really good at the most economically valuable tasks.

MattDaEskimo

Very neat in theory but I'm failing to find any technical details.

At which layer is the automation happening? Inside, using Dev tools? Multiple?

What is the self-healing mechanic? I'm guessing invoking an LLM to find what happened and fix it?

I guess what I'm wondering is: is this some sort of hybrid between computer use and Dev tools usage?

jonasnelle

Autotab is definitely a hybrid approach, because when it comes to deciding where on the page to take an action, Autotab has to be fast & cheap (humans are both of those) while also being robust to changes. The solution we use is a "ladder of compute" where Autotab uses everything from really fast heuristics and local models up to the biggest frontier models, depending on how difficult the task is.

For instance, if Autotab is trying to click the "submit" button on a sparse page that looks like previous versions of that page, that click might take a few hundred milliseconds. But if the page is very noisy, and Autotab has to scroll, and the button says "next" on it because the flow has an additional step added to it, Autotab will probably escalate to a bigger model to help it find the right answer with enough certainty to proceed.

There is a certain cutoff in that hierarchy of compute that we decided to call "self-healing" because latency is high enough that we wanted to let users know it might take a bit longer for Autotab to proceed to the next step.
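A toy sketch of the "ladder of compute" idea, for readers who want something concrete. This is an illustration only, not Autotab's implementation: the `Rung` type, the two locator functions, the synonym table and the 0.9 threshold are all made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Page = dict  # hypothetical page snapshot: {"buttons": ["submit", ...]}
Guess = Tuple[Optional[str], float]  # (element label or None, confidence 0..1)


@dataclass
class Rung:
    name: str
    locate: Callable[[Page, str], Guess]
    self_healing: bool = False  # slow enough that the user should be told


def exact_label(page: Page, target: str) -> Guess:
    # Cheapest rung: succeed only on an exact label match.
    return (target, 1.0) if target in page["buttons"] else (None, 0.0)


def stand_in_big_model(page: Page, target: str) -> Guess:
    # Placeholder for a frontier-model call; here a hard-coded synonym table.
    synonyms = {"submit": ["next", "continue"]}
    for label in page["buttons"]:
        if label in synonyms.get(target, []):
            return label, 0.95
    return None, 0.0


LADDER = [
    Rung("heuristic", exact_label),
    Rung("big-model", stand_in_big_model, self_healing=True),
]


def find_element(page: Page, target: str, threshold: float = 0.9):
    """Try rungs cheapest-first; stop at the first confident answer."""
    for rung in LADDER:
        element, confidence = rung.locate(page, target)
        if element is not None and confidence >= threshold:
            return element, rung
    return None, None
```

With this setup, a page that still has a "submit" button is resolved by the cheap rung, while a page where the button now says "next" falls through to the slower rung flagged as self-healing.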

diegolazcano

This is awesome. I was just trying to get a rudimentary version of this for some "user" interaction heavy data extraction. Definitely giving it a try.

For a case with lots of requests how does Autotab handle ip-blocking? Does each run use a different portal instance?

jonasnelle

When you run Autotab in the app it runs locally, so no IP blocking issues there. If you want to run it in the cloud, e.g. via API, by default your IP will be from the data center, but we have residential proxies that we can enable on a case by case basis.

diegolazcano

Just tried it - very cool indeed. I did a page loop extraction but it seems to be the same speed each time I run it. The elements I am looping over look pretty much the same, just with different images. I think it would be great if it could generalize how to find an element (with CSS selectors, for example) to speed things up once it's sure that is the data you are looking to extract for a given loop.

jonasnelle

Totally agree, making page loop faster is at the top of our list of things to do! There are cases where you need page loop to do quite a bit of reasoning so it will be this slow until models get faster, but we can make it a lot faster today on happy paths - stay tuned :)

throwup238

> we have residential proxies that we can enable on a case by case basis.

Who is your vendor for residential proxies? That’s quite a sketchy industry.

jonasnelle

We use a range of different providers, it really depends on the customer and use case. We only enable the proxy in rare cases that need it for a specific reason.

handfuloflight

I see it's able to perform data extraction, but what if you wanted to enter in data from another system, or generated by an LLM during the workflow?

jonasnelle

Data from external systems can be provided to Autotab in the form of CSV files or string inputs, which can be passed to the API to parametrize skills. However, in most cases, ingesting data into Autotab is easiest by just having Autotab navigate to the website where the data is present.

Autotab has a structured type system underlying the workflows, so any data processed in the course of an automation can be referenced in later steps. It's a bit like a fuzzy programming language for automation, and the model generates schemas to ensure data flows reliably through the series of steps.

For example, users often start by collecting information in one system (using an extract step as you mentioned), then cross reference it in another and then submit some data by having Autotab type it into a third system. In Autotab, you can just type @ to reference a variable, each step has access to data from previous steps.

At the end, you can get a dump of all of Autotab's data from a run as a JSON file, or turn specific arrays of data into CSV files using a table step.
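As a loose illustration of the data flow described above (not Autotab's actual type system or API), here is a sketch where earlier steps deposit values in a dictionary, a later step references them with `@name`, and the run's data is exported as JSON and CSV. The `resolve` and `to_csv` helpers and the sample data are invented for the example.

```python
import csv
import io
import json
import re


def resolve(template: str, data: dict) -> str:
    """Replace each @name in the template with the value from earlier steps."""
    def substitute(match):
        name = match.group(1)
        if name not in data:
            raise KeyError(f"unknown variable @{name}")
        return str(data[name])
    return re.sub(r"@(\w+)", substitute, template)


def to_csv(rows: list) -> str:
    """Turn a list of same-shaped dicts into CSV text (like a table step)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Values an "extract" step might have produced earlier in the run.
run_data = {
    "invoice_id": "INV-7",
    "amount": 120.5,
    "payments": [{"bank": "A", "rate": 4.5}, {"bank": "B", "rate": 4.1}],
}

typed_text = resolve("Paid @invoice_id for @amount", run_data)
json_dump = json.dumps(run_data)          # full dump of the run's data
csv_table = to_csv(run_data["payments"])  # one array exported as a table
```

Raising on an unknown `@name` is a deliberate choice in this sketch: a typo in a reference fails loudly instead of typing a literal "@invoce_id" into the third system.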

grugagag

I don’t know what your intention is, but I imagine this is how more and more people are going to push LLM slop into all corners of the internet. It’ll be easy to do in massive quantities.

Carrok

You say "try it for free" but your website has no pricing information at all. Is this free for just a while? Free forever? What is your monetization strategy?

Can I point it at my own LLM or am I locked into using OpenAI?

alexirobbins

We have unlimited free editing, so you can fully try everything out and know your skill will work before we ask you to subscribe. You also get 5m of free runtime. Subscriptions start at $39/month with 300 minutes of runtime included.

Right now we do not let you BYO LLM, but it's something we would love to provide an option for where possible!

Carrok

5 minutes seems like barely enough time to complete any given task, let alone actually try it out. $40/mo for a capped plan seems steep, but maybe I'm not your target customer. Best of luck!

alexirobbins

The free edit mode has all of the features of run mode, and lets you fully test the skill. The only difference is that inside of a loop it will ask you to click to continue.

A lot of AI tools promise the world and don't deliver. We explicitly don't want anyone to pay us until they're sure Autotab can do their task, even though the model costs during editing are actually much higher than during runtime.

jonasnelle

Good point, will add pricing information to our website ASAP, had skipped that one in the push to launch (it is only available in the app at the moment)

pacifi30

Pretty slick. I recorded a session for ordering from a restaurant website, and it did repeat the entire workflow. It had some issues with a modal that popped up, but all in all well done! We have been trying to robotify the task of ordering from restaurants for our clients and it seems like your solution can work well for us. I am guessing that you want your users to use the Autotab browser, so what is the use for the API?

jonasnelle

Thanks! We think of the browser as an authoring tool where you create, test and refine skills.

After you've done that, the API is great for cases where you want to incorporate Autotab into a larger data flow or product.

For instance, say Company A has taught Autotab to migrate their customers' data - so their customers just see a sync button in the Company A product, which kicks off an Autotab run via API. Same for restaurant booking, if you'd want that to happen programmatically.

pacifi30

Understood! How does it work if we have several different restaurants to order from? Do I need to record each ordering session and create skills for each restaurant, or can it infer on its own given the task to order from a restaurant? Secondly, any docs or samples to see how to integrate this with your API?

jonasnelle

Depends on how different the flows are for different restaurants. If they're just different names but use the same booking system you'd typically use an input and have Autotab find the correct restaurant first. If they're totally different booking systems you can try the instruct (open ended agentic) step but my guess is that will be too slow and unreliable for now, so you'd probably want to record different skills for each.

Docs are here with sample code: https://docs.autotab.com/api-reference

handfuloflight

Is the API also charged based on runtime? And I'm assuming that workflow happens in the cloud? What if it's behind a login? What if that login requires 2FA?

alexirobbins

Yep exactly. Authentication is primarily handled with session data, so passwords never leave your device, but we also support setting secrets.

Here is more info on auth and security: https://docs.autotab.com/manual/security

For 2FA, different users take different approaches. Everything from teaching Autotab to pull auth codes from their email, to setting intervention requests at the top of their skills, to enterprise integrations that we support with SSO and dedicated machine accounts.

jonasnelle

Also for the modal popup - this is the kind of issue that goes away in run mode because Autotab will escalate to bigger models to self-heal.

If the modal pops up frequently you can also record a click to dismiss it and make that click optional, so Autotab knows to move on if the modal does not pop up sometimes.

replwoacause

Looks nice. Anybody else in this space? This one is on the pricier end but I’m just a single user so maybe not the target customer

hailpixel

AskUI could be a solution. It's also not just in browser, but the whole desktop: https://github.com/askui/vision-agent

replwoacause

Thanks! Looks promising!

alexirobbins

Curious, what would you be interested in using Autotab for?

replwoacause

Automating the creation of test orders in our Ecom and ERP tools is one possible use case I can think of, though I’m sure I’d find others in my day to day (possibly around some of the rote tasks I have in Confluence or DevOps)

alexirobbins

That sounds like a really good use case! we're constrained by model costs but are interested in offering a lower cost plan – if you email me I'll see what we can do alexi@autotab.com

Onavo

If we are being honest, most of these browser screen scraping startups will be commoditized the moment OpenAI/Anthropic releases their next model. From my experience, having an in-house smaller model working in tandem with the bigger LLMs doesn't necessarily produce a better result, because in-context learning is just too powerful. The moment OpenAI releases a new model with a better prior, you will see a lot of these companies quietly swapping out their in-house "edge"/specialized fine-tuned models. It's like those PDF data extraction companies that have been launching like crazy: 90% will be pivoting if they don't get enough B2B customers locked in. LLMs unfortunately are winner-take-all, with the actual model providers cutting out all the middlemen.

smashah

If this was an OSS project automating a specific service many HN-ers would come and bleat about TOS violations & being scared/wary of C&Ds.

How does this not violate TOS? Do you have legal protection set up from megacorps trying to bully you with legal threats?

Automation despite TOS via Adversarial Interop should be a Digital Human Right. Godspeed.

jonasnelle

This has been much less of an issue than I would have expected - Autotab is optimized for reasoning heavy tasks in core systems that require high reliability over being really fast at doing giant scrapes. More automating leads in Salesforce, tickets in Jira and data in Airtable than hawking tickets.

smashah

Just want to reiterate I fully support what you're doing and I despise the megacorps that send out legal threats to small companies/OSS devs. But their overbroad TOS make no distinction between the types of automations and the reasoning behind them - technically, they would argue, both you and your users are violating TOS. I'm sure you have already, but make sure the legal help at YC gives you the ammo you need to protect yourself and your customers when some of them randomly start getting banned.

As more and more AI Agent enabled tooling comes out, this will become a bigger issue (the fact that people are automating these services against the TOS) so it's good if everyone who can get legal help has and shares the tactics to fight back against any civil TOS-based legal threats so we are all protected.
