In an age of LLMs, is it time to reconsider human-edited web directories?
Back in the early-to-mid '90s, one of the main ways of finding anything on the web was to browse through a web directory.
These directories generally had a list of categories on their front page: News/Sport/Entertainment/Arts/Technology/Fashion/etc.
Each of those categories had subcategories, and sub-subcategories that you clicked through until you got to a list of websites. These lists were maintained by actual humans.
Typically, these directories also had a limited web search that would crawl through the pages of websites listed in the directory.
Lycos, Excite, and of course Yahoo all offered web directories of this sort.
(EDIT: I initially also mentioned AltaVista. It did offer a web directory by the late '90s, but this was something it tacked on much later.)
By the late '90s, the standard narrative goes, the web got too big to index websites manually.
Google promised the world its algorithms would weed out the spam automatically.
And for a time, it worked.
But then SEO and SEM became a multi-billion-dollar industry. The spambots proliferated. Google itself began promoting its own content and advertisers above search results.
And now with LLMs, the industrial-scale spamming of the web is likely to grow exponentially.
My question is, if a lot of the web is turning to crap, do we even want to search the entire web anymore?
Do we really want to search every single website on the web?
Or just those that aren’t filled with LLM-generated SEO spam?
Or just those that don’t feature 200 tracking scripts, and passive-aggressive privacy warnings, and paywalls, and popovers, and newsletters, and increasingly obnoxious banner ads, and dark patterns to prevent you cancelling your “free trial” subscription?
At some point, does it become more desirable to go back to search engines that only crawl pages on human-curated lists of trustworthy, quality websites?
And is it time to begin considering what a modern version of those early web directories might look like?
@degoogle #tech #google #web #internet #LLM #LLMs #enshittification #technology #search #SearchEngines #SEO #SEM
- bsammon ( @bsammon@lemmy.sdf.org ) 18•8 months ago
Lycos, Excite, AltaVista, and of course Yahoo all were originally web directories of this sort.
Both Wikipedia and my own memory disagree with you about Lycos and AltaVista. I’m pretty sure they both started as search engines. Maybe they briefly dabbled in being “portals”.
@bsammon And this Archive.org capture of Lycos.com from 1998 contradicts your memory: https://web.archive.org/web/19980109165410/http://lycos.com/
See those links under “WEB GUIDES: Pick a guide, then explore the Web!”?
See the links below that say Autos/Business/Money/Careers/News/Computers/People/Education /Shopping/Entertainment /Space/Sci-Fi/Fashion /Sports/Games/Government/Travel/Health/Kids
That’s exactly what I’m referring to.
Here’s the page where you submitted your website to Lycos: https://web.archive.org/web/19980131124504/http://lycos.com/addasite.html
As far as the early search engines went, some were more sophisticated than others, and they improved over time. Some simply crawled the webpages on the sites in the directory, others
But yes, Lycos was definitely an example of the type of web directory I described.
- bsammon ( @bsammon@lemmy.sdf.org ) 6•8 months ago
1998 isn’t “originally” when Lycos started in 1994. That 1998 snapshot would be their “portal” era, I’d imagine.
And the page where you submitted your website to Lycos – that’s no different than what Google used to have. It just submitted your website to the spider. There’s no indication in that snapshot that suggests that it would get your site added to a curated web-directory.
Those late 90’s web-portal sites were a pale imitation of the web indices that Yahoo, and later DMoz/ODP were at their peak. I imagine that the Lycos portal, for example, was only managed/edited by a small handful of Lycos employees, and they were moving as fast as they could in the direction of charging websites for being listed in their portal/directory. The portal fad may have died out before they got many companies to pony up for listings.
I think in the Lycos and AltaVista cases, they were both search engines originally (mid 90s) and then jumped on the “portal” bandwagon in the late 90s with half-assed efforts that don’t deserve to be held up as examples of something we might want to recreate.
Yahoo and DMoz/ODP are the only two instances I am aware of that had a significant (like, numbered in the thousands) number of websites listed, and a good level of depth.
- Moonrise2473 ( @Moonrise2473@lemmy.ml ) 15•8 months ago
Main problems are:
1. Link rot
2. Sneakily inserted sponsored links
- Tinyrabbit ✅ ( @tinyrabbit@floss.social ) 6•8 months ago
@Moonrise2473 @ajsadauskas
3. Infinitely growing list of categories.
4. Mis-categorisation
I remember learning HTML (4.0) and reading that you should put info in a <meta> tag about the categories your page fits in, and that would help search engines. Did it also help web directories?
- ᴇᴍᴘᴇʀᴏʀ 帝 ( @Emperor@feddit.uk ) English10•8 months ago
I used them and contributed to links as well - it was quite a rush to see a contribution accepted because it felt like you were adding to the great summary of the Internet. At least until the size of the Internet made it impossible to create a user-submitted, centrally-approved index of the Net. And so that all went away.
What seemed like a better approach was social bookmarking, like del.icio.us, where everyone added, tagged and shared bookmarks. The tagging basically crowd-sourced the categorisation and meant you could browse, search and follow links by tags or by the users. It created a folksonomy (thanks for the reminder Wikipedia) and, crucially, provided context to Web content (I think we’re still talking about the Semantic Web to some degree but perhaps AI is doing this better). Then after a long series of takeovers, it all went away. The spirit lives on in Pinterest and Flipboard to some degree but as this was all about links it was getting at the raw bones of the Internet.
I’ve been using Postmarks, a single-user social bookmarking tool, but it isn’t really the same as del.icio.us because part of what made it work was the easy discoverability and sharing of other people’s links. So what we need is, as I named my implementation of Postmarks, Relicious - pretty much del.icio.us but done Fediverse style so you sign up to instances with other people (possibly run on shared interests or region, so you could have a body modification instance or a German one, for example) and get bookmarking. If it works and people find it useful, a FOSS Fediverse implementation would be very difficult to make go away.
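The tagging idea above (everyone adds and tags links, and the tags become the categories) can be sketched roughly like this. It is only an illustration under my own assumptions, not how del.icio.us or Postmarks actually store anything; the `Bookmark` and `TagIndex` names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    """One user's saved link plus the free-form tags they chose (hypothetical model)."""
    user: str
    url: str
    tags: frozenset


class TagIndex:
    """Crowd-sourced categorisation: links are grouped by whatever tags users apply."""

    def __init__(self):
        self._by_tag = defaultdict(set)
        self._by_user = defaultdict(set)

    def add(self, user, url, tags):
        # Tags are lower-cased so "Search" and "search" land in the same bucket.
        bookmark = Bookmark(user, url, frozenset(t.lower() for t in tags))
        for tag in bookmark.tags:
            self._by_tag[tag].add(bookmark)
        self._by_user[user].add(bookmark)
        return bookmark

    def urls_for_tag(self, tag):
        """All distinct URLs anyone has filed under this tag, sorted for stable output."""
        return sorted({b.url for b in self._by_tag.get(tag.lower(), ())})

    def urls_for_user(self, user):
        """All distinct URLs one user has bookmarked."""
        return sorted({b.url for b in self._by_user.get(user, ())})

    def tag_counts(self, url):
        """How many bookmarks applied each tag to a URL: the emergent folksonomy."""
        counts = defaultdict(int)
        for bookmarks in self._by_user.values():
            for b in bookmarks:
                if b.url == url:
                    for tag in b.tags:
                        counts[tag] += 1
        return dict(counts)
```

Browsing by tag is then `index.urls_for_tag("search")` and following a person is `index.urls_for_user("alice")`. Nobody has to agree on a category tree up front, which is the whole point.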
- ShadowCat ( @ShadowCat@lemmy.dbzer0.com ) English4•8 months ago
- ᴇᴍᴘᴇʀᴏʀ 帝 ( @Emperor@feddit.uk ) English3•8 months ago
Oh indeed there are services out there that do something similar to Delicious, but I put a lot into that site only for it all to disappear due to the whims of some corporate overlord and I am not doing that again. What I am looking for is an easy Fediverse solution so my data is never lost again. Postmarks is definitely getting there but as a single-user service it isn’t quite what I am looking for.
- thegreekgeek ( @thegreekgeek@mstdn.io ) 2•8 months ago
@Emperor
This this this! Some kind of service that would sit alongside a fedi instance and serve as a community directory.
@ajsadauskas
- ᴇᴍᴘᴇʀᴏʀ 帝 ( @Emperor@feddit.uk ) English1•8 months ago
Indeed. Places like Lemmy and Reddit might be called “link aggregators” but they are, ultimately, jumped up web forums (and that’s no slight, I’m a web forum guy through and through) and are nothing like the social bookmarking sites, like Delicious, which had greater breadth and depth (just look at your own bookmarks, you’d only share a fraction on here but you put a larger percentage into social bookmarking) but, crucially, essentially crowd-sourced the organisation and categorisation of those links.
Some kind of service that would sit alongside a fedi instance
I have been pondering the idea of “Fediverse plug-ins” that would do that, extending the core functionality of the service.
So in the case of what we’ll call Fedilicious, users of the service could either punt over links they post to Mastodon or Lemmy to a social bookmarking plug-in where they are stored and categorised (or you could run a bot to do this automatically) but they could also add links that might not be worth a new post or storing away for future reference, etc. You would then have a curated, easily-accessible repository of links that reflect the interests of that instance.
It needn’t itself be federated but if it were, you could have some “everything” sites (fedilicious.world?) which would accept all links from other Fedilicious instances it is federated with (which would tend to be set to broadcast mode, so categorised links go out, they don’t receive all the links, although users could be allowed to add links to it from elsewhere).
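To make the "broadcast mode" idea a bit more concrete, here is a toy sketch. It is an assumption-laden model only: it does not speak ActivityPub, the `Instance` class and its fields are made up for illustration, and the instance names other than fedilicious.world are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A hypothetical Fedilicious instance holding categorised links."""
    name: str
    broadcast_only: bool = True                     # send links out, do not import others' links
    links: dict = field(default_factory=dict)       # url -> set of tags
    followers: list = field(default_factory=list)   # instances we push our links to

    def add_link(self, url, tags):
        """Store a link locally, then push it to every instance following us."""
        self._store(url, tags)
        for follower in self.followers:
            follower.receive(url, tags, origin=self.name)

    def receive(self, url, tags, origin):
        """Accept a federated link unless this instance is broadcast-only."""
        if self.broadcast_only:
            return False
        self._store(url, tags)
        return True

    def _store(self, url, tags):
        # Merge tags so the same URL arriving from two places keeps both categorisations.
        self.links.setdefault(url, set()).update(t.lower() for t in tags)
```

A topic or regional instance would keep the default `broadcast_only=True`, while an "everything" site would be created with `broadcast_only=False` and listed in the others' `followers`, so it ends up with the union of everyone's categorised links.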
- Wren 🐁 ( @Wren@chitter.xyz ) 2•8 months ago
@Emperor @ajsadauskas I’ve been thinking about this myself lately - but I had wondered how a curated directory might scale, I hadn’t considered federated social bookmarking and honestly that sounds like a brilliant solution. I’d love to see something like that happen, maybe even contribute
- ᴇᴍᴘᴇʀᴏʀ 帝 ( @Emperor@feddit.uk ) English2•8 months ago
As the links show, Relicious/Fedilicious has been on my mind a while and I have been mourning the loss of Delicious for a long time. However, the above got me jotting down some notes.
It should be doable. I haven’t had a root through Postmarks’ code but it might be they have done the bulk of the work already and it just needs a multiuser interface bolting on top of it.
- Stooryduster ( @stooryduster@mastodon.scot ) 1•8 months ago
@Wren @Emperor @ajsadauskas Back in the day people’s web sites had a links page and if their site was good it was always worth looking at what they listed as worthy links. I still have one but it’s out of habit rather than being useful. Might rethink now tho.
- ᴇᴍᴘᴇʀᴏʀ 帝 ( @Emperor@feddit.uk ) English1•8 months ago
Yes, a lot of ideas knocking around this discussion are really Web 1.0 ideas given a Fediverse makeover. The advantage of using something like a federated social networking service is that you wouldn’t have to put much thought into building a links section, it would build itself as you add links while you are web surfing.
I took a look at your site and it is working on WordPress which now uses the ActivityPub protocol, so something like that should integrate nicely.
- Arjen P. de Vries Timmers 🕊️ ( @arjen@idf.social ) 0•8 months ago
@Emperor @ajsadauskas that’s Lemmy?
- ᴇᴍᴘᴇʀᴏʀ 帝 ( @Emperor@feddit.uk ) English2•8 months ago
Although Lemmy is called a link aggregator it is really just a kind of web forum and nothing like a social bookmarking service.
- Joanna Holman ( @joannaholman@aus.social ) 10•8 months ago
@ajsadauskas @degoogle I guess the problem though is how you make sure they are actually maintained by a human acting in good faith. The way community Facebook groups meant to be for this kinda thing get spammed by likely fake businesses doesn’t give me hope
@joannaholman @degoogle Good point.
If it were run as a private company, I think the solution might be just to pay actual humans as employees.
If it’s a community-run project, the challenge would be to come up with a robust moderation system…
- Lapisdecor ( @lapisdecor@mastodon.earth ) 1•8 months ago
@ajsadauskas @joannaholman @degoogle maybe a mix of wikipedia and search engine would be nice. WikiSearch?
@ajsadauskas @degoogle Webrings! Bring back Webrings!
- Khleedril ( @khleedril@cyberplace.social ) 7•8 months ago
@ajsadauskas @degoogle What we need to do is re-visit the GnuPG philosophy of building rings of trust. If one emerges with enough people proven to provide quality aggregators/summarizers then we can start to depend on that, or those.
- The Octonaut ( @TheOctonaut@mander.xyz ) 5•8 months ago
Reddit and Lemmy are supposed to be what you want: link aggregators.
We’re supposed to link to sites and pages and people vote on how good they are in the context of the sub community topic.
Of course, then Ron Paul happened, and now it’s just memes and Yank politics so… maybe deploy Lemmy and turn off comments.
- SkyNTP ( @SkyNTP@lemmy.ml ) 4•8 months ago
I think you are mostly right, except Lemmy and reddit are not organized.
- ᴇᴍᴘᴇʀᴏʀ 帝 ( @Emperor@feddit.uk ) English1•8 months ago
Yeah, it’s the lack of organisation that is the issue and if we are thinking about web directories, there is the missing element of deliberate creation.
- Tim Richards ( @timrichards@aus.social ) 5•8 months ago
@ajsadauskas @degoogle I actually contributed to one! I was a writer at LookSmart for four years; we manually created categories and added websites to them, with short descriptive reviews. Though an algorithm listed more sites below our selections, we could force the top result, eg we’d make sure the most relevant website was the first result of a search on that topic. Old-skool now, but had better results in some ways.
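The "editor picks first, algorithm below" arrangement described above can be sketched in a few lines. This is a guess at the general shape, not LookSmart's actual system; `merge_results` is a hypothetical name.

```python
def merge_results(editor_picks, algorithmic):
    """Editor-chosen sites come first, in editorial order; algorithmic results
    follow, skipping any URL an editor already placed."""
    seen = set()
    merged = []
    for url in list(editor_picks) + list(algorithmic):
        if url not in seen:
            seen.add(url)
            merged.append(url)
    return merged
```

Putting a site at the front of `editor_picks` is what forces it to be the top result, whatever the algorithm thinks.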
- nobody you care about ( @SnepperStepper@mastodon.social ) 5•8 months ago
@ajsadauskas @degoogle i love this idea, i’m going to start my own web directory.
- ᴇᴍᴘᴇʀᴏʀ 帝 ( @Emperor@feddit.uk ) English3•8 months ago
Do it!
Then federate it.
- Michelle Hughes ( @MegaMichelle@a2mi.social ) 4•8 months ago
It looks like there’s a couple projects to continue the directory DMOZ. I hope they’re sharing work with each other!
- ᴇᴍᴘᴇʀᴏʀ 帝 ( @Emperor@feddit.uk ) English2•8 months ago
Got any links?
- Michelle Hughes ( @MegaMichelle@a2mi.social ) 2•8 months ago
Yeah. Sorry, I was hesitant to post links at first before I vetted them.
It looks like “Curlie” is the official continuation of the DMOZ project:
The other ones I was seeing, it turns out, are static mirrors of 2017 DMOZ.
- ᴇᴍᴘᴇʀᴏʀ 帝 ( @Emperor@feddit.uk ) English1•8 months ago
Thanks for that, a real blast from the past. I have a vague memory that I was an editor on the ODP or dmoz back in the day.
Sorry, I was hesitant to post links at first before I vetted them.
Yes, perhaps not coincidentally, I thought it best to ask for a human-curated link.
- Michelle Hughes ( @MegaMichelle@a2mi.social ) 1•8 months ago
Y’know, come to think of it, Wikipedia might be a better project to point to here. All the content on there is hand curated. When I’m interested in a subject, I usually go to wikipedia first instead of a search engine. Sometimes I am directed out to other websites from there.
I set up a quick keyword search so I can type “wp blah blah blah” into my url bar and it searches wikipedia.
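For anyone curious, the keyword trick boils down to something like this. It is a rough sketch of what the browser does, not any browser's real code; the `KEYWORDS` table and `expand` function are made up, and I'm assuming the English Wikipedia search URL.

```python
from urllib.parse import quote_plus

# Hypothetical keyword table; "%s" marks where the query goes.
KEYWORDS = {"wp": "https://en.wikipedia.org/w/index.php?search=%s"}


def expand(typed):
    """Turn 'wp some words' into a search URL, or return None if no keyword matches."""
    keyword, _, query = typed.partition(" ")
    template = KEYWORDS.get(keyword)
    if template is None or not query:
        return None
    # quote_plus escapes the query and turns spaces into '+'.
    return template.replace("%s", quote_plus(query))
```

Anything that doesn't start with a known keyword returns `None`, at which point the browser would fall back to its default search.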
- ᴇᴍᴘᴇʀᴏʀ 帝 ( @Emperor@feddit.uk ) English1•8 months ago
The only issue with Wikipedia (coming from a long, long time user and Administrator) is that a freely open and editable wiki needs a critical mass of users to become self-policing.
One of the projects I’ve been kicking around for a while (and has worked its way to the top of my list) is a wiki that integrates with Lemmy (and, potentially, other Fediverse services) which you could definitely use as a form of curated link directory - having an external links section was definitely one of the uses it could be put to (as well as holding an instance’s documentation and a community’s FAQs, for example).
- elxeno ( @elxeno@lemm.ee ) 4•8 months ago
And is it time to begin considering what a modern version of those early web directories might look like?
Something like fmhy.net?
- Atemu ( @Atemu@lemmy.ml ) 4•8 months ago
I’d argue that link aggregators like Lemmy (from which I’m posting o/) are the new world version of that. Link aggregators are human-edited web directories; humans post links and other humans vote whether those links are relevant to the “category” (community) they’re in. The main difference is that it’s an open communal effort with implicit trust rather than closed groups of permitted editors.
- René Seindal ( @seindal@mastodon.social ) 4•8 months ago
@ajsadauskas @degoogle DMOZ was once an important part of the internet, but it too suffered from abuse and manipulation for traffic.
For many DMOZ was the entry point to the web. Whatever you were looking for, you started there.
Google changed that, first for the better, then for the worse.
- GnomeComedy ( @GnomeComedy@beehaw.org ) 3•8 months ago
Sounds like you may enjoy https://en.m.wikipedia.org/wiki/Gemini_(protocol) if you haven’t installed a browser and tried it.
- OldWoodFrame ( @OldWoodFrame@lemm.ee ) 3•8 months ago
The tale of the internet has been curation, and I would describe it a little differently.
First we had hand-made lists of websites (Yahoo directory, or we had a list of websites literally written in pen in a notebook saying “yahoo.com” and “disney.com”).
Then it was bot-assisted search engines like Google.
Then there was so much content we didn’t even know where to start with Google, so we had web rings, then forums, then social media to recommend where to go. Then substack style email newsletters from your chosen taste makers are a half-step further curated from there.
If that is all getting spammed out of existence, I think the next step is an AI filter, you tell the AI what you like and it sifts through the garbage for you.
The reasons we moved past each step are still there, we can’t go back, but we can fight fire with fire.