Spam filters
category: general [glöplog]
You know spam in blogs is a big problem. Several solutions are in use, each with its own concrete problems. For example, CAPTCHAs are now beaten by AI algorithms and have accessibility problems. Keyword filters make it impossible to post comments containing certain words. The "nofollow" attribute is not enough to stop people or bots from submitting links. Blacklists have two problems: you can be listed as a spammer without being one, and a distributed blacklist needs a lot of resources to maintain its databases and makes every site using it dependent on that system working correctly...
So completely automated, cheap spam filtering is still an open problem. Well, you are smart guys... maybe we could come up with a good solution, couldn't we? :)
I've been thinking about JavaScript approaches. Not everybody has a JavaScript-enabled browser, but it looks as if at least 90% of people already do, and this number should keep growing with the constant development of Ajax and similar technologies. The best thing about JavaScript is that, being client-side, it allows complex computations at no cost to the server, only slightly higher bandwidth usage. My first idea was to do something with factorizing big numbers. For example, the server picks three big enough prime numbers at random, multiplies them and sends the product to the JavaScript program to factorize. Generating and verifying this would take the server very little time, but solving it would take the client a lot. Of course it is not a real solution: differences in computer speed would make it unusable on slow machines, and JavaScript is slow, so a spammer's C code would factorize the number much faster... The system would work at first, but once it became widespread it would be worthless. I wrote the idea down anyway because it may be a good start for brainstorming.
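A rough sketch of the factoring idea, to make the cost asymmetry concrete. Everything here is an assumption for illustration: the function names (`makeChallenge`, `factorize`, `verify`) are hypothetical, the primes are tiny so the demo finishes instantly, and trial division stands in for whatever the client would really run. A usable version would need far larger numbers.

```javascript
// Trial-division primality test; fine for the small demo numbers used here.
function isPrime(n) {
  if (n < 2) return false;
  for (let d = 2; d * d <= n; d++) {
    if (n % d === 0) return false;
  }
  return true;
}

// Server side: pick a random prime in [min, max).
function randomPrime(min, max) {
  for (;;) {
    const candidate = min + Math.floor(Math.random() * (max - min));
    if (isPrime(candidate)) return candidate;
  }
}

// Server side: multiply three random primes. Only `product` would be sent
// to the client; `primes` stays on the server for the later check.
// With primes below 10000 the product stays well inside Number's exact range.
function makeChallenge() {
  const primes = [0, 0, 0].map(() => randomPrime(1000, 10000));
  return { product: primes[0] * primes[1] * primes[2], primes };
}

// Client side: recover the prime factors by trial division (the slow part).
function factorize(n) {
  const factors = [];
  for (let d = 2; d * d <= n; d++) {
    while (n % d === 0) {
      factors.push(d);
      n /= d;
    }
  }
  if (n > 1) factors.push(n);
  return factors;
}

// Server side: checking an answer costs only two sorts and a comparison.
function verify(challenge, answer) {
  const expected = [...challenge.primes].sort((a, b) => a - b);
  const got = [...answer].sort((a, b) => a - b);
  return expected.length === got.length &&
    expected.every((p, i) => p === got[i]);
}

const challenge = makeChallenge();
console.log(verify(challenge, factorize(challenge.product)));
```

The server never factorizes anything itself: it only remembers the primes it picked and compares them with what the client sends back.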
Another idea, which I don't know whether anyone has implemented, is JavaScript-generated submit forms. The server would always generate different names for the inputs and build the labels in different, random ways, never outputting the text directly. For example, a='n'; b='me'; c='a'; label1=a+c+b; would produce the label text "name". It would mix several methods, even including some time-consuming ones like the prime number idea. Everything would be generated randomly by the server each time, so that a complete or nearly complete JavaScript interpreter is needed to execute it.
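A minimal sketch of how the server could produce such a snippet, assuming it runs JavaScript too (for example under Node). The helper names (`randomName`, `splitRandomly`, `makeLabelScript`, `makeField`) are made up for this example, and the only obfuscation shown is the fragment concatenation from the example above.

```javascript
// Random lowercase string, used to build variable and input names.
function randomName(length = 8) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let name = "";
  for (let i = 0; i < length; i++) {
    name += letters[Math.floor(Math.random() * letters.length)];
  }
  return name;
}

// Split text into random-length pieces of 1 to 3 characters.
function splitRandomly(text) {
  const pieces = [];
  let i = 0;
  while (i < text.length) {
    const size = 1 + Math.floor(Math.random() * 3);
    pieces.push(text.slice(i, i + size));
    i += size;
  }
  return pieces;
}

// Fisher-Yates shuffle on a copy of the array.
function shuffle(items) {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Build a script body that rebuilds `labelText` from scattered fragments.
function makeLabelScript(labelText) {
  const pieces = splitRandomly(labelText);
  const used = new Set();
  const vars = pieces.map(() => {
    let name;
    do { name = "v_" + randomName(); } while (used.has(name));
    used.add(name);
    return name;
  });
  // Declare the fragments in shuffled order...
  const declarations = shuffle(
    pieces.map((piece, i) => `var ${vars[i]}=${JSON.stringify(piece)};`)
  );
  // ...then concatenate them in the original order.
  return declarations.join("") + "return " + vars.join("+") + ";";
}

// One form field: a random input name plus the script that yields its label.
function makeField(labelText) {
  return {
    inputName: "f_" + randomName(),
    labelScript: makeLabelScript(labelText),
  };
}

const field = makeField("name");
console.log(field.inputName, field.labelScript);
// A browser would run the script to get the label; here it is evaluated locally.
console.log(new Function(field.labelScript)());
```

The server has to remember which random input name stands for which real field (for example in the session) so it can map the submitted form back.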
Well... do you have any ideas?
it's also known as the turing test.
Dejavu. Just found new spam on my blog before I opened pouet to see this thread ;P
Someone told me to enable word verification, so I will. I hope it'll work now..
Blogs are gay.
You know spam on Pouet is a big problem.
Code:
HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI
HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI HUGI
Quote:
Everything would be generated randomly by the server each time, so that a complete or nearly complete JavaScript interpreter is needed to execute it.
Because any program containing a Javascript interpreter can't possibly be evil... right?
texel: A simple solution: Every new message is screened and must be unscreened by the blog owner.
gasman, no, because interpreting JavaScript is slow. The more time-consuming it is for a spammer to solve a CAPTCHA or similar system, the more expensive it gets for the spammer, and so the less attractive.
What you perhaps don't realise is that spamming these days is done through nets of thousands of 0wned Windows boxes. They have all the CPU they need.
It isn't a lone nutter in a trailer park with a modem any more. spam = organised crime.