The Web has grown enormously over the last few years. More and more people are online, and the number of websites is rising rapidly. The commercialisation of the Internet started in 1995, when the US government stopped sponsoring it. An obvious result of the rapid growth in the number of Internet users is misuse and abuse of the Web. The more the Internet is used, the more necessary it becomes to come up with rules and regulations to prevent certain things from happening.
This could have a considerable effect on what is generally accepted as the 'Internet Philosophy': freedom of action, user empowerment, end-user responsibility for actions undertaken, and a lack of controls 'in' the Net that limit or regulate what users can do. Furthermore, many people agree that it is morally wrong for a body such as the government to interfere. One of the sources of risky use of the Web is that it is increasingly used for commercial purposes. Whenever money is involved, there are always people who will try to make as much as they can, even if it means bending the 'rules' a bit. Search engines have also changed from solely providing information to becoming more and more commercially orientated.
Eric A. Brewer (2001) argues that search capabilities in the broadest sense have led to increased overall productivity for millions of workers, and thus to our recent global economic expansion. It can be very beneficial for businesses to have a website in order to promote themselves and make more money. Companies therefore want to be ranked highly in search engine results in order to improve their reputation among customers and the number of visitors to their sites. There are several ways in which a company can do this. However, there is much debate about whether these methods are, and should remain, legal. Since the Web isn't structured in a very clear way, users don't have a global view of the entire Web.
It is necessary to try and find our way through its chaotic structure. Instead of organising the vast number of websites on the Web, which is simply impossible, more and more technology is being developed to find information on them. Searching is the second most important use of the Internet; communication is the first. A recent study by WebSideStory showed, however, that search engines are losing their popularity.
As of 6 February 2002, 52% of Web surfers arrived at sites by direct navigation and bookmarks, compared with 46% a year earlier. Could this indicate that people are fed up with the lack of 'relevant' results that are listed? Even though search engines are still the most popular way to retrieve information from the Web, they usually return thousands of URLs matching the user's query, which isn't ideal. It is therefore important to have some form of filtering to overcome this information overload. We now take a more detailed look at the way that pages are indexed and labelled. According to Manner (1999), the index is the most important of the tools for information retrieval.
It has become more and more important to think about indexing, since 83% of sites have commercial content and only 6% have scientific or educational content, so the way pages are indexed can have a great impact on business. Most experts agree that a considerable part of the Web is not recorded in any search engine. There are several different methods of indexing Web documents. Manual indexing is one that is likely to become obsolete in the future. A disadvantage of manual indexing is the lack of consistency among different indexers. However, it is said to be the most accurate way of indexing.
The reason for this is that experts organise and list the directories and indices in a way that aids the search process. Automatic indexing is the way forward. Even though there are still many problems with this form of indexing, the Web has to be indexed automatically, since we can't keep up with its growth manually. Many search engines use automatically generated indices, either on their own or in combination with other technologies. To help recognise potentially offensive material, the W3C has developed a labelling system called the Platform for Internet Content Selection (PICS). Both third parties and the content creator can fill in these labels.
PICS could be a solution if it were made mandatory for search engines to check the page content as well as the metadata. The FTC and others recognise that labels aren't perfect. The problems with labelling are that people ignore labels and that not every country has the same rules and regulations. Another disadvantage is that it is self-regulated: authors can fill in their own labels, which means that without enough control they are able to enter false information. Another form of content label is metadata. It isn't used for control, however, but only for lookup.
It is invisible information embedded in a web page that helps tell the search engine what the page is about. Keywords are entered which can help promote a page to the top of a list of search results. However, the author of the web page enters these keywords, which means that they can easily be manipulated to increase the number of hits the website receives. Many websites that earn money when their number of visitors is high, or that charge a fee to enter, might use this in their favour.
For instance, by submitting pages with keywords that don't reflect the content of the actual page, you could try to make your website easier to find. Spamming is also a big problem on the Internet. Besides meaning the excessive amount of email we are sent, the term can also refer to a way of getting around automatic indexing. By entering many keywords or 'hidden' text on a Web page, you can improve the chance that your page is ranked highly in a search engine listing.
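The effect of keyword stuffing can be illustrated with a minimal sketch. It assumes a deliberately naive ranking rule that simply counts how often the query words occur in a page. The function names, the scoring rule and the two sample pages are hypothetical and are not taken from any real search engine.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def naive_score(page_text, query):
    """Hypothetical scoring rule: count occurrences of the query words in the page."""
    counts = Counter(tokenize(page_text))
    return sum(counts[word] for word in tokenize(query))


def rank(pages, query):
    """Return page names ordered from highest to lowest naive score."""
    return sorted(pages, key=lambda name: naive_score(pages[name], query), reverse=True)


# Two made-up pages: one describes its real content, the other repeats
# an unrelated keyword many times (as hidden text might).
pages = {
    "honest": "We sell perfume and ship it within two days.",
    "stuffed": "Casino games. " + "perfume " * 20,
}

print(rank(pages, "perfume"))
```

Under this assumed rule the stuffed page is listed above the honest one, which suggests why raw keyword counts alone are easy to abuse.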
Then there is annotation, which attaches more data to a Web document than metadata does. This is again created and attached by the author to help indexing. Other people can, however, comment on this attachment, which helps in evaluating the Web page. An example is linguistic annotation for automatic summarisation and content-based retrieval. Unless there is some binding rule that everybody has to use the same form of labelling, it will be useless. There also has to be some sort of control over it, since it can easily be corrupted.
Most of the problems that arise within search engines are not of a technical nature. Thus, when looking at search engines and the Web, it is also necessary to look at the larger context in which these technical systems exist. It is very important to look at the social and legal structure of the economy, the growing motivations for trustworthiness, and the fact that technology, law, social norms, and markets combine to achieve a balance within this environment. As a result of the commercialisation of the Internet, many search engines are sponsored by major corporations. This brings another problem, unwanted advertising, which is common with many automated search engines since they rely on the author to index his or her page. The lack of control in this area leads to misleading entries made for commercial benefit.
There will always be new ways to get around the restrictions that are set, hence there has to be some form of human oversight. Although there aren't many laws that deal specifically with the Internet, there are many legal cases that are trying to establish what is and isn't acceptable on the Internet. For instance, 'pay-for-placement' is causing a lot of trouble. The business that pays the most for a keyword will come up first when that keyword is entered in a search. Allen Baden calls it Internet piracy. The existing laws for copyright and the use of traditional media are currently being applied to the Internet, but there are also areas where they can't or don't apply.
This is one of the biggest problems at the moment. Another aspect is that, as a result of these legal cases, existing laws are constantly being modified to adapt to these new situations. Some argue that law has limited power on the Internet since the Internet crosses borders. It might be legal to have a certain website in one country, while it is illegal or seen as very offensive in another. The viewer could be in a place with a different jurisdiction from that of the source of the website.
This is all very difficult to control. A problem with showing certain results in a search engine is that a website might be illegal or offensive in one country while it is perfectly normal in another. Search engines can also influence the ad banners that come up when a user enters a query. Estee Lauder filed a lawsuit against Fragrance Counter and Excite at Home in 1999. The reason was that searches for Estee Lauder triggered Fragrance Counter banners to appear on the site.
Estee Lauder won this case. However, when Playboy sued the former Playboy bunny Terri Welles for using the trademark 'Playboy bunny' as a keyword in her metadata tags, it lost. Page-jacking is another practice to look out for. By copying someone else's content and cloaking it, you can achieve higher listings in a search engine. It is practically impossible to discover whether a page has been cloaked. This practice has put pressure on search engines.
A solution would be caching content, as Google already does. However, it is unclear whether this is legal, since search engines could be breaking copyright laws by showing copies of other people's pages without their consent. There has to be a great deal of trust in the person who has indexed the page and filled in the labels. However, this trust has diminished rapidly since the commercialisation of the Internet. As previously suggested, there are many forces that are changing the Internet and its search engines, trust being one of the biggest. They affect the complexity and structure of the Internet and the control that users have over it.
As the Internet spreads around the world, it will change because of the cultural differences between countries. The lucrative side of the Internet will have to be monitored, since it seems to cause many of the problems that search engines now have. Search is the fundamental answer to the information chaos on the Web. It promotes freedom of information and expression.
Search engines have a lot of power in manipulating their search listings, which can be very dangerous. They should be neutral and objective to be effective.

Articles:
o Adamic, L. A. & Huberman, B. A. (2001), 'The Web's Hidden Order' Communications of the ACM, September 2001, Vol. 44, No. 9, Pages 55-59
o Angelaccio, M. & Buttarazzi, B. (2002), 'Local Searching the Internet' IEEE Internet Computing, January-February 2002, Pages 25-33
o Blumenthal, M. S. & Clark, D. D. (2001), 'Rethinking the Design of the Internet: The End-to-End Arguments vs. the Brave New World' ACM Transactions on Internet Technology, August 2001, Vol. 1, No. 1, Pages 70-109
o Brewer, E. A. (2001), 'When Everything Is Searchable' Communications of the ACM, March 2001, Vol. 44, No. 3, Pages 53-55
o D'Ambra, J. & Rice, R. E. (2001), 'Emerging factors in user evaluation of the World Wide Web' Information & Management, No. 38, Pages 373-384
o Glover, E. J., Lawrence, S., Gordon, M. D., Birmingham, W. P. & Lee Giles, C. (2001), 'Web Search - Your Way' Communications of the ACM, December 2001, Vol. 44, No. 12, Pages 97-102
o Kobayashi, M. & Takeda, K. (2000), 'Information Retrieval on the Web' ACM Computing Surveys, June 2000, Vol. 32, No. 2, Pages 144-166
o Lew, M. S. (2000), 'Next-Generation Web Searches for Visual Content'
o Li, W-S. & Candan, K. S. (2000), 'Integrating Content Search with Structure Analysis for Hypermedia Retrieval and Management' ACM Computing Surveys, December 1999, Vol. 31, No. 4es
o Montgomery, A. L. & Faloutsos, C. (2001), 'Identifying Web Browsing Trends and Patterns' Computer, July 2001, Pages 94-95
o Williamson, C. (2001), 'Internet Traffic Measurement' IEEE Internet Computing, November-December 2001, Pages 70-74

Websites:
o CNET News.com: Search engines losing popularity, Gwendolyn Mariano, 14-2-2002
o Search Engine Watch: Europe's Paid Placement Warriors, Danny Sullivan, 4-2-2002
o Search Engine Watch: How Search Engines Work, Danny Sullivan, 26-6-2001
o Seek and Ye Shall Find... or Not, Shannon Lafferty, 1-2-2002
o Indexing the Internet, John Hubbard, December 1999