Web Science/Part1: Foundations of the web/Web content/Metadata/script
Metadata is placed in the head of your HTML page. Meta tags often have both the name and content attributes:
<meta name="description"
content="Authors' web site for Building Java Programs." />
<meta name="keywords" content="java, textbook" />
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1" />
<head>
<meta name="author"
content="web page's author" />
<meta name="revised"
content="web page version and/or last modification date" />
<meta name="generator"
content="the software used to create the page" />
</head>
Metadata was foreseen very early as part of HTML, but:
- it does not really work
- usual problem: ignored
- worse problem: intentionally forged, e.g. wrong keywords given in order to mislead search engines
- result: search engines have ignored metadata for two decades
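The forging problem above can be illustrated with a small sketch. The page below is entirely hypothetical (title, description and keywords are invented for illustration): its visible content is about shoes, but its meta tags claim something else, and no human visitor ever sees the mismatch.

```html
<!-- Hypothetical example: the page actually sells shoes -->
<head>
<title>Cheap shoes online</title>
<!-- Description and keywords are unrelated to the visible content -->
<meta name="description"
content="Official site of a popular Java textbook." />
<meta name="keywords"
content="java, textbook, free download, celebrity news" />
</head>
```

Because the tags are invisible in the rendered page, a search engine that trusts them can be misled at no cost to the author.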
Situation changed with:
- SearchMonkey (Yahoo!)
- Rich Snippets (Google)
- Schema.org
Difference:
- People behave to maximize their benefit
- Targeted benefits:
  - being found and highly ranked by search engines
  - displaying what you want to be displayed
- Resulting rationale:
  - incentive mechanisms must make sure that the targeted benefits are achieved
- Idea for SearchMonkey:
  - don't use metadata to improve finding and ranking by search engines (since it is not visible to human readers, it incentivizes people to mislead search engines)
  - use metadata to improve the display, namely the summary shown for each search result
- Implications:
  - authors care about appropriate presentation
  - incentives for authors/publishers, search engine providers and users who search coincide
Semantic annotations:
- microdata
- microformats
- RDFa
15% of web sites/pages use some form of semantic annotation
Show examples of how to do semantic annotation using RDFa (note to Rene: I'd aim to show students the newest developments, such as HTML5; in the same vein, don't show microformats or microdata, but RDFa)
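A minimal sketch of such an example, assuming an HTML5 page annotated with RDFa Lite attributes (vocab, typeof, property) and the Schema.org vocabulary. The book title is taken from the meta tag example earlier in this script; the author name and revision date are hypothetical placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Building Java Programs</title>
</head>
<body>
<!-- vocab sets the default vocabulary; typeof says what kind of thing is described -->
<div vocab="https://schema.org/" typeof="Book">
<h1 property="name">Building Java Programs</h1>
<!-- A nested typeof describes a second resource, here the author (placeholder name) -->
<p>Written by
<span property="author" typeof="Person">
<span property="name">Example Author</span>
</span>
</p>
<p>Keywords: <span property="keywords">java, textbook</span></p>
<!-- The datetime attribute carries the machine-readable value (placeholder date) -->
<p>Last revised:
<time property="dateModified" datetime="2024-01-15">15 January 2024</time>
</p>
</div>
</body>
</html>
```

Unlike the meta tags in the head, these annotations sit on the text that visitors actually read, so what a search engine extracts for its result summary and what the page displays are the same content.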