Project Objectives and Scope
The objective of the group project is for you to gain a good understanding of the given
research topic, provide insight into the problem, and develop a well-defined strategy for its solution. You
should treat the term project as if you were doing the initial background study for further
in-depth research. In other words, the report should demonstrate an understanding of and an
insight into the problem such that given enough time, you could carry it to its logical conclusion
and complete the research.
The group will write a literature review that describes the problem domain, gives a proper
problem definition, and surveys existing work. The research topic of this literature review is
Web Mining and Content Analysis.
The sub-topics include: a. Crawling and indexing Web content; b. Web recommender
systems and algorithms; c. Summarization of Web data; d. Data, entity, event, and relationship
extraction; e. Knowledge acquisition and automatic construction of knowledge bases; f. Large-
scale graph analysis. Please pick one of them.
You should be looking at the proceedings of conferences such as WSDM, WWW, SIGIR,
ICDM, KDD, SIGMOD, VLDB, ICDE, … and at journals such as IEEE TKDE, Data Mining
and Knowledge Discovery Journal, Journal of Intelligent Information Systems, Intelligent Data
Analysis, World Wide Web, VLDB Journal, Knowledge and Information Systems and many
others. Most of these publications can be found through DBLP:
https://dblp.uni-trier.de/db/index.html. This list is not meant to be complete, and it may not
even include the venues most important from your perspective. Please do your own research and
find the papers relevant to your chosen topic. Our hope is that by the time you complete the project, you’ll have a good
idea of what the area is about and what the most important publications are.
The group project is worth 40% of the overall course mark. It can be undertaken individually or
in a group of at most 5 students. We will not consider the number of group members
when assessing the final deliverable, i.e., the size of the group will not affect the mark.
The overall length should be between 15 and 20 pages, excluding references, using the ACM
Computing Surveys submission format (https://dl.acm.org/journal/csur/author-guidelines).
There is no penalty for exceeding the upper limit. Using the wrong template will halve
the final group project mark. For example, if you get 36 points out of 40
but use the wrong template, then your final mark for the group project is 18.
The last page before the references must list the contribution of each team member. We will
calculate each member's mark based on the reported contributions. Two examples:
• Team A, 2 members, report is marked as 36. The provided contribution is A1:100%, and
A2:100%. Then A1 and A2 both get 36.
• Team B, 2 members, report is marked as 34. The provided contribution is B1:100%, and
B2:80%. Then B1 gets 34 and B2 gets 80% * 34 = 27.2.
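The marking arithmetic above can be sketched in a few lines of Python. This is only an illustration, not an official marking tool. The function name individual_marks and its parameters are hypothetical. It assumes that the template penalty is applied to the group mark before the contribution percentages, and that results are rounded to two decimal places. The handout states neither of these.

```python
def individual_marks(report_mark, contributions, correct_template=True):
    """Return each member's mark from the group mark and reported contributions.

    report_mark: mark given to the report (out of 40).
    contributions: dict mapping member name to reported contribution in percent.
    correct_template: False if the wrong template was used (mark is halved).
    """
    # Assumption: the template penalty is applied to the group mark first.
    group_mark = report_mark if correct_template else report_mark / 2
    # Each member receives the group mark scaled by their reported contribution.
    return {
        member: round(group_mark * percent / 100, 2)
        for member, percent in contributions.items()
    }


team_a = individual_marks(36, {"A1": 100, "A2": 100})
team_b = individual_marks(34, {"B1": 100, "B2": 80})
wrong_template = individual_marks(36, {"A1": 100}, correct_template=False)
```

With these inputs, team_a and team_b reproduce the two examples above (36 for both members of Team A, 34 and 27.2 for Team B), and wrong_template reproduces the halved mark of 18.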