thomas' Blog

Thursday, April 20, 2006

The conference I missed

This year's HLT-NAACL deadline has already passed... ah... it is a good conference, and I really should have submitted to it...
What a pity. I have to remember next time. The upcoming ICDM and CIKM deadlines are both around late May and early June, so I need to seize them!

Monday, April 10, 2006

Upcoming conferences

CIKM 2006 ---

Deadline for research and industrial paper submissions: May 31, 2006
Workshop Proposals due to the Workshop Chair: March 30, 2006
Tutorial Proposals due to the Tutorial Chair: May 31, 2006
Notification to authors: August 1, 2006
Final camera-ready version due (research paper and industry track): August 31, 2006
Conference Date: Nov 6-11, 2006

ISCSLP 2006 ---

Full paper submission by Jun. 15, 2006
Notification of acceptance by Jul. 25, 2006
Camera ready papers by Aug. 15, 2006
Early registration by Nov. 1, 2006
Conference December 13-16, Kent Ridge, Singapore

SLT 2006 ---

Paper submission deadline July 15, 2006
Hotel Reservation and Workshop registration opens July 30, 2006
Paper Acceptance / Rejection September 1, 2006
Hotel Reservation and Workshop Registration closes October 15, 2006
Workshop December 10-13, 2006

Tuesday, April 04, 2006

Comments from ICASSP

I already got slammed once earlier at ICASSP...
Whew, this topic does not seem to be having much luck. I will have to work even harder!!

---- Comments from the Reviewers: ----

While the paper presents an interesting strategy to perform information retrieval, I think that calling this strategy dialog management in the context of conversational applications seems to be a quite narrow interpretation of it. As a matter of fact, the "simulated user", rather than simulating conversational activity, seems to focus on query terms. So, all in all, the combination of "simulated users for tuning dialog management" in reality seems better described as query optimization via key term search.

-----
This paper describes a document retrieval process based on a query refinement technique:
the documents to retrieve (the target) are grouped in a set D.
A first query, made of a single key term, leads to a first set of documents. By adding another key term that refines the query, a smaller set of documents is obtained (the intersection of the documents obtained with both key terms), and this process goes on until the recall on the first K documents obtained with the multiple key term query is above a given threshold. This recall is estimated against the target set D.
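
(For concreteness, here is a minimal sketch of the iterative refinement loop the reviewer describes above. The inverted index, key terms, top-K cut-off, and recall threshold r0 are invented placeholders for illustration, not the paper's actual method or data.)

def refine_query(inverted_index, target_docs, key_terms, k=10, r0=0.5):
    """Add key terms one at a time until recall on the top-K retrieved
    documents, estimated against the target set D, reaches the threshold r0."""
    query, retrieved = [], None
    for term in key_terms:
        query.append(term)
        docs_for_term = inverted_index.get(term, set())
        # Intersect with the documents matched by the earlier key terms.
        retrieved = docs_for_term if retrieved is None else retrieved & docs_for_term
        top_k = list(retrieved)[:k]  # stand-in for a ranked top-K list
        recall = len(set(top_k) & target_docs) / len(target_docs)
        if recall >= r0:
            break
    return query, retrieved

# Toy usage with a made-up index and target set.
index = {"speech": {1, 2, 3, 5}, "retrieval": {2, 3, 7}, "dialogue": {3, 9}}
print(refine_query(index, target_docs={2, 3}, key_terms=["speech", "retrieval"]))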

I have several issues with this paper:
1) I believe this work is not relevant to the ICASSP conference, as there is no speech technology involved in the process described.
2) I think the title is misleading: I don't see any dialogue strategy in this work. The process described sorts key terms in order to retrieve documents. In the experiments, a key term is randomly selected, then the next key term is chosen according to the tree, and, if I understood correctly, the simulated user systematically accepts this new key term [a toy sketch of this selection process is given after this list]. So where is the dialogue?
3) I also have some issues with the methodology chosen: how do you randomly choose the set of target documents D? How can you ensure that this set of documents represents a "real" query that could be made by a user?
4) The baseline system is not clearly presented. Is the high failure rate due to the choice of a key term that retrieves no documents of D? In that case the evaluation is very artificial: why would a real user validate a key term outside the scope of his query?
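
(Likewise, a toy sketch of the term-selection process the reviewer describes in point 2 above: a first key term is picked at random, then each following term is taken from a key-term tree and always accepted by the simulated user. The tree and terms below are hypothetical placeholders, not the paper's data.)

import random

def simulated_session(term_tree, max_terms=3, seed=0):
    # Walk the key-term tree; the simulated user accepts every suggested term.
    rng = random.Random(seed)
    current = rng.choice(sorted(term_tree))  # random first key term
    accepted = [current]
    while len(accepted) < max_terms and term_tree.get(current):
        current = term_tree[current][0]  # system suggests, user always accepts
        accepted.append(current)
    return accepted

# Toy usage with a made-up key-term tree.
tree = {"speech": ["recognition"], "recognition": ["acoustic model"], "retrieval": []}
print(simulated_session(tree))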



-----
This paper describes an approach for presenting the user with a hierarchical set of new keywords (which may be presented to the user in a limited space, such as the screen of a hand-held device) to narrow down the search space for document retrieval. The interaction going on between the user and the system is not exactly the type of dialog that people in general are dealing with, but the application itself is very interesting. However, the testing and evaluation are not clearly defined in the paper, which leaves the reader with many questions about the use of the approach. For example, how are the documents in D selected? How is it guaranteed that they are about a specific topic that can be specified with a few key terms? Also, it would be useful if the authors related the terms in the paper to the general reinforcement learning framework (e.g., feedback, etc.).
What happens if the user wants to select another term not in the hierarchy?
What is r_0 (recall rate threshold) for the experiments (testing)? If it is also 0.1, isn’t it too low for testing?
What is the difference between a and b, or c and d in Figure 5? Why doesn’t c (a) start at the same point with d (b)?

Here are some typos:
- “the system interactively help the user” -> “the system interactively helps the user”
- “In general all other key term…” -> “In general all other key terms…”
- “Each of these possible key term is” -> “Each of these possible key terms are”
- “all possible expansion” -> “all possible expansions”



-----

SIGIR review comments

These reviewers really know how to tear a paper apart~~

Whew, my paper got criticized to pieces. Damn it, oh my god!

------------- Review from Reviewer 1 -------------
Relevance to SIGIR : 5
Originality of work : 4
Impact of results : 2
Quality of arguments : 2
Quality of presentation : 2
Confidence in review : 5
Overall recommendation : 2

-- Comments to the author(s):
The paper describes a novel way of dealing with subword-based indexing of spoken documents using reinforcement learning.



There is a lack of direction and no clear research hypothesis being tested. The experiments, which mainly simulate user information needs and behaviour, are not well justified.
-- Summary:
positive: novel approach

negative: experiments not well justified; simulated user needs and behaviour may be unrealistic
---------- End of Review from Reviewer 1 ----------
------------- Review from Reviewer 2 -------------
Relevance to SIGIR : 4
Originality of work : 3
Impact of results : 3
Quality of arguments : 3
Quality of presentation : 3
Confidence in review : 2
Overall recommendation : 3

-- Comments to the author(s):
The paper deals with the interactive retrieval of spoken documents through the use of hierarchies of keys, which are ranked through reinforcement learning. Retrieval of non-text documents is a difficult process, and any successful empirical work in this regard is an important contribution to the field.



I am not an expert in the field, so I cannot judge the content of the paper in much depth. My comments are therefore mainly restricted to structure, argumentation, methodology and language issues.

The paper is well structured and the figures are useful to illustrate the processes.

The experiment is large scale enough to be significant, and the methodology is appropriate.

The argumentation is logical.

There are several grammatical errors in the paper, and if the paper is accepted, I would strongly suggest that a native English speaker proofread it.


-- Summary:
I am not an expert in the field and my comments are mainly about issues other than content.




---------- End of Review from Reviewer 2 ----------
------------- Review from Reviewer 3 -------------
Relevance to SIGIR : 3
Originality of work : 2
Impact of results : 2
Quality of arguments : 1
Quality of presentation : 2
Confidence in review : 5
Overall recommendation : 1

-- Comments to the author(s):
Main contributions:



This is a poor, unfocused paper about the use of term hierarchies as a mediation device between users and systems in spoken document retrieval. The approach the authors took to select terms may be novel, but it is inadequately described. The use of simulated users is interesting, but the authors' explanation of the research that resulted in this paper is poor.



Technical soundness



The paper suffers from an identity crisis. It is not sure whether it is meant to be about interactive IR, term weighting, document ranking or reinforcement learning. It is a hodge-podge of all these issues, and lacks focus, clarity, and direction.



If it is about Interactive IR, then more time needs to be spent relating the research described to earlier work (see, for example, the paper by Turpin & Hersh at SIGIR 2001, which talks about searcher capabilities in making up for shortcomings in IR performance). The authors make a number of claims about the query formulation process that are questionable: "This is why he [the user] formulates his query as short as possible to keep it flexible"...I doubt whether such rationale is adopted by searchers; chances are they are simply lazy or lack sufficient topic/collection knowledge to formulate queries.



More description is necessary about how the authors feel their approach would benefit interactive IR other than "topic hierarchy to help the user focus on his information needs efficiently" -- I'm sure the authors mean focus on the most important terms to describe their information needs -- the paper is littered with examples such as this.



The authors need to reference more related literature in this field and in areas such as query expansion and user-system communication. The problem they try to describe in Figure 1 has been the topic of some debate for over 50 years in the library sciences community, yet no reference is made to any of that literature.



The authors should reference some of the related work on using searcher simulations. This is a subject that is attracting a great deal of interest from IR researchers (see, for example, White et al, ACM TOIS, July 2005).



The way in which they simulate user behavior is also questionable. The authors claim that they simulate two things: information needs and retrieval behavior. They arrive at the former "by observing a real user's information needs", but provide no description of this process -- how would you observe someone's needs anyway? Did you ask experimental participants what they were looking for, what their criteria for relevance were, etc.? The retrieval behavior was simulated through a series of related term selections, but curiously never any document selections. It is true that there needs to be a metric to evaluate performance, and perhaps the authors regarded the selection of documents as an unnecessary behaviour to simulate. If this was the case then you should have made this clear in the paper. However, even if this was the case, the authors ignore the fact that the retrieved documents do have an impact on the terms that are selected in subsequent steps.



Where is the proof that this simulated methodology is appropriate for testing your research questions? In fact, what are your research questions? Also, you try to claim that simulations can be used to evaluate the performance of "interactive retrieval in terms of task success" and "success(ful) retrieval", and, perhaps even more outrageously, "user satisfaction". If you have developed measures that serve as surrogates for these innately human activities, then they should be described in detail.



I was surprised as I read Section 5.1 to see "Another 50 real users including the 50 desired document set D". Where did these users come from? There is no discussion of their role in the process.



There is no discussion about why fewer terms should be regarded as a more successful outcome, and the simulation appears to only add one term per iteration.



The very low standard of the discussion and conclusions leaves me wondering whether this was rushed for submission at the last minute. Much more needs to be said about the approach adopted and the findings.



SDR is mentioned very rarely in the latter parts of the paper but appears to play an important role in your argumentation early on.



Relevance, importance and originality



The authors fail to convince me that this is an important problem, nor do they do a good job in convincing me that this work is important or novel.



Clarity and presentation



Clarity is poor (text is very dense) and the quality of presentation is also poor.



Other comments



- There is no need to expand acronyms every time you use them. OOV, SDR and NE are defined repeatedly throughout the paper.

- You should reference every claim that is not your own.

- Figure 4 is missing a complete legend.
-- Summary:
I would argue strongly for rejection of the paper. It contains a number of flaws in the methodology, and does not sufficiently describe the approach adopted. The findings are weak and the discussion/conclusions are non-existent.
---------- End of Review from Reviewer 3 ----------
------------- Review from Reviewer 4 -------------
Relevance to SIGIR : 3
Originality of work : 3
Impact of results : 2
Quality of arguments : 2
Quality of presentation : 2
Confidence in review : 3
Overall recommendation : 2

-- Comments to the author(s):
This paper presents a new method for weighting dimensions, which are often disparate and of inconsistent scale, so that query-by-example is more accurate.



The approach proposed is rather ad-hoc, and this reviewer found the argument rather unconvincing. The results appear good...



The performance of this method is reported based on only two queries? Where did this data come from? This test seems far too small to support a concrete conclusion.



Is query-by-example still a viable approach? Certainly none of the search engines are using this approach. Are any users?




-- Summary:
Ad-hoc approach and weak argument. Even weaker test of performance.
---------- End of Review from Reviewer 4 ----------

