Users tracing in online text systems
Speaker: Le Hoi

Time: 14h, Wednesday, August 17, 2016
Location: Institute of Mathematics, 18 Hoang Quoc Viet.
Abstract:
Privacy in online systems, including social networks and specialized websites such as reviewing systems and movie forums, has become a primary concern for the people who use them. People actively join many different websites, and on each of them they normally have to register an account and enter personal information, which may be directly related to their identity. Their activities on these systems, such as writing reviews, tweeting, commenting, or chatting, reveal further information about their identities through their writeprints. Users therefore risk having their identities and other personal information revealed. Privacy laws and regulations attempt to address these concerns. It is important to provide systems that enhance the protection of personal information.
For example, a patient's records may need to be accessible for research purposes or be provided to a third party such as their company. However, the patient's identity must not be exposed in the former case, and their sensitive health status must remain protected in the latter case. It is therefore important to redact all information related to identity or to health status, respectively. Current methods provide tools to eliminate portions of text in the records that can be used to infer such sensitive information. As a contribution to this line of work, we provide a more rigorous method to select these portions.

Human characteristics, such as writing characteristics, can be used to identify people by building up profile information about them. This information can be used to trace users' activities across websites (for example, from a review website to a movie forum) by matching writing styles. To protect users from being traced, it is necessary to obfuscate their writing styles. Linguistic techniques can represent an author's writing by a set of well-designed, complex features that can identify an author among thousands of authors. Obfuscating users' writing styles is therefore not an easy task. Current attempts at obfuscating authors' writing have not been successful. We design a new algorithm for writing-style obfuscation which has a number of important properties.
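
To make the idea of writing style matching concrete, the following is a minimal sketch and not the method presented in the talk. It assumes a deliberately simple feature set (relative frequencies of character trigrams) and cosine similarity as the matching score; real writeprint systems use far richer features. The function names char_ngram_profile, cosine_similarity, and best_match are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngram_profile(text, n=3):
    """Return relative frequencies of character n-grams (an assumed, toy feature set)."""
    # Lowercase and collapse whitespace so layout differences do not dominate.
    normalized = " ".join(text.lower().split())
    grams = [normalized[i:i + n] for i in range(len(normalized) - n + 1)]
    if not grams:
        return {}
    total = len(grams)
    return {gram: count / total for gram, count in Counter(grams).items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(value * value for value in p.values()))
    norm_q = math.sqrt(sum(value * value for value in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def best_match(unknown_text, known_texts):
    """Return the author in known_texts whose profile is closest to unknown_text."""
    unknown = char_ngram_profile(unknown_text)
    return max(
        known_texts,
        key=lambda author: cosine_similarity(
            unknown, char_ngram_profile(known_texts[author])
        ),
    )
```

In this sketch, known_texts would map each account on one website to a sample of its posts, and unknown_text would be a post taken from another website. Obfuscation, in these terms, aims to change a text so that such a match no longer points to its true author.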
