Doctoral student Miguel Ángel Sánchez Pérez and researchers Alexander Gelbukh and Grigori Sidorov, of the Center for Computing Research (CIC) of the National Polytechnic Institute (IPN), developed a plagiarism detection model for identifying texts that are the product of piracy.
With that software, the Polytechnic student took first place in the Text Alignment category of the 11th edition of the Evaluation Lab on Uncovering Plagiarism, Authorship, and Social Software Misuse (known as PAN), held at the University of Sheffield, England.
The model, developed by Sánchez with advice from Gelbukh and Sidorov for his master's degree in Computer Science, beat entries developed by competitors from other countries, such as Chile, the USA, Spain, Germany, China and the UK.
For this technological contribution, and with the same model, the Polytechnic student also recently won second place in the National Contest for Best Thesis in Artificial Intelligence, organized by the Mexican Society of Artificial Intelligence (SMIA).
Sánchez said that discovering plagiarism involves searching through, and knowing, a large number of texts in their original sources, so scientists around the world are focusing their research on building models for automatic plagiarism detection.
He explained that locating fragments of text that are similar between two documents is called alignment: for example, finding that the first paragraph of one text corresponds to the third paragraph of another. “That’s the goal of the model,” said the student.
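The idea of alignment described above can be illustrated with a minimal sketch. This is not the authors' model: the word-overlap (Jaccard) similarity, the `threshold` value, and the function names `jaccard` and `align_paragraphs` are all assumptions chosen only to show what "paragraph i of one text corresponds to paragraph j of another" might look like in code.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two text fragments, from 0.0 to 1.0."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def align_paragraphs(suspicious, source, threshold=0.5):
    """Return (i, j, score) for every paragraph pair at or above the threshold.

    Hypothetical helper: compares every paragraph of one document with
    every paragraph of the other (a brute-force illustration only).
    """
    matches = []
    for i, p in enumerate(suspicious):
        for j, q in enumerate(source):
            score = jaccard(p, q)
            if score >= threshold:
                matches.append((i, j, score))
    return matches


# Toy documents: the first paragraph of one text matches
# the third paragraph of the other.
suspicious = [
    "the model finds similar fragments between two documents",
    "an unrelated paragraph about the weather",
]
source = [
    "introduction to the topic",
    "history of the contest",
    "the model finds similar fragments between two texts",
]
matches = align_paragraphs(suspicious, source)
for i, j, score in matches:
    print(f"paragraph {i + 1} ~ paragraph {j + 1} (similarity {score:.2f})")
```

A competitive system would need something far more efficient and robust than this all-pairs comparison, since the contest involves thousands of document pairs.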
To compete, the model must be highly efficient software, because it evaluates thousands of documents and performs a large number of comparisons in search of plagiarized text fragments. “In the event, competing teams are given a data set of approximately 5,000 pairs of documents to compare, which may or may not contain plagiarism,” he said.
Sánchez also said that the competition consists of having each model find the similar fragments within the pairs of documents provided to the teams.
“To assess how well we found a pair of similar fragments, the measures used are precision and recall. Precision refers to how many of the characters I detected were actually plagiarized, while recall refers to how many of the characters that were plagiarized I detected. The combination of these two parameters allowed us to win the competition,” said the winner.
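The two measures in the quote above, usually called precision and recall, can be computed at the character level roughly as follows. This is a minimal sketch under stated assumptions: the span format `(start, end)` with an exclusive end, the function names, and the example offsets are all hypothetical, and the harmonic mean (F1) is shown only as one common way to combine the two values, not necessarily the formula the contest used.

```python
def char_set(spans):
    """Expand (start, end) spans, end exclusive, into a set of character positions."""
    return {pos for start, end in spans for pos in range(start, end)}


def precision_recall(detected, actual):
    """Character-level precision and recall of detected spans against true spans."""
    det, act = char_set(detected), char_set(actual)
    overlap = len(det & act)
    # Precision: share of detected characters that were really plagiarized.
    precision = overlap / len(det) if det else 0.0
    # Recall: share of plagiarized characters that were detected.
    recall = overlap / len(act) if act else 0.0
    return precision, recall


# Hypothetical offsets: the detector flags characters 100-199, while the
# truly plagiarized passages are characters 120-219 and 300-379.
detected = [(100, 200)]
actual = [(120, 220), (300, 380)]

precision, recall = precision_recall(detected, actual)

# One common combination (an assumption here): the harmonic mean, F1.
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the detector is fairly precise (most of what it flags is plagiarized) but misses the second passage entirely, which lowers recall.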
After PAN evaluated the model and it proved to be the best, Sánchez realized that the model may have important implications. “The system could be used, for example, by a database manager such as Scopus or Thomson Reuters. When a document arrives, the system is able to say which texts it resembles and ask the editor to check whether it has already been published,” he said.
The Polytechnic student said it is difficult for a system of this type to be 100 percent certain.
“Human intervention is needed, but the system can help you find texts that you perhaps had not considered, along with the specific fragments, to make the process faster,” he said.
Sánchez said that, in addition to detecting plagiarism, the model can help build collaborative content sites such as Wikipedia, where many people write articles and much content is produced on the same subject; the model could tell writers whether their text is unique or has similarities that would allow it to be merged into another article.
He said that, unlike other participants who do not disclose how they get their results, “we have the code openly available on Dr. Alexander Gelbukh’s page, so that anyone can access and use it; they simply need to cite the article.”