The Importance Of Data Mining By Musa Mohd. Nordin, Noor

Education

Published on November 26, 2008

Author: muzkara

Source: slideshare.net

Description

Noor Conference | Global Knowledge Forum | http://www.noor.org.sa | Day 2 - Panel 2 - The Importance Of Data Mining By Musa Mohd. Nordin, Noor

THE IMPORTANCE OF DATA MINING

Musa Mohd. Nordin FRCP, FAMM

President, Federation of Islamic Medical Associations

WHAT WE WILL DISCUSS:

Why data mine ?

What is data mining ?

Applications of data mining.

Data mining vis a vis genomics & informatics

Conclusions

Why Mine Data?

Lots of data collected and warehoused

Web data, e-commerce

Purchases at department stores

Bank & Credit Card transactions

Computers cheaper & more powerful

Competitive pressure is strong

Make better decisions

Serve customers

Gain competitive edge

Why Mine Data? Scientific Viewpoint

Data collected & stored at enormous speeds (GB/hour)

remote sensors on a satellite

telescopes scanning the skies

microarrays generating gene expression data

scientific simulations generating terabytes of data

Data volumes overwhelm traditional techniques

- enormous, multidimensional, heterogeneous data

Data mining may help scientists

in segmenting & analysing data

in hypothesis generation

in knowledge discovery

SIZE OF MEDICAL KNOWLEDGE

NLM Meta Thesaurus

- 875,255 concepts

- 2.14 million concept names

Biomarkers & prognosis

- one marker in 12 years (1989-2001)

- 400 markers in 1 year (2004)

Drug development

- one molecule & 1,000 compounds (1985)

- 40,000 cDNAs and 1 million compounds (2005)

Mining Large Data Sets - Motivation

Information “hidden” in the data that is not readily evident

Human analysts may take weeks to discover useful information

Much of the data is never analyzed at all

[Figure: The Data Gap. Total new disk (TB) since 1995 versus number of analysts. From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”]

Data mining is the process of identifying VALID, NOVEL, potentially USEFUL & UNDERSTANDABLE patterns in data.

Origins of Data Mining

[Diagram: Data Mining at the intersection of Machine Learning / Pattern Recognition, Statistics / AI, and Database systems]

Emerged late 1980s

Flourished 1990s

Roots traced to 3 disciplines :

- Classical Statistics

- Artificial Intelligence

- Machine Learning

Pre 1993: “Torturing the data into a confession”

Post 1993: “Charming the data into a confession”

Knowledge Discovery Process

[Diagram: Raw Data → Integration → Data Warehouse → Selection & Cleaning → Target Data → Transformation → Transformed Data → Data Mining → Patterns and Rules → Interpretation & Evaluation → Knowledge. The diagram also carries the label “Understanding”.]
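The stages of the knowledge discovery process can be illustrated with a minimal Python sketch. It is not taken from the presentation: the patient-visit records, the function names and the 50% support threshold are all hypothetical, and counting co-occurring pairs stands in for whatever mining technique a real project would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records (an assumption for illustration only):
# each record is one patient visit with a list of recorded findings.
RAW_RECORDS = [
    {"id": 1, "findings": ["Fever", "cough ", "fatigue"]},
    {"id": 2, "findings": ["fever", "cough"]},
    {"id": 3, "findings": None},  # unusable record, dropped during cleaning
    {"id": 4, "findings": ["cough", "fever", "rash"]},
    {"id": 5, "findings": ["fatigue", "rash"]},
]


def select_and_clean(records):
    """Selection & cleaning: drop empty records and normalise the text."""
    cleaned = []
    for rec in records:
        if not rec["findings"]:
            continue
        cleaned.append([f.strip().lower() for f in rec["findings"]])
    return cleaned


def transform(target):
    """Transformation: represent each record as a set of distinct items."""
    return [frozenset(items) for items in target]


def mine_pairs(transactions):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


def evaluate(pair_counts, n_transactions, min_support=0.5):
    """Interpretation & evaluation: keep pairs that meet the support threshold."""
    return {
        pair: count / n_transactions
        for pair, count in pair_counts.items()
        if count / n_transactions >= min_support
    }


def run_pipeline(records, min_support=0.5):
    """Chain the stages: clean, transform, mine, then evaluate."""
    transactions = transform(select_and_clean(records))
    return evaluate(mine_pairs(transactions), len(transactions), min_support)


patterns = run_pipeline(RAW_RECORDS)
print(patterns)
```

On this toy data only the pair (cough, fever) survives the evaluation step, because it appears in three of the four usable records. The sketch only shows where each stage sits in the chain; a real evaluation step would also test whether a pattern is valid, novel and useful.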

DATA MINING – MEDICAL APPLICATIONS

Medical diagnostics tools

Medical image analysis

Micro-array gene expression

Protein structure & function prediction

New drug development

Disease surveillance

Bioterrorism surveillance

Environmental health impacts

Strong Government Initiatives

US: US$3B for Human Genome Project

Germany: US$62M and US$18M over 3 years to support proteomics and bacterial genomes, respectively

Britain: budget for bioinformatics and other post-genomics research to grow at 7% a year for the next 4 years

Italy: US$195M fund to focus on human genetics, cancer and bioinformatics

Sweden: US$91.4M for biotech, biosciences, healthcare

Singapore: US$1.2B in life sciences

Malaysia: BioValley will be valued at US$13.2B in 10 years

Japan: US$489.6M invested towards sequencing and analysis

Korea: US$1.7M for 2 plant genome projects

There is 6 feet of DNA in each of our cells packed into a structure only 0.0004 inches across

There are 100 trillion (100,000,000,000,000) cells in the body

If all the DNA in the human body were put end to end, it would reach to the sun and back over 600 times (100 trillion x 6 feet is about 114 billion miles; divided by 93 million miles, that is about 1,200 one-way trips, or roughly 600 round trips).
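The arithmetic behind this figure can be checked with a short Python sketch. The cell count, DNA length per cell and distance to the sun are the slide's own figures, not independently verified values; the only added assumption is the conversion of 5,280 feet per mile.

```python
# Figures quoted on the slide (taken as given, not independently verified).
CELLS_IN_BODY = 100 * 10**12   # 100 trillion cells
DNA_FEET_PER_CELL = 6          # feet of DNA per cell
MILES_TO_SUN = 93 * 10**6      # one-way distance to the sun in miles

# Added assumption: standard conversion factor.
FEET_PER_MILE = 5280

total_dna_miles = CELLS_IN_BODY * DNA_FEET_PER_CELL / FEET_PER_MILE
one_way_trips = total_dna_miles / MILES_TO_SUN
round_trips = one_way_trips / 2

print(f"Total DNA length: {total_dna_miles:.3e} miles")
print(f"One-way trips to the sun: {one_way_trips:.0f}")
print(f"Round trips (there and back): {round_trips:.0f}")
```

The result is a little over 1,200 one-way trips, which is consistent with the slide's "over 600 times" for the journey there and back.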

Life sciences research: from gene to function

[Diagram: in the cell nucleus, a gene (DNA) is expressed by RNA synthesis as mRNA, which is translated by protein synthesis into a protein (NH2 to COOH) with functions 1 to n. Whole-genome sequence projects study the gene, genome-wide micro-array analysis the mRNA, and “high-throughput” protein analysis the protein. Protein function: prediction by bioinformatics, proof by laboratory research.]

Paradigm Shift in Life Sciences

Past experiments were hypothesis-driven

- Evaluate hypothesis

- Complement existing knowledge

Present experiments are data-driven

- Discover knowledge from large amounts of data

DATA MINING : GENOMICS & BIOINFORMATICS

Experiments increasingly complex

Driven by the growth in detector development

Results in an increase in amount and complexity of data

DM to harness this development

To translate data into useful biological, medical, pharmaceutical & agricultural knowledge

DATA MINING : CONCLUSIONS

Knowledge discovery from databases

“Cutting edge” of the art & science of medicine

“Competitive edge” of the business of medicine

Applicable to other sciences and arts

Knowledge discovery :

- in search of excellence (IHSAN)

- transformation (ISLAH) towards benefiting humanity

Thank You
