Tuesday, July 27, 2010

Make your Firefox faster, safer and more stable with a few mouse clicks

When did you last look at the 'Plugins' section of your Firefox browser?
Did you know that you have lots of garbage there? Companies such as Microsoft, Yahoo and Google stealthily install plugins that you never asked for, without your permission. Some of them are a GREAT danger to your security. See these six articles about a Microsoft plugin:
Java also installs an EXTREMELY dangerous plugin, so dangerous that Firefox decided to disable it automatically! See this:
Even if you have the upgraded version of that plugin, you are better off disabling it, especially if you are not a Java developer!


Let's start

Click 'Tools -> Add-ons' and see how much crapware is there. Did you ever install all that crap? No? Then it is time to clean it up.
I have done some research to see what is needed and what is safe to disable:

Disable:
  • Yahoo Application State Plugin - it may crash the browser if you have Adblock Plus installed
  • Microsoft DRM Plugin - both of them
  • Windows Presentation Foundation plugin
  • Google Talk Plugin - if you don't use Google Talk in your browser
  • Google Talk Video Accelerator Plugin
  • Google Update  Plugin 
  • Java Deployment Toolkit plugin - Extremely high risk!
Some plugins add huge overhead to your browser. After disabling them, my Firefox loads in only 2 seconds!

    The ONLY plugins that are useful are:
    • Mozilla default Plug-in  -  This is safe
    • Shockwave Flash plug-in  -  This may be unsafe; it also has a nasty history of bugs. Unfortunately, we need that dreadful Flash plugin to see SOME websites.
    • Windows Media Player  -  Needed for some websites that have video content - Very unsafe
    • Java Platform  -  Rarely needed for some websites - Unsafe







    Monday, July 26, 2010

    DNA sample compression test

    A few months ago I had to run a test to see whether DNA sequence files (FASTA files) compress better than plain text files. DNA files contain only four characters (A, C, G, T), so you would expect them to compress really well compared with text files. However, the DNA code is fairly random (there are exceptions where the code follows patterns or contains repetitions, but those regions are rare).
    So, here are the results.


    File                                       Uncompressed size   Size (ZIP)   Size (RAR)
    FASTA FILE - no cumments (3.0 KB).fasta    3.0 KB              715 B        614 B
    TEXT FILE - random text (3.0 KB).txt       3.0 KB              1544 B       1275 B


    What I noticed later is that if you pack multiple samples together (even if they come from relatively different bacteria), the compression ratio gets even better.
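
    If you want to reproduce the test yourself, here is a minimal sketch using Python's zlib module (DEFLATE, the same algorithm family ZIP uses). The file names are the ones listed below; adjust them if yours differ.

    import zlib

    for name in ("FASTA FILE - no cumments (3.0 KB).fasta",
                 "TEXT FILE - random text (3.0 KB).txt"):
        data = open(name, "rb").read()
        packed = zlib.compress(data, 9)          # 9 = maximum compression level
        print("%s: %d B -> %d B (%.1f%%)"
              % (name, len(data), len(packed), 100.0 * len(packed) / len(data)))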




    Test files used in this experiment:

    a) FASTA FILE - no cumments (3.0 KB).fasta

    TGGCGGCGTGCTTAACACATGCAAGTCGAACGAGAAATTCCCTGCTTGCAGGGAAGAGTAAAGTGGCGCA
    CGGGTGAGTAACGCGTGGGTAACCTACCTTTGAATTCGGAATAGCCCGTCGAAAGGTGGATTAATACCGG
    ATACGGTTTAAGGATCTTCGGATTTTTAAATTAAAGGTGACCTCTTCATGAAAGTTGCCGTTCATAGATG
    GGCCCGCGTACCATTAGCTTGTTGGTGGGGTAATGGCCTACCAAGGCGACGATGGTTAGCTGGTCTGAGA
    GGATGATCAGCCACACTGGAACTGGAACACGGTCCAGACTCCTACGGGAGGCAGCAGTGAGGAATTTTGC
    GCAATGGGGGAAACCCTGACGCAGCAACGCCGCGTGAGCGAAGAAGGCCTTCGGGTCGTAAAGCTCTGTC
    AAGTGGGAAAAAAATCTTTTGATGAATAGTTAAAAGACTTGATGGTACCACTGGAGGAAGCACCGGCTAA
    CTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTGTTCGGAATCACTGGGCGTAAAGAGCGT
    GTAGGCGGTTTGACAAGTCAGATGTGAAAGCCCCCGGGCTCAACCCGGGAAGTGCATTTGAAACTGTCTC
    ACTAGAGTATGGGAGAGGAGATTGGAATTCCTGGTGTAGAGGTGAAATTCGTAGATATCAGGAGGAACAC
    CCGTGGCGAAGGCGATTCTCTGGACCAATACTGACGCTGAGACGCGAAAGCGTGGGGAGCAAACAGGATT
    AGATACCCTGGTAGTCCACGCCGTAAACGATGAGAACTAGGTGTAGTGGGTATTGACCCCTGCTGTGCCG
    AAGTTAACGCATTAAGTTCTCCGCCCTGGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGG
    GGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGACGCAACGCGAAGAACCTTACCTGGGTTTGACA
    TCCTTTGACCGTCTGTGAAAGCAGATTTTTCCGGCTTTGCCGGAACAGAGTGACAGGTGCTGCATGGCTG
    TCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCAGCAACGAGCGTAACCCTTGTCTTTAGTTGCCAT
    TATTAAGTTAGGCACTCTAAAGAGACTGCCTCGGTTAACGGGGAGGAAGGTGGGGATGACGTCAAGTCCC
    TCATGGCCTTTATATCCAGGGCTACACACGTGCTACAATGGGCTGTACAAAGGGTTGCTATCCCGCGAGG
    GGGCGCTAATCCCAAAAAGCAGTTCTCAGTTCGGATTGAAGTCTGCAACTCGACTTCATGAAGGTGGAAT
    CGCTAGTAATCGTGGATCAGCATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACAC
    CACGAAAGTCGACTGTACCAGAAGTTGCTGGGCTAACCTTTTCGGAGGAGGCAGGTACCTAAGGTACGGC
    TGGCGGCGTGCTTAACACATGCAAGTCGAACGAGAAATTCCCTGCTTGCAGGGAAGAGTAAAGTGGCGCA
    CGGGTGAGTAACGCGTGGGTAACCTACCTTTGAATTCGGAATAGCCCGTCGAAAGGTGGATTAATACCGG
    ATACGGTTTAAGGATCTTCGGATTTTTAAATTAAAGGTGACCTCTTCATGAAAGTTGCCGTTCATAGATG
    GGCCCGCGTACCATTAGCTTGTTGGTGGGGTAATGGCCTACCAAGGCGACGATGGTTAGCTGGTCTGAGA
    GGATGATCAGCCACACTGGAACTGGAACACGGTCCAGACTCCTACGGGAGGCAGCAGTGAGGAATTTTGC
    GCAATGGGGGAAACCCTGACGCAGCAACGCCGCGTGAGCGAAGAAGGCCTTCGGGTCGTAAAGCTCTGTC
    AAGTGGGAAAAAAATCTTTTGATGAATAGTTAAAAGACTTGATGGTACCACTGGAGGAAGCACCGGCTAA
    CTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTGTTCGGAATCACTGGGCGTAAAGAGCGT
    GTAGGCGGTTTGACAAGTCAGATGTGAAAGCCCCCGGGCTCAACCCGGGAAGTGCATTTGAAACTGTCTC
    ACTAGAGTATGGGAGAGGAGATTGGAATTCCTGGTGTAGAGGTGAAATTCGTAGATATCAGGAGGAACAC
    CCGTGGCGAAGGCGATTCTCTGGACCAATACTGACGCTGAGACGCGAAAGCGTGGGGAGCAAACAGGATT
    AGATACCCTGGTAGTCCACGCCGTAAACGATGAGAACTAGGTGTAGTGGGTATTGACCCCTGCTGTGCCG
    AAGTTAACGCATTAAGTTCTCCGCCCTGGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGG
    GGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGACGCAACGCGAAGAACCTTACCTGGGTTTGACA
    TCCTTTGACCGTCTGTGAAAGCAGATTTTTCCGGCTTTGCCGGAACAGAGTGACAGGTGCTGCATGGCTG
    TCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCAGCAACGAGCGTAACCCTTGTCTTTAGTTGCCAT
    TATTAAGTTAGGCACTCTAAAGAGACTGCCTCGGTTAACGGGGAGGAAGGTGGGGATGACGTCAAGTCCC
    TCATGGCCTTTATATCCAGGGCTACACACGTGCTACAATGGGCTGTACAAAGGGTTGCTATCCCGCGAGG
    GGGCGCTAATCCCAAAAAGCAGTTCTCAGTTCGGATTGAAGTCTGCAACTCGACTTCATGAAGGTGGAAT
    CGCTAGTAATCGTGGATCAGCATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACAC
    CACGAAAGTCGACTGTACCAGAAGTTGCTGGGCTAACCTTTTCGGAGGAGGCAGGTACCTAAGGTACGGC
    CGGTAATTGGGGTGAAGTCGTAACAAGGTATCATTCAGTGATACTCGG


    ----------------------------------------------------------------------------------------------------------------

    Test files used in this experiment:
    b) TEXT FILE - random text (3.0 KB).txt

    Cluster (computing)
    A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.[1]

    Cluster categorizations

    High-availability (HA) clusters
    High-availability clusters (also known as Failover Clusters) are implemented primarily for the purpose of improving the availability of services which the cluster provides. They operate by having redundant nodes, which are then used to provide service when system components fail. The most common size for an HA cluster is two nodes, which is the minimum requirement to provide redundancy. HA cluster implementations attempt to use redundancy of cluster components to eliminate single points of failure.
    There are many commercial implementations of High-Availability clusters for many operating systems. The Linux-HA project is one commonly used free software HA package for the Linux OSs.

    Load-balancing clusters
    Load-balancing is when multiple computers are linked together to share computational workload or function as a single virtual computer. Logically, from the user side, they are multiple machines, but function as a single virtual machine. Requests initiated from the user are managed by, and distributed among, all the standalone computers to form a cluster. This results in balanced computational work among different machines, improving the performance of the cluster system.

    Compute clusters
    Often clusters are used for primarily computational purposes, rather than handling IO-oriented operations such as web service or databases. For instance, a cluster might support computational simulations of weather or vehicle crashes. The primary distinction within compute clusters is how tightly-coupled the individual nodes are. For instance, a single compute job may require frequent communication among nodes - this implies that the cluster shares a dedicated network, is densely located, and probably has homogenous nodes. This cluster design is usually referred to as Beowulf Cluster. The other extreme is where a compute job uses one or few nodes, and needs little or no inter-node communication. This latter category is sometimes called "Grid" computing. Tightly-coupled compute clusters are designed for work that might traditionally have been called "supercomputing". Middleware such as MPI (Message Passing Interface) or PVM (Parallel Virtual Machine) permits compute clustering programs to be portable to a wide variety of clusters.

    Grid computing
    Grids are usually computer clusters, but more focused on throughput like a computing utility rather than running fewer, tightly-coupled jobs. Often, grids will incorporate heterogeneous collections of computers, possibly distributed xxxxx

    How we spend Moore's dividend


    During the last 28 years, the clock speed increased 586 times. The Intel Pentium processor, introduced in 1995, achieved a SPECint95 benchmark score of 2.9, while the Intel Core 2 Duo achieved a SPECint2000 benchmark score of 3108.0, a 375 times increase in performance in 11 years.
    One indication is that array bounds and null pointer checks impose a time overhead of approximately 4.5% in the Singularity OS.

    High-level programming languages provide more safety, greater ease of use, and a higher level of abstraction. Managed languages (such as VB/C# and Java) raised the level of programming further by introducing garbage collection, richer class libraries (such as .NET and the Java Class Library), just-in-time compilation, and runtime reflection. All these features provide powerful abstractions for developing software, but they also consume memory and processor resources in non-obvious ways.

    The second issue is that high-level languages hide the details of the machine beneath a more abstract programming model. This leaves developers less aware of performance considerations and less able to understand and correct problems.
    I conducted simple programming experiments to compare the cost of implementing the archetypal Hello World program in C and in C#. Even without further comment, it is clear that the higher level of abstraction affects performance.



    Today many developers don't care about performance; they simply rely on the hardware and on the other layers to provide it.


    Abundant machine resources have allowed developers to become lazy and complacent about performance, to dismiss optimization, and to be less aware of the resource consumption of their code. Thirty years ago Bill Gates famously changed the prompt in Altair BASIC from "READY" to "OK" to save 5 bytes of memory.
     






    Parallel/distributed databases (raw notes)

    Some raw notes about distributed databases.



    If you want to start with parallel databases, you first need good knowledge of traditional (non-distributed) databases. Here are some books I personally recommend:

    Books about regular DB


    http://proquest.safaribooksonline.com/0321290933
    http://proquest.safaribooksonline.com
    http://proquest.safaribooksonline.com/078972569X
    http://proquest.safaribooksonline.com/9781593271909 (For true beginners. You may not want to start working with parallel DB systems if this is the book you are reading now :) )

    ==================================================

    Comparison between several parallel DB systems


    Hive (HBase)
    Developer: Facebook/Apache
    Source code available: Yes
    Free: Yes
    Link :


    PostgreSQL to HBase Replication
    Developer: ?
    Source code available: ?
    Free:


    HadoopDB
    Developer: Daniel Abadi - Yale
    Source code available: Yes
    Free: Yes
    Quote: "Is just an academic prototype"


    Yahoo! Data
    Developer: ?
    Source code available: ?
    Free: Yes


    BigTable
    Developer: Google
    Source code available: ?
    Free: ?
    Link : see BigTable paper 2006 http://labs.google.com/papers/bigtable.html


    ----------------------------------

    More about HadoopDB
    1. A hybrid of DBMS and MapReduce technologies targeting analytical query workloads
    2. Designed to run on a shared-nothing cluster of commodity machines, or in the cloud
    3. An attempt to fill the gap in the market for a free and open source parallel DBMS
    4. Much more scalable than currently available parallel database systems and DBMS/MapReduce hybrid systems (see longer blog post).
    5. As scalable as Hadoop, while achieving superior performance on structured data analysis workloads

    Source: http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-shorter.html

    • HadoopDB is primarily focused on high scalability and the availability required at that scale. Daniel questions current MPP systems' ability to truly scale past 100 nodes, whereas Hadoop has real examples running on 3000+ nodes.
    • HadoopDB, like many MPP analytical database platforms, uses shared-nothing relational databases as processing units; HadoopDB uses Postgres. Unlike other MPP databases, HadoopDB uses Hadoop as the distribution mechanism.
    • Daniel does not dispute DeWitt & Stonebraker's (and his own) paper, which claims MapReduce underperforms compared with current MPP DBMSs. HadoopDB, however, is focused on massive scale: hundreds or thousands of nodes. Currently the largest MPP database we know of is 96 nodes.
    • Early benchmarking shows HadoopDB outperforms Hadoop but is slower than current MPP databases under normal circumstances. However, when simulating node failure mid-query, HadoopDB outperformed current MPP databases significantly.
    • The higher the scale, the higher the probability of a node failing mid-query. Very large Hadoop deployments may experience at least one node failure per query (job).
    • HadoopDB is usable today, but should not be considered an "out of the box" solution. HadoopDB is the outcome of a database research initiative, not a commercial venture. Anyone planning to use HadoopDB will require the appropriate systems and development skills to deploy it effectively.



    HadoopDB - How it works

    Database Connector
    The Database Connector is the interface between independent database systems residing on nodes in the cluster and TaskTrackers.

    Catalog
    The catalog maintains metainformation about the databases. This includes the following: (i) connection parameters such as database location, driver class and credentials, (ii) metadata such as data sets contained in the cluster, replica locations, and data partitioning properties.

    Data Loader
    The Data Loader is responsible for (i) globally repartitioning data on a given partition key upon loading, (ii) breaking apart single node data into multiple smaller partitions or chunks and (iii) finally bulk-loading the single-node databases with the chunks.

    SQL to MapReduce to SQL (SMS) Planner
    HadoopDB provides a parallel database front-end to data analysts enabling them to process SQL queries. The SMS planner extends Hive. Hive transforms HiveQL, a variant of SQL, into MapReduce jobs that connect to tables stored as files in HDFS.

    Since each table is stored as a separate file in HDFS, Hive assumes no collocation of tables on nodes. Therefore, operations that involve multiple tables usually require most of the processing to occur in the Reduce phase of a MapReduce job. This assumption does not completely hold in HadoopDB as some tables are collocated and if partitioned on the same attribute, the join operation can be pushed entirely into the database layer.
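
    To make the collocation point concrete, here is a toy Python sketch (my own illustration, not HadoopDB code): when two tables are hash-partitioned on the same join key, matching rows always land on the same node, so each node can join its own partitions locally instead of shuffling everything into a Reduce phase.

    from collections import defaultdict

    NODES = 3

    def part(key):
        return key % NODES               # shared partitioning function

    orders = [(1, "order-1"), (2, "order-2"), (3, "order-3"), (1, "order-4")]
    users = [(1, "alice"), (2, "bob"), (3, "carol")]

    orders_by_node = defaultdict(list)
    users_by_node = defaultdict(list)
    for k, v in orders:
        orders_by_node[part(k)].append((k, v))
    for k, v in users:
        users_by_node[part(k)].append((k, v))

    # "Pushed-down" join: each node joins only the rows it already holds.
    for node in range(NODES):
        local_users = dict(users_by_node[node])
        for k, order in orders_by_node[node]:
            print("node", node, ":", order, "joins", local_users[k])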

    Quote: "Hadoop simply scales better than any currently available parallel DBMS product."



    Final words.
    So, you need to use a parallel database? Here are the choices I had for my project:

    1. Purchase a parallel DB such as Greenplum or Vertica
    Price: $250K. 
    http://www.dbms2.com/2008/02/07/vertica-update-2
    Thoughts: Everything about this solution is nice except the price.

    2. Reduce the amount of data the DB system must process. For this: use the existing DB (MySQL), write the results of the BLAST MapReduce jobs to disk, and then use a script to upload them to the DB. This way we won't flood the DB with too much data.
    Thoughts:  Cheap, some programming required. Not a definitive solution.

    3. Use the DB engine to perform the SQL searches, then throw the data out of the DB.
    Thoughts:  Cheap, smart, some programming required. Not a definitive solution.

    4. Use the DB provided by Hadoop -> HBase/Hive. It is slower but more computers can be used to improve speed.
    Thoughts: Cheap (actually free). Unstable (Hadoop is early beta). Difficult to install and maintain. 


    Parallel programming (raw notes)

    Here are some raw notes and book recommendations about parallel programming that I used recently for a project. You will find it really interesting that the definitions of "parallel computing", "cluster" and "cloud computing" are extremely loose. Each book defines the terms in a very different way.




    Definitions

    Parallel computing = program parts running simultaneously on multiple processors in the same computer.


    Distributed computing = a form of parallel computing but in multiple computers. Distributed computing differs from cluster computing in that computers in a distributed computing environment are typically not exclusively running "group" tasks, whereas clustered computers are usually much more tightly coupled. Distributed computing also often consists of machines which are widely separated geographically.


    Grid computing = uses the resources of many separate computers, loosely connected (needing little or no inter-node communication) by a network, usually the Internet. Grid computing is optimized for workloads that consist of many independent jobs or packets of work, which do not have to share data between the jobs during the computation process.
    CPU scavenging creates a "grid" from the unused resources in a network of participants.


    Computer cluster = a group of linked computers, working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks


    Parallel Virtual Machine = The Parallel Virtual Machine (PVM) is a software tool for parallel networking of computers. It is designed to allow a network of heterogeneous Unix and/or Windows machines to be used as a single distributed parallel processor.


    Message Passing Interface (MPI) is a specification for an API that allows many computers to communicate with one another. It is used in computer clusters and supercomputers. MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in high-performance computing today.
    Documentation: Chapter 1. Introduction to Parallel Programming - http://www.redbooks.ibm.com/redbooks/pdfs/sg245380.pdf
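
    To get a feel for what MPI code looks like, here is a minimal sketch using the mpi4py Python bindings (this assumes an MPI implementation and the mpi4py package are installed; the file name is just an example).

    # hello_mpi.py - run with e.g.:  mpirun -n 4 python hello_mpi.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD      # communicator spanning all launched processes
    rank = comm.Get_rank()     # this process's id within the communicator
    size = comm.Get_size()     # total number of processes

    print("Hello from process %d of %d" % (rank, size))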


    Amdahl's law

    The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if a program needs 20 hours using a single processor core, and a particular portion of 1 hour cannot be parallelized, while the remaining portion of 19 hours (95%) can be parallelized, then regardless of how many processors we devote to the parallelized execution of this program, the minimum execution time cannot be less than that critical 1 hour. Hence the speedup is limited to at most 20x.
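
    A quick way to sanity-check the 20x figure is to plug the numbers into Amdahl's formula, speedup(N) = 1 / (s + (1 - s)/N), where s is the serial fraction of the work. A minimal sketch in Python (the function name is mine):

    def amdahl_speedup(serial_fraction, processors):
        # Amdahl's law: the parallel part shrinks as processors are added,
        # the serial part does not.
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

    # The example above: 1 hour out of 20 is serial, so s = 0.05.
    for n in (2, 4, 1024, 10**6):
        print(n, "processors ->", round(amdahl_speedup(0.05, n), 4), "x")
    # Even with a million processors the speedup approaches, but never reaches, 20x.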



    Beowulf (computer cluster)

    Beowulf is a multi-computer architecture which can be used for parallel computations. It is a system which usually consists of one server node, and one or more client nodes connected together via Ethernet or some other network. It is a system built using commodity hardware components, like any PC capable of running a Unix-like operating system, with standard Ethernet adapters, and switches. It does not contain any custom hardware components and is trivially reproducible. Beowulf also uses commodity software like the Linux or Solaris operating system, Parallel Virtual Machine (PVM) and Message Passing Interface (MPI). The server node controls the whole cluster and serves files to the client nodes. It is also the cluster's console and gateway to the outside world. Large Beowulf machines might have more than one server node, and possibly other nodes dedicated to particular tasks, for example consoles or monitoring stations. In most cases client nodes in a Beowulf system are dumb, the dumber the better. Nodes are configured and controlled by the server node, and do only what they are told to do. In a disk-less client configuration, client nodes don't even know their IP address or name until the server tells them what it is.
    The typical setup of a beowulf cluster


    One of the main differences between Beowulf and a Cluster of Workstations (COW) is the fact that Beowulf behaves more like a single machine rather than many workstations. In most cases client nodes do not have keyboards or monitors, and are accessed only via remote login or possibly serial terminal. Beowulf nodes can be thought of as a CPU + memory package which can be plugged in to the cluster, just like a CPU or memory module can be plugged into a motherboard. Beowulf is not a special software package, new network topology or the latest kernel hack. Beowulf is a technology of clustering computers to form a parallel, virtual supercomputer. Although there are many software packages such as kernel modifications, PVM and MPI libraries, and configuration tools which make the Beowulf architecture faster, easier to configure, and much more usable, one can build a Beowulf class machine using standard Linux distribution without any additional software. If you have two networked computers which share at least the /home file system via Network File System (protocol), and trust each other to execute remote shells (rsh), then it could be argued that you have a simple, two node Beowulf machine.





    Parallel programming books

    Introduction to Parallel Computing, 2003
     Introduction to Parallel Computing, Second Edition 
    Very good


    Principles of Concurrent and Distributed Programming, 2006
    Principles of Concurrent and Distributed Programming, Second Edition 
    http://proquest.safaribooksonline.com/9780321312839




    Server Architectures: Multiprocessors, Clusters, Parallel Systems, Web Servers, and Storage Solutions 

    Price: 27 pounds
    2004
     

    Large file systems (raw notes)

    I have decided to post some of my 2008/2009 documents containing raw resources about distributed processing. These documents contain just raw material and personal notes. I hope I will have the time to make them more presentable and post them. Below is the first of a larger set of posts (about 30) that I will create soon.



    ________________________

    Some time ago I had to find a file system that supports really large files (TB/PB).
    Here is the starting point (raw notes) of my research.

     
    File systems
     A file system (often also written as filesystem) is a method of storing and organizing computer files and their data. Essentially, it organizes these files into a database for the storage, organization, manipulation, and retrieval by the computer's operating system.


    Types of file systems

    • File systems with built in fault tolerance
    • Shared disk file systems
    • Distributed file systems
    • Distributed fault tolerant file systems
    • Distributed parallel file systems
    • Distributed parallel fault tolerant file systems
    • Fault tolerant file systems


    Comparison of file systems

    SAN - Storage area network

    A storage area network (SAN) is an architecture to attach remote computer storage devices (such as disk arrays, tape libraries, and optical jukeboxes) to servers in such a way that the devices appear as locally attached to the operating system. Although the cost and complexity of SANs are dropping, they are still uncommon outside larger enterprises.
    Network attached storage (NAS), in contrast to SAN, uses file-based protocols such as NFS or SMB/CIFS where it is clear that the storage is remote, and computers request a portion of an abstract file rather than a disk block.

    http://en.wikipedia.org/wiki/Storage_area_network

    NFS - Network File System (protocol)

    Network File System (NFS) is a network file system protocol originally developed by Sun Microsystems in 1984, allowing a user on a client computer to access files over a network in a manner similar to how local storage is accessed. NFS, like many other protocols, builds on the Open Network Computing Remote Procedure Call (ONC RPC) system. The Network File System is an open standard defined in RFCs, allowing anyone to implement the protocol.

    NFS is the "Network File System" for Unix and Linux operating systems. It allows files to be shared transparently between servers, desktops, laptops etc. It is a client/server application that allows a user to view, store and update files on a remote computer as though they were on their own computer. Using NFS, the user or a system administrator can mount all or a portion of a file system.

    http://en.wikipedia.org/wiki/Network_File_System_(protocol)

    CIFS is the "Common Internet File System" used by Windows operating systems for file sharing. CIFS uses the client/server programming model. A client program makes a request of a server program (usually in another computer) for access to a file or to pass a message to a program that runs in the server computer. The server takes the requested action and returns a response. CIFS is a public or open variation of the Server Message Block Protocol (SMB) developed and used by Microsoft, and it uses the TCP/IP protocol.
    NFS and CIFS are the primary file systems used in NAS. CIFS tends to be a bit more "chatty" in its communications.







    XFS

    XFS is a high-performance journaling file system created by Silicon Graphics, originally for their IRIX operating system and later ported to the Linux kernel. XFS is particularly proficient at handling large files and at offering smooth data transfers.
    http://en.wikipedia.org/wiki/XFS

    The CXFS file system (Clustered XFS) is a proprietary distributed networked file system designed by Silicon Graphics (SGI) specifically to be used in a Storage area network (SAN) environment.

    A significant difference between CXFS and other distributed file systems is that data and metadata are managed separately from each other. CXFS provides direct access to data via the SAN for all hosts which will act as clients. This means that a client is able to access file data via the fiber connection to the SAN, rather than over an Ethernet network (as is the case in most other distributed file systems, like NFS). File metadata however, is managed via a metadata broker. The metadata communication is performed via TCP/IP and Ethernet.

    Another difference is that file locks are managed by the metadata broker, rather than the individual host clients. This results in the elimination of a number of problems which typically plague distributed file systems.

    Though CXFS supports having a heterogeneous environment (including Solaris, Linux, Mac OS X, AIX and Windows), either SGI's IRIX Operating System or Linux is required to be installed on the host which acts as the metadata broker.

    http://en.wikipedia.org/wiki/CXFS

     

    ExaStore
    http://www.exanet.com/File/Downloads/4Registration/Brochure.pdf

    Are web "applications" real applications?

    Web applications seem to be more and more popular. But are these "applications" real applications? Due to limitations in web technology, it is obvious that web "applications" cannot offer the functionality that a real desktop application can. This is why 96.4% of users are still using desktop applications (such as Word) instead of rudimentary online tools (such as Google Docs).
    I have found some interesting insights about the limitations of SOA in "Software Pipelines and SOA: Releasing the Power of Multi-Core Processing".


    "How fast can you adapt your software to meet new needs and competitive threats? The popularity and rapid adoption of service-oriented architecture (SOA) is hard evidence of the demand for more flexible software systems.
    SOA is a superior technology. Compared to earlier trends in IT architecture, SOA delivers better on its promises. But it presents its own challenges. If you're using SOA for development, it's even more important to address performance and scalability, because of the following factors:
    • In general observation, SOA demands significantly more computing power from a system than earlier monolithic or tightly coupled designs.
    • The very notion of loosely coupled services implies message-centric application development. Developers not only have to write traditional processing logic; they also have to handle message transmission, validation, interpretation, and generation-all of which are CPU- and process-intensive.
    • As more organizations use SOA, we can expect messaging volume to explode and put a tremendous load on existing IT systems. The potential for adverse effects will escalate.
    Predictions show that over the next year or two, organizations using SOA will run into performance issues. This is nothing new; historically, each time the business world adopts a new software architecture, it suffers through growing pains. In the past twenty years, the shakeout period for each new major paradigm shift in software development has lasted about one to three years for a given evolutionary phase (any early J2EE user can attest to that). During that time, businesses gradually adopt the new design, and while doing so, they face significant performance- and scalability-related problems. In many cases software developers cannot overcome the steep learning curve; many projects end in outright failure when the deployed application doesn't perform as expected.
    Until recently, hardware was the saving grace for such immature architectures. Whenever the computer industry made a significant advance, mostly in CPU performance, performance bottlenecks could be fixed by plugging in a faster chip or by using some other mechanical solution. That advantage is now gone. We've hit a plateau in microprocessor technology, which comes from physical factors such as power consumption, heat generation, and quantum mechanics. The industry can no longer easily increase the clock speed of single CPUs. Therefore, for now and the foreseeable future, CPU vendors are relying on multi-core designs to increase horsepower. The catch is that if you want to take advantage of these new multi-core chips, your software must implement parallel processing-not a common capability in the majority of today's applications."

    Friday, July 23, 2010

    Don't spend money on fast RAM

    Here is a very interesting test that shows that DDR3 is only a few percent faster than DDR2, even though it is way, way more expensive:
    http://www.tomshardware.com/reviews/ram-speed-tests,1807.html

    "You get the best bang for the buck if you stick to the mainstream of the memory market, which currently is still DDR2-800 or 1066, preferably at low latencies. DDR3-1066 and -1333 memory do not yet result in better performance, and so should only be considered by hardcore enthusiasts, who aim for maximum overclocking performance knowing that they will get little benefit for spending a fortune."
    Don't spend your bucks on fast RAM. Better to invest the money in a faster video card/CPU. That way you will see a difference proportional to the amount invested!


    Wednesday, July 21, 2010

    Windows 32-bit using more than 4GB of RAM!

    Here is a "nasty" article about how to hack your Windows 7 32-bit to use more than 4GB of RAM.


    It looks like MS Windows can use more than 4GB of RAM, but MS doesn't want to sell it that way:
    "That 32-bit editions of Windows Vista are limited to 4GB is not because of any technical constraint on 32-bit operating systems. The 32-bit editions of Windows Vista all contain code for using physical memory above 4GB. Microsoft just doesn’t license you to use that code. 
    If you want that this should work for you without contrivance, then pester Microsoft for an upgrade of the license data or at least for credible, detailed reasoning of its policy for licensing your use of your computer’s memory in 32-bit Windows Vista. 
    Both 32-bit and 64-bit Windows can use all of physical memory, including above 4GB, but a 32-bit Windows application has at most 3GB of linear address space through which to access physical memory.
    If you have a 32-bit program that wants more than its 2GB or 3GB, then upgrading to a 64-bit version of that program to run on a 64-bit operating system is your only path ahead. If you’re buying a new computer and new applications, then getting 64-bit Windows and 64-bit applications is obviously the way of the future. Meanwhile, if your concern is only that the system and all your 32-bit applications may together use all your 4GB or more, then keeping your 32-bit operating system would at least be an option for you if Microsoft would provide you with license data to let you use the PAE support that Microsoft has already coded into the product.
    Application-level code and even most system-level code is entirely unconcerned and unaffected. Except for the operating system’s memory manager and for the relatively few drivers that work with physical memory addresses, most notably for Direct Memory Access (DMA), no 32-bit software needs any recoding to benefit from a more-than-32-bit physical address space.
    If you have exactly 4GB of RAM installed, then getting the kernel to use physical addresses above 4GB will be no benefit to you unless some of your 4GB of RAM is remapped above the 4GB address. Whether this remapping is done at present on your particular machine can be checked by using the separately supplied driver. If it is not done, then whether it can be arranged is an issue of hardware configuration. Check your BIOS Setup, read your chipset manual, or consult your computer’s manufacturer.
    If your chipset does not support remapping, then RAM that is overridden for device memory below 4GB will never be seen as usable RAM by 32-bit Windows even with PAE enabled and is just as much lost to you if you install 64-bit Windows.
    If you have physical memory above 4GB and wonder how it can be that the PAE kernel does not use that memory, the answer is licensing. The 32-bit code for using memory beyond 4GB is present in Windows Vista as Microsoft supplies it, but Microsoft prepares license values in the registry so that this code never gets to work with any physical addresses above 4GB.
    Especially unsatisfactory is that Microsoft says something about its product, and about other people’s products, but uses the licensing mechanism to deny the means to test what’s said.
    RAM that is overridden for hardware support is as lost to Windows Vista SP1 as to the original. RAM in excess of the license limits is discarded by Windows Vista SP1 as by the original. Windows Vista SP1 just doesn’t let these losses show as obviously.      "


    Prepare to lick your fingers. You will like it:
    http://www.geoffchappell.com/viewer.htm?doc=notes/windows/license/memory.htm

    Wednesday, July 14, 2010

    Is there any advantage in Win XP 64-bit?

    I strongly recommend against Windows XP 64-bit. It has NO advantage over Win XP 32-bit unless you have more than 3.2GB of RAM. If you have 4GB of RAM, Win XP 64-bit will give you back those 0.8GB of RAM that Win XP 32-bit cannot access, but at the same time you will instantly lose that amount because (as I suppose you already know) 64-bit programs require much more memory than 32-bit programs. This is because pointers and other pointer-sized internal data now take 8 bytes (64 bits) instead of 4 bytes (32 bits), which can double the RAM they use!
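
    If you want to see the difference yourself, here is a tiny sketch (my own, not from any article) that prints the pointer size of the Python interpreter it runs under; a 32-bit build reports 4 bytes, a 64-bit build 8 bytes.

    import ctypes
    import struct

    print("pointer size:", ctypes.sizeof(ctypes.c_void_p), "bytes")
    print("interpreter word size:", struct.calcsize("P") * 8, "bits")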

    So, you gain nothing unless you have well over 4GB of RAM (for example 6 or 8GB). Even worse, Win XP 64-bit has massive compatibility problems! If you MUST use a 64-bit OS, then upgrade directly to Windows 7.

    Before switching to 64-bit, take a pen and paper and do the math. See whether what you actually gain compensates for the horrible compatibility problems a 64-bit OS has. Better yet, install the OS on a spare HDD and play with it. If it is stable enough, then you can switch to it for good.

    Friday, July 9, 2010

    Saving $250 by setting your computer correctly

    Sadly, I have seen that too many of my friends don't have their computer set to automatically stand by or hibernate when it is not used. Even worse, they don't even have the 'Hibernate' feature activated on their computer.
    I don't know if it is indifference, lack of education or plain (and typical) stupidity.

    A well-documented study (see link below) shows that just by setting up the computer correctly you can save $250 per year (in Europe you will probably save even more). However, that article stops there and forgets other extra expenses. For example, a computer running full time produces a lot of heat, so you spend another $70 to remove the heat it produces from your room. Plus, your computer will live longer if it is used less and overheats less, so you will spend less money on new parts and repairs; let's say an average of $40 per year.
    The components that wear out over time due to heating, mechanical wear or ageing are (in this order):
    • The hard drive
    • The monitor (the quality of the image degrades in time as the chemicals and neon tubes inside are consumed)
    • The coolers
    • Power supply
    • The CD ROM
    • In some extreme conditions CPU, memory and main-board chipset can age sooner than expected due to overheating

    Set your power profile correctly right now, then grab your jacket and go buy yourself a $250+ gift, 'cause you just saved at least $250.

    Note: screensavers are generally NOT good. There is a whole mythology around the "benefits" of screensavers, but with modern LCD screens they are good for nothing but producing heat and wasting CPU power.
    _________________

    Extras from the article:


    http://articles.techrepublic.com.com/5100-10878_11-1054827.html

    "Configuring a system to use standby

    While Windows XP offers two power-saving states, standby and hibernation, standby is probably better suited to a desktop computer environment. This is due to the fact that in standby the system simply goes into a low power state instead of saving the contents of RAM to the hard disk and shutting down.

    Standby works by gradually putting your system into a low-power state in three stages. The first stage cuts power to the monitor and hard drives, the second level cuts power to the CPU and cache, and the third level drops down to provide only enough power to support the contents of RAM. You typically revive the computer from standby with a mouse click or a keystroke.

    Configuring the system to use standby is easy. To begin, access the Control Panel and double-click the Power Options icon. If you’re using the Category view, you’ll find it on the Performance And Maintenance page.

    When you see the Power Options Properties dialog box, select the Home/Office Desk option from the Power Schemes drop-down list. Now, in the Settings panel, select appropriate time intervals to gradually turn off the monitor and hard disks, and to put the system into standby during times of inactivity."
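
    For reference, the same settings can also be applied from the command line with the powercfg tool that ships with Windows XP. This is only a rough sketch: the scheme name and timeout values are examples, and I am quoting the switches from memory, so verify them with powercfg /? on your machine first.

    powercfg /change "Home/Office Desk" /monitor-timeout-ac 10
    powercfg /change "Home/Office Desk" /disk-timeout-ac 15
    powercfg /change "Home/Office Desk" /standby-timeout-ac 30
    powercfg /setactive "Home/Office Desk"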

    Thursday, July 8, 2010

    Install Win XP on Toshiba Qosmio x505 (part 2)

    How to install Windows XP on Toshiba Qosmio x505 laptops (part 2)
    Installing the Conexant chipset sound driver

    (Part 1 is here)

    The second stage of installing Windows XP on Toshiba Qosmio x505 laptops is the dreadful Conexant sound driver. I have modified an existing driver to make it work.

    Here I have uploaded a modified driver that will work with Windows XP 32-bit. So, YES, FINALLY, THIS IS THE DRIVER THAT TOSHIBA AND CONEXANT REFUSE TO GIVE YOU! YOUR SOUND PROBLEM IS SOLVED NOW!

    Just download the archive and unpack it (you may need WinRar for this). You will find three folders numbered 1, 2 and 3. The numbers correspond to the steps you have to perform (see below).

    Enter folder 1.
    Run "Microsoft UAA Bus driver for High Definition Audio\ASetup.exe"
    Run "Microsoft UAA Bus driver for High Definition Audio\us\Install.bat"
    This will restart your Windows without any questions. So, be prepared (save your work).

    Enter folder 2.
    If Windows asks for drivers, go directly to step 3. If not, run this file:
    Run "Restart UAA using DevCon\Restart_UAA.bat"

    Enter folder 3.
    When Windows asks for drivers for your Conexant audio chip, point it to the "High Definition Audio Device\XP32\" folder.

    That's all. If Windows recognizes the driver, it will let you know. The SmartAudio utility is FULLY functional too! The audio chipset will be listed as Conexant Pebble High Definition SmartAudio. Enjoy and don't forget to leave feedback!
    Don't forget to send a "Fuck You" letter to Toshiba for its great nonexistent customer support and for locking its laptops on Windows 7 64 bits (poor Linux bastards, I pity you also :) ).


    _________
    Edit:
    If the process is not successful please leave a message. You may need to change a line in the above script. I will provide guidance.
    I will try to upload a newer driver soon and to create a software tool that will automate the process for you and to make the driver compatible with ALL possible laptops (not only Toshiba) that have a Conexant audio chip.



    Install Win XP on Toshiba Qosmio x505

    How to install Win XP on Toshiba Qosmio x505 q860
    FULL TUTORIAL

    We all know Windows Vista and Windows 7 suck, because they make your brand-new computer creep like a zombie. Unfortunately, in its usual "great" wisdom, Toshiba decided not to support Windows XP (or any 32-bit OS) anymore. So, we will have to cut our own way through the jungle.

    Here is how to install the most successful OS from Microsoft (yes, it is Windows XP 32-bit).

    0. While you are still in Windows 7, go on the Internet and find the Win XP 32-bit drivers for your network cards:
    a) If you are using wireless to connect to the Internet, do a Google search for "LAN 8191+8192 SE driver".
    b) If you are using a network cable to connect to the Internet, do a Google search for "LAN Atheros AR8131 driver".
    Download the driver(s) and keep them in a safe place. Don't install them yet.

    1. Get a Windows XP 32-bit CD, or a Windows XP CD image (if you intend to perform step 2).
    2. If you are brave and have some skills and time, slipstream the Intel SATA drivers into the CD. This is necessary if you want your drive to run at maximum speed; anyway, the performance gain is very low, so don't bother. Do a search on how to use nLite to slipstream drivers into the original Windows setup CD. If you have a floppy drive (which I doubt), you don't need the slipstream method: just put the drivers on a floppy and press F6 during Windows setup so that Windows asks for the floppy disk and reads the drivers from it.
    3. Burn the Windows setup image on the CD. You can use Toshiba's disk burner utility.
    4. Rebooting
    At this point it is time to say goodbye to your current Windows installation. So, goodbye Windows 7 64-bit. Die in peace, sucker.
    If you skipped step 2, or you have the SATA drivers on a floppy disk, you need to go into the BIOS and switch the disk from 'AHCI' to 'Compatibility' mode. To do this, reboot your computer and, as soon as it starts, keep F2 pressed.
    Insert the CD, reboot the computer and keep the F12 key pressed to see the boot menu. Choose to boot from the CD.
    5. Installing Windows
    Install Windows XP 32-bit from the CD.
    When it asks where to install Windows, select every single partition and press "D" to delete it (Windows will ask you twice more whether you are sure you want to delete each partition). When no partition is left, create a new partition as large as the entire drive (500GB).
    Windows will offer to format the newly created partition. Do so; choose "Quick format".
    6. After Windows is nicely installed on your computer, the first thing you need to do is install the network drivers so you can access the Internet.
    7. Download and install Intel chipset drivers. Do a search for "Intel PM55 driver for Windows XP".
    8. Download and install the nVidia 360M drivers ("M" stands for "mobile", so don't download the driver for the desktop video card, but the one with an "M" for mobile/laptop/notebook).
    9. See post here, about how to install the Conexant audio drivers.

    Now you have the most important drivers installed and the computer running solid and stable. The rest you will figure out by yourself. But here are some suggestions about what you may need/want to install:

    From Toshiba web site download:
    HDD protection utility (TC30050600D)
    blue tooth stack TC00201800O
    cardreader o2 TC70052200A
    HDD SSD Alert for Windows XP v3.1.02 tc00143300k os2009306a
    laptop checkup v2.0.3.198 tc10060800c
    Conexant sound driver
    Synaptics v14 0 3 C XP32 (from http://drivers.synaptics.com)
    Toshiba Service Station for Windows XP TC00209700D
    usb charge TC00130400K
    web cam TC30049700J

    Notes:
    It should work on all x500-505 Toshiba Qosmio variants.
    I already uploaded for you the most important Toshiba Qosmio drivers here.

